SensPS: Sensing Personal Space Comfortable
Distance between Human-Human Using
Multimodal Sensors
Ko Watanabe1[0000−0003−0252−1785], Nico Förster2[0009−0005−6312−3096], and Shoya Ishimaru3[0000−0002−5374−1510]
1 German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
2 RPTU Kaiserslautern-Landau, Kaiserslautern, Germany
3 Osaka Metropolitan University, Osaka, Japan
Abstract. Personal space, also known as peripersonal space, is crucial
in human social interaction, influencing comfort, communication, and
social stress. Estimating and respecting personal space is essential for
enhancing human-computer interaction (HCI) and smart environments.
Personal space preferences vary due to individual traits, cultural back-
ground, and contextual factors. Advanced multimodal sensing technolo-
gies, including eye-tracking and wristband sensors, offer opportunities to
develop adaptive systems that dynamically adjust to user comfort levels.
Integrating physiological and behavioral data enables a deeper under-
standing of spatial interactions. This study aims to develop a sensor-
based model to estimate comfortable personal space and identify key
features influencing spatial preferences. Here we show that multimodal
sensors, particularly eye-tracking and physiological wristband data, can
effectively predict personal space preferences, with eye-tracking data
playing a more significant role. Our experimental study involving con-
trolled human interactions demonstrates that the Transformer model
achieves the highest predictive accuracy (F1 score: 0.87) for estimating
personal space. Eye-tracking features, such as gaze point and pupil diam-
eter, emerge as the most significant predictors, while physiological signals
from wristband sensors contribute marginally. These findings highlight
the potential for AI-driven personalization of social space in adaptive
environments. Our results suggest that multimodal sensing can be lever-
aged to develop intelligent systems that optimize spatial arrangements
in workplaces, educational institutions, and public settings. Future work
should explore larger datasets, real-world applications, and additional
physiological markers to enhance model robustness.
Keywords: multimodal sensors · personal space · eye-tracking · wristband sensor · machine learning · deep learning
1 Introduction
Personal space [11, 14, 31], also known as peripersonal space, is a fundamental
aspect of human social interaction, representing the comfortable distance indi-
viduals maintain from others. It is a highly subjective concept shaped by cul-
tural background, personality traits, and situational contexts [2, 38]. Respecting
personal space is essential for fostering interpersonal comfort, reducing social
stress [33], and enhancing communication [43, 44], especially in collaborative
settings. Understanding and accommodating personal space preferences in HCI
and smart environments [15, 25] offer exciting opportunities to improve user ex-
periences and optimize social interactions through adaptive, intelligent systems.
This study explores the estimation of comfortable personal space using mul-
timodal sensor data to enable automated systems to maintain appropriate dis-
tances between individuals. By utilizing advanced sensing technologies, such as
eye-tracking glasses and wristband sensors, we aim to develop adaptive systems
that dynamically respond to users’ spatial preferences, enhancing comfort and
productivity in workplaces, educational institutions, and public spaces.
To achieve this, we conducted an experimental study involving pairs of partic-
ipants in controlled interaction scenarios. Data were collected on gaze patterns,
physiological indicators, and subjective comfort assessments. By integrating objective sensor measurements with subjective perceptions, this approach provides a comprehensive view of personal space dynamics. Our research addresses three primary questions:
RQ1: Can multimodal sensors estimate comfortable personal space?
RQ2: What is the best model for estimating comfortable personal space?
RQ3: What are the key features for estimating comfortable personal space?
By integrating physiological data with human-centered design principles, this
research contributes to the HCI field and paves the way for intelligent systems
that enhance human interactions in physical and virtual environments.
2 Related Work
This section reviews the related work of eye-tracking technologies, wristband
sensors, cognitive state estimation with multimodal sensors, and personal space
estimation.
2.1 Eye-Tracking Technologies
Research on eye-tracking has been carried out across various fields, including HCI
and psychology. Studies have demonstrated that eye-tracking can be utilized to
estimate factors such as confidence [5], personality [3], attention [41], and cogni-
tive load [46]. Additionally, eye-tracking is employed in diverse interaction tasks
like gaze-based typing [10, 26], menu navigation [23], and object selection [13].
The choice of eye-tracking device varies depending on the data type and task. Ex-
amples include PC-mounted eye-tracking [5, 16, 17], eye-tracking glasses [9, 28],
head-mounted displays [29], and webcam-based eye-tracking [4, 34, 35].
Each eye-tracking methodology has its distinct strengths and weaknesses [12,
36]. For example, PC-mounted eye-tracking systems are easy to set up and op-
erate, making them user-friendly, but they lack portability and are restricted
to a fixed location. Eye-tracking glasses provide portability and mobility but
require the user to wear a specialized device. Head-mounted displays also offer
portability but can be complex to set up. Webcam-based eye-tracking systems
are easy to set up and use, yet they are less accurate than other methods.
Considering these strengths and weaknesses, it is important to select an eye-tracking device appropriate to the task when designing an eye-tracking system. Because our study on estimating comfortable personal space requires participants to move freely, we use eye-tracking glasses: they are portable and offer higher accuracy and robustness than webcam-based eye-tracking.
2.2 Wristband Sensors
Wristband sensors are wearable devices that can measure physiological signals
such as heart rate [40], skin conductance [27], and galvanic skin response [6].
These sensors have been widely used in research and practical applications due
to their ability to provide continuous, real-time data in a non-invasive manner.
Studies have demonstrated that wristband sensors are used for various tasks,
such as estimating stress [27], concentration [40], workload [7], and mind wan-
dering [8]. Additionally, wristband sensors are used for interaction tasks, such as
object interaction [24], text entry [19], and step-aware voice instructions [1].
This study uses a wristband sensor to measure physiological signals such as
heart rate and skin conductance. These features are significant for estimating
cognitive state, which is a key factor for estimating personal space.
2.3 Personal Space
Personal space, also known as peripersonal space, refers to the comfortable dis-
tance individuals maintain from others, which varies subjectively among indi-
viduals [20, 31].
Coello et al. [14] explored the concept of interpersonal space (IPS). IPS
is the area individuals maintain between themselves and others during social
interactions. When this space is encroached upon, it often leads to discomfort,
prompting individuals to increase the distance to regain comfort. Our study emphasizes peripersonal space (PPS) and investigates whether physiological signals can predict personal space preferences.
Candini et al. [11] examined the direct link between physiological responses
and the regulation of interpersonal space. Their study employed an ecological
experimental setup where participants’ skin conductance response (SCR) was
measured as a confederate approached or withdrew from them at varying dis-
tances. The results showed a significant increase in SCR when participants were
Fig. 1: The devices used in this study: the Pupil Core eye-tracking glasses and
the Empatica E4 wristband sensor.
exposed to close interpersonal distances, especially during approaching move-
ments. Additionally, the study found a functional relationship between individ-
ual SCR reactivity and preferred IPS, indicating that autonomic responses play
a role in the perception and regulation of interpersonal boundaries. This research
highlights that autonomic arousal, often below conscious awareness, is vital in
shaping social space preferences.
Our study uses physiological signals to estimate personal space, focusing on
discomfort distance.
3 Methodology
In this section, we describe the device selection, data preprocessing, and machine
learning and deep learning models.
3.1 Device Selection
In this study, we employ the Pupil Core eye-tracking glasses (https://www.pupil-labs.com/products/pupil-core/) and the Empatica E4 wristband sensor (https://www.empatica.com/en-int/research/e4/). Figure 1 illustrates these devices.
The Pupil Core is a high-performance eye-tracking device renowned for its
accuracy and dependability in capturing gaze data. It features binocular eye
tracking with a sampling rate of up to 200 Hz, ensuring precise and real-time data
collection. It is equipped with high-resolution cameras (1920x1080 pixels) and a
90-degree field of view, which provides comprehensive eye movement tracking.
The Pupil Core supports 2D and 3D eye tracking, making it versatile for various
research applications. Weighing only 35 grams, it is lightweight and comfortable
for extended use. The adjustable headband ensures a secure fit across different
head sizes. The device is compatible with various software tools from Pupil Labs,
such as Pupil Capture and Pupil Player, facilitating data recording, visualization,
and analysis. It offers robust connectivity options, including USB and Wi-Fi,
allowing seamless integration with other devices and systems. This enables easy
real-time data transfer and processing, enhancing research workflow efficiency.
Overall, the Pupil Core’s advanced features and user-friendly design make it
an ideal choice for researchers seeking high-quality eye-tracking data in diverse
settings.
The Empatica E4 wristband is an advanced wearable device for real-time
physiological data collection. It has multiple sensors to measure various phys-
iological signals, including heart rate, electrodermal activity (EDA), skin tem-
perature, and motion via a 3-axis accelerometer. The E4 wristband includes
a photoplethysmography (PPG) sensor for heart rate monitoring and an EDA
sensor for skin conductance. These sensors provide high-resolution data with
sampling rates of up to 4 Hz for EDA and 64 Hz for PPG. This ensures precise
and detailed physiological measurements. The device is designed for comfort and
ease of use, featuring a lightweight and ergonomic design suitable for continuous
wear. It offers robust connectivity options, including Bluetooth, enabling seam-
less data transmission to connected devices. The E4 wristband is compatible
with the Empatica Research Portal, a cloud-based platform for data manage-
ment, visualization, and analysis. It allows researchers to access real-time data
and download raw data for further analysis.
3.2 Data Preprocessing
Data Cleaning: This section details the preprocessing steps for the eye-tracking
and wristband sensor data. First, we synchronized the data with the task du-
ration by trimming it to the relevant period. For each trial, we recorded the
start and end timestamps of the task. These timestamps were used to trim the
data from the Pupil Core and Empatica E4 sensors to match the task duration.
After trimming, we applied preprocessing to eliminate noise and artifacts. For
the eye-tracking data, we used Pupil Labs’ built-in tools for noise and artifact
removal. For the Empatica E4 sensor data, we removed NaN values and nor-
malized the dataset. We also discarded any incomplete or missing data resulting
from wireless disconnections.
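A minimal sketch of this trimming and normalization step is given below. The DataFrame layout and the `timestamp` column name are illustrative assumptions, not the exact pipeline used in the study:

```python
import pandas as pd

def trim_and_clean(df: pd.DataFrame, start: float, end: float) -> pd.DataFrame:
    """Trim one sensor stream to the task window, drop NaNs, and z-normalize.

    Assumes `df` has a unix-time `timestamp` column (hypothetical layout);
    all remaining columns are treated as signal channels.
    """
    # Keep only samples recorded between the trial's start and end timestamps.
    out = df[(df["timestamp"] >= start) & (df["timestamp"] <= end)].copy()
    # Discard incomplete samples, e.g., those caused by wireless disconnections.
    out = out.dropna()
    # Z-normalize every signal column.
    cols = out.columns.drop("timestamp")
    out[cols] = (out[cols] - out[cols].mean()) / out[cols].std()
    return out
```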
Feature Extraction: We extracted features from the data across statistical,
temporal, and frequency domains. Features such as fixation, saccade, and blinks
were extracted for the eye-tracking data. For the Empatica E4 wristband data,
features included EDA, skin temperature, and accelerometer readings.
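The text does not enumerate the full feature set, but the names appearing in Figure 4 (mean, mad, iqr, range, rms, skewness, kurtosis, entropy, meanFreq, psd) suggest per-window statistics of each signal. The sketch below shows one plausible implementation under that assumption; the spectral definitions of entropy, meanFreq, and psd are our interpretation:

```python
import numpy as np
from scipy import stats, signal

def window_features(x: np.ndarray, fs: float) -> dict:
    """Statistical, temporal, and frequency-domain features for one window.

    `x` is a 1-D signal segment (e.g., pupil diameter or EDA) and `fs` its
    sampling rate. Feature names mirror those shown in Fig. 4.
    """
    freqs, psd = signal.welch(x, fs=fs, nperseg=min(len(x), 256))
    p = psd / (psd.sum() + 1e-12)  # normalized spectrum
    return {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "mad": float(np.median(np.abs(x - np.median(x)))),
        "iqr": float(stats.iqr(x)),
        "range": float(np.ptp(x)),
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "skewness": float(stats.skew(x)),
        "kurtosis": float(stats.kurtosis(x)),
        "entropy": float(-(p * np.log2(p + 1e-12)).sum()),  # spectral entropy
        "meanFreq": float((freqs * p).sum()),  # spectral centroid
        "psd": float(psd.sum()),  # total spectral power
    }
```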
Sliding Window: A sliding window approach was employed to extract features
from the data. The features extracted are akin to those used in DisCaaS [45] and WaistonBelt [30]. The window size was set to ten seconds, with a five-second
overlap. Features were extracted from each window segment. These features were
then used to train machine learning and deep learning models.
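A timestamp-based windowing sketch under the same assumptions as above (hypothetical `timestamp` column; 10-second windows with 5-second overlap, i.e., a 5-second stride):

```python
def sliding_windows(df, window_s=10.0, overlap_s=5.0):
    """Yield overlapping window segments of a trimmed sensor stream."""
    stride = window_s - overlap_s
    start = df["timestamp"].iloc[0]
    t_end = df["timestamp"].iloc[-1]
    while start + window_s <= t_end:
        yield df[(df["timestamp"] >= start) & (df["timestamp"] < start + window_s)]
        start += stride
```

Each yielded segment would then be passed through a feature function such as `window_features` above to form one training sample.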
Labeling: In this study, we categorized comfortable distance into two classes:
comfort and discomfort. This was achieved by converting the Likert scale results
into binary labels based on a predefined threshold. Specifically, we classified data
as comfort when the Likert score was ten and as discomfort when it was less than
ten. We aim to use the features extracted from the sliding window technique to
classify these binary labels accurately.
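The thresholding itself reduces to a one-line mapping; a sketch, assuming raw scores on a 1–10 Likert scale as described above:

```python
def to_binary_label(likert_score: int) -> int:
    """Label a trial: 1 (comfort) if the Likert score is ten, else 0 (discomfort)."""
    return 1 if likert_score == 10 else 0
```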
3.3 Machine Learning and Deep Learning Models
Machine Learning: For machine learning, we use support vector machine
(SVM), decision tree (DT), and random forest (RF) to perform binary classi-
fication. A support vector machine (SVM) is a supervised learning model that
finds the optimal hyperplane that maximizes the margin between the two classes.
Decision tree (DT) is a non-parametric supervised learning method for classifi-
cation and regression. It splits the data into subsets based on the value of input
features, creating a tree-like model of decisions. Random forest (RF) is an en-
semble learning method that constructs multiple decision trees during training
and outputs the mode of the classes for classification. It helps improve accuracy
and control overfitting. All three methods are applied to classify the data into
comfort and discomfort.
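A minimal scikit-learn sketch of this comparison; the split strategy and hyperparameters are illustrative defaults, since the paper does not report them:

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def evaluate_classifiers(X, y, seed=42):
    """Fit SVM, DT, and RF on window features X and binary labels y; report F1."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    models = {
        "SVM": SVC(kernel="rbf"),
        "DT": DecisionTreeClassifier(random_state=seed),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {name: f1_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```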
Deep Learning: VGG16 [37] is a popular neural network for image classification, with 16 layers: 13 convolutional and 3 fully connected. It uses small filters to capture image details and pooling layers to downsample the data, making it efficient and effective.
MobileNet [22] is a lightweight neural network for mobile devices, using fewer parameters to save resources. Our configuration includes the base model, pooling, and a dense layer for binary classification, optimized with Adam and binary cross-entropy loss. MobileNetV2 [32] and MobileNetV3 [21] are similar but more efficient, with V3 offering better performance for tasks like image classification and object detection.
The Transformer architecture [42] is a deep learning model widely used for
various tasks, including natural language processing. The Transformer model
consists of several key components:
Multi-Head Attention Layer: This component allows the model to fo-
cus on different parts of the input sequence simultaneously. It enhances the
model’s ability to capture complex dependencies within the data.
Feed-Forward Network (FFN): This is a simple neural network with two
layers, where the first layer uses a ReLU activation function. It processes the
output from the attention layer to further transform the data.
Layer Normalization: This technique normalizes inputs across features,
which helps stabilize and accelerate the training process by ensuring consis-
tent input distributions.
Dropout Layers: These layers are used as a regularization technique to
prevent overfitting. They work by randomly setting a fraction of input units
to zero during training, which helps the model generalize better to new data.
Creating a Transformer model involves assembling and compiling these components so that the network can effectively capture dependencies and relationships in the input data, making it suitable for various tasks.
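A compact PyTorch sketch combining the four components above; the layer sizes and pooling choice are illustrative assumptions, as the paper does not report its exact configuration:

```python
import torch
import torch.nn as nn

class TransformerClassifier(nn.Module):
    """Transformer encoder for binary comfort classification.

    Expects windowed sensor features of shape (batch, seq_len, n_features).
    """

    def __init__(self, n_features, d_model=64, n_heads=4, n_layers=2, dropout=0.1):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)
        # Each encoder layer bundles multi-head attention, a two-layer
        # feed-forward network with ReLU, layer normalization, and dropout.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            dropout=dropout, activation="relu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)  # single logit: comfort vs. discomfort

    def forward(self, x):
        z = self.encoder(self.input_proj(x))          # (batch, seq_len, d_model)
        return self.head(z.mean(dim=1)).squeeze(-1)   # mean-pool over time
```

Such a model would be trained with torch.nn.BCEWithLogitsLoss and the Adam optimizer, mirroring the binary setup used for the other classifiers.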
Fig. 2: The experimental setup. The coordinator is the person who is not wearing the eye-tracking glasses and wristband sensor; the participant is the person wearing both devices. In each trial, the coordinator approaches the participant in 0.5 m steps (2.0 m, 1.5 m, 1.0 m, and 0.5 m).
4 Data Collection
In this section, we describe the experimental setup and data collection process.
4.1 Participants
We recruited ten participants for this study, aged between 21 and 28 years (Mean
= 25.1). The sample consisted of seven males and nine females. Participants came
from a diverse range of countries, including Albania, India, Turkey, USA, China,
Russia, Azerbaijan, Brazil, Germany, and Iran. This diversity was intentional to
minimize cultural bias in the results. All participants were either undergraduate
or graduate students at the University of Kaiserslautern-Landau. Informed consent, explaining how their data would be used, was obtained from all participants in compliance with GDPR.
4.2 Experimental Procedure
This study employs the Pupil Core eye-tracking glasses and the Empatica E4
wristband sensor. Figure 2 illustrates the experimental setup. One person acts as the coordinator, while the other (the participant) wears the eye-tracking glasses and a wristband sensor and remains stationary. The Pupil Core glasses are positioned on the forehead, with
cameras aligned to capture both eyes. The Empatica E4 wristband connects to
the computer via a USB cable.
Participants first review and sign the consent form. They then wear the Pupil
Core eye-tracking glasses and Empatica E4 wristband sensor and sit in front of
a computer screen. A calibration and validation process follows to ensure the
correct alignment of the devices. For the eye-tracking glasses, participants focus
on a fixation point for alignment. For the wristband sensor, participants press a
button to initiate data collection.
After calibration and validation, participants perform a task involving esti-
mating comfortable personal space. They are seated in a chair. During each trial,
the coordinator approaches the participant at distances of 2.0 m, 1.5 m, 1.0 m, and 0.5 m. At each distance, the coordinator initiates a discussion on a topic for two minutes. After the discussion, the coordinator asks the participant to answer a question about the distance. We use a Google Form for the questionnaire (https://forms.gle/ZbgSQbaAFRpNyHVe8).

Table 1: Comparison of F1 scores across models: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Visual Geometry Group 16 (VGG16), MobileNet (MN), MobileNetV2 (MNV2), MobileNetV3 (MNV3), and Transformer.

Model    SVM  DT   RF   VGG16 MN   MNV2 MNV3 Transformer
F1 Score 0.53 0.54 0.67 0.31  0.63 0.44 0.33 0.87

Fig. 3: Confusion matrices (values in %) for the two top-performing models; rows are the actual class, columns the predicted class (comfortable, not comfortable).
(a) Random Forest: comfortable 70.0 / 30.0; not comfortable 35.0 / 65.0.
(b) Transformer: comfortable 90.0 / 10.0; not comfortable 15.0 / 85.0.
The Transformer classifies instances more accurately, as evidenced by its confusion matrix.
5 Result and Discussion
Table 1 indicates that the Transformer model significantly outperforms other
models with an F1 score of 0.87, demonstrating its superior capability in captur-
ing complex dependencies in the data. Random Forest follows with a respectable
F1 score of 0.67, making it this study’s best-performing traditional machine-
learning model. MobileNet and its variants show moderate performance. Mo-
bileNet achieves an F1 score of 0.63, while MobileNet V2 and V3 lag behind.
The Decision Tree and Support Vector Machine models perform similarly, with F1 scores of 0.54 and 0.53, respectively. VGG16 records the lowest F1 score at 0.31, indicating its limited effectiveness for this task.
Confusion matrices of the best-performing models are shown in Figure 3. The Transformer model classifies instances more accurately, as evidenced by its confusion matrix. As the results show, the comfortable class is estimated better than the discomfort class.

Fig. 4: Top 20 feature importances from the Random Forest model. The plotted features are gaze_point_3d_y_skewness, gaze_point_3d_z_entropy, gaze_point_3d_x_mean, gaze_point_3d_z_meanFreq, gaze_point_3d_y_mean, gaze_point_3d_y_meanFreq, pupil_diameter_mad, gaze_point_3d_y_mad, gaze_point_3d_x_range, pupil_diameter_mean, gaze_point_3d_z_skewness, gaze_point_3d_z_kurtosis, gaze_point_3d_x_entropy, blink_duration_iqr, gaze_point_3d_y_entropy, gaze_point_3d_y_iqr, eda_psd, gaze_point_3d_y_kurtosis, gaze_point_3d_y_std, and gaze_point_3d_z_rms.
Figure 4 shows the feature importance obtained from the Random Forest model, computed from the Gini impurity. As the results show, the eye-tracking features are more important than the wristband sensor features. One reason is that eye-tracking data is more sensitive to distance variation than wristband sensor data; for example, when the coordinator is closer to the participant, the eye-tracking data shows a smaller pupil diameter [39]. In this study, participants labeled the closest distance, 0.5 m, as uncomfortable, so the model can exploit this distance-dependent variation to predict discomfort. The gaze point was also an important feature of the model.
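As a minimal sketch of how such a ranking is obtained, assuming the fitted RandomForestClassifier from the earlier sketch and a list of feature names (both hypothetical), scikit-learn exposes the Gini-based importances directly:

```python
import pandas as pd

def top_features(rf, feature_names, k=20):
    """Rank features by scikit-learn's impurity-based (Gini) importance."""
    imp = pd.Series(rf.feature_importances_, index=feature_names)
    return imp.sort_values(ascending=False).head(k)
```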
The wristband sensor data contributed only marginally in this study. Among the wristband features, EDA was the most important for the model. This may be because EDA is more sensitive to physiological change than the other wristband signals: EDA measures the skin's electrical activity, which is closely related to the autonomic nervous system [18]. Since the autonomic nervous system controls the body's response to stress and other stimuli, EDA is a reliable indicator of this response.
In summary, the eye-tracking data is more important than the wristband sensor data for estimating comfortable personal space. The Transformer model, the best-performing model in this study, achieved an F1 score of 0.87.
6 Limitations and Future Work
The number of participants in this study is limited, which may affect the generalizability of the results: with only ten participants, the findings cannot be generalized to the broader population. Additionally, the study does not account for individual differences such as personality traits, cultural background, gender, and situational context, which could influence the outcomes and may introduce participant bias.
The trial size in this study is also relatively small, as we only collected two minutes of data per trial for each participant. This limited duration may not capture the full range of variability in the participants' responses and could affect the robustness and reliability of the findings. The dataset could be enlarged by asking participants to perform the task for longer periods.
To improve the robustness and reliability of the findings, future studies should consider applying a sliding window approach with longer durations and varying overlap intervals. This method would allow for a more comprehensive analysis by capturing a wider range of variability in the participants' responses over extended periods. Applying different window sizes in the sliding window approach would also yield a larger effective dataset.
The task conducted in this study does not reflect natural conditions, as par-
ticipants were fully aware that they were part of an experiment, which may have
influenced their behavior and responses. Ideally, such experiments should be con-
ducted in more naturalistic settings, often called "in the wild," to obtain more
genuine and ecologically valid data. Additionally, the experiment did not con-
sider the different types of discussions that participants might engage in, which
could have varying impacts on their physiological and psychological responses.
The sensors utilized in this study are restricted to two types: eye-tracking
glasses, which monitor and record the participants’ gaze and pupil diameter,
and a wristband sensor, which measures various physiological signals such as
electrodermal activity (EDA) and heart rate. This limitation in sensor variety
may affect the comprehensiveness of the data collected, as other potentially
relevant physiological and behavioral signals are not captured.
To enhance the generalizability of the results, it is essential to increase the
sample size and include a more diverse participant pool in future work. Incor-
porating a broader range of individual differences, such as personality traits and
cultural backgrounds, will provide a more nuanced understanding of the factors
influencing outcomes. Extending the data collection duration and employing
advanced data analysis techniques, like the sliding window approach, will help
capture a more comprehensive range of participant responses. Conducting ex-
periments in naturalistic settings will improve ecological validity, and expanding
the variety of sensors used will ensure a more holistic capture of physiological
and behavioral signals. These steps will contribute to more robust, reliable, and
generalizable findings.
7 Conclusion
This study explored the estimation of comfortable personal space using multi-
modal sensors, integrating eye-tracking and wristband-based physiological data.
Our findings indicate that deep learning models, particularly the Transformer
model, effectively predict personal space preferences, achieving an F1 score of
0.87. Eye-tracking data plays a more significant role than wristband sensor data.
These results highlight the potential for intelligent systems to personalize spa-
tial arrangements in workplaces, educational institutions, and public settings,
enhancing user comfort and reducing social stress. However, the study has lim-
itations, including a small participant pool and controlled experimental condi-
tions that may not fully reflect real-world scenarios. Future research should ex-
pand participant diversity, explore naturalistic settings, and integrate additional
physiological and behavioral markers to improve model robustness. By advanc-
ing sensor-based estimations of personal space, this research contributes to the
development of adaptive HCI systems that dynamically respond to individual comfort
needs, paving the way for more socially aware intelligent systems.
Bibliography

[1] Arakawa, R., Lehman, J.F., Goel, M.: PrISM-Q&A: Step-aware voice assistant on a smartwatch enabled by multimodal procedure tracking and large language models. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(4), 1–26 (2024)
[2] Beaulieu, C.: Intercultural study of personal space: A case study. Journal of Applied Social Psychology 34(4), 794–805 (2004)
[3] Berkovsky, S., Taib, R., Koprinska, I., Wang, E., Zeng, Y., Li, J., Kleitman, S.: Detecting personality traits using eye-tracking data. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–12. CHI '19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3290605.3300451
[4] Bhatt, A., Watanabe, K., Dengel, A., Ishimaru, S.: Appearance-based gaze estimation with deep neural networks: From data collection to evaluation. International Journal of Activity and Behavior Computing 2024(1), 1–15 (2024). https://doi.org/10.60401/ijabc.9
[5] Bhatt, A., Watanabe, K., Santhosh, J., Dengel, A., Ishimaru, S.: Estimating self-confidence in video-based learning using eye-tracking and deep neural networks. IEEE Access 12, 192219–192229 (2024). https://doi.org/10.1109/ACCESS.2024.3515838
[6] Blanchard, N., Bixler, R., Joyce, T., D'Mello, S.: Automated physiological-based detection of mind wandering during learning. In: Intelligent Tutoring Systems: 12th International Conference, ITS 2014, Honolulu, HI, USA, June 5–9, 2014, Proceedings, pp. 55–60. Springer (2014)
[7] Brishtel, I., Ishimaru, S., Augereau, O., Kise, K., Dengel, A.: Assessing cognitive workload on printed and electronic media using eye-tracker and EDA wristband. In: Companion Proceedings of the 23rd International Conference on Intelligent User Interfaces, pp. 1–2 (2018)
[8] Brishtel, I., Khan, A.A., Schmidt, T., Dingler, T., Ishimaru, S., Dengel, A.: Mind wandering in a multimodal reading setting: Behavior analysis & automatic detection using eye-tracking and an EDA sensor. Sensors 20(9), 2546 (2020)
[9] Bykowski, A., Kupiński, S.: Automatic mapping of gaze position coordinates of eye-tracking glasses video on a common static reference image. In: Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications. ETRA '18, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3204493.3208331
[10] Cai, S., Venugopalan, S., Tomanek, K., Kane, S., Morris, M.R., Cave, R., Macdonald, R., Campbell, J., Casey, B., Kornman, E., Vance, D.E., Beavers, J.: SpeakFaster Observer: Long-term instrumentation of eye-gaze typing for measuring AAC communication. In: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. CHI EA '23, Association for Computing Machinery, New York, NY, USA (2023). https://doi.org/10.1145/3544549.3573870
[11] Candini, M., Battaglia, S., Benassi, M., di Pellegrino, G., Frassinetti, F.: The physiological correlates of interpersonal space. Scientific Reports 11(1), 2611 (2021)
[12] Carter, B.T., Luke, S.G.: Best practices in eye tracking research. International Journal of Psychophysiology 155, 49–62 (2020)
[13] Cho, H., Sendhilnathan, N., Nebeling, M., Wang, T., Padmanabhan, P., Browder, J., Lindlbauer, D., Jonker, T.R., Todi, K.: Sonohaptics: An audio-haptic cursor for gaze-based object selection in XR. In: Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology. UIST '24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3654777.3676384
[14] Coello, Y., Cartaud, A.: The interrelation between peripersonal action space and interpersonal social space: Psychophysiological evidence and clinical implications. Frontiers in Human Neuroscience 15, 636124 (2021)
[15] Cook, D.J., Das, S.K.: How smart are our environments? An updated look at the state of the art. Pervasive and Mobile Computing 3(2), 53–73 (2007)
[16] Dembinsky, D., Watanabe, K., Dengel, A., Ishimaru, S.: Eye movement in a controlled dialogue setting. In: Proceedings of the 2024 Symposium on Eye Tracking Research and Applications. ETRA '24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3649902.3653337
[17] Dembinsky, D., Watanabe, K., Dengel, A., Ishimaru, S.: Gaze generation for avatars using GANs. IEEE Access 12, 101536–101548 (2024). https://doi.org/10.1109/ACCESS.2024.3430835
[18] Desai, U., Shetty, A.D.: Electrodermal activity (EDA) for treatment of neurological and psychiatric disorder patients: A review. In: 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), vol. 1, pp. 1424–1430. IEEE (2021)
[19] Funk, M., Sahami, A., Henze, N., Schmidt, A.: Using a touch-sensitive wristband for text entry on smart watches. In: CHI '14 Extended Abstracts on Human Factors in Computing Systems, pp. 2305–2310. CHI EA '14, Association for Computing Machinery, New York, NY, USA (2014). https://doi.org/10.1145/2559206.2581143
[20] Geers, L., Coello, Y.: The relationship between action, social and multisensory spaces. Scientific Reports 13(1), 202 (2023)
[21] Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., et al.: Searching for MobileNetV3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324 (2019)
[22] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR abs/1704.04861 (2017), http://arxiv.org/abs/1704.04861
[23] Kopácsi, L., Klimenko, A., Barz, M., Sonntag, D.: Exploring gaze-based menu navigation in virtual environments. In: Proceedings of the 2024 ACM Symposium on Spatial User Interaction. SUI '24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3677386.3688887
[24] Lee, C.J., Zhang, R., Agarwal, D., Yu, T.C., Gunda, V., Lopez, O., Kim, J., Yin, S., Dong, B., Li, K., Sakashita, M., Guimbretiere, F., Zhang, C.: EchoWrist: Continuous hand pose tracking and hand-object interaction recognition using low-power active acoustic sensing on a wristband. In: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. CHI '24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3613904.3642910
[25] Lim, J., Koh, Y., Kim, A., Lee, U.: Exploring context-aware mental health self-tracking using multimodal smart speakers in home environments. In: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. CHI '24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3613904.3642846
[26] Lystbæk, M.N., Pfeuffer, K., Grønbæk, J.E.S., Gellersen, H.: Exploring gaze for assisting freehand selection-based text entry in AR. Proceedings of the ACM on Human-Computer Interaction 6(ETRA) (2022). https://doi.org/10.1145/3530882
[27] Menghini, L., Gianfranchi, E., Cellini, N., Patron, E., Tagliabue, M., Sarlo, M.: Stressing the accuracy: Wrist-worn wearable sensor validation over different conditions. Psychophysiology 56(11), e13441 (2019)
[28] Meteier, Q., Mugellini, E., Angelini, L., Verdon, A.A., Senn-Dubey, C., Vasse, J.M.: Enhancing the metacognition of nursing students using eye tracking glasses. In: Proceedings of the 2023 Symposium on Eye Tracking Research and Applications. ETRA '23, Association for Computing Machinery, New York, NY, USA (2023). https://doi.org/10.1145/3588015.3590115
[29] Minakata, K., Hansen, J.P., MacKenzie, I.S., Bækgaard, P., Rajanna, V.: Pointing by gaze, head, and foot in a head-mounted display. In: Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications. ETRA '19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3317956.3318150
[30] Nakamura, Y., Matsuda, Y., Arakawa, Y., Yasumoto, K.: WaistonBelt X: A belt-type wearable device with sensing and intervention toward health behavior change. Sensors 19(20), 4600 (2019)
[31] Nandrino, J.L., Ducro, C., Iachini, T., Coello, Y.: Perception of peripersonal and interpersonal space in patients with restrictive-type anorexia. European Eating Disorders Review 25(3), 179–187 (2017)
[32] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
[33] Sehrt, J., Yilmaz, U., Kosch, T., Schwind, V.: Closing the loop: The effects of biofeedback awareness on physiological stress response using electrodermal activity in virtual reality. In: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. CHI EA '24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3613905.3650830
[34] Shah, V., Moser, B.B., Watanabe, K., Dengel, A.: Webcam-based pupil diameter prediction benefits from upscaling. arXiv preprint arXiv:2408.10397 (2024)
[35] Shah, V., Watanabe, K., Moser, B.B., Dengel, A.: EyeDentify: A dataset for pupil diameter estimation based on webcam images. arXiv preprint arXiv:2407.11204 (2024)
[36] Shi, Y.: A bibliometric analysis of eye tracking in user experience research. In: International Conference on Human-Computer Interaction, pp. 178–193. Springer (2024)
[37] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
[38] Sorokowska, A., Sorokowski, P., Hilpert, P., Cantarero, K., Frackowiak, T., Ahmadi, K., Alghraibeh, A.M., Aryeetey, R., Bertoni, A., Bettache, K., et al.: Preferred interpersonal distances: A global comparison. Journal of Cross-Cultural Psychology 48(4), 577–592 (2017)
[39] Sulutvedt, U., Mannix, T.K., Laeng, B.: Gaze and the eye pupil adjust to imagined size and distance. Cognitive Science 42(8), 3159–3176 (2018)
[40] Tanaka, N., Watanabe, K., Ishimaru, S., Dengel, A., Ata, S., Fujimoto, M.: Concentration estimation in online video lecture using multimodal sensors. In: Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 71–75. UbiComp '24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3675094.3677587
[41] Vainio, T., Karppi, I., Jokinen, A., Leino, H.: Towards novel urban planning methods: Using eye-tracking systems to understand human attention in urban environments. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–8. CHI EA '19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3290607.3299064
[42] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6000–6010. NIPS '17, Curran Associates Inc., Red Hook, NY, USA (2017)
[43] Watanabe, K., Dengel, A., Ishimaru, S.: Metacognition-EnGauge: Real-time augmentation of self-and-group engagement levels understanding by gauge interface in online meetings. In: Proceedings of the Augmented Humans International Conference 2024, pp. 301–303. AHs '24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3652920.3653054
[44] Watanabe, K., Sathyanarayana, T., Dengel, A., Ishimaru, S.: EnGauge: Engagement gauge of meeting participants estimated by facial expression and deep neural network. IEEE Access 11, 52886–52898 (2023)
[45] Watanabe, K., Soneda, Y., Matsuda, Y., Nakamura, Y., Arakawa, Y., Dengel, A., Ishimaru, S.: DisCaaS: Micro behavior analysis on discussion by camera as a sensor. Sensors 21(17), 5719 (2021)
[46] Zagermann, J., Pfeil, U., Reiterer, H.: Measuring cognitive load using eye tracking technology in visual computing. In: Proceedings of the Sixth Workshop on Beyond Time and Errors on Novel Evaluation Methods for Visualization, pp. 78–85. BELIV '16, Association for Computing Machinery, New York, NY, USA (2016). https://doi.org/10.1145/2993901.2993908