Figure 2 - uploaded by Alessio Bellino
Source publication
We present SEQUENCE, a novel interaction technique for selecting objects from a distance. Objects display different rhythmic patterns by means of animated dots, and users can select one of them by matching the pattern through a sequence of taps on a smartphone. The technique works by exploiting the temporal coincidences between patterns displayed b...
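To make the matching step concrete, the sketch below (Python; the slot duration, the eight-slot patterns, and all names are our illustrative assumptions, not values from the paper) shows one way selection by temporal coincidence can work: each control cycles through time slots, and a control is selected when the user's taps fall in all and only its active slots.

    # Hypothetical sketch of selection by temporal coincidence.
    SLOT = 0.4  # assumed seconds per slot; the paper evaluates several rhythm speeds

    def slot_of(tap_time, start_time):
        """Map an absolute tap timestamp to the index of the slot it fell in."""
        return int((tap_time - start_time) / SLOT) % 8

    def matches(pattern, taps, start_time):
        """True if the taps hit all and only the active slots of the pattern."""
        hit = {slot_of(t, start_time) for t in taps}
        active = {i for i, on in enumerate(pattern) if on}
        return hit == active

    # Two controls with made-up patterns (1 = active slot, 0 = pause).
    controls = {
        "light":    (1, 0, 1, 0, 0, 1, 0, 0),
        "smart_tv": (1, 1, 0, 0, 1, 0, 0, 0),
    }

    def select(taps, start_time):
        for name, pattern in controls.items():
            if matches(pattern, taps, start_time):
                return name
        return None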
Contexts in source publication
Context 1
... show how SEQUENCE could be applied in a smart home environment, we designed a prototype. In our scenario (Figure 2), we show how to control a smart TV and a light. Six SEQUENCE controls are used: two are physical, whereas the remaining four are displayed on the smart TV. ...
Context 2
... SEQUENCE controls are used: two are physical, whereas the remaining four are displayed on the smart TV. Regarding the physical controls, the first is used to turn the light on and off, and the second to turn the smart TV on and off (see Figure 2). Once the smart TV is turned on, the four controls displayed on it are used to change channel (next and previous, Figure 3-bottom) and to regulate the volume (increase and decrease, Figure 3-left). ...
Context 3
... all the boxes (green and gray) are needed to mark time regularly at a predetermined interval, but the user's active events should occur only when the dot moves inside the green boxes. We handcrafted a corresponding physical version using a suitably tailored polystyrene-like surface that hosts a series of LEDs connected to an Arduino (see Figure 2). ...
Context 4
... is why every SEQUENCE control provides feedback for each binary user input during the activation of a rhythmic sequence. If a dot inside a box is matched correctly, both dot and box disappear (Figure 12). If an error occurs when matching a dot inside a box (e.g., the user performs an active event on a gray box), the entire control is reset and the whole rhythmic sequence must be repeated from the beginning (Figure 13). ...
Context 5
... fact, as discussed later in section 5.2, many (wearable) devices already present in our lives (e.g., Figure 21) are normally equipped with touch sensors or mechanical buttons that could be leveraged to support SEQUENCE at no cost and right now. ...
Context 6
... to the data in Figure 15-right, the most appropriate SEQUENCE rhythm speed with an acceptable error rate (6.3%) would be 3.2 s. At any rate, observing the data in Figure 2-left, we noted a suspicious difference between the fixed and rotating visualization ways (1.34% vs. 11.34% error). We do not expect significant differences in the data, since we have only a few users and trials. ...
Context 7
... performance. To show details on the users who carried out this evaluation, we summarize each user's performance exhaustively in Figure 20, which shows the activation time, errors, and missed elements of all users for both visualization ways. ...
Context 8
... Ghomi et al.'s work, SEQUENCE aims at providing a complete interaction technique based on simple rhythms that are presented to users through animated visual widgets, both on-screen (Figure 3) and physical (Figure 2). Therefore, the advantage of SEQUENCE is that users do not need to memorize different rhythmic patterns but can synchronize with the desired element by inferring the corresponding rhythm from its animated visual representation. ...
Context 9
... believe the touch modality to be very interesting because it enables, right now, the control of rhythmic widgets using a wide class of (wearable) devices equipped with touch sensors and mechanical buttons (e.g., Figure 21), such as Bluetooth headsets, pocket buttons (e.g., Flik.io, Amazon Dash button), smartwatches, etc. ...
Context 10
... production costs, instead, the fixed visualization way is the cheapest alternative if we want to build a corresponding physical version. A physical version of the fixed widgets, in fact, can easily be built using just a series of LEDs connected to an Arduino, as shown in Figure 2. Rotating widgets, instead, are more expensive to build, since they would need a stepper motor synchronized with the SEQUENCE system. ...
Context 11
... all our input variations described in section 3.3.8 work by providing binary inputs, they are fundamentally different, which shows the flexibility of SEQUENCE. We believe the touch modality to be very interesting because it enables, right now, the control of rhythmic widgets using a wide class of (wearable) devices equipped with touch sensors and mechanical buttons (e.g., Figure 21), such as Bluetooth headsets, pocket buttons (e.g., Flik.io, Amazon Dash button), smartwatches, etc. The touchless modalities have, instead, a noteworthy advantage: they allow the control of widgets without any intrusive device. Eye control requires small movements, but during eye blinks the SEQUENCE interface is not visible to the user (even if just for a few instants). This could disturb users, because they may need continuous visual feedback to remain synchronized with the desired control. At any rate, more thorough evaluations should be carried out to investigate whether this problem is relevant. Control through the beating of the lips, instead, could seem quite unusual and unnatural, but it appears to be an effective input modality for SEQUENCE. Marking rhythm with the lips is, in fact, quite common: when talking or singing, people mark the rhythm by opening and closing their lips. Moreover, beatboxing [54] is the art of mimicking drum machines with the mouth, lips, tongue, and voice; professional musicians can reproduce even complex rhythms with this technique. Therefore, this leads us to think that the lips may also be used profitably with SEQUENCE, since it uses simple ...
Context 12
... is a fundamental aspect of human-computer interaction because it satisfies users' communication expectations when interacting with a system [44]. That is why every SEQUENCE control provides feedback for each binary user input during the activation of a rhythmic sequence. If a dot inside a box is matched correctly, both dot and box disappear (Figure 12). If an error occurs when matching a dot inside a box (e.g., the user performs an active event on a gray box), the entire control is reset and the whole rhythmic sequence must be repeated from the beginning (Figure 13). Figure 13. When user errors occur, the control is reset and the activation sequence must be repeated from the ...
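The feedback behavior described in this context can be captured in a small state sketch. The following is a hedged illustration (class and method names are ours; the paper specifies the behavior, not this code): a correct tap marks the green box as matched, a tap on a gray box resets the whole control, and the control activates once every green box has been matched.

    class SequenceControl:
        def __init__(self, pattern):
            self.pattern = pattern          # True = green box (tap expected), False = gray
            self.matched = [False] * len(pattern)

        def reset(self):
            self.matched = [False] * len(self.pattern)

        def on_tap(self, slot):
            """Called when the user taps while the dot sits inside `slot`."""
            if self.pattern[slot]:
                self.matched[slot] = True   # feedback: dot and box disappear (Figure 12)
            else:
                self.reset()                # error: the whole control resets (Figure 13)

        def activated(self):
            # The control fires once every green box has been matched.
            return all(m for m, p in zip(self.matched, self.pattern) if p)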
Context 13
... show how SEQUENCE could be applied in a smart home environment, we designed a prototype. In our scenario (Figure 2), we show how to control a smart TV and a light. Six SEQUENCE controls are used: two are physical, whereas the remaining four are displayed on the smart TV. Regarding the physical controls, the first is used to turn the light on and off, and the second to turn the smart TV on and off (see Figure 2). Once the smart TV is turned on, the four controls displayed on it are used to change channel (next and previous, Figure 3-bottom) and to regulate the volume (increase and decrease, Figure 3-left). The four controls displayed on the smart TV are set to support continuous control, so that the user can change channels and regulate the volume rapidly (e.g., Figure 4). ...
Context 14
... that we expected rotating widgets to have better performance than fixed widgets (since meaningful animated user interfaces usually improve decision making [21]), the user evaluations presented in section 4.2 do not show a significant difference between these alternatives and do not allow us to establish which visualization way is best. Considering production costs, instead, the fixed visualization way is the cheapest alternative if we want to build a corresponding physical version. A physical version of the fixed widgets, in fact, can easily be built using just a series of LEDs connected to an Arduino, as shown in Figure 2. Rotating widgets, instead, are more expensive to build, since they would need a stepper motor synchronized with the SEQUENCE system. In conclusion, we prefer the fixed version just for economic ...
Context 15
... time distribution is quite similar for both visualization ways: more than 75% of the activations occur within 4 s. The minimum activation time for the fixed visualization was 2.2 s, whereas that for the rotating visualization was 2.5 s. Subjects' performance. To show details on the users who carried out this evaluation, we summarize each user's performance exhaustively in Figure 20, which shows the activation time, errors, and missed elements of all users for both visualization ways. Users 3 and 11 made no errors in the fixed condition. Users 11 and 12 made no errors in the rotating condition. No missed elements occurred for users 3, 4, 5, 10, 11, and 12 in either visualization condition. Eight users out of 12 made less than 3% errors in both the fixed and rotating conditions. User 3 was the fastest, whereas user 9 was the slowest. Surprisingly, user 9, even though the slowest, had a low rate of errors and missed elements, i.e., 1% errors and 2% missed elements in the fixed condition and 3% of both errors and missed elements in the rotating condition. The worst performances in terms of error and missed-element rates were obtained by users 8 and ...
Context 16
... Ghomi et al.'s work, SEQUENCE aims at providing a complete interaction technique based on simple rhythms that are presented to users through animated visual widgets, both on-screen (Figure 3) and physical (Figure 2). Therefore, the advantage of SEQUENCE is that users do not need to memorize different rhythmic patterns but can synchronize with the desired element by inferring the corresponding rhythm from its animated visual representation. In this way, users can discover and learn the rhythmic patterns associated with different elements, instead of relying on arbitrary associations as in Ghomi et al.'s work. Moreover, SEQUENCE supports continuous activation, which is useful for triggering the same element repeatedly, as discussed in section 3.3.6. Finally, unlike Ghomi et al.'s work, we introduced different kinds of user inputs, touch and touchless, to demonstrate the flexibility of the technique in practice. In this paper we evaluated the touch input variation, but we are planning to evaluate and/or envision other touchless variations in next ...
Context 17
... the aggregated data, moreover, show no relevant improvement in activation time beyond the speed of 3.2 s. According to the data in Figure 15-right, the most appropriate SEQUENCE rhythm speed with an acceptable error rate (6.3%) would be 3.2 s. At any rate, observing the data in Figure 2-left, we noted a suspicious difference between the fixed and rotating visualization ways (1.34% vs. 11.34% error). We do not expect significant differences in the data, since we have only a few users and trials. At any rate, we examined the data of single users (Figure 16) to reflect better on the most appropriate setting of the SEQUENCE rhythm speed. Users 2 and 5 are quite regular: their activation time decreases down to the speed of 2.4 s and, moreover, both errors and missed elements are 0% at that speed. This leads us to think that the most capable users reach their best performance at that speed. User 3 made a considerable number of errors (30%) for rotating-3.2s ...
Context 18
... we are interested in user performance rather than system performance, we carried out our evaluations using the most reliable input modality we designed, namely the smartphone app. Smartphones, in fact, are quite effective at detecting touch, i.e., they are robust against false positives and false negatives: rarely does a modern capacitive touchscreen smartphone fail to detect a touch that the user performs. The two touchless input modalities we proposed, instead, can be affected by false positives (e.g., the system detects an eye blink, but the user never performed any blink) and false negatives (e.g., the user performs an eye blink, but the system does not detect it). Therefore, all the evaluations presented in this section were carried out using the smartphone app, because we wanted to investigate SEQUENCE regardless of the reliability of the different input modalities. Moreover, another relevant reason to evaluate the touch modality is the immediate applicability and flexibility of the technique. In fact, as discussed later in section 5.2, many (wearable) devices already present in our lives (e.g., Figure 21) are normally equipped with touch sensors or mechanical buttons that could be leveraged to support SEQUENCE at no cost and right ...
Context 19
... designed two variants of circular rhythmic patterns (Figure 11): the first uses a fixed widget around which dots move discretely; the second uses a rotating widget, so that the entire rhythmic pattern moves continuously in a circular way. Fixed widgets. In the fixed widgets, a dot moves discretely around the widget, marking time. There are two kinds of time unit boxes: green and gray. When the dot moves to a green box, an active event should be performed by the user. When, instead, the dot moves to a gray box, no action (pause) should be performed. Therefore, all the boxes (green and gray) are needed to mark time regularly at a predetermined interval, but the user's active events should occur only when the dot moves inside the green boxes. We handcrafted a corresponding physical version using a suitably tailored polystyrene-like surface that hosts a series of LEDs connected to an Arduino (see Figure ...
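As a rough simulation of how a fixed widget marks time (the interval, the green/gray pattern, and the console output below are illustrative assumptions; the paper's physical version drives LEDs from an Arduino):

    import time

    PATTERN = (True, False, True, True, False, False, True, False)  # green/gray boxes
    INTERVAL = 0.4  # assumed seconds per time unit box; the paper tests several speeds

    def run_widget(cycles=3):
        for _ in range(cycles):
            for slot, green in enumerate(PATTERN):
                # The dot marks time by highlighting one box at a time; the user
                # should perform an active event only while it sits on a green box.
                print(f"dot at box {slot}: {'TAP' if green else 'wait'}")
                time.sleep(INTERVAL)

    if __name__ == "__main__":
        run_widget()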
Citations
... This is the author's version of the article that has been published in the proceedings of the 23rd IEEE International Symposium on Mixed and Augmented Reality (ISMAR); the final version of this record is available at 10.1109/ISMAR62088.2024.00046. ... of display-guided interactions, facilitated by tapping [10] or touch gestures on the screen [19]. When it comes to hand-based motion-matching interaction, PathSync [29] and TraceMatch [13,37] have shown potential using a computer-vision-based tracking system, while WaveTrace [40] demonstrated applicability using a smartwatch. ...
... Synchrowatch [33] permits users to control smartwatches by matching rhythm patterns on the screen using a passive magnetic ring as a rhythm detection device. Finally, SEQUENCE [34] employs a novel design that displays rhythmic patterns through eight animated dots arranged circularly around the target, making target selection easier since the associated rhythm patterns are fully displayed to the user [35]. ...
... In this context, the underlying principle is that once users acquire proficiency in interacting with visual orbits, they can seamlessly apply the same mechanism to sound orbits. Finally, it is noteworthy that motion correlation techniques, including SoundOrbit, are often viewed as complementary rather than replacement methods in smart home scenarios [5,34]. These techniques aim to enhance existing interaction styles, like remote control, particularly for routine tasks, emphasizing immediate and convenient user interaction without disrupting established habits [42]. ...
SoundOrbit is a novel input technique that uses motion correlation to control smart devices. The technique associates controls with specific orbital sounds, made of cyclically increasing/decreasing musical scales, and the user can activate a control by mimicking the corresponding sound with body motion. Unlike previous movement-correlation techniques based on visual displays, SoundOrbit operates independently of visual perception, enabling the development of cost-effective smart devices that do not require visual displays. We investigated SoundOrbit by conducting two user studies. The first study evaluated the effectiveness of binaural sound spatialization to create a distinct orbiting sound. In comparison to a cyclic musical scale that is fixed in the apparent auditory space, we found that spatial effects did not improve users' ability to follow the sound orbit. In the second study, we aimed at determining the optimal system parameters, and discovered that users synchronize better with slower speeds. The technique was found to be feasible and reliable for one and two simultaneous orbits, each using a distinct sound timbre, but not for three orbits, due to a high error rate.
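The selection step shared by motion-correlation techniques such as SoundOrbit can be sketched as follows, assuming a sliding window of motion samples and a correlation threshold (the threshold value and all names are our assumptions, not parameters from the study): the control whose orbit reference best correlates with the user's motion, above the threshold, is selected.

    import math

    def pearson(xs, ys):
        """Pearson correlation between two equally long sample sequences."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy) if sx and sy else 0.0

    def select_orbit(motion, orbits, threshold=0.8):
        """motion: recent 1-D body-motion samples; orbits: name -> reference signal
        (assumed at least as long as the motion window)."""
        best, best_r = None, threshold
        for name, ref in orbits.items():
            r = pearson(motion, ref[-len(motion):])
            if r > best_r:
                best, best_r = name, r
        return best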
... c) Eye Aspect Ratio (EAR) (Soukupová and Cech, 2016). d) Mouth Aspect Ratio (MAR) (Bellino, 2018). e) Examples of interactions between point 33 and other face points. ...
Aims:
Our study aimed to develop a machine learning ensemble to distinguish "at-risk mental states for psychosis" (ARMS) subjects from control individuals drawn from the general population, based on facial data extracted from video recordings.
Methods:
58 non-help-seeking, medication-naïve ARMS subjects and 70 healthy subjects were screened from a general population sample. At-risk status was assessed with the Structured Interview for Prodromal Syndromes (SIPS), and the "Subject's Overview" section was filmed (5-10 min). Several features were extracted, e.g., eye and mouth aspect ratios, Euler angles, and coordinates of 51 facial landmarks. This yielded 649 facial features, which were further selected using Gradient Boosting Machines (AdaBoost combined with Random Forests). Data were split 70/30 for training, and Monte Carlo cross-validation was used.
Results:
The final model reached a mean F1-score of 83% and a balanced accuracy of 85%. The mean area under the receiver operating characteristic curve was 93%. Convergent validity testing showed that two features included in the model were significantly correlated with Avolition (SIPS N2 item) and expression of emotion (SIPS N3 item).
Conclusion:
Our model capitalized on short video recordings of individuals recruited from the general population, effectively distinguishing between ARMS and controls. Results are encouraging for large-scale screening purposes in low-resource settings.
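The Methods above mention the eye aspect ratio (EAR) among the extracted features. A minimal sketch of that computation, following the standard six-landmark eye annotation of Soukupová and Cech (2016) (function and variable names are ours):

    import math

    def dist(a, b):
        """Euclidean distance between two (x, y) landmark points."""
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
        # Vertical openings (p2-p6, p3-p5) over twice the horizontal width (p1-p4);
        # EAR drops toward 0 as the eye closes, which makes blinks detectable.
        return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))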
... c) Eye Aspect Ratio (EAR) [55]. d) Mouth Aspect Ratio (MAR) [56]. e) Examples of interactions between point 33 and other face points. f) Matching pairs used to calculate Spearman's correlation coefficient. ...
To prevent the development of schizophrenia, preclinical stages of the disorder, known as "at-risk mental states for psychosis" (ARMS), have been intensively researched for the past three decades. Despite the many advances in the field, identification of ARMS is still resource-consuming and presents important issues regarding accuracy. To address this, our study aimed to develop a machine learning ensemble to distinguish ARMS from control individuals based on facial expression extracted from brief video-recordings.
... In this context, different interaction techniques were studied. Some of these techniques employ rhythmic synchronization [1][2][3][4]: they work by displaying multiple animated controls that show different rhythms in visual form, and the user can select one of them by synchronizing with the corresponding rhythm. Controls can be physical (e.g., in Figure 1) or virtual (i.e., shown on a screen, such as those used in this study, see Figure 2). ...
... From the point of view of the input device, rhythmic synchronization techniques can be simpler than those based on movement correlation. While motion-correlation techniques require sensors that detect movement (e.g., cameras [6,7] or Kinect [9]), rhythmic synchronization techniques require sensors as simple as a button, e.g., [1,4]. As a matter of fact, previous studies showed that these techniques can support a wide variety of sensors [1,3]; in particular, any sensor capable of generating a binary input through which users can perform the required rhythm. ...
... While motion-correlation techniques require sensors that detect movement (e.g., cameras [6,7] or Kinect [9]), rhythmic synchronization techniques require sensors as simple as a button, e.g., [1,4]. As a matter of fact, previous studies showed that these techniques can support a wide variety of sensors [1,3]; in particular, any sensor capable of generating a binary input through which users can perform the required rhythm. As a result, these techniques are quite flexible. ...
Rhythmic-synchronization-based interaction is an emerging interaction technique where multiple controls with different rhythms are displayed in visual form, and the user can select one of them by matching the corresponding rhythm. These techniques can be used to control smart objects in environments where there may be interfering auditory stimuli that contrast with the visual rhythm (e.g., to control Smart TVs while playing music), and this could compromise users’ ability to synchronize. Moreover, these techniques require certain reflex skills to properly synchronize with the displayed rhythm, and these skills may vary depending on the age and gender of the users. To determine the impact of interfering auditory stimuli, age, and gender on users’ ability to synchronize, we conducted a user study with 103 participants. Our results show that there are no significant differences between the conditions of interfering and noninterfering auditory stimuli and that synchronization ability decreases with age, with males performing better than females—at least as far as younger users are concerned. As a result, two implications emerge: first, users are capable of focusing only on visual rhythm ignoring the auditory interfering rhythm, so listening to an interfering rhythm should not be a major concern for synchronization; second, as age and gender have an impact, these systems may be designed to allow for customization of rhythm speed so that different users can choose the speed that best suits their reflex skills.
... However, as one of the most basic interaction tasks, target selection can be challenging on these new interfaces. There are three reasons: 1) cross-device interaction for a large number of devices calls for association-free target selection techniques [6,10,23]. It is not practical to have a designated controller for each individual device or require users to associate with the devices each time before usage, especially when there is a large number of devices; 2) the interaction expressivity (e.g., audio, gesture) and form factor (e.g. ...
... BitID [49] only senses binary inputs), so traditional target selection techniques are not applicable on such interfaces. To address these challenges, researchers have proposed temporal synchronous target selection [6,27,34,50] to enable association-free target selection on devices with different interaction interfaces. Instead of browsing and selecting the target device from a list on a screen, users can generate signals synchronized with a target's temporal pattern (e.g., blinking) to select the corresponding target. ...
... This kind of technique has three advantages: 1) It does not require device association as long as the pattern for each target is unique, which can save total interaction time; 2) Temporal signals can be generated on multi-modality interfaces, which enables a subtle and accessible selection experience, so that users can choose the appropriate interface in different scenarios. For example, users can tap fingers (touchscreen [6]), clap hands (audio [22]), tap a foot (vibration [52]), contract muscles (EMG [3,35,45]), blink an eye (EOG [5]), and even breathe [16,17] to sync with the target pattern. As the interaction paradigm (generating binary changing signals in sync with the target pattern) remains the same, users would be able to transfer the interaction experience across different interfaces; 3) The selection technique's extremely low requirement for sensing resources makes it compatible with a wide variety of both new and existing sensors. ...
Temporal synchronous target selection is an association-free selection technique: users select a target by generating signals (e.g., finger taps and hand claps) in sync with its unique temporal pattern. However, the classical pattern set design and input recognition algorithms of such techniques did not leverage users' behavioral information, which limits their robustness to imprecise inputs. In this paper, we improve these two key components by modeling users' interaction behavior. In the first user study, we asked users to tap a finger in sync with blinking patterns of various periods and delays, and modeled their finger-tapping ability using a Gaussian distribution. Based on the results, we generated pattern sets for up to 22 targets that minimized the possibility of confusion due to imprecise inputs. In the second user study, we validated that the optimized pattern sets could reduce the error rate from 23% to 7% for the classical Correlation recognizer. We also tested a novel Bayesian recognizer, which achieved higher selection accuracy than the Correlation recognizer when the input sequence is short. Informal evaluation results show that the selection technique can be effectively scaled to different modalities and sensing techniques.
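A toy version of the Gaussian tap-timing model this abstract describes (the standard deviation, the onset encoding, and all names are our assumptions, not the paper's fitted values): each target is scored by the likelihood of the observed tap times around its expected onsets, and the most likely target is selected, in the spirit of the Bayesian recognizer.

    import math

    SIGMA = 0.08  # assumed standard deviation of tap delay, in seconds

    def log_likelihood(taps, onsets):
        """Sum of log N(tap - nearest onset; 0, SIGMA) over all taps.
        Assumes `onsets` is non-empty."""
        ll = 0.0
        for t in taps:
            d = min(abs(t - o) for o in onsets)
            ll += -0.5 * (d / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2 * math.pi))
        return ll

    def bayesian_select(taps, targets):
        """targets: name -> list of expected blink-onset times for that target."""
        return max(targets, key=lambda name: log_likelihood(taps, targets[name]))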
We introduce a novel one-handed input technique for mobile devices that is not based on pointing but on motion matching, where users select a target by mimicking its unique animation. Our work is motivated by the findings of a survey (N=201) on current mobile use, from which we identify lingering opportunities for one-handed input techniques. We follow by expanding on current motion matching implementations (previously developed in the context of gaze or mid-air input) so that they take advantage of the affordances of touch-input devices. We validate the technique by characterizing user performance via a standard selection task (N=24), where we report success rates (>95%), selection times (~1.6 s), input footprint, grip stability, usability, and subjective workload, in both phone and tablet conditions. Finally, we present a design space that illustrates six ways in which motion matching can be embedded into mobile interfaces via a camera prototype application.