Fig 5
Source publication
Close human-robot interaction (HRI), especially in industrial scenarios, has been widely investigated because of the advantages of combining human and robot skills. For effective HRI, the validity of currently available human-machine communication media or tools should be questioned, and new communication modalities should be explored. This article pro...
Contexts in source publication
Context 1
... our scenario, we recognize two state categories, namely menu and action. Menu states describe the GUI (Figure 5 shows the GUI representation of some menu states), and, when they are active, the interaction is limited to menu navigation. Action states, instead, implement the system functionalities, e.g., human teaching of a motion or robot execution of a task. ...
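A minimal Python sketch of the menu/action state split described in this context is given below; all class, state, and event names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a finite state machine with two state categories:
# menu states (GUI navigation only) and action states (system functionality).
# All names here are assumptions for illustration.
from enum import Enum, auto


class StateKind(Enum):
    MENU = auto()    # interaction limited to menu navigation
    ACTION = auto()  # e.g., teaching a motion or executing a task


class State:
    def __init__(self, name: str, kind: StateKind):
        self.name = name
        self.kind = kind
        self.transitions = {}  # event name -> next State

    def on(self, event: str, target: "State") -> None:
        self.transitions[event] = target


class FSM:
    def __init__(self, initial: State):
        self.current = initial

    def fire(self, event: str) -> None:
        nxt = self.current.transitions.get(event)
        if nxt is not None:
            self.current = nxt


# Example wiring: a main menu (menu state) leading to a playback action state.
main_menu = State("main_menu", StateKind.MENU)
playback = State("playback", StateKind.ACTION)
main_menu.on("select_playback", playback)
playback.on("done", main_menu)

fsm = FSM(main_menu)
fsm.fire("select_playback")   # interaction now drives the robot, not the menu
assert fsm.current.kind is StateKind.ACTION
```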
Context 2
... Graphical User Interface arranges the menu options vertically, and a red selector highlights the selected option (see Figure 5a). The FSM menu states describing the GUI are published on a ROS topic and converted, by a renderer, into a graphical representation. ...
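The following rospy sketch illustrates the idea of publishing the active menu state for an external renderer. It assumes ROS 1, a std_msgs/String message carrying JSON, and a topic name (/gui/menu_state); none of these details are specified in the source.

```python
# Hedged sketch of publishing the active menu state for a GUI renderer.
# The topic name, message type, and payload layout are assumptions.
import json

import rospy
from std_msgs.msg import String


def publish_menu_state(pub, options, selected):
    """Serialize the current menu (options + selector index) as JSON."""
    pub.publish(String(data=json.dumps({"options": options, "selected": selected})))


if __name__ == "__main__":
    rospy.init_node("menu_state_publisher")
    pub = rospy.Publisher("/gui/menu_state", String, queue_size=1)  # assumed topic
    rate = rospy.Rate(10)
    while not rospy.is_shutdown():
        publish_menu_state(
            pub,
            ["record", "playback", "sequential playback", "macro mode"],
            0,
        )
        rate.sleep()
```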
Context 3
... architecture allows the user to control robot functionalities to record and play back end-effector trajectories. In particular, the main menu offers four options: record, playback, sequential playback, and macro mode (see Figure 5a). Whenever one of these options is selected, the corresponding menu opens. ...
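A plain-Python sketch of the vertical option list and selector navigation follows, with a text renderer standing in for the actual GUI; the function names are hypothetical.

```python
# Illustrative sketch (not the authors' code) of the main menu and the
# selector navigation: options sit in a vertical list, one index marks
# the highlighted entry.
MAIN_MENU = ["record", "playback", "sequential playback", "macro mode"]


def move_selector(selected: int, step: int, n_options: int) -> int:
    """Move the highlight up or down, clamping at the list boundaries."""
    return max(0, min(n_options - 1, selected + step))


def render(options, selected: int) -> str:
    """Plain-text stand-in for the renderer: '>' marks the red selector."""
    return "\n".join(
        ("> " if i == selected else "  ") + opt for i, opt in enumerate(options)
    )


print(render(MAIN_MENU, move_selector(0, +1, len(MAIN_MENU))))
```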
Context 4
... record menu (Figure 5b) displays a list of recorded robot tasks (i.e., end-effector trajectories). The user can overwrite each task by selecting it or create a new one using the corresponding option. ...
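The record-menu behaviour can be sketched as a simple named task store in which recording either creates or overwrites a trajectory; the types and names below are assumptions.

```python
# Minimal sketch of the record-menu behaviour: tasks are named end-effector
# trajectories, and recording under an existing name overwrites it.
from typing import Dict, List, Tuple

Pose = Tuple[float, float, float]      # simplified end-effector pose
TaskStore = Dict[str, List[Pose]]      # task name -> recorded trajectory


def record_task(store: TaskStore, name: str, trajectory: List[Pose]) -> None:
    """Create the task if it does not exist, otherwise overwrite it."""
    store[name] = trajectory


tasks: TaskStore = {}
record_task(tasks, "move_A_to_B", [(0.0, 0.0, 0.1), (0.2, 0.0, 0.1)])
record_task(tasks, "move_A_to_B", [(0.0, 0.0, 0.1), (0.3, 0.1, 0.1)])  # overwrite
```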
Context 5
... playback menu (Figure 5c) lists all the saved tasks, which can be deleted by the human operator using the corresponding option. When a task is selected, the FSM transitions to the playback action state, and the robot reproduces the associated trajectory. ...
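A sketch of the playback-menu logic under assumed names: deletion removes a saved task, while selection hands the trajectory to a playback callback that stands in for the FSM transition to the playback action state.

```python
# Hypothetical sketch of the playback-menu logic.
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, float, float]
TaskStore = Dict[str, List[Pose]]


def delete_task(store: TaskStore, name: str) -> None:
    """Remove a saved task, if present."""
    store.pop(name, None)


def select_task(store: TaskStore, name: str,
                play: Callable[[List[Pose]], None]) -> None:
    """Selecting a task triggers reproduction of its trajectory."""
    if name in store:
        play(store[name])   # e.g., stream the poses to the robot controller


tasks: TaskStore = {"move_A_to_B": [(0.0, 0.0, 0.1), (0.2, 0.0, 0.1)]}
select_task(tasks, "move_A_to_B", lambda traj: print(f"playing {len(traj)} poses"))
delete_task(tasks, "move_A_to_B")
```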
Context 6
... sequential playback menu (Figure 5d) allows the user to combine saved tasks into sequences to handle more complex behaviours. The user can remove or substitute a task in the sequence by selecting it. ...
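A minimal sketch of sequence editing, assuming a sequence is simply an ordered list of task names; the helper below is hypothetical.

```python
# Illustrative sketch of the sequential-playback idea: selecting a slot
# removes or substitutes the task at that position.
from typing import List, Optional


def edit_slot(sequence: List[str], index: int, replacement: Optional[str]) -> None:
    """Substitute the task at `index`, or remove it if no replacement is given."""
    if replacement is None:
        del sequence[index]
    else:
        sequence[index] = replacement


sequence = ["move_A_to_B", "move_B_to_A", "move_A_to_B"]
edit_slot(sequence, 1, "move_B_to_C")   # substitute
edit_slot(sequence, 2, None)            # remove
print(sequence)                         # ['move_A_to_B', 'move_B_to_C']
```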
Context 7
... the macro mode menu (see Figure 5e) allows the user to associate one task with each of the three gestures G1, G2 and G3 (the task-gesture correspondence is ordered from top to bottom). The user can customize the mapping by selecting a specific slot and choosing a task from the list of available ones. ...
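The gesture-task mapping of the macro mode can be sketched as a small dictionary keyed by the three gestures; all names and helpers below are illustrative only.

```python
# Sketch of the macro-mode mapping: each gesture is bound to one task slot,
# ordered top to bottom as in Figure 5e, and a slot can be rebound.
GESTURES = ["G1", "G2", "G3"]              # fixed, top-to-bottom order

# gesture -> task name (None means the slot is still empty)
macro_map = {g: None for g in GESTURES}


def bind(gesture: str, task_name: str) -> None:
    """Customize the mapping by assigning a saved task to a gesture slot."""
    if gesture not in macro_map:
        raise ValueError(f"unknown gesture: {gesture}")
    macro_map[gesture] = task_name


def on_gesture(gesture: str) -> None:
    """When a bound gesture is recognized, trigger the associated task."""
    task = macro_map.get(gesture)
    if task is not None:
        print(f"executing task '{task}'")  # stand-in for the playback action


bind("G1", "move_A_to_B")
bind("G2", "move_B_to_A")
on_gesture("G1")
```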
Context 8
... the recording is finished, the wooden block is back in position A. 3. The third task consists of selecting the macro option from the main menu and associating G1 and G2 (see Figure 4), or two buttons when using the touchscreen, respectively with the pre-recorded Move: A → B and the new Move: B → A. After selecting play (see Figure 5e), the participant has to activate the two actions, using the corresponding inputs, to move the wooden block from A to B and vice versa. 4. The last task consists of selecting the sequence option from the main menu to create a sequence of actions and then selecting play to start the reproduction. ...
Similar publications
This study investigated how displaying a robot's attention heatmap, while the robot points at it, influences human trust in and acceptance of its outputs. We conducted an experiment using two types of visual tasks. In these tasks, the participants were required to decide whether to accept or reject the answers of an AI or robot. The participants...
Citations
... A notable method is the use of inertial sensors on a smartwatch to track motions of the arm and hand [46]. However, inertial sensors alone can have difficulty implicitly sensing the state of the hand and, thus, are usually integrated with other sensors [47,48]. Electromyography (EMG) is another wearable approach in which electrical currents in muscle cells are measured to provide some indication of the state of the hand [49,50]. ...
Hand gestures play a significant role in human interactions where non-verbal intentions, thoughts and commands are conveyed. In Human-Robot Interaction (HRI), hand gestures offer a similar and efficient medium for conveying clear and rapid directives to a robotic agent. However, state-of-the-art vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters. Such a short range limits practical HRI with, for example, service robots, search and rescue robots and drones. In this work, we address the Ultra-Range Gesture Recognition (URGR) problem in the context of HRI, aiming for a recognition distance of up to 25 meters. We propose a novel deep-learning framework for URGR using solely a simple RGB camera. First, a novel super-resolution model termed HQ-Net is used to enhance the low-resolution image of the user. Then, we propose a novel URGR classifier termed Graph Vision Transformer (GViT), which takes the enhanced image as input. GViT combines the benefits of a Graph Convolutional Network (GCN) and a modified Vision Transformer (ViT). Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%. The framework has also exhibited superior performance compared to human recognition at ultra-range distances. Using the framework, we analyze and demonstrate the performance of an autonomous quadruped robot directed by human gestures in complex ultra-range indoor and outdoor environments.
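As a loose illustration of the two-stage pipeline this abstract describes (super-resolution followed by classification), the sketch below treats HQ-Net and GViT as opaque callables; none of the models' internals are reproduced, and every name is a placeholder.

```python
# Schematic two-stage pipeline: enhance the low-resolution frame first,
# then classify the gesture from the enhanced image. Both stages are
# placeholders, not reproductions of HQ-Net or GViT.
from typing import Callable, Sequence

Image = Sequence[Sequence[float]]   # placeholder for an RGB frame


def recognize_gesture(frame: Image,
                      enhance: Callable[[Image], Image],
                      classify: Callable[[Image], str]) -> str:
    """Super-resolve the input, then classify the enhanced image."""
    return classify(enhance(frame))
```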
Construction robots play a pivotal role in enabling intelligent processes within the construction industry. User-friendly interfaces that facilitate efficient human-robot collaboration are essential for promoting robot adoption. However, most existing interfaces do not consider contextual information in the collaborative environment. The situation where humans and robots work together on the same jobsite creates a unique environmental context. Overlooking contextual information would limit the potential to optimize interaction efficiency. This paper proposes a novel context-aware method that utilizes a two-stream network to enhance human-robot interaction in construction settings. In the proposed network, the first-person view-based stream focuses on the relevant spatiotemporal regions for context extraction, while the motion sensory data-based stream obtains features related to hand motions. By fusing the vision context and motion data, the method achieves gesture recognition for efficient communication between construction workers and robots. Experimental evaluation on a dataset from five construction sites demonstrates an overall classification accuracy of 92.6%, underscoring the practicality and potential benefits of the proposed method.
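The two-stream fusion idea can be illustrated schematically as follows; the streams are treated as opaque feature extractors, and every function name is a hypothetical placeholder rather than part of the cited method.

```python
# Schematic sketch of two-stream fusion: one stream extracts visual context
# from first-person video, the other extracts hand-motion features from
# wearable sensors; their features are fused before classification.
from typing import Callable, Sequence

Features = Sequence[float]


def fuse_and_classify(video_frames,
                      motion_samples,
                      vision_stream: Callable[..., Features],
                      motion_stream: Callable[..., Features],
                      classifier: Callable[[Sequence[float]], str]) -> str:
    """Concatenate the two feature vectors, then classify the gesture."""
    fused = list(vision_stream(video_frames)) + list(motion_stream(motion_samples))
    return classifier(fused)
```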