Fig. 2 (uploaded by Lorenzo J. Tardón): the tracked joints of the user's skeleton.
Source publication
In this paper, we present a system for the detection of fast gestural motion by using a linear predictor of hand movements. We also use the proposed detection scheme for the implementation of a virtual drumkit simulator. A database of drum-hitting motions is gathered and two different sets of features are proposed to discriminate different drum-hit...
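The detection idea named in the abstract, a linear predictor of hand movements, can be sketched as follows. This is a minimal constant-velocity predictor in NumPy; the frame rate, the speed threshold, and the use of the prediction residual are illustrative assumptions, not values or details from the paper.

```python
import numpy as np

def detect_fast_motion(positions, dt=1 / 30, speed_thresh=2.0):
    """Flag frames that contain fast, hit-like hand motion.

    positions: (T, 3) array of hand-joint coordinates in metres.
    dt: sampling period in seconds (a 30 fps Kinect-style stream assumed).
    speed_thresh: speed in m/s regarded as 'fast' (illustrative value).
    """
    flags = np.zeros(len(positions), dtype=bool)
    for t in range(2, len(positions)):
        # One-step linear predictor: constant-velocity extrapolation.
        predicted = 2 * positions[t - 1] - positions[t - 2]
        # Instantaneous speed implied by the linear model.
        speed = np.linalg.norm(positions[t - 1] - positions[t - 2]) / dt
        # A large prediction error marks a sharp change of direction,
        # such as the rebound at the end of a drum hit.
        error = np.linalg.norm(positions[t] - predicted) / dt
        flags[t] = speed > speed_thresh or error > speed_thresh
    return flags
```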
Context in source publication
Context 1
... Gesture detection: in this step, the system assesses whether the data read belong to a gesture of interest or not. If so, the data progress to the next step; otherwise, the data are discarded. – Gesture classification: when a gesture of interest is found, the system identifies the type of gesture performed by analyzing the other features considered. The manuscript is structured following the scheme in Fig. 1. Section 2 presents the user motion modeling, and Section 3 presents the gesture detection and classification schemes. Then, Section 4 presents the experimental evaluation conducted, along with the most relevant results found. Finally, the last section draws the conclusions and outlines potential future work. In this section, the complete user motion model is presented. This model includes the user's skeleton data (Section 2.1), the model for the identification of the presence of fast hitting-like gestures for real-time interaction (Section 2.2), and the definition of specific features for gesture discrimination (Section 2.3). For the purpose of modelling user motion, we have considered that our system is capable of tracking the position of up to 15 joints of the user's skeleton, as shown in Fig. 2. These data, however, need to be processed further, as they are strongly influenced by user position and height. In order to define a common reference point, the centre of mass $\vec{n}_0 = (c_x, c_y, c_z)$ of the user silhouette is calculated ...
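A minimal sketch of the normalisation step described in this excerpt, assuming a 15-joint Kinect-style skeleton. Approximating the silhouette's centre of mass by the joint centroid, and scaling by the vertical joint span, are simplifications for illustration rather than the paper's exact procedure.

```python
import numpy as np

def normalise_skeleton(joints):
    """Express skeleton joints in a user-independent reference frame.

    joints: (15, 3) array with one (x, y, z) row per tracked joint,
            as delivered by a Kinect-style tracker.
    """
    # Centre of mass, approximated here by the mean joint position
    # (the paper derives it from the user silhouette itself).
    n0 = joints.mean(axis=0)          # (c_x, c_y, c_z)
    centred = joints - n0             # remove dependence on user position
    # Scale by an estimate of body height (head-to-foot span) to
    # remove dependence on user height; y is assumed vertical.
    height = np.ptp(joints[:, 1])     # peak-to-peak along the vertical axis
    return centred / height
```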
Similar publications
Nowadays, gestures are being adopted as a new modality in the field of Human-Computer Interaction (HCI), where the physical movements of the whole body can perform unlimited actions. Soundpainting is a language of artistic composition that has been in use for more than forty years. However, work on the recognition of Soundpainting gestures is limited and they...
Citations
... There is a growing number of original ways of creating, interacting with, and playing music using the different possibilities technology can offer. For several years now, new systems have appeared constantly in which the musical experience is more and more immersive and does not rely on physical instruments, but on generic commercial or specific systems for motion detection [15,16,24,34,40], on the interaction with elements of a physical desk (pencil, pen, table, etc.) [12], or on other interaction means. In addition, more and more studies are emerging to understand music in relation to the way people interact with it [11,13]. ...
In this paper, a system to build music in an intuitive and accessible way, with Lego bricks, is presented. The system makes use of the powerful and cheap possibilities that technology offers for doing old things in a new way. A Raspberry Pi is used to control the system and run the necessary algorithms; customized Lego bricks are used for building melodies; and custom electronic designs, software pieces, and 3D-printed parts complete the items employed. The system designed is modular: it allows creating melodies with chords and percussion, or just melodies, or performing as a beatbox or a melody box. The main interaction with the system is made using Lego-type building blocks. Tests have demonstrated its versatility and ease of use, as well as its usefulness in music learning for both children and adults.
... With the popularization of image processing models and sophisticated motion-tracking sensors, several existing technologies have focused on building virtual musical instruments using in-air 3D hand gestures [6,14,30,31,34] and/or Microsoft's Kinect 3D sensor [17,20,29,36]. However, these sensor-based virtual musical instruments are designed for playing instruments in the air, an experience that is far from playing a real musical instrument. ...
With the rise in pervasive computing solutions, interactive surfaces have gained large popularity across multi-application domains, including smart boards for education, touch-enabled kiosks for smart retail, and smart mirrors for smart homes. Despite this increased popularity, existing platforms are mostly limited to custom-built surfaces with attached sensors and hardware that are expensive and require complicated design considerations. To address this, we design a low-cost, intuitive system called MuTable that repurposes any flat surface (such as table tops) into a live musical instrument. This provides a unique, close-to-real-time instrument-playing experience in which the user can play any type of musical instrument. This is achieved by projecting the instrument's shape on any tangible surface, sensor calibration, user tap detection, tap position identification, and associated sound generation. We demonstrate the performance of our working system by reporting an accuracy of 83% for detecting softer taps, 100% accuracy for detecting regular taps, and a precision of 95.7% for estimating hand location.
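As a rough illustration of the tap-detection step mentioned in this abstract, a threshold-with-debounce detector over a sensor envelope might look like the sketch below. The signal source (a contact sensor on the surface), the threshold, and the refractory period are all assumptions for illustration, not details from the paper.

```python
import numpy as np

def detect_taps(signal, fs, threshold=0.3, refractory_s=0.1):
    """Return sample indices of detected taps.

    signal: 1-D array from a contact microphone or accelerometer
            attached to the surface (hypothetical setup).
    fs: sampling rate in Hz.
    threshold: amplitude above which a peak counts as a tap.
    refractory_s: dead time after a tap so one hit is not counted twice.
    """
    envelope = np.abs(signal)
    refractory = int(refractory_s * fs)
    taps, last = [], -refractory
    for i, value in enumerate(envelope):
        if value > threshold and i - last >= refractory:
            taps.append(i)
            last = i
    return taps
```

A higher threshold would separate "regular" from "softer" taps, which is consistent with the two accuracy figures reported above.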
... Hand pose estimation aims to recognize the keypoint locations of a hand gesture from an image. Hand pose estimation and hand recognition technology [14,24,31] play an important role in the fields of Human-Computer Interaction (HCI), Virtual Reality (VR), and Augmented Reality (AR) ...
Due to severe articulation, self-occlusion, various scales, and the high dexterity of the hand, hand pose estimation is more challenging than body pose estimation. Recently developed body pose estimation algorithms are not suitable for addressing the unique challenges of hand pose estimation because they are trained without explicitly modeling structural relationships between keypoints. In this paper, we propose a novel cascaded hierarchical CNN (CH-HandNet) for 2D hand pose estimation from a single color image. The CH-HandNet includes three modules: hand mask segmentation, preliminary 2D hand pose estimation, and hierarchical estimation. The first module obtains a hand mask with a hand mask segmentation network. The second module connects the hand mask and the intermediate image features to estimate the 2D hand heatmaps. The last module connects the hand heatmaps with the intermediate image features and the hand mask to estimate finger and palm heatmaps hierarchically. Finally, the extracted finger (pinky, ring, middle, index) and palm (thumb and palm) feature information is fused to estimate the 2D hand pose. Experimental results on three datasets (OneHand 10k, Panoptic, and Eric.Lee) consistently show that our proposed CH-HandNet outperforms previous state-of-the-art hand pose estimation methods.
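A structural sketch of such a three-stage cascade in PyTorch: the layer sizes, channel counts, and fusion step below are placeholders chosen for brevity and do not reproduce the actual CH-HandNet architecture.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class CascadedHandNet(nn.Module):
    """Three-stage cascade: mask -> coarse heatmaps -> hierarchical refine."""
    def __init__(self, n_keypoints=21):
        super().__init__()
        self.backbone = conv_block(3, 32)               # shared image features
        self.mask_head = nn.Conv2d(32, 1, 1)            # stage 1: hand mask
        self.coarse_head = conv_block(33, n_keypoints)  # stage 2: features + mask
        # Stage 3: separate finger / palm branches, fused at the end.
        self.finger_head = conv_block(33 + n_keypoints, n_keypoints)
        self.palm_head = conv_block(33 + n_keypoints, n_keypoints)

    def forward(self, image):
        feats = self.backbone(image)
        mask = torch.sigmoid(self.mask_head(feats))
        x = torch.cat([feats, mask], dim=1)
        coarse = self.coarse_head(x)
        y = torch.cat([x, coarse], dim=1)
        # Fuse finger and palm estimates into the final 2D pose heatmaps.
        return self.finger_head(y) + self.palm_head(y)

pose = CascadedHandNet()(torch.randn(1, 3, 64, 64))  # -> (1, 21, 64, 64)
```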
... These data can be exploited by the most recent Machine Learning (ML) techniques in order to extract actionable information and automate processes that otherwise require years of teaching experience. Several contributions in this field have been proposed in which these data sources are exploited to study musicians and their musical performance using MOCAP [7,10,12,14], MYO [11,13], and KINECT [29,39], with the purpose of supporting students' individual practice. ...
Learning to play and perform the violin is a complex task that requires high conscious control and coordination from the player. In this paper, our aim is to understand which technology and which motion features can be used to efficiently and effectively distinguish a professional performance from a student one, trading off intrusiveness and accuracy. We collected and made freely available a dataset consisting of Motion Capture (MOCAP), Electromyography, Accelerometer, and Gyroscope (MYO), and Microsoft Kinect (KINECT) recordings of violinists with different skills performing different exercises covering different pedagogical and technical aspects. We then engineered features from the different sources (MOCAP, MYO, and KINECT) and trained a data-driven classifier to distinguish between two levels of violinist experience, namely Beginners and Experts. We then studied how much accuracy we lose when, instead of MOCAP data (the most intrusive and costly technology), MYO data (less intrusive than MOCAP) or KINECT data (the least intrusive technology) are exploited. In accordance with the hierarchy present in the dataset, we study two different scenarios: extrapolation with respect to different exercises and to different violinists. Furthermore, we study which features are the most predictive of the quality of a violinist, to corroborate the significance of the results. The results, both in terms of accuracy and of insight into the cognitive problem, support the proposal and support the use of the presented technique as an effective tool for students to monitor and enhance their home study and practice.
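The classification step described here follows a standard feature-matrix pipeline, sketched below. The synthetic data, the feature count, and the random-forest model are illustrative assumptions, not the authors' exact features or classifier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical engineered features per recording, e.g. bow-hand speed
# statistics, joint-angle ranges, and accelerometer band energies.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))      # 60 recordings, 12 features (synthetic)
y = rng.integers(0, 2, size=60)    # 0 = Beginner, 1 = Expert (synthetic)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # cross-validated accuracy

# Feature importances indicate which motion features are most predictive
# of skill level, mirroring the predictiveness analysis in the paper.
clf.fit(X, y)
print(clf.feature_importances_)
```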
... At the same time, in real application environments, a single ordinary camera is relatively common. It can only capture the user's gesture image on a two-dimensional plane, extract the color, contour, and texture information of the gesture, and construct an appearance model of the gesture [10]. ...
Objective. To explore the research and application of multifeature gesture recognition in virtual reality human-computer interaction, and to identify the gesture recognition technology scheme that achieves the best human-computer interaction experience. Methods. Starting from the technical difficulties of gesture recognition, static gesture feature recognition and feature fusion algorithms are compared; gesture partitioning is studied and the characteristic parameters are adjusted; and spatiotemporal tracking-trajectory features are combined with dynamic gesture recognition to compare the recognition performance of the different schemes. Results. The central region was labelled region 0, and the surrounding area was divided into regions 1-4 in counterclockwise order. Compared with traditional gesture schemes, the overlapping problem was reduced in the four-region partition mode, gestures were displayed more clearly, and gesture processing operations were carried out more efficiently. Conclusion. Gesture recognition requires the combination of static gesture feature information recognition, gesture feature fusion, spatiotemporal trajectory features, and dynamic gesture trajectory features to achieve a better human-computer interaction experience.
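The partition described in the results (a central region 0 with regions 1-4 arranged counterclockwise around it) can be written down directly; the radius of the central region is an assumed parameter.

```python
import math

def gesture_region(x, y, cx, cy, center_radius=0.15):
    """Map a palm position to region 0 (centre) or 1-4 (counterclockwise).

    (x, y): palm position; (cx, cy): centre of the interaction area.
    center_radius: size of region 0 (assumed value).
    """
    dx, dy = x - cx, y - cy
    if math.hypot(dx, dy) <= center_radius:
        return 0
    angle = math.atan2(dy, dx) % (2 * math.pi)   # 0..2*pi, counterclockwise
    return int(angle // (math.pi / 2)) + 1       # quadrants labelled 1-4
```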
... Moreover, we want to avoid the need for any specific hardware. For example, the Microsoft Kinect is a popular platform for training models and collecting data for gesture recognition [3][4][5][6], since it provides not only an RGB video but also depth data. The task of gesture recognition can be simplified by placing special markers on the person's body [7] or by using special gloves for hand gestures [8]. ...
Gesture recognition opens up new ways for humans to intuitively interact with machines. Especially for service robots, gestures can be a valuable addition to the means of communication, for example, to draw the robot's attention to someone or something. Extracting a gesture from video data and classifying it is a challenging task, and a variety of approaches have been proposed throughout the years. This paper presents a method for gesture recognition in RGB videos that uses OpenPose to extract the pose of a person and Dynamic Time Warping (DTW) in conjunction with One-Nearest-Neighbor (1NN) for time-series classification. The main features of this approach are its independence from any specific hardware and its high flexibility, because new gestures can be added to the classifier with only a few examples. We utilize the robustness of the Deep Learning-based OpenPose framework while avoiding the data-intensive task of training a neural network ourselves. We demonstrate the classification performance of our method on a public dataset.
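A compact sketch of the DTW-plus-1NN idea in plain NumPy; the pose features and distance details are simplified and make no claim to match the paper's implementation.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two pose sequences.

    a: (Ta, d) and b: (Tb, d) arrays, one pose vector per frame.
    """
    Ta, Tb = len(a), len(b)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Ta, Tb]

def classify_1nn(query, templates):
    """templates: list of (sequence, label); returns the nearest label.

    Adding a new gesture class only requires appending a few labelled
    example sequences to `templates`, which is the flexibility noted above.
    """
    distances = [(dtw_distance(query, seq), label) for seq, label in templates]
    return min(distances)[1]
```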
... Some researchers have also used hand gesture recognition to detect whether a person has Alzheimer's disease [7] or to interpret hand-drawn diagrams [8]. In musical performances, some use the Kinect to recognize drum-hitting gestures [9]. In musical conducting, hand gestures are used to express the conductor's style and emotion and to make performers understand the conductor's guidance [10,11]. ...
Gesture recognition is a human-computer interaction method widely used for educational, medical, and entertainment purposes. Humans also use gestures to communicate with each other, and musical conducting uses gestures in this way. In musical conducting, conductors wave their hands to control the speed and strength of the music played. However, beginners may have a limited comprehension of the gestures and might not be able to properly follow the ensembles. Therefore, this paper proposes a real-time musical conducting gesture recognition system to help music players improve their performance. We used a single depth camera to capture image inputs and establish a real-time dynamic gesture recognition system. The Kinect software development kit created a skeleton model by capturing the palm position. Different palm gestures were collected to develop training templates for musical conducting. The dynamic time warping algorithm was applied to recognize the different conducting gestures at various conducting speeds, thereby achieving real-time dynamic musical conducting gesture recognition. In the experiment, we used 5600 examples of three basic types of musical conducting gestures, covering seven capture angles and five performing speeds, for evaluation. The experimental results showed an average accuracy of 89.17% at 30 frames per second.
... Human motion capture is interesting and essential for many applications: it is the best way to analyze motion and behavior, it is low-cost, and it can reconstruct human motion in almost any environment in real time [19]. It is therefore widely used in sports training, sports analysis, animation, medical rehabilitation analysis, humanoid robots, and virtual human control [7]. ...