Article

Understanding the Roles of Video and Sensor Data in the Annotation of Human Activities

Taylor & Francis
International Journal of Human-Computer Interaction

Abstract

Human activities can be recognized in sensor data using supervised machine learning algorithms. In this approach, human annotators must annotate events in the sensor data which are used as input to supervised learning algorithms. Annotating events directly in time series graphs of data streams is difficult. Video is often collected and synchronized to the sensor data to aid human annotators in identifying events in the data. Other work in human activity recognition (HAR) minimizes the cost of annotation by using unsupervised or semi-supervised machine learning algorithms or using algorithms that are more tolerant of human annotation errors. Rather than adjusting algorithms, we focus on the performance of the human annotators themselves. Understanding how human annotators perform annotation may lead to annotation interfaces and data collection schemes that better support annotators. We investigate the accuracy and efficiency of human annotators in the context of four HAR tasks when using video, data, or both to annotate events. After a training period, we found that annotators were more efficient when using data alone on three of four tasks and more accurate when marking event types when using video alone on all four tasks. Annotators were more accurate when marking event boundaries using data alone on two tasks and more accurate using video alone on the other two tasks. Our results suggest that data and video collected for annotation of HAR tasks play different roles in the annotation process and these roles may vary with the HAR task.
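As a rough, hypothetical illustration of the kinds of measures such a comparison involves (this is not the paper's evaluation code), the sketch below computes label accuracy, mean boundary offset, and an events-per-minute efficiency figure for annotations matched one-to-one against reference events; the Event fields, the example values, and the one-to-one matching are all assumptions.

    from dataclasses import dataclass

    @dataclass
    class Event:
        label: str      # event type, e.g. "left_turn"
        start: float    # event start time in seconds
        end: float      # event end time in seconds

    def label_accuracy(annotated, reference):
        """Fraction of reference events whose matched annotation carries the same label."""
        correct = sum(a.label == r.label for a, r in zip(annotated, reference))
        return correct / len(reference)

    def mean_boundary_offset(annotated, reference):
        """Average absolute offset (seconds) of start/end boundaries from the reference."""
        offsets = []
        for a, r in zip(annotated, reference):
            offsets.extend([abs(a.start - r.start), abs(a.end - r.end)])
        return sum(offsets) / len(offsets)

    def events_per_minute(annotated, total_annotation_seconds):
        """A simple efficiency measure: annotated events per minute of annotation work."""
        return len(annotated) / (total_annotation_seconds / 60.0)

    reference = [Event("jump", 1.0, 2.5), Event("turn", 4.0, 5.2)]
    annotated = [Event("jump", 1.1, 2.4), Event("left_turn", 4.3, 5.0)]
    print(label_accuracy(annotated, reference),
          mean_boundary_offset(annotated, reference),
          events_per_minute(annotated, total_annotation_seconds=90))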

... Examining the efficacy and precision of human annotators when employing video, data, or both to annotate events across four human activity recognition (HAR) tasks, [28] observed that annotators were more accurate at classifying event types when employing video alone on all four tasks and more efficient when using data alone on three of the four tasks. Their annotations of event boundaries based on data alone were more accurate. ...
Article
Full-text available
The combination of Human-Computer Interaction (HCI) technology with biomimetic vision systems has transformational potential in animation design, particularly by incorporating biomechanical principles to create immersive and interactive experiences. Traditional animation approaches frequently lack sensitivity to real-time human motions, which can restrict engagement and realism. This study addresses this constraint by creating a framework that uses Virtual Reality (VR) and Augmented Reality (AR) to generate dynamic settings that include a variety of human activities, informed by biomechanical analysis. A biomimetic vision system is used to record these motions with wearable sensors, allowing for precise monitoring of user activity while considering biomechanical factors such as joint angles, force distribution, and movement patterns. The recorded data is preprocessed using Z-score normalization methods and extracted using Principal Component Analysis (PCA). This study proposed an Egyptian Vulture optimized Adjustable Long Short-Term Memory Network (EVO-ALSTM) technique for motion classification, specifically tailored to recognize biomechanical characteristics of human movements. Results demonstrate a significant improvement in precision (93%), F1-score (91%), accuracy (95%), and recall (90%) for the motion recognition system, highlighting the effectiveness of biomechanical insights in enhancing animation design. The findings indicate that integrating real-time biomechanical data into the animation process leads to more engaging and realistic user experiences. This study not only advances the subject of HCI but also provides the framework for future investigations into sophisticated animation technologies that use biomimetic and biomechanical systems.
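As a hedged sketch of the preprocessing pipeline named above (Z-score normalization followed by PCA), the snippet below applies both steps to a synthetic feature matrix; the array shape and the number of retained components are assumptions, and the EVO-ALSTM classifier itself is not reproduced.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 36))        # 500 windows x 36 wearable-sensor features (assumed)

    X_z = StandardScaler().fit_transform(X)   # Z-score normalization per feature
    pca = PCA(n_components=10)                # keep 10 principal components (assumed)
    X_pca = pca.fit_transform(X_z)

    print(X_pca.shape, pca.explained_variance_ratio_.sum())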
... Vision-based HAR has various applications [16], but it has fundamental shortcomings, such as poor recording in low-vision scenarios [17] and an inability to deal with sudden light flashes. Sensor-based methods have given a new direction to HAR thanks to recent developments in sensor technology [1,18,43]. Sensor-based activity recognition obtains data from gyroscopes, accelerometers, radar, etc. [2,42]. It provides micro-level information about human behavior using the recorded sensor data. ...
Article
Full-text available
Contact-free identification of a person's walk has numerous applications in surveillance and suspicious-activity detection, enabling precautionary actions to be taken. This paper presents a millimeter-wave radar-based automated system for walk type identification in which the received complex radar signals are decomposed using the flexible analytic wavelet transform. Correntropy and centered temporal correntropy features are computed for the decomposed components of the radar signals, followed by Student's t-test based feature ranking. Classification is done using an ensemble subspace discriminant classifier with classifier fusion. Six different types of walk, namely slow walk (SW), fast walk (FW), slow walk with hand in pocket (SWHP), slow walk with swinging hands (SWSH), walk with a limp (WL), and walk hiding a bottle (WHB), are considered for walk identification. Six different combinations of walk types are formed to develop a robust system for accurate identification in different scenarios. The proposed method achieved 85.5% accuracy in classifying all six classes and 100% accuracy in classifying SW and FW. In terms of activity-focused identification, 100% accuracy is achieved using the proposed system to classify the SWHP, SWSH, WL, and WHB classes. The classification performance is better than that of the compared methods for the considered walk activity combinations in terms of accuracy, sensitivity, and specificity.
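Two of the ingredients named above, a Gaussian-kernel correntropy feature and Student's t-test based feature ranking, can be sketched as follows; the flexible analytic wavelet transform and the ensemble subspace discriminant classifier are not reproduced here, and the radar segments, kernel width, and class labels are synthetic placeholders.

    import numpy as np
    from scipy.stats import ttest_ind

    def correntropy(x, y, sigma=1.0):
        """Average Gaussian-kernel similarity between two equal-length signals."""
        d = np.asarray(x) - np.asarray(y)
        return np.mean(np.exp(-(d ** 2) / (2 * sigma ** 2)))

    rng = np.random.default_rng(1)
    # 100 radar segments x 8 decomposed components, two walk classes (synthetic)
    features = np.array([[correntropy(rng.normal(size=256), rng.normal(size=256))
                          for _ in range(8)] for _ in range(100)])
    labels = np.array([0] * 50 + [1] * 50)

    # Rank each feature column by the magnitude of its t-statistic between classes
    t_stats = [abs(ttest_ind(features[labels == 0, k], features[labels == 1, k]).statistic)
               for k in range(features.shape[1])]
    ranking = np.argsort(t_stats)[::-1]
    print("feature ranking (best first):", ranking)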
Article
Full-text available
In this commentary, we discuss the nature of reversible and irreversible questions, that is, questions that may enable one to identify the nature of the source of their answers. We then introduce GPT-3, a third-generation, autoregressive language model that uses deep learning to produce human-like texts, and use the previous distinction to analyse it. We expand the analysis to present three tests based on mathematical, semantic (that is, the Turing Test), and ethical questions and show that GPT-3 is not designed to pass any of them. This is a reminder that GPT-3 does not do what it is not supposed to do, and that any interpretation of GPT-3 as the beginning of the emergence of a general form of artificial intelligence is merely uninformed science fiction. We conclude by outlining some of the significant consequences of the industrialisation of automatic and cheap production of good, semantic artefacts.
Conference Paper
Full-text available
Developing systems for Human Activity Recognition (HAR) using wearables typically relies on datasets that were manually annotated by human experts with regard to the precise timings of instances of relevant activities. However, obtaining such data annotations is often very challenging in the predominantly mobile scenarios of Human Activity Recognition. As a result, labels often carry a degree of uncertainty, known as label jitter, with regard to: i) correct temporal alignment of activity boundaries; and ii) correctness of the actual label provided by the human annotator. In this work, we present a scheme that explicitly incorporates label jitter into the model training process. We demonstrate the effectiveness of the proposed method through a systematic experimental evaluation on standard recognition tasks, for which our method leads to significant increases in mean F1 scores.
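One simple way to expose a model to label jitter during training, in the spirit of the scheme described above though not the authors' exact method, is to augment each annotated segment with randomly shifted copies of its boundaries; the function name, jitter range, and example segments below are hypothetical.

    import random

    def jitter_segments(segments, max_jitter=25, copies=3, seed=0):
        """segments: list of (start_idx, end_idx, label); returns an augmented list
        in which each segment also appears with randomly shifted boundaries."""
        rng = random.Random(seed)
        augmented = list(segments)
        for start, end, label in segments:
            for _ in range(copies):
                ds = rng.randint(-max_jitter, max_jitter)
                de = rng.randint(-max_jitter, max_jitter)
                new_start, new_end = max(0, start + ds), end + de
                if new_end > new_start:
                    augmented.append((new_start, new_end, label))
        return augmented

    print(jitter_segments([(100, 250, "pick"), (400, 520, "place")]))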
Article
Full-text available
Modern smartphones and wearables often contain multiple embedded sensors which generate significant amounts of data. This information can be used for body-monitoring areas such as healthcare, indoor location, user-adaptive recommendations and transportation. The development of Human Activity Recognition (HAR) algorithms involves the collection of a large amount of labelled data which should be annotated by an expert. However, the data annotation process on large datasets is expensive, time-consuming and difficult. Developing a HAR approach that requires low annotation effort while still maintaining adequate performance is a relevant challenge. We introduce a Semi-Supervised Active Learning (SSAL) approach based on Self-Training (ST) for Human Activity Recognition to partially automate the annotation process, reducing the annotation effort and the volume of annotated data required to obtain a high-performance classifier. Our approach uses a criterion to select the most relevant samples for annotation by the expert and propagates their labels to the most confident samples. We present a comprehensive study comparing supervised and unsupervised methods with our approach on two datasets composed of daily living activities. The results show that it is possible to reduce the required annotated data by more than 89% while still maintaining accurate model performance.
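The combination of self-training and active querying can be illustrated with a small, hedged sketch (not the authors' implementation): the least confident samples are sent to the expert, for which the known labels stand in here, while high-confidence predictions are self-labelled. The confidence threshold of 0.95, the query batch of 10, and the synthetic dataset are arbitrary choices for the example.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                               n_classes=3, random_state=0)
    labelled = np.zeros(len(y), dtype=bool)
    labelled[:30] = True                      # small initial labelled pool
    pseudo = y.copy()                         # working label array

    clf = LogisticRegression(max_iter=1000)
    for _ in range(5):
        clf.fit(X[labelled], pseudo[labelled])
        idx_unlab = np.where(~labelled)[0]
        if len(idx_unlab) == 0:
            break
        conf = clf.predict_proba(X[idx_unlab]).max(axis=1)

        # Self-training step: adopt predictions above a confidence threshold
        accept = idx_unlab[conf > 0.95]
        if len(accept):
            pseudo[accept] = clf.predict(X[accept])
            labelled[accept] = True

        # Active step: ask the "expert" (true labels here) about the 10 least
        # confident samples; expert labels override any pseudo-labels
        ask = idx_unlab[np.argsort(conf)[:10]]
        pseudo[ask] = y[ask]
        labelled[ask] = True

    print("labelled fraction:", labelled.mean())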
Conference Paper
Full-text available
Although the fourth industrial revolution is already in progress and advances have been made in automating factories, completely automated facilities are still far in the future. Human work is still an important factor in many factories and warehouses, especially in the field of logistics. Manual processes are, therefore, often subject to optimization efforts. In order to aid these optimization efforts, methods like human activity recognition (HAR) have become of increasing interest in industrial settings. In this work a novel deep neural network architecture for HAR is introduced. A convolutional neural network (CNN), which employs temporal convolutions, is applied to the sequential data of multiple inertial measurement units (IMUs). The network is designed to separately handle different sensor values and IMUs, joining the information step-by-step within the architecture. An evaluation is performed using data from the order picking process recorded in two different warehouses. The influence of different design choices in the network architecture, as well as pre- and post-processing, is evaluated. Crucial steps for learning a good classification network for the task of HAR in a complex industrial setting are shown. Ultimately, it is shown that traditional approaches based on statistical features as well as recent CNN architectures are outperformed.
Article
Full-text available
Current activity recognition systems mostly work with static, pre-trained sensor configurations. As a consequence, they are not able to leverage new sensors appearing in their environment (e.g., the user buying a new wearable device). In this work we present a method inspired by semi-supervised graph methods that can add new sensors to an existing system in an unsupervised manner. We have evaluated our method on two well-known activity recognition datasets and found that it can take advantage of the information provided by new, unknown sensor sources, improving recognition performance in most cases.
Article
Full-text available
Quality assessment in cricket is a complex task that is performed by understanding the combination of individual activities a player is able to perform and by assessing how well these activities are performed. We present a framework for inexpensive and accessible, automated recognition of cricketing shots. By means of body-worn inertial measurement units, movements of batsmen are recorded, which are then analysed using a parallelised, hierarchical recognition system that automatically classifies relevant categories of shots as required for assessing batting quality. Our system then generates meaningful visualisations of key performance parameters, including feet positions, attack/defence, and distribution of shots around the ground. These visualisations are the basis for objective skill assessment thereby focusing on specific personal improvement points as identified through our system. We evaluated our framework through a deployment study where 6 players engaged in batting exercises. Based on the recorded movement data we could automatically identify 20 classes of unique batting shot components with an average F1-score greater than 88%. This analysis is the basis for our detailed analysis of our study participants’ skills. Our system has the potential to rival expensive vision-based systems but at a fraction of the cost.
Conference Paper
Full-text available
Data preprocessing, feature selection and classification algorithms usually occupy the bulk of surveys on human activity recognition (HAR). This paper instead gives a brief review of data acquisition, which is a critical stage of wearable data-driven HAR. The review focuses on the determination of sensor types, the modality of sensor devices, sensor deployment and data collection. The work aims to provide comprehensive and detailed guidance for this fundamental part of HAR, and to highlight the challenges related to the topics reviewed.
Conference Paper
Full-text available
In this paper we describe a multimodal-multisensor annotation tool for physiological computing, to which, for example, mobile gesture-based interaction devices or health monitoring devices can be connected. It is intended as an expert authoring tool for annotating multiple video-based sensor streams for domain-specific activities. The resulting datasets can be used as supervised datasets for new machine learning tasks. Our tool provides connectors to commercially available sensor systems (e.g., Intel RealSense F200 3D camera, Leap Motion, and Myo) and a graphical user interface for annotation.
Article
Full-text available
The last 20 years have seen ever increasing research activity in the field of human activity recognition. As activity recognition has considerably matured, so has the number of challenges in designing, implementing and evaluating activity recognition systems. This tutorial aims to provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition. It specifically focuses on activity recognition using on-body inertial sensors. We first discuss the key research challenges that human activity recognition shares with general pattern recognition and identify those challenges that are specific to human activity recognition. We then describe the concept of an activity recognition chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems. We detail each component of the framework, provide references to related research and introduce the best-practice methods developed by the activity recognition research community. We conclude with the educational example problem of recognising different hand gestures from inertial sensors attached to the upper and lower arm. We illustrate how each component of this framework can be implemented for this specific activity recognition problem and demonstrate how different implementations compare and how they impact overall recognition performance.
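A condensed sketch of such an activity recognition chain, with sliding-window segmentation, simple per-window statistical features, and a generic classifier, might look as follows; the window length, overlap, feature set, synthetic 6-axis stream, and random labels are assumptions for the example, not the tutorial's reference implementation.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def sliding_windows(signal, length=128, step=64):
        """Yield fixed-length windows over a (n_samples, n_channels) array."""
        for start in range(0, len(signal) - length + 1, step):
            yield signal[start:start + length]

    def window_features(window):
        """Mean, standard deviation, min and max per channel."""
        return np.concatenate([window.mean(0), window.std(0),
                               window.min(0), window.max(0)])

    rng = np.random.default_rng(0)
    stream = rng.normal(size=(5000, 6))            # 6-axis IMU stream (synthetic)

    X = np.array([window_features(w) for w in sliding_windows(stream)])
    y = rng.integers(0, 4, size=len(X))            # placeholder activity labels per window
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    print("windows:", len(X), "training accuracy:", clf.score(X, y))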
Article
Full-text available
Utilization of computer tools in linguistic research has gained importance with the maturation of media frameworks for the handling of digital audio and video. The increased use of these tools in gesture, sign language and multimodal interaction studies has led to stronger requirements on the flexibility, the efficiency and in particular the time accuracy of annotation tools. This paper describes the efforts made to make ELAN a tool that meets these requirements, with special attention to the developments in the area of time accuracy. In subsequent sections an overview will be given of other enhancements in the latest versions of ELAN, that make it a useful tool in multimodality research.
Conference Paper
Full-text available
Anvil is a tool for the annotation of audiovisual material containing multimodal dialogue. Annotation takes place on freely definable, multiple layers (tracks) by inserting time-anchored elements that hold a number of typed attribute-value pairs. Higher-level elements (suprasegmental) consist of a sequence of elements. Attributes contain symbols or cross-level links to arbitrary other elements. Anvil is highly generic (usable with different annotation schemes), platform-independent, XML-based and fitted with an intuitive graphical user interface. For project integration, Anvil offers the import of speech transcription and export of text and table data for further statistical processing.
Conference Paper
Full-text available
Within the context of an endeavor to provide situated support for people with cognitive impairments in the kitchen, we developed and evaluated classifiers for recognizing 11 actions involved in food preparation. Data was collected from 20 lay subjects using four specially designed kitchen utensils incorporating embedded 3-axis accelerometers. Subjects were asked to prepare a mixed salad in our laboratory-based instrumented kitchen environment. Video of each subject's food preparation activities was independently annotated by three different coders. Several classifiers were trained and tested using features extracted from the accelerometer data. With an overall accuracy of 82.9%, our investigation demonstrated that a broad set of food preparation actions can be reliably recognized using sensors embedded in kitchen utensils.
Article
Full-text available
This paper describes hardware and software that can be used for the phonetic study of sign languages. The field of sign language phonetics is characterised, and the hardware that is currently in use is described. The paper focuses on the software that was developed to enable the recording of finger and hand movement data, and the additions to the ELAN annotation software that facilitate the further visualisation and analysis of the data.
Article
Full-text available
This paper considers scalable and unobtrusive activity recognition using on-body sensing for context-awareness in wearable computing. Common methods for activity recognition rely on supervised learning requiring substantial amounts of labeled training data. Obtaining accurate and detailed annotations of activities is challenging preventing the applicability of these approaches in real-world settings. This paper proposes new annotation strategies that substantially reduce the required amount of annotation. We explore two learning schemes for activity recognition that effectively leverage such sparsely labeled data together with more easily obtainable unlabeled data. Experimental results on two public datasets indicate that both approaches obtain results close to fully supervised techniques. The proposed methods are robust to the presence of erroneous labels occurring in real world annotation data.
Article
Andrew Ng has serious street cred in artificial intelligence. He pioneered the use of graphics processing units (GPUs) to train deep learning models in the late 2000s with his students at Stanford University, cofounded Google Brain in 2011, and then served for three years as chief scientist for Baidu, where he helped build the Chinese tech giant's AI group. So when he says he has identified the next big shift in artificial intelligence, people listen. And that's what he told IEEE Spectrum in an exclusive Q&A. • Ng's current efforts are focused on his company, Landing AI, which built a platform called LandingLens to help manufacturers improve visual inspection with computer vision. He has also become something of an evangelist for what he calls the data-centric AI movement, which he says can yield “small data” solutions to big issues in AI, including model efficiency, accuracy, and bias.
Article
Human activity recognition (HAR) is one of the most important and challenging problems in computer vision. It has critical applications in a wide variety of tasks, including gaming, human–robot interaction, rehabilitation, sports, health monitoring, video surveillance, and robotics. HAR is challenging due to the complex postures made by humans and interactions between multiple people. Various artefacts that commonly appear in the scene, such as illumination variations, clutter, occlusions, and background diversity, further add to the complexity of HAR. Sensors for multiple modalities could be used to overcome some of these inherent challenges. Such sensors include RGB-D cameras, infrared sensors, thermal cameras, inertial sensors, etc. This article presents a comprehensive review of different multimodal human activity recognition methods, covering the types of sensors used along with their analytical approaches and fusion methods. Further, this article presents a classification and discussion of existing work within seven rational aspects: (a) what are the applications of HAR; (b) what are the single- and multi-modality sensing options for HAR; (c) what are the different vision-based approaches for HAR; (d) what and how wearable sensor-based systems contribute to HAR; (e) what are the different multimodal HAR methods; (f) how a combination of vision and wearable inertial sensor-based systems contributes to HAR; and (g) challenges and future directions in HAR. With a more comprehensive understanding of multimodal human activity recognition, more research in this direction can be motivated and refined.
Article
Although the last two decades have seen an increasing number of activity recognition applications with wearable devices, there is still a lack of tools specifically designed to support their development. The development of activity recognition algorithms for wearable devices is particularly challenging because of the several requirements that have to be met simultaneously (e.g., low energy consumption, small and lightweight, accurate recognition). Activity recognition applications are usually developed in a series of iterations to annotate sensor data and to analyze, develop and assess the performance of a recognition algorithm. This paper presents the Wearables Development Toolkit, an Integrated Development Environment designed to lower the entrance barrier to the development of activity recognition applications with wearables. It specifically focuses on activity recognition using on-body inertial sensors. The toolkit offers a repository of high-level reusable components and a set of tools with functionality to annotate data, to analyze and develop activity recognition algorithms and to assess their recognition and computational performance. We demonstrate the versatility of the toolkit with three applications and describe how we developed it incrementally based on two user studies.
Chapter
In this paper, we explore the benefits of our next-generation annotation and analysis tool NOVA in the domain of psychotherapy. The NOVA tool has been developed, tested and applied in behaviour studies for several years, and psychotherapy sessions offer a great opportunity to expand its areas of application into a challenging yet promising field. In such scenarios, interactions with patients are often rated by questionnaires and the therapist's subjective rating, yet a qualitative analysis of the patient's non-verbal behaviours can only be performed in a limited way, as this is very expensive and time-consuming. A main aspect of NOVA is the possibility of applying semi-supervised active learning, where machine learning techniques are already used during the annotation process by giving the possibility to pre-label data automatically. Furthermore, NOVA provides therapists with a confidence value for the automatically predicted annotations. In this way, non-ML experts also come to understand whether they can trust their ML models for the problem at hand.
Article
A difficulty in human activity recognition (HAR) with wearable sensors is the acquisition of large amounts of annotated data for training models using supervised learning approaches. While collecting raw sensor data has been made easier with advances in mobile sensing and computing, the process of data annotation remains a time-consuming and onerous process. This paper explores active learning as a way to minimize the labor-intensive task of labeling data. We train models with active learning in both offline and online settings with data from 4 publicly available activity recognition datasets and show that it performs comparably to or better than supervised methods while using around 10% of the training data. Moreover, we introduce a method based on conditional mutual information for determining when to stop the active learning process while maximizing recognition performance. This is an important issue that arises in practice when applying active learning to unlabeled datasets.
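A rough sketch of pool-based active learning with uncertainty sampling and a stopping rule is given below; the paper's conditional-mutual-information criterion is not reproduced, and stopping when pool predictions stabilize is only a crude stand-in for it. The dataset, batch size, and thresholds are arbitrary example choices.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=800, n_features=16, n_classes=2, random_state=2)
    pool = np.arange(len(y))
    labelled = list(pool[:20])                      # small seed of labelled samples

    clf = LogisticRegression(max_iter=1000).fit(X[labelled], y[labelled])
    prev_pred = clf.predict(X)

    for round_ in range(30):
        unlab = np.setdiff1d(pool, labelled)
        conf = clf.predict_proba(X[unlab]).max(axis=1)
        query = unlab[np.argsort(conf)[:10]]        # 10 most uncertain samples
        labelled.extend(query.tolist())
        clf.fit(X[labelled], y[labelled])

        pred = clf.predict(X)
        changed = np.mean(pred != prev_pred)
        prev_pred = pred
        if changed < 0.01:                          # predictions have stabilized
            break

    print(f"stopped after {round_ + 1} rounds using {len(labelled)} labels")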
Article
Working dogs are significantly beneficial to society; however, a substantial number of dogs are released from time-consuming and expensive training programs because of unsuitability in behavior. Early prediction of successful service dog placement could save time, resources, and funding. Our research focus is to explore whether aspects of canine temperament can be detected from interactions with sensors, and to develop classifiers that correlate sensor data to predict the success (or failure) of assistance dogs in advanced training. In a 2-year longitudinal study, our team tested a cohort of dogs entering advanced training in the Canine Companions for Independence (CCI) Program with 2 instrumented dog toys: a silicone ball and a silicone tug sensor. We then create a logistic model tree classifier to predict service dog success using only 5 features derived from dog-toy interactions. During randomized 10-fold cross validation where 4 of the 40 dogs were kept in an independent test set for each fold, our classifier predicts the dogs' outcomes with 87.5% average accuracy. We assess the reliability of our model by performing the testing routine 10 times over 1.5 years for a single suitable working dog, which predicts that the dog would pass each time. We calculate the resource benefit of identifying dogs who will fail early in their training; the value for a cohort of 40 dogs using our toys and our methods for prediction is over $70,000. With CCI's 6 training centers, annual savings could be upwards of $5 million per year.
Article
Eye contact is a crucial element of non-verbal communication that signifies interest, attention, and participation in social interactions. As a result, measures of eye contact arise in a variety of applications such as the assessment of the social communication skills of children at risk for developmental disorders such as autism, or the analysis of turn-taking and social roles during group meetings. However, the automated measurement of visual attention during naturalistic social interactions is challenging due to the difficulty of estimating a subject’s looking direction from video. This paper proposes a novel approach to eye contact detection during adult-child social interactions in which the adult wears a point-of-view camera which captures an egocentric view of the child’s behavior. By analyzing the child’s face regions and inferring their head pose we can accurately identify the onset and duration of the child’s looks to their social partner’s eyes. We introduce the Pose-Implicit CNN, a novel deep learning architecture that predicts eye contact while implicitly estimating the head pose. We present a fully automated system for eye contact detection that solves the sub-problems of end-to-end feature learning and pose estimation using deep neural networks. To train our models, we use a dataset comprising 22 hours of 156 play session videos from over 100 children, half of whom are diagnosed with Autism Spectrum Disorder. We report an overall precision of 0.76, recall of 0.80, and an area under the precision-recall curve of 0.79, all of which are significant improvements over existing methods.
Article
Trail surface information is critical for preventing mountain accidents such as falls and slips. In this paper, we propose a new mobile crowdsensing system that automatically infers whether trail segments are risky to climb by using sensor data collected from multiple hikers' smartphones. We extract cyclic gait-based features from walking motion data to train machine learning models, and multiple hikers' results are then aggregated for robust classification. We evaluate our system with two real-world datasets. First, we collected data from 14 climbers for a mountain trail which includes 13 risky segments. The average accuracy of individuals is approximately 80%, but after clustering the results, our system can accurately identify all the risky segments. We then collected an additional dataset from five climbers on two different mountain trails, which have 10 risky segments in total. Our results show that the model trained on one trail can be used to accurately identify all the risky segments in the other trail, which demonstrates the generalizability of our system.
Conference Paper
Motivated by health applications, eating detection with off-the-shelf devices has been an active area of research. A common approach has been to recognize and model individual intake gestures with wrist-mounted inertial sensors. Despite promising results, this approach is limiting as it requires the sensing device to be worn on the hand performing the intake gesture, which cannot be guaranteed in practice. Through a study with 14 participants comparing eating detection performance when gestural data is recorded with a wrist-mounted device on (1) both hands, (2) only the dominant hand, and (3) only the non-dominant hand, we provide evidence that a larger set of arm and hand movement patterns beyond food intake gestures are predictive of eating activities when L1 or L2 normalization is applied to the data. Our results are supported by the theory of asymmetric bimanual action and contribute to the field of automated dietary monitoring. In particular, it shines light on a new direction for eating activity recognition with consumer wearables in realistic settings.
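The normalization step highlighted above can be sketched in a few lines; the feature matrix below is a synthetic placeholder for per-window features computed from wrist-worn inertial data, and the choice of L1 versus L2 would depend on the downstream classifier.

    import numpy as np
    from sklearn.preprocessing import normalize

    rng = np.random.default_rng(3)
    window_features = rng.normal(size=(200, 24))       # 200 windows x 24 features (assumed)

    features_l2 = normalize(window_features, norm="l2")   # unit Euclidean norm per window
    features_l1 = normalize(window_features, norm="l1")   # unit absolute sum per window
    print(np.linalg.norm(features_l2[0]), np.abs(features_l1[0]).sum())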
Conference Paper
We explore using the Outer Ear Interface (OEI) to recognize eating activities. OEI contains a 3D gyroscope and a set of proximity sensors encapsulated in an off-the-shelf earpiece to monitor jaw movement by measuring ear canal deformation. In a laboratory setting with 20 participants, OEI could distinguish eating from other activities, such as walking, talking, and silently reading, with over 90% accuracy (user independent). In a second study, six subjects wore the system for 6 hours each while performing their normal daily activities. OEI correctly classified five minute segments of time as eating or non-eating with 93% accuracy (user dependent).
Article
This paper investigates a new annotation technique that reduces significantly the amount of time to annotate training data for gesture recognition. Conventionally, the annotations comprise the start and end times, and the corresponding labels of gestures in sensor recordings. In this work, we propose a one-time point annotation in which labelers do not have to select the start and end time carefully, but just mark a one-time point within the time a gesture is happening. The technique gives more freedom and reduces significantly the burden for labelers. To make the one-time point annotations applicable, we propose a novel BoundarySearch algorithm to find automatically the correct temporal boundaries of gestures by discovering data patterns around their given one-time point annotations. The corrected annotations are then used to train gesture models. We evaluate the method on three applications from wearable gesture recognition with various gesture classes (10-17 classes) recorded with different sensor modalities. The results show that training on the corrected annotations can achieve performances close to a fully supervised training on clean annotations (lower by just up to 5% F1-score on average). Furthermore, the BoundarySearch algorithm is also evaluated on the ChaLearn 2014 multi-modal gesture recognition challenge recorded with Kinect sensors from computer vision and achieves similar results.
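As a hedged illustration of the one-time-point idea (a stand-in heuristic, not the authors' BoundarySearch algorithm), the sketch below grows a segment outward from an annotated time point while local signal energy stays above a baseline; the window size, threshold factor, and synthetic gesture are assumptions.

    import numpy as np

    def expand_boundaries(signal, point, win=25, factor=2.0):
        """signal: 1-D magnitude series; point: the annotated sample index."""
        baseline = np.median(np.abs(signal))

        def active(i):
            lo, hi = max(0, i - win // 2), min(len(signal), i + win // 2)
            return np.abs(signal[lo:hi]).mean() > factor * baseline

        start = end = point
        while start > 0 and active(start - 1):
            start -= 1
        while end < len(signal) - 1 and active(end + 1):
            end += 1
        return start, end

    rng = np.random.default_rng(4)
    sig = rng.normal(scale=0.1, size=1000)
    sig[400:520] += np.sin(np.linspace(0, 12 * np.pi, 120))   # synthetic gesture
    print(expand_boundaries(sig, point=450))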
Conference Paper
We experiment with using sensors and a machine learning algorithm to detect and label turns in alpine skiing. Previous work in this area involves data from more sensors, and turns are detected using either a physics-based model or a custom signal processing algorithm. We recorded accelerometer and gyroscope data using a single sensor placed on a skier's knee. Left and right turns in the data were labeled for use by a machine learner. Although skiing data proved to be difficult to label precisely, a classifier trained on 37 labeled examples correctly labeled all 13 examples from a different test data set, with 2 false positives. This method allows for the use of a single sensor and may be generalizable to other applications.
Conference Paper
In Human Activity Recognition (HAR) supervised and semi-supervised training are important tools for devising parametric activity models. For the best modelling performance, typically large amounts of annotated sample data are required. Annotating often represents the bottleneck in the overall modelling process as it usually involves retrospective analysis of experimental ground truth, like video footage. These approaches typically neglect that prospective users of HAR systems are themselves key sources of ground truth for their own activities. We therefore propose an Online Active Learning framework to collect user-provided annotations and to bootstrap personalized human activity models. We evaluate our framework on existing benchmark datasets and demonstrate how it outperforms standard, more naive annotation methods. Furthermore, we enact a user study where participants provide annotations using a mobile app that implements our framework. We show that Online Active Learning is a viable method to bootstrap personalized models, especially in live situations without expert supervision.
Article
Behavioral researchers spend a considerable amount of time coding video data to systematically extract meaning from subtle human actions and emotions. In this paper, we present Glance, a tool that allows researchers to rapidly query, sample, and analyze large video datasets for behavioral events that are hard to detect automatically. Glance takes advantage of the parallelism available in paid online crowds to interpret natural language queries and then aggregates responses in a summary view of the video data. Glance provides analysts with rapid responses when initially exploring a dataset, and reliable codings when refining an analysis. Our experiments show that Glance can code nearly 50 minutes of video in 5 minutes by recruiting over 60 workers simultaneously, and can get initial feedback to analysts in under 10 seconds for most clips. We present and compare new methods for accurately aggregating the input of multiple workers marking the spans of events in video data, and for measuring the quality of their coding in real time, before a baseline is established, by measuring the variance between workers. Glance's rapid responses to natural language queries, feedback regarding question ambiguity and anomalies in the data, and ability to build on prior context in follow-up queries allow users to have a conversation-like interaction with their data, opening up new possibilities for naturally exploring video data.
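A basic majority-vote aggregation of event spans marked by several workers, illustrating the aggregation problem discussed above though not Glance's specific aggregation methods, could look like this; the 1-second timeline resolution, vote threshold, and example spans are assumptions.

    import numpy as np

    def aggregate_spans(worker_spans, duration, min_votes):
        """worker_spans: one list of (start_s, end_s) spans per worker."""
        timeline = np.zeros(int(duration), dtype=int)          # votes per second
        for spans in worker_spans:
            for start, end in spans:
                timeline[int(start):int(end)] += 1

        merged, in_event = [], False
        for t, votes in enumerate(timeline):
            if votes >= min_votes and not in_event:
                in_event, seg_start = True, t
            elif votes < min_votes and in_event:
                in_event = False
                merged.append((seg_start, t))
        if in_event:
            merged.append((seg_start, len(timeline)))
        return merged

    workers = [[(10, 25), (40, 55)], [(12, 27)], [(11, 24), (41, 50)]]
    print(aggregate_spans(workers, duration=60, min_votes=2))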
Conference Paper
We present a real-time fall detection and activity recognition system (FDAR) that can be easily deployed using Wii Remotes worn on the human body. Features extracted from continuous accelerometer data streams are used for training pattern recognition models, and the models are then used for detecting falls and recognizing 14 fine-grained activities, including unknown activities, in real time. An experiment on 12 subjects was conducted to rigorously evaluate the system performance. With recognition rates as high as 91% precision and recall for 10-fold cross-validation and as high as 82% precision and recall for leave-one-subject-out evaluation, the results demonstrate that the development of real-time fall detection and activity recognition systems using low-cost sensors is feasible.
Article
The prime object of this book is to put into the hands of research workers, and especially of biologists, the means of applying statistical tests accurately to numerical data accumulated in their own laboratories or available in the literature.
Article
Preliminary investigations on accelerometer-based activity recognition in construction have shown that it has good potential to be utilized for recognizing categories of work in a construction trade. Selecting the accelerometer locations is an important consideration in activity recognition studies, but it is currently decided primarily on the basis of comfort requirements. This article proposes a methodology for selecting the location of accelerometers using video annotations and decision trees. A video annotation tool is used to track the movement of body segments, and a decision tree algorithm helps to prioritize the relevant body segments for classifying activities. A two-phase experimental study was conducted to assess the methodology. In the first phase, video annotation studies were carried out on four bricklayers, and based on decision tree analysis three locations were selected: right lower arm, left lower arm, and waist. In the second phase, an activity recognition study was conducted on another group of bricklayers with accelerometers attached at the selected locations. The results of the study show that the location of the accelerometer has a significant influence on accuracy and that the proposed methodology is effective in selecting accelerometer locations. In the current study only the bricklaying activity was considered; however, the methodology is generic and has the potential to be applied to objectively evaluate accelerometer placement locations for a wide range of structured activities.
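The segment-ranking idea can be sketched by training a decision tree on features derived from video-annotated body-segment movements and ranking segments by feature importance; the segment names, synthetic movement data, and tree settings below are assumptions, not the article's dataset or exact procedure.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    segments = ["right_lower_arm", "left_lower_arm", "waist", "right_leg", "head"]
    rng = np.random.default_rng(5)
    X = rng.normal(size=(300, len(segments)))     # movement intensity per segment (synthetic)
    y = (1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.8 * X[:, 2]
         + rng.normal(size=300) > 0).astype(int)  # activity class driven by three segments

    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
    ranked = sorted(zip(segments, tree.feature_importances_), key=lambda p: -p[1])
    for name, importance in ranked:
        print(f"{name:16s} {importance:.2f}")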
Conference Paper
Systems that automatically recognize human activities offer the potential of timely, task-relevant information and support. For example, prompting systems can help keep people with cognitive disabilities on track and surveillance systems can warn of activities of concern. Current automatic systems are difficult to deploy because they cannot identify novel activities, and, instead, must be trained in advance to recognize important activities. Identifying and labeling these events is time consuming and thus not suitable for real-time support of already-deployed activity recognition systems. In this paper, we introduce Legion:AR, a system that provides robust, deployable activity recognition by supplementing existing recognition systems with on-demand, real-time activity identification using input from the crowd. Legion:AR uses activity labels collected from crowd workers to train an automatic activity recognition system online to automatically recognize future occurrences. To enable the crowd to keep up with real-time activities, Legion:AR intelligently merges input from multiple workers into a single ordered label set. We validate Legion:AR across multiple domains and crowds and discuss features that allow appropriate privacy and accuracy tradeoffs.
Conference Paper
Sensor-enabled smartphones are opening a new frontier in the development of mobile sensing applications. The recognition of human activities and context from sensor-data using classification models underpins these emerging applications. However, conventional approaches to training classifiers struggle to cope with the diverse user populations routinely found in large-scale popular mobile applications. Differences between users (e.g., age, sex, behavioral patterns, lifestyle) confuse classifiers, which assume everyone is the same. To address this, we propose Community Similarity Networks (CSN), which incorporates inter-person similarity measurements into the classifier training process. Under CSN every user has a unique classifier that is tuned to their own characteristics. CSN exploits crowd-sourced sensor-data to personalize classifiers with data contributed from other similar users. This process is guided by similarity networks that measure different dimensions of inter-person similarity. Our experiments show CSN outperforms existing approaches to classifier training under the presence of population diversity.
Conference Paper
We present ChronoViz, a system to aid annotation, visualization, navigation, and analysis of multimodal time-coded data. Exploiting interactive paper technology, ChronoViz also integrates researchers' paper notes into the composite data set. Researchers can navigate data in multiple ways, taking advantage of synchronized visualizations and annotations. The goal is to decrease the time and effort required to analyze multimodal data by providing direct indexing and flexible mechanisms to control data exploration.
Conference Paper
The eWatch is a wearable sensing, notification, and computing platform built into a wrist watch form factor making it highly available, instantly viewable, ideally located for sensors, and unobtrusive to users. Bluetooth communication provides a wireless link to a cellular phone or stationary computer. eWatch senses light, motion, audio, and temperature and provides visual, audio, and tactile notification. The system provides ample processing capabilities with multiple day battery life enabling realistic user studies. This paper provides the motivation for developing a wearable computing platform, a description of the power aware hardware and software architectures, and results showing how online nearest neighbor classification can identify and recognize a set of frequently visited locations.
Article
Context-aware applications are applications that implicitly take their context of use into account by adapting to changes in a user's activities and environments. No one has more intimate knowledge about these activities and environments than end-users themselves. Currently there is no support for end-users to build context-aware applications for these dynamic settings. To address this issue, we present a CAPpella, a programming-by-demonstration Context-Aware Prototyping environment intended for end-users. Users "program" their desired context-aware behavior (situation and associated action) in situ, without writing any code, by demonstrating it to a CAPpella and by annotating the relevant portions of the demonstration. Using a meeting and medicine-taking scenario, we illustrate how a user can demonstrate different behaviors to a CAPpella. We describe a CAPpella's underlying system to explain how it supports users in building behaviors and present a study of 14 end-users to illustrate its feasibility and usability.
ANOVA: Repeated measures (No. 84). Sage
  • E R Girden
The Wearables Development Toolkit (WDK)
  • J Haladjian
Image and video ground truth labeling
  • Mathworks
Activity recognition and monitoring using multiple sensors on different body positions
  • U Maurer
  • A Smailagic
  • D Siewiorek
  • M Deisher
Regression-based, mistake-driven movement skill estimation in Nordic walking using wearable inertial sensors
  • A Derungs
  • S Soller
  • A Wesihaupl
  • J Bleuel
  • G Berschin
  • O Amft