FIG 2 - available via license: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
DeepLabCut workflow. The diagram delineates the workflow as well as the directory and file structures. Each step of the workflow is color-coded to indicate where its output is stored. The main steps are opening a Python session, importing deeplabcut, creating a project, selecting frames, labeling frames, and then training a network. Once trained, this network can be used to apply labels to new videos, or the network can be refined if needed. The process is supported by interactive GUIs at several key steps, and all commands can be run from a simple terminal interface.
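A minimal sketch of this workflow using the DeepLabCut Python API (the function names follow the published package; the project name, experimenter, and all file paths are placeholders):

```python
# Sketch of the Fig 2 workflow with the DeepLabCut Python API.
# Project name, experimenter, and all file paths are placeholders.
import deeplabcut

# Create a project from a project name, a username, and some initial videos
# (Steps A-B); this returns the path to the project's config.yaml.
config_path = deeplabcut.create_new_project(
    "reaching-task", "experimenter",
    ["/path/to/video1.avi", "/path/to/video2.avi"],
    copy_videos=True)

# Select frames to label, then label them in the interactive GUI.
deeplabcut.extract_frames(config_path)
deeplabcut.label_frames(config_path)

# Build the training dataset, train the network, and evaluate it.
deeplabcut.create_training_dataset(config_path)
deeplabcut.train_network(config_path)
deeplabcut.evaluate_network(config_path)

# Apply the trained network to new videos and write labeled copies.
new_videos = ["/path/to/new_session.avi"]
deeplabcut.analyze_videos(config_path, new_videos)
deeplabcut.create_labeled_video(config_path, new_videos)
```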
Source publication
Noninvasive behavioral tracking of animals during experiments is crucial to many scientific pursuits. Extracting the poses of animals without using markers is often essential for measuring behavioral effects in biomechanics, genetics, ethology & neuroscience. Yet, extracting detailed poses without markers in dynamically changing backgrounds has bee...
Contexts in source publication
Context 1
... is organized according to the following workflow (Fig 2). The user starts by creating a new project based on a project name and username, as well as some (initial) videos, which are required to create the training dataset (Steps A-B). ...
Context 2
... (numerically) comprises an acceptable MAE depends on many factors (including the size of the tracked body parts, the labeling variability, etc.). Note that the test error can also be larger than the training error due to human variability (in labeling, see Figure 2 in [12]). ...
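The MAE discussed here is essentially a mean Euclidean distance, in pixels, between the network's predictions and the human labels. A minimal sketch of such a computation, with synthetic arrays whose shapes and values are purely illustrative:

```python
import numpy as np

def mean_pixel_error(pred_xy, true_xy):
    """Mean Euclidean distance (pixels) between predicted and human-labeled
    keypoints; both arrays have shape (n_frames, n_bodyparts, 2)."""
    return float(np.nanmean(np.linalg.norm(pred_xy - true_xy, axis=-1)))

# Synthetic labels and predictions standing in for a real train/test split.
rng = np.random.default_rng(0)
true_xy = rng.uniform(0, 640, size=(100, 4, 2))
pred_xy = true_xy + rng.normal(0, 3, size=true_xy.shape)

train_mae = mean_pixel_error(pred_xy[:80], true_xy[:80])
test_mae = mean_pixel_error(pred_xy[80:], true_xy[80:])
print(f"train MAE: {train_mae:.2f} px, test MAE: {test_mae:.2f} px")
```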
Similar publications
Machine learning (ML) and Artificial Intelligence (AI) are increasingly used in energy and engineering systems, but these models must be fair, unbiased, and explainable. In other words, it is crucial to have confidence in AI's trustworthiness. Machine learning techniques, like neural networks, have helped predict important parameters and improve mo...
We introduce CircuitQ, an open-source toolbox for the analysis of superconducting circuits implemented in Python. It features the automated construction of a symbolic Hamiltonian of the input circuit, as well as a dynamic numerical representation of this Hamiltonian with a variable basis choice. Additional features include the estimation of the T1...
Interactive multiobjective optimization methods incorporate preferences from a human decision maker in the optimization process iteratively. This allows the decision maker to focus on a subset of solutions, learn about the underlying trade-offs among the conflicting objective functions in the problem and adjust preferences during the solution proce...
Third-party resources ($e.g.$, samples, backbones, and pre-trained models) are usually involved in the training of deep neural networks, which brings backdoor threats. To facilitate the research and development of more secure training schemes, we propose a Python toolbox that implements representative and advanced backdoor attacks and defenses unde...
We present an introduction to the Quantum Toolbox in Python (QuTiP) in the context of an undergraduate quantum mechanics class and potential senior research projects. QuTiP provides ready-to-use definitions of standard quantum states and operators as well as numerous dynamic solvers and tools for visualization. The quantum systems described here ar...
Citations
... Both of them achieve relatively high accuracy and partial detection of keypoints, but false detections also occur under challenging illumination. The DeepLabCut (DLC) algorithm [23] has been used for pose estimation of experimental animals (e.g., mice and fruit flies) in high-definition videos, owing to its robust models and small-sample labeling [24][25][26]. The multi-target tracking accuracy of DLC can exceed 95%, and its running efficiency is also improved. ...
... Animal Pose Estimation from 2D to 3D. Several approaches exist that estimate keypoints in 3D either by computing them from extracted 2D keypoints (Hu et al., 2021; Joska et al., 2021; Kearney et al., 2020; Martinez et al., 2017; Nath et al., 2018; Zhang et al., 2021; Tome et al., 2017) or by inferring them directly from 2D images or videos using volumetric convolutional networks (Iskakov et al., 2019; Mehta et al., 2017; Pavlakos et al., 2017). Our method falls into the first category. ...
Accurate tracking of the 3D pose of animals from video recordings is critical for many behavioral studies, yet there is a dearth of publicly available datasets that the computer vision community could use for model development. We here introduce the Rodent3D dataset that records animals exploring their environment and/or interacting with each other with multiple cameras and modalities (RGB, depth, thermal infrared). Rodent3D consists of 200 min of multimodal video recordings from up to three thermal and three RGB-D synchronized cameras (approximately 4 million frames). For the task of optimizing estimates of pose sequences provided by existing pose estimation methods, we provide a baseline model called OptiPose. While deep-learned attention mechanisms have been used for pose estimation in the past, with OptiPose, we propose a different way by representing 3D poses as tokens for which deep-learned context models pay attention to both spatial and temporal keypoint patterns. Our experiments show how OptiPose is highly robust to noise and occlusion and can be used to optimize pose sequences provided by state-of-the-art models for animal pose estimation.
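For the first category mentioned in the context above (computing 3D keypoints from extracted 2D keypoints), a minimal two-view linear triangulation (DLT) sketch is given below; the projection matrices and the detected point are synthetic placeholders, not values from any of the cited works.

```python
import numpy as np

def triangulate(P1, P2, xy1, xy2):
    """Linear (DLT) triangulation of one 3D point from its 2D projections in
    two calibrated cameras with 3x4 projection matrices P1 and P2."""
    A = np.vstack([
        xy1[0] * P1[2] - P1[0],
        xy1[1] * P1[2] - P1[1],
        xy2[0] * P2[2] - P2[0],
        xy2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)        # least-squares solution of A X = 0
    X = Vt[-1]
    return X[:3] / X[3]                # de-homogenize

# Synthetic cameras: a reference camera and one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it by triangulation.
point = np.array([0.1, -0.05, 2.0, 1.0])
xy1 = (P1 @ point)[:2] / (P1 @ point)[2]
xy2 = (P2 @ point)[:2] / (P2 @ point)[2]
print(triangulate(P1, P2, xy1, xy2))   # ~[0.1, -0.05, 2.0]
```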
... The purpose of this paper is to summarise popular markerless approaches for estimating joint angles, highlighting their strengths and limitations. I focus mainly on 2D applications, since the use of pose estimation for markerless 3D joint angle prediction is still in its infancy (see Nakano et al., 2020; Nath et al., 2019). ...
... Some of these methods even allow videos to be processed in real time (Cao et al., 2017; Kane et al., 2020). One algorithm that has received particular attention is DeepLabCut (Mathis et al., 2018), which was initially designed for tracking animal behaviour but can also be used to track human movement in 2D or 3D (Nath et al., 2019). These and many other recent studies have demonstrated the potential value of markerless neural network approaches in the field of human movement science (see also Tome et al., 2018). ...
Kinematic analysis is often performed in a lab using optical cameras combined with reflective markers. With the advent of artificial intelligence techniques such as deep neural networks, it is now possible to perform such analyses without markers, making outdoor applications feasible. In this paper I summarise 2D markerless approaches for estimating joint angles, highlighting their strengths and limitations. In computer science, so-called “pose estimation” algorithms have existed for many years. These methods involve training a neural network to detect features (e.g. anatomical landmarks) using a process called supervised learning, which requires “training” images to be manually annotated. Manual labelling has several limitations, including labeller subjectivity, the requirement for anatomical knowledge, and issues related to training data quality and quantity. Neural networks typically require thousands of training examples before they can make accurate predictions, so training datasets are usually labelled by multiple people, each of whom has their own biases, which ultimately affects neural network performance. A recent approach, called transfer learning, involves modifying a model trained to perform a certain task so that it retains some learned features and is then re-trained to perform a new task. This can drastically reduce the required number of training images. Although development is ongoing, existing markerless systems may already be accurate enough for some applications, e.g. coaching or rehabilitation. Accuracy may be further improved by leveraging novel approaches and incorporating realistic physiological constraints, ultimately resulting in low-cost markerless systems that could be deployed both in and outside of the lab.
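As an illustration of the 2D joint-angle estimation these approaches are used for, here is a minimal sketch computing the angle at a joint from three tracked keypoints; the pixel coordinates are hypothetical rather than taken from any cited study.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by 2D points a-b-c,
    e.g. hip-knee-ankle for a sagittal-plane knee angle."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos_ang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0))))

# Hypothetical pixel coordinates from a pose-estimation model.
hip, knee, ankle = (320, 200), (330, 300), (310, 395)
print(f"knee angle: {joint_angle(hip, knee, ankle):.1f} deg")
```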
... We also trained neural networks to analyze string-pulling videos to detect the ears, nose, and hands of Black mice. We used the Python-based framework of the DeepLabCut toolbox (Nath et al., 2018) to train the ResNet50 network to identify ears, nose, and hands (He et al., 2016). Four separate networks were trained: three for the individual recognition of the ears, nose, and hands, and one for the combined recognition of all three in a single step. ...
String-pulling by rodents is a behavior in which animals make rhythmical body, head, and bilateral forearm as well as skilled hand movements to spontaneously reel in a string. Typical analysis includes kinematic assessment of hand movements done by manually annotating frames. Here, we describe a Matlab-based software that allows whole-body motion characterization using optical flow estimation, descriptive statistics, principal component, and independent component analyses as well as temporal measures of Fano factor, entropy, and Higuchi fractal dimension. Based on image-segmentation and heuristic algorithms for object tracking, the software also allows tracking of body, ears, nose, and forehands for estimation of kinematic parameters such as body length, body angle, head roll, head yaw, head pitch, and path and speed of hand movements. The utility of the task and software is demonstrated by characterizing postural and hand kinematic differences in string-pulling behavior of two strains of mice, C57BL/6 and Swiss Webster.
... DeepLabCut using the ResNet-50 neural network was trained on the annotated images for 1,030,000 iterations, then used to track the locations of the nose and digits in the full set of video segments. Frames with poor tracking were visually identified by manual inspection of the videos or L and D traces (see below), corrected using DeepLabCut's refinement GUI [43], and the model retrained until satisfactory tracking results were obtained. Sections of tracking that remained poor after refinement (i.e., exhibited large single-frame jumps or jitter) were excluded from analysis. ...
The small first digit (D1) of the mouse’s hand resembles a volar pad, but its thumb-like anatomy suggests ethological importance for manipulating small objects. To explore this possibility, we recorded high-speed close-up video of mice eating seeds and other food items. Analyses of ethograms and automated tracking with DeepLabCut revealed multiple distinct microstructural features of food-handling. First, we found that mice indeed made extensive use of D1 for dexterous manipulations. In particular, mice used D1 to hold food with either of two grip types: a pincer-type grasp, or a “thumb-hold” grip, pressing with D1 from the side. Thumb-holding was preferentially used for handling smaller items, with the smallest items held between the two D1s alone. Second, we observed that mice cycled rapidly between two postural modes while feeding, with the hands positioned either at the mouth (oromanual phase) or resting below (holding phase). Third, we identified two highly stereotyped D1-related movements during feeding, including an extraordinarily fast (~20 ms) “regrip” maneuver, and a fast (~100 ms) “sniff” maneuver. Lastly, in addition to these characteristic simpler movements and postures, we also observed highly complex movements, including rapid D1-assisted rotations of food items and dexterous simultaneous double-gripping of two food fragments. Manipulation behaviors were generally conserved for different food types, and for head-fixed mice. Wild squirrels displayed a similar repertoire of D1-related movements. Our results define, for the mouse, a set of kinematic building-blocks of manual dexterity, and reveal an outsized role for D1 in these actions.
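The refine-and-retrain loop described in the tracking context above (outlier extraction, correction in the refinement GUI, retraining) could look roughly as follows with the DeepLabCut API; the config and video paths are placeholders, and in practice the loop is repeated until tracking is judged satisfactory.

```python
import deeplabcut

config_path = "/path/to/project/config.yaml"   # placeholder project config
videos = ["/path/to/session1.avi"]             # placeholder video list

# Pull out frames where tracking looks unreliable, correct them by hand in
# the interactive refinement GUI, and merge the corrections into the dataset.
deeplabcut.extract_outlier_frames(config_path, videos)
deeplabcut.refine_labels(config_path)
deeplabcut.merge_datasets(config_path)

# Retrain on the augmented dataset and re-analyze the videos.
deeplabcut.create_training_dataset(config_path)
deeplabcut.train_network(config_path)
deeplabcut.analyze_videos(config_path, videos)
```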
... The method only requires a small amount of manual labelling of image frames, and in the best case, this training process only needs to be performed once. The successfully trained network can then be used to label new videos quickly (45 s for a 10 s video on a standard CPU), and near real-time tracking is also possible with GPU support (Nath et al., 2018). Given the challenges associated with imaging deep water running, it is likely that this approach could easily be modified to analyse kinematics in other human movements and measurement settings, simply by re-training the network using a suitable dataset. ...
... Given the challenges associated with imaging deep water running, it is likely that this approach could easily be modified to analyse kinematics in other human movements and measurement settings, simply by re-training the network using a suitable dataset. Moreover, using additional cameras, this approach could be used to perform 3D analyses (Nath et al., 2018). As stated by Colyer et al. (2018), the development of methods aided by artificial intelligence could revolutionise sports biomechanics and rehabilitation by broadening the applications of motion analysis to training or even competition environments. ...
Kinematic analysis is often performed with a camera system combined with reflective markers placed over bony landmarks. This method is restrictive (and often expensive), and limits the ability to perform analyses outside of the lab. In the present study, we used a markerless deep learning-based method to perform 2D kinematic analysis of deep water running, a task that poses several challenges to image processing methods. A single GoPro camera recorded sagittal plane lower limb motion. A deep neural network was trained using data from 17 individuals, and then used to predict the locations of markers that approximated joint centres. We found that 300–400 labelled images were sufficient to train the network to be able to position joint markers with an accuracy similar to that of a human labeler (mean difference < 3 pixels, around 1 cm). This level of accuracy is sufficient for many 2D applications, such as sports biomechanics, coaching/training, and rehabilitation. The method was sensitive enough to differentiate between closely-spaced running cadences (45–85 strides per minute in increments of 5). We also found high test–retest reliability of mean stride data, with between-session correlation coefficients of 0.90–0.97. Our approach represents a low-cost, adaptable solution for kinematic analysis, and could easily be modified for use in other movements and settings. Using additional cameras, this approach could also be used to perform 3D analyses. The method presented here may have broad applications in different fields, for example by enabling markerless motion analysis to be performed during rehabilitation, training or even competition environments.
A major challenge in human stroke research is interpatient variability in the extent of sensorimotor deficits and determining the time course of recovery following stroke. Although the relationship between the extent of the lesion and the degree of sensorimotor deficits is well established, the factors determining the speed of recovery remain uncertain. To test these experimentally, we created a cortical lesion over the motor cortex using a reproducible approach in four common marmosets, and characterized the time course of recovery by systematically applying several behavioral tests before and up to 8 weeks after creation of the lesion. Evaluation of in-cage behavior and reach-to-grasp movement revealed consistent motor impairments across the animals. In particular, performance in reaching and grasping movements continued to deteriorate until 4 weeks after creation of the lesion. We also found consistent time courses of recovery across animals for in-cage and grasping movements. For example, in all animals, the score for in-cage behaviors showed full recovery at 3 weeks after creation of the lesion, and the performance of grasping movement partially recovered from 4 to 8 weeks. In addition, we observed longer time courses of recovery for reaching movement, which may rely more on cortically initiated control in this species. These results suggest that different recovery speeds for each movement could be influenced by what extent the cortical control is required to properly execute each movement.
Eye tracking and other behavioral measurements collected from patient-participants in their hospital rooms afford a unique opportunity to study natural behavior for basic and clinical translational research. We describe an immersive social and behavioral paradigm implemented in patients undergoing evaluation for surgical treatment of epilepsy, with electrodes implanted in the brain to determine the source of their seizures. Our studies entail collecting eye tracking with other behavioral and psychophysiological measurements from patient-participants during unscripted behavior, including social interactions with clinical staff, friends, and family in the hospital room. This approach affords a unique opportunity to study the neurobiology of natural social behavior, though it requires carefully addressing distinct logistical, technical, and ethical challenges. Collecting neurophysiological data synchronized to behavioral and psychophysiological measures helps us to study the relationship between behavior and physiology. Combining across these rich data sources while participants eat, read, converse with friends and family, etc., enables clinical-translational research aimed at understanding the participants' disorders and clinician-patient interactions, as well as basic research into natural, real-world behavior. We discuss data acquisition, quality control, annotation, and analysis pipelines that are required for our studies. We also discuss the clinical, logistical, and ethical and privacy considerations critical to working in the hospital setting.
The body measurement of livestock is an important task in precision livestock farming. To reduce the cost of manual measurement, an increasing number of studies have proposed non-contact body measurement methods using depth cameras. However, these methods only use 3D data to construct geometric features for body measurements, which is prone to error on incomplete and noisy point clouds. This paper introduces a 2D-3D fusion body measurement method, developed in order to exploit the potential of raw scanned data including high-resolution RGB images and 3D spatial information. The keypoints for body measurement are detected on RGB images with a deep learning model. Then these keypoints are projected onto the surface of livestock point clouds by utilizing the intrinsic parameters of the camera. Combining the process of interpolation and the pose normalization method, 9 body measurements of cattle and 5 body measurements of pig (including body lengths, body widths, body heights, and heart girth) are measured. To verify the feasibility of this method, the experiments are performed on 103 cattle data and 13 pig data. Compared with manual measurements, the MAPEs (mean absolute percentage errors) of 5 cattle body measurements and 1 pig body measurement are reduced to less than 10%. Body widths are more susceptible to non-standard posture. The MAPEs of 2 cattle body widths are larger than 20% and the MAPE of 1 pig body width reaches 30%. In comparison with a previous girth measurement method, the presented method is more accurate and robust for the cattle dataset. The same approach can be adapted and implemented for non-contact body measurement for different livestock species.
Keywords: Precision livestock farming; 2D-3D fusion; Deep learning; Point cloud;
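The 2D-3D projection step described above rests on the pinhole camera model; a minimal back-projection sketch is given below, with hypothetical intrinsics and depth values rather than parameters from the cited work.

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a known depth (metres) into a 3D point
    in the camera frame, using pinhole intrinsics fx, fy, cx, cy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Hypothetical depth-camera intrinsics and one detected keypoint with depth.
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0
keypoint_uv, depth_m = (412, 255), 1.8
print(backproject(*keypoint_uv, depth_m, fx, fy, cx, cy))
```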