Science topic

Activity Recognition - Science topic

Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents' actions and the environmental conditions.
Questions related to Activity Recognition
  • asked a question related to Activity Recognition
Question
7 answers
The project I'm currently working on aims to create a deep learning model for Human Activity Recognition. I'm focusing on system design and implementation. Could someone please help me by sharing some papers or document links to better understand system design and implementation?
Thank you in advance for your assistance.
Relevant answer
Answer
System design here means the design of the model: how you train it, validate it, and then test it. It also includes the data preprocessing steps and feature engineering.
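As a concrete illustration of those stages, here is a minimal sketch in Python with NumPy; the window length, step size, features, and split ratios are all assumptions for illustration, not from any specific paper:

```python
import numpy as np

# Hypothetical end-to-end HAR sketch: windowing -> feature engineering ->
# chronological train/validation/test split. All parameters are assumptions.
rng = np.random.default_rng(0)
raw = rng.normal(size=(1000, 3))             # fake tri-axial accelerometer stream

def make_windows(signal, width=128, step=64):
    """Segment a (T, channels) stream into overlapping windows."""
    return np.stack([signal[i:i + width]
                     for i in range(0, len(signal) - width + 1, step)])

def extract_features(windows):
    """Simple hand-crafted features per window: mean and std per axis."""
    return np.hstack([windows.mean(axis=1), windows.std(axis=1)])

windows = make_windows(raw)                  # (n_windows, 128, 3)
X = extract_features(windows)                # (n_windows, 6)

# Chronological 60/20/20 split keeps test data strictly after training data
n = len(X)
train, val, test = X[:int(.6 * n)], X[int(.6 * n):int(.8 * n)], X[int(.8 * n):]
```

A real pipeline would attach activity labels per window and feed `train`/`val` into whatever classifier the design calls for.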
  • asked a question related to Activity Recognition
Question
2 answers
PKU-MMD and NTU RGB+D are large skeleton datasets widely used in human action recognition. Plenty of code is available to process the NTU RGB+D dataset, but I cannot find any code to process the PKU-MMD dataset and convert it to the same format as NTU RGB+D. If anybody knows the preprocessing steps and code to preprocess PKU-MMD, please share them.
Thank you.
Relevant answer
Answer
The PKU-MMD dataset is a large skeleton-based action detection dataset. It contains 1,076 long untrimmed video sequences performed by 66 subjects in three camera views. 51 action categories are annotated, resulting in almost 20,000 action instances and 5.4 million frames in total. Similar to NTU RGB+D, there are also two recommended evaluation protocols, i.e. cross-subject and cross-view.
Regards,
Shafagat
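If it helps, here is a minimal parsing sketch for the skeleton files, assuming the commonly described PKU-MMD layout of one frame per line with 150 floats (2 bodies x 25 joints x (x, y, z)); please verify this layout against the dataset's own documentation before relying on it:

```python
import numpy as np

def load_pku_mmd_skeleton(lines):
    """Parse PKU-MMD skeleton text. Assumption: one frame per line,
    150 floats = 2 bodies x 25 joints x (x, y, z)."""
    frames = np.array([[float(v) for v in line.split()] for line in lines])
    T = frames.shape[0]
    # Reshape toward the (bodies, joints, xyz) layout NTU-style pipelines expect
    return frames.reshape(T, 2, 25, 3)

# Toy input: 4 frames of zeros standing in for a real skeleton file
demo = [" ".join(["0.0"] * 150)] * 4
skel = load_pku_mmd_skeleton(demo)
print(skel.shape)  # (4, 2, 25, 3)
```

From this array, frame-count normalization or coordinate alignment can be applied to match a given NTU RGB+D preprocessing pipeline.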
  • asked a question related to Activity Recognition
Question
4 answers
Hi community,
I created a model for HAR (human activity recognition) and ran it on this dataset. Now I want to try it on another dataset with a similar format.
Any help?
Relevant answer
Answer
Datasets I found:
WISDM
PAMAP2
UniMiB SHAR
USC-HAD
  • asked a question related to Activity Recognition
Question
5 answers
  1. I have used the HAR (human activity recognition) dataset.
  2. What is the impact on activity recognition if the dataset has a small number of features?
Relevant answer
Answer
You may encounter over-fitting, which may impede accurate recognition of unseen data.
  • asked a question related to Activity Recognition
Question
2 answers
  1. I have worked on the UCI HAR and HAPT datasets for human activity recognition, which have 561 features.
  2. But with a huge dataset in a low-dimensional space (2 or 3 dimensions), can we still do activity recognition, and what will be the impact?
Relevant answer
  • asked a question related to Activity Recognition
Question
5 answers
This was first published almost 10 years ago.
Now almost a decade later, has there been any progress? Chronic workplace inactivity has been a pandemic in developed societies for much longer than a decade. The healthcare and productivity costs of workplace inactivity are all increasingly well documented. Unfortunately, this sentence from 2012 probably still applies: "Employers often provide break time and specific areas for smoking, yet to do this for exercise may be considered distracting, counterproductive, and/or too expensive."
Thank you for considering this discussion.
Relevant answer
Answer
Donald E. Watenpaugh, such a nice concern for working communities and individuals. Everyone has heard about obesity, but sitting for long periods can also lead to muscle atrophy, as observed in long-term bed-rest patients and astronauts. While sitting, the anti-gravity bones and muscles relax and the influence of gravity decreases, a phenomenon similar to what is observed in space. According to the literature, an inactive lifestyle can decrease bone density by 10% over 3-6 months. Sitting 6-9 hours a day is playing havoc with workers' lives. Governments or companies should provide compulsory walking or stretching sessions.
  • asked a question related to Activity Recognition
Question
7 answers
I'm working with an accelerometer-based dataset capturing accelerations from the human thigh. Most HAR pipelines apply window segmentation to break the signal into samples, and pre-processing of the signals (i.e. filtering) prior to feature extraction. By applying both processes to the same signal from my dataset and subtracting the difference, I've found that the resulting output signal differs depending on the order in which these two operations are carried out.
Which of these processes should be carried out first? In the literature, I have seen several instances of either approach being taken.
Relevant answer
Answer
I had to perform segmentation of human speech for my PhD. In my experience, you're better off keeping your original signal/data and segmenting it prior to other processing: your decision will be cleaner, and reversible at the same time.
My own real-time segmentation was based on the detection of biological events (glottal closures), comparing two sliding models, one short and one long, using a cumulative sum of Kullback-Leibler divergence estimates and looking for convexity jumps in this criterion (D2f(t) = f(t+1) - 2f(t) + f(t-1) estimates the second-order derivative of the signal f).
Get to know the semantics of what you are looking for, and the models will fall into place smoothly.
Hope this helps.
Ref: my IEEE journal article on real-time speech segmentation and coding, posted on ResearchGate.
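The second-order difference estimate mentioned in that answer can be computed directly; a small NumPy sketch:

```python
import numpy as np

def second_difference(f):
    """Second-order difference D2f(t) = f(t+1) - 2 f(t) + f(t-1), usable for
    spotting convexity jumps in a segmentation criterion."""
    return f[2:] - 2 * f[1:-1] + f[:-2]

t = np.arange(10, dtype=float)
print(second_difference(t ** 2))  # constant 2.0 for a quadratic, as expected
```

For a quadratic f(t) = t^2 the true second derivative is 2 everywhere, which is exactly what the discrete operator returns.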
  • asked a question related to Activity Recognition
Question
3 answers
I am looking for datasets regarding PARM. Can someone share links or email me?
Relevant answer
Answer
  • asked a question related to Activity Recognition
Question
6 answers
People with Dementia (PwD) have difficulty living their daily lives, and a caregiver is one of several ways to help them. However, caregivers also face many challenges in helping PwD. Because memory and thinking decline dramatically, PwD usually have many symptoms, such as agitation and anxiety, repetitive questions, depression, hallucinations, sleep disturbances, etc., which make them refuse help from caregivers. Therefore, approaches or methods are needed to help caregivers so that their efforts to support PwD succeed. I have read about "humanitude", which is one of the most successful methods. But are there other methods you might know about? Please share. Thank you.
Relevant answer
Answer
See: Johnson, C. and Johnson, R. (2000) Alzheimer's disease as a trip back in time. American Journal of Alzheimer's Disease, April, or the short British Alzheimer's Society one-page article, for an explanation of the time-travel model of AD. The updated version, with more positive language, is in the 2017 Behavioral Sciences journal article.
  • asked a question related to Activity Recognition
Question
4 answers
I am working on activity recognition using wearable sensor data. Actually, I am confused about how to correctly specify the window size for my activities. I am considering a sliding-window technique for this work.
The correct window size plays a vital role in activity detection; it affects the features, and whenever the features are affected, classifier performance suffers directly. I am working on four activities (Ac1, Ac2, Ac3, and Ac4), which are totally different in nature. For Ac1, the average person takes at least 12 s, and at most 20 s, to complete one cycle. On the other hand, Ac2 and Ac3 are not regular activities compared to Ac1; a user takes 4 to 6 seconds to complete one cycle of these two activities. For Ac4, the average person takes 10 seconds to complete the activity.
So, my question is: what should my window size be to correctly process these kinds of activities? A reply would be greatly appreciated.
Relevant answer
Answer
Look the link, maybe useful.
Regards,
Shafagat
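As a rough starting point, a common heuristic is to make the window at least as long as the slowest activity cycle (20 s for Ac1 in the question) and use 50% overlap. A sketch, where the 50 Hz sampling rate and the overlap are assumptions to tune:

```python
import numpy as np

FS = 50                      # assumed sampling rate (Hz)
CYCLE_MAX_S = 20             # longest activity cycle from the question (Ac1)
window = FS * CYCLE_MAX_S    # 1000 samples covers one full Ac1 cycle
step = window // 2           # 50% overlap, a common default

signal = np.zeros((FS * 120, 3))   # two minutes of fake tri-axial data
starts = list(range(0, len(signal) - window + 1, step))
print(len(starts))  # 11 windows in 120 s
```

Shorter windows can then be evaluated against this baseline; the faster activities (4-6 s cycles) will simply contribute several cycles per window.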
  • asked a question related to Activity Recognition
Question
5 answers
I am currently investigating means to assess human interaction interest on a mobile robotic platform before approaching said humans.
I already found several sources which mainly focus on movement trajectories and body poses.
Do you know of any other observable features which could be used for this?
Could you point me to relevant literature in this field?
Thanks in advance,
Martin
Relevant answer
Answer
You can read this research; I hope it will be useful for you. Good luck.
  • asked a question related to Activity Recognition
Question
3 answers
I have collected data from different exercise activities using the accelerometer sensor through the MATLAB Mobile sensor support application. I faced a problem when processing the data, as follows:
  • Sensor recordings are not synchronized, and the time range is not accurate either. The sample rate of the sensor is not exactly 50 Hz and is not consistent.
To synchronize all the sensor data at the exact same sample rate, we interpolated and re-sampled the data. My question is:
Is there any side-effect of this interpolation and re-sampling on the data?
Relevant answer
Answer
After long research and discussion, I have found the following:
Some loss of information can occur, based on the weighted-average dynamics of the interpolation, if the points being averaged are not representative.
While an entirely symmetrical and perfectly balanced set would be partitioned correctly, you will incur some small loss of information depending on how spread out the points are.
You can counteract that by increasing the number of interpolation points, but then you incur more processing cost.
So the "side effect" is mostly a trade-off between the number of interpolation points, accuracy, and processing speed.
Another point of contention is the representation of coordinate points across dimensions: lower-dimensional points may not portray the dimensionality of the problem entirely accurately.
But as far as inference goes, averaging over resampled interpolation points acts as an information strengthener rather than a dampener. This is in the same spirit as aggregate simulation processes such as Monte Carlo methods: you aggregate as you go, and shuffling and resampling (as in bootstrapping, i.e. averaging over runs) give you more accuracy by better approximating truly random sampling from the whole distribution of the metric space being analyzed.
So the side effect can even be increased sampling accuracy. Other than the extra processing time, you are not really "losing" anything.
At least, this is my understanding of the problem.
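For reference, the interpolation/re-sampling step itself can be as simple as linear interpolation onto a uniform 50 Hz grid; a sketch with NumPy, where synthetic jittery timestamps stand in for the real recordings:

```python
import numpy as np

# Irregular timestamps (jittery ~50 Hz spacing), one accelerometer axis
rng = np.random.default_rng(1)
t_raw = np.cumsum(rng.uniform(0.015, 0.025, size=200))   # ~20 ms spacing
x_raw = np.sin(2 * np.pi * 1.0 * t_raw)                  # 1 Hz motion signal

# Resample onto an exact 50 Hz grid by linear interpolation
t_uniform = np.arange(t_raw[0], t_raw[-1], 1 / 50)
x_uniform = np.interp(t_uniform, t_raw, x_raw)
```

For a signal this slow relative to the sampling rate, the interpolation error stays small; sharper transients lose more, which is the main side-effect to watch.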
  • asked a question related to Activity Recognition
Question
8 answers
I would like to know the standard technique for assigning a performance value to a person affected with ASD, for comparison to a typically developing person.
Relevant answer
Answer
I like your studies.
In my Thar Desert, Pakistani culture, camel milk is used as a treatment for ASD (autism spectrum disorder). Always hope for the best.
Mujeeb
  • asked a question related to Activity Recognition
Question
1 answer
Hello,
I will need to implement Transport Mode Detection (TMD) on a smartphone (accelerometer, GPS, etc.) in order to detect whether the user is traveling on foot, by bicycle, by car, by bus, etc. Here are my two questions:
1) Is there any public data available to train and benchmark algorithms for this task?
2) Are there any commercially usable libraries/services implementing TMD? Something under an MIT licence would be great, but commercial solutions could also work.
Thanks for your time,
Bruno
Relevant answer
Answer
I found the Sussex-Huawei Locomotion Dataset
  • asked a question related to Activity Recognition
Question
5 answers
Hi, I am doing human activity recognition. In my task, feature scaling gives lower accuracy than keeping the original feature values. But my feature values are not in [-1, 1] or [0, 1]. So why do I get lower accuracy after feature scaling?
Relevant answer
Answer
For scaling I would say:
Pros:
- Better performance in algorithms like SVMs and neural networks
- Faster convergence
Cons:
- You may forget to look for the true coefficients in regressions
- It won't always guarantee better results
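One practical caveat worth adding: fit the scaling statistics on the training split only, then apply them to the test split, otherwise test statistics leak into training. A minimal z-score sketch in NumPy:

```python
import numpy as np

def standardize(X_train, X_test):
    """Fit z-score scaling on the training split only, then apply to both.
    Fitting on all data would leak test statistics into training."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)     # guard constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma

rng = np.random.default_rng(2)
Xtr, Xte = rng.normal(5, 3, (100, 4)), rng.normal(5, 3, (20, 4))
Ztr, Zte = standardize(Xtr, Xte)
```

If scaling still hurts accuracy, it is worth checking whether the raw feature magnitudes themselves carry class information that scaling removes.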
  • asked a question related to Activity Recognition
Question
6 answers
I am choosing between an RNN and a CNN to train an AI model for a video-based human activity recognition system. Which of the two (RNN or CNN) should be used?
Relevant answer
Answer
CNN
  • asked a question related to Activity Recognition
Question
3 answers
I am a research student working in the field of human action recognition. I need help or guidance in extracting skeleton joints and plotting them on images of the following datasets:
  • I need your help extracting and plotting joint points (i.e. skeleton_pos.txt) on the depth or RGB images of the SBU Kinect Interaction Dataset.
  • Plotting skeleton joints using the text files (a01_s01_e01_realworld & a01_s01_e01_screen) of Mendeley Data - KARD - Kinect Activity Recognition Dataset.
  • I have used the skeleton visualization code for the Cornell Activity Datasets: CAD-60 & CAD-120.
Relevant answer
Answer
Respected Mr. Abdessamad Tafraouti,
I used your suggested code, but unfortunately it does not work. The text files (skeleton_pos.txt, a01_s01_e01_realworld & a01_s01_e01_screen) contain the joint locations, but I am unable to use them.
  • asked a question related to Activity Recognition
Question
6 answers
Hi, I am trying to do Activity Recognition with a labelled dataset containing data coming from an accelerometer, 30 binary sensors and proximity beacons.
A row example from the dataset would be:
x, y, z, s1, s2, s3, s4, s5, b1, b2
where x, y, z are continuous values coming from the accelerometer, s1, ..., s5 are sensors with values 1 or 0, and b1, b2 are proximity beacons represented by their RSSI values.
My biggest question is: how to use all this data together?
I tried:
  • a CNN using only x, y, z
  • a CNN using the sensor data
But I was wondering if it was possible to do something more complex considering the different sources of data.
Relevant answer
Answer
Yes Doug P L Hunt, it's exactly what I have done. And the time-of-day feature improved the results by 20%!
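A simple way to combine such heterogeneous sources, short of a multi-branch network, is plain feature concatenation per window, with the time of day encoded cyclically so that 23:00 and 01:00 end up close together. A sketch where all feature choices are illustrative assumptions:

```python
import numpy as np

def fuse_row(accel_window, binary_sensors, beacon_rssi, hour_of_day):
    """Concatenate heterogeneous sources into one feature vector:
    accel_window (n, 3) raw samples -> summary stats; binary sensors and
    beacon RSSI pass through; time of day is encoded cyclically (sin/cos)."""
    accel_feats = np.concatenate([accel_window.mean(axis=0),
                                  accel_window.std(axis=0)])
    tod = np.array([np.sin(2 * np.pi * hour_of_day / 24),
                    np.cos(2 * np.pi * hour_of_day / 24)])
    return np.concatenate([accel_feats, binary_sensors, beacon_rssi, tod])

row = fuse_row(np.zeros((50, 3)), np.array([1, 0, 1, 0, 0]),
               np.array([-60.0, -75.0]), hour_of_day=14)
print(row.shape)  # (15,)
```

The fused vectors can feed any classifier; scaling the RSSI values before concatenation is usually advisable since their magnitudes dwarf the binary features.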
  • asked a question related to Activity Recognition
Question
8 answers
Can anyone point me to an algorithm or a model that can detect body movement from the accelerometer data of a wristband?
Relevant answer
Answer
I suspect that your question is too general. My wrist movement is somewhat independent of my body movement. For example, I may be playing a guitar and so my wrist will be moving in a certain manner. However, while playing the guitar I may be either, sitting down, standing up, walking or dancing to the music that I am playing. From this you can see that my wrist movements do not define my body movements and so one can not be inferred from the other.
Another situation where this "confusion" occurs can be manufactured if you have a chair that rotates. Sit in the chair with your wrist still while a friend quickly rotates the chair; if you then look at the reading from your wrist sensor (and you don't get dizzy), it will seem to indicate that the wrist is moving. It is, but so is your whole body, so you cannot assume that a movement registered at the wrist comes only from the wrist.
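With those caveats in mind, a crude wrist-worn movement detector can still be built by thresholding the variability of the acceleration magnitude (the constant gravity component drops out of the deviation); the window length and threshold below are illustrative assumptions to tune:

```python
import numpy as np

def movement_mask(accel, fs=50, win_s=1.0, thresh=0.5):
    """Flag movement when the standard deviation of the acceleration
    magnitude within a window exceeds a threshold (in m/s^2).
    Cannot distinguish wrist-only from whole-body motion, as noted above."""
    mag = np.linalg.norm(accel, axis=1)          # per-sample magnitude
    w = int(fs * win_s)
    n = len(mag) // w
    stds = mag[:n * w].reshape(n, w).std(axis=1) # per-window variability
    return stds > thresh

rng = np.random.default_rng(3)
still = np.tile([0.0, 0.0, 9.81], (100, 1)) + rng.normal(0, 0.02, (100, 3))
moving = np.tile([0.0, 0.0, 9.81], (100, 1)) + rng.normal(0, 2.0, (100, 3))
mask = movement_mask(np.vstack([still, moving]))
print(mask.tolist())  # [False, False, True, True]
```

This detects "any movement at the sensor"; classifying which activity produced it needs labelled training data on top.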
  • asked a question related to Activity Recognition
Question
5 answers
I use a dataset of activities that an elderly person performed over a year. It has features for start time, end time, and activity name, as below:
08:52:12 - 08:55:38 - Washing hand/face
08:57:36 - 09:05:53 - Make coffee
09:07:38 - 09:12:52 - Washing hand/face
09:13:57 - 09:21:10 - Make sandwich
09:23:08 - 09:43:11 - Eating
..
I want to insert abnormal situations in which an activity lasts longer than usual, or increase the frequency of an activity during a day.
I'm programming in Python. What should I do?
If I insert an abnormal record, should I change the times of all the records that come after it?
Relevant answer
Answer
Insert records of activities with start time and NULL value for end time
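An alternative that keeps end times: lengthen one record and shift every later record by the same offset, which preserves the original gaps between activities. A sketch using only the Python standard library, with the record format taken from the question:

```python
from datetime import datetime, timedelta

records = [
    ("08:52:12", "08:55:38", "Washing hand/face"),
    ("08:57:36", "09:05:53", "Make coffee"),
    ("09:07:38", "09:12:52", "Washing hand/face"),
]

def parse(t):
    return datetime.strptime(t, "%H:%M:%S")

def stretch_activity(records, index, extra_minutes):
    """Make one activity abnormally long and shift every later record,
    preserving the original gaps between activities."""
    shift = timedelta(minutes=extra_minutes)
    out = []
    for i, (start, end, name) in enumerate(records):
        s, e = parse(start), parse(end)
        if i == index:
            e += shift                     # lengthen the target activity
        elif i > index:
            s, e = s + shift, e + shift    # push later activities back
        out.append((s.strftime("%H:%M:%S"), e.strftime("%H:%M:%S"), name))
    return out

abnormal = stretch_activity(records, index=1, extra_minutes=30)
print(abnormal[1])  # ('08:57:36', '09:35:53', 'Make coffee')
```

So yes, shifting later records is usually needed if overlapping intervals would be implausible in your simulation; days crossing midnight would additionally need full dates rather than bare times.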
  • asked a question related to Activity Recognition
Question
2 answers
Frame activity is used in many applications for signal separation, detection, and so on.
Relevant answer
Answer
Dear Ali,
Please go through some of the referred papers given below, which might help you in your research.
1. Adiga, M. T., & Bhandarkar, R. (2016, October). Improving single frequency filtering based Voice Activity Detection (VAD) using spectral subtraction based noise cancellation. In Signal Processing, Communication, Power and Embedded System (SCOPES), 2016 International Conference on (pp. 18-23). IEEE.
2. Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE signal processing letters, 6(1), 1-3.
3. Sohn, J., & Sung, W. (1998, May). A voice activity detector employing soft decision based noise spectrum adaptation. In Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on (Vol. 1, pp. 365-368). IEEE.
Thanks,
Sobhan
  • asked a question related to Activity Recognition
Question
4 answers
The UCI Human Activity Recognition (HAR) dataset is easily available on the internet as well as on Kaggle. If someone has worked on it, please let me know.
  • asked a question related to Activity Recognition
Question
4 answers
I am trying to use the HAR dataset (see attached link) to test my activity recognition algorithm. However, before using it to test my own method, I am trying to make sense of the data, and understand the features that are extracted.
As a simple test I tried to reproduce them, starting with the mean. I calculated the arithmetic mean of the first row of values in the data/train/Inertial Signals/body_acc_x_train.txt file. If I understand the explanation correctly, this should be the first value of the first line of data/train/X_train.txt. However, when computing the mean, I obtained 0.00226869, whereas the value in the X_train.txt file is 0.28858.
The same discrepancy occurs for the y and z values. If I omit the division by 128 (the number of samples in a window), the value is nearer (at least of the same order of magnitude), but still further off than floating-point errors should account for (just to be sure, I used the bigfloat package in my Python code with a precision of 15 to ensure rounding errors were not the problem on my side).
I understand this is a rather niche question and sent it to the admin of the data set, unfortunately, he's out of office until the end of August, so thought I'd ask here in case someone has experience with using this dataset.
Relevant answer
Answer
I didn't manage to figure it out and dropped the issue. The admin of the dataset did answer eventually, but his answer didn't really help me make sense of it, and I had already stopped using it. I suggest trying to get in touch with him, perhaps he can help you.
  • asked a question related to Activity Recognition
Question
6 answers
I want to recognize human activities in a multi-camera environment. I am using 2 camera views for the experiment, and I want to fuse the information extracted from both views to get a precise feature vector. But I am facing the following confusion:
Since I am using supervised learning, I have to label the activities of each person in each frame. In the first camera view, at some time t, two persons appear very close to each other, so I label them as interacting. But in the second camera view at the same time t, those persons do not seem close to each other (the distance is high), so I label them as non-interacting.
How can I fuse two feature vectors (from 2 views) that have 2 different labels at the same time?
Relevant answer
Answer
Thank you, Samer, for the clarification. My confusion is now resolved.
  • asked a question related to Activity Recognition
Question
2 answers
I'm doing some research on using smartphones to help control something. I cannot explain the difference between detecting a gesture (moving left and then moving right) and detecting a user's activity (say, moving the hand left and right).
Relevant answer
Answer
Hi,
I think gesture recognition focuses on a specific part of the body, like the hand or the head. In gesture recognition, we don't need to track all body parts: we know the gestures, and we know which body part each gesture involves.
In activity detection, however, it is possible that someone performs an activity differently than I do.
In fact, when gestures are used as commands, you and I should perform them correctly and almost the same way; in an activity, we do not have to act alike.
  • asked a question related to Activity Recognition
Question
5 answers
Hello everyone,
Does anyone know if I can use video sequences from movies in my research? More specifically, is it legal to trim video segments from movies and run computer vision algorithms on them, or would I have copyright issues when presenting my experimental results at an international conference? Is there a company that grants permits for digital media? Does the same law apply to EU projects?
Relevant answer
Answer
I plan to analyze very specific video segments from very specific movies that some are available online, while others are not. Some of the movies that are on my list are: Metropolis (1927), Blade Runner (1982), Brazil (1985) and The Grand Budapest Hotel (2014).
  • asked a question related to Activity Recognition
Question
5 answers
I'm attempting an analysis on sampling rates and window sizes of accelerometer for human activity recognition (HAR). I'm looking for a good test for statistical significance. My data can't fulfill the sphericity assumption for the repeated measures two way ANOVA. As for the Friedman test, I have replicated 10-fold data which means it's not "unreplicated block data". Are there any alternatives I should look at? Or perhaps some way to adjust the dataset to fulfill the requirements of either test?
Thanks.
Relevant answer
Answer
These methods are somewhat old. Linear mixed-effects models now offer a wider range of error/random-effect correlation patterns and fit data very well. I suggest them.
  • asked a question related to Activity Recognition
Question
2 answers
I'm doing device-free recognition of activities from RF signals, using a USRP device. What kinds of activities do you think are suitable to measure this way, given that there is a lot of interference and noise during the experiments?
Relevant answer
Answer
Could you be a bit more specific on the activities and how the measurements relate to the stress recognition ?
I ask this, because it is a bit complicated to answer your question without some further insight.
kind regards,
Marian
  • asked a question related to Activity Recognition
Question
4 answers
I'm working on the topic of analyzing the important information for motion representation. For example, you have a dataset of typical human activities, such as walking, running, climbing, etc., and you are asked to represent each class (type) with as compact yet sufficient information as possible.
In other words, we want to find prototypes for each activity (e.g., as in clustering). Knowing that each motion consists of body features like the movements of the legs, hands, head, etc., do you think it would be a practical idea to:
1- consider all the features together and combine them, as we do in PCA, or consider the total distance of an activity to the others (as in clustering)?
or
2- consider the features separately and find the features that describe each activity most efficiently; for example, leg movements for walking or running, versus hand movements for punching or a handshake?
From my point of view, the first is computationally efficient, while the latter might give a more precise/semantic representation for each activity prototype.
However, I am very interested to hear from people experienced in motion analysis about which way they prefer to handle the problem, and why.
Relevant answer
Answer
Babak,
This is definitely an interesting question. I am also unsure of what exactly the goal of your analysis would be; however, if you are interested in getting a sense of a prototype movement, rather than actual live movements, then I would suggest thinking about it from more of an artist/animator perspective. What would be the minimum amount you would need to "draw" in order to show climbing, running, or walking? I think an animated 2D stick figure could easily represent all of these quite well. This would make your task much easier in terms of feature-set size (degrees of freedom) and computational complexity.
It would probably be computationally tractable to run a massive PCA/SVM/clustering operation on a huge set of animated 2D stick figures. Of course, you would have to create the figures somehow...
Hope that is helpful.
Sean
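For option 1 above (combining all features as in PCA), the projection itself is only a few lines; a sketch via SVD on synthetic "motion feature" vectors, where the feature dimensions and their scales are purely illustrative:

```python
import numpy as np

def pca(X, k):
    """PCA via SVD on mean-centred data: returns the k-dimensional projection
    and the fraction of variance each retained component explains."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    return Xc @ Vt[:k].T, explained[:k]

rng = np.random.default_rng(4)
# Fake motion features: 60 clips x 12 joint-trajectory descriptors,
# with decreasing scales so the leading components dominate
X = rng.normal(size=(60, 12)) * np.linspace(5, 0.1, 12)
Z, ev = pca(X, k=3)
```

The per-activity prototype could then be the centroid of each class in the projected space; option 2 (per-feature analysis) would instead inspect `Vt` loadings per body part.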
  • asked a question related to Activity Recognition
Question
4 answers
We are working on the problem of outdoor activity recognition. For this purpose, we need to test our approaches using a dataset that contains users' mobility traces; we need both the continuous GPS recordings and the visited places.
Relevant answer
Answer
Hi Mehdi. I was the chair of the ECML 2015 Discovery Challenge. We published the dataset of the competition in the UCI repository. It has GPS traces of taxi busy services for a period of one year. It has no labelled visited places, but you can easily label some of them (such as downtown, the train station, or the airport) with a simple haversine distance function. The granularity of the dataset is 15 seconds.
I hope that this helps.
Best,
Luis
  • asked a question related to Activity Recognition
Question
9 answers
I have a tracker that outputs the trajectory (x, y, z) of an object (e.g., a can).
I want to use these trajectories to train a classifier (e.g., an SVM) in order to infer the activity that the person manipulating the object is performing (e.g., drinking from a can or pouring from a can).
What kind of features should I use to quantize these trajectories?
Relevant answer
Answer
Probably it also depends on the sensing interface you are going to use. For instance, you could partition a video area into a grid, then use Markov random fields or some spectral method for graphs. I hope this helps.
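Alternatively, for a raw (x, y, z) trajectory, simple hand-crafted descriptors (speed statistics, path length, vertical displacement, turning angle) often work as SVM inputs; a sketch where the particular feature set is an illustrative assumption:

```python
import numpy as np

def trajectory_features(traj, fs=30):
    """Hand-crafted descriptors for a (T, 3) object trajectory: speed
    statistics, total path length, net vertical displacement, and mean
    turning angle between successive velocity vectors."""
    v = np.diff(traj, axis=0) * fs                 # velocity, shape (T-1, 3)
    speed = np.linalg.norm(v, axis=1)
    path_len = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))
    dz = traj[-1, 2] - traj[0, 2]                  # net vertical displacement
    # Angle between consecutive velocity vectors (0 for straight motion)
    dots = np.sum(v[:-1] * v[1:], axis=1)
    norms = np.linalg.norm(v[:-1], axis=1) * np.linalg.norm(v[1:], axis=1)
    angles = np.arccos(np.clip(dots / np.maximum(norms, 1e-9), -1, 1))
    return np.array([speed.mean(), speed.std(), path_len, dz, angles.mean()])

# Straight vertical lift, like raising a can to drink
lift = np.column_stack([np.zeros(30), np.zeros(30), np.linspace(0, 0.3, 30)])
feats = trajectory_features(lift)
```

Drinking vs. pouring should separate on the vertical-displacement and turning-angle components; a fixed-length feature vector like this plugs directly into an SVM.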
  • asked a question related to Activity Recognition
Question
2 answers
Assuming:
   -  the number of users is unknown at run time.
   -  in all group activities users perform similar actions.
Relevant answer
Answer
HMMs are meant precisely for problems defined by a sequence of events or states. Hence, represent each action as a state and a sequence of these states as an activity. Hope this helps.
  • asked a question related to Activity Recognition
Question
4 answers
I'm using Quality Function Deployment (QFD) to perform network (Access Point) prioritisation and selection, semantically (context aware). Any ideas for advancement, extension or comparison?
Relevant answer
Quality function deployment (QFD) is a structured method to extract quantitative parameters from qualitative user requirements. It helps to prioritise the parameters to produce justified decisions. In this case, the network selection.
Whereas QoS is the quantitatively measured performance of a network; specifically, the performance perceived by the users of the network.
  • asked a question related to Activity Recognition
Question
5 answers
Currently, we have acquired video data of humans performing martial arts movements. We want to segment the video frames into different actions (sequentially). Can anyone suggest the best method so far for this problem? Some good links are also welcome. Thank you.
Relevant answer
Answer
  • asked a question related to Activity Recognition
Question
4 answers
Is there a comprehensive taxonomy that can explain the state of the art of current abnormal events detection techniques from video?
Relevant answer
Answer
Hello. I imagine that it depends on what you mean by an abnormal event. From my point of view, this type of problem takes two forms. The first is an image processing problem where you look in a single frame for an object with a certain feature; this can be reduced to a parameter estimation problem. The second is to view the problem as one of change detection, i.e., the abnormal event corresponds to a change in the dynamics of the underlying system (think of your car engine when something inside it breaks). There is a large literature on change detection, which can be approached in many ways. My experience of this type of problem is from the area of manoeuvring target tracking. Hopefully you can find some useful references for your work, as quite a lot has been done in this area (see link), although the sensor is usually a radar system rather than a video sequence.
  • asked a question related to Activity Recognition
Question
3 answers
Are there articles on how religious norms can be subject to mutual recognition (as in goods and services) in religious marriage contracts?
Relevant answer
Answer
In Judaism there is often, especially in Orthodox weddings, a document that is in addition to the Ketubah, the marriage contract.  This document is called Tanayim, conditions, and it often contains religious aspects, such as the foods in the home, religious observance, training of children, etc. 
  • asked a question related to Activity Recognition
Question
6 answers
I have tried to biotinylate some Fab fragments, but I am afraid the biotinylation is also happening at the recognition site, thus impeding the binding of my target peptides.
I would like some advice or suggestions on how to deal with this problem.
Relevant answer
Answer
I use a biotin/antibody ratio of 4:1 (mol to mol). With this ratio it is very unlikely that you will see any alteration in antigen recognition. A higher ratio (e.g. 6:1) frequently gives problems.
  • asked a question related to Activity Recognition
Question
5 answers
I am looking for literature on how to carry out research on human activity recognition.  
Relevant answer
Answer
This one is also related to your question.
  • asked a question related to Activity Recognition
Question
3 answers
Real-time activity recognition using a tri-axial accelerometer.
Using only an accelerometer and limited data storage capacity, I would like to be able to determine what activity is being performed in real time. The focus of this activity recognition is animal activity.
Relevant answer
Answer
Do you want to classify the activity (running, jumping, sleeping) or just to detect any movement?
  • asked a question related to Activity Recognition
Question
4 answers
I want to know the latest techniques for classification in images and videos, to enhance classification precision.
Relevant answer
Answer
Dear Hussein,
There is no classifier which is always the best.
In the much-cited paper
Poppe, R. (2010). A survey on vision-based human action recognition. Image and Vision Computing, 28(6), 976-990.
you can find there (in chapter 3) a summary of many machine learning methods that are used for human action recognition.
You can see there methods from different categories, e.g.:
Discriminative classifiers such as SVM, RVM and AdaBoost
Nearest neighbor methods,
Discriminative models such as HMM and CRF
and many other kinds of ML methods.
There are also specific papers that describe a use of a certain ML method, such as:
HMM:
Yamato, J., Ohya, J., & Ishii, K. (1992, June). Recognizing human action in time-sequential images using hidden markov model. In Computer Vision and Pattern Recognition, 1992. Proceedings CVPR'92., 1992 IEEE Computer Society Conference on (pp. 379-385). IEEE.
SVM:
Schuldt, C., Laptev, I., & Caputo, B. (2004, August). Recognizing human actions: a local SVM approach. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on (Vol. 3, pp. 32-36). IEEE.
Yaakov
  • asked a question related to Activity Recognition
Question
5 answers
I am working on Physical Activity Recognition using data acquired from smartphone sensors (gyroscope, magnetometer and accelerometer) and would like to compare the performance of different classifiers on the dataset, but I am wondering which evaluation metric would be best to use: True Positive Rate (TPR), False Positive Rate (FPR), Precision, Recall, F-score or overall classification accuracy? This is a six-class problem (6 different activities).
Relevant answer
Answer
Rajeev.
TPR and FPR separately do not tell you anything. The same goes for Precision and Recall. F-score, Equal Error Rate and similar can give you some information, but it is limited and you have to be careful how you use those values.
Look at this previous question:
I'll copy part of my answer for that question here:
You have to first answer a simple question: Do You know what are the costs of the decisions?
If the costs are known, you can calibrate your classifiers (set the operating point), and compare costs of the classifiers.
If you do not know the costs, you may have some insight into a reasonable range of operating points for the target application. Let's say the target application is a surveillance system, and all positive responses have to be shown to a human operator who is able to process one event per minute. In such a case, it makes sense to compare detection rates at one false positive per minute. You can use other measures instead, but it makes sense to keep one type of error fixed and compare the other one.
If you have no knowledge about the possible operating point in the target application, you should show results in full range of possible operating points. Plot ROC, precision-recall, DET or something similar. You can also compute area under ROC or Precision-recall and get a numeric value which reflects performance over the whole range of regimes; however, such aggregation has its own problems.
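The "keep one error type fixed, compare the other" idea is easy to make concrete. A minimal sketch that computes the best achievable TPR subject to an FPR budget directly from raw classifier scores (the function name and the toy scores in the test are my own, not from any library):

```python
def tpr_at_fpr(labels, scores, max_fpr):
    """Best true-positive rate achievable with false-positive rate <= max_fpr.

    labels: 1 for positive, 0 for negative; scores: higher = more positive.
    Sweeps all thresholds by sorting on descending score.  (Tied scores
    would need grouping in a production version; this is a sketch.)
    """
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    best = 0.0
    for _score, y in pairs:
        if y:
            tp += 1
        else:
            fp += 1
        if fp / neg <= max_fpr:          # this threshold stays within budget
            best = max(best, tp / pos)
    return best
```

Plotting this quantity over a range of `max_fpr` values recovers the ROC curve mentioned below.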
  • asked a question related to Activity Recognition
Question
8 answers
I would like to reduce sequences of about 50 RGB frames of 640x480 pixels to get a representation of the data I could feed into a deep neural network. The goal is to recognize a performed activity/gesture.
I have seen many examples for individual images with static gestures but I struggle to find practical examples using whole sequences with a dynamic gesture.
I have worked through the tutorial here* and they use the MNIST dataset to train the network. So their input are images of the size 28x28 pixel. I would like to use my data as input but I don't really know how to reduce and how much reduction is enough/necessary.
What I did until now is remove the background and then perform edge detection using the openCV Canny edges algorithm** which works fine but still leaves me with a lot of data.
I tried using image flow to generate something like a heatmap but I am not very happy with the results. I read about DTW or Space Time Shapes, but have not yet found a way to apply the theory.
So, do you have any hints, tips or links to papers, tutorials, presentations or whatever to help me reduce the video sequences without losing too much data? I would prefer practical examples.
Thank you!
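For reference, one very simple reduction along the lines described above: frame differencing to keep motion, then block-average downsampling to a 28x28 MNIST-like grid. All sizes, strides and the function name here are assumptions for illustration, not a recommendation:

```python
import numpy as np

def reduce_sequence(frames, size=28, step=2):
    """Reduce a (T, H, W) grayscale sequence for a small network.

    Frame differencing highlights motion, block averaging shrinks each
    frame to size x size, and temporal subsampling keeps every
    `step`-th difference frame.
    """
    frames = np.asarray(frames, dtype=np.float32)
    diffs = np.abs(frames[1:] - frames[:-1])      # motion energy per frame pair
    t, h, w = diffs.shape
    bh, bw = h // size, w // size
    diffs = diffs[:, :bh * size, :bw * size]      # crop to a multiple of size
    small = diffs.reshape(t, size, bh, size, bw).mean(axis=(2, 4))
    return small[::step]
```

For 50 frames at 640x480 this yields roughly 25 frames of 28x28, small enough to feed a network of the kind used in the MNIST tutorial.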
Relevant answer
Answer
Seems like an interesting problem, but I don't think that a deep neural net is what you really want to use. However, here are some bibliography links I have on the subject of Spatial-Temporal analysis for Gesture Interpretations:
You can also use some of these datasets to train:
Another thing you might want to look into is the area of Content-Based Video Retrieval, which deals with the same problem of reducing image sets into features that can be indexed and later retrieved. While there is not a lot of cohesive literature around, here is a paper I like to use as an introduction:
Hopefully this helps!
  • asked a question related to Activity Recognition
Question
3 answers
I'm trying to work on gait analysis of humans.
Relevant answer
Answer
Thank you for the links. How about using grab-cut for detecting humans and a graph-cut method for detecting and extracting silhouettes?
  • asked a question related to Activity Recognition
Question
2 answers
Does anyone have experience in using Microsoft's face tracking SDK which utilizes AAM's? Any comments on robustness?
Relevant answer
Answer
I have not worked with the Microsoft SDK for face tracking, but the SDKs available for skeleton tracking and depth tracking, which I am using in my project, are quite robust.
  • asked a question related to Activity Recognition
Question
10 answers
I am interested in modelling human activities using sensor data with HMMs and would like to incorporate prior knowledge during inference. The normal procedure is to model K different activities with K separate HMMs. To test an unknown sequence, compute its likelihood under each of the HMMs; the HMM with the maximum value gives the class label. This is all done under the assumption that the priors over the HMMs are uniform.
A problem arises when one of the classes is rare or unusual: its prior probability may be very low in comparison to the other classes, so uniform priors may not be a good assumption. I am therefore interested in the posterior probability, not just the likelihood, to capture the combined effect. My observations are continuous (features extracted from sensors), not discrete values. My questions are:
1. Can inference be done using a Bayesian-network-type approach that includes multiplying the prior with the likelihood?
2. In my case the prior will be the count of activities available per HMM. Can that be estimated using a Dirichlet prior, to avoid the zero-count problem for a rare class (assuming I approximate an HMM for a rare class)? Does that make sense?
3. The multivariate observation data is approximated using a single Gaussian (not a mixture); in that case, will the likelihood be Gaussian, and can it be combined with a Dirichlet prior to compute the posterior probability? Or is the likelihood still multinomial, as it represents K different outcomes from K different HMMs?
Sorry if I have mixed with some of the basic concepts, I am new and I seek guidance to move further.
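To make question 2 concrete, here is a minimal sketch of the prior-times-likelihood combination with additive (symmetric Dirichlet) smoothing of the class counts. It is not an HMM implementation; the per-HMM log-likelihoods are assumed given, and all names and values are illustrative:

```python
import math

def class_posteriors(log_likelihoods, class_counts, alpha=1.0):
    """Posterior over K classes from per-HMM log-likelihoods and class counts.

    Smoothed prior: (count_k + alpha) / (N + K*alpha), so a rare class with a
    zero count still gets a non-zero prior.  Normalised in log space for
    numerical stability.
    """
    k = len(class_counts)
    total = sum(class_counts) + k * alpha
    log_post = [ll + math.log((c + alpha) / total)
                for ll, c in zip(log_likelihoods, class_counts)]
    m = max(log_post)
    z = m + math.log(sum(math.exp(lp - m) for lp in log_post))
    return [math.exp(lp - z) for lp in log_post]
```

With equal likelihoods, the posterior simply reproduces the smoothed prior, and a zero-count class still receives non-zero mass.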
Relevant answer
Answer
OK I will look at it
  • asked a question related to Activity Recognition
Question
8 answers
Considering that the video recordings took place in a home-based environment, do you believe that skin detection could actually segment people from background objects with similar colour (e.g. a skin-coloured closet or table)? Could anyone recommend a relevant publication?
Relevant answer
Answer
I don't think that skin detection is going to play a major role in human action recognition, because skin detection is mainly useful for tasks like face detection or face tracking; it can't be used efficiently for action recognition itself. If you are using skin detection for FG/BG extraction then that is fine, but you will only extract the human being as foreground; in my opinion there is no action recognition using skin detection alone.
  • asked a question related to Activity Recognition
Question
19 answers
I have noticed that one can find a lot of work on human activity recognition, but only a few works focus on the human activity detection problem (also referred to in the literature as activity localization or action spotting). This renders human activity recognition useless for real-life applications, as most videos are unsegmented and cannot be annotated as global entities that contain just one action. Do you have any suggestions or ideas concerning how this problem might be solved?
Relevant answer
Answer
Konstantinos,
I agree that most work on human activity has assumed pre-segmented data and addressed only the recognition part of the task. There are exceptions, however. To toot our own horn, at Colorado State University (as part of DARPA's Mind's Eye program) we tackle both detection and recognition together, as an unsupervised learning problem. It has been hard so far to get the whole system published (but see O'Hara & Draper CVPR 2012 and O'Hara & Draper WACV 2013 for pieces). The system is keyed by motion: moving objects are tracked, and a sliding window cuts video snippets from the tracks of moving objects. At training time, the snippets are clustered to form "activities". At performance time, activities in new tracks are labeled according to these clusters. The goal is to take a pool of unlabeled videos, learn what activities occur multiple times, and then be able to spot those activities in novel (unlabeled) videos.
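The sliding-window step of such a pipeline is straightforward to sketch. The window length and stride below are arbitrary choices for illustration, not the values used in the cited system:

```python
def snippets(track, win=16, stride=8):
    """Cut fixed-length snippets from a track (a list of per-frame features).

    Snippets are the units that get clustered into "activities" at training
    time and labelled against those clusters at performance time.
    """
    return [track[i:i + win] for i in range(0, len(track) - win + 1, stride)]
```

Each snippet can then be described by any of the usual spatio-temporal features before clustering.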
  • asked a question related to Activity Recognition
Question
6 answers
I am looking for activity recognition data sets captured through sensors (i.e. accelerometer, gyroscope, etc.) or using a smartphone. Most of the publicly available data sets contain data from normal activities of daily living (e.g. walking, running, cycling), however I am interested in data sets that also contain data from unusual/abnormal activities, such as a fall or stroke, apart from normal activities. Currently I am using the DLR Human Activity Recognition data set and am looking for other similar data sets. I would appreciate it if you would direct me to any such data you are aware of.
Relevant answer
Answer
Not yet. But currently we are working on acquiring an official permit.
You can also check our project web site: