Research Article
A New Approach to Estimate Concentration Levels with Filtered
Neural Nets for Online Learning
Woodo Lee, Junhyoung Oh, and Jaekwoun Shim
Department of Physics, Korea University, Seoul, Republic of Korea
School of Cybersecurity, Korea University, Seoul, Republic of Korea
Center for Gifted Education, Korea University, Seoul, Republic of Korea
Correspondence should be addressed to Junhyoung Oh and Jaekwoun Shim.
Received 25 October 2021; Accepted 8 April 2022; Published 21 April 2022
Academic Editor: Carlos Aguilar-Ibanez
Copyright © 2022 Woodo Lee et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The COVID-19 pandemic heavily influenced human life by constricting human social activity. Following the spread of the pandemic, humans did not have a choice but to change their lifestyles. There has been much change in the field of education, which has led to schools hosting online classes as an alternative to face-to-face classes. However, the concentration level is lowered in the online learning class, and the student's learning rate decreases. We devise a framework for recognizing and estimating students' concentration levels to help lecturers. Previous studies have a limitation in that they classified attention levels using only discrete states. Due to the partial information from discrete states, the concentration levels could not be recognized well. This research aims to estimate more subtle levels as specified states by using a minimum amount of body movement data. The deep neural network is used to continuously recognize the human concentration model, and the concentration levels can be predicted and estimated by the Kalman filter. Using our framework, we successfully extracted the concentration levels, which can aid lecturers and can be expanded to other areas. To implement the framework, we recruited participants to take online classes. Data were collected and preprocessed using pose points, and an accuracy of 90.62% was calculated by predicting the concentration level using the framework. Furthermore, the concentration level was approximated based on the Kalman filter. We found that webcams can be used to quantitatively measure student concentration when conducting online classes. Our framework is a great help for instructors to measure concentration levels, which can increase the learning efficiency. As a future work of this study, if emotion data and skin thermal data are comprehensively considered, a student's concentration level can be measured more precisely.
1. Introduction
After the outbreak of the coronavirus in December 2019, it spread worldwide and caused much confusion in society [1]. The coronavirus is causing chaos in many parts of society and has had a great impact on the daily life of mankind. Education is one of the most affected sectors, as the coronavirus has persisted without any signs of improvement [2]. Most classes in elementary school, middle school, high school, and university have come to be conducted in the online learning method. In many schools that do not have sufficient preparation for online learning, the educational effectiveness is declining due to insufficient technical preparation and lack of operational experience [3].
Online learning is classified into synchronous distance education and unsynchronous distance education [4]. In synchronous distance education, lectures are conducted in real time using tools such as Zoom or Google Meet [5]. In unsynchronous distance education, instructors upload recorded videos to the system, and students take the course at the desired time. Because synchronous distance education is a real-time lecture, if students turn on their cameras and show their faces, it is possible to determine the minimum level of participation in the class. For unsynchronous distance education, many universities develop and use various learning management systems such as Moodle and Blackboard [6]. By using these systems, it is possible to determine
Volume 2022, Article ID 3053772, 8 pages
student participation by calculating the learning rate.
Although students participate in the class, they might not be focused on its content. Xu and Yang found that when students learn through online learning, the dropout rate can go up to 95% because they desire to use their time for purposes other than educational ones [7]. Even if students participate in synchronous distance education, the instructor cannot correctly determine the students' concentration due to actions such as engaging in other activities or turning off the camera while attending class. Even if the learning rate is calculated using multiple learning systems, unsynchronous distance education has many weaknesses. Students turn on the class and engage in other activities, or they attack the system's vulnerabilities to adjust the speed of lectures and take them faster [8].
Therefore, appropriate measures should be taken by determining the students' concentration in class. Typically, lecturers have determined students' concentration levels based on their own experiences in online learning. For example, they would infer whether students were concentrating on a lecture or not through various visual cues, such as the focus of students' eyes or their body movements during interactions. However, according to a study by Koçoglu and Tekdal, when it comes to distance education, teachers currently do not have sufficient resources to supervise and evaluate students [9]. Therefore, an automated method for determining students' concentration levels is needed.
There have been many attempts to measure students' concentration levels using various methods, such as taking skin temperature [10], recognizing visual attention and students' emotions [11], and detecting electroencephalogram (EEG) signals [12–14]. However, these methods often do not work well in online classes because teachers cannot promptly interact with each student. In addition, these attempts lack detail because the concentration levels are classified as discrete states [15]. Since students' concentration levels change continuously, information that captures these changes may better aid lecturers.
Here, we develop a new framework that consists of a concentration level recognition network (CLRN) and a Kalman filter (KF) [16] to overcome the limitations of existing methods. The CLRN is based on supervised learning; it is trained with the standard deviations of designated pose points of the human body and the corresponding class labels. The CLRN provides the concentration levels as the probability of "high concentration." The concentration levels can be obtained by the CLRN continuously, and the KF identifies the patterns from the fluctuating levels. In addition, a future concentration level can be estimated by applying the KF. Ultimately, the concentration levels can be quantified by the CLRN with the KF, which can aid lecturers in better understanding the concentration levels of their students.
We implemented this framework in practice using videos of participants taking online lectures. First, the standard deviations of the pose points were extracted as a preprocessing step. Then, the CLRN was constructed, and a loss function was chosen. Based on this, the concentration level of the participants was predicted, and an accuracy of 90.62% was achieved. Moreover, the concentration level was smoothed and approximated by applying the KF to the result. The various abbreviations used in this paper are summarized in Table 1.
2. Motivation and Related Work
The motive of our study is that a person's body movements can be a factor in recognizing his/her condition. Extracting the status of a human being, such as their emotional state, from body movements is an interesting research topic, which has recently become more important [17]. Several studies have extracted meaningful factors from the movements of individuals. They have examined whether students are concentrating on a lecture or not by checking various visual cues, such as the focus of the eyes or body movements of the students [18]. Generally, eye movement is a strong indicator for estimating the degree of concentration [19]. Research has demonstrated that concentration is amplified when the eye movements of participants maintain a central fixation [20]. In addition, body movement has been previously researched; for example, a model using the joints of the human body can estimate the pose of individuals [21]. Kinetic movement of an individual's body has been identified for the assessment of a patient's recovery process [22]. There have been previous studies that extract high-value features based on dynamic movements such as dance and aerobics movements [23, 24]. The pose of individuals has also been researched using video data, which can then be used to present a visual flow of poses [25]. Furthermore, emotion has been recognized from body movement via machine learning and is available in public data sets [26].
The studies mentioned above suggest that the relationship between the movement of individuals and affect is very close, and the relationship should be examined via a bidirectional rather than a unidirectional cause-effect approach [27]. Research to classify the various states of individuals by human body posture has been conducted; however, only binary states have been suggested as results [15]. Similar to previous research, we propose that the standard deviations of designated physical points comprise a core factor in measuring concentration levels. We use OpenPose as a backbone package; it is a well-known tool for analyzing body movements by detecting designated points of a human body. Several studies have used OpenPose; for example, sign languages were recognized by a transfer learning algorithm that utilized OpenPose [28]. When humans are focused on some subject, the standard deviations of their movements become lower because they engage in less wasted effort. Therefore, we designed the CLRN based on deep learning to find the subtle changes in the standard deviations. There has been similar research to recognize human states, such as emotion [29], via deep learning. However, that approach is limited in the sense that it does not provide simultaneous results. To address this problem, we apply a KF to deal with continuous and simultaneous data.
3. Proposed Framework
Figure 1 shows the overview of our framework. In the first step of the framework, the student's video data recorded by a webcam are preprocessed. The preprocessed data are labeled with two states based on the participants' self-reported intent. The labeled data are used for training the CLRN in the recognition step. The CLRN is devised with supervised learning for binary classification, and the data are prepared with a binary class (the data are labeled as zero or one).

The trained CLRN recognizes the continuous concentration levels, which are defined as recognition levels (S_r). The KF is used for smoothing and filtering the highly fluctuating S_r. In the estimation step, the KF provides an approximation of the concentration levels, which are called the estimation levels (S_e).

Our method to recognize and estimate human concentration levels consists of three steps: preprocessing, recognition, and estimation.
3.1. Step 1: Preprocessing. The first step of our framework is to extract the standard deviations of the pose points from the video data. The standard deviations (σ) of the X and Y coordinates are calculated for the top and middle parts, respectively. Note that we assume the standard deviations of the points are the core factor in measuring the concentration levels. Table 2 shows the notations of the results of the preprocessing step.

Algorithm 1 shows the process of the preprocessing step. The standard deviations are obtained through this algorithm and become the input data for the CLRN, which is discussed in the following section.
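The preprocessing step above can be sketched as follows. This is a minimal illustration, not the authors' code; the array layout (one frame of ten pose points with X and Y in [0, 1]) and the function name are our own assumptions based on the paper's description of 50-frame windows and top/middle body parts.

```python
import numpy as np

def preprocess(points, window=50):
    """Per-window standard deviations of pose coordinates.

    `points` is assumed to have shape (n_frames, 10, 2): ten pose
    points per frame, each with an (X, Y) coordinate in [0, 1].
    Points 0-4 form the top part, points 5-9 the middle part.
    Returns one 4-vector (top.X, top.Y, mid.X, mid.Y sigmas) per window.
    """
    features = []
    for start in range(0, len(points) - window + 1, window):
        w = points[start:start + window]           # one 50-frame window
        top, mid = w[:, 0:5, :], w[:, 5:10, :]     # split body parts
        features.append([
            top[:, :, 0].std(), top[:, :, 1].std(),  # sigma of top X, Y
            mid[:, :, 0].std(), mid[:, :, 1].std(),  # sigma of mid X, Y
        ])
    return np.array(features)
```

A still subject produces smaller sigmas than a moving one, which is exactly the signal the CLRN is meant to pick up.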
3.2. Step 2: Recognition. Algorithm 2 shows the overall structure of the recognition step. Through the CLRN, the recognition levels (s_r) are obtained. The CLRN consists of four layers: one input, two hidden, and one output layer. The role of the hidden layers is to find the hidden features in the data. A network deeper than two hidden layers does not improve the performance of the framework. The rectified linear unit (ReLU) is used as the activation function in both hidden layers. A sigmoid is used in the output layer to make sure the probability is distributed relatively evenly from zero to one. The ADAptive Moment (ADAM) estimation optimizer [30] is applied, and the initial learning rate is set to 0.1%, which is the optimal value for the CLRN. The binary cross-entropy loss is chosen as the loss function (L) of the CLRN and is defined as
\[ L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right], \qquad (1) \]
where the number of data items is N, the labels are y_i, and the prediction values from our deep neural network (DNN) are ŷ_i. Note that the values of y_i are obtained from the participants' self-reported intent. As the output value is a probability for binary classification, the binary cross-entropy is an appropriate choice for determining continuous concentration levels.
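A minimal NumPy sketch of the 4-8-8-1 architecture and loss (1) is given below. The paper presumably used a deep learning framework with trained weights; here the weights are random and the function names are ours, so this only illustrates the forward pass and the binary cross-entropy computation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Layer sizes follow the paper: input R^4, two hidden R^8, output R^1.
W1, b1 = rng.standard_normal((4, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.standard_normal((8, 8)) * 0.5, np.zeros(8)
W3, b3 = rng.standard_normal((8, 1)) * 0.5, np.zeros(1)

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def clrn_forward(x):
    """Forward pass: four sigmas in, concentration probability s_r out."""
    h = relu(x @ W1 + b1)       # 1st hidden layer (ReLU)
    h = relu(h @ W2 + b2)       # 2nd hidden layer (ReLU)
    return sigmoid(h @ W3 + b3) # output layer (sigmoid probability)

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy, equation (1); eps guards log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))
```

For a single label y = 1 and prediction ŷ = 0.5, the loss is log 2, as equation (1) prescribes.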
3.3. Step 3: Estimation. The estimation step of the CLRN includes a KF to establish S_e. Algorithm 3 shows the overall process. There are three states in the algorithm: the prediction state, s_p(t); the estimation state, s_e(t); and the measurement state, m_t. The error covariance matrix (P_t) and the transition weight matrix (A) are also defined. In the predicting step, A and an external noise matrix (Q) are used, and lecturers can modify those matrices. A is set to 1, and Q is set to 0 as an ideal case.

As part of the updating step, the Kalman gain (K) is obtained at each update. H is a scale matrix, which is set to 1 to simplify the problem. s_e(t+1) and P_{t+1} are updated with K. Finally, in the estimating step, the next estimated state s_e(t+1) is recurrently updated.
We assume that each of the distributions of Ψ_low can be decomposed into two dominant levels with a certain function. The function is a bimodal distribution X, which is written as

\[ X(x) = N_1(\mu_1, \sigma_1, A_1) + N_2(\mu_2, \sigma_2, A_2), \qquad N_k(x) = A_k \, e^{-(x - \mu_k)^2 / 2\sigma_k^2}, \qquad (2) \]

where σ_1 and σ_2 are the standard deviations, μ_1 and μ_2 are the mean values, and x is the input data.
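The bimodal function (2) can be evaluated directly. Below, the means 0.09 and 0.16 are the fitted values the paper reports for Ψ_low; the amplitudes and widths are illustrative assumptions, chosen only to show two distinct modes.

```python
import numpy as np

def bimodal(x, A1, mu1, s1, A2, mu2, s2):
    """Sum of two Gaussian components N1 + N2, as in equation (2)."""
    n1 = A1 * np.exp(-(x - mu1) ** 2 / (2 * s1 ** 2))
    n2 = A2 * np.exp(-(x - mu2) ** 2 / (2 * s2 ** 2))
    return n1 + n2

# Means from the paper's fit (0.09 and 0.16); amplitudes and widths
# are illustrative only.
x = np.linspace(0.0, 0.3, 301)
y = bimodal(x, 1.0, 0.09, 0.02, 0.8, 0.16, 0.02)
```

The curve has one local maximum near each mean, i.e., the two "entangled" low-concentration levels the paper describes.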
4. Implementation
ree participants were recruited for this experiment, and
each participant was recorded when they viewed an online
lecture, and they were required to mark the times when they
were concentrating on the lecture. is work involved hu-
man subjects or animals in its research. Approval of all
ethical and experimental procedures and protocols was
granted by the Institutional Review Board of the Korea
University Center for Gifted Education.
A webcam was used to record the 25-fps video data. For
recording video data for the distraction (nonconcentration)
case, the participants also marked the times when they were
The data from the three participants are merged into a single dataset because estimating the levels for each participant separately could be biased by the characteristics of individual participants. Moreover, we expect to find general properties of concentration levels by using the merged data with our models. The merged dataset is labeled as two cases based on the markers of the participants. In total, 12 hours of video data
Table 1: Abbreviation list.
Abbreviation Definition
EEG Electroencephalogram
CLRN Concentration level recognition network
KF Kalman filter
ReLU Rectified linear unit
ADAM Adaptive moment
DNN Deep neural network
Complexity 3
(consisting of 1M images) were recorded. A total of eight hours of data was marked for the concentration case; the other four hours were taken for the distraction case.

For step 1 (preprocessing), certain pose points in the images are detected to measure the distribution of participant poses. Ten points of the human body are measured every 50 frames, classified as the top part (0–4) and the middle part (5–9). Figure 2 shows the points, and the coordinate data of the points range from zero to one. To detect the points, OpenPose [31–34] is used. OpenPose [31] is a recent open-source package for detecting the keypoints of human poses. OpenPose is a real-time system for body, foot, hand, and facial keypoint detection and is an
Figure 1: Overview of our framework. The framework consists of a CLRN to recognize the features and a KF to estimate the levels.
Table 2: Data symbols and descriptions.

Symbol: Description
σ(top.X): σ of the top part's X coordinate
σ(top.Y): σ of the top part's Y coordinate
σ(mid.X): σ of the middle part's X coordinate
σ(mid.Y): σ of the middle part's Y coordinate
Input: top.X, top.Y, mid.X, mid.Y
for each Data ∈ {top.X, top.Y, mid.X, mid.Y} do
    for D_i ← Data[50i : 50(i+1)] do
        σ_i ← std(D_i)
    end for
end for
Output: σ_X

Algorithm 1: Data preprocessing.
Input: σ_X
Input layer: R^4
1st hidden layer: R^8 (activation function: ReLU)
2nd hidden layer: R^8 (activation function: ReLU)
Output layer: R^1 (activation function: Sigmoid)
Output: concentration levels s_r(t) ∈ S

Algorithm 2: The CLRN structure.
Input: s_r(t) ∈ S
for all s_r(t) ∈ S do
    /* Predicting */
    s_p(t) ← A · s_e(t)
    P_t^pred ← A · P_t · A + Q
    /* Updating */
    K ← P_t^pred · H / (H · P_t^pred · H + R)    (R: measurement noise)
    /* Estimating */
    s_e(t+1) ← s_p(t) + K · (m_t − H · s_p(t))
    P_{t+1} ← P_t^pred − K · H · P_t^pred
end for
/* Analyzing */
Fit the distribution of s_e(t+1) with a user-defined function
Output: concentration levels Ψ(t)

Algorithm 3: The estimation step (KF).
Figure 2: e points measured by OpenPose are shown. e
middle and upper body are measured with ten points, respectively.
appropriate package for continuously detecting these points.
In our case, we only used the upper body of individuals as
captured in the video data.
We then check the distributions of the pose points when the individuals were concentrating or not, as shown in Figure 3. The distribution in Figure 3(a) shows that the entries are gathered more closely around the body points, while those in Figure 3(b) are spread more widely. The difference is visually noticeable in this example, but it cannot be easily quantified to identify the concentration levels. In the preprocessing step, the input data are already divided into 50-frame windows, so the input data for the CLRN are not separated into minibatches.
For step 2 (recognition), the CLRN performs the task of predicting what the participants marked while viewing the online lecture. K-fold cross-validation is applied to compensate for the insufficient amount of data. The accuracy of 5-fold training ranged from 85% to 95%, with a median of 90.62%.
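The 5-fold evaluation can be sketched as below. The dataset here is synthetic (low sigmas standing in for concentration, high sigmas for distraction), and the threshold rule is a deliberately simple stand-in for the trained CLRN; sizes, values, and function names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the real dataset: low sigmas ~ concentrating
# (label 1), high sigmas ~ distracted (label 0). Values are illustrative.
X = np.concatenate([rng.normal(0.01, 0.003, (200, 4)),
                    rng.normal(0.03, 0.003, (100, 4))])
y = np.concatenate([np.ones(200), np.zeros(100)])

def kfold_accuracies(X, y, k=5, seed=0):
    """K-fold evaluation with a threshold rule standing in for the CLRN."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        thr = X[train].mean()                       # "train" the threshold
        pred = (X[test].mean(axis=1) < thr).astype(float)
        accs.append(float((pred == y[test]).mean()))
    return accs

accs = kfold_accuracies(X, y)
```

Reporting the median over the folds, as the paper does, reduces the influence of one unlucky split on small datasets.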
Figure 4 shows the difference of the σ values among the groups. Nevertheless, there remain unexplained aspects, such
Figure 3: (a) The 2D histogram in the case of high concentration; (b) the 2D histogram in the case of low concentration.
Figure 4: (a)–(d) show the standard deviation distributions for each respective part. The red histogram indicates the high concentration case, and the blue histogram indicates the low concentration case.
as ambiguous patterns, whose correlation with the concentration levels is unclear. To this end, neural networks are applied to solve the problem, as they are an appropriate method for obtaining nonlinear combinations of features. This allows us to identify hidden features that we cannot otherwise describe.
For step 3 (estimation), the trained CLRN recognizes the continuous concentration levels, which are smoothed and filtered using the Kalman filter and finally approximated. The students' state starts from s_e(0) = 0.5 because the students' concentration level is assumed to be 50% at the beginning. P_0 = 0.9 is the system error, which comes from the DNN described in Section 3.
Figure 5 shows the estimation and measurement results at 2.5-second intervals. It indicates that the students maintained their concentration levels and that there were no external disturbances while they observed the lectures. Even though the measurements fluctuate widely every 2.5 seconds, the KF enables users to track the levels smoothly; the results are shown as the green and red dots, indicating the low (Ψ_low) and high (Ψ_high) concentration levels, respectively.

Figure 6 shows the distribution of Ψ_low. μ_1 and μ_2 are obtained as 0.09 and 0.16, respectively, which indicates that the students are entangled in two concentration levels.
5. Conclusion and Future Work
Many schools have been compelled to adopt distance education due to the coronavirus. However, distance education is economical in terms of cost and can educate many students simultaneously. Furthermore, when distance education is carried out, cooperative learning can be performed in an interactive learning environment, and since home-based classes are possible, the time and effort of commuting to school are reduced. If the major disadvantage of distance education, low concentration levels, can be overcome through this study, more effective classes will be possible. We solve this problem by developing a novel framework consisting of a concentration level recognition network and a Kalman filter. We devised the model to aid lecturers in estimating students' concentration levels using webcams as part of online classes. Our system presents the level every 2.5 seconds with 90.62% accuracy and estimates the next level of concentration by using the KF. In contrast to previous research, such as VGG16 [35], our model takes a different approach to quantify the levels by capturing the variance of the detected pose points on individuals in the current state. Additionally, we estimate and track the level for the next time window. Our model offers a practical tool to monitor the level more precisely and aid lecturers in estimating it. Academically, our model applies a novel approach to analyzing complex human states, in this specific case, the concentration level. As future work, we plan to use not just body movement data but also emotion data [36] and skin thermal data [10, 37] to enhance the prediction of human concentration levels. We will combine the measuring method used here with conventional deep learning techniques. This work is expected to provide helpful information on students' concentration levels and thus assist lecturers.
Data Availability
No data are available because of privacy issues.
An earlier version of this manuscript was preprinted on arXiv [38], and several students participated in that earlier work.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Figure 5: Estimation and measurement levels are shown. The measurement and estimation values are represented by the blue and red dots, respectively, when the students are concentrating highly. The black and green dots are the measurement and estimation values, respectively, when the students express low concentration levels.
Figure 6: Ψ_low and the fit curve. The histogram shows the estimated concentration levels from our model. The histogram contains the data of all three participants when they are under low concentration.
Acknowledgments

The authors would like to show their gratitude to Jakyung Koo, Nokyung Park, and Pilgu Kang at Korea University for their assistance in this research, which greatly improved the manuscript. Even though they are not included in the authorship of this paper, their broader assistance laid the foundation of our research.
References

[1] W. McKibbin and R. Fernando, "The economic impact of COVID-19," in Economics in the Time of COVID-19, R. Baldwin and B. Weder di Mauro (eds.), pp. 45–51, Centre for Economic Policy Research (CEPR), London, UK, 2020.
[2] J. Daniel, “Education and the covid-19 pandemic,” Prospects,
vol. 49, no. 1, pp. 91–96, 2020.
[3] A. Pragholapati, Covid-19 Impact on Students, Universitas
Pendidikan Indonesia, Bandung, Indonesia, 2020.
[4] R. F. Branon and C. Essex, “Synchronous and asynchronous
communication tools in distance education,” TechTrends,
vol. 45, no. 1, p. 36, 2001.
[5] V. D. Soni, “Global impact of e-learning during covid 19,”
SSRN 3630073, 2020.
[6] P. Faisal and Z. Kisman, “Information and communication
technology utilization effectiveness in distance education
systems,” International Journal of Engineering Business
Management, vol. 12, 2020.
[7] B. Xu and D. Yang, “Motivation classification and grade
prediction for moocs learners,” Computational Intelligence
and Neuroscience, vol. 2016, Article ID 2174613, 7 pages, 2016.
[8] J. A. Alokluk, "The effectiveness of blackboard system, uses and limitations in information management," Intelligent Information Management, vol. 10, no. 06, pp. 133–149, 2018.
[9] E. Koçoglu and D. Tekdal, “Analysis of distance education
activities conducted during covid-19 pandemic,” Educational
Research and Reviews, vol. 15, no. 9, pp. 536–543, 2020.
[10] S. Nomura, M. Hasegawa-Ohira, Y. Kurosawa, Y. Hanasaka, K. Yajima, and Y. Fukumura, "Skin temperature as a possible indicator of student's involvement in e-learning sessions," International Journal of Electronic Commerce Studies, vol. 3, no. 1, pp. 101–110, 2012.
[11] P. Sharma, M. Esengönül, S. R. Khanal, T. T. Khanal, V. Filipe, and M. J. Reis, "Student concentration evaluation index in an e-learning context using facial emotion analysis," in Proceedings of the International Conference on Technology and Innovation in Learning, Teaching and Education, pp. 529–538, Springer, Thessaloniki, Greece, June 2018.
[12] A. S. Al-Musawi, “Concentration level monitoring in edu-
cation and healthcare,” Basic and Clinical Pharmacology and
Toxicology, vol. 124, no. s2, p. 36, 2018.
[13] S. Marouane, S. Najlaa, T. Abderrahim, and E. K. Eddine,
“Towards measuring learner’s concentration in e-learning
systems,” International Journal of Computer Techniques,
vol. 2, no. 5, pp. 27–29, 2015.
[14] N.-H. Liu, C.-Y. Chiang, and H.-C. Chu, “Recognizing the
degree of human attention using eeg signals from mobile
sensors,” Sensors, vol. 13, no. 8, pp. 10273–10286, 2013.
[15] R. Sacchetti, T. Teixeira, B. Barbosa, A. Neves, S. C. Soares,
and I. D. Dimas, “Human body posture detection in context:
the case of teaching and learning environments,” SIGNAL,
vol. 87, pp. 79–84, 2018.
[16] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[17] H. Zacharatos, C. Gatzoulis, and Y. L. Chrysanthou, “Auto-
matic emotion recognition based on body movement analysis:
a survey,” IEEE computer graphics and applications, vol. 34,
no. 6, pp. 35–45, 2014.
[18] L. Lakshmi Priya Gg, “Student emotion recognition system
(sers) for e-learning improvement based on learner con-
centration metric,” Procedia Computer Science, vol. 85,
pp. 767–776, 2016.
[19] M. Li, L. Cao, Q. Zhai et al., “Method of depression classification
based on behavioral and physiological signals of eye movement,”
Complexity, vol. 2020, Article ID 4174857, 9 pages, 2020.
[20] M. M. Doran, J. E. Hoffman, and B. J. Scholl, "The role of eye fixations in concentration and amplification effects during multiple object tracking," Visual Cognition, vol. 17, no. 4, pp. 574–597, 2009.
[21] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler, “Joint
training of a convolutional network and a graphical model for
human pose estimation,” Advances in Neural Information
Processing Systems, vol. 27, pp. 1799–1807, 2014.
[22] L. M. Pedro and G. A. de Paula Caurin, “Kinect evaluation for
human body movement analysis,” in Proceedings of the 2012
4th IEEE RAS & EMBS International Conference on Bio-
medical Robotics and Biomechatronics (BioRob), pp. 1856–
1861, IEEE, Rome, Italy, June 2012.
[23] X. Zhai, “Dance movement recognition based on feature
expression and attribute mining,” Complexity, vol. 2021,
Article ID 9935900, 12 pages, 2021.
[24] W. Fan and H. J. Min, “Accurate recognition and simulation
of 3d visual image of aerobics movement,” Complexity,
vol. 2020, Article ID 8889008, 11 pages, 2020.
[25] T. Pfister, J. Charles, and A. Zisserman, “Flowing convnets for
human pose estimation in videos,” in Proceedings of the IEEE
international conference on computer vision, pp. 1913–1921,
Araucano Park, December 2015.
[26] F. Ahmed, A. H. Bari, and M. L. Gavrilova, “Emotion rec-
ognition from body movement,” IEEE Access, vol. 8,
pp. 11761–11781, 2019.
[27] I. Rossberg-Gempton and G. D. Poole, "The relationship between body movement and affect: from historical and current perspectives," The Arts in Psychotherapy, vol. 19, 1992.
[28] S.-K. Ko, C. J. Kim, H. Jung, and C. Cho, “Neural sign lan-
guage translation based on human keypoint estimation,”
Applied Sciences, vol. 9, no. 13, p. 2683, 2019.
[29] R. Santhoshkumar and M. K. Geetha, “Deep learning ap-
proach for emotion recognition from human body move-
ments with feedforward deep convolution neural networks,”
Procedia Computer Science, vol. 152, pp. 158–165, 2019.
[30] D. P. Kingma and J. Ba, “Adam: a method for stochastic
optimization,” 2014.
[31] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh,
“Openpose: realtime multi-person 2d pose estimation using
part affinity fields,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 43, no. 1, pp. 172–186, 2019.
[32] T. Simon, H. Joo, I. Matthews, and Y. Sheikh, “Hand keypoint
detection in single images using multiview bootstrapping,” in
Proceedings of the IEEE conference on Computer Vision and
Pattern Recognition, pp. 1145–1153, Honolulu, HI, USA, July
[33] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-
person 2d pose estimation using part affinity fields,” in
Proceedings of the IEEE conference on computer vision and
pattern recognition, pp. 7291–7299, June 2017.
[34] M. Mohammadpour, H. Khaliliardali, S. M. R. Hashemi, and
M. M. AlyanNezhadi, “Facial emotion recognition using deep
convolutional networks,” in Proceedings of the 2017 IEEE 4th
International Conference on Knowledge-Based Engineering and
Innovation (KBEI), pp. 0017–0021, Tehran, Iran, December 2017.
[35] J. C. Ruvinga and D. Malathi, “Human concentration level
recognition based on vgg16 cnn architecture,” International
Journal of Advanced Science and Technology, vol. 29, 2020.
[36] S.-E. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh,
“Convolutional pose machines,” in Proceedings of the IEEE
conference on Computer Vision and Pattern Recognition,
pp. 4724–4732, Las Vegas, USA, June 2016.
[37] L. Nummenmaa, E. Glerean, R. Hari, and J. K. Hietanen,
“Bodily maps of emotions,” Proceedings of the National
Academy of Sciences, vol. 111, no. 2, pp. 646–651, 2014.
[38] W. Lee, J. Koo, N. Park, P. Kang, and J. Shim, “A framework
for recognizing and estimating human concentration levels.”