Recognition of Eye-Written Characters Using Deep Neural Network
Won-Du Chang 1, Jae-Hyeok Choi 1 and Jungpil Shin 2,*


Citation: Chang, W.-D.; Choi, J.-H.; Shin, J. Recognition of Eye-Written Characters Using Deep Neural Network. Appl. Sci. 2021, 11, 11036. https://doi.org/10.3390/app112211036
Academic Editor: Andrés Márquez
Received: 28 September 2021
Accepted: 19 November 2021
Published: 22 November 2021
1 Department of Computer Engineering, Pukyong National University, Busan 48513, Korea; chang@pknu.ac.kr (W.-D.C.); tjzjs12@pukyong.ac.kr (J.-H.C.)
2 School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Fukushima 965-8580, Japan
* Correspondence: jpshin@u-aizu.ac.jp
Abstract: Eye writing is a human–computer interaction tool that translates eye movements into characters using automatic recognition by computers. Eye-written characters are similar in form to handwritten ones, but their shapes are often distorted because of the instability of the biosignal or user mistakes. Various conventional methods have been used to overcome these limitations and recognize eye-written characters accurately, but difficulties in reducing the error rates have been reported. This paper proposes a method using a deep neural network with inception modules and an ensemble structure. Preprocessing procedures, which are often used in conventional methods, were minimized using the proposed method. The proposed method was validated in a writer-independent manner using an open dataset of characters eye-written by 18 writers. The method achieved a 97.78% accuracy, and the error rates were reduced by almost half compared to those of conventional methods, which indicates that the proposed model successfully learned eye-written characters. Remarkably, the accuracy was achieved in a writer-independent manner, which suggests that a deep neural network model trained using the proposed method would be stable even for new writers.
Keywords: artificial neural network; biosignal analysis; electrooculogram; eye-tracking; human–computer interface; pattern recognition
1. Introduction
Keyboards, mice, and touchscreens have been the most popular input devices for human–computer interaction (HCI) in recent decades, and they are useful for general everyday purposes. Additionally, novel types of interfaces for computer systems have been developed for specialized applications, such as education, medical care, arts, robot control, and games, utilizing gestures [1], voices [2], pens [3], and other devices [4,5]. Biosignal processing is drawing attention for these novel interfaces because it enables direct interactions between body movements and a computer. Directly interacting with computers through biosignals could significantly improve the user experience.
Biosignals used for HCI include electroencephalograms (EEG), electromyograms (EMG), and electrooculograms (EOG) [6–8]. EOG are directly related to eye movements and can thus be used for eye-tracking. Because of the difference in electric potentials between the retina and cornea of the eye, the potential increases on the side that the cornea approaches as the eye moves [9]. Eye movements can also be measured using optical or infrared cameras [10–13]. Camera-based methods have higher accuracy than EOG methods but suffer from limitations such as their high cost, complicated setup, and inconsistent recognition rates because of the variability in eyelid/eyelash movements among different individuals and contrast differences depending on the surrounding environment [4].
EOG-based eye tracking is relatively cheap and is not affected by lighting or the
physical condition of the eyelids. However, it is difficult to obtain clean data with this
method because various signals from the body are measured together with EOG, and
these are difficult to separate. Previous studies indicate that EOG-based eye-tracking often
attempts to estimate eye movements during a very short period (less than 1 s) with simple
and directional movements [14–18]. Yan et al. recognized a maximum of 24 patterns with this approach [19]. They classified eye movements in 24 directions with an average accuracy of 87.1%, but the performance was unstable, and the eyes needed to be turned by up to 75°, which is unnatural and inconvenient.
Recently, the concept of eye writing was introduced to overcome the limitations of
conventional EOG-based methods. Eye writing involves moving the eyes such that the gaze
traces the form of a letter, which increases the amount of information transfer compared
to conventional EOG-based eye tracking. The degree of freedom is as high as the number
of letters we may write. Recent studies have shown that eye writing can be used to trace
Arabic numerals, English alphabets, and Japanese katakana characters [20,21].
To recognize eye-written characters, various pattern-recognition algorithms have been
proposed. Tsai et al. proposed a system for recognizing eye-written Arabic numerals and four arithmetic symbols [22]. The system was developed using a heuristic algorithm and achieved a 75.5% accuracy (believability). Fang et al. utilized a hidden Markov model to recognize 12 patterns of Japanese katakana and achieved an 86.5% accuracy in writer-independent validation [20]. Lee et al. utilized dynamic time warping (DTW) to achieve an 87.38% accuracy for 36 patterns of numbers and English alphabets [21]. Chang et al. showed that the accuracy could be increased by combining dynamic positional warping, a modification of DTW, with a support vector machine (SVM) [23]. They achieved a 95.74% accuracy for Arabic numbers, which was 3.33 percentage points higher than when only DTW was used for the same dataset.
Increasing the recognition accuracy is critical for eye writing to be used as an HCI tool. This paper proposes a method to improve upon the accuracy of conventional methods using a deep neural network. The proposed method minimizes the preprocessing procedures and automatically finds the features in convolutional network layers. Section 2 describes the datasets, preprocessing, and network structures of the proposed method, Section 3 presents the experimental results, and Section 4 concludes the study and describes future work.
2. Materials and Methods
2.1. Dataset
In this study, the dataset presented by Chang et al. [23] was used because it is one of the few open datasets of eye-written characters. It contains eye-written Arabic numerals collected from 18 participants (5 females and 13 males; mean age 23.5 ± 3.28 years). The majority of the participants (17 out of 18) had no experience of eye writing before the experiment. The Arabic numeral patterns were specifically designed for eye writing to minimize user difficulty and reduce similarity among the patterns (Figure 1). The participants moved their eyes to follow the guide patterns in Figure 1 during the experiment.
The data were recorded at a sampling rate of 2048 Hz, and they comprised EOG
signals at four locations around the eye (two on the left and right sides of the eyes, and
two above and below the right eye). All 18 participants were healthy and did not have any
eye diseases. The detailed experimental procedures can be found in [23].
The total number of eye-written characters was 540, with 10 Arabic numerals written thrice by each participant. Figure 2 shows examples of eye-written characters in the dataset. The shapes in Figure 2 are very different from the original pattern designs because the recording started when the participants looked at the central point of a screen, and noises and artifacts were included during the experiments. There were also distortions caused by the participants' mistakes during the eye writing.
Figure 1. Pattern designs of Arabic numbers [23]. The red dots denote the starting points. Reprinted with permission.

Figure 2. Eye-written characters [23].
2.2. Preprocessing
The data were preprocessed in four steps: (1) resampling, (2) eye-blink removal, (3) re-
sizing, and (4) normalization (Figure 3). First, the signal was resampled to 64 Hz because
high sampling rates are unnecessary for EOG analysis (the signal was originally recorded at
2048 Hz). Second, eye blinks in the signals were removed using the maximum summation
of the first derivative within a window (MSDW) [24]. The MSDW filter generates two sequences: a filtered signal emphasizing eye blinks and the selected window size (W) at each data point. The MSDW filter utilizes a set of simple filters (F), known as SDW (summation of the first derivative within a window) filters. An SDW filter with a window size of W is defined as follows:

F_W(t) = S(t) − S(t − W), (1)

where S(t) is the t-th sample of the original signal and W is the width of the sliding window.
For every time t, the following steps are performed to obtain an MSDW output:
Figure 3. Preprocessing flow.
(1) Calculate SDWs with different Ws, considering a typical eye-blink range.
(2) Choose the maximum SDW at time t as the output of MSDW if it satisfies the conditions below:
    a. The numbers of local minima and maxima are the same within the range of [t − W, t];
    b. All the first derivatives from time t to t − W + 1 should be within S′(t − W + 1) and S′(t), where S′(t) is the first derivative at time t.
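For concreteness, the SDW filter in Equation (1) and the selection rule above can be sketched as follows. The candidate window sizes, the use of absolute magnitude for picking the maximum response, and other implementation details are assumptions for illustration; the exact algorithm is described in [24].

```python
import numpy as np

def msdw(s, window_sizes=range(2, 33)):
    """Sketch of the MSDW filter: SDW responses (Equation (1)) plus the selection rule above."""
    s = np.asarray(s, dtype=float)
    d = np.diff(s, prepend=s[0])                       # first derivative S'(t)
    filtered = np.zeros_like(s)                        # MSDW output
    chosen_w = np.zeros(len(s), dtype=int)             # selected window size at each point
    for t in range(len(s)):
        for w in window_sizes:
            if t - w < 0:
                break
            seg = s[t - w:t + 1]
            # condition (a): equal numbers of local maxima and minima within [t - W, t]
            inner = seg[1:-1]
            n_max = np.count_nonzero((inner > seg[:-2]) & (inner > seg[2:]))
            n_min = np.count_nonzero((inner < seg[:-2]) & (inner < seg[2:]))
            if n_max != n_min:
                continue
            # condition (b): first derivatives bounded by S'(t - W + 1) and S'(t)
            dseg = d[t - w + 1:t + 1]
            lo, hi = sorted((dseg[0], dseg[-1]))
            if np.any(dseg < lo) or np.any(dseg > hi):
                continue
            value = s[t] - s[t - w]                    # SDW value F_W(t), Equation (1)
            if abs(value) > abs(filtered[t]):          # keep the largest response (assumption)
                filtered[t], chosen_w[t] = value, w
    return filtered, chosen_w
```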
The ranges of the eye-blink region (R) were determined using the following equation:

R = {[T(Max_{i−j}) − W_{T(Max_{i−j})}, T(Min_i)]}, (2)

where W_t is the window size at time t from the MSDW filter output, and T(Max_i) and T(Min_i) are the time points of the i-th local maximum and minimum, respectively [24]. The integer value j is determined to maximize Max_{i−j} − Min_i. The detected regions were removed and interpolated using the beginning and end points of each range. Third, all the signals were resized to have the same length because the eye-written characters were recorded over varying time periods, from 1.69 to 23.51 s, and the use of convolutional layers requires all signals to have the same length. All the signals were resized to length L, where L is the mean of all signal lengths of the raw data. Finally, the signals were normalized such that they were within a 1 × 1 box in 2D space, keeping the aspect ratio unchanged.
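The removal, resizing, and normalization steps might be sketched as follows, assuming the blink regions have already been located with the MSDW filter and that each character is represented by horizontal and vertical traces; variable names are illustrative only.

```python
import numpy as np

def interpolate_region(signal, start, end):
    """Replace samples in [start, end] with a line between the region's end points."""
    out = np.asarray(signal, dtype=float).copy()
    out[start:end + 1] = np.linspace(out[start], out[end], end - start + 1)
    return out

def resize_signal(signal, target_len):
    """Resample a 1D signal to target_len samples by linear interpolation."""
    x_old = np.linspace(0.0, 1.0, len(signal))
    x_new = np.linspace(0.0, 1.0, target_len)
    return np.interp(x_new, x_old, signal)

def normalize_trace(h, v):
    """Fit the 2D eye trace into a 1 x 1 box while keeping the aspect ratio unchanged."""
    h, v = h - h.min(), v - v.min()
    scale = max(h.max(), v.max()) or 1.0    # guard against an all-constant trace
    return h / scale, v / scale
```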
The preprocessing procedure in [23] was used after the following simplification: the saccade detection and crosstalk removal steps were omitted. This is because convolutional networks can extract features by themselves, and additional feature extraction methods often decrease the accuracy.
2.3. Deep Neural Network
A deep neural network model was proposed to train and recognize the eye-written characters (Figure 4). The model was designed with an inception architecture inspired by GoogLeNet [25]. The inception model was simplified from its original form because complicated networks were easily overfitted, as the data size was limited (only 540 eye-written characters). As shown in the figure, the network structure consists of four convolution blocks followed by two fully connected blocks. A convolution block is assembled from four convolutional layers in parallel, a concatenation layer, and a max-pooling layer. The kernel sizes of the four convolutional layers in a convolution block were set to 1, 3, 5, and 7, and the numbers of filters were set differently according to the position of the block. The numbers of filters at the first and second blocks were set to 8 and 16, respectively; the numbers of filters at the third and fourth blocks were set to 32. The pooling and stride sizes were set to 3 for all the convolution blocks. The numbers of nodes at the first and second fully connected layers were set to 30 and 10, respectively, and a dropout layer with a rate of 0.5 was attached before each fully connected layer. Parameters such as filter size, number of filters, and dropout ratios were set experimentally.
Figure 4. Proposed network structure.
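As an illustration of the architecture described above, a minimal Keras/TensorFlow sketch is given below. The kernel sizes, filter counts, pooling and stride sizes, fully connected layer sizes, and dropout rate follow the text; the input shape (resized length L and number of channels), the ReLU activations, and the softmax output layer are assumptions rather than details stated in the paper.

```python
from tensorflow.keras import layers, models

def conv_block(x, n_filters):
    """Inception-style block: four parallel convolutions, concatenation, and max pooling."""
    branches = [layers.Conv1D(n_filters, k, padding="same", activation="relu")(x)
                for k in (1, 3, 5, 7)]                  # kernel sizes from Section 2.3
    x = layers.Concatenate()(branches)
    return layers.MaxPooling1D(pool_size=3, strides=3)(x)

def build_model(signal_length=128, n_channels=2, n_classes=10):
    inputs = layers.Input(shape=(signal_length, n_channels))   # assumed input shape
    x = inputs
    for n_filters in (8, 16, 32, 32):                   # filters of the four convolution blocks
        x = conv_block(x, n_filters)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)                          # dropout before each dense layer
    x = layers.Dense(30, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```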
2.4. Ensemble Method
The training results varied in every trial because of the randomness of deep neural
networks. Therefore, we employed an ensemble method to reduce the uncertainty and
improve the accuracy of the networks. We trained the networks 10 times using the same
training data and obtained 10 models with different weights. In the testing/inferencing
phase, 10 outputs were obtained from all the trained models when an eye-written character
was input. The final output was defined as the median of the 10 outputs. Because each
output was a vector with a length of 10, the median operator was applied to the scalars at
the same positions. This mechanism is illustrated in Figure 5.
Figure 5. Ensemble method.
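A minimal sketch of this median ensemble is shown below, assuming `models` is a list of the ten independently trained networks and that each returns a length-10 output vector per input character.

```python
import numpy as np

def ensemble_predict(models, x):
    """Element-wise median over the outputs of the ten trained models."""
    outputs = np.stack([m.predict(x) for m in models], axis=0)   # (n_models, n_samples, 10)
    median_output = np.median(outputs, axis=0)                   # (n_samples, 10)
    return np.argmax(median_output, axis=-1)                     # predicted digit per sample
```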
2.5. Evaluation
The proposed method was evaluated in a writer-independent manner using a leave-one-subject-out validation approach. This enabled us to maximize the amount of training data while completely separating the test data from the training data. We used the characters written by 17 writers as training data and those of the remaining writer as the test data. This was repeated 18 times such that all the writers' characters were used as the test data once.
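A sketch of this validation loop is given below; `signals`, `labels`, and `writer_ids` are hypothetical arrays holding the preprocessed characters, their digit labels, and the writer index of each character, and `train_and_test` stands in for training the ensemble and measuring its test accuracy.

```python
import numpy as np

def loso_accuracy(signals, labels, writer_ids, train_and_test):
    """Mean accuracy over 18 leave-one-subject-out folds (one fold per writer)."""
    per_writer = []
    for held_out in np.unique(writer_ids):
        train = writer_ids != held_out          # characters of the remaining 17 writers
        test = writer_ids == held_out           # characters of the held-out writer
        per_writer.append(train_and_test(signals[train], labels[train],
                                         signals[test], labels[test]))
    return float(np.mean(per_writer))
```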
The method was evaluated with recognition accuracies, precision, recall, and F1 score,
which were calculated as follows:
Accuracy = TP/P, (3)
Precision = TP/(TP + FP), (4)
Recall = TP/(TP + FN), (5)
F1 score = 2TP/(2TP + FP + FN), (6)
where P denotes the number of total characters, and TP (true positives) denotes the number
of correctly classified characters. The accuracy was utilized to evaluate overall classification
performance, and the other metrics were utilized to evaluate classification performance
for each character. FP (false positives) and FN (false negatives) were calculated for each
target letter group. FP was the number of characters in the other letter groups which were
classified as the target letter, and FN was the number of the characters in the target group
which were classified as other letter groups by the trained network.
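Equations (3)–(6) can be computed per character from a 10 × 10 confusion matrix, as in the sketch below, where C[i, j] counts the characters of class i classified as class j (this layout is an assumption).

```python
import numpy as np

def per_class_metrics(C):
    """Accuracy, and per-class precision, recall, and F1 score from a confusion matrix."""
    C = np.asarray(C, dtype=float)
    tp = np.diag(C)                       # correctly classified characters per class
    fp = C.sum(axis=0) - tp               # other classes predicted as this class
    fn = C.sum(axis=1) - tp               # this class predicted as other classes
    accuracy = tp.sum() / C.sum()         # Equation (3)
    precision = tp / (tp + fp)            # Equation (4)
    recall = tp / (tp + fn)               # Equation (5)
    f1 = 2 * tp / (2 * tp + fp + fn)      # Equation (6)
    return accuracy, precision, recall, f1
```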
The Adam optimizer was used with the AMSGrad option [26], and the learning rate was set to 0.001. The training was repeated over 200 epochs with a batch size of 128. The learning rate and number of epochs were derived experimentally using data from all the writers. This does not mean that the parameters were fully optimized; rather, we found after a number of trials that the accuracy was stable with these settings.
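The training configuration described above corresponds to a Keras setup along the following lines; the loss function is an assumption based on the 10-class softmax output, and `x_train`/`y_train` are hypothetical training arrays.

```python
import tensorflow as tf

model = build_model()                     # network sketch from Section 2.3
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001, amsgrad=True),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=200, batch_size=128)   # hypothetical training arrays
```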
3. Results
The proposed method achieved a 97.78% accuracy for 10 Arabic numbers in the writer-independent validation, with 12 errors among 540 characters. Table 1 compares the results of the proposed method and the conventional methods in the literature. This indicates that the proposed method increased the accuracy by employing a deep neural network, reducing the error rate by approximately half, from 4.26% as reported in [23] to 2.22%. We can directly compare the results because the same dataset was used in [23] and in this study.
It is difficult to compare the current results to those of previous studies other than [23] because of the different experimental conditions. Instead of a direct comparison, we indirectly compared the results using the accuracy differences between DTW and the deep neural network (DNN) on the dataset from [23], because DTW was also used in previous studies. Notably, the error rate dropped significantly from 7.59% to 2.22% after employing the proposed DNN. Although the numbers of patterns in [20,21] were larger than in our dataset, the accuracy also increased when DNN was employed instead of DTW.
It is remarkable that the proposed method achieved higher accuracies than the previous methods in a writer-independent manner. The network was trained on the data from 17 writers and tested with the data from an unknown writer, which means that a similar accuracy can be expected when a new writer's data are tested with the pretrained model. This makes the results trustworthy because the training and test datasets were independent of each other. Deep neural networks are commonly trained with a large amount of training data, which implies that the accuracy could be improved with a bigger dataset.
Table 1. Recognition accuracies of eye-written characters with different methods.

Method | Character Set (Number of Patterns) | Number of Participants | Writer-Dependent/Independent | Accuracies (Metrics)
Heuristic [22] | Arabic numbers, arithmetic symbols (14) | 11 | Independent | 75.5 (believability)
DTW [21] | English alphabets (26) | 20 | Dependent | 87.38% (F1 score)
HMM [20] | Japanese katakana (12) | 6 | Independent | 86.5% (F1 score)
DTW [20] | Japanese katakana (12) | 6 | Independent | 77.6% (F1 score)
DNN-HMM [20] | Japanese katakana (12) | 6 | Dependent | 93.8% (accuracy)
GMM-HMM [20] | Japanese katakana (12) | 6 | Dependent | 93.5% (accuracy)
DTW [23] | Arabic numbers (10) | 18 | Independent | 92.41% (accuracy)
DPW [23] | Arabic numbers (10) | 18 | Independent | 94.07% (accuracy)
DTW-SVM [23] | Arabic numbers (10) | 18 | Independent | 94.08% (accuracy)
DPW-SVM [23] | Arabic numbers (10) | 18 | Independent | 95.74% (accuracy)
DNN (proposed) | Arabic numbers (10) | 18 | Independent | 97.78% (accuracy)
Table 2 and Figure 6 show the accuracy of the proposed method for each character. Number 2 shows the lowest F1 score of 95.41%, and numbers 0, 3, 5, and 6 show accuracies of over 99.0%. The errors are not concentrated in a certain pair of characters, but they are distributed broadly over the confusion matrix.
Figure 6. Confusion matrix of the proposed method.
Figure 7 shows the entire list of misrecognized characters. Many of the misrecognized characters had additional eye movements (a, b, h, i, j, k) or long-term fixations (c, d, e, j, k) at certain points in the middle. Some characters were written differently from their references and had distorted shapes (d, i, k).
Table 2. Accuracies of the proposed method over characters (%).

Character Precision Recall F1 Score
0 100.00 98.15 99.07
1 98.11 96.30 97.20
2 94.55 96.30 95.41
3 100.00 98.15 99.07
4 96.43 100.00 98.18
5 100.00 98.15 99.07
6 98.18 100.00 99.08
7 98.08 94.44 96.23
8 94.64 98.15 96.36
9 98.15 98.15 98.15
Table 3 summarizes the recognition accuracies of the participants with different classifiers. Evidently, the proposed method improved the accuracy for most of the participants. There were two cases in which the accuracy decreased: participant #10 (Figure 7c–e) and participant #17 (Figure 7i,j). The errors were because of the long-term fixation of the eyes during eye writing, as shown in Figure 7.
Table 3. Recognition accuracy for each participant with different classifiers.
Participant Number DPW + SVM [23] DPW [23] DTW + SVM [23] DTW [23] Proposed
1 96.67 93.33 96.67 96.67 100.00
2 83.33 90.00 83.33 83.33 93.33
3 90.00 100.00 86.67 93.33 100.00
4 96.67 93.33 86.67 96.67 100.00
5 100.00 100.00 100.00 100.00 100.00
6 100.00 100.00 100.00 93.33 100.00
7 100.00 96.67 90.00 86.67 100.00
8 100.00 93.33 100.00 96.67 100.00
9 100.00 96.67 96.67 96.67 100.00
10 93.33 96.67 96.67 96.67 90.00
11 96.67 93.33 96.67 96.67 100.00
12 96.67 90.00 100.00 93.33 100.00
13 96.67 93.33 90.00 93.33 93.33
14 96.67 90.00 93.33 86.67 96.67
15 100.00 100.00 96.67 100.00 100.00
16 96.67 90.00 90.00 83.33 100.00
17 93.33 90.00 96.67 80.00 93.33
18 86.67 86.67 93.33 90.00 96.67
Avg 95.74 94.07 94.08 92.41 97.96
SD 4.83 4.21 5.18 6.03 3.26
The accuracy evidently increased through the trials (Table 4). There were eight errors in the first trial across all participants but only two in each of the second and third trials. This shows that the participants became accustomed to the eye-writing process after a short practice, which is beneficial for an HCI tool.
Table 4. Number of errors and accuracies through the trials.

Trial 1 2 3
Number of errors 8 2 2
Accuracy (%) 95.56 98.89 98.89
Figure 7. Misrecognized eye-written characters from (a) to (k). A→B in each subfigure denotes the number A misclassified as B.
4. Conclusions
This paper proposed a method with a deep neural network model and an ensemble structure to recognize eye-written characters for eye-based HCI. The proposed method achieved a 97.78% accuracy for Arabic numerals eye-written by 18 healthy participants, reducing the error rate to approximately half that of the conventional methods. The proposed method is expected to remain effective for new users outside of our dataset because the validation was conducted in a writer-independent manner. Moreover, the accuracy is expected to increase if additional data are used to train the network.
One limitation of this study is that the proposed method was validated with a single dataset of Arabic numbers only. In future work, the proposed method should be validated with different datasets, such as English alphabets and more complex characters such as Japanese and Korean. An automatic triggering system for eye writing is another potential research topic for eye-based HCI.
Author Contributions:
Conceptualization, J.S. and W.-D.C.; methodology, W.-D.C.; implementation,
W.-D.C.; experimental validation, J.-H.C.; investigation, J.S.; writing—original draft preparation,
W.-D.C. and J.-H.C.; writing—review and editing, W.-D.C. and J.S. All authors have read and agreed
to the published version of the manuscript.
Funding: This work was supported by a grant from the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2020R1F1A1077162), and by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No. 20016150, development of a low-latency multi-functional microdisplay controller for eye-wear devices).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Sonoda, T.; Muraoka, Y. A letter input system based on handwriting gestures. Electron. Commun. Jpn. Part III Fundam. Electron. Sci. (Engl. Transl. Denshi Tsushin Gakkai Ronbunshi) 2006, 89, 53–64. [CrossRef]
2. Lee, K.-S. EMG-based speech recognition using hidden Markov models with global control variables. IEEE Trans. Biomed. Eng. 2008, 55, 930–940. [CrossRef] [PubMed]
3. Shin, J. On-line cursive hangul recognition that uses DP matching to detect key segmentation points. Pattern Recognit. 2004, 37, 2101–2112. [CrossRef]
4. Chang, W.-D. Electrooculograms for human–computer interaction: A review. Sensors 2019, 19, 2690. [CrossRef]
5. Sherman, W.R.; Craig, B.A. Input: Interfacing the Participants with the Virtual World Understanding. In Virtual Reality; Morgan Kaufmann: Cambridge, MA, USA, 2018; pp. 190–256. ISBN 9780128183991.
6. Wolpaw, J.R.; McFarland, D.J.; Neat, G.W.; Forneris, C.A. An EEG-based brain-computer interface for cursor control. Electroencephalogr. Clin. Neurophysiol. 1991, 78, 252–259. [CrossRef]
7. Han, J.-S.; Bien, Z.Z.; Kim, D.-J.; Lee, H.-E.; Kim, J.-S. Human-machine interface for wheelchair control with EMG and its evaluation. In Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE Cat. No. 03CH37439), Cancun, Mexico, 17–21 September 2003; IEEE: Manhattan, NY, USA, 2003; Volume 2, pp. 1602–1605.
8. Jang, S.-T.; Kim, S.-R.; Chang, W.-D. Gaze tracking of four direction with low-price EOG measuring device. J. Korea Converg. Soc. 2018, 9, 53–60.
9. Malmivuo, J.; Plonsey, R. Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields; Oxford University Press: New York, NY, USA, 1995.
10. Sáiz-Manzanares, M.C.; Pérez, I.R.; Rodríguez, A.A.; Arribas, S.R.; Almeida, L.; Martin, C.F. Analysis of the learning process through eye tracking technology and feature selection techniques. Appl. Sci. 2021, 11, 6157. [CrossRef]
11. Scalera, L.; Seriani, S.; Gallina, P.; Lentini, M.; Gasparetto, A. Human–robot interaction through eye tracking for artistic drawing. Robotics 2021, 10, 54. [CrossRef]
12. Wöhle, L.; Gebhard, M. Towards robust robot control in cartesian space using an infrastructureless head- and eye-gaze interface. Sensors 2021, 21, 1798. [CrossRef] [PubMed]
13. Dziemian, S.; Abbott, W.W.; Aldo Faisal, A. Gaze-based teleprosthetic enables intuitive continuous control of complex robot arm use: Writing & drawing. In Proceedings of the 6th IEEE International Conference on Biomedical Robotics and Biomechatronics, Singapore, 26–29 June 2016; IEEE: Singapore, 2016; pp. 1277–1282.
14. Barea, R.; Boquete, L.; Mazo, M.; López, E. Wheelchair guidance strategies using EOG. J. Intell. Robot. Syst. Theory Appl. 2002, 34, 279–299. [CrossRef]
15. Wijesoma, W.S.; Wee, K.S.; Wee, O.C.; Balasuriya, A.P.; San, K.T.; Soon, K.K. EOG based control of mobile assistive platforms for the severely disabled. In Proceedings of the IEEE Conference on Robotics and Biomimetics, Shatin, China, 5–9 July 2005; pp. 490–494.
16. LaCourse, J.R.; Hludik, F.C.J. An eye movement communication-control system for the disabled. IEEE Trans. Biomed. Eng. 1990, 37, 1215–1220. [CrossRef] [PubMed]
17. Kim, M.R.; Yoon, G. Control signal from EOG analysis and its application. World Acad. Sci. Eng. Technol. Int. J. Electr. Electron. Sci. Eng. 2013, 7, 864–867.
18. Kaufman, A.E.; Bandopadhay, A.; Shaviv, B.D. An eye tracking computer user interface. In Proceedings of the IEEE Symposium on Research Frontiers in Virtual Reality, San Jose, CA, USA, 23–26 October 1993; pp. 120–121.
19. Yan, M.; Tamura, H.; Tanno, K. A study on gaze estimation system using cross-channels electrooculogram signals. In Proceedings of the International MultiConference of Engineers and Computer Scientists, Hong Kong, China, 12–14 March 2014; Volume I, pp. 112–116.
20. Fang, F.; Shinozaki, T. Electrooculography-based continuous eye-writing recognition system for efficient assistive communication systems. PLoS ONE 2018, 13, e0192684. [CrossRef] [PubMed]
21. Lee, K.-R.; Chang, W.-D.; Kim, S.; Im, C.-H. Real-time "eye-writing" recognition using electrooculogram (EOG). IEEE Trans. Neural Syst. Rehabil. Eng. 2017, 25, 37–48. [CrossRef]
22. Tsai, J.-Z.; Lee, C.-K.; Wu, C.-M.; Wu, J.-J.; Kao, K.-P. A feasibility study of an eye-writing system based on electro-oculography. J. Med. Biol. Eng. 2008, 28, 39–46.
23. Chang, W.-D.; Cha, H.-S.; Kim, D.Y.; Kim, S.H.; Im, C.-H. Development of an electrooculogram-based eye-computer interface for communication of individuals with amyotrophic lateral sclerosis. J. Neuroeng. Rehabil. 2017, 14, 89. [CrossRef] [PubMed]
24. Chang, W.-D.; Cha, H.-S.; Kim, K.; Im, C.-H. Detection of eye blink artifacts from single prefrontal channel electroencephalogram. Comput. Methods Programs Biomed. 2016, 124, 19–30. [CrossRef] [PubMed]
25. Szegedy, C.; Reed, S.; Sermanet, P.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–12.
26. Reddi, S.J.; Kale, S.; Kumar, S. On the convergence of Adam and beyond. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–23.