A Hand-Gesture Recognition System Using Image Processing to Translate Indian Sign Language Alphabets to Text

Abhay MS1, Ashwin Hebbar1, Sayan Ghosh1 and K. Kalaiselvi2*
1Student, Kristu Jayanti College, Bengaluru, Karnataka, India.
2Faculty, Department of Computer Science, Kristu Jayanti College, Bengaluru, Karnataka, India.
Received: 24 Dec 2022 Revised: 04 Jan 2023 Accepted: 24 Jan 2023
*Address for Correspondence
K. Kalaiselvi
Faculty, Department of Computer Science,
Kristu Jayanti College, Bengaluru,
Karnataka, India
This is an Open Access Journal / article distributed under the terms of the Creative Commons Attribution License
(CC BY-NC-ND 3.0) which permits unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited. All rights reserved.
ABSTRACT

Most interpersonal communication takes place through the use of languages. Differently abled people are compelled to employ alternative forms of communication, such as sign language. This makes communication possible amongst people with special needs, but communication with the general public remains restricted, and this gap may lead to misinterpretation of information or tone. In order to address these issues, a method of translating Indian Sign Language to text and speech is implemented in this work. The method records the input as serially rendered Indian Sign Language alphabets using a webcam or video file. The data is then processed using a machine learning model to identify the distinct alphabets, which can subsequently be translated into other languages, such as regional Indian languages. The entire workflow is enabled using technologies like OpenCV for image recognition and Keras for the AI/ML model.
Keywords: Image Processing, OpenCV, Tensorflow2, Keras, Machine Learning, Computer Intelligence,
Transfer Learning
INTRODUCTION
According to the Census of India 2011, there are approximately 2.68 million deaf people in India. It is worth noting
that this number may not include individuals with mild or moderate hearing loss, or those who do not self-identify
as deaf. The Census of India does not provide separate data on the number of people with speech disabilities. A sizable
section of the deaf and hard of hearing population in India uses Indian Sign Language (ISL), a natural language, as a
form of communication. ISL users frequently experience communication issues and social isolation as a result of the
hearing population's limited familiarity with the language. Growing efforts have been made in recent years to close
this communication gap by creating technology that instantly translates ISL into text and speech. One such
technology is a hand-gesture recognition system that translates and interprets ISL using image processing. The
capacity of people who use ISL to interact with the hearing population and fully participate in society might be
significantly improved by this kind of technology. We describe a hand-gesture recognition system in this study that
is intended exclusively for the conversion of ISL into text and voice. The system's conception and execution, as well
as the findings of the analyses, are outlined. This technology has the potential to significantly enhance the lives of
people who use ISL as their primary form of communication and marks an important development in the field of ISL
translation.
LITERATURE REVIEW
There have been multiple previous approaches in this domain. Most of them can be broadly categorised by how the system detects the user's gestures:
1. Systems that use a video-based input device like a camera to capture the user's hand gestures utilising computer
vision.
2. Systems that read information from the user's arm and upper torso using physical sensors.
Sensor-based systems constrain how the user may move, while optical systems demand more processing power and are more frequently affected by noise. Although sensor-based systems can achieve higher accuracy, optical systems are less expensive to build and deploy, since they can run on any general-purpose computer with reasonable processing power. Given the ubiquitous availability of laptops and desktop computers with built-in webcams, this
endeavour is practically free. There are numerous such systems that use skeletal sensors, lidar, infrared, ultrasonic,
magnetic imaging, and cameras to recognise sign language. Most systems employ only hand alphabets, not complete signs, which combine facial gestures, motion, and body language to convey full context. Numerous commercial solutions that use a glove of some kind to recognise sign language and translate it into English have also been granted patents. Even though millions of people experience communication challenges as a result of their disability, work in this sector has generally been stagnant, particularly for Indian Sign Language. Additionally, sensor-based
solutions lose their utility in low-income areas where it is more difficult to buy specialised equipment. The most
advantageous solution for this would be one that runs on a general-purpose computer and operating system.
METHODOLOGY
Hardware and Software Tools
In this paper, a hand-gesture recognition system specifically designed for the translation of ISL to text is presented. The hardware components of the system include a camera for capturing live images of hand gestures and sending them to the model for prediction, a processor for analysing the images and performing the required probability calculations, and a display for outputting the translated text and speech. The software components consist of the image processing library OpenCV for interpreting the hand gestures, identifying the position, angle, and orientation of the hands, and sending them to the model, and an operating system to manage the hardware and software and provide a user interface.
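As a concrete illustration, the following is a minimal sketch of such a capture-and-predict loop, assuming a retrained Keras model saved as isl_model.h5 (a hypothetical filename) and class labels ordered as the dataset's folders, digits 1-9 followed by letters A-Z (also an assumption):

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.applications.inception_resnet_v2 import preprocess_input

# Class labels assumed to follow the dataset's folder order: digits, then letters.
LABELS = [str(d) for d in range(1, 10)] + [chr(c) for c in range(ord("A"), ord("Z") + 1)]

model = load_model("isl_model.h5")  # hypothetical path to the retrained model
cap = cv2.VideoCapture(0)           # 0 = default webcam; a video file path also works

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Convert BGR (OpenCV's order) to RGB and resize to InceptionResnetV2's input size.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    roi = cv2.resize(rgb, (299, 299)).astype("float32")
    probs = model.predict(preprocess_input(roi)[np.newaxis, ...], verbose=0)[0]
    letter = LABELS[int(np.argmax(probs))]
    # Overlay the predicted alphabet on the live frame.
    cv2.putText(frame, letter, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    cv2.imshow("ISL translator", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```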
Data Collection and Annotation
In order to develop and evaluate the hand-gesture recognition system, a dataset of hand gestures annotated with
their corresponding ISL translations was required. For this, we obtained a dataset from Kaggle, a popular online
platform for data science and machine learning. The Kaggle dataset consisted of a large number of images of hand
gestures made by native ISL users, along with their corresponding ISL translations. The Kaggle dataset was carefully
curated to ensure that it contained a diverse range of hand gestures and covered a wide range of ISL translations.
The data was pre-annotated to label the hand gestures in the dataset with their corresponding ISL translations. This
process of data collection and annotation was essential for training and evaluating the performance of the hand-
gesture recognition system. Overall, the Kaggle dataset proved to be an invaluable resource for the research and
allowed us to develop and evaluate a high-quality hand-gesture recognition system for the translation of ISL to text and
speech.
Details on the Dataset Used
The dataset contains 1,200 samples of each letter (A-Z) and each digit (1-9), giving 35 classes in total. Additionally, we split the dataset into Training, Testing, and Validation sets: the Training set consists of 90% of all images, while the Testing and Validation sets each hold 5%. It should be noted that the Training, Validation, and Testing sets contain mutually exclusive images.
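A minimal sketch of such a 90/5/5 split, assuming the Kaggle dataset is unpacked into one folder per class under data/ (a hypothetical layout):

```python
import os
import random
import shutil

random.seed(42)
src = "data"  # hypothetical layout: data/<class>/<image>.jpg

for cls in os.listdir(src):
    images = os.listdir(os.path.join(src, cls))
    random.shuffle(images)
    n = len(images)
    splits = {
        "train": images[: int(0.90 * n)],
        "val":   images[int(0.90 * n): int(0.95 * n)],
        "test":  images[int(0.95 * n):],
    }
    # Copy each image into exactly one of train/, val/ or test/,
    # so the three sets stay mutually exclusive.
    for split, names in splits.items():
        dst = os.path.join(split, cls)
        os.makedirs(dst, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(src, cls, name), os.path.join(dst, name))
```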
Model, Training and Evaluation
In this paper, a transfer learning approach is presented using the pre-trained ‘InceptionResnetV2’ machine learning
Model. Transfer Learning is a methodology used in Machine Learning to retrain an existing model to better suit a
given specific use case. InceptionResnetV2 is a Convolutional Neural Network that is pre-trained on more than a
million images. It is 164 layers deep and can classify images into up to 1,000 categories. We will be passing a
small subset of the training set for the model to adapt and specifically learn to recognise the dataset at hand. A
supervised learning method was used to adapt the pre-trained model: a sizable number of hand gesture
examples and their accompanying ISL translations were fed into the system. These examples were used by the
system to train it to identify and categorise various hand gestures and to produce the correct ISL translation for each
gesture. A number of tests were employed to assess the system's performance. First, we tested the system's ability to
generalise from training data to cases it had never seen before without losing accuracy or confidence. High accuracy
on the test set demonstrated that the system had successfully learned from the training data. A thorough
investigation of the system's performance on several subsets of the data in addition to the train-test evaluation was
also done. This allowed us to spot any gaps in the system's capacity to interpret particular ISL translations or
recognise particular hand motions. Having a solid pre-trained model like the InceptionResnetV2 made things
simpler here, as the model learns and adapts quickly to the dataset at hand.
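The following is a minimal sketch of this transfer-learning setup in Keras, assuming the 35-class dataset described above; the added head layers and hyperparameters are illustrative choices, not the exact configuration used:

```python
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras import layers, models

# Load the ImageNet-pretrained backbone without its 1,000-class top layer.
base = InceptionResNetV2(weights="imagenet", include_top=False,
                         input_shape=(299, 299, 3))
base.trainable = False  # keep the pre-trained convolutional weights fixed

# Attach a small new classification head for our 35 signs (26 letters + 9 digits).
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(35, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```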
Re-training the model to suit our dataset
Since we start from a pretrained InceptionResnetV2 model, the Python library Keras, which is built on top of Tensorflow2, is used for the retraining. In this case, instead of the 1,200 images for each of the 35 categories, we use only 400 images. It was found that 5 epochs (rounds) of training, with the given dataset, reached a satisfactory level of minimising loss and maximising accuracy. Further increasing the number of epochs did not meaningfully improve loss or accuracy, and the model stagnated beyond 5 epochs. We tracked metrics such as training loss and validation loss; the results are discussed further below.
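A minimal sketch of such a retraining run, reusing the model from the previous sketch and the directory layout from the split sketch (both assumptions), with the 400-images-per-class subset assumed to already sit under train/:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.inception_resnet_v2 import preprocess_input

# Feed images through the same preprocessing the backbone was trained with.
gen = ImageDataGenerator(preprocessing_function=preprocess_input)
train = gen.flow_from_directory("train", target_size=(299, 299), batch_size=32)
val = gen.flow_from_directory("val", target_size=(299, 299), batch_size=32)

# Five epochs were enough for loss and accuracy to plateau on this dataset.
history = model.fit(train, validation_data=val, epochs=5)
```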
Limitations and Challenges
Although the methodology performs well on the given dataset, the dataset itself is close to ideal, with decent lighting and accurate labelling for each image. How the model performs on images with particularly bad lighting conditions is yet to be evaluated. Our model is also not fast enough for real-time translation, at least for now. This requires further optimisation, and perhaps even a complete overhaul of the model to make it lighter and reduce its parameter count. InceptionResnetV2 is a very heavy model and is not fast enough for real-time translation applications on today's hardware. It should be noted, however, that with handheld devices growing more powerful with every generation, and some even shipping dedicated neural processing hardware, this is unlikely to remain an issue in the future. There are also disadvantages to transfer learning. One of the main issues is negative transfer, where differences between the data the model was pre-trained on and the new data fed into it can cause the weights to adapt poorly and yield a model that performs badly. This is especially important to keep in mind for critical applications like translating sign language to text, where mistranslations can have serious consequences in certain situations.
RESULTS
After training the model in the above-mentioned methodology, we have obtained the following results:
Loss and Accuracy
Figs 2 and 3 illustrate the loss and accuracy of our retrained model. As expected, the accuracy on the training set was very high (close to 100%) even in the first epoch, while on the validation set we see a steady increase in accuracy, exceeding 99.1% as we approach the 5th epoch. A similar trend can be seen in the loss, which decreases with every epoch. Validation loss measures how poorly the model fits never-before-seen data, and it drops as low as 0.3 as we approach the 5th epoch.
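Curves like those in Figs 2 and 3 can be produced from the History object returned by model.fit(); a minimal sketch, assuming the history variable from the retraining sketch above:

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Left panel: training vs. validation loss per epoch.
ax1.plot(history.history["loss"], label="training loss")
ax1.plot(history.history["val_loss"], label="validation loss")
ax1.set_xlabel("epoch")
ax1.legend()
# Right panel: training vs. validation accuracy per epoch.
ax2.plot(history.history["accuracy"], label="training accuracy")
ax2.plot(history.history["val_accuracy"], label="validation accuracy")
ax2.set_xlabel("epoch")
ax2.legend()
plt.show()
```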
Confusion Matrix, and Most Mis-predicted Classes
In the current model, the letters V, O, and C were the most mis-predicted; Fig 5 shows the confusion matrix for all the classes. As seen from Figs 4 and 5, almost all the mis-predictions involve the letter I, which is mis-predicted as V, O, or C. Upon further inspection, we think this may be due to how the model interprets these images at a fundamental level. The errors are most prevalent for the signs for the letters O and C, both of which can be drawn with a single 'stroke' when viewed from a purely geometric, two-dimensional standpoint.
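A minimal sketch of how such a confusion matrix can be computed with scikit-learn over the held-out test set, reusing the gen and model names from the earlier sketches (an assumption):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# shuffle=False keeps predictions aligned with test.classes (the true labels).
test = gen.flow_from_directory("test", target_size=(299, 299),
                               batch_size=32, shuffle=False)
probs = model.predict(test)
y_pred = np.argmax(probs, axis=1)
cm = confusion_matrix(test.classes, y_pred)  # rows: true class, cols: predicted
```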
CONCLUSION
In this study, a method to predict the alphabet signs of Indian Sign Language using Keras, a machine learning package built on top of Tensorflow2, was explored. The outcome is a practical application that can be deployed for testing in the creation of technology that benefits differently abled individuals in India and throughout the world. A transfer learning approach was used to achieve this. Other works with similar aims have taken a hardware approach, while some others require expensive camera equipment for prediction. Our method works well with even a phone camera or a laptop webcam; although it still relies on a fairly heavy machine learning model, this makes it highly accurate and suitable for further testing of such technology for the differently abled.
Future Prospects
The future prospects for such a solution are vast. With a lighter and more specialised model, it is possible to make this application run in real time, which would prove very helpful for the differently abled among us. Depending on the model and implementation, the application can take the form of a mobile or desktop application that can be used to communicate better with everybody. It is also possible to adapt this model to other sign languages from around the world. As the population grows, there will always be a need for such technology in all corners of the world, for people from all walks of life. With technology becoming cheaper by the day, a solution like this will help make the world a more accessible place.
REFERENCES
1. Hernandez-Rebollar, J. L., Kyriakopoulos, N., & Lindeman, R. W. (2004, May). A new instrumented approach for
translating American Sign Language into sound and text. In Sixth IEEE International Conference on Automatic Face
and Gesture Recognition, 2004. Proceedings. (pp. 547-552). IEEE.
2. Escudeiro, P., Escudeiro, N., Reis, R., Lopes, J., Norberto, M., Baltasar, A. B., ... & Bidarra, J. (2015). Virtual sign - a real time bidirectional translator of Portuguese sign language. Procedia Computer Science, 67, 252-262.
3. Sarkar, B., Datta, K., Datta, C. D., Sarkar, D., Dutta, S. J., Roy, I. D., ... & Paul, A. (2009, December). A translator for
Bangla text to sign language. In 2009 Annual IEEE India Conference (pp. 1-4). IEEE.
4. Kunjumon, J., & Megalingam, R. K. (2019, November). Hand gesture recognition system for translating Indian
sign language into text and speech. In 2019 International Conference on Smart Systems and Inventive Technology
(ICSSIT) (pp. 14-18). IEEE.
5. Truong, V. N., Yang, C. K., & Tran, Q. V. (2016, October). A translator for American sign language to text and
speech. In 2016 IEEE 5th Global Conference on Consumer Electronics (pp. 1-2). IEEE.
6. Morrissey, S., & Way, A. (2005). An example-based approach to translating sign language.
7. Elmahgiubi, M., Ennajar, M., Drawil, N., & Elbuni, M. S. (2015, June). Sign language translator and gesture
recognition. In 2015 Global Summit on Computer & Information Technology (GSCIT) (pp. 1-6). IEEE.
8. Kaggle Dataset: By PrathumArikeri, licensed under CC BY-SA 4.0
https://www.kaggle.com/datasets/prathumarikeri/indian-sign-language-isl
Fig 1: Overview of the methodology
Fig 2: Loss; Fig 2.2: Accuracy
Fig 3: Accuracy
Fig 4: Most mis-predicted classes
Fig 5: Confusion matrix with all the classes (A-Z & 1-9)