International Journal of Research Publication and Reviews Vol (2) Issue (5) (2021) Page 9-17
International Journal of Research Publication and Reviews
Journal homepage: www.ijrpr.com ISSN 2582-7421
* Corresponding author.
E-mail address: arpitahalder739@gmail.com
Real-time Vernacular Sign Language Recognition using MediaPipe and
Machine Learning
Arpita Haldera, Akshit Tayadeb
aUndergraduate Student, Computer Science and Engineering Department, Budge-Budge Institute of Technology, arpitahalder739@gmail.com, India
bUndergraduate Student, Electronics and Telecommunication Department, K.J. Somaiya College of Engineering, tayadeakshit28@yahoo.com, India
ABSTRACT

The deaf-mute community faces undeniable communication problems in daily life. Recent developments in artificial intelligence tear down this communication barrier. The main purpose of this paper is to demonstrate a methodology that simplifies Sign Language Recognition using MediaPipe's open-source framework and a machine learning algorithm. The predictive model is lightweight and adaptable to smart devices. Multiple sign language datasets, including American, Indian, Italian and Turkish, are used for training to analyze the capability of the framework. With an average accuracy of 99%, the proposed model is efficient, precise and robust. Real-time accurate detection using the Support Vector Machine (SVM) algorithm, without any wearable sensors, makes the technology more comfortable and easy to use.

Keywords: Machine Learning, Sign Language Recognition, MediaPipe, Feature extraction, Hand gesture
1. Introduction
Sign Language significantly facilitates communication in the deaf community. It is a language in which communication is based on visual sign patterns to express one's feelings. A communication gap arises when the deaf community wants to exchange views with people with normal speech and hearing. Currently, the two communities mostly rely on human translators, which can be expensive and inconvenient. With developments in deep learning and computer vision, researchers have created various automatic sign language recognition methods that can interpret sign gestures in an understandable way. This narrows down the communication gap between impaired and normal people, and empowers deaf-mute people to stand with equal opportunity and improve personal growth.
According to the report of the World Federation of the Deaf (WFD), over 5% of the world's population (360 million people) has hearing impairment, including 328 million adults and 32 million children. Approximately 300 sign languages are in use around the globe. Sign language recognition is a challenging task, as the alphabets differ between sign languages. For instance, the American Sign Language (ASL) alphabet varies widely from Indian Sign Language or Italian Sign Language; sign language thus varies from region to region. Moreover, articulation of single as well as double hands is used to convey meaningful messages. Sign language can be expressed in a compressed form, where a single gesture is sufficient to describe a word. Sign language also has fingerspelling, which describes each letter of a word using a distinct sign. As many words are still not standardized in sign language dictionaries, fingerspelling is often used to spell out a word; there are still about 150,000 words in spoken English with no counterpart in ASL. Furthermore, names of people, places, brands or titles have no standardized sign. Besides, a user might not know the exact sign for a particular word, and in these scenarios fingerspelling comes in handy, allowing any word to be easily expressed.
Previous works included sensor-based Sign Language Recognition (SLR) systems, which were quite uncomfortable and restrictive for signers. Specialized hardware, for example sensors [1], [2], was used, which was an expensive option as well. Computer vision-based techniques, in contrast, use bare hands without any sensors or coloured gloves. Because they need only a single camera, computer vision-based techniques are more cost-effective and highly portable compared to sensor-based techniques. In computer vision-based methods, the most common approaches for hand tracking are skin colour detection or background subtraction. Computer vision-based SLR systems often deal with feature extraction such as boundary modelling, contours, segmentation of gestures and estimation of hand shapes. However, these solutions are not lightweight enough to run on real-time devices like mobile phones and are thus restricted to platforms equipped with robust processors. Moreover, the challenge of hand tracking remained persistent in all these techniques. To address this drawback, our proposed methodology uses Google's innovative, rapidly growing open-source project MediaPipe with a machine learning algorithm on top of this framework, yielding a faster, simpler, cost-effective, portable and easy-to-deploy pipeline that can be used as a sign language recognition system.
2. Related Works
Hand gesture recognition is a relatively difficult problem to address in the field of machine learning. Classification methods can be divided into supervised and unsupervised methods. Based on these methods, an SLR system can recognize static or dynamic hand sign gestures. Murakami and Taguchi [3], in 1991, published the first research article using neural networks for sign language recognition. With developments in the field of computer vision, numerous researchers came up with novel approaches to help the physically challenged community. Using coloured gloves, a real-time hand tracking application was developed by Wang and Popović [4]. The colour pattern of the gloves was recognized by the K-Nearest Neighbors (KNN) technique, but continuous feeding of hand streams is required for the system. However, the Support Vector Machine (SVM) outperformed this algorithm in the research findings of Rekha et al. [5], Kurdyumov et al. [6], Tharwat et al. [7] and Baranwal and Nandi [8]. There are two types of sign language recognition: isolated sign recognition and continuous sentence recognition. Likewise, whole sign level modelling and subunit sign level modelling exist in SLR systems. Visual-descriptive and linguistic-oriented are two approaches that lead to subunit level sign modelling. Elakkiya et al. [9] combined SVM learning and a boosting algorithm to propose a framework for subunit recognition of alphabets. An accuracy of 97.6% was obtained, but the system fails to predict all 26 alphabets. To extract features of 23 isolated Arabic sign language gestures, Ahmed and Aly [10] used a combination of PCA and local binary patterns. Despite achieving an accuracy of 99.97% in signer-dependent mode, due to the usage of a threshold operator the system fails to recognize constant grey-scale patterns in the signing area. In most of the initial attempts, a conventional convolutional network was used to detect hand gestures from frames of images. R. Sharma et al. [11] used 80,000 individual numeric signs, with more than 500 pictures per sign, to train a machine learning model. Their system methodology comprises a training database of pre-processed images for a hand-detection system and a gesture recognition system. Image pre-processing included feature extraction to normalize the input information before training the machine learning model. The images are converted to grayscale for better object contours, maintained at a standardized resolution and then flattened into a smaller number of one-dimensional components. The feature extraction technique helps to extract certain features from the pixel data of images and feed them to a CNN for easier training and more accurate prediction. Hand tracking in 2D and 3D space has been performed by W. Liu et al. [12]. They used skin saliency, where skin tones within a specific range were extracted for better feature extraction, and achieved a classification accuracy of around 98%.
It is evident from all these previous methods that, to recognize hand gestures precisely with high accuracy, models require a large dataset and a complicated methodology with complex mathematical processing. Pre-processing of images plays a vital role in the gesture tracking process. Therefore, for our project, we used an open-source framework from Google known as MediaPipe, which is capable of detecting human body parts accurately.
3. Dataset
Table 1: Details of different sign language fingerspelling datasets used in this work

Database    Type       No. of classes    No. of images
American    Alphabets  26                156000
Indian      Alphabets  24                4972
Italian     Alphabets  22                12856
American    Numbers    10                1400
Turkey      Numbers    10                4124
4. Architecture
Figure 1: Proposed architecture to detect hand gestures and predict sign language fingerspellings
4.1 Stage 1: Pre-Processing of Images to get Multi-hand Landmarks using MediaPipe
MediaPipe is a framework that enables developers to build multi-modal (video, audio, or any time-series data) cross-platform applied ML pipelines. MediaPipe has a large collection of human body detection and tracking models, trained on Google's massive and highly diverse datasets. As a skeleton of nodes and edges, the landmarks track key points on different parts of the body; all coordinate points are three-dimensional and normalized. Models built by Google developers using TensorFlow Lite make the flow of information easily adaptable and modifiable via graphs. MediaPipe pipelines are composed of nodes on a graph, generally specified in a pbtxt file. These nodes are backed by C++ files, each of which extends the base calculator class in MediaPipe. This class receives contracts of media streams, such as a video stream, from other nodes in the graph and ensures that it is connected. Once the rest of the pipeline's nodes are connected, the class generates its own processed output data. Packet objects, encapsulating many different types of information, are used to send each stream of information to each calculator. Side packets can also be imposed on a graph, whereby a calculator node can be supplied with auxiliary data like constants or static properties. This simplified dataflow structure enables additions or modifications with ease and makes the flow of data more precisely controllable.
The hand tracking solution [13] has an ML pipeline at its backend consisting of two models working in tandem: a) the Palm Detection Model and b) the Hand Landmark Model. The Palm Detection Model provides an accurately cropped palm image, which is then passed on to the landmark model. This process diminishes the need for data augmentation (i.e. rotations, flipping, scaling) that is done in deep learning models and dedicates most of its power to landmark localization. The traditional way is to detect the hand in the frame and then perform landmark localization over the current frame. The palm detector in this ML pipeline, however, tackles the challenge with a different strategy. Detecting hands directly is a complex procedure, as one has to perform image processing and thresholding and work with a variety of hand sizes, which is time-consuming. Instead of directly detecting the hand in the current frame, the palm detector is first trained to estimate bounding boxes around rigid objects like palms and fists, which is simpler than detecting hands with articulated fingers. Secondly, an encoder-decoder is used as an extractor for larger scene context.
Figure 2: 21 Hand Landmarks
After palm detection has skimmed over the whole image frame, the subsequent Hand Landmark model comes into the picture. This model precisely localizes 21 3D hand-knuckle coordinates (i.e., x, y, z) inside the detected hand regions. The model is so well trained and robust in hand detection that it even maps coordinates to partially visible hands. Figure 2 shows the 21 landmark points detected by the Hand Landmark model.

With a functional palm and hand detection model running, the model is passed over our datasets of the various languages. Considering the American Sign Language dataset, we have alphabets a to z. We pass the detection model over every alphabet folder containing images and perform hand detection, which yields the 21 landmark points shown in Figure 2. The obtained landmark points are then stored in a CSV file. A simultaneous elimination step is performed while extracting the landmark points: only the x and y coordinates detected by the Hand Landmark model are kept for training the ML model. Depending on the size of the dataset, around 10-15 minutes is required for landmark extraction.
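The extraction step above can be sketched in Python. This is a minimal illustration, not the authors' exact script: the folder-per-class dataset layout and the helper `landmarks_to_row` are our assumptions, while the MediaPipe Hands calls follow the library's published API.

```python
import csv
import os

def landmarks_to_row(landmarks, label):
    """Flatten 21 (x, y) landmark pairs into one CSV row ending with the class label."""
    row = []
    for x, y in landmarks:
        row.extend([x, y])
    row.append(label)
    return row

def extract_dataset(dataset_dir, out_csv):
    """Run MediaPipe Hands over every image under dataset_dir/<label>/ and write one row per detected hand."""
    import cv2                      # external dependencies kept local to this function
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for label in sorted(os.listdir(dataset_dir)):
            folder = os.path.join(dataset_dir, label)
            for name in os.listdir(folder):
                image = cv2.imread(os.path.join(folder, name))
                if image is None:
                    continue
                result = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
                if not result.multi_hand_landmarks:
                    continue        # e.g. blurry images yield no detection
                pts = result.multi_hand_landmarks[0].landmark
                writer.writerow(landmarks_to_row([(p.x, p.y) for p in pts], label))
```

Each stored row thus has 42 coordinate values (21 landmarks × x, y) followed by the class label.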
4.2 Stage 2: Data cleaning and normalization
As stage 1 keeps only the x and y coordinates from the detector, each image in the dataset is passed through stage 1 to collect all the data points in one file. This file is then scanned with the pandas library to check for any null entries. Sometimes, due to a blurry image, the detector cannot detect the hand, which leads to a null entry in the dataset. Hence, it is necessary to clean these points, or they will bias the predictive model. Rows containing null entries are located by their indexes and removed from the table. After the removal of unwanted points, we normalize the x and y coordinates to fit our system. The data file is then split into training and validation sets: 80% of the data is retained for training the model with various optimization and loss functions, whereas 20% is reserved for validating the model.
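A minimal sketch of this cleaning, normalization and 80/20 split follows, assuming the CSV layout from stage 1 (42 coordinate columns plus a label column). The column names and the min-max normalization choice are illustrative assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def clean_and_split(df, label_col="label", test_size=0.2, seed=42):
    # Drop rows where the detector produced no landmarks (null entries).
    df = df.dropna().reset_index(drop=True)
    X = df.drop(columns=[label_col]).to_numpy(dtype=float)
    y = df[label_col].to_numpy()
    # Min-max normalize every coordinate column into [0, 1].
    mn = X.min(axis=0)
    rng = X.max(axis=0) - mn
    rng[rng == 0] = 1.0             # guard against constant columns
    X = (X - mn) / rng
    # 80/20 stratified train/validation split.
    return train_test_split(X, y, test_size=test_size,
                            random_state=seed, stratify=y)
```

Stratifying the split keeps the class proportions of the alphabet labels equal in the training and validation sets.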
4.3 Stage 3: Prediction using Machine Learning Algorithms
Predictive analysis of the different sign languages was performed using several machine learning algorithms, and the Support Vector Machine (SVM) outperformed the other algorithms. The details of the analysis are discussed in Table 2 in the results section. SVM is effective in high-dimensional spaces and performs well when the number of samples is greater than the number of dimensions. SVMs are a family of supervised learning methods capable of classification, regression and outlier detection.
The following formulas pose the optimization problem tackled by SVMs:

\min_{w, b, \zeta} \; \tfrac{1}{2} w^T w + C \sum_{i=1}^{n} \zeta_i    (1)

subject to \; y_i (w^T \phi(x_i) + b) \ge 1 - \zeta_i, \quad \zeta_i \ge 0, \quad i = 1, \dots, n    (2)

In equations (1) and (2), \zeta_i denotes the distance of the i-th point to the correct margin, with \zeta_i \ge 0, i = 1, ..., n; C denotes a regularization parameter; w denotes the normal vector; \phi(x_i) denotes the transformed input space vector; b denotes a bias parameter; and y_i denotes the i-th target value. The objective is to classify as many data points correctly as possible by maximizing the margin from the support vectors to the hyperplane while minimizing the term w^T w. The kernel function used is the RBF (radial basis function), which maps the input space into a higher-dimensional space so that not every data point needs to be explicitly mapped. SVM works relatively well when there is a clear margin of separation between classes. Hence, we used SVM to classify multiple classes of sign language alphabets and numerals.
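A minimal sketch of this stage with scikit-learn's RBF-kernel SVC follows; the hyperparameters C and gamma shown are the library defaults, not the paper's tuned values:

```python
from sklearn.svm import SVC

def train_svm(X_train, y_train, C=1.0, gamma="scale"):
    """Fit a soft-margin SVM with an RBF kernel on the normalized landmark features."""
    clf = SVC(C=C, kernel="rbf", gamma=gamma)
    clf.fit(X_train, y_train)
    return clf
```

The C parameter here is the regularization constant of equation (1): larger values penalize margin violations more heavily.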
4.4 Quantitative Analysis
To analyze the results for each of the datasets, we used performance metrics such as accuracy, precision, recall and F1 score. Accuracy is the number of correctly predicted data points out of all the data points: the ratio of all correct predictions to the total number of items in the data, as shown in equation (3).

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (3)

Precision = TP / (TP + FP)    (4)

Recall = TP / (TP + FN)    (5)
Precision describes how accurate our model is: out of those predicted positive, how many are actually positive. Precision is a good measure when the cost of a false positive is high. Recall calculates how many of the actual positives our model captures by labelling them positive; it is the metric to select when there is a high cost associated with false negatives. The mathematical formulations of precision and recall are given in equations (4) and (5) respectively.
F1 = 2 × (Precision × Recall) / (Precision + Recall)    (6)

The F-measure in equation (6) provides a way to combine both precision and recall into a single measure that captures both properties, and is used to handle imbalanced classification. The confusion matrix was also analyzed to better understand the types of errors made by our classifier; in a confusion matrix, the numbers of correct and incorrect predictions are summarized with count values and broken down by each class.
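The metrics of equations (3)-(6) and the confusion matrix can be computed as sketched below; macro averaging across the classes is our assumption for the multi-class case:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

def evaluate(y_true, y_pred):
    """Return accuracy, macro-averaged precision/recall/F1 and the confusion matrix."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1,
            "confusion": confusion_matrix(y_true, y_pred)}
```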
5. Results and Discussion
K-fold cross-validation was performed on each dataset using ten folds. The average accuracy over the ten iterations of the different algorithms is shown in Table 2. It can be observed from the presented accuracies that SVM outperformed other machine learning algorithms such as KNN, Random Forest, Decision Tree and Naïve Bayes, and also achieved higher accuracy than deep learning approaches such as the Artificial Neural Network (ANN) and Multi-Layer Perceptron (MLP).
Table 2: Average accuracy obtained using machine learning and deep learning algorithms.

Dataset              SVM      KNN      Random Forest  Decision Tree  Naive Bayes  ANN      MLP
ASL (alphabets)      99.15%   98.21%   98.57%         98.57%         53.74%       97.12%   94.69%
Indian (alphabets)   99.29%   98.87%   98.59%         98.59%         86.77%       94.79%   96.48%
Italian (alphabets)  98.19%   96.75%   97.83%         97.83%         77.19%       78.63%   72.14%
ASL (numbers)        99.18%   99.18%   97.56%         97.56%         96.74%       95.12%   97.56%
Turkey (numbers)     96.22%   93.08%   94.33%         94.33%         83.64%       93.71%   83.64%
For each of the sign language datasets, the highest accuracy in the above table was achieved using SVM.
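A ten-fold comparison like the one behind Table 2 can be sketched as follows; the candidate models shown and their default hyperparameters are illustrative, not the exact configurations used in the paper:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def compare_models(X, y, n_splits=10):
    """Mean ten-fold cross-validated accuracy for each candidate classifier."""
    models = {
        "SVM": SVC(kernel="rbf"),
        "KNN": KNeighborsClassifier(),
        "Random Forest": RandomForestClassifier(random_state=0),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return {name: cross_val_score(model, X, y, cv=cv).mean()
            for name, model in models.items()}
```

Stratified folds keep each class represented in every fold, which matters for the smaller fingerspelling datasets.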
For exhaustive testing, each sign language image dataset was pre-processed to extract features using the MediaPipe framework and trained with a Support Vector Machine to classify gestures correctly. An accuracy of 99% is achieved for most of the datasets, which outperforms the present state of the art and classifies the fingerspellings of sign languages precisely. A maximum accuracy of 99.29% is obtained for the Indian Sign Language alphabet, and a minimum accuracy of 96.22% for Turkish Sign Language number prediction from hand gestures. The testing performance for each dataset is summarized in Table 3. Confusion matrices are illustrated in Figure 3, and Figure 4 demonstrates real-time sign language detection.
Table 3: Performance analysis using the SVM algorithm on different datasets

Dataset name         Training Accuracy  Testing Accuracy  Precision  Recall   F1-Score
ASL (alphabets)      99.50%             99.15%            99.15%     99.15%   99.15%
Indian (alphabets)   99.92%             99.29%            99.29%     99.29%   99.29%
Italian (alphabets)  99.72%             98.19%            98.19%     98.19%   98.19%
Turkey (numbers)     99.37%             96.22%            96.22%     96.22%   96.22%
American (numbers)   98.77%             99.18%            99.18%     99.18%   99.18%
The trained model is explicitly lightweight, which makes our machine learning model appropriate for deployment in a mobile application. Real-time sign language detection makes our methodology fast, robust and adaptable, specifically for smart devices. MediaPipe's state-of-the-art models make feature extraction easy by breaking down and analyzing complex hand-tracking information, without the need to build a convolutional neural network from scratch. The proposed methodology uses minimal computational power and consumes less time to train a model than other state-of-the-art approaches. Table 4 compares the performance reported in other works using machine learning / deep learning algorithms with ours.
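A real-time recognition loop of the kind shown in Figure 4 could look like the sketch below. The OpenCV window handling and the helper `predict_sign` are our assumptions; `clf` stands for a classifier trained as in stage 3:

```python
def predict_sign(clf, landmark_xy):
    """Flatten 21 (x, y) pairs into one feature row and classify it."""
    row = [v for point in landmark_xy for v in point]
    return clf.predict([row])[0]

def run_webcam(clf):
    """Annotate each webcam frame with the predicted sign; press 'q' to quit."""
    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(max_num_hands=1)
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            pts = [(p.x, p.y) for p in result.multi_hand_landmarks[0].landmark]
            label = predict_sign(clf, pts)
            cv2.putText(frame, str(label), (30, 60),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
        cv2.imshow("Sign Language Recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```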
Table 4: Comparison with other current methods. For each sign language, the highest accuracy is achieved by our method.

Sign Language  Reference                     Type                   No. of classes  Method                  Accuracy
American       P. Das et al. [14]            Alphabets              26              Deep CNN                94.3%
American       M. Taskiran et al. [15]       Alphabets and Numbers  36              CNN                     98.05%
American       N. Saquib and A. Rahman [16]  Alphabets              24              KNN                     96.14%
American       N. Saquib and A. Rahman [16]  Alphabets              24              Random Forest           96.13%
American       N. Saquib and A. Rahman [16]  Alphabets              24              ANN                     95.87%
American       N. Saquib and A. Rahman [16]  Alphabets              24              SVM                     94.91%
American       Ours                          Alphabets              26              SVM                     99.15%
American       Ours                          Numbers                10              SVM                     99.18%
Indian         K. K. Dutta et al. [17]       Alphabets              24              KNN                     94%-96%
Indian         M. Sharma et al. [18]         Numbers                10              KNN and Neural Network  97.10%
Indian         J. L. Raheja et al. [19]      Alphabets              24              SVM                     97.5%
Indian         Ours                          Alphabets              26              SVM                     99.29%
Italian        L. Pigou et al. [20]          Alphabets              20              CNN                     91.7%
Italian        Ours                          Alphabets              22              SVM                     98.19%
Figure 3: Confusion matrices: a) American Sign Language (alphabets), b) American Sign Language (numbers), c) Indian Sign Language (alphabets), d) Turkey Sign Language (numbers), e) Italian Sign Language (alphabets)
Figure 4: Real-time American Sign Language recognition. American alphabets: 'S', 'U', 'I' and numbers: '1', '3', '9'
6. Conclusion
With an average accuracy of 99% in most of the sign language dataset using MediaPipe’s technology and machine learning, our proposed methodology
show that MediaPipe can be efficiently used as a tool to detect complex hand gesture precisely. Although, sign language model ling using image
processing techniques has evolved over the past few years but methods are complex with a requirement of high computational power. Time consumption
to train a model is also high. From that perspective, t his work provides new insights into this problem. Less computing power and the adaptability to smart
device s makes th e model robust and cost-effective. Training and testing with various sign language datasets show this framework can be adapted
effectively for any regional sign language data set and maximu m accuracy can be obtained. Fa ster real -time detection demonstrates the model’s efficiency
better than the present state-of-arts. In the future, the work can be extended by introducing word detection of sign language from videos using Mediapipe’s
state-of-art and best possible classi fication algorithms.
REFERENCES
[1] Shukor AZ, Miskon MF, Jamaluddin MH, Bin Ali F, Asyraf MF, Bin Bahar MB. 2015. A new data glove approach for Malaysian sign language detection. Procedia Comput Sci 76:60–67
[2] Almeida SG, Guimarães FG, Ramírez JA. 2014. Feature extraction in Brazilian sign language recognition based on phonological structure and using RGB-D sensors. Expert Syst Appl 41(16):7259–7271
[3] Murakami K, Taguchi H. 1991. Gesture recognition using recurrent neural networks. In: Proceedings of the ACM SIGCHI conference on Human factors in computing systems, pp 237–242. https://dl.acm.org/doi/pdf/10.1145/108844.108900
[4] Wang RY, Popović J. 2009. Real-time hand-tracking with a color glove. ACM Trans Graph 28(3):63
[5] Rekha J, Bhattacharya J, Majumder S. 2011. Hand gesture recognition for sign language: a new hybrid approach. In: International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV), pp 80–86
[6] Kurdyumov R, Ho P, Ng J. 2011. Sign language classification using webcam images, pp 1–4. http://cs229.stanford.edu/proj2011/KurdyumovHoNg-SignLanguageClassificationUsingWebcamImages.pdf
[7] Tharwat A, Gaber T, Hassanien AE, Shahin MK, Refaat B. 2015. SIFT-based Arabic sign language recognition system. In: Springer Afro-European conference for industrial advancement, pp 359–370. https://doi.org/10.1007/978-3-319-13572-4_30
[8] Baranwal N, Nandi GC. 2017. An efficient gesture based humanoid learning using wavelet descriptor and MFCC techniques. Int J Mach Learn Cybern 8(4):1369–1388
[9] Elakkiya R, Selvamani K, Velumadhava Rao R, Kannan A. 2012. Fuzzy hand gesture recognition based human computer interface intelligent system. UACEE Int J Adv Comput Netw Secur 2(1):29–33 (ISSN 2250–3757)
[10] Ahmed AA, Aly S. 2014. Appearance-based Arabic sign language recognition using hidden Markov models. In: IEEE International Conference on Engineering and Technology (ICET), pp 1–6. https://doi.org/10.1109/ICEngTechnol.2014.7016804
[11] R. Sharma, R. Khapra, N. Dahiya. June 2020. Sign Language Gesture Recognition, pp 14–19
[12] W. Liu, Y. Fan, Z. Li, Z. Zhang. Jan 2015. RGBD video based human hand trajectory tracking and gesture recognition system. In Mathematical Problems in Engineering
[13] Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C. L., & Grundmann, M. 2020. MediaPipe Hands: On-device Real-time Hand Tracking. arXiv preprint arXiv:2006.10214
[14] Das, P., Ahmed, T., & Ali, M. F. 2020, June. Static Hand Gesture Recognition for American Sign Language using Deep Convolutional Neural Network. In 2020 IEEE Region 10 Symposium (TENSYMP) (pp. 1762–1765). IEEE
[15] M. Taskiran, M. Killioglu and N. Kahraman. 2018. A Real-Time System for Recognition of American Sign Language by using Deep Learning. In 2018 41st International Conference on Telecommunications and Signal Processing (TSP), Athens, Greece, pp 1–5. doi: 10.1109/TSP.2018.8441304
[16] Nazmus Saquib and Ashikur Rahman. 2020. Application of machine learning techniques for real-time sign language detection using wearable sensors. In Proceedings of the 11th ACM Multimedia Systems Conference (MMSys '20). Association for Computing Machinery, New York, NY, USA, 178–189. https://doi.org/10.1145/3339825.3391869
[17] Dutta, K. K., & Bellary, S. A. S. 2017, September. Machine learning techniques for Indian sign language recognition. In 2017 International Conference on Current Trends in Computer, Electrical, Electronics and Communication (CTCEEC) (pp. 333–336). IEEE
[18] Sahoo, Ashok. 2014. Indian sign language recognition using neural networks and kNN classifiers. Journal of Engineering and Applied Sciences. 9. 1255–1259
[19] Raheja JL, Mishra A, Chaudary A. 2016, September. Indian Sign Language Recognition Using SVM. Pattern Recognition and Image Analysis 26(2)
[20] Pigou, L., Dieleman, S., Kindermans, P. J., & Schrauwen, B. 2014, September. Sign language recognition using convolutional neural networks. In European Conference on Computer Vision (pp. 572–578). Springer, Cham