Validation of Voice Recognition in Various Google Voice Languages using Voice Recognition Module V3 Based on Microcontroller
Khusnul Khotimah
Vocational Education Department of Graduate Program
Universitas Negeri Surabaya
Surabaya, Indonesia
khusnulelektro7@gmail.com

Agus Budi Santoso
Vocational Education Department of Graduate Program
Universitas Negeri Surabaya
Surabaya, Indonesia
agusbudi@unesa.ac.id

Miftahul Ma’arif
Physics Department
National Central University
Taoyuan, Taiwan
ORCID: 0000-0001-8472-7095

Alfiantin Noor Azhiimah
Vocational Education Department of Graduate Program
Universitas Negeri Surabaya
Surabaya, Indonesia
alfiantinnoor.azhiimah5@gmail.com

Bambang Suprianto
Vocational Education Department of Graduate Program
Universitas Negeri Surabaya
Surabaya, Indonesia
bambangsuprianto@unesa.ac.id

Meini Sondang Sumbawati
Vocational Education Department of Graduate Program
Universitas Negeri Surabaya
Surabaya, Indonesia
meinisondang@unesa.ac.id

Tri Rijanto
Vocational Education Department of Graduate Program
Universitas Negeri Surabaya
Surabaya, Indonesia
tririjanto@unesa.ac.id
Abstract—Voice recognition is currently one of the most dynamic areas of technological development. It can recognize a person's voice and can therefore be used to make work more efficient. Voice recognition matches the user's voice against a previously validated voice and verifies that the two correspond, so that it meets biometric identification requirements. The purpose of this study is to determine the success rate of voice commands given to a prototype automatic lamp using Voice Recognition Module V3. The commands are spoken by Google Voice in various languages: "lights on" turns the lights on automatically, and "lights off" turns them off automatically. The commands were taken in 9 languages chosen at random from those available in Google Voice. This study also examines the effect of volume and distance on the performance of Voice Recognition Module V3, using microphone-to-speaker distances of 5 cm, 10 cm, and 15 cm and Google Voice volumes of 30%, 50%, and 100%. The results show that the success rate of voice commands increases with the Google Voice volume on the mobile phone and decreases as the distance between the microphone and the Google Voice speaker increases. In conclusion, Voice Recognition Module V3 functions well at a distance of 5 cm even at a Google Voice volume of 30%, except in Chinese, because the vowels of the command word are pronounced faintly. The clarity of vowel pronunciation in a voice command affects its success rate.
Keywords—voice recognition, google voice, various
languages
I. INTRODUCTION
One form of dynamic technological development is voice recognition technology [1]. Voice recognition uses the human voice as a biometric. Each person has distinctive biometric characteristics, such as fingerprints, voice, retina, face, and signature [2] [3]. Because voice recognition can recognize a person's voice, it can be used to help people do their work more efficiently [4]. Voice recognition can also be developed and applied to automate household appliances [5-6], for example cooling a room, opening a door, switching on lights, or turning on a TV or radio automatically, which can increase safety, comfort, and energy savings in the home. Voice recognition can also be applied to wheelchairs for people with disabilities [7], and such a system does not require a lot of energy to operate. To meet biometric requirements, voice recognition matches the user's voice against a previously validated voice [8].
In 2017, a study determined the success rate of voice commands on an automatic door prototype, spoken by several people in various regional languages of Indonesia, including Batak, Javanese, Sundanese, Betawi, Minang, Toraja, Malay, and Lombok. The assessment used two voice commands in each language: "unlock", which opens the locked door, and "door lock", which locks the door. In that study, female speakers achieved a higher success rate than male speakers [9].
This research instead uses Google Voice in several of the world's languages, with the voice command "lights on" to turn the lights on automatically and "lights off" to turn them off automatically. We therefore conducted a study titled "Validation of Voice Recognition in Various Google Voice Languages using Voice Recognition Module V3 Based on Microcontroller".
2020 the third International Conference on Vocational Education and Electrical Engineering (ICVEE)
978-1-7281-7434-1/20/$31.00©2020 IEEE
Authorized licensed use limited to: National Central University. Downloaded on December 08,2020 at 14:25:57 UTC from IEEE Xplore. Restrictions apply.
II. MATERIALS AND METHOD
A sound recognition system has three main blocks: the input block, the process (program) block, and the output block. The input block acts as a sensor that detects the input sound; this sensor can be a microphone. A voice recognition system performs well when the sound it processes is sampled in the same way as the sound used during recording. The microphone output is forwarded to the voice recognition module.
Voice recognition, also known as speaker recognition, is designed to identify who is speaking [10]. It relies on acoustic characteristics that differ between individuals. These acoustic patterns reflect both anatomy (such as the size and shape of the throat and mouth) and learned behavioral patterns (such as tone of voice and speaking style) [11]. Before it can recognize a speaker's voice, the method requires training, during which the system learns the speaker's voice, accent, and tone. This is generally done by having the user record a series of words or text commands through an external microphone [12].
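The train-then-recognize idea above can be illustrated with a minimal template-matching sketch. It assumes each recording has already been reduced to a fixed-length feature vector (the vectors below are made up): training averages a speaker's recordings into one template, and recognition picks the nearest template by Euclidean distance, rejecting inputs that are farther than a threshold. This is a generic technique chosen for illustration, not the algorithm inside Voice Recognition Module V3, and the names `enroll`, `recognize`, and `threshold` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]

def enroll(recordings: List[Vector]) -> List[float]:
    """Average several training recordings (feature vectors) into one template."""
    n = len(recordings)
    dim = len(recordings[0])
    return [sum(r[i] for r in recordings) / n for i in range(dim)]

def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(sample: Vector, templates: Dict[str, List[float]],
              threshold: float) -> Optional[str]:
    """Return the label of the closest template, or None if none is close enough."""
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        d = distance(sample, template)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else None

# Hypothetical 3-dimensional features: two training recordings per speaker.
templates = {
    "speaker_a": enroll([[1.0, 2.0, 3.0], [1.2, 1.8, 3.2]]),
    "speaker_b": enroll([[4.0, 0.5, 1.0], [4.2, 0.7, 0.8]]),
}
print(recognize([1.1, 2.1, 3.0], templates, threshold=1.0))  # speaker_a
print(recognize([9.0, 9.0, 9.0], templates, threshold=1.0))  # None
```

The rejection threshold is what lets such a system refuse an unknown voice instead of forcing it onto the nearest enrolled speaker.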
A digital signal processing (DSP) module detects endpoints (word boundaries) to separate speech from non-speech, transforms the original waveform into a frequency-domain representation, and scales, filters, and compresses the data. The aim is to enhance and retain only those components of the spectral representation that are useful for identification, thereby reducing the amount of information the pattern-matching algorithm must handle. The set of speech parameters for one time interval (usually 10-30 milliseconds) is called a speech frame [11].

Voice Recognition Module V3 is a voice recognition module that can be used in many control applications that require voice detection, and it can work together with an Arduino microcontroller board [13]. In Voice Recognition Module V3, trained voice commands are stored in a library. The recognizer can hold only 7 voice commands at a time, which are loaded and recognized simultaneously. During training, however, up to 80 voice commands can be stored, and any 7 of them can be selected to form the active group.
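The storage model described above (up to 80 trained commands, at most 7 active in the recognizer at once) can be sketched as a small data structure. The class and method names are hypothetical and only model the two limits stated in the text; they are not the module's serial command set or the API of its Arduino library.

```python
from typing import Dict, List

MAX_RECORDS = 80  # trained commands the module can store (from the text)
MAX_ACTIVE = 7    # commands the recognizer can hold at the same time (from the text)

class VoiceLibrary:
    """Hypothetical model of the trained-command library and the active group."""

    def __init__(self) -> None:
        self.records: Dict[int, str] = {}  # record index -> command label
        self.active: List[int] = []        # indices currently loaded for recognition

    def train(self, index: int, label: str) -> None:
        """Store a trained command in one of the MAX_RECORDS slots."""
        if not 0 <= index < MAX_RECORDS:
            raise ValueError(f"record index must be in 0..{MAX_RECORDS - 1}")
        self.records[index] = label

    def load(self, indices: List[int]) -> None:
        """Replace the active group with up to MAX_ACTIVE trained records."""
        if len(indices) > MAX_ACTIVE:
            raise ValueError(f"at most {MAX_ACTIVE} records can be active at once")
        missing = [i for i in indices if i not in self.records]
        if missing:
            raise KeyError(f"untrained records: {missing}")
        self.active = list(indices)

lib = VoiceLibrary()
lib.train(0, "lights on")
lib.train(1, "lights off")
lib.load([0, 1])
print([lib.records[i] for i in lib.active])  # ['lights on', 'lights off']
```

Separating the large stored library from the small active group mirrors why only a handful of commands (here two) need to be loaded for the lamp prototype.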
Voice Recognition Module V3 operates at a supply voltage of 4.5 to 5.5 V with a current below 40 mA. Its digital interface uses 5 V TTL levels for the UART and GPIO interfaces. The board measures 31 mm x 50 mm, and its speech recognition accuracy reaches up to 99% in an ideal environment.
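The DSP front end described earlier in this section, which splits the signal into 10-30 ms speech frames and detects endpoints to separate speech from non-speech, can be illustrated with a short-time energy sketch. The 8 kHz sampling rate, 20 ms frame length, and energy threshold are assumptions chosen for the example, and energy thresholding is only one simple endpoint-detection technique, not necessarily the one used by the module.

```python
from typing import List, Optional, Tuple

def split_frames(signal: List[float], rate_hz: int, frame_ms: int) -> List[List[float]]:
    """Split a signal into consecutive, non-overlapping frames of frame_ms milliseconds."""
    size = rate_hz * frame_ms // 1000  # samples per frame, e.g. 8000 * 20 // 1000 = 160
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def detect_endpoints(signal: List[float], rate_hz: int = 8000, frame_ms: int = 20,
                     threshold: float = 0.01) -> Optional[Tuple[int, int]]:
    """Return the first and last frame indices whose energy exceeds the threshold."""
    frames = split_frames(signal, rate_hz, frame_ms)
    voiced = [i for i, f in enumerate(frames) if frame_energy(f) > threshold]
    if not voiced:
        return None  # no frame is loud enough: the whole input is treated as non-speech
    return voiced[0], voiced[-1]

# Synthetic 100 ms input at 8 kHz: 40 ms silence, 40 ms alternating-sign burst, 20 ms silence.
rate = 8000
burst = [0.5 if n % 2 == 0 else -0.5 for n in range(320)]
signal = [0.0] * 320 + burst + [0.0] * 160
print(len(split_frames(signal, rate, 20)))  # 5
print(detect_endpoints(signal, rate))       # (2, 3)
```

Only the frames between the detected endpoints would be passed on to feature extraction and pattern matching, which is how endpoint detection reduces the data the matcher must handle.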
The next block is the process block. Its main component is the ATmega328 microcontroller on the Arduino Uno platform, which processes the data from the input block. The analog output of the sound sensor is first processed by the voice recognition module, namely Easy VR, which holds a database of sound samples. Easy VR matches the sound sensor output, which becomes its input data, against the stored sound database and then sends the result to the Arduino via serial communication [14]. The output block contains actuator components such as servo motors, solenoids, lights, and others.
This research was conducted in March 2020 at the Postgraduate Laboratory of the CPD Building, Universitas Negeri Surabaya (Surabaya State University), Lidah Wetan Campus, Surabaya. This study used the waterfall development method, which consists of system analysis, design, program writing (coding), and testing [15].
The stages of the waterfall development method are as follows.
1. System Analysis
Figure 5. Waterfall Method [15]
Figure 3. Voice command setting [12] (Voice command 1 to Voice command 7 loaded into the recognizer)
Figure 6. System Block Diagram (blocks: Google Voice, microphone, Voice Recognition Module V3, Arduino UNO, lamp, Serial Monitor)
The system created in this study is a prototype automatic lamp using Voice Recognition Module V3, which stores the user's voice commands. The voice commands serve as codes to turn the lights on and off automatically; for example, the words "lights on" turn the automatic lights on, and "lights off" turns them off. The command words are given in 9 (nine) languages, chosen at random from those available in Google Voice: Indonesian, English, Chinese, Japanese, Korean, Swedish, Italian, Latin, and German.
The system block diagram of this study (Figure 6) can be described as follows.
Voice commands in the various languages are played by Google Voice through the mobile phone's speaker. The phone speaker is placed directly facing the microphone (line of sight) so that the sound picks up little noise. The setup follows normal conversation conditions, in which the loudness of the environment near the system is below 60 dB [16].
In this study, the system is divided into three main blocks: the input block, the processing block, and the output block. The input block consists of a microphone that detects sound and forwards it to Voice Recognition Module V3, which matches the voice commands coming from the microphone against the voice command database stored in the module. The result then passes to the processing block, which processes the output of Voice Recognition Module V3; the component responsible is the Arduino Uno ATmega328 microcontroller [11]. The Arduino Uno and Voice Recognition Module V3 communicate serially. In this system, the output block consists of an LED whose positive (anode) leg is connected to pin 13 and whose negative (cathode) leg is connected to GND. The LED functions as an automatic indicator lamp controlled by the ATmega328 microcontroller on the Arduino platform. The output block also includes a 16 x 2 LCD (liquid crystal display) connected serially to the Arduino.
2. Design
Based on the results of the system analysis, the system design was carried out to establish the initial description, the required components, and the working principle of the device. The components used in this study are as follows:
• Voice Recognition Module V3, used to match voice commands.
• Arduino Uno (ATmega328) as the microcontroller.
• LED as the actuator.
• 16 x 2 LCD as the display monitor.
• Microphone as the voice sensor.
• Mobile phone (Google Voice) as the voice source.
The only software used in this study is the Arduino IDE, because Voice Recognition Module V3 is compatible with Arduino. The device works as follows: when the user says the voice command "lights on", the LED connected to the Arduino turns on, and the detected command is displayed on the serial monitor and the LCD. When the user says "lights off", the LED connected to the Arduino turns off, and the detection is likewise displayed on the serial monitor and the LCD.

The same applies to the other languages chosen by the user: when the "lights on" command is spoken in another language, the LED connected to the Arduino turns on.
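The working principle above, in which a recognized "lights on" or "lights off" command in any of the nine languages switches the LED on pin 13 and reports the detection, can be simulated on a PC as a small dispatch table. This is a host-side sketch, not the authors' Arduino program: the command words are copied from Table I, while the function name `handle_command`, the returned display text, and matching on text instead of on the module's trained record indices are assumptions for illustration.

```python
from typing import Dict, Tuple

LED_PIN = 13  # the LED anode is wired to pin 13 in the described system

# Command words per language as listed in Table I: (Command I = on, Command II = off).
COMMANDS: Dict[str, Tuple[str, str]] = {
    "Indonesian": ("Lampu Nyala", "Lampu Mati"),
    "English": ("Lights On", "Lights Off"),
    "Chinese": ("Dengliang", "Ximie"),
    "Japanese": ("Tento", "Shoto"),
    "Korean": ("Bul-eul Kyeoda", "Sodeung"),
    "Swedish": ("Ijuset Pa", "Tands"),
    "Italian": ("Accendi", "Luci spente"),
    "Latin": ("In Lucem", "Off Lumine"),
    "German": ("Licht An", "Lichten Aus"),
}

def handle_command(language: str, phrase: str, led_on: bool) -> Tuple[bool, str]:
    """Return the new LED state and the text shown on the LCD / serial monitor."""
    on_word, off_word = COMMANDS[language]
    if phrase == on_word:
        return True, f"{phrase}: LED pin {LED_PIN} ON"
    if phrase == off_word:
        return False, f"{phrase}: LED pin {LED_PIN} OFF"
    return led_on, "Not recognized"  # unknown input leaves the lamp unchanged

state, lcd = handle_command("German", "Licht An", led_on=False)
print(state, lcd)  # True Licht An: LED pin 13 ON
```

Keeping the per-language words in one table means adding a language changes only data, not the on/off logic, which matches the observation that every language drives the same LED.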
3. Implementation
The next stage is implementation. At this stage, the system design was applied to a prototype automatic lamp based on Voice Recognition Module V3. A voice command sampling program using the Serial Monitor was then compiled in the Arduino IDE, followed by recording the voices used as the commands "lights on" and "lights off". Each command was sampled twice under ideal conditions, with no noise. Voice command sampling was done on a PC with the Arduino IDE, and it had to be repeated each time testing switched to another language.
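The sampling procedure above, in which both commands are recorded twice and recording is redone whenever testing moves to a new language, can be summarized as a loop. The `record_sample` callback and `fake_recorder` are hypothetical stand-ins for the Serial Monitor training step in the Arduino IDE; only the counts (two commands, two samples each, nine languages) come from the text.

```python
from typing import Callable, Dict, List

LANGUAGES = ["Indonesian", "English", "Chinese", "Japanese", "Korean",
             "Swedish", "Italian", "Latin", "German"]
COMMAND_LABELS = ["lights on", "lights off"]
SAMPLES_PER_COMMAND = 2  # each command was sampled twice in noise-free conditions

def train_language(language: str,
                   record_sample: Callable[[str, str, int], str]) -> Dict[str, List[str]]:
    """Retrain every command for one language, replacing any earlier recordings."""
    trained: Dict[str, List[str]] = {}
    for command in COMMAND_LABELS:
        trained[command] = [record_sample(language, command, take)
                            for take in range(1, SAMPLES_PER_COMMAND + 1)]
    return trained

def fake_recorder(language: str, command: str, take: int) -> str:
    """Hypothetical placeholder that labels a recording instead of capturing audio."""
    return f"{language}/{command}/take{take}"

total = 0
for lang in LANGUAGES:
    trained = train_language(lang, fake_recorder)  # fresh training before each language
    total += sum(len(takes) for takes in trained.values())
print(total)  # 36 recordings = 9 languages x 2 commands x 2 samples
```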
After the voice commands were recorded, an Arduino program for automatic light control connected to the serial monitor was compiled. The program was written in the Arduino IDE for the Arduino platform, whose microcontroller serves as the control center of the whole system.
4. Evaluation
The next stage tests the success rate of detecting voice commands while varying the distance and the Google Voice volume in 9 languages. The distance between the microphone and the speaker of the Android phone running Google Voice was set to 5 cm, 10 cm, and 15 cm, and the Google Voice volume to 30%, 50%, and 100%. Each voice command was tested 10 times. The control variable in this study is the use of Google Voice as the source of voice commands; Google Voice is considered more reliable and is treated as the voice of an expert speaker of each language used. Another control variable is the position of the microphone and the Android phone speaker, which face each other directly (line of sight) so that the possibility of noise is very small. The flowchart of the system developed in this study can be seen in Figure 7.
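The factorial test design described above (9 languages, 3 distances, 3 volumes, 2 commands, 10 trials per condition) fixes how many commands were played in total. The sketch below enumerates the grid with `itertools.product`; the tuple layout of each condition is an illustrative choice.

```python
from itertools import product

languages = ["Indonesian", "English", "Chinese", "Japanese", "Korean",
             "Swedish", "Italian", "Latin", "German"]
distances_cm = [5, 10, 15]
volumes_pct = [30, 50, 100]
commands = ["Command I", "Command II"]
TRIALS = 10  # each condition was repeated 10 times

# Every (language, distance, volume, command) combination tested in the study.
conditions = list(product(languages, distances_cm, volumes_pct, commands))
print(len(conditions))           # 162 conditions = 9 x 3 x 3 x 2
print(len(conditions) * TRIALS)  # 1620 played commands in total
```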
III. RESULTS AND DISCUSSION
This study tests the success rate of detecting voice commands under different distances and Google Voice volumes in 9 languages.
Testing was done by playing each command 10 times for each language and each combination of variables shown in the tables, and the number of successfully recognized commands was counted. The final result of testing is the percentage of successful recognitions of each command word in each language. The success rates of voice commands using Voice Recognition Module V3 with a distance of 5 cm between the microphone and the Google Voice speaker are shown in Table I.

Figure 7. Flowchart of the system (elements: Start; Voice Recognition Module initialization; Input = Lamp On or Lamp Off; Identified voice? (Yes/No); Lamp On/Lamp Off; distances of 5 cm, 10 cm, and 15 cm; volumes of 30%, 50%, and 100%)
Figure 8. Automatic Lamp based on Voice Recognition
Description: 1. Power; 2. Lamp On; 3. Lamp Off; 4. Microphone
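The success rate reported in Tables I-III follows directly from the 10-trial protocol: for each language, command, distance, and volume, it is the share of the ten played commands that the module recognized. Writing n_success for that count and N for the number of trials (notation introduced here for illustration):

```latex
S = \frac{n_{\text{success}}}{N} \times 100\%, \qquad N = 10
```

For example, Swedish Command I at 5 cm and 30% volume has S = 40%, which corresponds to 4 recognized commands out of 10.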
TABLE I. THE SUCCESS RATE OF VOICE COMMANDS AT A DISTANCE OF 5 CM

No. | Language   | Command I      | Command II  | Command I, volume 30% / 50% / 100% | Command II, volume 30% / 50% / 100%
1.  | Indonesian | Lampu Nyala    | Lampu Mati  | 100% / 100% / 100% | 100% / 100% / 100%
2.  | English    | Lights On      | Lights Off  | 100% / 100% / 100% | 100% / 100% / 100%
3.  | Chinese    | Dengliang      | Ximie       | 100% / 100% / 100% | 0% / 0% / 100%
4.  | Japanese   | Tento          | Shoto       | 90% / 100% / 100%  | 100% / 100% / 100%
5.  | Korean     | Bul-eul Kyeoda | Sodeung     | 100% / 100% / 100% | 50% / 100% / 100%
6.  | Swedish    | Ijuset Pa      | Tands       | 40% / 90% / 100%   | 90% / 100% / 100%
7.  | Italian    | Accendi        | Luci spente | 100% / 100% / 100% | 100% / 100% / 100%
8.  | Latin      | In Lucem       | Off Lumine  | 90% / 100% / 100%  | 100% / 100% / 100%
9.  | German     | Licht An       | Lichten Aus | 100% / 100% / 100% | 100% / 100% / 100%
TABLE II. THE SUCCESS RATE OF VOICE COMMANDS AT A DISTANCE OF 10 CM

No. | Language   | Command I      | Command II  | Command I, volume 30% / 50% / 100% | Command II, volume 30% / 50% / 100%
1.  | Indonesian | Lampu Nyala    | Lampu Mati  | 0% / 90% / 100%  | 0% / 80% / 100%
2.  | English    | Lights On      | Lights Off  | 0% / 100% / 100% | 0% / 100% / 100%
3.  | Chinese    | Dengliang      | Ximie       | 0% / 90% / 100%  | 0% / 0% / 100%
4.  | Japanese   | Tento          | Shoto       | 0% / 40% / 100%  | 0% / 100% / 100%
5.  | Korean     | Bul-eul Kyeoda | Sodeung     | 0% / 100% / 100% | 0% / 0% / 100%
6.  | Swedish    | Ijuset Pa      | Tands       | 0% / 0% / 100%   | 0% / 10% / 100%
7.  | Italian    | Accendi        | Luci spente | 0% / 100% / 100% | 0% / 100% / 100%
8.  | Latin      | In Lucem       | Off Lumine  | 0% / 0% / 100%   | 0% / 0% / 100%
9.  | German     | Licht An       | Lichten Aus | 0% / 0% / 100%   | 0% / 0% / 100%
Based on Table I, the success rate of voice commands at a distance of 5 cm with full volume (100%) reaches 100% for all languages tested, for both voice command I and voice command II. With half volume (50%), the success rate reaches 100% for all languages tested except Swedish (90%, command I) and Chinese (0%, command II). The Chinese failure occurs because the word "Ximie" contains the double vowel "ie", so its pronunciation is unclear. Chinese command II also reaches 0% at 30% volume.
At a Google Voice volume of 30%, command I reaches 100% in Indonesian, English, Chinese, Korean, Italian, and German, while Japanese reaches 90%, Swedish 40%, and Latin 90%. For command II, all languages reach 100% except Chinese (0%, as noted above), Korean (50%), and Swedish (90%). Korean reaches only 50% because the word "sodeung" contains a double vowel, so its pronunciation is unclear.
Comparing Table I with Tables II and III shows that the success rate of voice commands decreases as the distance increases: the farther the microphone is from the Google Voice speaker, the lower the success rate, and the closer the microphone is to the speaker, the higher the success rate.
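The per-volume pattern in Table I can be checked by averaging the reported rates over the nine languages. The sketch below transcribes the Table I values and prints the mean rate per volume for each command, together with the number of cells below 100%; the data layout and function names are illustrative choices.

```python
VOLUMES = (30, 50, 100)
# Table I (5 cm): language -> (Command I rates, Command II rates) at 30/50/100% volume.
TABLE_I = {
    "Indonesian": ((100, 100, 100), (100, 100, 100)),
    "English":    ((100, 100, 100), (100, 100, 100)),
    "Chinese":    ((100, 100, 100), (0, 0, 100)),
    "Japanese":   ((90, 100, 100), (100, 100, 100)),
    "Korean":     ((100, 100, 100), (50, 100, 100)),
    "Swedish":    ((40, 90, 100), (90, 100, 100)),
    "Italian":    ((100, 100, 100), (100, 100, 100)),
    "Latin":      ((90, 100, 100), (100, 100, 100)),
    "German":     ((100, 100, 100), (100, 100, 100)),
}

def mean_by_volume(table, command):
    """Mean success rate over all languages for one command (0 = I, 1 = II)."""
    return {v: round(sum(rates[command][i] for rates in table.values()) / len(table), 1)
            for i, v in enumerate(VOLUMES)}

def below_full(table):
    """List (language, command, volume, rate) for every cell under 100%."""
    return [(lang, ("I", "II")[c], VOLUMES[i], rate)
            for lang, cmds in table.items()
            for c, rates in enumerate(cmds)
            for i, rate in enumerate(rates) if rate < 100]

print(mean_by_volume(TABLE_I, 0))  # {30: 91.1, 50: 98.9, 100: 100.0}
print(mean_by_volume(TABLE_I, 1))  # {30: 82.2, 50: 88.9, 100: 100.0}
print(len(below_full(TABLE_I)))    # 8
```

The eight cells below 100% are exactly the Chinese, Japanese, Korean, Swedish, and Latin exceptions discussed above.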
The success rates of voice commands using Voice Recognition Module V3 with a distance of 10 cm between the microphone and the Google Voice speaker are shown in Table II. At 10 cm with full volume (100%), the success rate reaches 100% for all languages tested, for both voice command I and voice command II. With half volume (50%), command I reaches 100% in English, Korean, and Italian, 90% in Indonesian and Chinese, and 40% in Japanese, while Swedish, Latin, and German reach 0%.

For command II at 50% volume, English, Japanese, and Italian reach 100%, Indonesian 80%, and Swedish 10%, while Chinese, Korean, Latin, and German reach 0%. This happens because some of the words used as command II have unclear pronunciation. At 10 cm with 30% volume, the success rate is 0% for all languages tested, for both command I and command II.
The success rates of voice commands using Voice Recognition Module V3 with a distance of 15 cm between the microphone and the Google Voice speaker are shown in Table III. At 15 cm with full volume (100%), the success rate reaches 100% for all tested languages, whereas at 30% and 50% volume no command was recognized in any language (0%).
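Averaging both commands over all nine languages at each distance and volume condenses Tables I-III into a single grid and makes the two reported trends easy to check: at a fixed distance the mean rate never drops as the volume rises, and at a fixed volume it never rises as the distance grows. The values below are transcribed from the three tables; the grid layout and the two check expressions are an illustrative sketch.

```python
# Rows: Indonesian, English, Chinese, Japanese, Korean, Swedish, Italian, Latin, German.
# Each row: Command I at 30/50/100% volume, then Command II at 30/50/100% volume.
TABLES = {
    5: [(100, 100, 100, 100, 100, 100), (100, 100, 100, 100, 100, 100),
        (100, 100, 100, 0, 0, 100), (90, 100, 100, 100, 100, 100),
        (100, 100, 100, 50, 100, 100), (40, 90, 100, 90, 100, 100),
        (100, 100, 100, 100, 100, 100), (90, 100, 100, 100, 100, 100),
        (100, 100, 100, 100, 100, 100)],
    10: [(0, 90, 100, 0, 80, 100), (0, 100, 100, 0, 100, 100),
         (0, 90, 100, 0, 0, 100), (0, 40, 100, 0, 100, 100),
         (0, 100, 100, 0, 0, 100), (0, 0, 100, 0, 10, 100),
         (0, 100, 100, 0, 100, 100), (0, 0, 100, 0, 0, 100),
         (0, 0, 100, 0, 0, 100)],
    15: [(0, 0, 100, 0, 0, 100)] * 9,
}
VOLUMES = (30, 50, 100)

def mean_rate(distance_cm: int, volume_pct: int) -> float:
    """Mean success rate over 9 languages x 2 commands for one condition."""
    i = VOLUMES.index(volume_pct)
    rows = TABLES[distance_cm]
    cells = [row[i] for row in rows] + [row[i + 3] for row in rows]
    return round(sum(cells) / len(cells), 1)

grid = {d: [mean_rate(d, v) for v in VOLUMES] for d in TABLES}
for d, rates in grid.items():
    print(f"{d:>2} cm: " + "  ".join(f"{v}%->{r}" for v, r in zip(VOLUMES, rates)))

# Trend checks: non-decreasing with volume, non-increasing with distance.
louder_is_better = all(r[0] <= r[1] <= r[2] for r in grid.values())
closer_is_better = all(grid[5][k] >= grid[10][k] >= grid[15][k] for k in range(3))
print(louder_is_better, closer_is_better)  # True True
```

Because many cells sit at 0% or 100%, the relationship is monotonic rather than strictly proportional, which is why the checks use non-strict comparisons.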
IV. CONCLUSION
Based on the results, the success rate of voice commands increases with the volume of Google Voice on the mobile phone: the louder the Google Voice output, the higher the success rate, and the lower the volume, the lower the success rate. The best volume is therefore a high volume. Conversely, the success rate decreases as the distance between the microphone and the Google Voice speaker increases: the greater the distance, the lower the success rate, and the smaller the distance, the higher the success rate. The best configuration for Voice Recognition Module V3 in these tests is therefore the shortest distance combined with a high volume.

The Voice Recognition Module V3 test results show that the module functions well at a distance of 5 cm even at a Google Voice volume of 30%, except in Chinese, because the vowels of the command word are pronounced less clearly. The clarity of vowel pronunciation in a voice command affects its success rate. All languages are recognized at 100% volume at every tested distance (5, 10, and 15 cm). The success rate decreases when the volume is reduced to 50% or 30% in some languages, such as Chinese and Swedish. This study contributes to an understanding of the validation of voice recognition through Google Voice using the microcontroller-based Voice Recognition Module V3. Furthermore, because this research used standard Google Voice output without noise, the results can be considered more reliable.
REFERENCES
[1] Sen, Sonali, et al. "Design of an intelligent voice-controlled home automation system." International Journal of Computer Applications 121.15 (2015).
[2] Srivastava, Himanshu. "A comparison based study on biometrics for human recognition." IOSR Journal of Computer Engineering (IOSR-JCE) 15.1 (2013): 22-29.
[3] Ariyanti, Sinta, Slamet Seno Adi, and Sugeng Purbawanto. "Sistem Buka Tutup Pintu Otomatis Berbasis Suara." Elinvo (Electronics, Informatics, and Vocational Education) 3.1 (2018): 83-91.
[4] Karudaiyar, G., et al. "IOT Based Voice Controlled Smart Home Automation." International Journal of Engineering Applied Sciences and Technology 2.5 (2017): 44-45.
[5] Baig, Faisal, Saira Beg, and Muhammad Fahad Khan. "Controlling home appliances remotely through voice command." arXiv preprint arXiv:1212.1790 (2012).
[6] Kamdar, Hem, et al. "A review on home automation using voice recognition." International Research Journal of Engineering and Technology (IRJET) 4.10 (2017).
[7] Oleiwi, Bashra Kadhim, and Farah F. Alkhalid. "Smart Autonomous Wheelchair Controlled by Voice Commands-Aided by Tracking System." Iraqi Journal of Computers, Communication and Control & Systems Engineering 19.1 (2019): 82-87.
[8] Saini, Preeti, and Parneet Kaur. "Automatic speech recognition: A review." International Journal of Engineering Trends and Technology 4.2 (2013): 1-5.
[9] Imario, Anjar, Dodi Wisaksono Sudiharto, and Endro Ariyanto. "The validated voice recognition measurement of several tribes in Indonesia using easy VR 3.0. Case study: The prototype of automated doors." 2017 International Seminar on Application for Technology of Information and Communication (iSemantic). IEEE, 2017.
[10] Basyal, Lochan, Sandeep Kaushal, and Gurjeet Singh. "Voice Recognition Robot with Real Time Surveillance and Automation." International Journal of Creative Research Thoughts 6.1 (2018): 2320-2882.
[11] King, Rawlson O'Neil. Speech and Voice Recognition. Biometrics Research Group. [Online], 2020.
[12] RoboTech (n.d.). "EasyVR 3 User Manual, Release 1.0.15." RoboTech srl. www.veear.eu/files/EasyVR-User-Manual.pdf, accessed April 14, 2020.
[13] Dani, Akhmad Wahyu, Andi Adriansyah, and Dodi Hermawan. "Perancangan Aplikasi Voice Command Recognition Berbasis Android dan Arduino Uno." Jurnal Teknologi Elektro 7.1 (2016).
[14] Asnawi, Rustam, and Muhammad Said. "Testing of other languages usage in addition to the default languages for the easy voice recognition module." 2018 International Conference on Electronics Technology (ICET). IEEE, 2018.
[15] Pressman, R. S. Software Engineering: A Practitioner's Approach. McGraw-Hill International Editions, 1992.
[16] Kementerian Lingkungan Hidup. Keputusan Menteri Lingkungan Hidup Nomor: KEP-48/MENLH/11/1996 tentang Baku Tingkat Kebisingan. Jakarta: Menteri Negara Lingkungan Hidup, 1996.
TABLE III. THE SUCCESS RATE OF VOICE COMMANDS AT A DISTANCE OF 15 CM

No. | Language   | Command I      | Command II  | Command I, volume 30% / 50% / 100% | Command II, volume 30% / 50% / 100%
1.  | Indonesian | Lampu Nyala    | Lampu Mati  | 0% / 0% / 100% | 0% / 0% / 100%
2.  | English    | Lights On      | Lights Off  | 0% / 0% / 100% | 0% / 0% / 100%
3.  | Chinese    | Dengliang      | Ximie       | 0% / 0% / 100% | 0% / 0% / 100%
4.  | Japanese   | Tento          | Shoto       | 0% / 0% / 100% | 0% / 0% / 100%
5.  | Korean     | Bul-eul Kyeoda | Sodeung     | 0% / 0% / 100% | 0% / 0% / 100%
6.  | Swedish    | Ijuset Pa      | Tands       | 0% / 0% / 100% | 0% / 0% / 100%
7.  | Italian    | Accendi        | Luci spente | 0% / 0% / 100% | 0% / 0% / 100%
8.  | Latin      | In Lucem       | Off Lumine  | 0% / 0% / 100% | 0% / 0% / 100%
9.  | German     | Licht An       | Lichten Aus | 0% / 0% / 100% | 0% / 0% / 100%