Scientific Research and Essays Vol. 6(2), pp. 341-350, 18 January, 2011
Available online at http://www.academicjournals.org/SRE
ISSN 1992-2248 ©2011 Academic Journals
Full Length Research Paper
Hidden Markov model/Gaussian mixture models (HMM/GMM) based voice command system: A way to improve the control of remotely operated robot arm TR45
Ibrahim M. M. El-emary1*, Mohamed Fezari2 and Hamza Attoui3
1Information Technology Deanship, King Abdulaziz University, Kingdom of Saudi Arabia.
2Department of Electronics, Faculty of Engineering, Laboratory of Automatic and Signals, University of Annaba, BP 12, Annaba 23000, Algeria.

*Corresponding author. E-mail: Omary57@hotmail.com.
Accepted 11 November, 2010
A speech control system for a didactic manipulator arm, the TR45, is designed as an agent in a tele-manipulator command system. Robust hidden Markov models (HMM) and Gaussian mixture models (GMM) are applied to a spotted-word recognition system, with cepstral coefficients plus energy and their differentials as features. The HMM and GMM are used independently in an automatic speech recognition agent to detect spotted words and recognize them. A decision block then generates the appropriate command and sends it to a parallel port of the Personal Computer (PC). To implement the approach in a real-time application, a PC parallel port interface was designed to control the movement of the robot motors through a wireless communication component. The user can thus control the movements of the robot arm using natural speech containing the spotted words.
Key words: Human-machine interaction, hidden Markov model, Gaussian mixture models, artificial intelligence, automatic guided vehicle, voice command, robot arm, robotics.
INTRODUCTION
Manipulator robots are used in industry to reduce or
eliminate the need for humans to perform tasks in
dangerous environments. Examples of it include space
exploration, mining, and toxic waste cleanup. However,
the motion of articulated robot arms differs from the
motion of the human arm. While robot joints have fewer
degrees of freedom, they can move through greater
angles. For example, the elbow of an articulated robot
can bend up or down whereas a person can only bend
their elbow in one direction with respect to the straight
arm position (Beritelli et al., 1998; Bererton and Khosla,
2001). There have been many research projects dealing
with robot control and tele-operation of arm manipulators,
among these projects, there are some projects that build
intelligent systems (Kwee, 1997; Buhler et al., 1994; Ibrahim et al., 2010; www.ieeexplore.ieee.org/iel5/5398724/5404079/05404143.pdf?...). Since we have seen human-like robots in science fiction movies such as "I, Robot", building intelligent robots and intelligent systems has become a driving ambition within the research community.
In addition, speech or voice command as human-robot
interface has a key role in many application fields and
various studies made in the last few years have given
good results in both research and commercial
applications (Bererton and Khosla, 2001; Rao et al., 1998; www.alzaytoonah.edu.jo/ICIT2009/documents/accepted%20papers.pdf; www.ieeexplore.ieee.org/iel5/5398724/5404079/05404143.pdf?...; Yussof et al., 2005) just for speech recognition
systems. In this paper, we present a new approach to
solve the problem of the recognition of spotted words
within a phrase, using statistical approaches based on
HMM and GMM (Gu and Rose, 2001; Rabiner, 1989). By
combining the two methods, the system achieves
considerable improvement in the recognition phase, thus
facilitating the final decision and reducing the number of
errors in the decisions taken by the voice-command guided system.
Speech recognition systems constitute the focus of a
large research effort in Artificial Intelligence (AI), which
has led to a large number of new theories and new tech-
niques. However, it is only recently that the field of robot
and Automatic Guided Vehicle (AGV) navigation has
started to import some of the existing techniques
developed in AI for dealing with uncertain information.
HMM is a robust technique developed for and widely applied in pattern recognition. Very interesting results have been obtained with speaker-independent isolated-word recognition systems, especially with limited vocabularies. However, the recognition rate is lower in continuous speech. The GMM is also a statistical model that has
been used in speaker recognition and in isolated word
recognition systems. These two techniques, HMM and GMM, were first evaluated independently and then combined in order to increase the recognition rate. The
approach proposed in this paper is to design a system that detects specific words within a long or short phrase, processes the selected words (spots) and then executes an order (Djemili et al., 2004; Rabiner, 1989; www.ieeexplore.ieee.org/iel5/5398724/5404079/05404143.pdf?...). As an application of this approach, a set of four
reduction motors were activated via a wireless designed
system installed on a Personal Computer (PC) parallel
port interface. The application uses a set of twelve Arabic command words, divided into two subsets: one subset contains the names of the main parts of the robot arm (arm, fore-arm, wrist (hand) and gripper), while the second contains the actions that can be performed by one of those parts (left, right, up, down, stop, open and close). A specific word, "yade" (meaning arm), is used at the beginning of the phrase as a
"password". Voice command requires the recognition of spotted words from a limited vocabulary, as used in AGV systems (Ferrer et al., 2000; Heck, 1997) and in manipulator arm control (Rodriguez et al., 2003).
APPLICATION DESCRIPTION
The application is based on voice command of a set of four reduction motors. It therefore involves the recognition of spotted words from a limited vocabulary used to identify the part of the robot arm and the action to perform. The vocabulary is limited to twelve words divided into two subsets: an object-name subset, needed to select the part of the robot arm to move, and a command subset, needed to control the movement of that part (for example: turn left, turn right and stop for the base (shoulder); open, close and stop for the gripper). The number of words in the vocabulary was kept to a minimum to make the application both simpler and easier for the user.
The user selects the robot arm part by its name, then gives the movement order through a microphone connected to the sound card of the PC. The user can give the order in a natural-language phrase, for example: "Yade, gripper
open execute". A speech recognition agent based on the HMM technique detects the spotted words within the phrase: it recognises the main word "Yade", which is used as a keyword, then it recognises the other spotted words. The system then generates a byte in which the four most significant bits represent a code for the part of the robot arm and the four least significant bits represent the action to be taken. Finally, the byte is sent to the parallel port of the PC and then transmitted to the robot through a wireless transmission system.
The application is first simulated on the PC. It includes three phases: the training phase, where a reference pattern file is created; the recognition phase, where the decision to generate an accurate action is taken; and the code generation phase, where the system generates an 8-bit code on the parallel port. In this code, the four higher bits are used to codify the object names and the four lower bits are used to codify the actions. The action is shown in real-time on a parallel port interface card, which includes a set of four stepper motors to show which command was taken, and the radio frequency emitter.
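To make the code generation concrete, here is a minimal Python sketch of the byte packing described above (part code in the high nibble, action code in the low nibble). The numeric codes follow Table 1, but the dictionaries and function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the command-byte encoding: 4 most significant bits
# for the robot-arm part, 4 least significant bits for the action.
# The numeric codes below follow Table 1 but are illustrative assumptions.

PART_CODES = {"diraa": 2, "saad": 3, "meassam": 4, "mikbath": 5}
ACTION_CODES = {"yamine": 1, "yassar": 2, "fawk": 3, "tahta": 4,
                "iftah": 5, "ighlak": 6, "kif": 7}

def command_byte(part: str, action: str) -> int:
    """Pack part and action codes into one byte (part in high nibble)."""
    return (PART_CODES[part] << 4) | ACTION_CODES[action]

# "yade diraa fawk tabek" -> select the upper limb (diraa), move up (fawk)
code = command_byte("diraa", "fawk")
print(f"{code:#04x}")  # 0x23: high nibble 2 = diraa, low nibble 3 = fawk
```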
THE SPEECH RECOGNITION AGENT
The speech recognition agent is based on HMM. In this section, a brief definition of the HMM is presented and the main speech processing blocks are explained.
However, a prerequisite phase is necessary to build a database composed of the twelve vocabulary words, each repeated twenty times by fifty persons (twenty-five male and twenty-five female). So, before the parameters are created, 50*20*12 "wav" files are recorded in a repository. Files from 35 speakers are saved in DB1 to be used for training, and files from the remaining 15 speakers are saved in DB2 to be used for tests; these tests are done off-line.
In the training phase, each utterance (saved wav file)
is converted to a Cepstral domain (MFCC features,
energy, and first and second order deltas) which
constitutes an observation sequence for the estimation
of the HMM parameters associated to the respective
word. The estimation is performed by optimisation of the likelihood of the training vectors corresponding to each word in the vocabulary. This optimisation is carried out by the Baum-Welch algorithm (Rabiner, 1989; Ibrahim et al., 2010).

Figure 1. Presentation of the left-right (Bakis) HMM.
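As a rough illustration of this training phase, the sketch below fits one Gaussian HMM per vocabulary word on MFCC observation sequences with Baum-Welch re-estimation, using the hmmlearn library. The number of states, covariance type and data handling are assumptions for illustration, not the authors' reported settings.

```python
# Illustrative sketch (not the authors' code): train one Gaussian HMM per
# vocabulary word on MFCC feature sequences with Baum-Welch (EM), as
# implemented by hmmlearn. Feature extraction is assumed to be done already.
import numpy as np
from hmmlearn import hmm

def train_word_model(sequences, n_states=5):
    """sequences: list of (T_i, d) MFCC arrays for one vocabulary word."""
    X = np.concatenate(sequences)          # stacked observations
    lengths = [len(s) for s in sequences]  # per-utterance lengths
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=20)
    model.fit(X, lengths)                  # Baum-Welch re-estimation
    return model

# models = {word: train_word_model(seqs) for word, seqs in training_db.items()}
# Recognition: pick the word whose model maximises model.score(utterance).
```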
HMM MODEL BASICS
An HMM is a type of stochastic model appropriate for non-stationary stochastic sequences whose statistical properties undergo distinct random transitions among a set of different stationary processes. In other words, the HMM models a sequence of observations as a piecewise stationary process. Over the past years, HMMs have been widely applied in domains such as pattern recognition (Djemili et al., 2004) and speech recognition (Djemili et al., 2004; Ferrer et al., 2000). HMMs are suitable for the classification of one- or two-dimensional signals and can be used when the information is incomplete or uncertain. To use an HMM, we need a training phase and a test phase. For the training phase, we usually work with the Baum-Welch algorithm to estimate the parameters $(\Pi, A, B)$ of the HMM (Rabiner, 1989; Ferrer et al., 2000). This method is based on the maximum likelihood criterion. To compute the most probable state sequence, the Viterbi algorithm is the most suitable.
The HMM is basically a stochastic finite state automaton which generates an observation string, that is, the sequence of observation vectors $O = O_1, \ldots, O_t, \ldots, O_T$. Thus, an HMM consists of a number $N$ of states $S = \{S_i\}$ and of the observation string produced as a result of emitting a vector $O_t$ at each successive transition from one state $S_i$ to a state $S_j$. $O_t$ is of dimension $d$ and, in the discrete case, takes its values from a library of $M$ symbols.
The state transition probability distribution between states $S_i$ and $S_j$ is $A = \{a_{ij}\}$, the observation probability distribution of emitting any vector $O_t$ at state $S_j$ is given by $B = \{b_j(O_t)\}$, and the probability distribution of the initial state is $\Pi = \{\pi_i\}$:

$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i) \qquad (1)$$
$$B = \{b_j(O_t)\} \qquad (2)$$

$$\pi_i = P(q_0 = S_i) \qquad (3)$$
Given an observation sequence $O$ and an HMM $\lambda = (A, B, \Pi)$, the probability of the observed sequence $P(O \mid \lambda)$ can be computed by the forward-backward procedure (Kwee, 1997). The forward variable $\alpha_t(i)$ is defined as the probability of the partial observation sequence $O_1 O_2 \ldots O_t$ (until time $t$) and state $S_i$ at time $t$, given the model $\lambda$. The backward variable $\beta_t(i)$ is defined as the probability of the partial observation sequence from $t+1$ to the end, given state $S_i$ at time $t$ and the model $\lambda$. The probability of the observation sequence is computed as follows:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i) = \sum_{i=1}^{N} \alpha_T(i) \qquad (4)$$

and the probability of being in state $S_i$ at time $t$ (given the observation sequence $O$ and the model $\lambda$) is computed as follows:

$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)} \qquad (5)$$
An ergodic (fully connected) HMM is an HMM with all the states linked together (every state can be reached from any state). The Bakis HMM is a left-to-right HMM, with a transition matrix constrained as shown in Figure 1.
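To make Equation 4 concrete, the following NumPy sketch implements the forward recursion for a discrete HMM; the toy model parameters are assumed values for illustration only.

```python
# Illustrative forward algorithm for a discrete HMM (Equation 4):
# alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda); P(O|lambda) = sum_i alpha[T-1, i].
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """pi: (N,) initial probs, A: (N,N) transitions,
    B: (N,M) emission probs, obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]                  # initialisation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]          # induction step
    return alpha.sum()                         # termination: P(O | lambda)

# Toy 2-state, 2-symbol model (assumed values, for illustration):
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.0, 1.0]])         # left-right (Bakis) style
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_likelihood(pi, A, B, [0, 0, 1]))
```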
GMM MODEL BASICS
The GMM can be viewed as a hybrid between parametric and non-parametric density models, as shown in Figure 2. Like a parametric model, it has structure and parameters that control the behavior of the density in known ways. Like a non-parametric model, it has many degrees of freedom that allow arbitrary density modeling. The GMM density is defined as a weighted sum of Gaussian densities, given by Equation 6 as follows:
$$P_{GM}(x) = \sum_{m=1}^{M} w_m\, g(x, \mu_m, C_m) \qquad (6)$$
Here $m$ indexes the Gaussian components ($m = 1, \ldots, M$), and $M$ is the total number of Gaussian components. The $w_m$ are the component probabilities ($\sum_m w_m = 1$), also called weights. We consider $K$-dimensional densities, so the argument is a vector $x = (x_1, \ldots, x_K)^T$. The component probability
Figure 2. Speech recognition agent based on the HMM/GMM model (block diagram: training samples and input speech feed the HMM and GMM models; their likelihoods feed the decision block, which produces the action).
Figure 3. Waveform of the test phrase "yade diraa fawk tabek", with silence at the beginning and at the end.
density function (pdf), $g(x, \mu_m, C_m)$, is a $K$-dimensional Gaussian probability density function given in Equation 7 as follows:

$$g(x, \mu_m, C_m) = (2\pi)^{-K/2}\, |C_m|^{-1/2} \exp\!\left(-\tfrac{1}{2}\,(x - \mu_m)^T C_m^{-1} (x - \mu_m)\right) \qquad (7)$$
where $\mu_m$ is the mean vector and $C_m$ is the covariance matrix. A Gaussian mixture model probability density function is thus completely defined by the parameter list $\lambda = \{w_m, \mu_m, C_m\}$, $m = 1, \ldots, M$.
Organizing the data for input to the GMM is important, since the components of the GMM play a vital role in building the word models. For this purpose, we use the K-means clustering technique to break the data into 256 cluster centroids. These centroids are then grouped into sets of 32, each set being passed to one component of the GMM. As a result, we obtain a GMM with 8 components. Once the component inputs are decided, the GMM modelling can be implemented (Figure 3).
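A hedged sketch of this initialisation using scikit-learn: 256 K-means centroids are computed and grouped into 8 sets of 32, each set seeding the mean of one Gaussian component before EM refinement. The grouping rule and library choice are assumptions made for illustration.

```python
# Illustrative initialisation of an 8-component GMM from 256 K-means
# centroids (grouped into sets of 32), then EM refinement with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def init_gmm_from_kmeans(features, n_centroids=256, n_components=8):
    """features: (n_frames, d) MFCC vectors for one vocabulary word."""
    centroids = KMeans(n_clusters=n_centroids, n_init=4).fit(features).cluster_centers_
    groups = centroids.reshape(n_components, -1, centroids.shape[1])
    means_init = groups.mean(axis=1)           # one mean per component
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", means_init=means_init)
    return gmm.fit(features)                    # EM refinement

# word_gmm = init_gmm_from_kmeans(mfcc_frames)
# log_likelihood = word_gmm.score(test_frames)  # average log P per frame
```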
EM ALGORITHM
The expectation maximization (EM) algorithm is an
iterative method for calculating maximum likelihood
distribution parameter estimates from incomplete data
(elements missing in feature vectors). The EM update equations give a procedure to iteratively maximize the log-likelihood of the training data given the model. The EM algorithm is a two-step process:
$$y_m(t) = \frac{w_m^{i}\, g(x_t, \mu_m^{i}, C_m^{i})}{\sum_{j=1}^{M} w_j^{i}\, g(x_t, \mu_j^{i}, C_j^{i})} \qquad (8)$$
Estimation step: the current iteration values of the mixture parameters are used to determine the responsibilities, as given in Equation 8.
Maximization step: these predicted values are then used to obtain the parameter values for the next iteration, as given in Equations 9, 10 and 11.
$$\mu_m^{i+1} = \frac{\sum_{t=1}^{T} y_m(t)\, x_t}{\sum_{t=1}^{T} y_m(t)} \qquad (9)$$
$$w_m^{i+1} = \frac{1}{T} \sum_{t=1}^{T} y_m(t) \qquad (10)$$
$$\sigma_{m,j}^{2\,(i+1)} = \frac{\sum_{t=1}^{T} y_m(t)\,\left(x_{t,j} - \mu_{m,j}^{i+1}\right)^2}{\sum_{t=1}^{T} y_m(t)} \qquad (11)$$
The EM algorithm is well known and highly appreciated for its numerical stability, provided minimum threshold values are imposed on the parameters. Using the final re-estimated $w$, $\mu$ and $C$, the value of $L_{GMM}$ is calculated with respect to all the word models available to the recognition engine, as shown in Equation 12:
$$L_{GMM} = \frac{1}{T} \sum_{t=1}^{T} \log P_{GM}(x_t) \qquad (12)$$
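For concreteness, the following NumPy sketch implements one EM iteration for a diagonal-covariance GMM following Equations 8 to 11, together with the per-frame average log-likelihood of Equation 12. It is a didactic approximation, not the authors' implementation.

```python
# One EM iteration for a diagonal-covariance GMM (Equations 8-11) and the
# average log-likelihood of Equation 12. Didactic sketch, not optimised.
import numpy as np

def log_gauss_diag(X, mu, var):
    """Per-frame log N(x; mu, diag(var)) for each component."""
    # X: (T, K); mu, var: (M, K) -> returns (T, M)
    d = X[:, None, :] - mu[None, :, :]
    return -0.5 * (np.log(2 * np.pi * var).sum(1) + (d**2 / var).sum(2))

def em_step(X, w, mu, var):
    logp = np.log(w) + log_gauss_diag(X, mu, var)     # (T, M)
    logp -= logp.max(1, keepdims=True)                # numerical stability
    y = np.exp(logp); y /= y.sum(1, keepdims=True)    # Eq. 8: responsibilities
    Nm = y.sum(0)                                     # effective counts
    w_new = Nm / len(X)                               # Eq. 10
    mu_new = (y.T @ X) / Nm[:, None]                  # Eq. 9
    var_new = (y.T @ X**2) / Nm[:, None] - mu_new**2  # Eq. 11 (equivalent form)
    return w_new, mu_new, np.maximum(var_new, 1e-6)   # variance floor

def avg_log_likelihood(X, w, mu, var):                # Eq. 12
    logp = np.log(w) + log_gauss_diag(X, mu, var)
    m = logp.max(1, keepdims=True)
    return float(np.mean(m.squeeze() + np.log(np.exp(logp - m).sum(1))))
```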
Figure 4. Windowing diagram.
HMM/GMM MODEL
The HMM/GMM hybrid model has the ability to find the
joint maximum probability among all possible reference
words $W$ given the observation sequence $O$. In the real case, the combination of the GMMs and the HMMs with a weighting coefficient may be a good scheme because of
the difference in training methods. The $i$th word-independent GMM produces a likelihood $L_i^{GMM}$, $i = 1, 2, \ldots, W$, where $W$ is the number of words. The $i$th word-independent HMM likewise produces a likelihood $L_i^{HMM}$, $i = 1, 2, \ldots, W$. All these likelihood values are passed to the likelihood decision block, where they are transformed into the new combined likelihood $L'_i$:

$$L'_i = (1 - x)\, L_i^{GMM} + x\, L_i^{HMM} \qquad (13)$$

where $x$ denotes a weighting coefficient.
The value of $x$ is calculated during training of the hybrid model. In hybrid testing, a subset of the training data is used; its HMM and GMM likelihood values are calculated and combined using the weighting coefficient. Static values of the weighting coefficient were also used in order to obtain a higher recognition rate. The system therefore comprises 12 HMM models, one per vocabulary word, and 12 GMM models, one for each word; the output of both models is taken by the decision block.
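A minimal sketch of the decision block of Equation 13, assuming per-word log-likelihood dictionaries produced by the two models and a weighting coefficient x obtained during training; all names and scores are illustrative.

```python
# Decision block (Equation 13): combine per-word HMM and GMM likelihoods
# with a weighting coefficient x, then pick the best-scoring word.
def decide(l_gmm: dict, l_hmm: dict, x: float = 0.5) -> str:
    """l_gmm, l_hmm: word -> log-likelihood; x: hybrid weight from training."""
    combined = {w: (1 - x) * l_gmm[w] + x * l_hmm[w] for w in l_gmm}
    return max(combined, key=combined.get)

# Example with assumed scores for three vocabulary words:
print(decide({"fawk": -41.2, "tahta": -44.0, "kif": -47.5},
             {"fawk": -39.8, "tahta": -42.1, "kif": -50.3}, x=0.6))  # "fawk"
```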
SPEECH PROCESSING PHASE
Once the phrase is acquired through a microphone and
the PC sound card, the samples are stored in a wav file.
Then the speech processing phase is activated. During
this phase the signal (samples) goes through different
steps: pre-emphasis, frame blocking, windowing, feature extraction and Mel-Frequency Cepstral Coefficient (MFCC) analysis.
Pre-emphasis step
In general, the digitized speech waveform has a high dynamic range. In order to reduce this range, pre-emphasis is applied. By pre-emphasis, we imply the application of a high-pass filter, which is usually a first-order FIR filter of the form $H(z) = 1 - a z^{-1}$. The pre-emphasis is implemented either as a fixed-coefficient filter or as an adaptive one, where the coefficient $a$ is adjusted with time according to the autocorrelation values of the speech. The pre-emphasis block has the effect of spectral flattening, which renders the signal less susceptible to finite precision effects (such as overflow and underflow) in any subsequent processing. The selected value for $a$ in our work is 0.9375.
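A one-line NumPy sketch of this filter, using the paper's coefficient a = 0.9375:

```python
# Pre-emphasis: y[n] = x[n] - a * x[n-1], i.e. H(z) = 1 - a z^-1.
import numpy as np

def pre_emphasis(x: np.ndarray, a: float = 0.9375) -> np.ndarray:
    return np.append(x[0], x[1:] - a * x[:-1])
```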
Frame blocking
Since the vocal tract moves mechanically slowly, speech
can be assumed to be a random process with slowly
varying properties. Hence, the speech is divided into
overlapping frames of 20 ms every 10 ms. The speech
signal is assumed to be stationary over each frame and
this property will prove useful in the following steps.
Figure 5. MFCC block diagram (Input Speech → PE-FB-W → FFT → LOG → DCT → Output MFCC).
Windowing
To minimize the discontinuity of the signal at the beginning and the end of each frame, we window each frame. The windowing tapers the signal to zero at the beginning and end of each frame. A typical window is the Hamming window of the form:

$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N - 1}\right), \qquad 0 \le n \le N - 1 \qquad (14)$$
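The frame blocking and Hamming windowing described above (20 ms frames every 10 ms) can be sketched as follows, assuming a 16 kHz sampling rate for illustration:

```python
# Frame blocking (20 ms frames, 10 ms hop) followed by Hamming windowing.
import numpy as np

def frame_and_window(x, fs=16000, frame_ms=20, hop_ms=10):
    N = int(fs * frame_ms / 1000)          # samples per frame
    hop = int(fs * hop_ms / 1000)          # samples between frame starts
    n_frames = 1 + (len(x) - N) // hop
    idx = np.arange(N)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(N)          # Eq. 14 applied to every frame
```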
Feature extraction
In this step, the speech signal is converted into a stream of feature vectors that contain only the information about the utterance that is important for its correct recognition. An important property of feature extraction is the suppression of information irrelevant to correct classification, such as information about the speaker (e.g. fundamental frequency) and about the transmission channel (e.g. the characteristics of the microphone). The feature measurements of speech signals are typically extracted using one of the following spectral analysis techniques: a Mel-frequency filter bank analyzer, LPC analysis, or discrete Fourier transform analysis. Currently, the most popular features are the Mel-frequency cepstral coefficients (MFCC) (Rabiner, 1989).
MFCC analysis
The MFCC are extracted from the speech signal as
shown in Figure 4. The speech signal is pre-emphasized,
framed and then windowed, usually with a Hamming
window. Mel-spaced filter banks are then utilized to get
the Mel spectrum. The natural logarithm is then taken to
transform into the cepstral domain and the discrete
cosine transform is finally computed to get the MFCCs as
shown in the block diagram of Figure 5.
$$C_k = \sum_{i=1}^{N} \log(E_i)\, \cos\!\left(\frac{k\,(i - 1/2)\,\pi}{N}\right) \qquad (15)$$
Where the acronyms signify:
- PE-FB-W: Pre-Emphasis, Frame Blocking and
windowing.
- FFT: Fast Fourier Transform
- LOG: Natural Logarithm
- DCT: Discrete Cosine Transform
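The whole pipeline of Figure 5 can be sketched as follows. The FFT size, number of mel filters and number of cepstral coefficients are common choices assumed here for illustration, not values reported by the authors:

```python
# MFCC pipeline of Figure 5: framed/windowed speech -> FFT power spectrum
# -> mel filterbank -> log -> DCT. Filter counts are assumed defaults.
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters evenly spaced on the mel scale."""
    mel = np.linspace(0, 2595 * np.log10(1 + (fs / 2) / 700), n_filters + 2)
    hz = 700 * (10**(mel / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc(frames, fs=16000, n_filters=24, n_ceps=12):
    """frames: output of frame_and_window(); returns (n_frames, n_ceps)."""
    n_fft = 512
    power = np.abs(np.fft.rfft(frames, n_fft))**2        # FFT
    energies = power @ mel_filterbank(n_filters, n_fft, fs).T
    log_e = np.log(np.maximum(energies, 1e-10))          # LOG
    return dct(log_e, type=2, axis=1, norm='ortho')[:, :n_ceps]  # DCT
```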
PARALLEL INTERFACE CIRCUIT
The speech recognition agent based on HMM detects the words and processes each one. Depending on the recognition probability of the object name and the command word, a code is transmitted to the parallel port of the PC. The vocabulary to be recognized by the system and its meaning are listed in Table 1. Within these words, some are object names and the others are command names. The code to be transmitted is composed of 8 bits: the four most significant bits code the object name and the four least significant bits code the command to be executed by the selected object. Example: "yade diraa fawk tabek".
A parallel port interface was designed to display the real-time commands. It is based on the following TTL ICs (integrated circuits): a 74LS245 buffer, a PIC16F84 microcontroller and a radio frequency transmitter from RADIOMETRIX, the TX433-10 (modulation frequency 433 MHz, transmission rate 10 kb/s) (Table 1; Data Sheet PIC16F876, 2001).

Table 1. Meaning of the vocabulary voice commands, assigned code and controlled motor.

1. Yade (1): name of the manipulator (keyword)
2. Diraa (2): upper limb motor (M1)
3. Saad (3): limb motor (M2)
4. Meassam (4): wrist (hand) motor (M3)
5. Mikbath (5): gripper motor (M4)
6. Yamine (1): left turn (M0)
7. Yassar (2): right turn (M0)
8. Fawk (3): up movement (M1, M2 and M3)
9. Tahta (4): down movement (M1, M2 and M3)
10. Iftah (5): open grip, action on M4
11. Ighlak (6): close grip, action on M4
12. Kif (7): stop the movement, stops M0, M1, M2, M3 or M4
TR45 MANIPULATOR ARM DESCRIPTION AND
INTERFACE
As shown in Figures 6b and 6c, the structure of the mechanical hardware and the computer board of the robot arm in this paper is similar to MANUS (Kwee, 1997; Buhler et al., 1994); however, the robot arm needs to perform simpler tasks than those in Heck (1997). The robot arm comprises four feedback-controlled movements for the elements base, upper limb, limb and wrist. Each movement command is realised by a moto-reductor block (1/500) powered by +12 and -12 volts. A copy of the position voltage is given by a linear rotary potentiometer fixed on the moto-reductor block and powered by +10 and -10 volts.
One open-loop controlled movement, for the gripper, uses the same type of command. The displacement characteristics are given by the following angle values:

Base: 290°
Upper limb: 108°
Limb: 280°
Wrist: 290°
Gripper: 100°
The computer board of the robot arm consists of a PIC16F876 with an 8K-instruction Electrically Programmable Read Only Memory (EEPROM), three timers and 3 ports (Larson, 1999), four power circuits to drive the moto-reductors, one H-bridge driver using BD134 and BD133 transistors for the DC motor that controls the gripper, and an RF receiver module from RADIOMETRIX, the SILRX-433-10 (modulation frequency 433 MHz, transmission rate 10 kb/s) (Radiometrix components, 2010), as shown in Figure 6b.
Each motor in the robot arm performs the task corresponding to a received command (for example: "yamine", "kif" or "fawk") as in Table 1. Commands and their corresponding tasks in autonomous robots may be changed in order to enhance or modify the application. In the recognition phase, the speech recognition agent gets the sentence to be processed, treats the spotted words, then takes a decision by setting the corresponding bit on the parallel port data register, so that the corresponding LED turns on. The code is also transmitted in serial mode to the TXM-433-10 (Yamano et al., 2005).
EXPERIMENTS ON THE SYSTEM
The speech recognition agent was tested within the L.A.S.A laboratory, where two different conditions were run: off-line, using DB2, and in real-time. Three types of tests were carried out: the HMM and GMM models separately, then the HMM/GMM model tested on-line and in real-time.
The results are presented in Figure 7, after testing the recognition of command and object words 100 times in the following conditions: (a) off-line, which means the test words are selected from DB2; (b) in real-time, which means users command the system live. The results are shown in Figures 7a and b. It is obvious that the real-time test results are lower than those of the off-line tests; this is mainly due to changes in environment and material conditions.
DISCUSSION
The recognition of spotted words from a limited vocabulary in the presence of background noise was addressed in this paper. The application is speaker-independent; therefore, it does not need a training phase for each user. It should, however, be pointed out that this property does not depend on the overall approach but only on the method with which the reference patterns were chosen. So, by leaving the approach unaltered and choosing the reference patterns appropriately (based on individual speakers), this application can be made speaker-dependent.
The effect of the environment was also taken into consideration by evaluating the HMM/GMM model with different microphones: with the microphone Mic1, used for recording the database, we obtain a better rate than with a new microphone Mic2. The HMM-based model gives better results than the GMM independently; by combining GMM and HMM and using MFCC and their differentials as features, we increased the recognition rate. The application is speaker-independent; however, by computing parameters based on speakers' pronunciation, the system can be made speaker-dependent.
Figure 6a. Parallel interface circuit and a photo of the
designed card.
Figure 6b. Robot arm block diagram (Computer board and
motors).
CONCLUSION AND FUTURE WORKS
A voice command system for a robot arm is proposed and implemented in this paper, based on a hybrid HMM/GMM model for spotted words. The results of the tests
show that a better recognition rate can be achieved using hybrid techniques, especially if the phonemes of the words selected for voice command are quite different. The effect of the microphone used for the tests is shown by the results presented in Figure 7c. However, a good position of the microphone and additional filtering may enhance the recognition rate.

Figure 6c. Overview of the robot arm and parallel interface.

Figure 7a. HMM, GMM and HMM/GMM models results, off-line tests (recognition rate per vocabulary word).
Figure 7b. HMM, GMM and HMM/GMM models results, real-time tests (recognition rate per vocabulary word).

Figure 7c. Microphone effect on the results (recognition rate per vocabulary word, Mic1 versus Mic2).

Spotted-word detection is based on speech detection followed by processing of the detected segments. Once the parameters have been computed, the idea can be implemented easily
within a hybrid design using a DSP and a microcontroller
since it does not need too much memory capacity.
Finally, since the designed electronic command for the robot arm consists of a microcontroller and other low-cost components, namely wireless transmitters, the hardware design can easily be carried out as future work (Kim et al., 1998; Fezari et al., 2005). Also, the application can be implemented on a DSP or a microcontroller in the future in order to be autonomous (Hongyu et al., 2004; www.alzaytoonah.edu.jo/ICIT2009/documents/accepted%20papers.pdf).
REFERENCES
Beritelli F, Casale S, Cavallaro A (1998). A Robust Voice Activity Detector for Wireless Communications Using Soft Computing. IEEE Journal on Selected Areas in Communications (JSAC), Special Issue on Signal Processing for Wireless Communications, 16(9).
Bererton C, Khosla P (2001). Towards a team of robots with reconfiguration and repair capabilities. Proceedings of the 2001 IEEE International Conference on Robotics and Automation, pp. 2923-2928.
Rao RS, Rose K, Gersho A (1998). Deterministically Annealed Design
of Speech Recognizers and Its Performance on Isolated Letters,
Proceedings IEEE ICASSP'98, pp. 461-464.
Gu L, Rose K (2001). Perceptual Harmonic Cepstral Coefficients for
Speech Recognition in Noisy Environment. Proc ICASSP 2001, Salt
Lake City.
Djemili R, Bedda M, Bourouba H (2004). Recognition of Spoken Arabic Digits Using Neural Predictive Hidden Markov Models. Int. Arab J. Inform. Technol. (IAJIT), 2: 226-233.
Rabiner LR (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Readings in Speech Recognition, pp. 267-295.
Hongyu LY, Zhao Y, Dai, Wang Z (2004). A Secure Voice Communication System Based on DSP. IEEE 8th International Conference on Control, Automation, Robotics and Vision, Kunming, China, pp. 132-137.
Ferrer MA, Alonso I, Travieso C (2000). Influence of initialization and stop criteria on HMM-based recognizers. Electronics Letters, IEE, 36: 1165-1166.
Kwee H (1997). Intelligent control of the MANUS wheelchair. In: Proceedings of the International Conference on Rehabilitation Robotics, ICORR'97, Bath, pp. 91-94.
Yussof JM (2005). A Machine Vision System Controlling a Lynxarm
Robot along a Path, University of Cape Town, South Africa, October
28.
Yamano HM, Nasu Y, Mitobe K, Ohka M (2005). Obstacle Avoidance in
Groping Locomotion of a Humanoid Robot, Int. J. Adv. Robotic Syst.,
2(3): 251 – 258.
Buhler C, Heck H, Nedza J, Schulte D (1994). MANUS Wheelchair-Mountable Manipulator: Further Developments and Tests. Manus Usergroup Mag., 2(1): 9-22.
Heck H (1997). User Requirements for a Personal Assistive Robot. In: Proc. of the 1st MobiNet Symposium on Mobile Robotics Technology for Health Care Services, Athens, pp. 121-124.
Rodriguez E, Ruiz B, Crespo AG, Garcia F (2003). Speaker Recognition Using a HMM/GMM Hybrid Model. In: Proceedings of the First International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 227-234.
Larson M (1999). Speech Control for a Robotic Arm within Rehabilitation. Master thesis, Division of Robotics, Department of Mechanical Engineering, Lund University, Sweden.
Data Sheet PIC16F876 (2001). Microchip Inc. User's Manual, http://www.microchip.com.
Radiometrix components (2010). TXm-433 and SILRX-433 Manual, HF
Electronics Company. http://www.radiometrix.com.
Kim WJ, Lee JM, Kang SY, Shin JC (1998). Development of A voice
remote control system. In Proceedings of the 1998 Korea Automatic
Control Conference, Pusan, Korea, pp. 1401-1404.
Fezari MM, Bousbia-S, Bedda M (2005). Hybrid technique to enhance
voice command system for a wheelchair, In: proceedings of Arab
Conference on Information Technology ACIT’05, Jordan.
Ibrahim M, El Emary M, Fezari M (2010). Speech as a High Level Control for Teleoperated Manipulator Arm. The Second International Conference on Advanced Computer Control, China, 27-29 March. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5486727
www.alzaytoonah.edu.jo/ICIT2009/documents/accepted%20papers.pdf
www.ieeexplore.ieee.org/iel5/5398724/5404079/05404143.pdf?