
Improving English Pronunciation: An Automated Instructional Approach


Abstract

This paper describes an experiment in which groups of children attempted to improve their English pronunciation using an English-language learning software, some English films, and a speech-to-text software engine. The experiment was designed to examine two hypotheses. The first is that speech-to-text software, trained in an appropriate voice, can act as an evaluator of accent and clarity of speech as well as help learners acquire a standard way of speaking. The second is that groups of children can operate a computer and improve their pronunciation and clarity of speech, on their own, with no intervention from teachers. The results of the experiment are positive and point to a possible new pedagogy. (c) 2003 Massachusetts Institute of Technology Information Technologies and International Development.
Mitra, Tooley, Inamdar, Dixon
Research Reports / Improving English Pronunciation
© 2003 The Massachusetts Institute of Technology
Information Technologies and International Development
Volume 1, Number 1, Fall 2003, 75–84
Centre for Research in
Cognitive Systems
NIIT Limited
Synergy Building
IIT Campus
New Delhi, India
University of Newcastle
Newcastle Upon Tyne, England
Sugata Mitra, James Tooley,
Parimala Inamdar, and Pauline Dixon
It has been assumed that speech-to-text (STT) programs cannot distin-
guish good pronunciation and have not been used for this purpose
(Goodwin-Jones 2000). However, we propose that STT programs can be
used instead of human listeners to evaluate the quality of pronunciation
by comparing against a standard accent and pronunciation. The proposi-
tion is based on a recent patent application (Mitra and Ghatak 2002).
To investigate this hypothesis, we need to discover whether an STT
program can judge pronunciation as well as a human can. If an STT pro-
gram were demonstrated to be as good as a human judge of pronuncia-
tion, it could be suggested that human speakers, particularly children,
may be able to improve their pronunciation on their own, using such a
program rather than requiring a human teacher. This hypothesis is inspired
by, and uses findings of, previous research showing that groups of children
can learn to use computers to self-instruct without teachers (Mitra
1988, 2003; Mitra and Rana 2001). In a series of experiments popularly
referred to as the “hole-in-the-wall,” groups of children (8–13
years old) were able to self-instruct in using computers for playing games,
surfing the Internet, painting, and other activities, irrespective of their
social, linguistic, or ethnic backgrounds. Working in groups and having
unsupervised, free access were important to the self-instructional process.
This paper is made up of four parts. First, the reasons why better
pronunciation is desirable are examined. Second, the research method
and results are described, setting out how, where, and when the data
were collected. Third, the findings of the research are discussed, and
finally the paper closes with a conclusion and a vision for the way forward.
Hyderabad is a large city in southern India and, like
many such cities in India, contains sprawling slum
areas. These areas in Hyderabad contain a large
number of small private schools, sometimes as many
as 10 in a square kilometre. These schools are filled
to capacity with children whose parents pay sub-
stantial amounts for education (in comparison to
their incomes) in spite of the fact that there are sev-
eral free schools in the vicinity operated by the gov-
ernment. The single most important reason the slum
parents send their children to these private schools
is the English language.
There are 17 languages recognized by the United
Nations, and more than 700 dialects related to these
languages, spoken in India. Hindi is the national lan-
guage, and English is a common “bridging” lan-
guage that is used everywhere.
As a consequence of India's colonial past, people
who speak English are generally considered more
suitable for most jobs than people who do not. Al-
though this may not be a happy situation, it is one
of the main reasons India scores over other South-
east Asian nations in the software industry. India is
the second largest exporter of software in the world
after the United States, and its success is often at-
tributed to the ability of its industry and its people
to deal with the English language.
The ability to speak in English can determine the
living standards and occupations of most Indians. It
is for this reason that the private schools in Hydera-
bad prosper.
Although these schools teach English with
reasonable effectiveness, their teaching suffers from a severe
mother-tongue influence (MTI). The products of
such schools, and, indeed, of any school in India, can
be fluent in English but would often speak the language
with an accent that is incomprehensible anywhere
in the world. This is because
teachers in such schools have a strong MTI themselves,
which their students copy, and the problem
perpetuates itself. It is in this context that the
experiment in this study was conducted.
People from all countries are now working and liv-
ing in a globalized environment where communica-
tion from and to almost anywhere in the world may
occur practically instantaneously. Labor mobility and
the existence of international employment opportu-
nities have heightened the need to communicate
and to be understood. The most recent of such ser-
vices are in the information technology (IT) enabled
services (ITES) area. These services use communica-
tion technology to leverage the (often cheaper)
work force of one country to service the require-
ments of another. For example, the cost of taking
orders on the telephone is cheaper in India than in
the United States because of lower salaries for tele-
phone operators. As a result, many U.S. companies
have set up call centers in India, where telephone
operators take orders for American goods from
American customers, who are unaware that the
conversation they are having is with an Indian lo-
cated in India. English is generally regarded as the
language that can provide this communication uni-
versality, so much so that parents know how impor-
tant it is for their children to master the English
language for them to succeed. Moreover, in devel-
oping countries it has been found that not only do
the elite and middle classes aspire for their children
to acquire and master English language skills but
also parents from poor slum areas now place great
importance on their children attaining the ability to
read, write, and speak in English. In the private
schools in the low-income areas of Hyderabad, In-
dia, for instance, it is shown that the single most
important element for parents choosing a private
school is that it teaches the entire curriculum in the
English vernacular (Tooley and Dixon Forthcoming).
However, such ambitions are thwarted because
good-quality teachers (and, in particular, native Eng-
lish speakers) are not available to provide, inter alia,
good English pronunciation skills to their
students. As a result, even the best students from
the system have a strong MTI on their English pro-
nunciation and their speech is only barely under-
standable. This, in turn, affects their employability
and social status severely, leading to underemploy-
ment and, sometimes, crime, poverty, and associ-
ated urban social problems.
The need to be understood for cultural as well as
ITES purposes is the reason behind the undertaking
of the experiment. We decided to conduct the ex-
periment in an area of great educational need,
where untrained teachers were catering to children
whose parents were illiterate and where English was
not widely spoken although greatly desired. The ex-
periment sets out to explore two hypotheses:
Hypothesis 1: The objective measurement of
words correctly recognized by an STT program
closely parallels subjective human judgments of
pronunciation.
Hypothesis 2: Given suitable software and
hardware, children, and in particular disadvan-
taged children, can instruct themselves, with-
out requiring a teacher, in the improvement of
their English pronunciation.
The experiment took place in Hyderabad, India, be-
tween September 2002 and January 2003. A com-
puter was placed in a private, unaided school (Peace
High School, Edi Bazaar), which caters to very low-
income families, situated in a slum area of the city.
The school fees are between 75 and 150 rupees
per month (about US$1.50 to US$3 per
month). The school serves children of auto-rickshaw
drivers, market traders, and service workers, with
approximate family incomes ranging from 1,000
to 3,000 rupees per month. Most families
have more than one child; hence, the school fees
can exceed 25% of monthly income. A sample
of 16 children from different classes in the school,
between 12 and 16 years old, was chosen at random
to participate in the study. The school is an
English medium school; the entire curriculum is
taught in English. The mother tongue of the children
is either Urdu or Telugu. The children could
already read and speak English at variable standards
of competency.
The school was provided with a computer. The
multimedia computer was a Pentium 4, 1.8 GHz,
256 MB RAM, with microphone and speakers. The
specialized software used was “Ellis Kids” and
“Dragon Naturally Speaking,” using the Windows
XP operating system. The system was installed in a
quiet part of the school. This ensured that the
sounds received by the microphone were only those
of the learners and not external noise. STT programs
are often confused in noisy environments.
Ellis is one of the leading
English language learning software programs in the
market, released in 1992. Ellis Kids is its specialized
children’s package. It teaches vocabulary, listening,
grammar, and communication skills through multi-
media inputs such as video, audio, and text.
Learning is tested through multiple-choice ques-
tions; the students are also able to test their pronun-
ciation by recording a few words of their speech
and comparing it with a standard English spoken
voice. Ellis is not equipped with any speech-to-text
capability and is not capable of providing the user
with any feedback on the quality of the learner’s
pronunciation. Indeed, there is no software at this
time that can do so. However, the focus of Ellis is
the whole of language learning, of which pronunci-
ation is a small part.
Dragon Naturally Speaking (www.dragonsystems.com)
is an STT engine. Dragon can be trained to
recognize, interpret, and convert to text a particular
user’s speech. This is done by comparing the audio
input with a stored profile. To create a profile, a user
has to read several passages provided by the software
into the computer’s microphone. These readings
are then converted into a profile for the user,
about 3 MB in size, which needs to be selected before
speech can be entered into the word processing
program. Naturally, Dragon will best recognize the
speech of those users in whose voices it has been trained.
This experiment used the Dragon STT program
uniquely as a learning and testing tool for English
pronunciation. Four profiles of speech were created
in Dragon: Pauline (English female), James (English
male), Sugata (Indian male), and Parimala (Indian female).
These were used as pronunciation standards against
which students practiced and were measured. Students
read passages into Dragon using any one of
these profiles. For the final measurements, all male
voices were measured against the profile Sugata,
and all female voices were measured against the
profile Parimala.
Four films were also installed in the computer as additional
environmental inputs for spoken English.
These were The Sound of Music, My Fair Lady, Guns
of Navarone, and The King and I. There were two reasons
for introducing these films into the experiment.
First, the students have no reference for how
they should correctly pronounce an English word, although
Dragon may tell them that their pronunciation
is incorrect. We expected that, by watching 4 hours of
films, they would have heard most of the common
English words as spoken by native English speakers.
The second reason for selecting these films was to
break the monotony of practicing a few passages
from a school reader every day. The films were chosen
because all, except Guns of Navarone, had
something to do with education and children or
young people. Guns of Navarone was chosen because
it contains the kind of action that young people
seem to enjoy.
Students, consisting of eight girls and eight boys,
were randomly organized into four groups of four
and each group was instructed to spend a total of
3 hours per week with the system (Figure 1). A
timetable was made out for this purpose and given
to the students.
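The random grouping described above can be sketched in a few lines of Python. This is only an illustration: the paper does not say how the randomization was carried out, and the function name `assign_groups`, the placeholder student labels, and the seed are assumptions.

```python
import random


def assign_groups(students, group_size, seed=None):
    """Shuffle the students and cut the shuffled list into groups of equal size."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(students)
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]


# Hypothetical labels standing in for the 16 participating children.
students = [f"student_{i:02d}" for i in range(1, 17)]
groups = assign_groups(students, group_size=4, seed=2002)

for number, group in enumerate(groups, start=1):
    print(f"Group {number}: {', '.join(group)}")
```

A simple shuffle like this does not balance girls and boys within groups; whether the original grouping did so is not stated.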
The students were then given a demonstration of
Ellis and Dragon software and were told the names
and locations of the movies stored in the computer.
They were given no instructions other than that they
should try to improve their English pronunciation by
reading passages from their school English text-
books into Dragon and to try to make Dragon rec-
ognize as much of their readings as possible. They
were told to use Ellis if they wished, or to watch any
film of their choice on the computer, if bored. It was
suggested to each group that they should help each
other and collectively organize what they would do
with the computer when it was their turn to use it.
There was to be no organized adult intervention af-
ter this point, except in case of any equipment fail-
ure and related maintenance activity. We relied on
the earlier results (Mitra 1988, 2003; Mitra and
Rana 2001) that groups of children can instruct
themselves without teachers if given adequate com-
puting resources. In effect, the children were left
with a clear objective (to alter their speech until un-
derstood by the Dragon STT program) and a demon-
stration of the resources available (i.e., computer,
Dragon, Ellis, and movies). They were asked to orga-
nize themselves to meet this objective and they
Figure 1. Children at Peace High School, Edi Bazaar, Hyderabad, India, working on their computer.
readily agreed to try. Indeed, they agreed with great enthusiasm.
The experiment was started in September 2002.
Each month, a researcher visited the school and
asked all 16 children to read passages from a stan-
dard English textbook into the Dragon STT program.
From October, the researcher took measurements of the
students’ reading, comparing the resultant text produced
by Dragon with the correct text, and
calculated the percentage of correctly identified
words. Each passage was approximately 100
words long. Girls’ readings were transcribed by Dragon
using the Parimala profile, whereas boys’ readings were
transcribed using the Sugata profile. Dragon performs poorly
when a female profile is used to judge a male voice
and vice versa. This is because of the frequency
differences between male and female voices. It is
interesting to note that, for very young children of either
gender, a female profile should be used for the
same reason.
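The percentage of correctly identified words can be illustrated with a short Python sketch. The paper does not describe how the Dragon output was aligned with the correct text, so the word-level alignment below (using `difflib.SequenceMatcher` from the standard library), the function names, and the sample sentences are all assumptions made for illustration.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def percent_words_recognized(reference, transcript):
    """Percentage of reference words that the transcript reproduces in order."""
    ref_words = tokenize(reference)
    out_words = tokenize(transcript)
    if not ref_words:
        raise ValueError("reference passage is empty")
    # Align the two word sequences and count the words inside matching runs.
    matcher = difflib.SequenceMatcher(a=ref_words, b=out_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(ref_words)


# Hypothetical passage and hypothetical STT output with two misrecognized words.
reference = "The quick brown fox jumps over the lazy dog"
transcript = "The quick brown box jumps over the lady dog"
score = percent_words_recognized(reference, transcript)
print(f"{score:.1f}% of words recognized")
```

Aligning whole words rather than characters means that a single misheard word costs exactly one word, which matches the per-word measure described in the text.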
Each month, starting in October, a new passage
was added to the one in the previous month; that
is, the children read one passage in September and
October, two in November, and so on until January
2003, when they read four passages. This was done
so that their pronunciation in the first passage,
which they read four times in the 4 months since
October, could be compared with their pronunciation
in the fourth passage, which they read for the
first time in January.
It should be mentioned that there were interrup-
tions in the school calendar because of examinations
and holidays during the experiment, which reduced
the learners’ time spent at the computer. We esti-
mate that each child spent between 10 and 15
hours on the computer during this period. Measurements
were taken at monthly intervals from
September 2002 to January 2003; that is, five measurements
in total. In the first month, the students
used Ellis only, and they began using Dragon in Oc-
tober. This was unavoidable because the Dragon
program was unavailable before October.
A video clip was made of each child reading each
passage in each of these monthly sessions. The
video clips were recorded so that human judges
would be able to watch and rank the pronunciations
of the children reading the passages. At this point it
is important to explain why we chose to record the
readings using video, and not just audio.
The first hypothesis, as described earlier, concerns
whether an STT program, such as Dragon,
would score and rank the pronunciations of human
speakers such that these scores and ranks are similar
to those by human judges. That is, the judgment of
the STT program and of human judges should show
concordance. Because Dragon is “judging” pronun-
ciation only by “listening” to the speakers, it would
seem natural that the judging process for humans
should also use only the audio from each speaker.
After all, it is possible that a human judge would
score differently if he or she were to be looking at a
speaker or only listening to a recording of a
speaker’s voice during the judging process.
Some reflection shows that this reasoning is incorrect
in the current context. First, we can find no
evidence that the human judgment of pronunciation
is affected by whether the judge is only listening to
a voice or looking at the speaker. Second, an STT
program such as Dragon can neither “listen” nor
“judge” in any manner that can be remotely com-
parable to the process followed by humans. Humans
process speech and vision in a way that is, as far as
current understanding goes, different from the way
computers do (e.g., see Bhatnagar et al. 2002).
Dragon does not listen; it receives a stream of bits
(zeros and ones) and processes them mathematically
against another stored array of bits, the “profile.”
It is therefore futile to try to give the humans and an
STT program the “same” input, in this case, audio.
What is more relevant is that the STT program
should be shown to behave as an instrument that
can measure pronunciation in a way such that its
measurements reflect the same values as those produced
by human, subjective judgment. A measuring
instrument does not necessarily have to measure the
same parameter as a human being to come to simi-
lar conclusions. For example, an EEG machine can
detect an epileptic attack by measuring electrical
signals from the patient’s brain, whereas a human
being can detect the same attack by other, subjec-
tive audiovisual means.
In the present context, what we wish to study in
the children is their clarity and pronunciation in spo-
ken English as judged by other humans. In society,
such judgments are made more often in face-to-face
situations than otherwise. Indeed, in the present
context of the children of Hyderabad, India, the
judgment of the effectiveness of their spoken Eng-
lish will be almost entirely done by others who are
in their presence. Although it is conceivable that
such a judgment may be made over telephone or
some other media where the speaker is not present
visually, it is not considered an important or relevant
factor for the present experiment.
We have therefore presented the human judges
with video clippings of the children reading, to be
closest to the social context in which their English
will be judged in real life. Dragon, on the other
hand, has been presented with audio bit streams of
these readings (in the 44 kHz, mono .wav format),
which is the only input it can accept.
Sometimes, some of the children were absent for
a session and their data could not be collected. Al-
though this is undesirable, it did not affect more
than 10% of the planned readings. The data at the
end of the experiment consisted of 16 children’s
reading of passages 1, 2, 3, and 4 over 5 months,
for a total of 160 readings, 160 resultant text files
from Dragon, and 160 video clips.
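The figure of 160 readings can be checked against the reading schedule with a small calculation. The sketch below assumes that December had three passages (inferred from "and so on" in the schedule described earlier) and that every child was planned to read every passage in every session.

```python
# Passages read per child in each monthly session, following the schedule
# in the text; the December value is inferred, not stated explicitly.
passages_per_month = {
    "September": 1,
    "October": 1,
    "November": 2,
    "December": 3,
    "January": 4,
}
children = 16
collected = 160  # readings actually recorded, as reported in the text

planned = children * sum(passages_per_month.values())
missing_fraction = (planned - collected) / planned

print(f"planned readings: {planned}")
print(f"missing readings: {planned - collected} ({missing_fraction:.1%})")
```

Under these assumptions the shortfall stays below the 10% mentioned in the text.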
Analysis of data was done at the end of the experi-
ment. At this time, the children were ranked for
each of the readings in each of the months in order
of the percentage of words correctly identified by
Dragon, for each passage.
The video clips were labeled and randomized,
and viewed by four human judges, who are the au-
thors of this paper. Each judge, while viewing a
video clip, was unaware of when the recording had
been made. Because the process of scoring was
done only once, at the end of the experiment, it
was important that the judges be unaware of the
date of each recording. The judges then scored each
video clip in terms of their subjective assessment, on
a score of 1 to 10 (10 high), of how well each child
performed in English pronunciation, including clarity
of speech and accent. Eventually, the scores ob-
tained were used to rank the children according to
each judge.
It is important to establish first that the human
judges are, themselves, in agreement with each
other before proceeding to match their rankings
with that of Dragon. Kendall’s coefficient of concordance
(Siegel 1956) was calculated for the four
judges for each of the 5 months for this purpose.
Kendall’s coefficient, W, is considered more effective
for smaller samples (fewer than 30) than Cohen’s
kappa for measuring concordance.
The same coefficient (W) was also calculated
with Dragon acting as a “fifth judge”; that is, the
ranking obtained from the process using Dragon
was added to those obtained from the four human
judges.
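For readers unfamiliar with the statistic, the following Python sketch shows one common way to compute Kendall's W from judges' scores: each judge's scores are converted to ranks, the rank sums per child are compared with their mean, and W = 12S / (k²(n³ − n)) for k judges and n children. It omits the correction for tied ranks, and the function names and sample scores are hypothetical; the paper does not state which variant of the formula was used.

```python
def rank_scores(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = average_rank
        pos = end + 1
    return ranks


def kendalls_w(rankings):
    """Kendall's W for k judges (rows) ranking n subjects (columns), no tie correction.

    Returns (W, chi-square approximation, degrees of freedom).
    """
    k = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_sum = k * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    w = 12 * s / (k ** 2 * (n ** 3 - n))
    return w, k * (n - 1) * w, n - 1


# Hypothetical 1-10 scores from three judges for four children.
judge_scores = [
    [8, 6, 4, 2],
    [9, 7, 5, 3],
    [7, 8, 3, 1],
]
rankings = [rank_scores(scores) for scores in judge_scores]
w, chi_square, df = kendalls_w(rankings)
print(f"W = {w:.2f}, chi-square = {chi_square:.1f}, df = {df}")
```

Adding Dragon as a further judge amounts to appending one more row of ranks, derived from the percentage of words recognized, before calling `kendalls_w`.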
A control group is needed to verify the second hypothesis.
However, it was difficult to obtain a control
group at the beginning of the experiment. Any
child tested for pronunciation would want to be a
part of the experimental “course,” as they perceived
it. Because the equipment would be left in charge
of the children and no adult control would be im-
posed on them, it would be impossible to monitor if
any child from the control group was also joining
the experimental group. Based on advice from the
school principal, we decided to not measure any
child other than those chosen for the experimental
group. These children were instructed not to let
other children use the computer, an instruction they
followed meticulously. As a result, no control group
was tested at the beginning of the experiment, and
this creates limitations to the generalizability of the
results. The following method was adopted to rec-
tify this situation.
At the end of the experimental period (January
2003), 16 children were selected at random from
the same sections of the same school as those from
which the experimental groups had been selected.
These children were aware of the fact that some of
their peers were working on a computer, but they
had no other information about what they were do-
ing. Neither did this new group of children have any
exposure to computers or to any other teaching-
learning method other than that used traditionally
by the school. Therefore, the only difference be-
tween the new (control) group and our experimental
group was that the latter had worked on the com-
puter as described earlier for 5 months. The control
group was asked to read out three passages from a
standard English textbook, and a video clip was pro-
duced for each reading session.
It was now necessary to establish whether the
scores of the children using Dragon and Ellis for
5 months were significantly different from those of
the children who had not. To do this, we decided to
use four new judges to remove any bias, because
the previous judges were already familiar with the
experimental group, having visited the school many
times. In this context, a judge has to be someone
whose English pronunciation is of an acceptable
standard. All judges chosen were such that their
own recognition rates in Dragon were 60% or
more. Incidentally, the best STT programs, including
Dragon, generally show a recognition rate of 85%
or less, even when used by the same person who
has trained it.
Four new judges were selected and presented
with a collection of 48 video clips: 16 from the ex-
perimental group’s readings of September 2002, 16
from the experimental group’s readings of January
2003, and 16 from the control group’s readings of
January 2003. The clips were presented to the
judges in random order to ensure that they had no
way of determining either the nature of the group
(experimental or control) or the time of the video re-
cording. Each reading was given a score from 1 to
10 (10 high) for the same subjective parameters as
used earlier by each judge. The average scores for
each group of 16 readings were calculated, the
readings were converted to ranks, and Kendall’s
coefficient of concordance was calculated for the
four new judges.
Table 1 shows the results for Kendall’s coefficient
of concordance. This table reveals two interesting
factors. First, the human judges, although judging
subjectively, were in very strong concordance,
with W ranging from 0.79 in October to 0.94 in
January, with p < 0.001 in every case. Second, the
addition of the fifth judge, Dragon, did lower the
coefficient of concordance, but it remained at a
significant degree of concordance, ranging from
0.69 in November to 0.81 in January, with
p < 0.001 again.
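The chi-square values in Table 1 can be cross-checked against the W values using the standard large-sample approximation, chi-square = k(n − 1)W with n − 1 degrees of freedom, where k is the number of judges. The sketch below assumes that this approximation is how the reported values were obtained and that k is 4 for the human judges and 5 when Dragon is included; small differences are expected because W is reported to two decimals.

```python
# (month, W, chi-square, df) as reported in Table 1.
human_only = [
    ("September", 0.83, 49.5, 15),
    ("October", 0.79, 31.7, 10),
    ("November", 0.80, 44.7, 14),
    ("December", 0.84, 43.8, 13),
    ("January", 0.94, 52.5, 14),
]
with_dragon = [
    ("October", 0.70, 34.8, 10),
    ("November", 0.69, 48.2, 14),
    ("December", 0.73, 47.2, 13),
    ("January", 0.81, 56.8, 14),
]


def chi_square_from_w(w, judges, df):
    """Chi-square approximation for Kendall's W: k * (n - 1) * W, with df = n - 1."""
    return judges * df * w


def max_discrepancy(rows, judges):
    """Largest absolute gap between reported and recomputed chi-square values."""
    return max(abs(chi_square_from_w(w, judges, df) - chi2) for _, w, chi2, df in rows)


for label, rows, judges in (("human judges", human_only, 4), ("with Dragon", with_dragon, 5)):
    for month, w, chi2, df in rows:
        recomputed = chi_square_from_w(w, judges, df)
        print(f"{label:13s} {month:10s} reported={chi2:5.1f} recomputed={recomputed:5.1f}")
```

Under this reading, the degrees of freedom also indicate how many children were ranked in each month (df + 1), which is consistent with some children being absent from some sessions.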
Figure 2 then shows the average scores of the
four human judges for the entire group of 16 children,
compared with the average Dragon score for
the entire group, over the five-month period
(Dragon has no score for the first month). Two interesting
factors are revealed again. First, the self-instructional
process using the software dramatically
improved the results for reading: a 117% increase
for the human judges in terms of their subjective
judgments of pronunciation, and a 79% increase for
the fifth judge, Dragon, in terms of percentage of
words recognized. Second, the fact that the curves are
closely correlated suggests again that the method of
objectively measuring words recognized by Dragon
parallels the human method of subjectively judging
pronunciation clarity and accent. In Figure 2, it appears
as though the human judge scores are consistently
higher than those of Dragon. This is not
important, as the two scores can be normalized to
the same scale. We have not done so because it
would then be visually difficult to see the two sets
of scores, human and Dragon, separately.
In the control part of the experiment as described
earlier, the Kendall coefficient for the four new
judges was found to be 0.71, again showing good
agreement among them. The average scores for the
experimental group’s readings in September 2002
and the control group’s readings in January 2003,
shown in Table 2, were found to be exactly equal at
30% each. The average score for the experimental
group’s readings in January 2003 was found to be
Table 1. Kendall’s Coefficient of Concordance (W) Data for 16 Students
Reading 3 Passages

             Human judges only           Human judges and STT
Month        W     χ²    df  p           W     χ²    df  p
September    0.83  49.5  15  <0.001
October      0.79  31.7  10  <0.001      0.70  34.8  10  <0.001
November     0.80  44.7  14  <0.001      0.69  48.2  14  <0.001
December     0.84  43.8  13  <0.001      0.73  47.2  13  <0.001
January      0.94  52.5  14  <0.001      0.81  56.8  14  <0.001

Note: χ² = chi-square; df = degrees of freedom; p = the probability of W
being nonsignificant; STT = speech-to-text.
significantly higher at 72%. This is shown in Figure 3.
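The size of the difference reported in Table 2 can be expressed as a relative change with a short calculation. The sketch below uses only the three averages from Table 2; the helper name `percent_change` is an assumption, and the result applies to the four independent judges only, not to the earlier figures from the original judges.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return 100 * (after - before) / before


# Average percent scores from the four independent judges (Table 2).
experimental_september = 30
experimental_january = 72
control_january = 30

gain = percent_change(experimental_september, experimental_january)
gap = experimental_january - control_january

print(f"experimental group, September to January: {gain:.0f}% increase")
print(f"experimental vs. control in January: {gap} percentage points")
```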
It may be noticed that STT (Dragon) scores were
not used in this part of the measurement. This is be-
cause the purpose of these measurements was to
compare the results of the experimental and control
groups alone. Therefore, correlating these with STT
results was not required in this part of the experiment.
The results are interesting for both of the hypotheses.
Regarding the first, the high concordance values
shown in Table 1 strongly suggest
that the objective measure of
percentage of words correctly recognized by Dragon
closely mimics the human process of subjectively
judging children’s pronunciation. This seems to be
an important result, with possibly significant
implications for the automation of the judgment of
pronunciation. Another intriguing possibility is that
of automatically detecting speakers with a certain accent.
The procedure followed with the control group
and four independent judges shows clearly that the
control group’s scores at the end of the experiment
in January match the experimental group’s scores at
the beginning of the experiment in September. In
other words, the control group children, with no ex-
posure to the experimental apparatus and proce-
dure, continued to have pronunciations similar to
those of the experimental group children before
they began their interaction with the computer and
software. The experimental group, on the other
hand, showed significant improvement after the experimental
period.
Figure 2. Improvements in pronunciation as measured by human judges and by a speech-to-text program.
Table 2. Independent Comparison of Experimental and Control Group Scores

Description of Video Clips            Average Percent Score from 4 Judges
Experimental Group, September 2002    30
Experimental Group, January 2003      72
Control Group, January 2003           30
Ideally, we should have scored the control group
at the beginning of the experiment as well. However,
because this could not be done and because it
is very unlikely that the control group’s scores at the
beginning of the experimental period could have been
higher than at the end of the period, we propose on
the basis of Figure 3 that the improvements in pronunciation
observed in the experimental group of
children were caused by the experimental procedure.
The independent, randomized judging process further
supports this contention.
It seems highly unlikely that such improvements
in pronunciation could have been effected by other
things in the school or slum environment. The
school principal corroborated that he had noticed
significant improvements in the language ability of
these children but not others. Certainly, we can say
that when groups of children are given the appropri-
ate resources, they can improve their pronunciation
with minimal intervention from adults. Further work
is required to see to what extent their improvement
was based on which parts of the software packages
used (Ellis, Dragon, or, indeed, My Fair Lady!) and
the degree to which these improvements exceed
those that might be expected over time without
such interventions.
Yet another possibility emerges from these results. It appears that, using an STT system, children can acquire any kind of accent, because the accent acquired is similar to the one in which the system is trained.
We would like to thank Mr. Wajid, principal of Peace High School, Edi Bazaar, Hyderabad, India, for his support for this experiment. Detailed comments from one of the referees were invaluable and resulted in a dramatic improvement in the paper.
Bhatnagar, G., S. Mehta, and S. Mitra, eds. 2002. Introduction to Multimedia Systems. San Diego, Calif.: Academic Press.
Eastment, D. 1998. ELT and the New Technology: The Next Five Years. Retrieved from
Ehsani, F. and E. Knodt. 1998. “Speech Technology in Computer-Aided Language Learning: Strengths and Limitations of a New CALL Paradigm.” Language Learning & Technology 2:45–60.
Goodwin-Jones, B. 2000. “Emerging Technologies: Speech Technologies for Language Learning.” Language Learning & Technology 3:6–9.
Mitra, S. 1988. “A Computer-Assisted Learning Strategy for Computer Literacy Programmes.” Paper presented at the annual convention of the All-India Association for Educational Technology, Goa, India.
Figure 3. Scores of experimental and control groups in different months as marked by four independent judges.
Mitra, S. 2003. “Minimally Invasive Education: A Progress Report on the ‘Hole-in-the-wall’ Experiments.” British Journal of Educational Technology
Mitra, S. and R. Ghatak. 2002. “An apparatus for measuring clarity of spoken English,” Indian patent application number 1159/DEL/2002, November 18.
Mitra, S. and V. Rana. 2001. “Children and the Internet: Experiments with Minimally Invasive Education in India.” The British Journal of Educational Technology 32:221–232.
Siegel, S. 1956. Nonparametric Statistics for the Behavioural Sciences. New York: McGraw-Hill.
Stark, D. G. 1997. Hal’s Legacy: 2001’s Computer as Dream and Reality. Cambridge, MA: MIT Press.
Tooley, J. and P. Dixon. Forthcoming. “Providing Education for the World’s Poor: A Case Study of the Private Sector in India,” in B. Davies and J. West-Burnham, eds., Handbook of Educational Leadership and Management. Victoria, Australia: Pearson Longman.