I've just been having a look at the site and trying to decide whether it has real potential for helping EFL/ESL students with their listening, reading and pronunciation.
As an experiment I decided to select quite a challenging text and see what the site could do. I also decided to select a British English accent, as I know that TTS systems have in the past struggled more with UK accents than US ones, due to the wider range of sounds in UK English.
Anyway, here are the results. The text is from Wikipedia.org at: http://en.wikipedia.org/wiki/Text_to_speech and is about the challenges of text normalisation in TTS.
- Click here to watch Elizabeth read the text to you.
Or
- Listen using this media player
This is the actual text you should be hearing:
"Text normalization challenges
The process of normalizing text is rarely straightforward. Texts are full of heteronyms, numbers, and abbreviations that all require expansion into a phonetic representation. There are many spellings in English which are pronounced differently based on context. For example, "My latest project is to learn how to better project my voice" contains two pronunciations of "project".
Most text-to-speech (TTS) systems do not generate semantic representations of their input texts, as processes for doing so are not reliable, well understood, or computationally effective. As a result, various heuristic techniques are used to guess the proper way to disambiguate homographs, like examining neighboring words and using statistics about frequency of occurrence.
Deciding how to convert numbers is another problem that TTS systems have to address. It is a simple programming challenge to convert a number into words, like "1325" becoming "one thousand three hundred twenty-five." However, numbers occur in many different contexts; when a year or perhaps a part of an address, "1325" should likely be read as "thirteen twenty-five", or, when part of a social security number, as "one three two five". A TTS system can often infer how to expand a number based on surrounding words, numbers, and punctuation, and sometimes the system provides a way to specify the context if it is ambiguous.
Similarly, abbreviations can be ambiguous. For example, the abbreviation "in" for "inches" must be differentiated from the word "in", and the address "12 St John St." uses the same abbreviation for both "Saint" and "Street". TTS systems with intelligent front ends can make educated guesses about ambiguous abbreviations, while others provide the same result in all cases, resulting in nonsensical (and sometimes comical) outputs. "
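To make the number and abbreviation problems described in that text a bit more concrete, here is a minimal Python sketch of the kind of context-based guessing it mentions. This is only an illustration, not how the site or any real TTS system works: the function names (`cardinal`, `as_year`, `as_digits`, `expand_number`, `expand_st`) and the little lists of trigger words are my own assumptions, and the rules are far too simple for real use.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out a number from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def cardinal(n):
    """Spell out 0 to 9999 as a plain quantity (no 'and', as in the quoted text)."""
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest or not parts:
        parts.append(two_digits(rest))
    return " ".join(parts)


def as_year(n):
    """Read a four-digit number in pairs. Round years like 2000 are not handled."""
    high, low = divmod(n, 100)
    return two_digits(high) + " " + two_digits(low)


def as_digits(token):
    """Read a digit string one digit at a time."""
    return " ".join(ONES[int(ch)] for ch in token)


def expand_number(token, previous_word=""):
    """Choose a reading from the word before the number (hypothetical trigger words)."""
    prev = previous_word.lower()
    if len(token) == 4 and prev in {"in", "year", "since"}:
        return as_year(int(token))
    if prev in {"number", "code", "pin"}:
        return as_digits(token)
    return cardinal(int(token))


def expand_st(text):
    """Guess 'Saint' when 'St' comes before a capitalised word, otherwise 'Street'."""
    text = re.sub(r"\bSt\.?\s+(?=[A-Z])", "Saint ", text)
    return re.sub(r"\bSt\b\.?", "Street", text)


print(expand_number("1325"))
print(expand_number("1325", "in"))
print(expand_number("1325", "number"))
print(expand_st("12 St John St."))
```

Even this toy version shows why the text calls normalisation "rarely straightforward": a sentence such as "We met on Baker St. Then we left" would fool the `expand_st` rule straight away, because "St." is followed by a capitalised word there too.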
What I like about the site
- The site is free though you do have to register.
- The site gives you a number of options once it has converted the text to speech. These include creating an MP3 file to download, creating an embed code to embed the audio into a blog or website, or downloading to an iPod.
- They have quite a selection of avatars and voices.
- The site can convert text from a number of sources including Word, PDF, a website (just type in the URL) or even an RSS feed!
- You can make the texts private or public.
- There doesn't seem to be a limit on how many you can create.
What I wasn't so sure about
- I found it hard to get a link to the avatar reading the text. It would have been nice to be able to embed her into my blog, but I just couldn't get that to work.
- Processing the text can take a while.
So, if you've listened to the text, please do send in a comment and let me know what you think about the usability of a tool like this with EFL/ESL students.
Related links:
Activities for students:
Best
Nik Peachey
7 comments:
Hi Nik,
I have an article in with the TES on this very site (out in Autumn I think) so was interested to read this. I've suggested some applications for teachers including keeping on top of (boring and long?) documentation by listening rather than reading, and creating revision documents for students. Will be interested to see ideas others post. Thanks for your really helpful blog.
Hi Gail
Thanks for the contribution. I'm working on a few more ideas myself too and hope to post some time next week. Will your TES article be available online? I would be really interested to read it. Glad you like the blog (hope it doesn't fall into that boring and long category you mention!)
best
Nik
I actually tried it out and it is good. The only thing is that the voices do sound computerized. Other than that, I think that this could be used to increase comprehension and communication for our students.
Helene Cruz
Guam
Hi Helene
I agree. I'm sure it has uses, though I don't think you could fool someone into thinking it was a real person. I spent a lot of time looking at text to speech a few years back and it's amazing how much better this one is than they were just 2 - 3 years ago. TTS certainly has a future I think.
Best
Nik
http://www.ispeech.org/convert.text.php
This one is free (totally, since you don't even have to register) and the voice is quite natural sounding.
The danger of TTS for foreign language study is that since the voices are sounding more and more like a native speaker, students may assume that they are listening to a native speaker, and thus learn wrong English.
I've heard "podcasts" done with these by non-native speakers that are full of grammatical errors.
Also, though more often than not the intonation and rhythm of the speech are natural, there are times when this is not so. If a student does "listen and repeat" practice after TTS, he/she is likely to be mis-learning something.
Students may benefit from TTS, but they should be taught (warned) about potential problems, too.
Hi Anonymous,
I agree. I think we need to make students aware that these aren't the voices of 'real' people, and that they are different. But I think TTS is going to become part of our everyday communication experience, and already some companies are outsourcing telephone services to TTS, so students are going to need to understand TTS and be able to identify it. iSpeech looks like a handy tool.
Best
Nik
This is the first time I have seen this program and I'm currently working with ESL students so this would be nice to try.