Posted

I got a PM this morning asking for advice on translating a Thai e-mail. Unfortunately, I can't locate the question, so I'm broadcasting my twopennyworth here.

My thought is that if you can't understand the Thai, try to produce an interlinear translation, i.e.:

1. Split the text into words (or best guesses thereof).

2. Write the core meaning under each word.

3. Optionally, combine the meanings into intelligible enough English.

The most difficult step is the first one. As Thai is primarily a monosyllabic language, division into syllables will generally be good enough. Syllable beginnings can often be identified by the preposed vowels - เ แ โ ไ ใ. Syllable ends can be identified by ะ (though when not in combination with another vowel, it is far from rare within a word) and thanthakhat (but beware thanthakhat in loanwords, where it can occur in the middle of a syllable).
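As a rough illustration of those clues, here is a minimal Python sketch. The function names `candidate_breaks` and `split_at` are made up for this example, and it applies only the three hints mentioned above (preposed vowels, ะ and thanthakhat), so it will miss most boundaries and will mis-split loanwords.

```python
PREPOSED_VOWELS = set("เแโไใ")  # vowels written before the initial consonant
SARA_A = "\u0e30"               # ะ
THANTHAKHAT = "\u0e4c"          # the silencing mark, as in ร์

def candidate_breaks(text):
    """Return sorted indices where a syllable boundary is likely (hints only)."""
    breaks = set()
    for i, ch in enumerate(text):
        if ch in PREPOSED_VOWELS:
            breaks.add(i)        # a syllable probably starts here
        elif ch == SARA_A or ch == THANTHAKHAT:
            breaks.add(i + 1)    # a syllable probably ends here
    # The start and end of the text are not useful as break points.
    breaks.discard(0)
    breaks.discard(len(text))
    return sorted(breaks)

def split_at(text, breaks):
    """Cut text at the given indices."""
    bounds = [0, *breaks, len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]

print(split_at("อะไร", candidate_breaks("อะไร")))        # ['อะ', 'ไร']
print(split_at("ริชาร์ด", candidate_breaks("ริชาร์ด")))  # ['ริชาร์', 'ด']
```

The second call shows the loanword caveat: the thanthakhat in ริชาร์ด sits in the middle of a syllable, so the hint cuts in the wrong place.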

There is a web-based program that will do the first two steps, Thai-to-English Translation at Thai-language.com. I gave it test text:

ไม่รู้ว่าจะทำอะไร มาลีก็รักริชาร์ดมาก ๆ

'Don't know what to do. Mali loves Richard very much.'

ริชาร์ด defeated the program - depending on the text around, it got split before or after the ร์. With a different version of the text, มาลี got split into two words. One must bear this sort of problem in mind when one gets unintelligible breakdowns. Conveniently, if you give Thai-language.com the initial part of a word, it will provide possible completions if it is not a whole word. In this particular case, note that both the names are in the Thai-language.com dictionary. (The word-breaker, however, is probably using a different and much smaller dictionary.)

One thing a simple program (possibly even a sophisticated program) will find hard to deal with is misspellings, e.g. confusing กลับ and กับ.

Posted (edited)

Hi Richard,

Interesting post.

There is a web-based program that will do the first two steps, Thai-to-English Translation at Thai-language.com. I gave it test text:

ไม่รู้ว่าจะทำอะไร มาลีก็รักริชาร์ดมาก ๆ

'Don't know what to do. Mali loves Richard very much.'

ริชาร์ด defeated the program - depending on the text around, it got split before or after the ร์. With a different version of the text, มาลี got split into two words. One must bear this sort of problem in mind when one gets unintelligible breakdowns. Conveniently, if you give Thai-language.com the initial part of a word, it will provide possible completions if it is not a whole word. In this particular case, note that both the names are in the Thai-language.com dictionary. (The word-breaker, however, is probably using a different and much smaller dictionary.)

I think thai-language.com uses a word splitter built into Windows rather than their own, so perhaps that's why it doesn't recognise ริชาร์ด as a word. I tried to write my own way of doing it for thai2english.com, and it's far from straightforward! ไม่รู้ว่าจะทำอะไร is easy enough, as the words can be broken up by a dictionary lookup. However, it's trickier when one or more of the words (such as ริชาร์ด) don't appear in the dictionary. How it works on my site at present is:

มาลีก็รักริชาร์ดมาก => broken into มาลี ก็ รั กริช าร์ด มาก (กริช being in the dictionary)

=> checked; it realises that รั and ร์ด are impossible and puts the 'word' back together as รักริชาร์ด, so the sentence is now มาลี ก็ รักริชาร์ด มาก

=> รักริชาร์ด is checked for all possible word combinations in the dictionary, but this fails (as ริชาร์ด isn't there), so it is left as one word.

Adding ริชาร์ด to the dictionary would solve all the problems here, but it's not practical to add every personal name, place name etc. to the dictionary. I'm trying to get an algorithm working that could automatically work out which words are likely to be place or personal names and split on those if the dictionary lookup fails, but it's not quite good enough yet.
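For anyone who wants to experiment, the first two stages can be sketched in Python roughly as follows. This is not the thai2english.com code: the toy dictionary, the function names and the choice of a backward longest-match are all assumptions made for illustration (a backward match happens to reproduce the รั กริช าร์ด split described above), and the "impossible fragment" test is a crude one based only on characters that cannot start or end a syllable.

```python
# Toy dictionary (an assumption); ริชาร์ด is deliberately missing.
DICTIONARY = {"มาลี", "ก็", "รัก", "กริช", "มาก"}

# Dependent vowels and marks cannot begin a word ...
CANNOT_START = ({chr(c) for c in range(0x0E30, 0x0E3B)}
                | {chr(c) for c in range(0x0E47, 0x0E4F)})
# ... and mai han-akat or a preposed vowel cannot end one.
CANNOT_END = set("\u0e31เแโไใ")

def backward_split(text, dictionary):
    """Right-to-left longest match; unmatched characters become 1-char pieces."""
    max_len = max(map(len, dictionary))
    pieces, end = [], len(text)
    while end > 0:
        for start in range(max(0, end - max_len), end):
            if text[start:end] in dictionary:
                break
        else:
            start = end - 1          # nothing matched: take one character
        pieces.append(text[start:end])
        end = start
    pieces.reverse()
    return pieces

def merge_unknown(pieces, dictionary):
    """Join runs of consecutive non-dictionary pieces into single chunks."""
    out, prev_unknown = [], False
    for p in pieces:
        unknown = p not in dictionary
        if unknown and prev_unknown:
            out[-1] += p
        else:
            out.append(p)
        prev_unknown = unknown
    return out

def repair(pieces):
    """Glue a piece to its neighbour when it could not stand alone."""
    out, glue_next = [], False
    for p in pieces:
        if out and (glue_next or p[0] in CANNOT_START):
            out[-1] += p
        else:
            out.append(p)
        glue_next = out[-1][-1] in CANNOT_END
    return out

text = "มาลีก็รักริชาร์ดมาก"
stage1 = merge_unknown(backward_split(text, DICTIONARY), DICTIONARY)
print(stage1)          # ['มาลี', 'ก็', 'รั', 'กริช', 'าร์ด', 'มาก']
print(repair(stage1))  # ['มาลี', 'ก็', 'รักริชาร์ด', 'มาก']
```

The last stage (trying every dictionary combination inside รักริชาร์ด before giving up) is left out; with this dictionary it would fail anyway, since ริชาร์ด is not in it.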

As Thai is primarily a monosyllabic language, division into syllables will generally be good enough. Syllable beginnings can often be identified by the preposed vowels - เ แ โ ไ ใ. Syllable ends can be identified by ะ (though when not in combination with another vowel, it is far from rare within a word) and thanthakhat (but beware thanthakhat in loanwords, where it can occur in the middle of a syllable).

It's harder than you might think. Getting it 80% right is not too difficult, but the last 20% is a lot more challenging. Sure, there are a few clues with เ แ โ ไ ใ and ะ, but there are lots of complicating factors too. อ, ว and ห all present challenges, plus there are many spellings inconsistent with pronunciation, and silent letters that aren't necessarily indicated and can be inconsistent too. You can see it wouldn't be easy to divide automatically into syllables when you've got ปกติ vs ชาติ, กวี vs กว่า, ภูมิพล vs ภูมิใจ etc. Once that's all right, it's my eventual goal to have it translate automatically, though how long that'll take I don't know. If you've got any suggestions they'd be welcome :o Is Glenn still actively working on improving the translator on thai-language.com, by the way?

Incidentally has anyone ever tried http://www.thai2me.com/ ? They claim to do both English > Thai and Thai > English translation, but I haven't tried it out yet.

Edited by mike_l
Posted
.... I tried to write my own way of doing it for thai2english.com....

Hi mike_l, welcome to the forum. I use the Thai2English site a lot and think it's one of the best.

Regarding the email phrase above, if you get your site to display the separate words, you get this:

ไม่รู้ ว่า จะ ทำ อะไร มาลี ก็ รักริชาร์ด มาก ๆ

mâi róo wâa jà tam a-rai maa-lee gôr rák-rí-châat mâak mâak

But you can guess that rak-ri-chaat is two words. So you can put a space after rak รัก , copy the Thai phrase back into the submit box, and re-submit:

ไม่รู้ ว่า จะ ทำ อะไร มาลี ก็ รัก ริ ชาร์ด มาก ๆ

mâi róo wâa jà tam a-rai maa-lee gôr rák rí châat mâak - and that's not bad!

Anyway, great site - t2e - and hope you can join in the discussions here :o

Posted
I think thai-language.com uses a word splitter built into Windows rather than their own.

That's what it says, but documentation can easily fall behind the facts.

As Thai is primarily a monosyllabic language, division into syllables will generally be good enough. Syllable beginnings can often be identified by the preposed vowels - เ แ โ ไ ใ. Syllable ends can be identified by ะ (though when not in combination with another vowel, it is far from rare within a word) and thanthakhat (but beware thanthakhat in loanwords, where it can occur in the middle of a syllable).

It's harder than you might think. Getting it 80% right is not too difficult, but the last 20% is a lot more challenging. Sure, there are a few clues with เ แ โ ไ ใ and ะ, but there are lots of complicating factors too. อ, ว and ห all present challenges, plus there are many spellings inconsistent with pronunciation, and silent letters that aren't necessarily indicated and can be inconsistent too. You can see it wouldn't be easy to divide automatically into syllables when you've got ปกติ vs ชาติ, กวี vs กว่า, ภูมิพล vs ภูมิใจ etc. Once that's all right, it's my eventual goal to have it translate automatically, though how long that'll take I don't know. If you've got any suggestions they'd be welcome :o Is Glenn still actively working on improving the translator on thai-language.com, by the way?

I don't doubt that the last 20% is difficult. I have a nasty feeling that 95% has been considered good, and that that was achieved by 'training' the transcriber rather than by the principled application of rules. What's the state of the art nowadays?

Glenn's been playing with the Thai-to-English translator. He's also put a couple of new rules into the transliterator, but nothing that I'm aware of to stop its overenthusiasm for closed syllables.

When do you expect to attack your problem with closed dead syllables? They have a tendency to appear without tone marks, which is wrong. Have you addressed the difference between the long and short low back round vowel ([ɔ], transcribed 'aw' or 'or' here) in closed live syllables? Most dictionaries, and all of your supported transliteration schemes, seem not to address this distinction.

Posted

Richard W, Thanks for the advice on translating Thai script. I tried the web-based program that you suggested, but could not get it to work for me:

http://www.thai-language.com/translate/

I ended up using the services of http://www.thai-english-translation.com/index.htm

They were extremely helpful and very prompt and the quality of the translation appears to be very high. They charge $US0.05 per English word for translation from Thai to English or vice versa.

Next time I might try http://www.thai2me.com/, as suggested by mike_l.

The assistance from you both is greatly appreciated. The rest of this discussion is way over my head, so I will leave you to it!

Posted

Richard,

This one is hard to do with a Thai-English translation website, because it is hard to get the exact meaning.

ไม่(don't) รู้ว่า(know) จะทำ(to do) อะไร(what)

มาลี(Malee) ก็(still) รัก(loves) ริชาร์ด(Richard) มากๆ(very much)

Putting them together, it should be:

Although I(Malee) don't know what to do (about it/something), I still love you(Richard) very much.
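If you want to lay out a word-by-word gloss like the one above mechanically, a minimal Python sketch might look like this. The word/gloss pairs are copied from the breakdown above; the function name `interlinear` and the " | " separator are just choices made for this example, and the word splitting is assumed to have been done already.

```python
# Word/gloss pairs taken from the breakdown above.
GLOSSED = [
    ("ไม่", "don't"), ("รู้ว่า", "know"), ("จะทำ", "to do"), ("อะไร", "what"),
    ("มาลี", "Malee"), ("ก็", "still"), ("รัก", "loves"),
    ("ริชาร์ด", "Richard"), ("มากๆ", "very much"),
]

def interlinear(pairs):
    """Return two lines: the Thai words, then the gloss for each word."""
    thai = " | ".join(word for word, _ in pairs)
    english = " | ".join(gloss for _, gloss in pairs)
    return thai + "\n" + english

print(interlinear(GLOSSED))
```

The separators are there because Thai vowel and tone marks take no width of their own, so padding the two lines to equal character counts would not line the columns up reliably.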
