Posted
3.  Learn how to use the Thai dictionary...alphabetical order in Thai dictionaries is a bit more difficult than how it's done in English....

Just a little snippet from another thread...

I have been making myself Thai wordlists for self study in Excel. When I sort them it puts all the แ เ ไ at the head of the list, rather than sorting according to the consonants as in a dictionary. The alphabetical sort is just not smart enough (or maybe it's the operator). Has anyone else had this problem and solved it? I have an early version of Office XP on WinMe.

Thanks,

Bryan

Posted

Dear Bryan,

I am not sure, but I am guessing that the order is based on the unicode numbers and I venture to guess, based on your (interesting) post, that the Thai vowels are sorted (in MS EXCEL) in unicode as they are written, with the consonant first and the vowels second.

One technical problem is that the sort routine should, in effect, check the first character; if it is a vowel, check the next character and sort on the first consonant; if the consonant is หอ หีบ, then ..... if and then... and then .....

Good post, BTW. Technically interesting. I also would be interested to review your idea of a good sort algorithm. Would you like to virtually develop one (or two) sort algorithms here, in collaboration, to (perhaps) submit to Microsoft?

Yours sincerely, Mr. Farang

Posted

3.  Learn how to use the Thai dictionary...alphabetical order in Thai dictionaries is a bit more difficult than how it's done in English....

Actually, Thai alphabetical order is easier to remember than English alphabetical order. The only complications are (1) remember to swap the preposed vowels with the following consonant and (2) compare the tone and similar marks only if you get a tie on the consonants and basic vowels - left to right, not right to left as in French.
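
The two rules above can be sketched in a few lines. This is only an illustration in Python (chosen for brevity; it is neither the thread's VBA nor the Unicode Collation Algorithm). The name `thai_sort_key` and the set of "tone and similar marks" are assumptions made for this example, and the sketch relies on the Thai code chart listing consonants and vowels in dictionary order.

```python
# Preposed vowels U+0E40..U+0E44 are written before the consonant they follow in sound.
PREPOSED_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")
# Assumed tie-break marks: mai taikhu, the four tone marks, thanthakhat (U+0E47..U+0E4C).
SECONDARY_MARKS = set("\u0e47\u0e48\u0e49\u0e4a\u0e4b\u0e4c")


def thai_sort_key(word):
    """Return (primary, secondary) so that tuple comparison mimics dictionary order."""
    chars = list(word)
    i = 0
    # Rule 1: swap each preposed vowel with the character that follows it.
    while i < len(chars) - 1:
        if chars[i] in PREPOSED_VOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    # Primary level: consonants and basic vowels only.
    primary = "".join(c for c in chars if c not in SECONDARY_MARKS)
    # Rule 2: marks matter only on a primary tie, compared left to right.
    secondary = tuple(ord(c) if c in SECONDARY_MARKS else 0 for c in chars)
    return (primary, secondary)


words = ["เวลา", "แท้", "ใจความ", "กำหนด", "ทั่วไป", "ประโยค", "ประกอบ"]
for w in sorted(words, key=thai_sort_key):
    print(w)
```

With this key, แท้ is filed under ท and เวลา under ว rather than under their written first character.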

Dear Bryan,

I am not sure, but I am guessing that the order is based on the unicode numbers and I venture to guess, based on your (interesting) post, that the Thai vowels are sorted (in MS EXCEL) in unicode as they are written, with the consonant first and the vowels second.

Bizarrely enough, that does not explain it. The consonants precede the vowels in numerical code order, both in TIS-620 and in Unicode, which for Thai is just a shift of the TIS-620 encoding. Does Excel know the data is Thai, or does it just appear as Thai because of the font you have selected? Are the consonants sorted properly? It's conceivable that Excel thinks you have Latin-1 data.
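
A quick way to see this point is to sort a few of the words by raw code point. The sketch below is Python, used purely as an illustration outside Excel; `code_point_sort` is a made-up helper name. Because the consonants (U+0E01 to U+0E2E) sit below the preposed vowels (U+0E40 to U+0E44), a plain code-point sort should push words like เวลา to the end of the list, not the head, so it would not by itself reproduce what Bryan reports.

```python
def code_point_sort(words):
    """Sort strings by raw Unicode code point, character by character."""
    return sorted(words, key=lambda w: [ord(c) for c in w])


words = ["เวลา", "แท้", "กำหนด", "ขยาย", "สุภาษิต"]
for w in code_point_sort(words):
    # Show the code point of the first written character next to each word.
    print(f"U+{ord(w[0]):04X}", w)
```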

One technical problem is that the sort routine should, in effect, check the first character; if it is a vowel, check the next character and sort on the first consonant; if the consonant is หอ หีบ, then ..... if and then... and then .....

Good post, BTW.  Technically interesting.  I also would be interested to review your idea of a good sort algorithm.  Would you like to  virtually develop one (or two) sort algorithms here, in collaboration,  to (perhaps) submit to Microsoft?

Yours sincerely, Mr. Farang

The Unicode Consortium has already implicitly proposed a sort algorithm - the Unicode Collation Algorithm. Microsoft's response has been along the lines of 'Not invented here'.

Posted
Does Excel know the data is Thai, or does it just appear as Thai because of the font you have selected?  Are the consonants sorted properly?  It's conceivable that Excel thinks you have Latin-1 data.

Dear Richard,

Those are good questions (above).

Dear Bryan,

Can you post an example of how Excel sorts Thai now (screen shot maybe)? Thanks.

The Unicode Consortium has already implicitly proposed a sort algorithm - the Unicode Collation Algorithm.  Microsoft's response has been along the lines of 'Not invented here'.

Dear Richard,

I searched the Unicode link above and could not find any reference to an algorithm to sort Thai (also did not find the word "Thai" in the document). The algorithm I was thinking about would be more specific to Thai, but implemented based on Unicode. Unicode would (or could) be the basis, but the actual sort algorithm would be based on Thai grammar. In fact, there might be different ways to sort, based on different "views" of the grammar.

Another area to explore, but I have not thought about it, are user macros v. Excel built-in sorts.

Yours sincerely, Mr. Farang

Posted

Good, well informed answers. I will pore over them for a while. It does appear that the sort routine sorts on the first character, then the second and so on, putting the vowels first for some reason. This results in several errors in the following list. I have been fixing these manually, but for a large list, say 1000 words, that could be time consuming.

เครื่องหมาย

เวลา

แท้

ใจความ

กริยานุเคราะห์

กำหนด

ขยาย

ข้าราชการ

ครบบริบูรณ์

ดังนี้

ตัวอย่าง

ทั่วไป

ประโยค

ประกอบ

ประธาน

ระเบียบ

ระดับ

ราชาศัพท์

วิกรรตถกริยา

สมบูรณ์

สมุหนาม

สุภาษิต

I will look into this, and would be happy to share anything I come up with or work with someone, though I am not any kind of computer programmer. Maybe there would be some kind of macro that could be written.

Thank you all,

Bryan

Posted
The Unicode Consortium has already implicitly proposed a sort algorithm - the Unicode Collation Algorithm.  Microsoft's response has been along the lines of 'Not invented here'.

I searched the Unicode link above and could not find any reference to an algorithm to sort Thai (also did not find the word "Thai" in the document).

Section 3.1.3 Line 1 Word 12.

The Default Unicode Collation Element Table (DUCET) (1.1 Mbyte) contains the data (numerical 'weights') defining how Thai is sorted, both in comparison with Thai and in comparison with other scripts.

The algorithm I was thinking about would be more specific to Thai, but implemented based on Unicode. Unicode would (or could) be the basis, but the actual sort algorithm would be based on Thai grammar. In fact, there might be different ways to sort, based on different "views" of the grammar.

As far as words are concerned, the default Unicode collation order for non-compound words seems to accord with Thai dictionaries unless the words have the same spelling. Dictionaries list compound words under one of their elements, so the order naturally differs, as with English dictionaries. The Unicode collation order is helpless when words are spelt the same but pronounced differently, and indeed dictionaries differ on the ordering of แหน and แหน.

The Unicode collation order is supposed to produce a reasonable global sort order for use in all languages and to sort text in mixtures of scripts. Imagine a directory listing with file names in a mixture of scripts! One may in principle customise the sort according to one's language (or even idiosyncratic preferences), but there is no reason to sort words in the Thai script in anything but the order appropriate for the Thai language. Other languages in the Thai script (even Pali) are not important enough to override the Thai rules.

The Thai deviations from this global order are documented in file th.xml in the core localisation data (1.0 Mbytes) from Unicode - unfortunately I can't find the file loose. The only deviation for real words that I can see is in the placement of yamakkan. I may be mistaken - what difference can it make to sorting whether ฤๅ is treated as one letter or two?

I deliberately made no claims about the sorting of Thai punctuation and symbols - global consistency would override any expectation of seeing Thai sorted in Thai order.

I concede that it is surprising (but generally convenient) that Thai dictionaries ignore consonant clustering in their sort order.

Posted
...I concede that it is surprising (but generally convenient) that Thai dictionaries ignore consonant clustering in their sort order.

Sorry Richard, but can you explain what you mean by this, perhaps with a couple of examples?

Posted

Dear Bryan,

I kindly suggest that one of the next steps in your analysis is to map the Unicode encoding for the Thai language to your list, to see if a pattern emerges.

Yours sincerely,

Mr. Farang


Posted

Here is a cheap and nasty way to sort on Thai consonants only, in a single column (in this case column A). I knocked it up in 30 minutes, so it's not perfect, and it only sorts a single column, which must be A - it can be changed to accommodate multiple columns and/or prompt for the column etc. It still uses Excel's sort (which is much faster than a bubble or ripple sort written in VBA!), but it ignores the vowels. Anyway, cut and paste this into a VBA macro (XL 98+ - though I used 2002, so it may need modifying for earlier versions):

Option Explicit

Sub ThaiSort_Click()
 Dim lngLoop As Long, lngStartRow As Long
 Dim blnHDR As Boolean
 
 'Ask whether row 1 is a header; if so, start processing at row 2
 If MsgBox("Header Row?", vbYesNo + vbQuestion, "Thai Sort") = vbYes Then
   lngStartRow = 2
   blnHDR = True
 Else
   lngStartRow = 1
   blnHDR = False
 End If

 'Create a new sheet temporarily
 Dim tempWS As Worksheet
 Dim tempWSName As String, curWSName As String
 Dim intMissCount As Integer
   
 curWSName = ActiveSheet.Name
 'Time-stamped name; Format avoids the ":" that a locale time string can contain (not allowed in sheet names)
 tempWSName = "Wolf5370" & Format(Now, "hhnnss")
 Set tempWS = Worksheets.Add
 tempWS.Name = tempWSName
 
 'Copy over the column to be sorted (Col A) using C&P
 Sheets(curWSName).Select
 Columns("A:A").Select
 Selection.Copy
 Sheets(tempWSName).Select
 Columns("A:A").Select
 ActiveSheet.Paste
 
 'Remove all vowels and other crap and dump into column B on the temp sheet
 intMissCount = 0
 For lngLoop = lngStartRow To Rows.Count
   If Range("A" & lngLoop).Value = "" Then
     intMissCount = intMissCount + 1
   Else
     intMissCount = 0
   End If
   
   If intMissCount > 100 Then Exit For
   
   Range("B" & lngLoop).Value = ConstsOnly(Range("A" & lngLoop).Value)
 Next lngLoop
 
 'Now sort it by the de-vowelled column
 Columns("A:B").Select
 'Header expects xlYes/xlNo rather than a Boolean, so convert the flag here
 Selection.Sort Key1:=Range("B" & lngStartRow), Order1:=xlAscending, Key2:=Range("A" & lngStartRow) _
       , Order2:=xlAscending, Header:=IIf(blnHDR, xlYes, xlNo), OrderCustom:=1, MatchCase:=False _
       , Orientation:=xlTopToBottom, DataOption1:=xlSortNormal, DataOption2:= _
       xlSortNormal
       
       
 'Now copy it back
 Columns("A:A").Select
 Selection.Copy
 Sheets(curWSName).Select
 Columns("A:A").Select
 ActiveSheet.Paste
 
 'Now kill the temp sheet
 Application.DisplayAlerts = False
 Sheets(tempWSName).Delete
 Application.DisplayAlerts = True
End Sub

Function ConstsOnly(ByVal strWord As String) As String
 'Routine to remove Thai Vowels

 Dim intInnerLoop As Integer
 Dim strTemp As String, strChar As String
 Dim lngUniCode As Long
 
 If Len(strWord) = 0 Then
   ConstsOnly = ""
   Exit Function
 End If
 
 strTemp = ""
 For intInnerLoop = 1 To Len(strWord)
   strChar = Mid(strWord, intInnerLoop, 1)
   'AscW returns a signed Integer, so mask it to get 0..65535 for code points above 32767
   lngUniCode = AscW(strChar) And &HFFFF&
   
   Select Case lngUniCode
     Case 161 To 206: strTemp = strTemp & strChar 'Thai Consts at their TIS-620 byte values (&HA1-&HCE)
     Case 3585 To 3630: strTemp = strTemp & strChar 'Thai Consts in UniCode (U+0E01-U+0E2E)
     Case 3664 To 3673: strTemp = strTemp & strChar 'Thai numbers in UniCode (U+0E50-U+0E59)
     Case 63247: strTemp = strTemp & strChar 'Another Thai Const (private-use U+F70F)
     Case 63232: strTemp = strTemp & strChar 'Another Thai Const (private-use U+F700)
   End Select
   
 Next intInnerLoop
   
 ConstsOnly = strTemp
End Function
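
To see what sorting on a consonants-only key does and does not get right, here is a small sketch of the same idea in Python (an illustration outside Excel, not a translation of the macro; `consonants_only` is a name made up for this example and it keeps only the Unicode consonant and digit ranges).

```python
def consonants_only(word):
    """Keep Thai consonants (U+0E01-U+0E2E) and Thai digits (U+0E50-U+0E59)."""
    return "".join(
        c for c in word
        if "\u0e01" <= c <= "\u0e2e" or "\u0e50" <= c <= "\u0e59"
    )


# The preposed vowel no longer decides where the word is filed.
print(consonants_only("ประโยค"))

# Limitation: with the vowel stripped, กา reduces to just ก and lands ahead of กบ,
# whereas a dictionary files consonants before vowels and so lists กบ first.
print(sorted(["กบ", "กา"], key=consonants_only))
```

So the approach is fine for getting each word under its first consonant, but the order among words that share their opening consonants may still need checking by hand.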

Posted
Hi,

Are you using a Thai or English version of Windows ME? I found that some programs eg Pirch would only sort Thai in alphabetical order under Thai Windows.

Secondly, I seem to remember that getting Office XP to display Thai properly on English Windows ME was not straightforward and involved hacking the registry and installing extra files etc.

Posted
Dear Bryan,

I kindly suggest that one of the next steps in your analysis is to map the Unicode encoding for the Thai language to your list, to see if a pattern emerges.

Yours sincerely,

Mr. Farang

I agree that would be a good place to start. I will do some googling to figure out how to do it. So far all I have found is an Excel function "code()", which returns something like an ASCII number for each character, but it's always the same, 63, for all the Thai chars. I will need to find a way to extract the real code number. As I recall from my little bit of programming experience in Fortran (or was it Pascal) in college, there were some "string" functions which would get it. Maybe then there would be a way to sort, or do a customized sort, on those numbers.

As an aside, I also used a program which typed Thai in old ascii font, and I sorted that. Unlike unicode, the consonants came before the vowels, and I could get the ascii code for all the characters.

Thanks, Bryan

Posted
Here is a cheap and nasty way to sort on Thai consonants only, in a single column (in this case column A). I knocked it up in 30 minutes, so it's not perfect, and it only sorts a single column, which must be A - it can be changed to accommodate multiple columns and/or prompt for the column etc. It still uses Excel's sort (which is much faster than a bubble or ripple sort written in VBA!), but it ignores the vowels. Anyway, cut and paste this into a VBA macro (XL 98+ - though I used 2002, so it may need modifying for earlier versions):

Dear Khun Wolf,

Quite nice and very instructive as well. Thank you for this contribution!

Yours sincerely, Mr. Farang

Posted
Hi,

Are you using a Thai or English version of Windows ME? I found that some programs eg Pirch would only sort Thai in alphabetical order under Thai Windows.

Secondly, I seem to remember that getting Office XP to display Thai properly on English Windows ME was not straightforward and involved hacking the registry and installing extra files etc.

I am using English WinMe and English OfficeXP. It has been generally well behaved and was easy to set up, but missing a few functions, for example, it won't read Thai filenames, and this sorting problem.

Here is a cheap and nasty way to sort on Thai consonants only, in a single column (in this case column A). I knocked it up in 30 minutes, so it's not perfect, and it only sorts a single column, which must be A - it can be changed to accommodate multiple columns and/or prompt for the column etc. It still uses Excel's sort (which is much faster than a bubble or ripple sort written in VBA!), but it ignores the vowels. Anyway, cut and paste this into a VBA macro (XL 98+ - though I used 2002, so it may need modifying for earlier versions):

This will be quite a learning experience for me, as I try using this. It probably answers the questions raised in my previous post from a few minutes ago. Thank you for spending the time and expertise in writing the code.

And I thank everyone else for all the other good answers.

Bryan

Posted
So far all I have found is an Excel function "code()", which returns something like an ASCII number for each character, but it's always the same, 63, for all the Thai chars

Yeah, it's better to address the Thai characters by their Unicode number. The way to do this is with AscW (W stands for Wide I think - ASC for American Standard Code, as in ASCII). You can go the other way using ChrW with the Unicode value.

The Character Map (Start->Programs->Accessories->Character Map) will show the Ascii and Unicode characters for any font set - select a Thai font and clicking any character will give its values. :o
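
A likely reason for the constant 63, assuming code() converts each character to the system's legacy (ANSI) code page first: 63 is the code of "?", the usual stand-in for a character the code page cannot represent. The sketch below imitates that in Python; `ansi_code` is a hypothetical helper and the code page names are Python codec names, not Excel settings.

```python
def ansi_code(ch, codepage="cp1252"):
    """Byte value of ch in a legacy code page; unmappable characters become '?'."""
    return ch.encode(codepage, errors="replace")[0]


ko_kai = "\u0e01"  # ก, THAI CHARACTER KO KAI

print(ord(ko_kai))                 # the Unicode number, which is what AscW reports
print(ansi_code(ko_kai))           # Western code page: no Thai, so the code of "?"
print(ansi_code(ko_kai, "cp874"))  # Thai code page: the TIS-620 style byte value
print(chr(0x0E01) == ko_kai)       # going back from the number, like ChrW
```

On a machine whose default code page is Thai, the legacy value for ก would be 161, which matches the "161 To 206" consonant range used in the sort macro earlier in the thread.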

Posted
Well, now you've asked, I now have the 'problem' myself. I'm using Excel 2002 (Thai edition? - the menus are all in Thai) under Windows XP Home Edition SP2. I write 'problem' because I use Word for tables of text, and Word 2002 sorts the posted list correctly. I'm not sure if Windows XP installations have a language - if they do, mine is English. The only thing that could conceivably make a difference is that the default codepage is Windows-874 (i.e. Thai). I had to switch it back because otherwise the command interpreter would not handle Thai filenames.

So, unless you are using Excel as a spreadsheet (or chess-playing program, or whatever), one possible solution is to copy your table to Word, sort it there, and copy back to Excel.

Posted
Yeah, it's better to address the Thai characters by their Unicode number. The way to do this is with AscW (W stands for Wide I think - ASC for American Standard Code, as in ASCII). You can go the other way using ChrW with the Unicode value.

The Character Map (Start->Programs->Accessories->Character Map) will show the Ascii and Unicode characters for any font set - select a Thai font and clicking any character will give its values.  :o

For starters, I kindly suggest someone create an Excel spreadsheet with the Thai alphabet in one column and the Unicode numeric identifiers in the next column, and post the XLS to this thread. We can use this file for "configuration control".....so we can "sing" off the same "sheet of music".... so to speak.

If we can't post an XLS in TV, I suggest that we rename the file FILENAME.XLS.PDF then someone can upload and we can download and change the name back to FILENAME.XLS :-_

Anyone have time to do this? Volunteers?

Posted
Yeah, it's better to address the Thai characters by their Unicode number. The way to do this is with AscW (W stands for Wide I think - ASC for American Standard Code, as in ASCII). You can go the other way using ChrW with the Unicode value.

The Character Map (Start->Programs->Accessories->Character Map) will show the Ascii and Unicode characters for any font set - select a Thai font and clicking any character will give its values.  :o

And Unicode publishes charts by script, partly broken up by code range.

There are some curious gaps in what the Character Map will show. It hasn't yet been updated to display Tamil letter sha (U+0BB6) (the equivalent of ), which was admitted this year. I'm not sure if any versions of the Windows renderer (Uniscribe) yet treat it as a Tamil letter.

Posted
For starters, I kindly suggest someone create an Excel spreadsheet with the Thai alphabet in one column and the Unicode numeric identifers in the next column and post the XLS  to this thread, we can use this file for "configuration control".....so we can "sing" off the same "sheet of music".... so to speak.

If we  can't post an XLS in TV, I suggest that we  rename the file  FILENAME.XLS.PDF  then  someone can upload and we can download and change the name back to FILENAME.XLS  :-_

Why? The Thai code chart is readily available from Unicode, and the basic data for Thai Unicode characters (from UnicodeData.txt) follows:

0E01;THAI CHARACTER KO KAI;Lo;0;L;;;;;N;THAI LETTER KO KAI;;;;

0E02;THAI CHARACTER KHO KHAI;Lo;0;L;;;;;N;THAI LETTER KHO KHAI;;;;

0E03;THAI CHARACTER KHO KHUAT;Lo;0;L;;;;;N;THAI LETTER KHO KHUAT;;;;

0E04;THAI CHARACTER KHO KHWAI;Lo;0;L;;;;;N;THAI LETTER KHO KHWAI;;;;

0E05;THAI CHARACTER KHO KHON;Lo;0;L;;;;;N;THAI LETTER KHO KHON;;;;

0E06;THAI CHARACTER KHO RAKHANG;Lo;0;L;;;;;N;THAI LETTER KHO RAKHANG;;;;

0E07;THAI CHARACTER NGO NGU;Lo;0;L;;;;;N;THAI LETTER NGO NGU;;;;

0E08;THAI CHARACTER CHO CHAN;Lo;0;L;;;;;N;THAI LETTER CHO CHAN;;;;

0E09;THAI CHARACTER CHO CHING;Lo;0;L;;;;;N;THAI LETTER CHO CHING;;;;

0E0A;THAI CHARACTER CHO CHANG;Lo;0;L;;;;;N;THAI LETTER CHO CHANG;;;;

0E0B;THAI CHARACTER SO SO;Lo;0;L;;;;;N;THAI LETTER SO SO;;;;

0E0C;THAI CHARACTER CHO CHOE;Lo;0;L;;;;;N;THAI LETTER CHO CHOE;;;;

0E0D;THAI CHARACTER YO YING;Lo;0;L;;;;;N;THAI LETTER YO YING;;;;

0E0E;THAI CHARACTER DO CHADA;Lo;0;L;;;;;N;THAI LETTER DO CHADA;;;;

0E0F;THAI CHARACTER TO PATAK;Lo;0;L;;;;;N;THAI LETTER TO PATAK;;;;

0E10;THAI CHARACTER THO THAN;Lo;0;L;;;;;N;THAI LETTER THO THAN;;;;

0E11;THAI CHARACTER THO NANGMONTHO;Lo;0;L;;;;;N;THAI LETTER THO NANGMONTHO;;;;

0E12;THAI CHARACTER THO PHUTHAO;Lo;0;L;;;;;N;THAI LETTER THO PHUTHAO;;;;

0E13;THAI CHARACTER NO NEN;Lo;0;L;;;;;N;THAI LETTER NO NEN;;;;

0E14;THAI CHARACTER DO DEK;Lo;0;L;;;;;N;THAI LETTER DO DEK;;;;

0E15;THAI CHARACTER TO TAO;Lo;0;L;;;;;N;THAI LETTER TO TAO;;;;

0E16;THAI CHARACTER THO THUNG;Lo;0;L;;;;;N;THAI LETTER THO THUNG;;;;

0E17;THAI CHARACTER THO THAHAN;Lo;0;L;;;;;N;THAI LETTER THO THAHAN;;;;

0E18;THAI CHARACTER THO THONG;Lo;0;L;;;;;N;THAI LETTER THO THONG;;;;

0E19;THAI CHARACTER NO NU;Lo;0;L;;;;;N;THAI LETTER NO NU;;;;

0E1A;THAI CHARACTER BO BAIMAI;Lo;0;L;;;;;N;THAI LETTER BO BAIMAI;;;;

0E1B;THAI CHARACTER PO PLA;Lo;0;L;;;;;N;THAI LETTER PO PLA;;;;

0E1C;THAI CHARACTER PHO PHUNG;Lo;0;L;;;;;N;THAI LETTER PHO PHUNG;;;;

0E1D;THAI CHARACTER FO FA;Lo;0;L;;;;;N;THAI LETTER FO FA;;;;

0E1E;THAI CHARACTER PHO PHAN;Lo;0;L;;;;;N;THAI LETTER PHO PHAN;;;;

0E1F;THAI CHARACTER FO FAN;Lo;0;L;;;;;N;THAI LETTER FO FAN;;;;

0E20;THAI CHARACTER PHO SAMPHAO;Lo;0;L;;;;;N;THAI LETTER PHO SAMPHAO;;;;

0E21;THAI CHARACTER MO MA;Lo;0;L;;;;;N;THAI LETTER MO MA;;;;

0E22;THAI CHARACTER YO YAK;Lo;0;L;;;;;N;THAI LETTER YO YAK;;;;

0E23;THAI CHARACTER RO RUA;Lo;0;L;;;;;N;THAI LETTER RO RUA;;;;

0E24;THAI CHARACTER RU;Lo;0;L;;;;;N;THAI LETTER RU;;;;

0E25;THAI CHARACTER LO LING;Lo;0;L;;;;;N;THAI LETTER LO LING;;;;

0E26;THAI CHARACTER LU;Lo;0;L;;;;;N;THAI LETTER LU;;;;

0E27;THAI CHARACTER WO WAEN;Lo;0;L;;;;;N;THAI LETTER WO WAEN;;;;

0E28;THAI CHARACTER SO SALA;Lo;0;L;;;;;N;THAI LETTER SO SALA;;;;

0E29;THAI CHARACTER SO RUSI;Lo;0;L;;;;;N;THAI LETTER SO RUSI;;;;

0E2A;THAI CHARACTER SO SUA;Lo;0;L;;;;;N;THAI LETTER SO SUA;;;;

0E2B;THAI CHARACTER HO HIP;Lo;0;L;;;;;N;THAI LETTER HO HIP;;;;

0E2C;THAI CHARACTER LO CHULA;Lo;0;L;;;;;N;THAI LETTER LO CHULA;;;;

0E2D;THAI CHARACTER O ANG;Lo;0;L;;;;;N;THAI LETTER O ANG;;;;

0E2E;THAI CHARACTER HO NOKHUK;Lo;0;L;;;;;N;THAI LETTER HO NOK HUK;;;;

0E2F;THAI CHARACTER PAIYANNOI;Lo;0;L;;;;;N;THAI PAI YAN NOI;paiyan noi;;;

0E30;THAI CHARACTER SARA A;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA A;;;;

0E31;THAI CHARACTER MAI HAN-AKAT;Mn;0;NSM;;;;;N;THAI VOWEL SIGN MAI HAN-AKAT;;;;

0E32;THAI CHARACTER SARA AA;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA AA;;;;

0E33;THAI CHARACTER SARA AM;Lo;0;L;<compat> 0E4D 0E32;;;;N;THAI VOWEL SIGN SARA AM;;;;

0E34;THAI CHARACTER SARA I;Mn;0;NSM;;;;;N;THAI VOWEL SIGN SARA I;;;;

0E35;THAI CHARACTER SARA II;Mn;0;NSM;;;;;N;THAI VOWEL SIGN SARA II;;;;

0E36;THAI CHARACTER SARA UE;Mn;0;NSM;;;;;N;THAI VOWEL SIGN SARA UE;;;;

0E37;THAI CHARACTER SARA UEE;Mn;0;NSM;;;;;N;THAI VOWEL SIGN SARA UEE;sara uue;;;

0E38;THAI CHARACTER SARA U;Mn;103;NSM;;;;;N;THAI VOWEL SIGN SARA U;;;;

0E39;THAI CHARACTER SARA UU;Mn;103;NSM;;;;;N;THAI VOWEL SIGN SARA UU;;;;

0E3A;THAI CHARACTER PHINTHU;Mn;9;NSM;;;;;N;THAI VOWEL SIGN PHINTHU;;;;

0E3F;THAI CURRENCY SYMBOL BAHT;Sc;0;ET;;;;;N;THAI BAHT SIGN;;;;

0E40;THAI CHARACTER SARA E;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA E;;;;

0E41;THAI CHARACTER SARA AE;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA AE;;;;

0E42;THAI CHARACTER SARA O;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA O;;;;

0E43;THAI CHARACTER SARA AI MAIMUAN;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA MAI MUAN;sara ai mai muan;;;

0E44;THAI CHARACTER SARA AI MAIMALAI;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA MAI MALAI;sara ai mai malai;;;

0E45;THAI CHARACTER LAKKHANGYAO;Lo;0;L;;;;;N;THAI LAK KHANG YAO;lakkhang yao;;;

0E46;THAI CHARACTER MAIYAMOK;Lm;0;L;;;;;N;THAI MAI YAMOK;mai yamok;;;

0E47;THAI CHARACTER MAITAIKHU;Mn;0;NSM;;;;;N;THAI VOWEL SIGN MAI TAI KHU;mai taikhu;;;

0E48;THAI CHARACTER MAI EK;Mn;107;NSM;;;;;N;THAI TONE MAI EK;;;;

0E49;THAI CHARACTER MAI THO;Mn;107;NSM;;;;;N;THAI TONE MAI THO;;;;

0E4A;THAI CHARACTER MAI TRI;Mn;107;NSM;;;;;N;THAI TONE MAI TRI;;;;

0E4B;THAI CHARACTER MAI CHATTAWA;Mn;107;NSM;;;;;N;THAI TONE MAI CHATTAWA;;;;

0E4C;THAI CHARACTER THANTHAKHAT;Mn;0;NSM;;;;;N;THAI THANTHAKHAT;;;;

0E4D;THAI CHARACTER NIKHAHIT;Mn;0;NSM;;;;;N;THAI NIKKHAHIT;nikkhahit;;;

0E4E;THAI CHARACTER YAMAKKAN;Mn;0;NSM;;;;;N;THAI YAMAKKAN;;;;

0E4F;THAI CHARACTER FONGMAN;Po;0;L;;;;;N;THAI FONGMAN;;;;

0E50;THAI DIGIT ZERO;Nd;0;L;;0;0;0;N;;;;;

0E51;THAI DIGIT ONE;Nd;0;L;;1;1;1;N;;;;;

0E52;THAI DIGIT TWO;Nd;0;L;;2;2;2;N;;;;;

0E53;THAI DIGIT THREE;Nd;0;L;;3;3;3;N;;;;;

0E54;THAI DIGIT FOUR;Nd;0;L;;4;4;4;N;;;;;

0E55;THAI DIGIT FIVE;Nd;0;L;;5;5;5;N;;;;;

0E56;THAI DIGIT SIX;Nd;0;L;;6;6;6;N;;;;;

0E57;THAI DIGIT SEVEN;Nd;0;L;;7;7;7;N;;;;;

0E58;THAI DIGIT EIGHT;Nd;0;L;;8;8;8;N;;;;;

0E59;THAI DIGIT NINE;Nd;0;L;;9;9;9;N;;;;;

0E5A;THAI CHARACTER ANGKHANKHU;Po;0;L;;;;;N;THAI ANGKHANKHU;;;;

0E5B;THAI CHARACTER KHOMUT;Po;0;L;;;;;N;THAI KHOMUT;;;;

The first few semicolon-separated fields are code (in hex), Unicode name, type of character, combining class, left-to-right property (for mixing with Arabic or Hebrew).
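
For anyone who wants those fields in a form that is easy to experiment with, here is a minimal parsing sketch in Python (an illustration only; `CharInfo` and `parse_unicode_data_line` are names made up for this example, and only the first five fields described above are read).

```python
from collections import namedtuple

# The first five semicolon-separated fields of a UnicodeData.txt line.
CharInfo = namedtuple("CharInfo", "code name category combining_class bidi_class")


def parse_unicode_data_line(line):
    """Split one UnicodeData.txt line and convert the numeric fields."""
    fields = line.strip().split(";")
    return CharInfo(
        code=int(fields[0], 16),          # hex code point as a number
        name=fields[1],
        category=fields[2],               # e.g. Lo = letter, Mn = non-spacing mark
        combining_class=int(fields[3]),
        bidi_class=fields[4],
    )


sample = [
    "0E01;THAI CHARACTER KO KAI;Lo;0;L;;;;;N;THAI LETTER KO KAI;;;;",
    "0E40;THAI CHARACTER SARA E;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA E;;;;",
    "0E48;THAI CHARACTER MAI EK;Mn;107;NSM;;;;;N;THAI TONE MAI EK;;;;",
]
for line in sample:
    info = parse_unicode_data_line(line)
    print(f"U+{info.code:04X} {info.code} {info.category} {info.combining_class} {info.name}")
```

The decimal value printed for each code is the same number that AscW returns in VBA, so the output can be pasted next to a word list for comparison.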

Posted
Why? 

Because with an XLS spreadsheet, one can readily experiment with sorting and macros, and look at the co-relationships of how the Unicode sorts, etc.

An experimenter cannot easily "do that" with what you posted, Khun Richard, because what you posted is not in an Excel spreadsheet .... to answer your question "why?"

Cheers!

Posted
...I concede that it is surprising (but generally convenient) that Thai dictionaries ignore consonant clustering in their sort order.

Sorry Richard, but can you explain what you mean by this, perhaps with a couple of examples?

The general rule in sorting Indic languages is that one sorts by the order of the phonetic elements, so เพลา [M]phlau 'axis' would be sorted as พ + ล + เ-า. However, in Thai, this would be very inconvenient if you don't know how the word was pronounced, because you could be looking at เพลา [M]phee[M]laa '(meal) time', which would be sorted as พ + เ- + ล + า. (This word is actually a doublet of เวสา 'time'.) The rule that is applied is to ignore clusters, so both words เพลา are sorted as พ + เ- + ล + า. Similarly, แหน [RL]haen 'keep for oneself' and [RL]nae 'duckweed' are sorted as ห + แ + น, and appear next to one another in the dictionary. As tonemarks (and maitaikhu and karan) are used to sort only when the consonants and vowels are the same, the four words แหง [RL]ngae 'sheepishly', แหง่ [LL]ngae 'buffalo calf', แห่ง [LS]haeng 'place' and แห้ง [FL]haeng 'dry, hoarse' appear together and in that order in the dictionary, all sorted as ห + แ + ง and then ordered on the basis of the presence and type or absence of a tonemark.

In most Indic languages the problem does not arise, because the absence of a vowel is marked. However, the vowel absence markers in the Thai script, phinthu and yamakkan, are rarely used.

Posted
(This word is actually a doublet of เวสา 'time'.)

Dear Khun Richard,

Finally! I get a chance to improve your excellent posts - an honor, Sir.

Time == เวลา not your accidental typo เวสา above.

Just a tiny typo, (saw so) v. (law ling)

Khrap Phom!

Mr. Farang

Posted
Khun Mr. Farang.

Curious as to why you say "saw so" for ส. All my books, including Thai school books, use "saw seua" (tiger) as the example. Is there a standardized set of words used for the support words for the characters? However my books are really old, like me. :o

Posted
(This word is actually a doublet of เวสา 'time'.)

Finally! I get a chance to improve your excellent posts - an honor, Sir.

Time == เวลา not your accidental typo เวสา above.

But you missed a real clanger! I said 'sorting Indic languages', but I should have said 'sorting in Indic scripts'. Thai is very definitely not an Indo-European language, let alone an Indic language!

Posted
Khun Mr. Farang.

Curious as to why you say "saw so" for ส. All my books, including Thai school books, use "saw seua" (tiger) as the example. Is there a standardized set of words used for the support words for the characters? However my books are really old, like me. :D

Dear Khun Tywais,

I made a mistake! (Was watching Hurricane Rita heading toward the coast of Texas - third "biggest" hurricane in the history of US hurricanes!!)

:o

Yours sincerely,

Mr. Farang

Posted
Is there a standardized set of words used for the support words for the characters?

Yep there is. The words are fixed (although some no longer fit - Kor Khun for example, which is becoming obsolete, as the spelling for Khun has changed). Other than that (and maybe the extra couple of consonants - if your books say 46 instead of 44) your 'old' books should serve you just as well.

PS: To get a list in Excel as requested earlier try pasting this into an empty workbook's VBA (Macro) and run it. It will list all the characters with their ASCII (pointless, but for information) and Unicode in both hex and decimal.

Some vowels will be preceded by Ohr Ang (Zero Char) as they will not print to the screen otherwise.

Public Sub MakeList()
 Dim lngLoop As Long
 Dim lngRowNumber As Long
 Dim strChar As String
 
 'Set Initial Row Number
 lngRowNumber = 1
 
 'Clear down earlier run
 Cells.Select
 Selection.ClearContents
 
 'Put in headers
 Cells(1, 1).Value = "Char"
 Cells(1, 2).Value = "ASCII (Hex)"
 Cells(1, 3).Value = "ASCII (Dec)"
 Cells(1, 4).Value = "UniCode (Hex)"
 Cells(1, 5).Value = "UniCode (Dec)"

 'Loop through valid UniCode Chars
 For lngLoop = CLng(&HE01) To CLng(&HE5B)
   'Ignore the missing parts of the Unicode set (no characters defined)
   If lngLoop < &HE3B Or lngLoop > &HE3E Then
     lngRowNumber = lngRowNumber + 1
     Cells(lngRowNumber, 1).NumberFormat = "@"  'Make it text so no's will not show as digits
     
     strChar = ChrW(lngLoop) 'Get the character from the UniCode Value
     
     If (lngLoop = &HE31) Or (lngLoop > &HE33 And lngLoop < &HE3F) Or _
        (lngLoop > &HE46 And lngLoop < &HE4F) Then
        ' place zero character (Ohr Ang) for these vowels/marks so they will print to screen
        Cells(lngRowNumber, 1).Value = ChrW(&HE2D) & strChar ' Ohr Ang + Char
     Else
       Cells(lngRowNumber, 1).Value = strChar         ' Character
     End If
     Cells(lngRowNumber, 2).Value = Hex(Asc(strChar)) ' Hex Ascii
     Cells(lngRowNumber, 3).Value = Asc(strChar)      ' Decimal Ascii
     Cells(lngRowNumber, 4).Value = Hex(lngLoop)      ' Hex UniCode
     Cells(lngRowNumber, 5).Value = lngLoop           ' Decimal Unicode
           
   End If
   
 Next lngLoop
End Sub

:o

Posted
PS: To get a list in Excel as requested earlier try pasting this into an empty workbook's VBA (Macro) and run it. It will list all the characters with their ASCII (pointless, but for information) and Unicode in both hex and decimal.

Some vowels will be preceded by Ohr Ang (Zero Char) as they will not print to the screen otherwise.

Dear Khun Wolf,

Excellent, thank you. The Macro worked great! You are a talented macro programmer - ging ging na krap... :-) .

I experimented with the spreadsheet created by the macro and sorted on "Char". Sample partial results are attached in a screenshot. Note that the vowels without "Awe Ang" come before the consonants.

Interesting. Any ideas why, anyone?

Yours sincerely, Mr. Farang

[Attachment: screenshot of the list sorted on "Char"]

Posted
I experimented with the spreadsheet created by the macro and sorted on "Char". Sample partial results are attached in a screenshot. Note that the vowels without "Awe Ang" come before the consonants.

Interesting. Any ideas why, anyone?

Yes! :D The sorting for Thai has been miscoded. Someone somewhere has misunderstood the Thai vowels.

In most Indic scripts, there are two types of vowel symbols. There are the 'independent vowels', which are used without consonants, and the 'dependent vowels', which are used with consonants. Roughly speaking, the independent vowels are the ones that appear at the start of a word. In Devanagari, the most important of the Indic alphabets, the independent vowels come before the consonants in alphabetic order. It may also be relevant that in Devanagari, only one of the vowels appears before its consonant.

Thai has ditched the independent vowels, except for o ang, which it combines with the dependent vowels to make up for the lack of independent vowels. (Burmese and Khmer are moving in the same direction - independent vowels are chiefly used for Pali/Sanskrit loans.)

I think someone thought that the Thai vowel symbols that can appear at the start of a word were independent vowels, and carefully ordered them between the digits and the consonants. :o :D :D Note that sara aa and sara am, which don't need a leading o ang, aren't misordered, but appear after the consonants.

Incidentally, ignore the 'ASCII' columns generated by the spreadsheet. What is happening is that Asc of a Thai string effectively returns a question mark - 3F is the ASCII code for '?'.
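
A quick way to see this for yourself: the minimal sketch below prints the AscW and Asc values of the first three Thai consonants to the Immediate window (Ctrl+G in the VBA editor). The name CompareAscAndAscW is made up for illustration. Assuming an English (non-Thai) system code page, as in the spreadsheet above, Asc should come out as 63 (hex 3F) every time, while AscW gives the real Unicode value. On a Thai code page, Asc would be expected to return the code page value instead.

```vba
Public Sub CompareAscAndAscW()
    Dim strChar As String
    Dim lngCode As Long

    ' First three Thai consonants: U+0E01 to U+0E03
    For lngCode = &HE01 To &HE03
        strChar = ChrW(lngCode)
        ' Asc converts through the system ANSI code page first, so a character
        ' that the code page lacks is expected to come back as "?" (63, &H3F).
        ' AscW reads the Unicode (UTF-16) value directly.
        Debug.Print "AscW (hex): " & Hex(AscW(strChar)), _
                    "Asc (dec): " & Asc(strChar), _
                    "Asc (hex): " & Hex(Asc(strChar))
    Next lngCode
End Sub
```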

Posted
Hi,

Are you using a Thai or English version of Windows ME? I found that some programs, e.g. Pirch, would only sort Thai in alphabetical order under Thai Windows.

Secondly, I seem to remember that getting Office XP to display Thai properly on English Windows ME was not straightforward and involved hacking the registry and installing extra files etc.

I got on my old computer with Thai version Win98 and Thai Office2000 and it sorted perfectly. Apparently the Thai version has an algorithm that will handle the vowels which occur before consonants. I don't have the latest version of Thai XP windows and office to try the sort. Hopefully it will work as well.

Again, thanks everyone for all the excellent posts.

Bryan

Posted

One solution to the problem is to generate a sort key in another column from the text and sort on that. Normally you will want to keep this extra column hidden. The function key() defined by the following macro seems to work:

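Public Function key(raw As String) As String
Dim pending As String   'Preposed vowel waiting for the next character
Dim proc As String      'Sort key built so far
Dim follower As String
Dim onch As String
Dim code As Long
Dim pos As Long

follower = ChrW(&HE45) 'Lakkhangyao: character after last of preposed.
For pos = 1 To Len(raw)
   onch = Mid(raw, pos, 1)
   code = AscW(onch)
   If onch = follower Then           'Compare characters, not code with character
       proc = proc & follower        'Double an original lakkhangyao
   End If
   If Len(pending) = 0 Then
       If &HE40 <= code And code <= &HE44 Then        'Preposed vowel: hold it back
           pending = onch
       Else
           proc = proc & onch
       End If
   Else
       'Emit the following character, then lakkhangyao, then the held vowel
       proc = proc & onch & follower & pending
       pending = ""
   End If
Next pos
If Len(pending) > 0 Then
   proc = proc & pending             'Flush a preposed vowel left at the end
End If
key = proc
End Function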

Simply swapping round a preposed vowel and the following consonant does not quite work, as เลข would still sort before ลม. The trick I use to get round the preposed vowels' being misordered (between the digits and the consonants) is to insert a lakkhangyao (ๅ), the next character after them in the normal sequence, between the swapped consonant and vowel, and for good measure to double any original lakkhangyaos.

I've tested the macro on Windows 2000 (Excel 2000? - I forgot to check the version number) and on Excel 2002 with Visual Basic 6.3 under Windows XP. It works with the original demonstration list, and it seems to work with tones.
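
For anyone who would rather not fill in the key column by hand, here is a rough sketch of how it might be done in one go. It assumes the words are in column A of the active sheet, starting at row 1 with no header row, that column B is free to be overwritten, and that the key() function above is in the same workbook. SortThaiByKey is a made-up name, and this has not been tried on the Excel versions mentioned above.

```vba
Public Sub SortThaiByKey()
    Dim lngLast As Long
    Dim lngRow As Long

    ' Last used row in column A (assumed to hold the Thai words)
    lngLast = Cells(Rows.Count, 1).End(xlUp).Row

    ' Store the keys as text in column B (assumed to be free)
    Range(Cells(1, 2), Cells(lngLast, 2)).NumberFormat = "@"
    For lngRow = 1 To lngLast
        Cells(lngRow, 2).Value = key(CStr(Cells(lngRow, 1).Value))
    Next lngRow

    ' Sort both columns together on the key column
    Range(Cells(1, 1), Cells(lngLast, 2)).Sort _
        Key1:=Cells(1, 2), Order1:=xlAscending, Header:=xlNo

    ' Keep the helper column out of sight
    Columns(2).Hidden = True
End Sub
```

The same thing can be done without a macro: enter =key(A1) in B1, fill down, and sort both columns on column B.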

  • 13 years later...
Posted

To sort in Thai dictionary alphabetical order in MS Word and Excel, you need to set the Windows regional format to Thai. This is how you do it in Windows 10:

 

Control Panel > Region > "Formats" tab > Format: Thai (Thailand) > "Apply" button

 

This setting will also change other things such as the default currency to baht and the date to Thai language and Thai years. So after you have done the sorting, you might want to change back to your previous format setting.

 
