
ThaiNotes

Member
  • Posts

    136
  • Joined

  • Last visited

Posts posted by ThaiNotes

  1. I've added a number of new features to the dictionary.

     

    - Abbreviations are coloured.  Hover over them for a popup showing the expansion and an English explanation.  (No English where the reference is to a Thai language book.)

     

    - Where there is a reference to another word in the dictionary (for example, กก ๗ - ดู กะวะ) you can click on the word (in this case กะวะ) to see that word's definition.

     

    - After you search for a word, you can search for that word on four other dictionaries directly:  thai-language.com, thai2english.com, LEXiTRON and Thai language Wikipedia.  To do this, position the mouse over one of the terms and a popup will appear.  Select the appropriate dictionary and the search result will appear in a new tab.

     

    There are a number of known issues:

     

    - Performance for some queries can be poor, particularly browsing by letter and reverse lookup for short, common words.  This I think I can improve with further work, but it's not a straightforward fix.

     

    - Some of the links added are spurious.  That's more of an irritation than a problem.  I don't expect to be able to fix this.  (The problem is inconsistent formatting of the RID's entries.)

     

    - Previously the application downloaded the dictionary word list once, and then used a locally stored copy.  For some reason it's now downloading the word list every time you start it.  This should be easy to fix, if I can track down the cause.

     

    - Font sizes are inconsistent in the results displayed.

     

    - The popups sometimes hide themselves when they shouldn't.  The workaround for this is to move the mouse cursor over the popup itself.  Then all will be fine.

     

    - For the popups that link to third party sites, there's no guarantee that the word will be available at that site.  (If I can get lists of all words at the third party sites I can fix this.)

     

    - LEXiTRON searches are done via my webserver, so can be rather slow.  (All other external searches are done directly.)

     

    As ever, all and any feedback much appreciated.

     

    The dictionary remains at http://thai-notes.com/dictionaries/RIDictionary.html

  2. Is there any on-line Thai dictionary, apart from the Royal Institute Dictionaries (and various sites such as sanook.com which serve up the RID's definitions)?

    I'm interested in Thai dictionaries - not Thai-English dictionaries such as LEXiTRON, Thai2English, thailanguage.com and others of that ilk.

    Thanks.

  3. Thanks for that, Mole. However, I am confused by ญวน. I had thought that this referred to an historic empire, much of which was within Vietnam's current borders. If the dictionary compilers mean (modern) Vietnamese why didn't they use the more obvious เวียดนาม?

    As for คำโบราณ : archaic - does that mean that the word has largely fallen out of use (as in English words such as egads, forsooth, prithee) and is only used for special effect - usually comic? Or is it simply seen as an ancient word which is still used in Thai?

    (For what it's worth, I attach a list of all words classified as คำโบราณ in the RID.)

    ???????.txt

  4. I want to translate the abbreviations used in the RID into English. Would somebody mind checking the following, please? (I'm not aiming for word-for-word here - just to capture the sense.) The ones I'm least sure of I've marked with an asterisk.

    เขมร : Khmer
    ตะเลง : Mon*
    ละติน : Latin
    จีน : Chinese
    เบงกาลี : Bengali
    สันสกฤต : Sanskrit
    ชวา : Javanese*
    ปาลิ (บาลี) : Pali
    อังกฤษ : English
    ญวน : Yuan
    ฝรั่งเศส : French
    ฮินดี : Hindi
    ญี่ปุ่น : Japanese
    มลายู : Malay
    กริยา : Verb
    วิเศษณ์ (คุณศัพท์หรือกริยาวิเศษณ์) : Adverb
    นาม : Noun
    สรรพนาม : Pronoun
    นิบาต : Prefix, determiner*
    สันธาน : Conjunction
    บุรพบท : Preposition
    อุทาน : Exclamation
    คือ คำที่ใช้ในกฎหมาย : Law
    คือ คำที่ใช้ในบทร้อยกรอง : Poetry
    คือ คำที่ใช้ในวงการทูต : Diplomacy
    คือ คำที่ใช้ในวงการเมือง : Politics
    คือ คำที่ใช้ในวงการศึกษา : Education
    คือ คำที่ใช้ในการเกษตรกรรม : Agriculture
    คือ คำที่ใช้ในคณิตศาสตร์ : Mathematics
    คือ คำที่ใช้ในวิชาคอมพิวเตอร์ : Computing
    คือ คำที่ใช้ในเคมี : Chemistry
    คือ คำที่ใช้ในจริยศาสตร์ : Ethics
    คือ คำที่ใช้ในชีววิทยา : Biology
    คือ คำที่ใช้ในดาราศาสตร์ : Astronomy
    คือ คำที่เป็นภาษาเฉพาะถิ่น เช่น (ถิ่น-ปักษ์ใต้) คือ คำที่เป็นภาษาถิ่นภาคใต้ (ถิ่น-พายัพ) คือ คำที่เป็นภาษาถิ่นภาคพายัพ (ถิ่น-อีสาน) คือ คำที่เป็น ภาษาถิ่นภาคอีสาน : Regional dialect
    คือ คำที่ใช้ในธรณีวิทยา : Geology
    คือ คำที่ใช้ในการบัญชี : Accounting
    คือ คำที่ใช้เฉพาะในหนังสือ ไม่ใช่คำพูดทั่วไป เช่น กนก ลุปต์ ลุพธ์ : Written language only
    คือ คำโบราณ : Old fashioned*
    คือ คำที่ใช้ในปรัชญา : Philosophy
    คือ คำที่เป็นภาษาปาก : Colloquial
    คือ คำที่ใช้ในพฤกษศาสตร์ : Botany
    คือ คำที่ใช้ในแพทยศาสตร์ : Medicine
    คือ คำที่ใช้ในฟิสิกส์ : Physics
    คือ คำที่ใช้ในวิชาไฟฟ้า : Electronics
    คือ คำที่ใช้ในภูมิศาสตร์ : Geography
    คือ คำที่ใช้ในมานุษยวิทยา : Anthropology
    คือ คำที่ใช้ในวิชาแม่เหล็กไฟฟ้า : Electromagnetics*
    คือ คำที่ใช้ในราชาศัพท์ (ถ้าไม่มีอธิบายเป็นอย่างอื่น ให้หมายความว่า ใช้เฉพาะ ของเจ้านาย) : Royal language
    คือ คำที่ใช้ในเรขาคณิต : Geometry
    คือ คำที่เลิกใช้แล้ว : Obsolete
    คือ คำที่ใช้ในวิทยาศาสตร์ : Science
    คือ คำที่ใช้ในวรรณกรรม : Literature
    คือ คำที่ใช้ในไวยากรณ์ : Grammar
    คือ คำที่ใช้ในศาสนศาสตร์ : Religion
    คือ คำที่ใช้ในเศรษฐศาสตร์ : Economics
    คือ คำที่ใช้ในสถิติศาสตร์ : Statistics
    คือ คำที่ใช้ในสรีรวิทยา : Physiology
    คือ คำที่ใช้ในสังคมศาสตร์ : Social science
    คือ คำที่เป็นสำนวน : Aphorism
    คือ คำที่ใช้ในสัตวศาสตร์ : Zoology
    คือ คำที่ใช้ในวิชาแสง : Optics*
    คือ คำที่ใช้ในโหราศาสตร์ : Astrology
    คือ คำที่ใช้ในอุตุนิยมวิทยา : Meteorology

    Thank you.

  5. PS. I don't know if you know this online dictionary : http://dict.longdo.com/
    It is similar to what you made but the layout is not so nice and it lacks wildcard functionality.
    It also doesn't remember words previously looked up.
    It also does a multi dictionary lookup (including the RID, and it supports French and German).


    I'd been vaguely aware of it, but never looked at it seriously.

    It uses a rather different approach from mine in that I store the word list locally, Longdo doesn't. That means I can provide suggestions much faster and can also reasonably support wildcard searches.

    Rather oddly, I think, when searching for a term, Longdo's suggestions are words including the letters typed, not words starting with what's typed. Not sure why they decided to do this.
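    The difference between the two suggestion styles can be sketched roughly as follows. This is only an illustration in Python, not the dictionary's actual code: the function names and the tiny word list are made up, and plain code-point order stands in for proper Thai dictionary order. With a sorted local word list, "starts with" suggestions need only a binary search, whereas "contains" suggestions need a scan of the whole list.

```python
from bisect import bisect_left

def prefix_suggestions(sorted_words, typed, limit=10):
    """Return up to `limit` words that start with `typed`.

    `sorted_words` must be sorted by code point; all words sharing a
    prefix are then contiguous, so a binary search finds the first one.
    """
    start = bisect_left(sorted_words, typed)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(typed) or len(matches) >= limit:
            break
        matches.append(word)
    return matches

def substring_suggestions(words, typed, limit=10):
    """Return up to `limit` words containing `typed` anywhere (full scan)."""
    return [w for w in words if typed in w][:limit]

# Hypothetical mini word list, for illustration only.
words = sorted(["กก", "กบ", "นก", "นกกก", "บก"])
```

    Here `prefix_suggestions(words, "นก")` returns only the words beginning with นก, while `substring_suggestions(words, "กก")` also picks up words with กก in the middle or at the end.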

    The RID lookup appears to be from the previous version of the RID (though there is a link to look up a word with the latest version at the RID website).

    And one other big difference: my site has no advertising and I'm not trying to sell anyone anything.

    Anyway, interesting to have a look at it.

    As for linking to looking up words in other dictionaries, that's something I've thought about. In particular, linking to thai-language.com would be useful. There used to be a way to do that, but I don't think it's available any more. Something to look at in the future.
  6.  

     

    I'd also like to find some way later of letting Thai people know the dictionary is there, since I think it would be useful to them, as well as to non-native speakers.

     

    The Farang Can Learn Thai FB group has over 14,000 members and a great deal of them are Thai. If you post there, Thais will make sure it's known.

     

     

    Thanks for the suggestion.  Once the program is a bit more stable (and I've learned how to use Facebook) I'll give that a go.

  7. The slow loading has been fixed.

    Stop reading here if you're not interested in technical details.

    I use MySQL to store the list of words and to cache individual entries. My hosting provider migrated me from 5.5 to 5.6. This didn't cause problems until I needed to reload the database. Now if you want to use Unicode in 5.6 the collation has changed. Before I was using utf8_unicode_ci collation throughout. Now one is forced to use utf8mb4_unicode_ci collation for tables just to be able to reload them. It didn't occur to me at the time that I also needed to change the collation for the indices. I'd thought doing so might fix the problem. It didn't. The query

    SELECT COUNT(DISTINCT headword) FROM entries

    still ran incredibly slowly - around 25-30 seconds. What is more, it returned the wrong answer - which wasn't particularly a surprise since MySQL's handling of Unicode is atrocious. I tried a couple of things such as forcing a binary query, which gave the correct answer. However, the query which downloads the wordlist, namely

    SELECT DISTINCT headword FROM entries

    completely ignored all tone marks, so rather than returning ก, ก็, กก, ก๊ก... it just returned ก, กก... In other words, Unicode in MySQL 5.6 is now even more broken than it was before.
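    For anyone wondering how DISTINCT can lose words, here is a rough model of the effect in Python. It is not MySQL's actual collation algorithm: it simply assumes that the comparison ignores the Thai marks U+0E47 to U+0E4E, which is enough to reproduce the symptom. The names `loose_key`, `distinct_loose` and `distinct_exact` are made up for the example.

```python
# Thai marks U+0E47..U+0E4E (mai taikhu, the four tone marks, thanthakhat,
# nikhahit, yamakkan). Treating exactly this range as "ignored" is an
# assumption for illustration, not a statement about MySQL internals.
IGNORED_MARKS = {chr(cp) for cp in range(0x0E47, 0x0E4F)}

def loose_key(word):
    """Comparison key that drops the ignored marks."""
    return "".join(ch for ch in word if ch not in IGNORED_MARKS)

def distinct_loose(words):
    """DISTINCT under the loose comparison: first word seen per key wins."""
    seen = {}
    for w in words:
        seen.setdefault(loose_key(w), w)
    return list(seen.values())

def distinct_exact(words):
    """DISTINCT under an exact (binary-style) comparison."""
    return list(dict.fromkeys(words))

# ก, ก็, กก, ก๊ก written with escapes so the code points are unambiguous.
headwords = ["\u0e01", "\u0e01\u0e47", "\u0e01\u0e01", "\u0e01\u0e4a\u0e01"]
```

    With these four headwords, `distinct_loose` keeps only ก and กก, while `distinct_exact` keeps all four, which matches the difference I saw between the default query and the forced binary one.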

    The only solution I could think of was to include all duplicates, which slows the download (which isn't a big deal, since it should be a one off event), but also uses more browser local storage, which is finite.

    Anyway, everything is back to normal and running fast again. Curse you, MySQL.
     
    I'm not sure whether there are going to be any other knock-on problems after the database "upgrade".  If you spot anything odd, please do let me know.

  8. Maybe just one small remark, if a wildcard entry doesn't exist, for instance กกก* I get no message telling me this (or any other feedback).


    That was a mistake on my part. The previous message saying how many matches there were got accidentally dropped when I did some major refactoring and I didn't notice. It's back there now.

    I'll work on the slow loading next.
  9. The new version doesn't work at all on my system. The only thing I get is a blank webpage. (tested on firefox and chromium on ubuntu)

     

    It looks like there's a temporary problem with Google Web Fonts.

     

    If you wait about 10-15 seconds the request for the fonts will time out and the dictionary should then display OK.  (At least, that's what's happening for me at the moment.)  This should fix itself in time when Google sorts out the problem at their end.

     

    If there's still a problem I'll have another look on Monday to see if I can sort something out.

  10. It's taken a lot longer (and been much tougher) than I had anticipated, but the latest update to the program is now live.

     

    There's not a lot to see on the surface; there's no new functionality.  However, I believe all the RID entries are now being displayed correctly (with the exception of two words).

     

    There is one small bug that I know of:  when using wildcards and more than 500 entries match, there's no longer a warning that only the first 500 are displayed.  I'll fix this when I get a chance.

     

    I'm now fairly certain that it's usable.  Next step, add new functionality.

  11. If I might make a small suggestion, I would like to recommend the inclusion of another feature of the Royal Institute website beyond  the dictionary which might be incorporated into your "front-end".

     

    This  is the listing of ลักษณะนาม found at http://www.royin.go.th/th/profile/index.php?SystemModuleKey=265&SystemMenuID=1&SystemMenuIDS= and subsequent pages. This listing, pages 1 - 22, can be accessed by clicking on the page number. This listing is in alphabetical order in two columns, the first column is the noun and the second its classifier. This list is not referenced in the dictionary itself.  I wonder if it would be possible for your front end to have a link to this listing so that whenever a particular noun is chosen by the user, its classifier would show up as well. I apologize for not using correct IT language in this note, but perhaps you can understand what I mean.

     

    The current array is difficult to use because the site contains no information regarding which range of words by alphabet are included in each page. For each word  you wish to look up you need to guess the appropriate page. This process requires a bit of trial and error. This listing is ripe for technical improvement.

     

    Thank you for your consideration.

     

    Perhaps you're not familiar with http://thai-notes.com/tools/classifiers.shtml ?

     

    It uses the data you refer to and allows you to search by word (to find classifier) and by classifier (to find words which use that classifier).

     

    The classifier data are limited to 3900 entries.  That seems rather small compared with the 40,000+ entries in the RID.  Of course, only nouns have classifiers, but the large discrepancy suggests the classifier data are incomplete.

     

    My longer term goal is to allow integrated querying of all three dictionaries (LEXiTRON, RID and classifiers).  The user interface will be similar to what I have now for RID & LEXiTRON.  (Can't remember why I used a slightly different interface for the classifiers.)  The user will be able to select which dictionaries to retrieve matches from, then display the results - though I can't picture yet how the results will be displayed on the right.

  12. There's no Thai word for minimalism.

    Good point. I guess it's rather like the way that rural restaurants seem to acquire more and more fairy lights as the months and years pass. And benjarong has nothing of the restrained aesthetic of Japanese ceramics. In Thailand it seems less is never more.

    Something else that occurs to me is that probably the majority of Thai web users access sites through mobile devices. Swiping up and down a long single page may be more convenient than clicking on links. (That's conjecture on my part. I don't access the Internet through my 'phone.)

    If two people are using the same computer it can create problems if either one has logged in and selected "remember my password", because this info is saved in a cookie. You should clear your cookies if you are having problems.

    Surely, if one logs off, the application should no longer remember the id/password and remove the cookies?

    This sounds like an application bug to me.

  14. I wasn't sure whether to post this under Thai language or Internet, but I'm hoping I'll get a better response here.

    The Thai language websites I visit are all (without exception) poorly designed, and often broken. For example:

    • truevisions.com - the TV guide didn't work for years, and now, though it works, it only covers terrestrial channels
    • pantip.com - purple text on a purple background
    • bloggang.com - ugly blogs customised by using raw HTML/CSS
    • immigration.go.th - very amateurish design (particularly the animated graphics/links on the left), and parts of it (the appointment booking system) only work in Internet Explorer
    • rirs3.royin.go.th/dictionary.asp (the Royal Institute Dictionary) - a one line change would make this site work 1000% better (by including the page encoding in the HTML) - now when many people visit it they are presented with gibberish and manually have to change the browser's page encoding.
    Generally speaking, there's an over-reliance on Flash (meaning that content isn't being indexed by the search engines), far too much in the way of movement (animated .gifs, rotating banners), uninspiring fonts, and overlong pages requiring one to page down again, and again, and again.

    There are a lot of gateway pages using large Flash images with no HTML link to the content meaning that the entire site is potentially not indexed. A particularly horrible gateway page at the moment is with tescolotus.com which includes loud football chanting. In the Occident website designers dropped gateway pages as a bad idea years ago.

    Where an English language version is "provided", often a lot of the content is provided in the form of graphics with Thai text and no translated graphic (e.g. tescolotus.com, bigc.co.th), and sometimes it simply isn't there at all.

    Interestingly, a number of very popular sites have adopted a similar style: very long pages full of tiled images with subtitles. For example, sanook.com, teenee.com, truelife.com, siamsport.com, kapook.com.

    I find the situation quite difficult to understand. Are Thai people not visually literate (which is hard for me to believe), or do the designs actually appeal to them? Or is it that design is irrelevant, and they only care about content? Do the owners not care that the sites don't work properly, or is it that their IT experts lack the skills to produce properly working sites?

    The problems are so pervasive that I'm led to ask: is there even a single, well-designed, Thai-produced, Thai language website out there? If so, I'd love to see it. Please post a link.

  15. ok, I was able to find some irregularities:
     
    Try : *บ้าน*
     
    Some of the matches don't make sense.
     
    Also when I forget to change my keyboard layout after * I get a strange error message.
    Try: *[hko*

     
    The problem with *บ้าน* is the RI data.  What they usually do is put the head word in the first column of a table, then put the definitions in the second column.  The junk that's coming back is because for those words they've put both the head word and the definitions all in the first column.  The nonsensical matches do contain "บ้าน", but in the definitions.  I can possibly fix this with even more rigorous parsing of the stuff coming back from the RI website.
     
    I'm not surprised *[hko* is a problem.  The presence of * and h makes the application think it's a regular expression.  However, the inclusion of [ (which can also be part of a valid regular expression) messes things up since it's not terminated by ].  I probably should validate to make sure that the only non-Thai characters entered are those I recognise as part of my pseudo-regular expressions.
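    The kind of validation I have in mind might look something like the sketch below (Python here purely for illustration; the application itself runs in the browser). The set of permitted non-Thai characters is an assumption: `*` for the wildcard and `-` because some head words carry a hyphen. The real pseudo-regular-expression syntax has more characters than that, so `allowed_extra` would need adjusting.

```python
def is_thai(ch):
    """True if the character lies in the Unicode Thai block (U+0E00-U+0E7F)."""
    return "\u0e00" <= ch <= "\u0e7f"

def invalid_characters(query, allowed_extra="*-"):
    """Return the characters in `query` that are neither Thai nor allowed.

    `allowed_extra` is a hypothetical whitelist; an empty result means
    the query is safe to hand to the wildcard matcher.
    """
    return [ch for ch in query if not is_thai(ch) and ch not in allowed_extra]
```

    A query such as *บ้าน* passes with an empty list, whereas *[hko* is rejected up front with `[`, `h`, `k` and `o` reported, so the user can be told to check the keyboard layout instead of seeing a strange error.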
     
    For an initial release I think I'm pretty much functionality-complete.  I now need to make sure I can handle all the RI's random data formats and can handle 100% of the dictionary entries.  Then I'll move on to the next stage which is trying to entice more people to try the application and give feedback.
  16. Just one little thing. When I lookup กรก I get "Not found". Same for กรกฎ. I think it has something to do with the ,  after the entry in the RID.

     

    Yup.  Another problem.  Though the RID itself has กรก, กรก- as the head word, I create separate index entries for กรก and กรก-.  Both should appear in the suggestions (only the first does), and both should result in the กรก, กรก- definition being retrieved (they don't).
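    What the indexing is meant to do can be sketched like this. It is a simplified illustration in Python rather than the real code, and `build_index` is a made-up name: each comma-separated variant of a head word becomes its own index entry pointing back at the full RID head word, so looking up either variant retrieves the same definition.

```python
def build_index(headwords):
    """Map every comma-separated variant to the full RID head word."""
    index = {}
    for headword in headwords:
        for variant in headword.split(","):
            variant = variant.strip()
            if variant:  # skip empty pieces left by stray commas
                index[variant] = headword
    return index

# Example with the head word from this thread plus an ordinary one.
index = build_index(["กรก, กรก-", "กก"])
```

    With this, `index["กรก"]` and `index["กรก-"]` both give "กรก, กรก-", and the keys of `index` are what should feed the suggestion list.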
     

    I'll look into this tomorrow.

  17. All your changes work fine for me.

    Also the font issue seems to be solved.

    When I type ก* and hit Enter, I get one (the first) entry.

    It's supposed to be like that now, right?

     

    The wildcard lookup looks very fast to me - like instantly.

    Tested on Chrome and firefox on ubuntu.

     

    I think it's a great and useful piece of work. Hope you'll keep this online forever.

     

     

    Literally a couple of minutes after I put that version live I realised that there was an ambiguity:  if you are using wildcards and press enter, do you want to run the wildcard query and retrieve all matches? Or do you want the currently selected suggestion (which defaults to the first suggestion)?

     

    It's been a struggle, but I've changed things so now when you enter a wildcard expression, no matches are shown, just the total number of matches.  Then when you hit Enter or click on the Enter button all the matches are retrieved.
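    In outline, the two-step behaviour is something like the following Python sketch. It makes a few assumptions: `*` is treated as the only wildcard, matching is done against the locally held word list, and the 500-entry cap mentioned earlier in the thread is passed in as `limit`. The function names are invented for the example.

```python
import re

def wildcard_to_regex(query):
    """Turn a `*` wildcard query into a compiled regex, escaping the rest."""
    parts = [re.escape(part) for part in query.split("*")]
    return re.compile(".*".join(parts))

def count_matches(words, query):
    """Step 1: while typing, report only how many words match."""
    pattern = wildcard_to_regex(query)
    return sum(1 for w in words if pattern.fullmatch(w))

def retrieve_matches(words, query, limit=500):
    """Step 2: on Enter, return the matches and whether they were truncated."""
    pattern = wildcard_to_regex(query)
    matches = [w for w in words if pattern.fullmatch(w)]
    return matches[:limit], len(matches) > limit

# Hypothetical mini word list, for illustration only.
words = ["ก", "กก", "กบ", "นก"]
```

    Because everything other than `*` is escaped, stray characters such as `[` are matched literally rather than being interpreted as regular expression syntax. The second value returned by `retrieve_matches` is what would drive the "only the first 500 are displayed" warning.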

     

    There may be a bug in the code, or possibly duplicates in the database, so the number of matches is occasionally out.  I'll look into this when I have time.

     

    There's also (I think) a problem with the matching/non-matching of tone marks for wildcards.  Again, to be looked into.

  18. Just released an updated version of the program.  Changes include:

     

    (1) Locally storing the word list to save time on subsequent startups.  (The local copy of the list will be updated if the master list changes.)

     

    (2) Added ability to close dictionary entries by clicking on the x in the top right corner of each entry.

     

    (3) Added suggested words for wildcard lookups.  (I had worried this might slow things down, but it seems OK to me.)

     

    (4) Fixed a problem with displaying entries with multiple subdefinitions.

     

    (5) Changed the alphabetic sort order - leading hyphens are now ignored.  (Previously it wasn't possible to select the entry for "ก" because the entries beginning "-ก" sorted before it.)
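    As an illustration of change (5), here is a minimal Python sketch of a sort that ignores leading hyphens. It is not the application's code, and it uses plain code-point order rather than true Thai dictionary order, which is a simplification. The tie-break that puts the unhyphenated form first is also my assumption.

```python
def hyphen_insensitive_key(word):
    """Sort key: compare without leading hyphens; plain form wins ties."""
    return (word.lstrip("-"), word.startswith("-"))

def sort_headwords(words):
    """Sort head words so that "-ก" no longer sorts ahead of "ก"."""
    return sorted(words, key=hyphen_insensitive_key)

# Hypothetical sample of head words, with and without leading hyphens.
sample = ["-ก", "กก", "ก", "-กก"]
```

    A plain `sorted(sample)` puts both hyphenated entries first because `-` has a lower code point than any Thai letter; `sort_headwords(sample)` interleaves them with their unhyphenated counterparts instead.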

     

  19. The Droid Sans version available as a web font is a subset; it doesn't include the Thai character set.  (This is to keep the download size small.)

     

    For the first one, I think if you right click on the text, then select "inspect element", then click on the "computed" tab and scroll to the bottom you'll see what font is actually being displayed.  For me it's "OTS derived font" - not Droid Sans Thai as it should be.

     

    Unfortunately, Firefox and Opera don't have a comparable feature (at least that I can find), so what fonts they're actually displaying is a bit of a mystery.

     

    Anyway, unless I get reports of problems I think I'll leave the fonts as they are for the moment.

  20. The database error is because I wasn't checking whether there were zero matches.  My bad.  Will fix.

     

    The font issue is complex.  The page is supposed to be using Google Web Fonts' Droid Sans Thai; however, it doesn't.  Not sure whether this is a known bug from Chrome 33, or a problem with the font itself.  (A similar issue has been reported for a Hebrew web font from Google.)  This resulted in the browser using its default font, which in some browsers (Opera 12, that's you) looks dreadful.  Now I encourage the browser to use one of a list of fonts of my choosing.  If you're a Linux user, the chances are that your default font was Waree, and that's the font I'm specifying for such users.

     

    It's possible to store the word list locally.  I already do this for the LEXiTRON-based dictionary.  There are a number of ways to do this and I'll look into it.  Not 100% certain it will be faster to load from disk, though, since the data from the Internet is compressed for faster transfer.
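    One of the possible approaches, sketched in Python rather than browser code, is to keep a compressed local copy alongside a digest of the master list and only download again when the digest changes. Everything here is hypothetical: the two `fetch_*` callables stand in for requests to the server, and a file stands in for browser local storage.

```python
import gzip
import hashlib
import json
from pathlib import Path

def digest_of(words):
    """Digest used to detect that the master word list has changed."""
    return hashlib.sha256("\n".join(words).encode("utf-8")).hexdigest()

def load_word_list(cache_file, fetch_master_digest, fetch_master_list):
    """Return the word list, downloading it only if the local copy is stale.

    `fetch_master_digest` is a cheap call returning the current digest;
    `fetch_master_list` is the expensive full download.
    """
    cache_file = Path(cache_file)
    master_digest = fetch_master_digest()
    if cache_file.exists():
        raw = gzip.decompress(cache_file.read_bytes()).decode("utf-8")
        cached = json.loads(raw)
        if cached["digest"] == master_digest:
            return cached["words"]  # local copy is up to date
    words = fetch_master_list()
    payload = json.dumps({"digest": master_digest, "words": words},
                         ensure_ascii=False)
    # Store compressed, since local storage space is limited.
    cache_file.write_bytes(gzip.compress(payload.encode("utf-8")))
    return words
```

    Whether this is actually faster than a compressed download would need measuring, since the local copy still has to be read and decompressed on every start.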

  21. I've been getting this error after trying to submit a post. It's been reported a few times before, and there hasn't been any answer. What's going on?

    I'm guessing it's a problem with two people sharing a PC, both with accounts here.

    There's a possibly related issue that even after explicitly signing out, I can come back and find myself still logged on.

    Any explanation? Solution?

  22. Just a very minor update:

    (1) I've added the code to handle when the user resizes the browser window. If you go to a very small window it looks a mess, but that can't really be avoided.

    (2) I've added links to the reference pages from the RID covering things such as alphabetic order and etymology. However, the results don't display correctly yet. (Only discovered that after installing the new software version.) It's a pain having to deal with all the idiosyncrasies of the RID's HTML.

    (3) Thai font handling should be better now - though it's not as consistent as I'd like.
  23. May I give one remark? I don't know if it's technically possible to solve it ....

    There are many newlines in the explanations. If I narrow down the width of my browser window and I go to the dictionary I get something like this:

    น. ชื่อไม้ล้มลุกชนิด Typha angustifolia L. ในวงศ์ Typhaceae ขึ้น
    ใน
    น้ำ ช่อดอกคล้ายธูปขนาดใหญ่, กกธูป ธูปฤๅษี ปรือ หรือ เฟื้อ ก็เรียก.

    Do you notice the newline after ใน?
    It kinda messes up the layout.

    Enlarging the browser window does not redraw the content correctly. I have to close the browser first, then open a new larger browser window and open the dictionary again.

    I think it might be technically hard to filter out the newlines? So you might consider changing it to a fixed width webpage in your css file?

    Also, can you change the height of the wildcard reference window?


    With the new lines, what you're seeing is the hard coded new lines (after ใน). The break after ขึ้น is being added by your browser because it can't fit all the text onto a single line.

    You don't need to close your browser to get the text to display properly in a larger browser window. Just reload the page (Ctrl-r) after you resize. Getting the application to redraw the window on resize is already on the "to do" list. (The quickest fix would simply be automatically to reload the page on resize, but that would lose any previous query data. Can't decide whether that would be a problem or not.)

    I'll fix the wildcard reference window height with the next release.
  24. If it were to become popular, would your server be able to handle many requests?

    Are you planning to promote it?
    A simple message in the "Farang can learn Thai" Facebook group with 12000 members, would give you many users....
    And how about the copyright? Is there any?


    The current website hosting is a very cheap, shared server plan. Undoubtedly I'd hit problems if the volume of requests was high. If the problems were from the reverse lookups I'd probably disable that feature, or somehow limit it. (The queries are very database intensive.) If I needed to switch to a more expensive server plan I might add advertising to the site (though I'm not sure how much that would raise), or try to solicit donations to cover costs.

    I'm afraid I haven't learnt how to use Facebook. I'm not really sure what it does. I'll look into that later, once the dictionary is out of alpha. I don't want people visiting and finding something's broken and never returning to the site.

    I'd also like to find some way later of letting Thai people know the dictionary is there, since I think it would be useful to them, as well as to non-native speakers.

    The copyright issue is an interesting one. There is no copyright statement on the RI dictionary website itself that I can see (though there is one on the RI website front page). I'm hoping that by just spidering the RI dictionary website and caching the results I fall into the same category as a search engine such as Google. If the Royal Institute isn't happy with what I've done, then I'll have to take the site down. However, I'd hope they'd see it as something positive. And if they wanted the code, I'd be happy to give it to them to incorporate into their own website.