In a couple of previous postings the question of MySQL vs other databases for the jmdict project has been raised but not commented upon. While experimenting...
Hi everyone, I'm wondering if anyone knew of any existing tools out there that might be able to tell me, given the string of a word or a bunch of words, which...
This message is about Human beings, Democracy, UNHCR, Refugees, The Iraqis, Islam, Kurds, Human rights, Respect, Money, Donations, Angelina Jolie, Pavarotti,...
I guess it's to be expected that if one's name and email address can be preloaded into the amendment form, people want it in the new entry form too. So I blew...
G'day All, Rene & I have been discussing the list (available on a WWW page or two) of words that are considered non-PC in various contexts. For...
The papers from last year's Web_as_Corpus workshop are now available online (only 15 months after the event.) See: http://wackybook.sslmit.unibo.it/ Some good...
All, The purpose of this is not to start a flamewar of which is better over that, etc., but to gauge what tool would be best for the job. What follows here is...
Mornin' I just noticed that the German meanings seem to have fallen out of Kanjidic2. % grep 'm_lang="de"' kanjidic2.xml | wc -l 0 Did I miss an e-mail about...
Dear all, I don't know whether this index would be useful for KanjiDIC. Lawrence Howell and Hikaru Morimoto set up a database with etymological data of 1940 ...
Hi all, I am new here, and am really interested in the JMDICT project! I am currently interested in converting the XML file to a database, as many of you as I...
Here is a possible schema for the jmdict project which I offer for discussion. The attached zip file contains: README.txt -- This file. schema.png -- Schema...
Hi all, I've made some changes to the edict management tool demo - I call it Benedict incidentally. This doesn't include anything from Stuart's schema yet...
Well, with a little bit of playing around in Access I have determined that there are 1262 words* in Edict that - Have two or more different senses given - Are...
Jim. You have an entry in your wishlist: ENAMDICT gps coordinates for place names Stuart McGraw Dream on 8-)} If they were available, they could be added...
I think this counts as a "wish list" idea but ... In Edict (/JMdict) there are a lot of (exp) type entries and compound words where the headword is actually...
I was trying to work out what "<variant>" meant in the context of kanjidic2.xml, so I picked a random test point using the classic Nelson: [碓] is a...
Those of you who watch the daily diff files may have noticed a steady trickle of 外来語 entries being merged. I have been working through a long list...
I have put the "sens" (<misc>) tag into operation, and changed the markings on the PC entries Rene has flagged. They look OK as far as I can see. Adding such a...
It looks like this needs its own thread. I already mentioned the problem with "阿吽" (="Om"/"Aun") -- the iso639 code for Sanskrit is "sa", not "sanskr" ...
One thing that a lot of people don't know is that the school grades for kanji also include kanji readings. In the field r_status for ja_kun and ja_on readings...
As many on the list will know, I have been trying to move to a stage where the dictionary files can be edited online without everything going past me. I have...
In the process of parsing jmdict for loading into a database, I gathered some information on the use of "keywords" (xml entities and fixed strings used in...
Since all of the "misclass" q_codes in kanjidic2 are actually skip codes, I suggest making <q_code qc_type="misclass">PP2-3-14</q_code> into something like ...
At the moment there is stuff like this in kanjidic2.xml: <literal>込</literal> (that is "komu" from moushikomu in case the encodings go wrong). ...
... It certainly is a pointless argument, nobody involved has the slightest say in what words end up being spoken by the citizens of Japan. It also has...
Two changes to the kanjidic2.xml file today: - the hangul readings have been revised. Francis Bond and Kyonghee Paik checked them, and found some mapping...
Some of the hangul in kanjidic2.xml are repeated twice for the same character. There seem to be two romanizations in "korean_r" but the same hangul. I thought...