edict-jmdict · The JMdict/EDICT Group

Group Information

  • Members: 139
  • Category: Other
  • Founded: Jul 18, 2006
  • Language: English

Messages

Messages 1 - 30 of 4980
#1 From: Jim Breen <Jim.Breen@...>
Date: Tue Jul 18, 2006 6:26 am
Subject: Testing the mailing list
breen_jim
 
Greetings,

This is just a test message.

Jim

--
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)               
ジム・ブリーン@モナシュ大学

#2 From: Jim Breen <Jim.Breen@...>
Date: Tue Jul 25, 2006 8:25 am
Subject: Test for Japanese content.
breen_jim
 
こんばんは,

テストだけです。

ジム


#3 From: Jim Breen <Jim.Breen@...>
Date: Tue Jul 25, 2006 8:37 am
Subject: Another test
breen_jim
 
This time I'm seeing if I can turn off the HTML, etc.

ジム



#4 From: "from_csted" <jim@...>
Date: Tue Jul 25, 2006 11:07 am
Subject: Hi Everyone
from_csted
 
My wife and I are slowly getting our lives back together, which is
really not so bad despite the accident.  Rolomail is almost back up,
and I'll be back into development mode soon and processing new EDICT
versions into Ice Mocha.

I did lose my laptop in the crash, and my iMac lost its CD/DVD drive.
I also discovered that the recently released Civilization IV requires
a G5 minimum... and the iMac is only a G4.  One-year-old machines are
apparently no good anymore.

I haven't been active on the bboards for some time, so I'm not aware
of the latest EDICT news.  Been too busy registering my company here
etc...  Not sure if my project to make kanji movies from SODs can
proceed, as the scripts for making the animations were on the
laptop... can't recall if I ever got a modern version of GD running on
the iMac or not.  Don't even know how many submissions are waiting for
me to edit.

Otherwise, can't wait to get into discussions here...

Hope all is well out there... that's my long-winded "hello".  Jim Rose

#5 From: "Paul Blay" <blay.paul@...>
Date: Tue Jul 25, 2006 2:52 pm
Subject: 良し 【よし】 (n) OK!; all right!
blay_paul
 
Discuss: Does
よし (int) (uk) OK!; all right! *
really have the kanji 良し ?

There is no indication of that in the 広辞苑 entry.
The 大辞林 entry
http://dictionary.goo.ne.jp/search.php?MT=%A4%E8%A4%B7&kind=jn&mode=0&base=1&row=8
does state
〔形容詞「よし」から〕
which you might presume is 良し, but it isn't actually stated.

-  Paul

P.S. At any rate I'll get to see if Japanese text is going
to make it through Yahoo intact.

* I've submitted an amend for the (int) and (uk).

#6 From: "nurbs100" <nurbs1@...>
Date: Wed Jul 26, 2006 12:51 am
Subject: Archives
nurbs100
 
I am not able to view archived messages (all 5 of them) within the
Yahoo Groups HP.

Jim, is there a setting you need to activate from within the
moderator's page to make this possible?

Or it could just be me (most likely).

I currently can only see Home and Post on the left hand side of the
page; normally there are all kinds of other options for things like
posting files, pictures, etc.

Cheers!
Todd

#7 From: Jim Breen <Jim.Breen@...>
Date: Wed Jul 26, 2006 12:52 am
Subject: Welcome everyone
breen_jim
 
Welcome everyone to the EDICT/JMdict mailing list. I hope it turns
out to be useful and active.

I hope Yahoo Groups works out OK for the mailing list. It can be slow
at times, and I know the Honyaku people are looking for a new home
for that reason. Please note that Yahoo's WWW interface is pretty
hopeless for sending Japanese emails - they all come out in SGML
entity codes (&#12345;, etc.) which don't work with non-browser email
readers. Please use regular email clients set to operate in ISO-2022-JP.
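
[Editor's sketch] The entity-code problem described above can be illustrated in a few lines of Python. This is only a minimal sketch using the standard library; the sample string is a made-up greeting written as numeric character references, and the helper names are hypothetical. It shows how such references map back to characters and what the same text looks like once encoded as ISO-2022-JP.

```python
import html


def decode_entities(text: str) -> str:
    """Turn numeric character references such as &#12345; back into characters."""
    return html.unescape(text)


def to_iso2022jp(text: str) -> bytes:
    """Encode text the way a mail client set to ISO-2022-JP would send it."""
    return text.encode("iso2022_jp")


# Hypothetical sample: a greeting as a web form might mangle it.
sample = "&#12371;&#12435;&#12400;&#12435;&#12399;"
decoded = decode_entities(sample)
encoded = to_iso2022jp(decoded)

print(decoded)   # readable Japanese text
print(encoded)   # 7-bit bytes wrapped in ISO-2022-JP escape sequences
```

A mail reader that does not interpret HTML shows the first form literally, which is why a client that sends ISO-2022-JP directly is the safer choice.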

A few bits of news:

As many will know, the EDICT and JMdict files (as well as the ENAMDICT
and KANJIDIC/KANJD212 ones) are in continuous release mode. Any changes
(by me at present) go live overnight, with updated copies going on the
Monash ftp site and the WWWJDIC server. WWWJDIC also shows any new
entries which have been submitted by users.

I have begun rewriting the basic documentation of the EDICT and JMdict
versions of the main dictionary. See:
http://www.csse.monash.edu.au/~jwb/edict_doc.html
I am treating it as the one dictionary (which is what it is) with
several different formats for distribution.

I have a few changes in mind which I'll discuss in later emails.

Cheers

Jim


#8 From: Jim Breen <Jim.Breen@...>
Date: Wed Jul 26, 2006 1:12 am
Subject: Re: 良し 【よし】 (n) OK!; all right!
breen_jim
 
[Paul Blay ([edict-jmdict] 良し 【よし】 (n) OK!; all right!) writes:]
>> Discuss: Does
>> よし (int) (uk) OK!; all right! *
>> really have the kanji 良し ?

I think so.

>> There is no indication of that in the 広辞苑 entry.
>> The 大辞林 entry
>> http://dictionary.goo.ne.jp/search.php?MT=%A4%E8%A4%B7&kind=jn&mode=0&base=1&row=8
>> does state
>> 〔形容詞「よし」から〕
>> which you might presume is 良し, but it isn't actually stated.

Well, 広辞苑 also has よ・し【良し・善し・好し】 and points at
よい.

That raises the question whether the 良し/よし and
善し/よし entries
are really the same thing.

>> P.S. At any rate I'll get to see if Japanese text is going
>> to make it through Yahoo intact.
>>
>> * I've submitted an amend for the (int) and (uk).

I've made those changes. Tks.

Jim


#9 From: Jim Breen <Jim.Breen@...>
Date: Wed Jul 26, 2006 1:03 am
Subject: Re: Archives
breen_jim
 
[nurbs100 ([edict-jmdict] Archives) writes:]
>> I am not able to view archived messages (all 5 of them) within the
>> Yahoo Groups HP.
>>
>> Jim, is there a setting you need to activate from within the
>> moderator's page to make this possible?
>>
>> Or it could just be me (most likely).
>>
>> I currently can only see Home and Post on the left hand side of the
>> page; normally there are all kinds of other options for things like
>> posting files, pictures, etc.

I think you have to "sign in" to see the archives. This means
registering an ID and password with Yahoo. It's pretty painless.

Jim


#10 From: "Alpha Ranger" <nurbs1@...>
Date: Wed Jul 26, 2006 4:07 am
Subject: [OT] Possibly useful only references
nurbs100
 
While these are not the most sophisticated dictionaries out there, they are fairly useful to me (a beginning-intermediate student of Japanese).
 
http://www.alc.co.jp/eng/kaiwa/hyogen/index.html
 
 
I found this through a suggestion to use the 擬音語 section.
 
Cheers!
Todd

#11 From: log@...
Date: Wed Jul 26, 2006 4:23 am
Subject: Re: [OT] Possibly useful only references
dame_zumari
 
Alpha Ranger <nurbs1@...> wrote:

>
> While these are not the most sophisticated dictionaries out there, they are
> fairly useful to me (a beginning-intermediate student of Japanese).
>
> http://www.alc.co.jp/eng/kaiwa/hyogen/index.html

Even better is the main page, since it also gives you the translated
words in context: <http://www.alc.co.jp/>

________________________________________________________________________
                    Louise Bremner (log at gol dot com)
    If you want a reply by e-mail, don't write to my Yahoo address!

#12 From: "Kim Ahlström" <kim.ahlstrom@...>
Date: Wed Jul 26, 2006 5:20 am
Subject: EDICT and JMdict not updating since july 22
kim.ahlstrom
 
Hi

As the subject says, the daily updates seem to have stopped a few days
ago. JMdict.gz, edict.gz and edicthdr.txt on the Monash FTP all show
July 22 as the date stamp and none of my daily diffs of edict
against the previous day's version have shown any changes since then.
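
[Editor's sketch] A daily comparison like the one described above could look roughly like this in Python. This is only an assumption about the approach, not Kim's actual script: the function names are hypothetical, the sample entries are made up in the usual one-entry-per-line EDICT style, and the loader assumes a gzipped EUC-JP file.

```python
import gzip


def load_entries(path, encoding="euc_jp"):
    """Read one entry per line from a gzipped EDICT-style file (assumed EUC-JP)."""
    with gzip.open(path, "rt", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


def diff_entries(old_lines, new_lines):
    """Return (added, removed) entry lines between two versions of the file."""
    old_set, new_set = set(old_lines), set(new_lines)
    added = [line for line in new_lines if line not in old_set]
    removed = [line for line in old_lines if line not in new_set]
    return added, removed


# Made-up sample data standing in for two consecutive daily snapshots.
yesterday = [
    "良し [よし] /(n) OK!/all right!/",
    "書く [かく] /(v5k) to write/",
]
today = [
    "良し [よし] /(int) (uk) OK!/all right!/",
    "書く [かく] /(v5k) to write/",
]

added, removed = diff_entries(yesterday, today)
print("added:", added)
print("removed:", removed)
```

If both lists come back empty several days running, the file has probably not been regenerated, which matches the symptom reported here.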

And since this is my first post to the list I'll just briefly
introduce myself (ok, the list is new, but I think I have posted a
total of two messages to SLJ the last few years, so I think an
introduction is appropriate). I'm Kim from Sweden. I study Japanese at
Stockholm Univ and open source project entrepreneurship at a
vocational school. My association with edict/jmdict is that I run
Jisho.org and am working on a Kanjidic2/JMdict search application for
Mac OS X which will most likely be open source.

Glad to be here
Kim

#13 From: "Paul Blay" <blay.paul@...>
Date: Wed Jul 26, 2006 5:55 am
Subject: Re: EDICT and JMdict not updating since july 22
blay_paul
 
Hi Kim,

I've found myself maintaining the (modified) Tanaka Corpus
used by Edict.

> As the subject says, the daily updates seem to have stopped a
> few days ago. JMdict.gz, edict.gz and edicthdr.txt on the
> Monash FTP all show july 22 as the date stamp and none of my
> daily diff'ings of edict against the previous day's version
> have shown any changes since then.

Lack of updates to examples.gz you can blame on me.  The last
update I sent in was on the 21st (I'm partway through a bit
of a backlog at the moment).

> My association with edict/jmdict is that I run
> Jisho.org and am working on a Kanjidic2/JMdict search
> application for Mac OS X which will most likely be open
> source.

Oooh, I like the dictionary links from example sentence words.

I do have a couple of suggestions though.
1. Could you consider having a feedback form for example
sentences (sent/copied to me) as they need all the fixing
they can get.
2. I assume you are using some sort of auto-parsing to get the
word links from the examples.  May I suggest that you would be
better off using the keywords line?

For example you have
[一日]1[個]のリンゴを食べれば[医者]はいらない。
(three words linked)
With the keywords that would be

[一日]1[個][の][リンゴ][を][食べれば][医者]はいらない。
(7 words and particles linked)

Obviously the indexing of records is not yet complete.
There remain over 38,000 records that have some text
not indexed that should be indexed (the above record
being one such).
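
[Editor's sketch] The keyword-based linking shown above can be sketched in Python as follows. This is a minimal illustration, not the Jisho.org or WWWJDIC implementation: the function name is hypothetical, and the keyword list is typed in by hand rather than parsed from the real keywords line, whose exact field syntax is not reproduced here.

```python
def link_keywords(sentence, keywords):
    """Wrap each keyword in [..], scanning the sentence left to right."""
    pieces = []
    pos = 0
    for word in keywords:
        idx = sentence.find(word, pos)
        if idx == -1:
            continue  # keyword not found after the cursor; leave it unlinked
        pieces.append(sentence[pos:idx])   # unlinked text before the keyword
        pieces.append(f"[{word}]")
        pos = idx + len(word)
    pieces.append(sentence[pos:])          # trailing unlinked text
    return "".join(pieces)


sentence = "一日1個のリンゴを食べれば医者はいらない。"
# Hand-written keyword list for illustration only.
keywords = ["一日", "個", "の", "リンゴ", "を", "食べれば", "医者"]

linked = link_keywords(sentence, keywords)
print(linked)
```

Text that no keyword covers (here the final はいらない) simply stays unlinked, which corresponds to the "not yet indexed" portions mentioned above.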

Best Wishes,

Paul

#14 From: Jim Breen <Jim.Breen@...>
Date: Wed Jul 26, 2006 6:00 am
Subject: Re: [OT] Possibly useful only references
breen_jim
 
[log@... (Re: [edict-jmdict] [OT] Possibly useful only references) writes:]
>> Alpha Ranger <nurbs1@...> wrote:
>> > While these are not the most sophisticated dictionaries out there, they are
>> > fairly useful to me (a beginning-intermediate student of Japanese).
>> >
>> > http://www.alc.co.jp/eng/kaiwa/hyogen/index.html
>>
>> Even better is the main page, since it also gives you the translated
>> words in context: <http://www.alc.co.jp/>

Does ALC's 表現 dictionary use Eijirou or something else?

Maybe it's time to mention (again) my monster dictionary link collection at:

http://www.csse.monash.edu.au/~jwb/onlinejdic.html

Jim


#15 From: "Alpha Ranger" <nurbs1@...>
Date: Wed Jul 26, 2006 6:07 am
Subject: another intro -- Todd
nurbs100
 
I might as well introduce myself as well.
 
I think Kim's project is great BTW; and I am looking forward to seeing it grow and mature in the future.  My being 1/8 Swedish has nothing to do with that either.
 
I have been studying Japanese off and on (more off than on, so don't be too impressed) since 1984.  I have a B.A. in Japanese from Ohio State University and a B.S. in Materials Science & Engineering.  I will start working on my MBA at Franklin University (back in the United States) this fall.  I have worked as a systems engineer, manager and materials (mostly glass/ceramic, but now plastics) engineer in the States.
 
I am currently living in Japan and working (as an intern!!!) at an automotive injection molding company in Shikoku.  I have been VERY fortunate to see A LOT of Japan during my many visits.
 
Ohio was the home to the first Japanese automotive plant (Honda) in the United States.  Ohio ranks number two for having the most Japanese companies.  California is number one.  Ohio State University has between 55,000 and 60,000 students, so there is always some Japanese related activity going on there as well.
 
I like to make lists!  And I like to make sure the lists are appropriately detailed, correct and up to date.  Any questions as to why I am here?
 
I am also a decent programmer.  My main guns used to be C/C++, but I have been getting into PIC (et al.) programming as of late.  But my main interest is Python programming.  I love it!  And yes, I have used Perl and Ruby (I was the technical editor for O'Reilly's Ruby book in fact).  I love Python.  I think a lot of other Japanese dictionary folks do too (Jack Halpern comes to mind first).  It has an approachable learning curve, it does everything (especially nice with RegEx and other string manipulation chores), it is reasonably fast, it runs on any platform, it is easy to read, it is easy to maintain (these two items are really Perl's downfall IMO), and so on.
 
The sad part is that I have never professionally coded Python (yet!).
 
I am currently writing a Python ray tracer that incorporates photon mapping.  And yes, I know this is of NO interest to almost anyone on this list.
 
I don't like to be the lone gun on projects.  I am VERY productive on small, strong teams though.
 
I did hang out (mainly lurk) on the honyaku list for a couple of years, but that was several years ago.
 
And please don't hold it against Kim if he is a Mac guy.  Ha!  Hey, whatever tool it takes to get the job done!
 
Kim, what language are you coding in primarily at the moment?
 
Finally, in order to keep Jim happy (other than giving him lots of Vegemite sandwiches) [ed: thwack Todd over the head the next time he makes a wisecrack like that!] (Blame it on "Men at Work" for sticking that image in my brain!)[ed: I don't care what crazy 80s bands your stereotypes came from.  Knock it off!](er, well what about giving Jim a Foster's Lager or two?)[ed: Well, that might be acceptable if you give me one as well!], a few suggestions to the gmail users here that haven't done this already--
 
*  Click on Settings/設定 in the upper right hand corner of your browser window
*  Select 日本語 for your language.
*  Then click on the "Save Settings" button at the middle of the bottom of the page
*  The page will automatically refresh
*  Select:
送信メッセージのエンコード:
詳細
* Click on the "Save Settings" button again (I don't remember the Japanese verbiage off the top of my head).
 
And you should be set!  I am sure there are exceptions to this, so please feel free to post them here!
 
Cheers!
Todd
 

#16 From: Jim Breen <Jim.Breen@...>
Date: Wed Jul 26, 2006 5:54 am
Subject: Re: EDICT and JMdict not updating since july 22
breen_jim
 
[Kim Ahlström ([edict-jmdict] EDICT and JMdict not updating since july 22) writes:]
>> As the subject says, the daily updates seem to have stopped a few days
>> ago. JMdict.gz, edict.gz and edicthdr.txt on the Monash FTP all show
>> july 22 as the date stamp and none of my daily diff'ings of edict
>> against the previous day's version have shown any changes since then.

Very odd. I just logged in to the ftp server and ran the update scripts
"by hand", but I don't know why cron  didn't run them. I'll keep an eye
on it, and if they don't run tonight, I'll buzz the sysadmin.

>> And since this is my first post to the list I'll just briefly
>> introduce myself (ok, the list is new, but I think I have posted a
>> total of two messages to SLJ the last few years, so I think an
>> introduction is appropriate). I'm Kim from Sweden. I study Japanese at
>> Stockholm Univ and open source project entrepreneurship at a
>> vocational school. My association with edict/jmdict is that I run
>> Jisho.org and am working on a Kanjidic2/JMdict search application for
>> Mac OS X which will most likely be open source.

Welcome on board. (Kim and I have corresponded in the past.)

Jim


#17 From: Jim Breen <Jim.Breen@...>
Date: Wed Jul 26, 2006 6:27 am
Subject: Re: another intro -- Todd
breen_jim
 
Thanks for the intro, Todd.

On Wed, Jul 26, 2006 at 03:07:26PM +0900, Alpha Ranger wrote:
>
> Finally, in order to keep Jim happy (other than giving him lots of Vegemite
> sandwiches)

Actually I prefer Vegemite on toast; usually at breakfast.

> [ed: thwack Todd over the head the next time he makes a
> wisecrack like that!] (Blame it on "Men at Work" for sticking that image in
> my brain!)[ed: I don't care what crazy 80s bands your stereotypes came
> from.  Knock it off!](er, well what about giving Jim a Foster's Lager or
> two?)[ed: Well, that might be acceptable if you give me one as well!], a few
> suggestions to the gmail users here that haven't done this already--

Fosters? No, of Australian beers I much prefer Coopers (or Cascade).
[http://www.coopers.com.au/] Mostly I am a wine drinker.

Jim

#18 From: "Alpha Ranger" <nurbs1@...>
Date: Wed Jul 26, 2006 6:49 am
Subject: Project Proposals
nurbs100
 
I have several projects (of varying sizes) I have been kicking around.  I thought I would throw out a line and see if I can get any help (help comes in many forms!).
 
Some of these projects could have already been embarked upon and I just don't know they exist.  If that's the case, please let me know!
 
* Verb parser
 
I have never talked to Jim about this, but I would like to work on this just to do it.  And if I don't have anywhere to implement it, I will be glad to throw the logic/code out into the public domain.
 
This is inspired by Jim's relatively simple, but eminently useful (!), "Translate Words" function in WWWJDIC.  Since he has already "guessed" that a verb exists, the next step is to analyze the barrage of kana (and other characters) that follows, so that the translation of the verb includes the verb ending(s).  This means that if there are multiple modifiers, they will all be included in the final translation.  I am sure linguists can word this better, but the point is pretty straightforward I think.
 
Initially this will entail analyzing what could follow the root and then creating a simplistic logic chart of things to follow. It will progressively become more detailed after that.
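
[Editor's sketch] One way to picture the "logic chart of things that follow the root" is a small table of ending rules checked against a word list. The Python below is only a toy under stated assumptions: the rule table, labels and three-word lexicon are hypothetical, and nothing here reflects how WWWJDIC itself works.

```python
# Hypothetical rules: (inflected ending, dictionary-form ending, label).
RULES = [
    ("いて", "く", "te-form"),
    ("いた", "く", "past"),
    ("かない", "く", "negative"),
    ("んで", "む", "te-form"),
    ("んで", "ぬ", "te-form"),
    ("んで", "ぶ", "te-form"),
    ("んだ", "む", "past"),
    ("んだ", "ぬ", "past"),
    ("んだ", "ぶ", "past"),
]

# Tiny stand-in lexicon; a real tool would consult the dictionary file.
LEXICON = {"書く": "to write", "死ぬ": "to die", "読む": "to read"}


def deinflect(word):
    """Return (dictionary form, label, gloss) for every rule that yields a known verb."""
    results = []
    for ending, base, label in RULES:
        if word.endswith(ending):
            candidate = word[: -len(ending)] + base
            if candidate in LEXICON:
                results.append((candidate, label, LEXICON[candidate]))
    return results


print(deinflect("書いて"))
print(deinflect("死んで"))
print(deinflect("読んだ"))
```

Checking each candidate against the lexicon is what resolves endings such as んで, which on their own could come from several verb classes.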
 
I am sure this work exists already in a scholarly context, but it will either need to be re-interpreted or re-thought from the ground up.
 
----------------------------------------------------------------------------------------------------------
* 当用漢字・常用漢字の歴史 (History of the Tōyō Kanji and Jōyō Kanji)
 
Again, I have NOT mentioned this to Jim.
 
The current contents of the 常用漢字 related entries in jmdic could be expanded to include the dates when items were added and the information from the 当用漢字.  For most people, this information would not be very useful.  But it might be extremely useful to a few researchers (people that like that kind of thing).
 
Initially, I would just like to compile a list that would be easy to manipulate for whatever purpose.
 
----------------------------------------------------------------------------------------------------------
* Photo database
 
I debated whether to mention this one or not, but I figure it is better to have someone to spur me along.
 
Essentially this would be a database of pictures which would be linked to the WWWJDIC servers (initially).  Jim has given a green light on this one.
 
Things that need to be done:
 
- determine picture format (I am currently leaning toward medium resolution .png's with a thumbnail and the full res image also stored but not accessible (initially).)
- collect pictures (mainly for words related to Japan initially)
- determine naming method (something like a 5-digit, zero padded serial number followed by a brief descriptor (typically the main word that points to the picture))
- a database of picture information (including the file names, words pointing to this entry, date added, source, etc.)
- find a host (no bandwidth limits, stable, free, lots of storage space, relatively fast, going to be around in 5 years, etc.)
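
[Editor's sketch] The naming scheme and picture record listed above might be sketched like this in Python. Everything here is an assumption made for illustration: the function and class names, the underscore separator, the lower-case slug and the field names are all hypothetical, and the example record is invented.

```python
import re
from dataclasses import dataclass, field


def picture_filename(serial: int, descriptor: str, ext: str = "png") -> str:
    """Build a name like 00042_torii-gate.png from a serial number and descriptor."""
    if not 0 <= serial <= 99999:
        raise ValueError("serial must fit in five digits")
    slug = re.sub(r"[^a-z0-9]+", "-", descriptor.lower()).strip("-")
    return f"{serial:05d}_{slug}.{ext}"


@dataclass
class PictureRecord:
    """One row of the picture information database (hypothetical fields)."""
    serial: int
    descriptor: str
    words: list = field(default_factory=list)  # dictionary words pointing here
    date_added: str = ""
    source: str = ""

    @property
    def filename(self) -> str:
        return picture_filename(self.serial, self.descriptor)


record = PictureRecord(42, "Torii gate", words=["鳥居"], date_added="2006-07-26")
print(record.filename)
```

Keeping the serial number as the stable key means a descriptor can be corrected later without breaking references from dictionary entries.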
 
That is it initially.
 
----------------------------------------------------------------------------------------------------------
* Radical database
 
This is a small project.  Basically it would be a relatively small database of radical related information.  I have the framework written down.
 
This would be nice for electronic dictionary creators to link to if it does not exist already.
 
----------------------------------------------------------------------------------------------------------
 
That's it for now.
 
--Todd

#19 From: "Paul Blay" <blay.paul@...>
Date: Wed Jul 26, 2006 7:07 am
Subject: Re: Project Proposals
blay_paul
 
Dear Todd,

> * Verb parser
>
> I have never talked to Jim about this, but I would like to
> work on this just to do it.  And if I don't have anywhere
> to implement it, I will be glad to throw the logic/code out
> into the public domain.
>
> This is inspired by Jim's relatively simple, but eminently
> useful (!), "Translate Words" function in WWWJDIC.  Since he
> has already "guessed" that a verb exists, the next step is
> to analyze the barrage of kana (and others) that follows,
> to provide the verb translated to include the
> verb ending(s).  So this means that if there exist multiple
> modifiers, they will all be included in the final
> translation.

So what would you actually have for something like 書いてある
in
やさしい英語で書いてあるので、この本は初心者に適している。
?

> Initially this will entail analyzing what could follow the
> root and then creating a simplistic logic chart of things to
> follow. It will progressively become more detailed after that.

One question is how 'far out' you want to go.  For example
the [V] link in WWWJDIC
http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1W%BD%F1%A4%AF_v5k
covers quite a few forms and combinations but certainly far
from all.

How are you going to deal with auxiliary verbs, casual
contractions and inserted particles?

e.g. 開いておく (置く aux-v)
死んじゃ (contraction of 死んでは)
行ってはいない (inserted は)

An extension to the [V] link that allows people to paste in
a conjugated verb for analysis would be nice.

P.S. If Jim's reading this - you get rather screwy results if you do a
word search on 書く.

Best Wishes,

Paul

#20 From: "Alpha Ranger" <nurbs1@...>
Date: Wed Jul 26, 2006 7:15 am
Subject: Project Proposal Revision
nurbs100
 
Oops!  I forgot about radkfile/kradfile!
 
There is a little historical information that could be added for this file.  And some information to speed parsing, such as signifying 常用漢字 directly in the file.
 
As Jim states in the file header, he expects people to modify this file to suit their needs.
 
Let me know if you are interested in exploring this further.
 
--Todd

#21 From: "Alpha Ranger" <nurbs1@...>
Date: Wed Jul 26, 2006 7:43 am
Subject: Re: Project Proposals
nurbs100
 
>> * Verb parser
>>
> So what would you actually have for something like 書いてある
> in
> やさしい英語で書いてあるので、この本は初心者に適している。
> ?
 
These are EXACTLY the types of things that would need to be addressed.  Initially, I would just like to work out the actual parsing so you know what you have got first (including contractions, aux verbs and inserted particles as you mention below).  Translating it is the tricky part (although I think we can get away with "general" verbiage initially that will be immediately understandable to most using the tool--if they don't understand it, it probably isn't something they should be using--this is kind of the same argument as using all kana (as opposed to ローマ字) in dictionaries).  For most cases, following a simplistic/naive approach (as is currently done) might be best (especially at first).  There might be several general forms and then those forms could be overridden for certain verbs/forms that warrant interest.  The "exception" list could grow with time if desired.

>> Initially this will entail analyzing what could follow the
>> root and then creating a simplistic logic chart of things to
>> follow. It will progressively become more detailed after that.
>
> One question is how 'far out' you want to go.  For example
> the [V] link in WWWJDIC
> http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1W%BD%F1%A4%AF_v5k
> covers quite a few forms and combinations but certainly far
> from all.
 
If it is branched properly, most of the parsing could be recycled; and if the code is written well, this could be very elegant.  And things generally are fairly standard as you get away from the root (although everyone can think of numerous exceptions).

> How are you going to deal with auxiliary verbs, casual
> contractions and inserted particles?
>
> e.g. 開いておく (置く aux-v)
> 死んじゃ (contraction of 死んでは)
> 行ってはいない (inserted は)
 
Ideally?  Yes.  Most of these (at least in my mind) are pretty systematic.  And better yet, they don't typically overlap with other meanings.  In other words, there is only one way to parse the endings--something won't pop up that you won't be able to determine whether it is A or B generally.  IOW, 死んじゃ can only be interpreted as 死んでは once the root has been determined, right?  And this can then be generalized for most (all?) V-で+は contractions, right?
 
The biggest issue is selecting the root correctly.  After that, there are only so many paths that can be taken.  The next is to figure out the word based on the first few kana.  You then know the root pronunciation and root meaning and the verb type is known.  Etc....
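
[Editor's sketch] The contraction and inserted-particle cases discussed in this message can be pictured with two small helpers. This Python is only a sketch under simplifying assumptions: the contraction table holds just two hypothetical rules, the function names are invented, and real text will contain cases (as Paul hints) that such naive string matching gets wrong.

```python
# Hypothetical contraction table: casual ending -> full form.
CONTRACTIONS = [("じゃ", "では"), ("ちゃ", "ては")]


def expand_contraction(word):
    """Rewrite a trailing casual contraction, e.g. 死んじゃ -> 死んでは."""
    for short, full in CONTRACTIONS:
        if word.endswith(short):
            return word[: -len(short)] + full
    return word


def split_inserted_wa(word):
    """Split 'te-form + は + rest' into its three parts when は follows て/で."""
    for marker in ("ては", "では"):
        idx = word.find(marker)
        if idx > 0:
            return word[: idx + 1], "は", word[idx + 2:]
    return word, "", ""


print(expand_contraction("死んじゃ"))
print(split_inserted_wa("行ってはいない"))
print(split_inserted_wa(expand_contraction("死んじゃ")))
```

Normalising the contraction first means the later steps only ever see the full て/で form, so one set of ending rules can serve both the casual and the standard spelling.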

> An extension to the [V] link that allows people to paste in
> a conjugated verb for analysis would be nice.
 
Agreed.

> P.S. If Jim's reading this - you get rather screwy results if you do a
> word search on 書く.
 
Whoa!  That is weird!!

--Todd


#22 From: "Paul Blay" <blay.paul@...>
Date: Wed Jul 26, 2006 7:58 am
Subject: Re: Project Proposals
blay_paul
 
Hi Todd,

> IOW, 死んじゃ can only be interpreted as 死んでは once the root
> has been determined, right?  And this can then be generalized
> for most (all?) V-で+は contractions, right?

I suspect you're going to come across various "interesting"
complications along the way but I'd be interested in how things
go.

-  Paul

#23 From: log@...
Date: Wed Jul 26, 2006 8:02 am
Subject: Re: [OT] Possibly useful only references
dame_zumari
 
Jim Breen <Jim.Breen@...> wrote:

> [log@...(Re:[edict-jmdict] [OT] Possibly useful only references) writes:]
> >> Alpha Ranger <nurbs1@...> wrote:
> >> > While these are not the most sophisticated dictionaries out there,
> >> > they are fairly useful to me (a beginning-intermediate student of
> >> > Japanese).
> >> >
> >> > http://www.alc.co.jp/eng/kaiwa/hyogen/index.html
> >>
> >> Even better is the main page, since it also gives you the translated
> >> words in context: <http://www.alc.co.jp/>
>
> Does ALC's 表現 dictionary use Eijirou or something else?

No idea--sorry
>
> Maybe it's time to mention (again) my monster dictionary link collection at:
>
> http://www.csse.monash.edu.au/~jwb/onlinejdic.html

There are just so many links there, it's difficult to sort through them
all, I'm afraid.


#24 From: "Kim Ahlström" <kim.ahlstrom@...>
Date: Wed Jul 26, 2006 9:19 am
Subject: Re: Re: EDICT and JMdict not updating since july 22
kim.ahlstrom
 
On 7/26/06, Paul Blay <blay.paul@...> wrote:

Hi Paul

>  Lack of updates to examples.gz you can blame on me. The last
>  update I sent in was on the 21st (I'm partway through a bit
>  of a backlog at the moment).

I have to confess that I haven't updated the examples database used on
jisho.org since April. But the format is fast enough to do a daily
import of, so I'll add that in the next update. I'll also add you to
the credits on the about-page.

>  I do have a couple of suggestions though.
>  1. Could you consider having a feedback form for example
>  sentences (sent/copied to me) as they need all the fixing
>  they can get.

Sure. How would you prefer to have that implemented? A link to the
suggestion form in WWWJDIC (I presume that the ID numbers each
sentence has are the sequence they appear in the file) or something
custom? Come to think of it I should probably link to the word
correction page in WWWJDIC as well.

>  2. I assume you are using some sort of auto-parsing to get the
>  word links from the examples. May I suggest that you would be
>  better off using the keywords line?

I wrote the importer script over a year ago so I had to go back and
check, and I actually use the keywords line. But I'm not taking the {}
field into account ... but that was a quick fix. So the next time I
update the server (about once a month) it should link everything.
Thanks for the heads-up!

All the best
Kim

#25 From: "Paul Blay" <blay.paul@...>
Date: Wed Jul 26, 2006 9:41 am
Subject: Re: Re: EDICT and JMdict not updating since july 22
blay_paul
 
Hi Kim,

> I have to confess that I haven't updated the examples database
> used on jisho.org since April. But the format is fast enough
> to do a daily import of, so I'll add that in the next update.

Well a daily import is probably a bit optimistic but a weekly
one would be sensible.  Do you do a check of the date stamp
to see if it's changed or not before downloading?
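A minimal sketch of such a date-stamp check, assuming the server reports the file's modification time in an HTTP-style Last-Modified header. The function name and the sample dates are hypothetical; nothing here is taken from the jisho.org importer.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def needs_download(last_modified_header: Optional[str],
                   last_import: Optional[datetime]) -> bool:
    """Return True when the remote file looks newer than our last import."""
    if last_import is None or not last_modified_header:
        # Nothing imported yet, or no date stamp available: fetch to be safe.
        return True
    remote_time = parsedate_to_datetime(last_modified_header)
    if remote_time.tzinfo is None:
        # Treat a date stamp without zone information as UTC (an assumption).
        remote_time = remote_time.replace(tzinfo=timezone.utc)
    return remote_time > last_import


# Hypothetical values: last import on the 21st, file changed on the 22nd.
last_import = datetime(2006, 7, 21, 0, 0, tzinfo=timezone.utc)
changed = needs_download("Sat, 22 Jul 2006 06:00:00 GMT", last_import)
unchanged = needs_download("Thu, 20 Jul 2006 06:00:00 GMT", last_import)
```

With a check like this, a scheduled job could run daily and still only download when the file has actually changed.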

> I'll also add you to the credits on the about-page.

I wouldn't bother, although I'm not going to stop you.

> Sure. How would you prefer to have that implemented? A link
> to the suggestion form in WWWJDIC (I presume that the ID
> numbers each sentence has are the sequence they appear in the
> file) or something custom?

There's nothing special about the WWWJDIC form - and I don't
know how Jim handles IDs (whether they are preserved or not).
As long as it has a) the sentence(s) to be commented on and
b) space to type corrections / comments, then it's fine.

> Come to think of it I should probably link to the word
> correction page in WWWJDIC as well.
>
> >  2. I assume you are using some sort of auto-parsing to get
> >  the word links from the examples. May I suggest that you
> >  would be better off using the keywords line?
>
> I wrote the importer script over a year ago so I had to go
> back and check, and I actually use the keywords line. But I'm
> not taking the {} field into account ... but that was a quick
> fix. So the next time I update the server (about once a month)
> it should link everything.

Incidentally the keywords in the B line are in the order they
actually appear in the example.  Somebody really enthusiastic
could use that to prevent the wrong characters being
highlighted when there are ambiguous matches.

For example in the following the は is after さん so it
shouldn't get mixed up with the first half of はい.

「はい、ありません」とジョーダンさんは答えた。
"No, I don't," said Mr Jordan.
はい 有る{ありません} と[2] さん は 答える{答えた}
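The ordering trick can be sketched roughly as follows. This assumes each keyword token is a headword optionally followed by a (reading), a [sense number] and a {surface form}; only the [n] and {} fields appear in the example above, so the rest of the token syntax, and all function names, are assumptions. Each keyword is searched for only after the end of the previous match.

```python
import re

# One keyword token: headword, then optional (reading), [sense number],
# and {surface form as it appears in the sentence}.
TOKEN_RE = re.compile(r"^([^(\[{]+)(?:\([^)]*\))?(?:\[\d+\])?(?:\{([^}]*)\})?")


def surface_forms(keyword_line: str) -> list[str]:
    """Return the text to look for in the sentence for each keyword."""
    forms = []
    for token in keyword_line.split():
        match = TOKEN_RE.match(token)
        if match:
            # Prefer the {} form; fall back to the headword itself.
            forms.append(match.group(2) or match.group(1))
    return forms


def locate_in_order(sentence: str, keyword_line: str) -> list[tuple[str, int]]:
    """Find each keyword at or after the end of the previous match."""
    spans = []
    cursor = 0
    for form in surface_forms(keyword_line):
        position = sentence.find(form, cursor)
        if position == -1:
            continue  # not found past the cursor; leave it unlinked
        spans.append((form, position))
        cursor = position + len(form)
    return spans


sentence = "「はい、ありません」とジョーダンさんは答えた。"
keywords = "はい 有る{ありません} と[2] さん は 答える{答えた}"
spans = locate_in_order(sentence, keywords)
```

Here the particle は is located at index 18, after さん, whereas a plain search from the start of the sentence would hit the は of はい at index 1.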

Best,

Paul

#26 From: "Kim Ahlström" <kim.ahlstrom@...>
Date: Wed Jul 26, 2006 9:48 am
Subject: Re: another intro -- Todd
kim.ahlstrom
 
2006/7/26, Alpha Ranger <nurbs1@...>:

Greetings

> I think Kim's project is great BTW; and I am looking forward to seeing it
> grow and mature in the future.  My being 1/8 Swedish has nothing to do with
> that either.

Thank you. I'm looking forward to that too. I have a long list of
things to add, but currently it's fighting for my attention with
studying, making money and outdoor activities. But I try to update the
server software about once a month.

> And please don't hold it against Kim if he is a Mac guy.  Ha!  Hey,
> whatever tool it takes to get the job done!

A Mac is the finest tool there is to get the job done! In fact I'll be
attending Apple's World Wide Developers Conference in San Francisco in
two weeks as a student developer.

> Kim, what language are you coding in primarily at the moment?

Jisho.org is a 100% Perl backend using a nice web framework called
Catalyst <http://www.catalystframework.org/> with MySQL as the
database. The Mac app is being built with C/Objective-C.

I have fiddled with other languages as well. Java and Python when I
did a year of computational linguistics at Uppsala Univ, and some
poking around with PHP and Ruby - though not to the degree of editing
books on them.

> suggestions to the gmail users here that haven't done this already--

Oh differing encodings how I love thee. But I followed your
instructions so I hope Japanese will pass through here unharmed.

All the best
Kim

#27 From: Jim Breen <Jim.Breen@...>
Date: Wed Jul 26, 2006 10:51 am
Subject: Re: Project Proposals
breen_jim
 
[Paul Blay (Re: [edict-jmdict] Project Proposals) writes:]
>>
>> P.S. If Jim's reading this - you get rather screwy results if you do a
>> word search on 書く.

Hmmm. Yes, screwy.  ....... OK, got it. I made a mod a week ago
to pull the (P) entries to the front in a kana-only lookup, but
accidentally screwed up the suppression of those "xref" entries
for non-kana lookups. Fixed now (I hope.)
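As a rough illustration of pulling the (P) entries to the front, here is a sketch that assumes the matches are EDICT-style text lines in which a (P) marker flags the entry. The sample lines are made up and simplified, and the function name is hypothetical; this is not the WWWJDIC code.

```python
def p_entries_first(entries: list[str]) -> list[str]:
    """Move entries carrying the (P) marker ahead of the rest."""
    # sorted() is stable, so entries keep their relative order within
    # the marked group and within the unmarked group.
    return sorted(entries, key=lambda entry: "(P)" not in entry)


# Made-up, simplified lines for a kana-only lookup of かく.
matches = [
    "欠く [かく] /(v5k) to lack/",
    "書く [かく] /(v5k) to write/(P)/",
    "掻く [かく] /(v5k) to scratch/",
]
reordered = p_entries_first(matches)
```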

Jim

--
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)               
ジム・ブリーン@モナシュ大学

#28 From: Jim Breen <Jim.Breen@...>
Date: Wed Jul 26, 2006 10:54 am
Subject: Re: [OT] Possibly useful only references
breen_jim
 
[log@... (Re: [edict-jmdict] [OT] Possibly useful only references) writes:]
>> Jim Breen <Jim.Breen@...> wrote:
>>
>> > Maybe it's time to mention (again) my monster dictionary link collection at:
>> >
>> > http://www.csse.monash.edu.au/~jwb/onlinejdic.html
>>
>> There are just so many links there, it's difficult to sort through them
>> all, I'm afraid.

You mean all my careful categorization and short pithy descriptions
were a total flop?   8-<}

Jim

--
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)               
ジム・ブリーン@モナシュ大学

#29 From: Jim Breen <Jim.Breen@...>
Date: Wed Jul 26, 2006 11:00 am
Subject: Re: Re: EDICT and JMdict not updating since july 22
breen_jim
 
[Paul Blay (Re: Re: [edict-jmdict] EDICT and JMdict not updating since july 22) writes:]
Kim>> > Sure. How would you prefer to have that implemented? A link
Kim>> > to the suggestion form in WWWJDIC (I presume that the ID
Kim>> > numbers each sentence has are the sequence they appear in the
Kim>> > file) or something custom?
>>
>> There's nothing special about the WWWJDIC form - and I don't
>> know how Jim handles ID's (whether they are preserved or not).
>> As long as it has a) The sentence(s) to be commented on and
>> b) Space to type corrections / comments then it's fine.

I don't use IDs. WWWJDIC runs off the raw text file, plus an index
containing byte offsets to the start of each sentence. The server
opens the big file in memory-mapped mode and seeks to the required
sentence. Thus the [Ex] links in the WWWJDIC pages point to the
starting byte of the sentence.
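A small sketch of that arrangement, assuming one sentence per line and UTF-8 text for simplicity. The real file layout, encoding and index format are not described here, so the function names and sample data below are hypothetical.

```python
import mmap
import os
import tempfile


def build_offset_index(path: str) -> list[int]:
    """Record the byte offset at which each line of the file starts."""
    offsets = []
    position = 0
    with open(path, "rb") as handle:
        for line in handle:
            offsets.append(position)
            position += len(line)
    return offsets


def read_line_at(path: str, offset: int) -> str:
    """Memory-map the file, seek to a byte offset, and read one line."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            mapped.seek(offset)
            return mapped.readline().decode("utf-8").rstrip("\n")


# Build a tiny sample file, index it, then fetch the second line by offset.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False, newline="\n") as tmp:
    tmp.write("こんばんは\nテストだけです。\n")
    sample_path = tmp.name

index = build_offset_index(sample_path)
second = read_line_at(sample_path, index[1])
os.remove(sample_path)
```

A link built this way only needs to carry the byte offset, but every offset changes whenever the file is edited, which is one reason a feedback form would quote the sentence itself.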

The feedback form relies on quoting the sentence. Possibly best to
replicate that. Currently the feedback is emailed to Paul and me.

HTH

Jim

--
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)               
ジム・ブリーン@モナシュ大学

#30 From: Jim Breen <Jim.Breen@...>
Date: Wed Jul 26, 2006 10:48 am
Subject: Re: Project Proposals
breen_jim
 
[Alpha Ranger ([edict-jmdict] Project Proposals) writes:]
>> * Verb parser
>>
>> I have never talked to Jim about this, but I would like to work on this just
>> to do it.  And if I don't have anywhere to implement it, I will be glad to
>> throw the logic/code out into the public domain.
>>
>> This is inspired by Jim's relatively simple, but eminently useful (!),
>> "Translate Words" function in WWWJDIC.  Since he has already "guessed" that
>> a verb exists, the next step is to analyze the barrage of kana (and other
>> characters) that follows, so that the translation of the verb includes the
>> verb ending(s).  This means that if there are multiple modifiers, they
>> will all be included in the final translation.  I am sure linguists can work
>> this out better, but the point is pretty straightforward, I think.
>>
>> Initially this will entail analyzing what could follow the root and then
>> creating a simplistic logic chart of things to follow. It will progressively
>> become more detailed after that.
>>
>> I am sure this work exists already in a scholarly context, but it will either
>> need to be re-interpreted or re-thought from the ground up.

The bare bones of the verb de-inflector in WWWJDIC are in the GPLed
xjdic code. It's pretty simplistic, and driven by a simple list of
inflections and root forms, e.g.

...
れる    れる    24
れます  れる    1
れました        れる    11
れませんでし    れる    13
れません        れる    12
れましょう      れる    18
れない  れる    0
れず    れる    27
れなけ  れる    29
....

The trailing number flags the label of the inflection.
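A minimal sketch of how a list like this could drive de-inflection. The rows are the ones shown above; the matching strategy (look for the inflected form anywhere after the first character and swap in the root form) is an assumption suggested by truncated rows such as れなけ, not the actual xjdic logic. The label numbers are passed through uninterpreted.

```python
# (inflected form, root form, label number), copied from the excerpt above.
RULES = [
    ("れる", "れる", 24),
    ("れます", "れる", 1),
    ("れました", "れる", 11),
    ("れませんでし", "れる", 13),
    ("れません", "れる", 12),
    ("れましょう", "れる", 18),
    ("れない", "れる", 0),
    ("れず", "れる", 27),
    ("れなけ", "れる", 29),
]


def deinflect(text: str) -> list[tuple[str, int]]:
    """Return (candidate dictionary form, label number) pairs for text."""
    candidates = []
    for inflected, root, label in RULES:
        # Start at index 1 so at least one stem character precedes the match.
        start = text.find(inflected, 1)
        if start != -1:
            candidates.append((text[:start] + root, label))
    return candidates


past_polite = deinflect("食べられました")
conditional = deinflect("食べられなければ")
```

Each candidate would still have to be checked against the dictionary, since a rule can match text that is not really an inflection.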

>> * 当用漢字・常用漢字の歴史
>>
>> Again, I have NOT mentioned this to Jim.
>>
>> The current contents of the 常用漢字 related entries in jmdic could be expanded
>> to include the dates when items were added and the information from the
>> 当用漢字.  For most people, this information would not be very useful.  But it
>> might be extremely useful to a few researchers (people that like that kind of
>> thing).

"jmdic". Or do you mean kanjidic?  The data on the extra kanji when the
当用漢字 became the 常用漢字 is all in Ken Lunde's book.

>> * Photo database
>>
>> I debated whether to mention this one or not, but I figure it is better to
>> have someone to spur me along.
>>
>> Essentially this would be a database of pictures which would be linked to
>> the WWWJDIC servers (initially).  Jim has given a green light on this one.

Yes, as I said to Todd, I could link to such a set of images if there were
a TOC I could extract. The way I link to jeKai entries is an example.

>> * Radical database
>>
>> This is a small project.  Basically it would be a relatively small database
>> of radical related information.  I have the framework written down.
>>
>> This would be nice for electronic dictionary creators to link to if it does
>> not exist already.

There is the file used by JDIC, xjdic, WWWJDIC, etc. See:
http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwraddisp.cgi
That file is in the xjdic tarball.

Then there is the kradfile/radkfile data.

Ganbatte

Jim

--
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)               
ジム・ブリーン@モナシュ大学

