PANLocalization · PAN Localization Support Network
#468 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Fri Nov 6, 2009 7:32 pm
Subject: Research Publication on IDNs released
 

Dear All,

 

I am pleased to let you know that we have released a new title through PAN Localization project:

               

                From Protocol to Production: Implementing the IDNs

 

The book is available at http://www.panl10n.net/english/outputChart.htm.  The published version is also available.

 

Best regards,
Sarmad



__________ Information from ESET NOD32 Antivirus, version of virus signature database 3811 (20090129) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com

#467 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Sat Oct 31, 2009 5:03 am
Subject: IDNs launched
 

ICANN launches Internationalized Domains at its Seoul meeting.  See www.ICANN.org for details.  IDN Fast Track process starts on 16th Nov.

 

Regards,
Sarmad

 




#466 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Fri Oct 30, 2009 10:33 am
Subject: Nastalique and other complex Asian fonts on the mobile platform
 

Dear All,

 

CRULP (www.crulp.org) announces the successful deployment of the open source Pango rendering engine on the Symbian mobile development platform. This engine allows rendering of complex Asian writing systems through OpenType fonts. Please visit http://www.crulp.org/research/Project-Details/ALSMP.htm for some details.  The current work has been on Arabic script.  The picture shows the Nafees Nastalique OpenType font rendered on a Nokia E51.  The project aims to continue by investigating the deployment of Pango on Google's Android mobile platform and by developing training material to enable the same for other scripts.

 

The work has been completed through PAN Localization project (www.panl10n.net), with support of IDRC, Canada. 

 

Best regards,

 

 

-----------------

Sarmad Hussain

Center for Research in Urdu Language Processing (www.crulp.org)

National University of Computer and Emerging Sciences (www.nu.edu.pk)

B Block, Faisal Town

Lahore, PAKISTAN

 

Ph: +92 42-111 128 128 (ext. 241, 315)
Fax: +92 42-516 5232

 




#465 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Mon Oct 5, 2009 12:01 pm
Subject: FW: Workshop at C-DAC,kolkata,IWSLPR-09,25th to 27th November,09
 
FYI

-----Original Message-----
From: Rajib Roy [mailto:rajibroy@...]
Sent: Monday, October 05, 2009 12:03 PM
To: O-COCOSDA2009

Importance: High

Respected Sir/Madam,
We are glad to inform you that C-DAC, Kolkata is organizing an
International Workshop on Spoken Language Prosody (IWSLPR-09) at C-DAC,
Kolkata, India, during 25-27 November 2009.  The workshop was approved as an
ISCA supported event.  We request you to participate or to depute
a researcher working with you or your organization to the above workshop.
The workshop brochure is enclosed for your kind reference.

If you have any queries, please contact us.

We look forward to seeing you at the workshop.

For further details, please visit our site:

http://www.cdackolkata.in/IWSLPR-09/INDEX/index.html



With Highest regards,

Rajib Roy,
C-DAC,Kolkata,India
Centre for Development of Advanced Computing, Kolkata Plot-E2/1, Block-GP,
Sector-V, Saltlake, Kolkata-700091 Ph. 91-33-23579846/5989 Ext. 215
Fax: 91-33-23575141
mob:+919830539485




Disclaimer:
http://www.cdackolkata.in/Disclaimer.txt


--
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.






#464 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Mon Sep 14, 2009 7:08 am
Subject: FW: Live Webcast: Nobel Laureates on communication and human development/Diffusion en direct sur le Web :deux lauréats d'un prix Nobel discutent de la communication et du développement humain
 

FYI. 

 

 


From: IDRC Events [mailto:events@...]
Sent: Friday, September 11, 2009 5:08 PM
To: Pauline Dole
Subject: Live Webcast: Nobel Laureates on communication and human development/Diffusion en direct sur le Web :deux lauréats d'un prix Nobel discutent de la communication et du développement humain




(French version follows)



  Communication and Human Development: The Freedom Connection?

LIVE WEBCAST 

 A public discussion with

Amartya Sen, Michael Spence, Yochai Benkler, Clotilde Fonseca

Co-sponsored by the International Development Research Centre and
the Berkman Center for Internet & Society at Harvard University
 

 

 

Nobel Laureates Amartya Sen and Michael Spence join Information and Communication Technology (ICT) experts Yochai Benkler and Clotilde Fonseca in a discussion of the role of communication and ICTs in human development, growth, and poverty reduction.

 


Wednesday, September 23, 7:00 pm
Ames Courtroom, Austin Hall, Harvard Law School (map)
Free and open to the public.
Live video and audio-only streams will also be available.

Optional RSVP  via
Facebook, Upcoming

 

Six years ago, the first IDRC-sponsored Harvard Forum, “A Dialogue on ICTs and Poverty Reduction,” brought together Nobel Laureates Amartya Sen and Michael Spence with 30 leading thinkers and practitioners from around the globe. Since then, many transformative changes have occurred in the worlds of Information and Communication Technologies (ICTs) and development.

Communication and knowledge offered by emerging technologies enable or enhance a wide range of benefits and opportunities for the poor, such as improved access to employment and public services. But the expansion of new technologies also presents risks, including the potential for increased political control, invasion of privacy, and vulnerability to cyber-crime.

During this public forum, Professors Sen and Spence will join leading ICT experts Yochai Benkler and Clotilde Fonseca in a discussion of the role of communication and ICTs in human development, growth, and poverty reduction. Panelists and the in-person and online audiences will debate a range of topics, and reflect on what has changed in recent years, been learned and not been learned, and needs to be done most urgently.

Join IDRC and the Berkman Center in the Ames Courtroom, Harvard Law School, or online via live video or audio webcast.

 

You will find more details about this event on the Berkman Center's webpage

§              Webcast: http://cyber.law.harvard.edu/interactive/webcast

Contribute your thoughts and questions for the panelists and join the broader conversation via:

§              Question Tool: http://cyber.law.harvard.edu/questions/idrc09

§              IRC: irc://irc.freenode.net/berkman
    (an IRC client, e.g., Chatzilla for Firefox, is required)

§              Twitter: #idrc09

 

About the panelists:

Amartya Sen is Lamont University Professor, and Professor of Economics and Philosophy, at Harvard University. He has served as President of the Econometric Society, the Indian Economic Association, the American Economic Association and the International Economic Association. In 1998, he was awarded the Nobel Memorial Prize in Economic Sciences.

Michael Spence is a senior fellow at the Hoover Institution and the Philip H. Knight Professor Emeritus of Management in the Graduate School of Business at Stanford University. He is the chairman of the independent Commission on Growth and Development, focusing on growth in developing countries. He was awarded the Nobel Memorial Prize in Economic Sciences in 2001.

Yochai Benkler is the Berkman Professor of Entrepreneurial Legal Studies at Harvard, and faculty co-director of the Berkman Center for Internet & Society. He writes about the Internet and the emergence of networked economy and society, as well as the organization of infrastructure, such as wireless communications.

Clotilde Fonseca is a Founding Director of the Costa Rican Program of Educational Informatics created in 1988 in Costa Rica by the Omar Dengo Foundation and the Ministry of Public Education, a program that has reached over one and half million children and teachers. She has served for two decades as Executive Director of the Omar Dengo Foundation.

 

 

For more information, call 613 696 2075.

--------------------------------------------------------------------------------------------------

 

The French version of this announcement repeats the details given above and adds one note: the session will be held in English.

 


* * * * * * * * * * * * * * * * * * * * * * * * * *
Having trouble with the link? Simply copy and paste the entire address listed below into your web browser: http://guest.cvent.com/i.aspx?1Q,P1,176229FE-D3ED-4CF1-99DC-C32641C05B36


Dgroups is a joint initiative of Bellanet, DFID, Hivos, ICA, ICCO, IICD, OneWorld, UNAIDS and World Bank
--- You are currently subscribed to ict4d-futures as: mng@...
To unsubscribe send a blank email to leave-ict4d-futures-21784W@...




#463 From: "Reinhard Schaler" <Reinhard.Schaler@...>
Date: Tue Sep 1, 2009 1:25 pm
Subject: FW: The Rosetta Foundation Promotes Equal Access to Information
 

Recent Press Release on The Rosetta Foundation also available on http://www.iwr.co.uk/information-world-review/news/2248552/rosetta-foundation-makes-vital.  For details on the Action Week for Global Information Sharing, AGIS ’09, 21-23 September in Limerick, visit www.agis09.org.

 

For further information contact Reinhard.Schaler@... (+353-87-6736414).

 


 

The Rosetta Foundation Promotes Equal Access to Information

                                                                                                  

Charitable organization charges to end global information poverty

The University of Limerick, the Centre for Next Generation Localisation (CNGL) and Welocalize come together to support the launch of the Rosetta Foundation, a not-for-profit organization with the aim to advance the rights of individuals to access life-critical information in their native languages.  With a planned European and North American launch this fall, The Rosetta Foundation invites all to participate in the inception, development and endorsement of a program to end global information poverty. 

The Rosetta Foundation is a spin-off of the Localisation Research Centre at the University of Limerick, Ireland, and the Centre for Next Generation Localisation (CNGL), a major research initiative supported by the Irish Government. The primary purpose of The Rosetta Foundation is to make vital information on basic healthcare available to individuals all over the world irrespective of their social status, linguistic or cultural background, and geographical location. The Rosetta Foundation intends to deploy a localisation technology platform for volunteer translators and not-for-profit organizations that can contribute to the translation and distribution of life-guarding information to communities in need around the world.

To this end The Rosetta Foundation is implementing an accessible and affordable open-source technology platform around GlobalSight and CrowdSight. Sponsored by Welocalize, GlobalSight is an open-source Translation Management System (TMS) that helps automate the critical tasks associated with the creation, translation, review, storage and management of global content. With zero license fees GlobalSight provides a flexible, affordable and sustainable solution for organizations to deliver multilingual content to their end-users worldwide.  CrowdSight is another open-source application fully integrated with GlobalSight used specifically for crowdsourcing or to engage the right “crowd,” group or community to deliver quick-turn translation for on-demand content.

Through GlobalSight The Rosetta Foundation has access to a robust platform for managing, translating and delivering global content and can support the translation efforts of non-profit and non-governmental organizations in providing information to communities in need―in their local language. At the same time, The Rosetta Foundation benefits from an existing GlobalSight community of 1,500 members to solicit volunteers dedicated to promoting equal access to information through language and cultural diversity.

Reinhard Schäler, founder of The Rosetta Institute, explains, “Our initiative to develop an open source translation and localisation platform with GlobalSight as a backbone will widen the narrow focus of current mainstream localisation and bring the digital world closer to the three quarters of the world’s population who currently do not have access to it.”

The European launch will take place at the AGIS ’09 conference in Limerick, Ireland on September 21-23, 2009.  AGIS, Action for Global Information Sharing, will provide an opportunity for volunteer translators, localization specialists and NGOs to come together to learn, network and celebrate their work.  To register and participate in this FREE event go to www.agis09.org.

The North American launch will take place at the Localization World conference in Santa Clara, California on October 20, 2009.  This pre-conference workshop will provide an overview of the organizational structure, the aims and objectives, and the strategic plan of The Rosetta Foundation. Participants will be introduced to the Foundation’s translation and localisation technology platform – GlobalSight. To register and participate in this FREE event go to www.localizationworld.com.

“The Rosetta Foundation is a commendable, wide-reaching initiative that is helping extend the benefits of the translation industry to the people that most need it,” comments Smith Yewell, CEO of Welocalize and board member of The Rosetta Foundation. “Individuals all over the world are deprived of critical information in their native language that could potentially save their lives.  We are honored to support this initiative through the deployment of our open-source GlobalSight TMS and new crowdsourcing tool CrowdSight.  We believe that in order to grow and meet global content demands, we must collaborate to innovate.  The Rosetta Foundation intends to do this in their effort to end global information poverty.”

About The Rosetta Foundation

The Rosetta Foundation aims to make information accessible to people independent of their social status, their linguistic and cultural background and their geographical location through the development and the deployment of an intelligent translation and localisation environment. Its work is supported by the translation and localisation community and funding agencies.

The translation and localisation platform development is based on an open source model making the platform available to the translation and localisation community. It is deployed and supported by The Rosetta Foundation for selected not-for-profit organisations and volunteer translators.

For more information on the launch event of The Rosetta Foundation, please visit www.agis09.org.


#462 From: Sarmad Hussain <sarmad.hussain@...>
Date: Thu Jul 2, 2009 2:28 am
Subject: Embedding Fonts in Web pages
 
I saw an interesting post today which indicates that font embedding is increasingly becoming possible.  See http://dev.w3.org/csswg/css3-fonts/#the-font-face-rule, which describes using @font-face to specify the URL from which a font can be acquired if it is not available on the local machine.  This is very significant for online content publishing in complex scripts.
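
As a rough illustration of the @font-face idea, the sketch below builds such a rule as a string. The helper name `font_face_rule`, the family name "Example Nastaliq" and the example.org URL are all hypothetical, chosen only for illustration. Listing `local(...)` before `url(...)` in `src` asks the browser to try an installed copy first and to download the font only if none is found.

```python
def font_face_rule(family, url, fmt):
    """Build a CSS @font-face rule that prefers a locally installed font
    and falls back to downloading it from `url`."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: local("{family}"), url("{url}") format("{fmt}");\n'
        "}"
    )


# Hypothetical family name and URL, for illustration only.
rule = font_face_rule(
    "Example Nastaliq",
    "https://example.org/fonts/example-nastaliq.ttf",
    "truetype",
)
print(rule)
```

A page would then refer to the font by its family name in an ordinary `font-family` declaration.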

regards,
Sarmad

#461 From: Sarmad Hussain <sarmad.hussain@...>
Date: Sun Jun 28, 2009 7:05 am
Subject: Google Transliteration
 
Transliteration of some Indic languages released by Google:


regards,
Sarmad

#460 From: Vathena <nethsovathena@...>
Date: Fri Jun 26, 2009 10:00 am
Subject: Re: [PAN Localization] Re: NEED HELP for OCR Project
 
Thank you!

On Fri, Jun 26, 2009 at 4:09 PM, kruyvanna <kruyvanna@...> wrote:


Dear Sovathena,

I have tried to train Tesseract with the Khmer language.
It's just a trivial test of 3 characters,
so you might want to train it for the complete set of symbols.

here is the link:
http://vannait.blogspot.com/2009/06/how-to-train-tesseract-ocr.html

Cheers,
Kruy Vanna
GITS, Waseda University.







--
N. Sovathena
--------------------------------------------------------------
PAN Localization Cambodia (PLC) of IDRC
Mobile Phone: + 855 17 719 326
Office Phone : + 855 11 811 947
Skype: vathena007

#459 From: "kruyvanna" <kruyvanna@...>
Date: Fri Jun 26, 2009 9:09 am
Subject: Re: NEED HELP for OCR Project
 
Dear Sovathena,

I have tried to train Tesseract with the Khmer language.
It's just a trivial test of 3 characters,
so you might want to train it for the complete set of symbols.

here is the link:
http://vannait.blogspot.com/2009/06/how-to-train-tesseract-ocr.html

Cheers,
Kruy Vanna
GITS, Waseda University.


--- In PANLocalization@yahoogroups.com, "Rajesh Pandey" <pandey.com.np@...>
wrote:
>
> Dear Neth you are always welcome.
>
>
> On 6/10/08, Vathena <nethsovathena@...> wrote:
> >
> > Dear Rajesh Pandey,
> >
> > Thanks for your guide to me.
> >
> > Regards,
> >
> > NETH Sovathena
> >
> > On Mon, Jun 9, 2008 at 1:36 PM, Rajesh Pandey <pandey.com.np@...>
> > wrote:
> >
> >> Dear Neth,
> >> I did a small research on Khmer language, installed Catalan Unicode for
> >> Khmer script and found out that the words don't seem to be segmented. Khmer
> >> characters seem quite similar to the Thai characters.
> >>
> >>
> >>
> >> 1. Segmentation involves segmentation of whole document into lines.
> >> 2. Segmentation of lines into words.
> >> 3. Segmentation of words into characters.
> >>
> >> Most OCRs (e.g. English OCR, Nepali OCR)
> >> work in a top-down approach to segment:
> >> *Document -> lines -> words -> characters*
> >>
> >> *For Khmer OCR*
> >> However it looks like you have to approach in this way:
> >> *Document -> lines -> characters*
> >>
> >>
> >> *Character segmentation:*
> >> You have some advantages over Nepali/Devanagari characters:
> >> You don't have to worry much about character segmentation, because Khmer
> >> characters seem to be already segmented.
> >> In our case we have to put an extra effort on segmenting characters
> >> because Nepali/Devanagari characters are joined together in a word.
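
The Document -> lines -> characters approach described above can be sketched with projection profiles: sum the ink in each row to find text lines, then sum the ink in each column of a line to find gaps between characters. This is only a minimal sketch on a toy binary image (1 = ink, 0 = background). The function names are hypothetical, and the assumption that characters are separated by blank columns is a simplification, since Khmer also stacks subscript consonants and vowel signs above and below the base character.

```python
def split_runs(profile, threshold=0):
    """Return (start, end) pairs for runs where profile > threshold (end is exclusive)."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image):
    """Find text lines as runs of rows that contain ink."""
    row_profile = [sum(row) for row in image]
    return split_runs(row_profile)


def segment_characters(image, line):
    """Within one line (top, bottom), find characters as runs of inked columns."""
    top, bottom = line
    width = len(image[0])
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return split_runs(col_profile)


# Toy page: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

lines = segment_lines(page)
chars = [segment_characters(page, line) for line in lines]
print(lines)   # row ranges of each text line
print(chars)   # column ranges of each character, per line
```

On real scans the threshold would need tuning for noise, and skew correction would normally come first in preprocessing.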
> >>
> >> *Word segmentation:*
> >> My preliminary research shows that Khmer words are not segmented, meaning
> >> I did not find spaces between the words. Instead I found long sequences of
> >> characters, with the whole sentence written as one run. The speakers
> >> seem to segment according to syllables or so.
> >> So maybe you need to add some more algorithms for word segmentation, or
> >> use a spellchecker and/or grammar checker at the end.
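
One simple candidate for the extra word segmentation step is greedy longest matching against a word list. The message does not name a specific algorithm, so this is an assumed technique shown only as a sketch. The lexicon below is a toy English one standing in for a real Khmer word list, and the function name is hypothetical.

```python
def longest_match_segment(text, lexicon):
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon word that matches; fall back to a single character if none does."""
    max_len = max(len(word) for word in lexicon)
    words, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in lexicon:
                break
        else:
            candidate = text[pos]          # unknown character: emit it alone
        words.append(candidate)
        pos += len(candidate)
    return words


# Toy lexicon (hypothetical); a real system would load a Khmer word list.
lexicon = {"the", "cat", "sat", "on", "mat"}
print(longest_match_segment("thecatsatonmat", lexicon))
print(longest_match_segment("thexcat", lexicon))
```

Greedy matching can pick the wrong split when one word is a prefix of another, which is where the spellchecker or grammar checker suggested above could help as a later pass.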
> >>
> >>
> >> The output will be pretty good because there are no spaces between the
> >> words. The input does not have any spaces, so there will not be any spaces
> >> in the output. I guess that will not be a problem.
> >>
> >> Initially you might think of giving a try with Tesseract ocr. I am sure
> >> you will get pretty good results once you have trained.
> >> The homepage for tesseract-ocr is http://code.google.com/p/tesseract-ocr
> >> You might also subscribe to tesseract google groups :
> >> http://groups.google.com/group/tesseract-ocr
> >>
> >> Now good luck with training tesseract-ocr. I think trying this once
> >> will give you a clear picture of an overall OCR system.
> >>
> >>
> >>
> >>
> >>
> >> --- In PANLocalization@yahoogroups.com, "Bal Krishna Bal"
> >> <balkrishna7bal@> wrote:
> >> >
> >> > Dear Neth,
> >> > I have forwarded your email to the Nepali OCR Team and hopefully you
> >> will
> >> > receive a corresponding response very soon.
> >> > Regards,
> >> > Bal Krishna
> >> >
> >> >
> >> > On Mon, Jun 2, 2008 at 1:39 PM, Vathena nethsovathena@ wrote:
> >> >
> >> > > Dear All,
> >> > >
> >> > > My name is NETH Sovathena, a new Software Developer at PAN Localization
> >> > > Cambodia of IDRC.
> >> > > I am now responsible for the OCR (Optical Character Recognition) project.
> >> > >
> >> > > I am writing this email to ask all of you for some help.
> >> > >
> >> > > I am having real difficulty with my OCR project. It is a complicated one
> >> > > for me, as I am a new Software Developer working on it.
> >> > > After reading some documents related to OCR, I have a basic understanding
> >> > > of the OCR process: Preprocessing, Segmentation, Feature Extraction,
> >> > > Recognition, and Post-processing.
> >> > >
> >> > > Currently, I am on the step "Understanding OCR", and am focusing on
> >> > > SEGMENTATION. I have tried to search for algorithms used for OCR, but I
> >> > > do not understand them and have not yet found any more documents or
> >> > > algorithms.
> >> > >
> >> > > Moreover, I do not understand each task that I need to do for this
> >> > > project, such as:
> >> > >
> >> > > * Study OCR Framework
> >> > > * Document scope of OCR (font, sizes, styles, etc.)
> >> > > * Develop Segmentation Strategy
> >> > > * Develop Segmentation Module for Khmer in the frameworks
> >> > > * Test Segmentation Module
> >> > > * Prototype training Module
> >> > > * Collect Training and Test Data
> >> > > * Conduct Training
> >> > > * Conduct Testing
> >> > > * Post Processing
> >> > >
> >> > > etc.
> >> > >
> >> > > If possible, I would like to ask you for any explanation or further
> >> > > useful resources for this project.
> >> > >
> >> > > Best regards,
> >> > >
> >> > > NETH Sovathena
> >> > >
> >> > >
> >> > >
> >> >
> >>
> >
> >
> >
> >
>
>
>
> --
> Regards,
> Rajesh Pandey
> Researcher and Developer in Nepali OCR Project
> PAN Localization Project, Nepal
> Madan Puraskar Pustakalaya
> Patan Dhoka, Lalitpur
> Phone: 977-1-5521393, Fax: 977-1-5536390
>

#458 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Wed Jun 10, 2009 6:00 am
Subject: FW: Google Translator Toolkit
sarmad001
 

 

via Google Blogoscoped by Tony Ruscoe on 6/9/09

 

Google Translator Toolkit is a new tool being launched today to help translators organize their work and benefit from shared translations, glossaries and translation memories, the Google China Blog reports (English translation by Google).

Evidence that Google was working on a service like this originally surfaced in August 2008 when references to Google Translation Center appeared in Google’s robots.txt file. At the time, the service was only available to Trusted Testers and most of the pages and screenshots were quickly taken offline. Since those screenshots were produced, it's clear that a lot of changes have been made to the tool.

The Translation Process

[Image: The Google Translator Toolkit Workbench, showing side-by-side editing of Wikipedia's Google article.]

For those not familiar with standard translation processes, a professional translator is likely to use a Computer-aided translation (CAT) tool to help identify and extract snippets of text for translation from various file types.

Google Translator Toolkit currently only allows users to upload HTML, Microsoft Word, OpenDocument Text, Rich Text and Plain Text documents up to 1MB for translation. Alternatively, it's possible to enter the URL of a file on the web, select a Wikipedia article or a Knol for translation.

Once uploaded or selected, files can be translated using the Workbench interface which shows the source text and the target language translations either side-by-side or above and below each other.

[Image: Previously translated segments from the translation memory are suggested and can be rated by yourself and others.]

One good reason to share translations with others is so that they can be reviewed for consistency and style. Google allows users to rate translated segments, presumably for style and accuracy. Comments can also be added to the target document, which is especially useful when collaborating with other users.

Translation Memories

[Image: In addition to the global translation memory, users can also create and share their own TMs.]

Many CAT tools allow the translator to store their human translations in a database called a translation memory. The memory can then be used to help with future translation projects by checking to see whether a certain word, phrase, sentence or segment has been translated before. Even if it's not exactly the same phrase, the translation memory can be used to suggest what's called a fuzzy match, often indicated by a percentage to reflect how similar the text is.
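
To make the idea of a fuzzy match percentage concrete, here is a minimal sketch. It assumes a plain character-level similarity ratio from Python's standard difflib module; real CAT tools use their own scoring, which is not described in this article. The function name fuzzy_match, the 70% threshold and the sample memory are all hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_match(segment, memory, threshold=70.0):
    """Return (score, source, translation) for the closest stored segment.

    `memory` maps previously translated source segments to translations.
    The score is a similarity percentage. None is returned when no stored
    segment reaches `threshold` (an assumed cut-off, not a standard value).
    """
    best = None
    for source, translation in memory.items():
        ratio = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        score = 100.0 * ratio
        if best is None or score > best[0]:
            best = (score, source, translation)
    if best is None or best[0] < threshold:
        return None
    return best


# Hypothetical translation memory (English -> French).
memory = {
    "Save the file.": "Enregistrez le fichier.",
    "Open the file.": "Ouvrez le fichier.",
}

exact = fuzzy_match("Save the file.", memory)              # identical segment
close = fuzzy_match("Save the files.", memory)             # near match
miss = fuzzy_match("Completely unrelated text", memory)    # below threshold
```

An identical segment scores 100, a near match scores somewhat lower and is offered as a suggestion, and anything under the threshold is treated as having no match.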

When translating Wikipedia articles and Knols, the translations are stored in a global, shared translation memory that's available to everyone by default. That means previously translated phrases from these articles are stored and available for use by other translators using the service, so if they ever find themselves translating the same piece of text, Google will automatically populate the interface with the previous translations to help save time.

Google's support article explains the process:

Pretranslating your documents

When you upload a document into Google Translator Toolkit, we automatically 'pretranslate' your document as follows:

  1. We divide your document into segments, usually sentences, headers, or bullets.
  2. We search all available translation databases for previous human translations of each segment.
  3. If any previous human translations of the segment exist, we pick the highest-ranked search result and 'pretranslate' the segment with that translation.
  4. If no previous human translation of the segment exists, we use machine translation to produce an 'automatic translation' for the segment, without intervention from human translators.

We realize for some translators, pre-filling with machine translation may actually slow, not speed up, the translation process. In such cases, you can change your settings to pre-fill the segment with the source text, so you can type over the source text instead of making corrections to automatic translation.
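
The four pretranslation steps quoted above can be sketched roughly as follows. This is only an illustration of the described flow, not Google's implementation: the sentence splitter is deliberately naive, the layout of the translation memories (a dict from source segment to rated translations) is an assumption, and machine_translate stands in for whatever MT system is available.

```python
import re


def split_segments(text):
    """Step 1: split text into segments, naively at sentence ends or newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def pretranslate(text, memories, machine_translate):
    """Steps 2-4: prefer the highest-rated human translation, else use MT.

    `memories` is a list of dicts mapping a source segment to a list of
    (rating, translation) pairs (a hypothetical layout).
    `machine_translate` is any callable taking and returning a string.
    """
    result = []
    for segment in split_segments(text):
        candidates = []
        for tm in memories:  # step 2: search every available memory
            candidates.extend(tm.get(segment, []))
        if candidates:  # step 3: pick the highest-ranked human translation
            _rating, translation = max(candidates, key=lambda c: c[0])
            result.append((segment, translation, "human"))
        else:  # step 4: fall back to automatic translation
            result.append((segment, machine_translate(segment), "automatic"))
    return result


# Hypothetical data: one memory with two rated translations of one segment.
memories = [{"Hello world.": [(3, "Bonjour le monde."), (5, "Salut le monde.")]}]
output = pretranslate("Hello world. How are you?", memories,
                      lambda s: "[MT] " + s)
```

Pre-filling with the source text instead of machine translation, as the support article mentions, would amount to passing a function that returns its input unchanged.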

Uploaded documents can benefit from using this global TM too, but if users don't want to share their translations with everyone, they can create their own translation memories and control exactly which users can make additions and rate translations.

Translators already using CAT tools may have translation memories stored in the Translation Memory eXchange (.tmx) open standard XML format. Google allows translations contained in those TMs to be uploaded and added to existing Google Translator Toolkit TMs, providing they're no larger than 50MB and conform to TMX 1.0 or higher.
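
For readers who have not seen a TMX file, the sketch below reads translation units out of a tiny TMX document using Python's standard xml.etree.ElementTree. The sample content and the function name read_tmx are made up for illustration. The code accepts both the xml:lang attribute and the plain lang attribute on tuv elements, on the assumption that older TMX files may use the latter.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical minimal TMX document with a single translation unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1.0"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx(tmx_text, source_lang, target_lang):
    """Return a list of (source, target) segment pairs from a TMX document."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                by_lang[lang] = "".join(seg.itertext())
        if source_lang in by_lang and target_lang in by_lang:
            pairs.append((by_lang[source_lang], by_lang[target_lang]))
    return pairs


pairs = read_tmx(SAMPLE_TMX, "en", "fr")
```

Each tu element is one translation unit, and each tuv inside it holds the segment in one language, which is what makes the format usable as aligned data.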

TMs other than the global TM can also be searched for previously translated segments which can then be rated without opening a translation document.

Glossaries

Glossaries are collections of words and phrases with definitions and notes associated with them. They are often used in the translation process to help choose which phrase is most appropriate and to maintain consistency between translations of technical or specialty subjects. Google Translator Toolkit requires CSV format glossaries to be uploaded (it's not possible to create one from scratch) which will then be automatically searched for terminology in the segments that are currently being translated.
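
As a rough illustration of how a CSV glossary might be searched against a segment, here is a short sketch. The column names (term, translation, notes) are an assumption made for this example; the article does not give the exact layout Google Translator Toolkit expects. Matching is simple case-insensitive substring search.

```python
import csv
import io

# Hypothetical glossary; the column names are assumed, not Google's format.
SAMPLE_GLOSSARY = """term,translation,notes
translation memory,mémoire de traduction,Abbreviated TM
segment,segment,Usually one sentence
"""


def load_glossary(csv_text):
    """Parse CSV text into a dict keyed by lower-cased term."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["term"].lower(): row for row in reader}


def find_terms(segment, glossary):
    """Return glossary rows whose term occurs in the segment."""
    text = segment.lower()
    return [row for term, row in glossary.items() if term in text]


glossary = load_glossary(SAMPLE_GLOSSARY)
hits = find_terms("Each segment is stored in the translation memory.", glossary)
```

A production tool would match on word boundaries and inflected forms rather than raw substrings, but the lookup pattern is the same.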

Learn More

For a really quick overview of some of these features in action, you can watch this YouTube video:

How could this be useful to Google?

A machine translation of the Google China Blog explains, "Google's mission is to organize the world's information and make it universally accessible and useful. Translation of information, in our view is the key to access to information."

Google has been working on a statistical machine translation system for a few years now, which it started to use for Google Translate instead of Systran in October 2007. Since then it's been slowly integrating translation into many of its services, including Google Toolbar, Google Talk, Google Reader, Gmail, and YouTube. There's even an AJAX Language API which anyone can use to build upon.

In my opinion, this latest tool has clearly been designed to help improve Google's translation offerings. One thing on which statistical machine translation relies is aligned translations. In very simple terms, to help train a statistical machine translation system, text in one language is fed into the system alongside the same text in another language. With enough text, the system can start to learn how certain phrases should be translated. Without aligned translations, there's no easy way to know exactly which sentence in the source document relates to the translated version. That's where translation memories are very useful; they contain aligned translations.
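
A toy example may help show why aligned translations matter. The sketch below counts how often words co-occur across aligned sentence pairs and scores each word pair with the Dice coefficient. This is a deliberately simple stand-in chosen for illustration; real statistical machine translation systems use far more elaborate alignment models, and nothing here reflects how Google's system works. The sample sentences and function names are hypothetical.

```python
from collections import Counter


def cooccurrence_scores(aligned_pairs):
    """Score (source word, target word) pairs by Dice coefficient."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in aligned_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update((s, t) for s in src_words for t in tgt_words)
    return {
        (s, t): 2.0 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }


def best_translation(word, scores):
    """Return the target word with the highest score for `word`, if any."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical aligned sentence pairs (English, French).
aligned = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]
scores = cooccurrence_scores(aligned)
```

With only three aligned pairs, "house" already scores highest against "maison" and "car" against "voiture". Without the alignment there would be nothing to count.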

There are literally thousands of Wikipedia articles being translated all the time, but the translations aren't usually maintained in a translation memory. Through using Google Translator Toolkit, translators could benefit from seeing previously translated text from the global translation memory and, in return, Google could clearly benefit from translators using its interface to translate any content that's then stored as aligned translations in their global TM, which it can ultimately use to enhance its statistical machine translation system and improve the translations that are provided to end-users of any service using Google Translate.

And as the global TM grows, it might even be possible for end-users to get near-to-human-quality for translations of their documents, websites, blog posts, emails and tweets instantly.

[Thanks TOMHTML!]

Disclaimer: I am an employee of SDL, a translation company that provides translation services and software.

[By Tony Ruscoe | Origin: Google Translator Toolkit | Comments]



#457 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Sat May 30, 2009 1:55 pm
Subject: FW: Call for papers - ICTD - New Delhi -11&12 March - New Delhi
sarmad001
 
FYI

-----Original Message-----
From: P. Vigneswara Ilavarasan [mailto:evignesh@...]
Sent: Wednesday, May 27, 2009 6:06 PM
To: Katie Gartner
Subject: Call for papers - ICTD - New Delhi -11&12 March - New Delhi

Dear All:
Apologies for cross posting.
Regards,
Vignesh.

http://www.iitd.ac.in/events/ICTD2010/

ICTs and Development: An International Workshop for Theory, Practice, &
Policy
11-12 March, 2010
Indian Institute of Technology Delhi, New Delhi
Sponsored by International Development Research Centre, Canada

Unpublished, original empirical papers are invited for the forthcoming
international workshop on ICTs and Development:
An International Workshop for Theory, Practice, & Policy to be
conducted by the Indian Institute of Technology (IIT), New Delhi,
India, during 11-12 March 2010.

The workshop aims to provide a forum for scholars to share their
empirical research with academic experts, policymakers, and
activists from the regional and international development community.
Papers should examine how mobile phones, computers, and the Internet
influence the empowerment of marginal individuals and communities,
including whether ICTs create and enhance livelihood opportunities for
people in the developing world.

Papers should be in the range of 5,000-8,000 words (including abstract
and bibliography) and should include a clear discussion of
the implications of the findings for development policy and/or practice.

No more than twelve papers will be selected by the workshop organizers
for presentation. The first author of each paper chosen will be given
air fare and lodging/meals.

The workshop is part of the project, ICTs and Urban Micro Enterprises:
Identifying and Maximizing Opportunities for Economic Development, and
is supported by the International Development Research Centre, Canada.

The organizers are committed to finding an appropriate publication
venue for all papers accepted for the workshop.

Deadlines:
Submission of manuscripts: 1st October 2009
Announcement of results: 1st December 2009
Submission of final version of the paper: 1st February 2010

For submission of manuscripts and other enquiries, please write to

ICTD2010@...

Workshop Organizers
Dr. P. Vigneswara Ilavarasan (IIT Delhi)
Prof. Mark R. Levy (Michigan State University)



#456 From: Wunna Ko Ko <wunnakoko@...>
Date: Thu May 14, 2009 1:48 am
Subject: OpenOffice.org 3.1 include Burmese Locale & Language Pack
wunnakoko
 
Dr. Sarmad,

We have successfully included the Burmese Locale and Language Pack in
OpenOffice.org 3.1, which was officially released on 7th May.

Although we discussed at length providing support to those who work hard,
unfortunately they had to work on a volunteer basis.

The main contributors are Keith Stribley of www.thanlwinsoft.org and Myo Aung.

Although Myanmar NLP or MCF had many chances to host this project, I feel
sorry that they were not involved in the world's first official software
with a Burmese Locale and Language Pack.

I am still hopeful that PAN Localization can provide help with further
development.

With Best Regards,

Wunna

#455 From: Dr Muhammad Afzal <afzal537@...>
Date: Wed May 13, 2009 4:38 pm
Subject: Re: [PAN Localization] The need for protecting the diversity of languages
afzal3067
 
Dear Dr Sarmad,

AA

Thank you very much. A lot needs to be done for Pakistani languages.

Regards
afzal

On Wed, May 13, 2009 at 9:03 AM, Sarmad Hussain <sarmad.hussain@...> wrote:



> K. David Harrison discusses how language death leads to intellectual
> impoverishment in all fields of science and culture. Watch as he
> details efforts to sustain, value and revitalize linguistic diversity
> worldwide.

http://www.poptech.org/popcasts/popcasts.aspx?lang=&viewcastid=245

regards,
Sarmad






#454 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Wed May 13, 2009 4:03 am
Subject: The need for protecting the diversity of languages
sarmad001
 
> K. David Harrison discusses how language death leads to intellectual
> impoverishment in all fields of science and culture. Watch as he
> details efforts to sustain, value and revitalize linguistic diversity
> worldwide.

http://www.poptech.org/popcasts/popcasts.aspx?lang=&viewcastid=245


regards,
Sarmad



#453 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Wed May 13, 2009 2:45 am
Subject: FW: AVIOS student programming contest: show off your VoiceXML skills
sarmad001
 
FYI

-----Original Message-----
From: www-voice-request@... [mailto:www-voice-request@...] On Behalf
Of James Larson
Sent: Tuesday, May 12, 2009 10:56 PM
To: www-voice
Subject: AVIOS student programming contest: show off your VoiceXML skills

Wednesday, May 6, 2009. Today the Applied Voice Input/Output Society
(AVIOS) announced their fourth annual student speech application contest
sponsored by AT&T, Cepstral, I6Net, Loquendo, Microsoft, and Voxeo.
Applications must involve speech input and/or output, but may be pure
speech or multimodal. Cash and/or equipment prizes valued at over $1000
will be awarded to teams of student programmers who design and create
applications judged to be robust, useful, creative, innovative, and user
friendly.

The contest encourages students to develop applications using speech
technologies such as automatic speech recognition and text to speech
synthesis and to combine them with other modalities. This year, students
may use any of a variety of platforms including AT&T Speech Mashups,
Cepstral VoiceForge TTS service, CMU's RavenClaw/Olympus, Google
Android, I6net VXI*, Loquendo VoxNauta Platform, Lumenvox Speech
Engine Standard License, Opera, Voxeo Prophecy, and Voxeo Tropo.

Students anywhere in the world can submit their creative and innovative
applications to be judged by speech application experts. The contest
also provides a forum for students to show what they can do with the
power of speech applications. For more information and the contest entry
form, go to http://www.avios.org.





#452 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Sat May 9, 2009 11:15 am
Subject: FW: [nlpai2005] ICON-2009 CFP: 7th International Conference on NLP
sarmad001
 

 

 

From: nlpai2005@yahoogroups.com [mailto:nlpai2005@yahoogroups.com] On Behalf Of nlpai2005
Sent: Saturday, May 09, 2009 12:09 PM
To: nlpai2005@yahoogroups.com
Subject: [nlpai2005] ICON-2009 CFP: 7th International Conference on NLP

 




ICON-2009: 7th INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING
Hyderabad, India

14- 17 December 2009

Organized by
NLP Association, India
Central Institute of Indian Languages, Mysore, India
International Institute of Information Technology, Hyderabad, India
University of Hyderabad, Hyderabad, India

FIRST CALL FOR PAPERS

The seventh International Conference on Natural Language Processing (ICON 2009)
will be held at University of Hyderabad, Hyderabad, India during 14-17
December, 2009. The ICON conference series is a forum for promoting interaction
among researchers in the field of Natural Language Processing in India and
abroad. This year ICON-2009 will be co-located with the 31st All India Conference
of Linguistics (AICL31).

1. TOPICS:
Papers are invited on substantial, original and unpublished research
on all aspects of Natural Language Processing, with a particular focus on
languages, issues, and applications relevant to India. The areas of interest
include, but are not limited to:

Morphology                 Parsing
Phonology                  Word Sense Disambiguation
Syntax                     Machine Translation
Semantics                  Information Retrieval
Discourse                  Text Summarization
Pragmatics                 Question Answering
Statistical methods        Dialog Systems
Knowledge-based methods    Performance Evaluation
Annotated Corpora          Speech Corpora
Lexical Resources          Speech Recognition
Ontology                   Speech Synthesis
POS tagging

1.1 SPECIAL TRACK:
ICON-2009 will also have a separate track on Linguistic models in NLP. Purely
theoretical papers in linguistics which have implications for computational
linguistics are also invited under this track.

2. FORMAT OF SUBMISSION:
Papers in English, not exceeding 10 pages, should be submitted at
www.iiit.ac.in/icon2009. One can also submit hard copy (four copies) under
special circumstances. Papers should include an abstract of about 100-200
words. Papers outside the specified length are subject to rejection without
review. Please see the style file at
www.aclweb.org/downloads/acl-ftp/Styfiles/Proceedings/

BLIND REVIEW:
Kindly ensure that authors' names and affiliations are given only on a separate
cover sheet. Papers in electronic form can be in plain text, Postscript, PDF,
Latex or Microsoft Word (RTF only). If your paper contains text of languages
other than English, please attach relevant font files along with your
submission.

3. CALL FOR TUTORIALS/WORKSHOPS:
Proposals are invited for pre-conference tutorials. Tutorials/Workshops can be
of half-day or full-day duration. The proposal should be presented in the form
of a 200-word abstract, one page topical outline of the content, description of
the proposers and their qualifications relating to the tutorial content.

Send tutorial/Workshop proposals to the ICON-2009 Secretariat. For further
information, please refer to the Conference URL or contact the ICON-2009
Secretariat.

4. NLP TOOLS CONTEST: Parsing
Efficient Indian language (IL) parsing still remains an open problem. One
reason for this is the unavailability of annotated corpora to experiment with.
This contest aims to bring together researchers working/interested in the
area of IL parsing to explore techniques that can improve the present
accuracies, by providing sufficient annotated data. This contest, however,
will only focus on dependency parsing.

CONTEST:
Participants will be provided training, development and testing data to report
the efficiency of their dependency parsers. Languages such as Hindi, Bangla,
Oriya, Marathi, etc. will be explored. Parser efficiency will be measured in
terms of standard measures such as unlabelled attachment accuracy and labelled
attachment accuracy. Shortlisted candidates will present their techniques and
results in a special session at ICON. Note that it is not necessary that
the participating parser be a statistical one; other types such as grammar
driven parsers or hybrid parsers can also participate.
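
For participants unfamiliar with the two measures named above, the following sketch shows how unlabelled and labelled attachment accuracy are commonly computed: the fraction of tokens whose head is predicted correctly, and the fraction whose head and dependency label are both correct. The data layout, function name and label names are hypothetical; the contest's own evaluation script and label set are not given in this announcement.

```python
def attachment_scores(gold, predicted):
    """Return (unlabelled, labelled) attachment accuracy.

    `gold` and `predicted` are equal-length lists with one
    (head_index, label) pair per token; head index 0 means the root.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("at least one token is required")
    total = len(gold)
    # Unlabelled: only the head has to match.
    correct_heads = sum(1 for g, p in zip(gold, predicted) if g[0] == p[0])
    # Labelled: head and dependency label both have to match.
    correct_both = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct_heads / total, correct_both / total


# Hypothetical four-token sentence with made-up labels.
gold = [(2, "subj"), (0, "root"), (4, "mod"), (2, "obj")]
predicted = [(2, "subj"), (0, "root"), (4, "det"), (3, "obj")]

uas, las = attachment_scores(gold, predicted)
```

In this example three of four heads are right (0.75 unlabelled) but only two tokens have both head and label right (0.5 labelled), so the labelled figure can never exceed the unlabelled one.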

The Contest will have two prizes :
FIRST PRIZE: Rs.7500/-
SECOND PRIZE: Rs.5000/-

Resources: To be Announced

5. STUDENT PAPER COMPETITION IN LANGUAGE TECHNOLOGIES:
ICON-2009 announces a STUDENT PAPER COMPETITION in two tracks:
Track I : NLP (All areas)
Track II : Linguistics (Morphology, Syntax and Semantics)

Papers may be submitted under the link on the web page. Prizes will be awarded
in each track for up to two papers based on original work carried out. The
prizes are:

FIRST PRIZE: Rs.7500/-
SECOND PRIZE: Rs.5000/-

The short-listed papers in each track will be invited for presentation in a
special session in ICON-2009 conference. Registration, domestic travel and
subsistence expenses will be provided by the conference organizers for one
author of each paper. Up to two winners will be offered summer fellowships at
major NLP Centres in India. For any clarifications, contact Student Paper
Competition Chair on icon09student@....

6. IMPORTANT DATES:
Paper registration deadline                       Aug 1, 2009
Paper submission deadline                         Aug 1, 2009
Paper acceptance notification                     Sep 15, 2009
Camera ready copy due                             Oct 20, 2009

Tutorial/Workshop proposals due                   Aug 7, 2009
Tutorial acceptance notification                  Aug 16, 2009
Lecture Materials for tutorial                    Nov 15, 2009

NLP Tools Contest registration deadline           Jul 21, 2009
Student Paper Competition submission deadline     Aug 7, 2009

7. COMMITTEES:

ADVISORY COMMITTEE CHAIR
Aravind K Joshi, University of Pennsylvania, USA

CONFERENCE GENERAL CHAIR
Rajeev Sangal, IIIT, Hyderabad, India

PROGRAMME COMMITTEE CHAIR
Dipti Misra Sharma, IIIT, Hyderabad
Vasudeva Varma, IIIT, Hyderabad

TOOLS CONTEST CHAIR
Samar Husain, IIIT, Hyderabad

STUDENT PAPER COMPETITION CHAIR
Sudeshna Sarkar, IIT, Kharagpur

ORGANIZING CHAIR
R K Bagga, IIIT, Hyderabad, India
Ashish Jacob Thomas, University of Hyderabad

8. CONTACT INFORMATION
ICON-2009 Secretariat
Language Technologies Research Centre
International Institute of Information Technology
Gachibowli, Hyderabad - 500 032, India
Ph: +91-40-2300 1412; Fax: +91-40-6653 1413
e-mail: icon2009@...
www.iiit.ac.in/icon2009




#451 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Fri May 1, 2009 6:48 pm
Subject: FW: International Conference on Asian Languages Information Processing IALP2009
sarmad001
 
FYI

-----Original Message-----
From: Lua Kim Teng [mailto:luakt@...]
Sent: Thursday, April 30, 2009 10:06 AM
To: sarmad.hussain@...
Subject: International Conference on Asian Languages Information Processing
IALP2009


IALP 2009 Call for Papers
International Conference on Asian Languages Processing 2009 (IALP2009)
Dec 7-9, 2009, Singapore
Jointly organized by Chinese and Oriental Languages Information Processing
Society (COLIPS)  and IEEE Singapore Computer Chapter (IEEE Singapore CC)
http://www.colips.org/conference/ialp2009
The International Conference on Asian Language Processing (IALP) is a series
of conferences with a unique focus on Asian Language Processing. The
conference aims to advance the science and technology of all aspects of
Asian Language Processing by providing a forum for researchers in different
fields of language studies all over the world to meet. The first meeting of
IALP was held in Singapore in 1986 and was called ICCC (International Conference
on Chinese Computing). This meeting initiated the study of Chinese and
oriental languages processing in Singapore and resulted in the formation of
COLIPS in Singapore in 1988 and, later, the publication of the Journal of
Chinese Language and Computing from 1991.

Last year, IALP 2008 was held in Chiang Mai University, Thailand and the
proceedings were indexed by ISTP/ISI. The IALP 2009 will be held in
Singapore and co-organized by COLIPS and IEEE Singapore Computer Chapter.
The proceedings will be published by IEEE CPS (Conference Publication
Services). IEEE CPS will submit it for indexing in EI, ISTP/ISI and Current
Contents on Diskette (http://www.computer.org/portal/site/cscps/index.jsp)
Important Dates:
* June 30, 2009     Paper submissions due
* July 31, 2009     Paper notification of acceptance
* Aug 31, 2009      Camera-ready full papers due and registration
* Dec 7-9, 2009     Conference dates
Topics of interest:
Papers are invited on substantial, original and unpublished research in all
aspects of Asian Language Processing, including, but not limited to:
* Input and output of large character sets of Asian languages
* Typesetting and font designs of Asian languages
* Asian character encoding and compression
* Multimodal representations and processing
* Voice input and output
* Phonology and morphology
* Lexical semantics and word sense
* Grammars, syntax, semantics and discourse
* Word segmentation, chunking, tagging and syntactic parsing
* Word sense disambiguation, semantic role labeling and semantic
  parsing
* Discourse analysis
* Language, linguistic and speech resource development
* Evaluation methods and user studies
* Machine learning for natural language
* Text analysis, understanding, summarization and generation
* Text mining and information extraction, summarization and retrieval
* Text entailment and paraphrasing
* Text Sentiment analysis, opinion mining and question answering
* Machine translation and multilingual processing
* Linguistic, psychological and mathematical models of language,
  computational psycholinguistics, computational linguistics and mathematical
  linguistics
* Language modeling, statistical methods in natural language
  processing and speech processing
* Spoken language processing, understanding, generation and
  translation
* Rich transcription and spoken information retrieval
* Speech recognition and synthesis
* Natural language applications, tools and resources, system
  evaluation
* Asian language learning, teaching and computer-aided language
  learning
* NLP in vertical domains, such as biomedical, chemical and legal text
* NLP on noisy unstructured text, such as email, blogs, and SMS
* Special hardware and software for Asian language computing
IALP2009 is now organized into 6 tracks:
1. Linguistics and Language Studies
2. Computational Linguistics
3. Language Information Processing (Information extraction, analysis etc)
4. Language Technologies (Input, output, font, recognition, synthesis etc)
5. Computer-aided Language Learning
6. Machine Translation
Submissions:

Submissions must describe substantial, original, completed and unpublished
work. Wherever appropriate, concrete evaluation and analysis should be
included. Submissions will be judged on correctness, originality, technical
strength, significance, relevance to the conference, and interest to the
attendees. Each submission will be reviewed by three program committee
members. Accepted papers will be presented in one of the oral sessions or
poster sessions as determined by the program committee. Papers must be
written in English.
All submissions must be PDF files only and must be made electronically using
the paper submission software at: <https://www.softconf.com>.
Format:
Full paper submissions should follow the IEEE Proceedings' format without
exceeding six (6) pages including references.  We strongly recommend the use
of the LaTeX style files or Microsoft Word style files tailored for this
year's conference according to IEEE Proceedings' format, which will be
available on the conference website soon. Submissions must conform to the
official style guidelines, and they must be electronic in PDF.
As the reviewing will be blind, manuscripts must not include the authors'
names and affiliations.  Authors should ensure that their identities are not
revealed in any way in the paper.  Papers that do not conform to these
requirements will be rejected without review.



#450 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Fri May 1, 2009 5:56 pm
Subject: FW: DEADLINE EXTENSION : The 7th Workshop on Asian Language Resources (ALR2009)
sarmad001
 

Please submit your papers here.  Good forum for presentation.

 

Regards,
Sarmad

 

From: Virach Sornlertlamvanich [mailto:virach@...]
Sent: Friday, May 01, 2009 9:23 AM
To: Hammam Riza
Cc: Ammara Shabbir; 'Hasnat'; 'Mumit Khan'; 'Md. Masum Billah'; labony@...; 'Pema Choejey'; Tenzin Dendup; dnamgyelster@...; 'Leang'; 'KHEM Sochenda'; Lseangmeng@...; 'Noy Shoung'; 'Noy Shoung'; Chea Sok Huor; 'Ⱥŵ'; nmzx@...; Sarmad Hussain; Sardjoeni Moedjiono; 'Chooi Ling Goh'; a.altangerel@...; a.altangerel@...; batpurev@...; Purev J; purevj@...; 'munkhzul'; munkhzul_77@...; 'Bal Krishna Bal'; 'Amar Gurung'; rajendrapoudel@...; huda.sarfraz@...; 'Mudasir Mustafa'; 'Ahmed Muaz'; Emmanuel C. Lallana, PhD; 'Mayette Macapagal'; borraa(Borra,Allan B); Dwayne Bailey; 'Ruvan Weerasinghe'; dlh@...; 'Wasin Sinthupinyo'; 'Chai Wutiwiwatchai'; 'Regional Secretariat'; Valaxay DALALOY; 'Phonpasit'; oskar@...; teduh@...
Subject: DEADLINE EXTENSION : The 7th Workshop on Asian Language Resources (ALR2009)

 

Dear Colleagues,

                    SUBMISSION DEADLINE EXTENDED TO 8 MAY 2009

------------------ APOLOGIES FOR MULTIPLE POSTINGS --------------------

                                  SECOND CALL FOR PAPERS

          The 7th  Workshop on Asian Language Resources (ALR2009)

                                               6-7 August 2009

                Singapore International Convention & Exhibition  Centre,
                                         Suntec City, Singapore

(AN OFFICIAL  ACL-IJCNLP 2009 WORKSHOP, http://www.acl-ijcnlp-2009.org/main/workshops.html )


Description

Language resources play an important role as corpus-based, stochastic, and learning approaches are introduced to natural language processing research. Many research units put great effort into developing corpora for their particular purposes, and some even focus on compiling various kinds of language resources. Asia, a region of great linguistic variety, suffers from a shortage of shared resources and of shared experience in cross-language problem solving. There are several reports of success in constructing and using corpora in many dimensions. However, there have been few efforts to establish common formats or frameworks for handling these languages. Reorganizing the existing resources and finding guidelines for corpus development have become significant issues in current research. The workshop is organised under the Asian Language Resources Committee (ALRC) of AFNLP, aiming at the following goals.

  * To investigate the situation of Asian Language Resources, and to
    make a catalog of the result of this investigation
  * To investigate and discuss the problems related to the standards
    and specification on creating and sharing various levels of
    language resources
  * To promote communications between developers and users of various
    language resources in order to fill the gap between language
    resources and practical applications
  * To introduce the status of Asian language resources to researchers
    in other regions

To achieve these goals, we call for technical (and non-technical) papers concerning, but not limited to, the following issues.

  * Infrastructure for constructing and sharing language resources
  * Meta data for resource classification and discovery
  * Exchange and annotation schemata
  * Exchange formats
  * Standards or specifications for language resources
  * Standards or specifications for content management
  * Language resources for basic NLP tasks (such as word segmentation,
    named entity recognition, syntactic analysis, semantic analysis,
    discourse analysis, speech recognition, speech synthesis, etc.)
  * Language Resources for HLT applications (such as information
    retrieval, information extraction, question answering, machine
    translation, speech translation, etc.)
  * Text corpora, speech corpora
  * Lexicons
  * Grammars
  * Machine-readable dictionaries
  * Ontology
  * Strategies and priorities for EU-US and Asian cooperation
  * Strategies for collaboration with international/regional public
    organizations, such as UNESCO, ACCU, etc.
  * Licensing and copyright issues


Important Dates

Paper submission due     8 May, 2009 [extended]
Demo session requests due     8 May, 2009
Notification of acceptance     1 June, 2009
Camera-ready papers due     7 June, 2009
ACL-IJCNLP 2009 Workshops     6-7 August, 2009


Submission Information

Submissions must describe substantial, original, and unpublished work. Submissions will be judged on correctness, originality, technical strength, significance and relevance to the conference, and interest to the attendees. Full papers may consist of up to eight (8) pages in total (references included) and will be presented orally. The deadline for paper submission is May 8, 2009 (GMT + 8) [extended].
The official style files for ACL/IJCNLP 2009 are available at:
http://www.acl-ijcnlp-2009.org/main/authors/stylefiles/.
The workshop submissions should use the same formatting guidelines. As the reviewing will be blind, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Papers that do not conform to these requirements will be rejected without review.
Submission is electronic using paper submission software at:
https://www.softconf.com/acl-ijcnlp09/ALR/

Program Committee

  * Hammam Riza (co-chair) - IPTEKnet-BPPT, Indonesia
  * Virach Sornlertlamvanich (co-chair) - NECTEC, Thailand
  * Pushpak Bhattacharyya - IIT-Bombay, India
  * Thatsanee Charoenporn - NECTEC, Thailand
  * Key-Sun Choi - KAIST, Korea
  * Chu-Ren Huang - Hong Kong Polytechnic University, Hong Kong, and
    Academia Sinica, Taiwan
  * Sarmad Hussain - National University of Computer & Emerging
    Sciences, Pakistan
  * Hitoshi Isahara - NICT, Japan
  * Shuichi Itahashi - NII, Japan
  * Lin-Shan Lee - National Taiwan University, Taiwan
  * Haizhou Li - I2R, Singapore
  * Chi Mai Luong - Institute of Information Technology, Vietnamese
    Academy of Science and Technology, Vietnam
  * Yoshiki Mikami - Nagaoka University of Technology, Japan
  * Sakrange Turance Nandasara - University of Colombo School of
    Computing, Sri Lanka
  * Thein Oo - Myanmar Computer Federation, Myanmar
  * Phonpasit Phissamay - NAST, Lao PDR
  * Oskar Riandi - ICT Center-BPPT, Indonesia
  * Rachel Edita O Roxas - De La Salle University, Philippines
  * Kiyoaki Shirai - JAIST, Japan
  * Myint Myint Than - Myanmar Computer Federation, Myanmar
  * Takenobu Tokunaga - Tokyo Institute of Technology, Japan
  * Chiuyu Tseng - Academia Sinica, Taiwan
  * Chai Wutiwiwatchai - NECTEC, Thailand

------------------------------------------------------------------------------------






#449 From: lengieng ing <lengieng_ing@...>
Date: Wed Apr 29, 2009 1:14 am
Subject: Re: [PAN Localization] Putting pressure on Adobe... the community way
lengieng_ing
 
Thanks for the information. We will try that for Khmer as well.

Regards,
LengIeng


From: Ruvan Weerasinghe <arw@...>
To: PANLocalization@yahoogroups.com
Cc: Mr.DulipLakmalHerath <dlh@...>; Viraj Welgama <wvw@...>; Turrance Nandasara <stn@...>; DamithSandaruwan <dsr@...>; HD wijayawardhana <harsha@...>
Sent: Tuesday, April 28, 2009 8:55:10 PM
Subject: Re: [PAN Localization] Putting pressure on Adobe... the community way

Thanks for sharing this Chris. Need to get it tried for Sinhala. Others in the list, can we all try this?

Regards,

Ruvan.





#448 From: Ruvan Weerasinghe <arw@...>
Date: Tue Apr 28, 2009 1:55 pm
Subject: Re: [PAN Localization] Putting pressure on Adobe... the community way
arweerasinghe
 
Thanks for sharing this Chris. Need to get it tried for Sinhala. Others in the list, can we all try this?

Regards,

Ruvan.




#447 From: Christopher Fynn <cfynn@...>
Date: Sun Apr 19, 2009 3:57 pm
Subject: Re: [PAN Localization] Putting pressure on Adobe... the community way
christopher_...
 
Dear Ruvan and everybody

Without waiting for script-specific support from Adobe, by experimenting
a bit I have got Tibetan script working in InDesign CS3 and above as
follows:

In an OpenType font I put a set of additional lookups under the
combination of "DFLT" script tag and "dflt" language tag. These
additional lookups use only 'generic' OpenType features (i.e. those used
in Adobe's Latin OT fonts) such as ccmp, rlig, liga, calt, kern. (The
init, medi, fina, mark & mkmk features probably also work since some
Adobe fonts use them - but I haven't tried these yet.) The font still
contains the regular set of lookups under the script-specific set of
features that Microsoft's Uniscribe uses.

In other words my Tibetan font now has two sets of lookups - one under
DFLT script tag and dflt language tag - the other set under tibt script
tag and dflt language tag.
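
The two-set arrangement described above can be sketched as a small Python script that writes the registration part of an AFDKO-style feature file. This is only an illustration under assumptions: the lookup names (tib_compose, tib_stacks, tib_kern) and the choice of blws for the script-specific set are hypothetical, the lookups themselves are assumed to be defined earlier in the same feature file, and the default language system is assumed to follow from each script statement.

```python
# Hypothetical plan: which named lookups are registered under which
# feature tag, for each script tag. "DFLT" uses only generic features;
# "tibt" uses a script-specific feature set. Names are illustrative only.
FONT_PLAN = {
    "DFLT": {"ccmp": ["tib_compose"], "liga": ["tib_stacks"], "kern": ["tib_kern"]},
    "tibt": {"ccmp": ["tib_compose"], "blws": ["tib_stacks"], "kern": ["tib_kern"]},
}


def build_feature_text(script_features):
    """Return feature-file text registering each lookup under each script tag.

    script_features maps script tag -> feature tag -> list of lookup names.
    The lookups are assumed to be defined elsewhere in the feature file.
    """
    # Declare one language system per script; DFLT comes first in the plan.
    lines = [f"languagesystem {script} dflt;" for script in script_features]
    lines.append("")

    # Collect feature tags in first-seen order across all scripts.
    feature_tags = []
    for features in script_features.values():
        for tag in features:
            if tag not in feature_tags:
                feature_tags.append(tag)

    # Emit one feature block per tag, with a script section for each
    # script that uses that feature.
    for tag in feature_tags:
        lines.append(f"feature {tag} {{")
        for script, features in script_features.items():
            if tag not in features:
                continue
            lines.append(f"    script {script};")
            for name in features[tag]:
                lines.append(f"    lookup {name};")
        lines.append(f"}} {tag};")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_feature_text(FONT_PLAN))
```

Keeping the plan as data makes it easy to see that the same lookups are reused under both script tags while the feature sets differ.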

The two sets of lookups are very similar even though the OT feature
sets are different. This required some experimentation and I had to
modify a few things in the font and include some additional ligatures to
get it to work. The key thing is that Adobe's OpenType engine recognises
lookups under the DFLT script tag (must be capitalized) and applies them
irrespective of the Unicode range of the initial characters. MS
Uniscribe totally ignores features and lookups under the DFLT script tag
and applies script-specific sets of features to specific Unicode ranges.
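
As a toy model of the difference just described (not either engine's real API; the function name and engine labels are made up for illustration), the script-tag selection could be pictured like this in Python. Only the Tibetan block, U+0F00 to U+0FFF, is modelled.

```python
# Unicode Tibetan block: U+0F00..U+0FFF.
TIBETAN_BLOCK = range(0x0F00, 0x1000)


def script_tag_used(engine, first_char):
    """Return the script tag whose lookups the engine would apply.

    Models only the behaviour reported in this message:
    - "adobe": always uses lookups under DFLT, whatever the character.
    - "uniscribe": ignores DFLT and picks a script-specific tag from the
      character's Unicode range (only Tibetan is modelled; other ranges
      return None, meaning "some other script tag, not modelled here").
    """
    if engine == "adobe":
        return "DFLT"
    if engine == "uniscribe":
        return "tibt" if ord(first_char) in TIBETAN_BLOCK else None
    raise ValueError(f"unknown engine: {engine!r}")


if __name__ == "__main__":
    ka = "\u0F40"  # TIBETAN LETTER KA
    print(script_tag_used("adobe", ka), script_tag_used("uniscribe", ka))
```

This is why the font needs both sets of lookups: each engine reaches the font through a different script tag.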

My guess is that one can get OpenType fonts for other complex scripts
working in InDesign and other Adobe apps like this - without waiting for
Adobe to add specific support for your script.

Interestingly InDesign CS3 already recognises the correct line break
opportunities for Tibetan.

With best regards.

- Chris

Ruvan Weerasinghe wrote:
> We left this discussion some time ago, but I thought let's start a
> community kind of pressure group for getting Complex Script support in
> Adobe.
>
> Please don't dismiss this as a fruitless exercise - let's try it!
>
> Here's the URL: http://www.adobeforums.com/webx/.59b54384
>
> You need to register - I did with my gmail address, but you seem to be
> able to put even a fake address since there is no authentication. But
> maybe, just maybe, Adobe may reply to me even if not in this public forum...
>
> Please do this - it takes 5 mins - and get others in the region to do so...
>
> Ruvan.

#446 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Thu Apr 16, 2009 9:18 am
Subject: 3 PhD positions in Natural Language Processing and Visualization
sarmad001
 


Institute for Natural Language Processing, University of Stuttgart and Computer Science Department, University of Stuttgart GERMANY

The Institute for Natural Language Processing (IfNLP) and the Computer Science Department of the University of Stuttgart, Germany, invite applications for three PhD positions.

IfNLP is one of the leading NLP research institutions worldwide with four professors in different areas of NLP, a research staff of 40 and an undergraduate program in NLP. We offer the opportunity to work on cutting-edge research projects in a dynamic and international research team with up-to-date infrastructure and resources.

3 PhD positions are available immediately in two different projects funded by Deutsche Forschungsgemeinschaft.

SEMISUPERVISED COREFERENCE RESOLUTION 
2 PhD positions
Supervisors: Profs. Gunther Heidemann, Hans Kamp, and Hinrich Schuetze

This project will develop interactive visualization methods for the semi-supervised annotation of large amounts of training data for statistical coreference resolution.

INTERACTIVE VISUAL ANALYSIS OF COMPLEX INFORMATION SPACES
1 PhD position
Supervisor: Prof. Hinrich Schuetze


This project will integrate statistical NLP and user-tailored interactive visual exploration methods and apply them to the analysis of patents.

Candidates should have an excellent university degree in a relevant field of study such as computational linguistics or computer science.

To apply, send your CV in PDF format to sabine (at) ims.uni-stuttgart.de by May 15, 2009. Please use the subject line "PhD positions". You should also provide two references.

The University of Stuttgart is committed to increasing the proportion of women in research and teaching. Qualified women are encouraged to apply.





#445 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Wed Apr 15, 2009 12:55 pm
Subject: FW: ICON-2009 CFP: 7th International Conference on NLP
sarmad001
 
FYI

-----Original Message-----
From: Lexical Resource Egroup [mailto:lr_egroup@...]
Sent: Wednesday, April 15, 2009 4:13 PM
To: undisclosed-recipients:
Subject: ICON-2009 CFP: 7th International Conference on NLP

              ICON-2009: 7th INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING
                                    Hyderabad, India

                                 14 - 17 December 2009

                                     Organized by
                                 NLP Association, India
                    Central Institute of Indian Languages, Mysore, India
              International Institute of Information Technology, Hyderabad, India
                        University of Hyderabad, Hyderabad, India


                                 FIRST CALL FOR PAPERS

The seventh International Conference on Natural Language Processing
(ICON 2009) will be held at University of Hyderabad, Hyderabad, India during
14-17 December, 2009. The ICON conference series is a forum for promoting
interaction among researchers in the field of Natural Language Processing
in India and abroad. This year ICON-2009 will be co-located with the 31st All
India Conference of Linguistics (AICL31).

1. TOPICS:
Papers are invited on substantial, original and unpublished research on all
aspects of Natural Language Processing, with a particular focus on languages,
issues, and applications relevant to India. The areas of interest include, but
are not limited to:

         Morphology                  Parsing
         Phonology                   Word Sense Disambiguation
         Syntax                      Machine Translation
         Semantics                   Information Retrieval
         Discourse                   Text Summarization
         Pragmatics                  Question Answering
         Statistical methods         Dialog Systems
         Knowledge-based methods     Performance Evaluation
         Annotated Corpora           Speech Corpora
         Lexical Resources           Speech Recognition
         Ontology                    Speech Synthesis
         POS tagging

1.1 SPECIAL TRACK:
ICON-2009 will also have a separate track on Linguistic models in NLP. Purely
theoretical papers in linguistics which have implications for computational
linguistics are also invited under this track.


2. FORMAT OF SUBMISSION:
Papers in English, not exceeding 10 pages, should be submitted at
www.iiit.ac.in/icon2009. One can also submit hard copy (four copies) under
special circumstances. Papers should include an abstract of about 100-200
words. Papers outside the specified length are subject to rejection without
review. Please see the style file at
www.aclweb.org/downloads/acl-ftp/Styfiles/Proceedings/

BLIND REVIEW:
Kindly ensure that authors' names and affiliations are given only on a separate
cover sheet. Papers in electronic form can be in plain text, Postscript, PDF,
Latex or Microsoft Word (RTF only). If your paper contains text of languages
other than English, please attach relevant font files along with your
submission.

3. CALL FOR TUTORIALS/WORKSHOPS:
Proposals are invited for pre-conference tutorials. Tutorials/Workshops can be
of half-day or full-day duration. The proposal should be presented in the form
of a 200-word abstract, one page topical outline of the content, description of
the proposers and their qualifications relating to the tutorial content.

Send tutorial/Workshop proposals to the ICON-2009 Secretariat. For further
information, please refer to the Conference URL or contact the ICON-2009
Secretariat.

4. NLP TOOLS CONTEST: Parsing
Efficient Indian language (IL) parsing still remains an open problem. One
reason for this is the unavailability of annotated corpora to experiment. This
contest aims to bring together researchers working/interested in the area of IL
parsing to explore techniques that can improve the present accuracies, by
providing sufficient annotated data. This contest, however, will only focus on
dependency parsing.

CONTEST:
Participants will be provided training, development and testing data to report
the efficiency of their dependency parsers. Languages such as Hindi, Bangla,
Oriya, Marathi, etc. will be explored. Parser efficiency will be measured in
terms of standard measures such as Unlabelled attachment accuracy and Labelled
attachment accuracy. Shortlisted candidates will present their techniques and
results in a special session at ICON. Note that it is not necessary that
the participating parser be a statistical one; other types such as grammar
driven parsers or hybrid parsers can also participate.

The Contest will have two prizes :
     FIRST PRIZE: Rs.7500/-
     SECOND PRIZE: Rs.5000/-

Resources: To be Announced

5. STUDENT PAPER COMPETITION IN LANGUAGE TECHNOLOGIES:
ICON-2009 announces STUDENT PAPER COMPETITION in two tracks:
     Track I : NLP (All areas)
     Track II : Linguistics (Morphology, Syntax and Semantics)

Papers may be submitted under the link on the web page. Prizes will be awarded
in each track for up to two papers based on original work carried out. The
prizes are:

     FIRST PRIZE: Rs.7500/-
     SECOND PRIZE: Rs.5000/-

The short-listed papers in each track will be invited for presentation in a
special session in ICON-2009 conference. Registration, domestic travel and
subsistence expenses will be provided by the conference organizers for one
author of each paper. Up to two winners will be offered summer fellowships at
major NLP Centres in India. For any clarifications, contact Student Paper
Competition Chair on icon09student@....

6. IMPORTANT DATES:
       Paper registration deadline                   Aug 1, 2009
       Paper submission deadline                     Aug 1, 2009
       Paper acceptance notification                 Sep 15, 2009
       Camera ready copy due                         Oct 20, 2009

       Tutorial/Workshop proposals due               Aug 7, 2009
       Tutorial acceptance notification              Aug 16, 2009
       Lecture Materials for tutorial                Nov 15, 2009

       NLP Tools Contest registration deadline       Jul 21, 2009
       Student Paper Competition submission deadline Aug 7, 2009


7. COMMITTEES:

     ADVISORY COMMITTEE CHAIR
       Aravind K Joshi, University of Pennsylvania, USA

     CONFERENCE GENERAL CHAIR
       Rajeev Sangal, IIIT, Hyderabad, India

     PROGRAMME COMMITTEE CHAIR
       Dipti Misra Sharma, IIIT, Hyderabad
       Vasudeva Varma, IIIT, Hyderabad

     TOOLS CONTEST CHAIR
       Samar Husain, IIIT, Hyderabad

     STUDENT PAPER COMPETITION CHAIR
       Sudeshna Sarkar, IIT, Kharagpur

     ORGANIZING CHAIR
       R K Bagga, IIIT, Hyderabad, India
       Ashish Jacob Thomas, University of Hyderabad

8. CONTACT INFORMATION
     ICON-2009 Secretariat
     Language Technologies Research Centre
     International Institute of Information Technology
     Gachibowli, Hyderabad - 500 032, India
     Ph: +91-40-2300 1412; Fax: +91-40-6653 1413
     e-mail: icon2009@...
     www.iiit.ac.in/icon2009

#444 From: pema choejey <pema_psg@...>
Date: Mon Mar 9, 2009 4:05 am
Subject: Re: [PAN Localization] Putting pressure on Adobe... the community way
pema_psg
 
I agree with this; we need to do something to convince Adobe to support complex scripts in their products. But one country cannot do it alone: all Asian countries must put up a united front and start voicing our issues. Adobe may refuse to hear us, but for how long?

--- On Fri, 1/23/09, Ruvan Weerasinghe <arw@...> wrote:
From: Ruvan Weerasinghe <arw@...>
Subject: Re: [PAN Localization] Putting pressure on Adobe... the community way
To: "Firoj Alam" <firojalam04@...>
Cc: PANLocalization@yahoogroups.com
Date: Friday, January 23, 2009, 5:08 AM

thanks firoj and others who've responded. but surely we can do better? just how many of us *are* there on this list? and if each of us can send it to our own country folk, we'd have much more than 6 responses by now.

mind you, it doesn't seem like anyone at adobe is reading it! but let's first show them that we have the numbers... anyone care to post the populations of the language groups that we are talking about here... from ethnologue for instance?


Firoj Alam wrote:

Dear Ruvan,
Thanks. I'm eager to do it. I talked to the Quark team a couple of months ago about getting complex script support in Quark. Their reply made it clear that they are not interested in it. Though all of us know about its importance, I want to add something. All of our print media use Adobe products and, especially, QuarkXPress for publishing. Converting their data into Unicode would be an additional cost for them. That's why most publishing houses don't convert their data into Unicode. All of them keep their data in ASCII format, which is not a good idea.
So everybody should talk about it to reduce this technological limitation. I don't think we have any better solution than this.

Regards
Firoj


From: Ruvan Weerasinghe <arw@.... ac.lk>
To: PANLocalization@yahoogroups.com
Sent: Friday, 16 January, 2009 11:41:25
Subject: [PAN Localization] Putting pressure on Adobe... the community way

We left this discussion some time ago, but I thought let's start a community kind of pressure group for getting Complex Script support in Adobe.

Please don't dismiss this as a fruitless exercise - let's try it!

Here's the URL: http://www.adobeforums.com/webx/.59b54384

You need to register - I did with my gmail address, but you seem to be able to put even a fake address since there is no authentication. But maybe, just maybe, Adobe may reply to me even if not in this public forum...

Please do this - it takes 5 mins - and get others in the region to do so...

Ruvan.




#443 From: pema choejey <pema_psg@...>
Date: Mon Mar 9, 2009 3:24 am
Subject: Re: [PAN Localization] Bangla text-to-speech and OCR launch at BRAC U on 19/02/09 3pm
pema_psg
 
Congratulations to the Bangla team on the official launch of the TTS and OCR. Hoping to learn from your experiences.


Pema C






#442 From: "Sarmad Hussain" <sarmad.hussain@...>
Date: Fri Feb 20, 2009 3:31 am
Subject: FW: [Mt-list] Call for Software Demos at ACL-IJCNLP-09
sarmad001
 
FYI
-----Original Message-----
From: mt-list-bounces@... [mailto:mt-list-bounces@...] On Behalf
Of away@...
Sent: Thursday, February 19, 2009 8:16 PM
To: mt-list@...
Cc: kanmy@...; gblee@...
Subject: [Mt-list] Call for Software Demos at ACL-IJCNLP-09

CALL FOR SOFTWARE DEMONSTRATIONS
at the joint conference of the 47th Annual Meeting of the ACL and the 4th IJCNLP

Suntec, Singapore
August 2-7, 2009

http://isoft.postech.ac.kr/~hernus/acl_demo/

Deadline for submission: March 30, 2009

-------------------------------------------------------------------------

The ACL-IJCNLP-09 Program Committee invites proposals for the
Demonstrations Program. We encourage both the submission of early
research prototypes and interesting mature systems. Commercial sales and
marketing activities are not appropriate in the Demonstrations Program,
and should be arranged as part of the Exhibit Program.


Areas of Interest

Areas of interest include all topics related to theoretical and
applied computational linguistics, such as (but not limited to) the topics
listed for the conference paper submission:
http://www.acl-ijcnlp-2009.org/main/callforpapers.html

The systems may be of the following kinds:
* Natural Language Processing systems or system components
* Application systems using language technology components
* Software tools for computational linguistics research
* Software for demonstration or evaluation
* Development tools


Format for Submissions

Demo proposals consist of the following parts, which should all be sent to
the Demo Chairs. Please use the main ACL-IJCNLP paper
formatting guidelines.

* An extended abstract of the technical content to be demonstrated,
  including title, authors, full contact information, references, and
  acknowledgements.

* A "script outline" of the demo presentation, including accompanying
  narrative, and either a Web address for accessing the demo or visual
  aids (e.g., screenshots, snapshots, or diagrams).

* A detailed description of the hardware, software and internet
  service requirements expected to be provided by the local organizer.

The entire proposal should not be more than four pages.


Submissions Procedure

The deadline for proposals is March 30, 2009. Submissions must be
received electronically. Please submit your proposals and any
inquiries to (both chairs):

Gary Geunbae Lee (gblee@...) and
Sabine Schulte im Walde (schulte@...)

Submissions will be evaluated on the basis of their relevance to
computational linguistics, innovation, scientific contribution,
presentation, as well as potential logistical constraints.

Accepted submissions will be allocated a maximum of four pages in the Companion
Volume to the Proceedings of the Conference.


Demonstrations Chairs

Gary Geunbae Lee (Pohang University of Science and Technology)
Sabine Schulte im Walde (University of Stuttgart)


Demonstrations Program Committee

Paul Buitelaar (DFKI, Germany)
Massimiliano Ciaramita (Google, Switzerland)
Sadao Kurohashi (Kyoto University, Japan)
Ee-Peng Lim (Singapore Management University, Singapore)
Dekang Lin (Google, USA)
Jong Park (KAIST, Korea)
Ted Pedersen (University of Minnesota, USA)
Dan Roth (University of Illinois at Urbana-Champaign, USA)
Ming Zhou (MSRA, China)
Heike Zinsmeister (University of Konstanz, Germany)





_______________________________________________
Mt-list mailing list



#441 From: Mumit Khan <khan@...>
Date: Mon Feb 16, 2009 4:48 pm
Subject: Bangla text-to-speech and OCR launch at BRAC U on 19/02/09 3pm
mumit_khan
 
We are pleased to announce the first official release of our Bangla language processing software packages “Katha” (text-to-speech) and BanglaOCR (optical character recognition). We invite you to join us in celebrating this occasion on February 19, 2009 at 3 pm at BRAC University. See the computer create Bangla Unicode text from scanned images and then read out the text. Meet the people behind it.

We have come a long way, but we have an even longer way to go.

The TTS and OCR run on Linux, Windows and Mac OS X. There is also a web-enabled front-end for the TTS (and one under development for the OCR), making these tools available at any time and from anywhere. We are working on better integration with screen readers in collaboration with the vision impaired community.

The Bangla language processing tools developed at CRBLP are free and open source software, released under the GNU General Public License v2, and supported by IDRC's PAN Localization Project and BRAC University. Please visit the CRBLP website http://www.bracu.ac.bd/research/crblp/ for more information on who we are and what we do.

Location: 203 Aarong House, BRAC University, 66 Mohakhali C/A
(mention "CRBLP seminar")
Time: 3 pm

-- 

Mumit Khan, Ph.D.

Professor of Computer Science and Engineering

Head, Center for Research on Bangla Language Processing

BRAC University, Dhaka, Bangladesh

mumit@...

+(880-2) 882-4051 Extension 4019




#440 From: Ruvan Weerasinghe <arw@...>
Date: Sun Jan 25, 2009 4:32 pm
Subject: [Fwd: Computational Linguistics Journal Goes Open Access]
arweerasinghe
 
hi all,

this is the premier journal in computational linguistics... and now it is free! great news for our part of the world...

regards,

ruvan.


-------- Original Message --------
Subject: [Corpora-List] Computational Linguistics Journal Goes Open Access
Date: Thu, 22 Jan 2009 16:53:11 +1100
From: Robert Dale <rdale@...>
Reply-To: rdale@...
Organization: Macquarie University
To: Corpora List <CORPORA@...>


During the last few years, although the Computational Linguistics journal
has been a toll-access journal, we have endeavoured to make all published
papers freely available via the ACL Anthology at
http://aclweb.org/anthology-new/ after a one-year embargo period.

Thanks to the support of the Association for Computational Linguistics, as
of January 1st 2009, the journal is now completely open access and
electronic only.  Articles may now be freely downloaded from the MIT Press
website at http://www.mitpressjournals.org/loi/coli as soon as they are
published.  All past issues are also freely available from this location.

Robert Dale
CL Editor


_______________________________________________
Corpora mailing list
Corpora@...
http://mailman.uib.no/listinfo/corpora


#439 From: Ruvan Weerasinghe <arw@...>
Date: Fri Jan 23, 2009 1:08 pm
Subject: Re: [PAN Localization] Putting pressure on Adobe... the community way
arweerasinghe
 
thanks firoj and others who've responded. but surely we can do better? just how many of us *are* there on this list? and if each of us can send it to our own country folk, we'd have much more than 6 responses by now.

mind you, it doesn't seem like anyone at adobe is reading it! but let's first show them that we have the numbers... anyone care to post the populations of the language groups that we are talking about here... from ethnologue for instance?


Firoj Alam wrote:
Dear Ruvan,
Thanks. I'm eager to do it. A couple of months ago I talked to the Quark team about getting complex script support in Quark. Their reply made it clear that they are not interested in it. All of us know why this is important, but I want to add something. All of our print media use Adobe products for publishing, and especially QuarkXPress. Converting their data into Unicode is an extra cost for them, which is why most publishing houses don't convert it. All of them keep their data in ASCII format, which is not a good idea.
So everybody should speak up about this to reduce this technological limitation. I don't think we have any better solution than this.

Regards
Firoj


From: Ruvan Weerasinghe <arw@...>
To: PANLocalization@yahoogroups.com
Sent: Friday, 16 January, 2009 11:41:25
Subject: [PAN Localization] Putting pressure on Adobe... the community way

We left this discussion some time ago, but I thought we could start a community pressure group to get complex script support in Adobe products.

Please don't dismiss this as a fruitless exercise - let's try it!

Here's the URL: http://www.adobeforums.com/webx/.59b54384

You need to register - I did with my gmail address, but it seems you can even enter a fake address since there is no authentication. But maybe, just maybe, Adobe may reply to me even if not in this public forum...

Please do this - it takes 5 mins - and get others in the region to do so...

Ruvan.


