i18n-prog · Discussion of Internationalization programming issues (i18n)

#2044 From: Martin Wunderlich <martin_wu@...>
Date: Sun Oct 5, 2008 7:45 pm
Subject: Re[2]: Java question: How to construct a ResourceBundle from a file
wundiman
 
Hi Phil,

Thanks a lot for the reply!

> Martin
>  
> So, if I understand, you'd like to be able to open and read a bunch
> of properties files and determine their locale at runtime just
> before writing out the TMX with associated lang attributes?

That's right.

>  
> Can you determine the locale from the properties' file name
> (Volumes/users/me/myBundle_fr_FR.properties)? Then use either the
> PropertyResourceBundle or Properties classes?

I could get at the locale ID that way, but there is no way to set the
locale on a PropertyResourceBundle. At least no obvious way, like a
.setLocale(string locID) method.

>  
> If you're just using these classes to get over the encoding in the
> properties files it might be easier to use native2ascii to convert
> them all first to Unicode and then use some of the suggestions below:
>  
> 1. Regular Expressions
> 2. Alchemy Catalyst
> 3. Rainbow


That sounds like an option, even though I'd rather have a dedicated
tool.
I also checked the source code of OmegaT,
which has a filter for .properties. They are just parsing the
.properties file as a text file and doing the conversion on a
character level. I was hoping there'd be an easier way.

The Maxprograms tool PropertiesViewer somehow does it. I wonder how.

>  
> I don't know for sure if 2 or 3 can help but I think they can. I
> can also recommend a good LSP whose engineering team might be able to help.

:-))

Cheers,

Martin

>  
> Phil Ritchie.

> --- On Sun, 5/10/08, Martin Wunderlich <martin_wu@...> wrote:

> From: Martin Wunderlich <martin_wu@...>
> Subject: [i18n-prog] Java question: How to construct a ResourceBundle from a file
> To: "Ian Davies" <i18n-prog@yahoogroups.com>
> Date: Sunday, 5 October, 2008, 6:58 PM






> Hi all,

> I hope someone on this list might know the answer to the following
> question:
> I am creating a program to process ResourceBundles and do stuff to
> the keys and values (to be more precise, I want to construct a TMX
> file from a bunch of existing translations in .properties files).

> I have come across the problem that there is no constructor for the
> ResourceBundle class that would take a file
> name as input. The only way I can create ResourceBundles, it seems,
> is by referring to the .properties file with a fully qualified
> class name (e.g. com.example.MyBundle) as the base name.

> Any ideas how I can get a working ResourceBundle object by passing in
> the file name (e.g. "/Volumes/users/me/myBundle.properties")?? I can
> create a PropertyResourceBundle object from a FileInputStream
> alright, but when I call .getLocale() on that PropertyResourceBundle,
> I only get null.

> Any help would be very much appreciated.

> Cheers,

> Martin


--
----------------------------------------------------------
Martin Wunderlich, M.A.
Translation/Localisation EN <-> DE

  www.martinwunderlich.com
----------------------------------------------------------
Free / open-source software for translation/localisation:
  www.martinwunderlich.com/foss-links.html
----------------------------------------------------------
Random aphorism:
  "Aquela reinspiração, sem a qual traduzir é apenas parafrasear noutra
língua."
"That new inspiration - without which to translate merely means to paraphrase
into another language."
  - Fernando Pessoa
----------------------------------------------------------
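
For the TMX step discussed in this message, a minimal sketch of writing one translation unit per key, with an xml:lang attribute per language, might look like the following. It assumes the translations have already been read into one map per language tag. The class name TmxSketch, the method toTmx and all header attribute values are placeholders and not anything from the thread, and the StAX writer that ships with the JDK is only one possible way to produce the XML.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class TmxSketch {

    /**
     * Writes one <tu> per key of the source language, with one
     * <tuv xml:lang="..."> for every language that has a value for that key.
     * byLang maps a language tag (e.g. "fr-FR") to that language's key/value pairs.
     */
    static String toTmx(String srcLang, Map<String, Map<String, String>> byLang)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("tmx");
        w.writeAttribute("version", "1.4");

        // Header values are placeholders; adjust them for a real tool.
        w.writeEmptyElement("header");
        w.writeAttribute("creationtool", "TmxSketch");
        w.writeAttribute("creationtoolversion", "0.1");
        w.writeAttribute("segtype", "paragraph");
        w.writeAttribute("o-tmf", "properties");
        w.writeAttribute("adminlang", "en");
        w.writeAttribute("srclang", srcLang);
        w.writeAttribute("datatype", "plaintext");

        w.writeStartElement("body");
        for (String key : byLang.get(srcLang).keySet()) {
            w.writeStartElement("tu");
            w.writeAttribute("tuid", key);
            for (Map.Entry<String, Map<String, String>> lang : byLang.entrySet()) {
                String text = lang.getValue().get(key);
                if (text == null) {
                    continue; // this language has no translation for the key
                }
                w.writeStartElement("tuv");
                w.writeAttribute("xml:lang", lang.getKey());
                w.writeStartElement("seg");
                w.writeCharacters(text); // the writer escapes &, < and >
                w.writeEndElement(); // seg
                w.writeEndElement(); // tuv
            }
            w.writeEndElement(); // tu
        }
        w.writeEndElement(); // body
        w.writeEndElement(); // tmx
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        Map<String, String> en = new LinkedHashMap<>();
        en.put("greeting", "Hello");
        Map<String, String> fr = new LinkedHashMap<>();
        fr.put("greeting", "Bonjour");

        Map<String, Map<String, String>> byLang = new LinkedHashMap<>();
        byLang.put("en", en);
        byLang.put("fr-FR", fr);
        System.out.println(toTmx("en", byLang));
    }
}
```

The sketch only walks the keys of the source language, so keys that exist in a translation but not in the source file are skipped. Whether that is the right behaviour depends on the data.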

#2043 From: Phil Ritchie <endigitalmind@...>
Date: Sun Oct 5, 2008 6:43 pm
Subject: Re: Java question: How to construct a ResourceBundle from a file
endigitalmind
 
Martin
 
So, if I understand, you'd like to be able to open and read a bunch of properties files and determine their locale at runtime just before writing out the TMX with associated lang attributes?
 
Can you determine the locale from the properties' file name (Volumes/users/me/myBundle_fr_FR.properties)? Then use either the PropertyResourceBundle or Properties classes?
 
If you're just using these classes to get over the encoding in the properties files it might be easier to use native2ascii to convert them all first to Unicode and then use some of the suggestions below:
 
1. Regular Expressions
2. Alchemy Catalyst
3. Rainbow
 
I don't know for sure if 2 or 3 can help but I think they can. I can also recommend a good LSP whose engineering team might be able to help.
 
Phil Ritchie.

--- On Sun, 5/10/08, Martin Wunderlich <martin_wu@...> wrote:
From: Martin Wunderlich <martin_wu@...>
Subject: [i18n-prog] Java question: How to construct a ResourceBundle from a file
To: "Ian Davies" <i18n-prog@yahoogroups.com>
Date: Sunday, 5 October, 2008, 6:58 PM

Hi all,

I hope someone on this list might know the answer to the following
question:
I am creating a program to process ResourceBundles and do stuff to
the keys and values (to be more precise, I want to construct a TMX
file from a bunch of existing translations in .properties files).

I have come across the problem that there is no constructor for the ResourceBundle class that would take a file
name as input. The only way I can create ResourceBundles, it seems,
is by referring to the .properties file with a fully qualified class name (e.g. com.example.MyBundle) as the base name.

Any ideas how I can get a working ResourceBundle object by passing in
the file name (e.g. "/Volumes/users/me/myBundle.properties")?? I can
create a PropertyResourceBundle object from a FileInputStream
alright, but when I call .getLocale() on that PropertyResourceBundle,
I only get null.

Any help would be very much appreciated.

Cheers,

Martin
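
The suggestion in this reply, taking the locale from the file name and the content from PropertyResourceBundle, could be sketched roughly as below. This is only an illustration under two assumptions: the files follow the usual base_language_COUNTRY.properties naming, and the base name itself contains no underscore. The class LocaleFromFileName and its methods are made-up names.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.PropertyResourceBundle;

final class LocaleFromFileName {

    /**
     * "myBundle_fr_FR.properties" gives fr_FR, "myBundle.properties" gives Locale.ROOT.
     * Assumes the base name has no underscore of its own.
     */
    static Locale parse(String fileName) {
        String stem = fileName.replaceFirst("\\.properties$", "");
        int sep = stem.indexOf('_');
        if (sep < 0) {
            return Locale.ROOT;
        }
        // File suffixes use '_' where BCP 47 language tags use '-'.
        return Locale.forLanguageTag(stem.substring(sep + 1).replace('_', '-'));
    }

    /** Reads the key/value pairs; the locale is tracked separately via parse(). */
    static PropertyResourceBundle load(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            return new PropertyResourceBundle(in);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        Locale locale = parse(file.getName());
        PropertyResourceBundle bundle = load(file);
        System.out.println(locale.toLanguageTag() + ": " + bundle.keySet().size() + " keys");
    }
}
```

Locale.toLanguageTag() returns a hyphenated tag such as fr-FR, which is the form normally used for language attributes in XML.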



#2042 From: Martin Wunderlich <martin_wu@...>
Date: Sun Oct 5, 2008 5:58 pm
Subject: Java question: How to construct a ResourceBundle from a file
wundiman
 
Hi all,

I hope someone on this list might know the answer to the following
question:
I am creating a program to process ResourceBundles and do stuff to
the keys and values (to be more precise, I want to construct a TMX
file from a bunch of existing translations in .properties files).

I have come across the problem that there is no constructor for the
ResourceBundle class that would take a file
name as input. The only way I can create ResourceBundles, it seems,
is by referring to the .properties file with a fully qualified class name (e.g.
com.example.MyBundle) as the base name.

Any ideas how I can get a working ResourceBundle object by passing in
the file name (e.g. "/Volumes/users/me/myBundle.properties")?? I can
create a PropertyResourceBundle object from a FileInputStream
alright, but when I call .getLocale() on that PropertyResourceBundle,
I only get null.

Any help would be very much appreciated.

Cheers,

Martin
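
One workaround that is sometimes used for the problem described in this message is to hand ResourceBundle.getBundle a class loader that points at the directory holding the .properties files. The sketch below assumes that approach; BundleFromDirectory and load are hypothetical names, and the base name passed in is the plain file stem (e.g. "myBundle"), not a package-qualified name.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Locale;
import java.util.ResourceBundle;

final class BundleFromDirectory {

    /**
     * Looks up baseName[_locale].properties inside dir using the normal
     * ResourceBundle search, so getLocale() on the result is populated.
     */
    static ResourceBundle load(File dir, String baseName, Locale locale)
            throws MalformedURLException {
        URL[] urls = { dir.toURI().toURL() };
        // The loader is deliberately left open so that later lookups still work.
        ClassLoader loader = new URLClassLoader(urls);
        return ResourceBundle.getBundle(baseName, locale, loader);
    }

    public static void main(String[] args) throws MalformedURLException {
        // Usage: java BundleFromDirectory /path/to/dir myBundle fr-FR
        ResourceBundle bundle =
                load(new File(args[0]), args[1], Locale.forLanguageTag(args[2]));
        System.out.println("Loaded bundle for locale: " + bundle.getLocale());
    }
}
```

Because the standard fallback rules apply, getLocale() reports the locale of the file that was actually found, which can be less specific than the one requested.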

#2041 From: "Ian Davies" <ianbingofish@...>
Date: Wed Sep 10, 2008 5:34 pm
Subject: Olsen Database DST issues
ianbingofish
 
I have some issues with implementing DST rules from the Olson DB via
the data fetched from JRE 6.0.

Olson says that the DST change in Central Europe is at 1am; however, in
Germany and all other countries in this time zone, convention says it
is always at 3am, when the time moves back to 2am.

Central European Time [Europe/Paris, UTC+1hrs]
Olson/JRE DST end rule >>>> Last Sunday of Oct at 0100 hrs
Microsoft DST end rule >>>> 26/10/2008 03:00 (apparently correct also)

Is the Olson DB in error here, or are the Olson times
actually in UTC?  That would explain it, since at that time UTC 0100 would in
fact be CEST 0300, the point at which the clocks are rolled back one hour to CET.

If not, do people take the Olson DB and modify it for their
implementations?  Linux/OSX etc.?
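
On the UTC question: the tz (Olson) source data gives the EU rule time as 1:00u, where the u suffix means UTC, so 0100 in the rule corresponds to 0300 local summer time on the wall clock. One way to check what a Java runtime makes of that rule is sketched below. It uses java.time, which arrived in Java 8 and so postdates the JRE 6 mentioned in this message; the class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;

final class DstTransitionSketch {

    /** Returns the first offset change in the given zone after the given UTC instant. */
    static ZoneOffsetTransition nextTransition(String zone, String fromUtc) {
        return ZoneId.of(zone).getRules().nextTransition(Instant.parse(fromUtc));
    }

    public static void main(String[] args) {
        ZoneOffsetTransition t = nextTransition("Europe/Paris", "2008-10-01T00:00:00Z");
        // The same moment, expressed in UTC and as local wall-clock time.
        System.out.println("UTC instant : " + t.getInstant());
        System.out.println("Local before: " + t.getDateTimeBefore() + " " + t.getOffsetBefore());
        System.out.println("Local after : " + t.getDateTimeAfter() + " " + t.getOffsetAfter());
    }
}
```

For the autumn 2008 change the instant is 01:00 UTC on 26 October, the local time just before it is 03:00 at offset +02:00, and just after it is 02:00 at offset +01:00, so the two rules quoted in the message describe the same moment.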

#2040 From: Caroline Cox <carolinecox@...>
Date: Tue Aug 26, 2008 8:26 pm
Subject: Searching for Internationalization QA Engineer in Bay Area, CA
linerrodgers
 
Acclaro, Inc. (www.acclaro.com) is looking for an Internationalization QA Engineer for a contract position in the Bay Area. This will be an experienced quality engineer that clearly understands the international issues that occur in software products.
The I18n Quality Engineer can perform white box testing as well as black box testing on products and globalization technologies. He/she can write thorough test plans and scripts to uncover international issues and identify international defects in software products.  
Responsibilities include making sure the product meets the quality expectations of international end-user and the company standard; review and implement test plans for the localization testing team; develop some ad-hoc tools that would facilitate the automation of the localization testing processes; help resolve internationalization or process issues; manage testing environment for specific localization projects; etc.
We are looking for someone with 5 years of experience in software industry, with a strong knowledge of character encoding systems; able to work independently to identify issues, define architecture, and respect tasks; and with very good knowledge of Localization and Internationalization processes.
If interested, please send a cover letter and resume to ccox@...

#2039 From: "bai.sara" <bai.sara@...>
Date: Wed Jul 30, 2008 3:51 pm
Subject: java exams ?
bai.sara
 
I am looking to sit the Sun Java exams and
was wondering how good the hotcerts.com prep materials for the Java exams are.
Please let me know before I shell out $85 for their exams.

#2038 From: "Tim Greenwood" <timothy@...>
Date: Thu Jul 3, 2008 4:14 pm
Subject: Re: French data sorting
timgreenwood
 
Hi Anuj,

The definitive works on this were written by Alain LaBonté, whose
personal page is on http://cyberiel.iquebec.com/. I used to have a
paper copy of the standard defining it all, but can no longer locate
it. It may have been ISO 14651, which I see is now withdrawn.

Michael Kaplan has some material on this on
http://blogs.msdn.com/michkap/archive/2004/12/31/344739.aspx

Tim



On Thu, Jul 3, 2008 at 6:37 AM, Anuj Magazine <amagazine@...> wrote:
> Hi, I had a question regarding the French sorting rules. Can somebody tell
> me or point me to a resource where i can get information on algorithm used
> for sorting the French data and characters ?
>
> Regards,
> Anuj
>

#2037 From: "Yves Savourel" <ysavourel@...>
Date: Thu Jul 3, 2008 3:51 pm
Subject: RE: Digest Number 832
yves_savourel
 
Hello Anuj,

> Hi, I had a question regarding the French sorting rules. Can
> somebody tell me or point me to a resource where i can get
> information on algorithm used for sorting the French data and
> characters ?

There is the ICU Locale Explorer that has some information:
http://demo.icu-project.org/icu-bin/locexp?d_=en&_=fr

But I think it is more complicated than just using an ordered character table.
You have to separate diacritics from the base letters
and order their weights in reverse order ...or something like that.

I haven't worked on any of this in a long time, so maybe something like the
blog entry from Michka here:
http://blogs.msdn.com/michkap/archive/2004/12/31/344739.aspx will give you more
useful information.

Hope this helps,
Kenavo,
-yves
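
The idea mentioned in this reply, comparing base letters first and then the diacritics in reverse order, can be illustrated with a small hand-written comparator. This is a simplified sketch and not the ICU or ISO 14651 algorithm: it ignores case and punctuation levels, and it orders different accents by code point, which is arbitrary. BackwardAccentSort and its members are hypothetical names.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class BackwardAccentSort {

    /** Base letters plus, for each base letter, the combining marks attached to it. */
    private static final class Key {
        final StringBuilder base = new StringBuilder();
        final List<String> marks = new ArrayList<>();

        Key(String s) {
            // NFD splits e.g. U+00E9 into 'e' followed by a combining acute accent.
            String nfd = Normalizer.normalize(s.toLowerCase(Locale.ROOT), Normalizer.Form.NFD);
            for (int i = 0; i < nfd.length(); i++) {
                char c = nfd.charAt(i);
                if (Character.getType(c) == Character.NON_SPACING_MARK) {
                    if (!marks.isEmpty()) {
                        int last = marks.size() - 1;
                        marks.set(last, marks.get(last) + c);
                    }
                } else {
                    base.append(c);
                    marks.add(""); // one slot per base letter, empty if unaccented
                }
            }
        }
    }

    static final Comparator<String> BACKWARD_ACCENTS = (a, b) -> {
        Key x = new Key(a);
        Key y = new Key(b);
        // Primary level: base letters only, left to right.
        int primary = x.base.toString().compareTo(y.base.toString());
        if (primary != 0) {
            return primary;
        }
        // Secondary level: accents, compared from the END of the word backwards.
        for (int i = x.marks.size() - 1; i >= 0; i--) {
            int secondary = x.marks.get(i).compareTo(y.marks.get(i));
            if (secondary != 0) {
                return secondary;
            }
        }
        return a.compareTo(b); // final tie-break, e.g. for case differences
    };

    static List<String> sorted(String... words) {
        List<String> list = new ArrayList<>(Arrays.asList(words));
        list.sort(BACKWARD_ACCENTS);
        return list;
    }

    public static void main(String[] args) {
        // cote, c(o-circumflex)te, cot(e-acute), c(o-circumflex)t(e-acute)
        System.out.println(sorted("c\u00F4t\u00E9", "cote", "cot\u00E9", "c\u00F4te"));
    }
}
```

With this comparator the four words come out as cote, côte, coté, côté, because the accent nearest the end of the word decides first. Comparing the accents left to right instead would give cote, coté, côte, côté.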

#2036 From: "Anuj Magazine" <amagazine@...>
Date: Thu Jul 3, 2008 10:37 am
Subject: French data sorting
anujmsqm
 
Hi, I have a question regarding the French sorting rules. Can somebody tell me, or point me to a resource that describes, the algorithm used for sorting French data and characters?
 
Regards,
Anuj

#2035 From: deepinder singh <deepindersingh_leo@...>
Date: Thu Jun 26, 2008 1:50 pm
Subject: Re: Translation Memory
deepindersin...
 
TinyTM has released their first development version .01 of TM. You can visit the link for more details http://tinytm.sourceforge.net/

Hope this helps you for a start.

--- On Mon, 23/6/08, Gurpreet Singh <gurpreetssingh@...> wrote:
From: Gurpreet Singh <gurpreetssingh@...>
Subject: [i18n-prog] Translation Memory
To: i18n-prog@yahoogroups.com
Date: Monday, 23 June, 2008, 3:23 PM

hi,
       Can somebody guide me  how to create Translation Memory database.




#2034 From: Gurpreet Singh <gurpreetssingh@...>
Date: Mon Jun 23, 2008 9:53 am
Subject: Translation Memory
gurpreetssingh
 
Hi,
       Can somebody guide me on how to create a Translation Memory database?


#2033 From: Martin Wunderlich <martin_wu@...>
Date: Sun Jun 22, 2008 11:33 am
Subject: Re[2]: How to manage multiple English sites
wundiman
 
Hiya Achim,

Thanks a lot for the quick reply.

I guess I should have provided a
bit more background about the kind of web presence we are talking
about there. It's a large corporate site for an IT company with a
relatively complex product offering. The whole OpenCms structure,
including the localised sites, was conceived before I joined the
company. There are at least a dozen people involved in maintaining the site.

So, what I am trying to say is that starting from scratch is not an
option (at the moment anyway). I have to take what is there and try to
evolve it into something more manageable. I will certainly look into the
mechanisms you describe to see if they can be used somehow.

Cheers,

Martin

> Hi Martin,

> why didn't you use the standard OpenCms localization mechanisms?
> E.g.:

> Create an XML content type containing price information for editors to
> enter along with the text for the page. Create folders in the VFS for the
> languages (en_US, en_GB,...) and set their locale property to this
> value. Add the languages in opencms-system.xml and restart Tomcat. Copy
> the pages as siblings into the other languages. Translate the pages by
> using the language selector in the Editor. This would have been a KIS
> approach. Now what OpenCms does not support in a basic installation is XML
> content nodes that are language-independent. This perhaps could have
> been done by programming your own XML content handler and assigning it in
> the configuration to your own pricing XML content. More info can be
> found on the OpenCms mailing list or the wiki.

> kind regards,

> Achim


> Martin Wunderlich wrote:
>> Hi all,
>>
>> I have a question on managing multiple English sites in a content
>> management system. I hope this is related enough to topics of I18N to
>> allow posting in this forum.
>>
>> We have a multilingual web presence with currently 10 languages. The
>> pages sit in a CMS (OpenCms) from where they are pushed for
>> translation into a translation management system (TMS). These
>> international pages sit in
>> a subfolder for each languages, e.g. www.example.com/de or
>> www.example.com/jp.
>>
>> The tricky thing
>> now, which has been causing us headaches, is that there are also 5
>> distinct English sites: US, UK, EU, Apac and AU. The content on these
>> EN sites is nearly identical, except for things like pricing,
>> contact details and the internal URLs. The TMS is set up in such a
>> way that the URLs are changed automatically when the translated pages
>> are post-processed, e.g. changing a link to www.example.com/somesite
>> is changed to www.example.com/de/somesite.
>>
>> The main issue is around keeping the non-US sites synchronised with
>> the US site (from where the content originates). At the moment, this
>> is a manual procedure. The US don't always notify me when they make
>> changes and consequently the non-US pages are often out of date.
>>
>> The matter is complicated further by the fact that Google doesn't
>> like pages with identical content. They seem to be seen as
>> duplicates, even if they in different languages subfolders, and are
>> not shown. So, a user searching for UK page might be present with the
>> US page.
>>
>> Changing the basic site structure, e.g. moving to one EN site only,
>> is not an option, since the structure is required for search engine
>> optimisation.
>>
>> Has anyone come across a similar situation? Is there something like
>> best practices around managing these English sites? Whatever the
>> solution, I would favour something automated to keep the non-US pages
>> in synch, while at the same time making Google happy.
>>
>> Sorry about the long post. I hope someone has a good idea.
>>
>> Kind regards,
>>
>> Martin


--
----------------------------------------------------------
Martin Wunderlich, M.A.
Translation/Localisation EN <-> DE

  www.martinwunderlich.com
----------------------------------------------------------
Free / open-source software for translation/localisation:
  www.martinwunderlich.com/foss-links.html
----------------------------------------------------------
Random aphorism:
  "Nothing that is worth knowing can be taught."
  - Oscar Wilde
----------------------------------------------------------

#2032 From: Achim Westermann <Achim.Westermann@...>
Date: Sun Jun 22, 2008 10:35 am
Subject: Re: How to manage multiple English sites
achim_wester...
 
Hi Martin,

why didn't you use the standard OpenCms localization mechanisms?
E.g.:

Create an XML content type containing price information for editors to
enter along with the text for the page. Create folders in the VFS for the
languages (en_US, en_GB,...) and set their locale property to this
value. Add the languages in opencms-system.xml and restart Tomcat. Copy
the pages as siblings into the other languages. Translate the pages by
using the language selector in the Editor. This would have been a KIS
approach. Now what OpenCms does not support in a basic installation is XML
content nodes that are language-independent. This perhaps could have
been done by programming your own XML content handler and assigning it in
the configuration to your own pricing XML content. More info can be
found on the OpenCms mailing list or the wiki.

kind regards,

Achim


Martin Wunderlich wrote:
> Hi all,
>
> I have a question on managing multiple English sites in a content
> management system. I hope this is related enough to topics of I18N to
> allow posting in this forum.
>
> We have a multilingual web presence with currently 10 languages. The
> pages sit in a CMS (OpenCms) from where they are pushed for
> translation into a translation management system (TMS). These
> international pages sit in
> a subfolder for each languages, e.g. www.example.com/de or
> www.example.com/jp.
>
> The tricky thing
> now, which has been causing us headaches, is that there are also 5
> distinct English sites: US, UK, EU, Apac and AU. The content on these
> EN sites is nearly identical, except for things like pricing,
> contact details and the internal URLs. The TMS is set up in such a
> way that the URLs are changed automatically when the translated pages
> are post-processed, e.g. changing a link to www.example.com/somesite
> is changed to www.example.com/de/somesite.
>
> The main issue is around keeping the non-US sites synchronised with
> the US site (from where the content originates). At the moment, this
> is a manual procedure. The US don't always notify me when they make
> changes and consequently the non-US pages are often out of date.
>
> The matter is complicated further by the fact that Google doesn't
> like pages with identical content. They seem to be seen as
> duplicates, even if they in different languages subfolders, and are
> not shown. So, a user searching for UK page might be present with the
> US page.
>
> Changing the basic site structure, e.g. moving to one EN site only,
> is not an option, since the structure is required for search engine
> optimisation.
>
> Has anyone come across a similar situation? Is there something like
> best practices around managing these English sites? Whatever the
> solution, I would favour something automated to keep the non-US pages
> in synch, while at the same time making Google happy.
>
> Sorry about the long post. I hope someone has a good idea.
>
> Kind regards,
>
> Martin

#2031 From: Martin Wunderlich <martin_wu@...>
Date: Sun Jun 22, 2008 10:04 am
Subject: How to manage multiple English sites
wundiman
 
Hi all,

I have a question on managing multiple English sites in a content
management system. I hope this is related enough to topics of I18N to
allow posting in this forum.

We have a multilingual web presence with currently 10 languages. The
pages sit in a CMS (OpenCms) from where they are pushed for
translation into a translation management system (TMS). These
international pages sit in a subfolder for each language, e.g.
www.example.com/de or www.example.com/jp.

The tricky thing now, which has been causing us headaches, is that
there are also 5 distinct English sites: US, UK, EU, APAC and AU. The
content on these EN sites is nearly identical, except for things like
pricing, contact details and the internal URLs. The TMS is set up in
such a way that the URLs are changed automatically when the translated
pages are post-processed, e.g. a link to www.example.com/somesite
is changed to www.example.com/de/somesite.

The main issue is around keeping the non-US sites synchronised with
the US site (from where the content originates). At the moment, this
is a manual procedure. The US don't always notify me when they make
changes and consequently the non-US pages are often out of date.

The matter is complicated further by the fact that Google doesn't
like pages with identical content. They seem to be seen as
duplicates, even if they are in different language subfolders, and are
not shown. So, a user searching for the UK page might be presented with
the US page.

Changing the basic site structure, e.g. moving to one EN site only,
is not an option, since the structure is required for search engine
optimisation.

Has anyone come across a similar situation? Is there something like
best practices around managing these English sites? Whatever the
solution, I would favour something automated to keep the non-US pages
in synch, while at the same time making Google happy.

Sorry about the long post. I hope someone has a good idea.

Kind regards,

Martin

#2030 From: "Tim Greenwood" <timothy@...>
Date: Mon Jun 16, 2008 2:41 pm
Subject: Re: Bytes per character
timgreenwood
 
Good. Another Unicode converter that I use frequently is web based
from Richard Ishida
http://people.w3.org/rishida/scripts/uniview/conversion.php

For entering data his character pickers are also very useful
http://rishida.net/scripts/pickers/

- Tim

On Mon, Jun 16, 2008 at 10:36 AM, Anuj Magazine <amagazine@...> wrote:
> Thanks Tim for your help.
> I managed to locate a tool that can give the bytes per character based on
> encoding. Details in the link below-
> http://www.testingmentor.com/tool_info/str2val.html
>
>
> On 6/16/08, Tim Greenwood <timothy@...> wrote:
>>
>> Do you know which encoding the application is using? I would guess
>> that it is UTF-8 from your description below. To learn about UTF-8
>> look at http://en.wikipedia.org/wiki/Utf8 or www.unicode.org
>>
>> Tim
>>
>> On Fri, Jun 13, 2008 at 10:50 PM, Anuj Magazine <amagazine@...>
>> wrote:
>> > Thanks very much, Tim. Your responses gives me some insight.
>> > Are you aware of any tool or utility available that can help me tell the
>> > bytes per certain character based on the encoding systems used ?
>> >
>> > To give a brief background of my request- recently while testing an
>> > internationalized application, i had come across an issue in which a
>> > text
>> > field with character length as 75 characters accepted all the English
>> > characters into the database but when i tried to fill in max 75
>> > characters
>> > in the text field with French characters, it failed to save all the
>> > characters in the database (because of some characters occupying more
>> > bytes)
>> > and as a result, truncations were observed on the UI wherever the data
>> > was
>> > being read.
>> >
>> > In order to test this character input properly, i was thinking the prior
>> > knowledge of how many bytes a character occupies would be quite useful
>> > to
>> > derive meaningful tests.
>> >
>> >
>> > On 6/13/08, Tim Greenwood <timothy@...> wrote:
>> >>
>> >> You have to start with the encoding that is being used. In Latin1 or
>> >> Latin9 the 'accented e' will be one byte. In UTF-8 a precomposed
>> >> (single codepoint) 'accented e' will be two bytes. In UTF-16 it will
>> >> also be two bytes, or more correctly one 16 bit word (as will all
>> >> the other 'English' and 'French' characters). In UTF-32 they will all
>> >> be one 32 bit word.
>> >>
>> >> The scenario in which your claim fits is using UTF-16 and
>> >> decomposition. For example an e grave would be represented by an e
>> >> followed by a combining grave. In that case each is 16 bits.
>> >>
>> >> Tim
>> >>
>> >> On Fri, Jun 13, 2008 at 5:39 AM, Anuj Magazine <amagazine@...>
>> >> wrote:
>> >> > I had a question which might be rather naive. Recently, i got to know
>> >> > that
>> >> > some French such as an accented "e" has four bytes, until then i was
>> >> > under
>> >> > the impression that all the French characters like English are
>> >> > contained
>> >> > within two bytes.
>> >> >
>> >> > Does anyone know of a way to figure out how many bytes a character (in
>> >> > any
>> >> > language) occupies ?
>> >> >
>> >
>> >
>
>

#2029 From: "Anuj Magazine" <amagazine@...>
Date: Mon Jun 16, 2008 2:36 pm
Subject: Re: Bytes per character
anujmsqm
 
Thanks Tim for your help.
I managed to locate a tool that can give the bytes per character based on encoding. Details in the link below-
http://www.testingmentor.com/tool_info/str2val.html

 
On 6/16/08, Tim Greenwood <timothy@...> wrote:

Do you know which encoding the application is using? I would guess
that it is UTF-8 from your description below. To learn about UTF-8
look at http://en.wikipedia.org/wiki/Utf8 or www.unicode.org

Tim



On Fri, Jun 13, 2008 at 10:50 PM, Anuj Magazine <amagazine@...> wrote:
> Thanks very much, Tim. Your responses gives me some insight.
> Are you aware of any tool or utility available that can help me tell the
> bytes per certain character based on the encoding systems used ?
>
> To give a brief background of my request- recently while testing an
> internationalized application, i had come across an issue in which a text
> field with character length as 75 characters accepted all the English
> characters into the database but when i tried to fill in max 75 characters
> in the text field with French characters, it failed to save all the
> characters in the database (because of some characters occupying more bytes)
> and as a result, truncations were observed on the UI wherever the data was
> being read.
>
> In order to test this character input properly, i was thinking the prior
> knowledge of how many bytes a character occupies would be quite useful to
> derive meaningful tests.
>
>
> On 6/13/08, Tim Greenwood <timothy@...> wrote:
>>
>> You have to start with the encoding that is being used. In Latin1 or
>> Latin9 the 'accented e' will be one byte. In UTF-8 a precomposed
>> (single codepoint) 'accented e' will be two bytes. In UTF-16 it will
>> also be two bytes, or more correctly one 16 bit word (as will all
>> the other 'English' and 'French' characters). In UTF-32 they will all
>> be one 32 bit word.
>>
>> The scenario in which your claim fits is using UTF-16 and
>> decomposition. For example an e grave would be represented by an e
>> followed by a combining grave. In that case each is 16 bits.
>>
>> Tim
>>
>> On Fri, Jun 13, 2008 at 5:39 AM, Anuj Magazine <amagazine@...>
>> wrote:
>> > I had a question which might be rather naive. Recently, i got to know
>> > that
>> > some French such as an accented "e" has four bytes, until then i was
>> > under
>> > the impression that all the French characters like English are contained
>> > within two bytes.
>> >
>> > Does anyone know of a way to figure out how many bytes a character (in
>> > any
>> > language) occupies ?
>> >
>
>



#2028 From: "Tim Greenwood" <timothy@...>
Date: Sun Jun 15, 2008 7:56 pm
Subject: Re: Bytes per character
timgreenwood
 
Do you know which encoding the application is using? I would guess
that it is UTF-8 from your description below. To learn about UTF-8
look at http://en.wikipedia.org/wiki/Utf8 or www.unicode.org

Tim

On Fri, Jun 13, 2008 at 10:50 PM, Anuj Magazine <amagazine@...> wrote:
> Thanks very much, Tim. Your responses gives me some insight.
> Are you aware of any tool or utility available that can help me tell the
> bytes per certain character based on the encoding systems used ?
>
> To give a brief background of my request- recently while testing an
> internationalized application, i had come across an issue in which a text
> field with character length as 75 characters accepted all the English
> characters into the database but when i tried to fill in max 75 characters
> in the text field with French characters, it failed to save all the
> characters in the database (because of some characters occupying more bytes)
> and as a result, truncations were observed on the UI wherever the data was
> being read.
>
> In order to test this character input properly, i was thinking the prior
> knowledge of how many bytes a character occupies would be quite useful to
> derive meaningful tests.
>
>
> On 6/13/08, Tim Greenwood <timothy@...> wrote:
>>
>> You have to start with the encoding that is being used. In Latin1 or
>> Latin9 the 'accented e' will be one byte. In UTF-8 a precomposed
>> (single codepoint) 'accented e' will be two bytes. In UTF-16 it will
>> also be two bytes, or more correctly one 16 bit word (as will all
>> the other 'English' and 'French' characters). In UTF-32 they will all
>> be one 32 bit word.
>>
>> The scenario in which your claim fits is using UTF-16 and
>> decomposition. For example an e grave would be represented by an e
>> followed by a combining grave. In that case each is 16 bits.
>>
>> Tim
>>
>> On Fri, Jun 13, 2008 at 5:39 AM, Anuj Magazine <amagazine@...>
>> wrote:
>> > I had a question which might be rather naive. Recently, i got to know
>> > that
>> > some French such as an accented "e" has four bytes, until then i was
>> > under
>> > the impression that all the French characters like English are contained
>> > within two bytes.
>> >
>> > Does anyone know of a way to figure out how many bytes a character (in
>> > any
>> > language) occupies ?
>> >
>
>

#2027 From: "Anuj Magazine" <amagazine@...>
Date: Sat Jun 14, 2008 2:50 am
Subject: Re: Bytes per character
anujmsqm
 
Thanks very much, Tim. Your response gives me some insight.
Are you aware of any tool or utility available that can help me tell the bytes per certain character based on the encoding systems used ?
 
To give a brief background of my request- recently while testing an internationalized application, i had come across an issue in which a text field with character length as 75 characters accepted all the English characters into the database but when i tried to fill in max 75 characters in the text field with French characters, it failed to save all the characters in the database (because of some characters occupying more bytes) and as a result, truncations were observed on the UI wherever the data was being read.
 
In order to test this character input properly, i was thinking the prior knowledge of how many bytes a character occupies would be quite useful to derieve meaningful tests.
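
A minimal sketch of how such a test could be prepared, assuming the database column is limited to 75 bytes and stores UTF-8 (the post names neither the database nor its encoding, so both are assumptions). The helper names `fits_in_bytes` and `truncate_to_bytes` are hypothetical, made up for this illustration.

```python
def fits_in_bytes(text: str, max_bytes: int, encoding: str = "utf-8") -> bool:
    """Return True if text, once encoded, fits in a column limited to max_bytes."""
    return len(text.encode(encoding)) <= max_bytes


def truncate_to_bytes(text: str, max_bytes: int, encoding: str = "utf-8") -> str:
    """Cut text so its encoded form fits in max_bytes without splitting a character."""
    encoded = text.encode(encoding)[:max_bytes]
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return encoded.decode(encoding, errors="ignore")


english = "e" * 75
french = "\u00e9" * 75  # precomposed e-acute, two bytes each in UTF-8

print(fits_in_bytes(english, 75))          # True
print(fits_in_bytes(french, 75))           # False (150 bytes)
print(len(truncate_to_bytes(french, 75)))  # 37
```

If the column limit is really counted in characters rather than bytes, the check would use `len(text)` instead, so it is worth confirming which unit the schema uses before writing test data.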

 
On 6/13/08, Tim Greenwood <timothy@...> wrote:

You have to start with the encoding that is being used. In Latin1 or
Latin9 the 'accented e' will be one byte. In UTF-8 a precomposed
(single codepoint) 'accented e' will be two bytes. In UTF-16 it will
also be two bytes, or more correctly one 16-bit word (as will all the
other 'English' and 'French' characters). In UTF-32 they will all
be one 32-bit word.

The scenario in which your claim fits is using UTF-16 and
decomposition. For example an e grave would be represented by an e
followed by a combining grave. In that case each is 16 bits.

Tim



On Fri, Jun 13, 2008 at 5:39 AM, Anuj Magazine <amagazine@...> wrote:
> I had a question which might be rather naive. Recently, I got to know that
> some French characters, such as an accented "e", take four bytes; until then
> I was under the impression that all French characters, like English ones,
> are contained within two bytes.
>
> Does anyone know of a way to figure out how many bytes a character (in any
> language) occupies?
>



#2026 From: "Tim Greenwood" <timothy@...>
Date: Fri Jun 13, 2008 5:53 pm
Subject: Re: Bytes per character
timgreenwood
 
You have to start with the encoding that is being used. In Latin1 or
Latin9 the 'accented e' will be one byte. In UTF-8 a precomposed
(single codepoint) 'accented e' will be two bytes. In UTF-16 it will
also be two bytes, or more correctly one 16-bit word (as will all the
other 'English' and 'French' characters). In UTF-32 they will all
be one 32-bit word.

The scenario in which your claim fits is using UTF-16 and
decomposition. For example an e grave would be represented by an e
followed by a combining grave. In that case each is 16 bits.

Tim

On Fri, Jun 13, 2008 at 5:39 AM, Anuj Magazine <amagazine@...> wrote:
> I had a question which might be rather naive. Recently, I got to know that
> some French characters, such as an accented "e", take four bytes; until then
> I was under the impression that all French characters, like English ones,
> are contained within two bytes.
>
> Does anyone know of a way to figure out how many bytes a character (in any
> language) occupies?
>
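
The byte counts described in this reply can be checked directly. Below is a small sketch in Python using only the standard library; the helper name `byte_lengths` is made up for this illustration. It uses the `-le` codec variants so that no byte order mark is added to the counts (the plain `utf-16` and `utf-32` codecs prepend one).

```python
import unicodedata


def byte_lengths(text, encodings=("latin-1", "utf-8", "utf-16-le", "utf-32-le")):
    """Map each encoding name to the number of bytes text occupies in it."""
    return {enc: len(text.encode(enc)) for enc in encodings}


precomposed = "\u00e8"                                  # e-grave as one code point
decomposed = unicodedata.normalize("NFD", precomposed)  # 'e' + U+0300 combining grave

print(byte_lengths(precomposed))
# {'latin-1': 1, 'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}

# Latin-1 has no combining grave, so it is left out for the decomposed form.
print(byte_lengths(decomposed, encodings=("utf-8", "utf-16-le", "utf-32-le")))
# {'utf-8': 3, 'utf-16-le': 4, 'utf-32-le': 8}
```

The decomposed case is the one that reaches four bytes in UTF-16: two 16-bit units, one for the base letter and one for the combining accent.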

#2025 From: "Anuj Magazine" <amagazine@...>
Date: Fri Jun 13, 2008 9:39 am
Subject: Bytes per character
anujmsqm
 
I had a question which might be rather naive. Recently, I got to know that some French characters, such as an accented "e", take four bytes; until then I was under the impression that all French characters, like English ones, are contained within two bytes.

Does anyone know of a way to figure out how many bytes a character (in any language) occupies?

#2024 From: "Martin Wunderlich" <martin_wu@...>
Date: Fri May 30, 2008 3:00 pm
Subject: Re: Clay tablet
wundiman
 
Hi Phil,

Thanks a lot for the reply. In case you decide to go ahead with it, could you
keep me posted and let me know what sort of experience you have?

Cheers,

Martin

-------- Original Message --------
> Date: Mon, 26 May 2008 17:37:49 +0100 (BST)
> From: Phil Ritchie <endigitalmind@...>
> To: i18n-prog@yahoogroups.com
> Subject: Re: [i18n-prog] Clay tablet

> Martin
>
>   I have been looking at it for the reasons you detail in your mail. I'm
> close to making a commitment but can't give you any real operational data at
> the moment.
>
>   Phil.
>
> Martin Wunderlich <martin_wu@...> wrote:
>           Hi all,
>
> Has anyone used a technology called Clay-Tablet?
> (http://clay-tablet.com/home.asp)
>
> It is a middleware type of thing that connects several CMSs or other
> content sources to several translation management systems (TMS). In a way
> it is interesting, because it would help avoid being locked into
> one vendor's TMS and would aggregate all projects in one place. On the
> other hand, one would be locked into Clay Tablet and add another piece of
> technology to the chain, thus increasing the potential for problems.
>
> I would love to hear from anyone who has actually used Clay Tablet
> and what sort of experience they have had.
>
> Kind regards,
>
> Martin
>

#2023 From: "Ian Davies" <ianbingofish@...>
Date: Fri May 30, 2008 11:29 am
Subject: Time Zone i18n - LDML Data Integration
ianbingofish
 

I'm currently architecting the implementation of user centric time zones across our product range and would like to use the data from LDML as the basis of my locale repository.  I was trying to get hold of the author cited for LDML, Mark Davis at Google but no luck as yet.

 

I'm looking for a way of integrating time zone information for all locales, or possibly a subset of locales (not ideal, but maybe necessary).  The general idea is that our client software will poll Windows (other platforms will have their own version of the solution) to establish the host machine's UTC offset and also the MetaZone approximation that Windows appears to employ.  This would then be mapped to the correct UNIX time zone reference on the back-end servers.

 

I'm looking to use the following from LDML for display purposes (items in bold):

 

<zone type="America/Los_Angeles" >
    <long>
        <generic>Pacific Time</generic>
        <standard>Pacific Standard Time</standard>
        <daylight>Pacific Daylight Time</daylight>
    </long>
    <short>
        <generic>PT</generic>
        <standard>PST</standard>
        <daylight>PDT</daylight>
    </short>
    <exemplarCity>San Francisco</exemplarCity>  (Possibly)

</zone>
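
As a rough illustration of pulling those display captions out of such a fragment, here is a sketch using Python's standard `xml.etree.ElementTree`. It parses only the fragment quoted above, embedded as a string; real LDML locale files nest `zone` elements inside a larger document, so the lookup paths would need adjusting. The function name `zone_display_names` and the flat key names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The fragment from this post, without the "(Possibly)" annotation.
ZONE_XML = """
<zone type="America/Los_Angeles">
    <long>
        <generic>Pacific Time</generic>
        <standard>Pacific Standard Time</standard>
        <daylight>Pacific Daylight Time</daylight>
    </long>
    <short>
        <generic>PT</generic>
        <standard>PST</standard>
        <daylight>PDT</daylight>
    </short>
    <exemplarCity>San Francisco</exemplarCity>
</zone>
"""


def zone_display_names(xml_text):
    """Flatten one <zone> element into a dict of display captions."""
    zone = ET.fromstring(xml_text)
    names = {"id": zone.get("type")}
    for width in ("long", "short"):
        for kind in ("generic", "standard", "daylight"):
            # findtext returns None when a caption is missing for this zone.
            names[f"{width}.{kind}"] = zone.findtext(f"{width}/{kind}")
    names["exemplarCity"] = zone.findtext("exemplarCity")
    return names


names = zone_display_names(ZONE_XML)
print(names["id"], names["long.generic"], names["short.daylight"])
# America/Los_Angeles Pacific Time PDT
```

Returning `None` for absent captions matters here, since not every locale is guaranteed to supply every short or long form.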

 

Questions:

  1. Are all of the above captions available in LDML in all of the following languages?
     (German / Russian / French / Spanish / Swedish / Danish / Portuguese / Dutch / Finnish / Polish / Romanian / Hungarian / Simplified Chinese)

  2. Is each location cross-referenced with a UNIX time zone reference (sorry, I haven't done much research here), UTC offset and latitude reference?

  3. Is there an established mapping table from the set of MetaZones Windows employs to a specific set of LDML zones?

  4. Is there any known way of polling the OS via JavaScript (etc.) to establish a more exact time zone / locale reference (than Windows appears to use) that maps better to the LDML and/or UNIX time zones?

#2022 From: Phil Ritchie <endigitalmind@...>
Date: Mon May 26, 2008 4:37 pm
Subject: Re: Clay tablet
endigitalmind
 
Martin
 
I have been looking at it for the reasons you detail in your mail. I'm close to making a commitment but can't give you any real operational data at the moment.
 
Phil.

Martin Wunderlich <martin_wu@...> wrote:
Hi all,

Has anyone used a technology called Clay-Tablet?
(http://clay-tablet.com/home.asp)

It is a middleware type of thing that connects several CMSs or other
content sources to several translation management systems (TMS). In a way
it is interesting, because it would help avoid being locked into
one vendor's TMS and would aggregate all projects in one place. On the
other hand, one would be locked into Clay Tablet and add another piece of
technology to the chain, thus increasing the potential for problems.

I would love to hear from anyone who has actually used Clay Tablet
and what sort of experience they have had.

Kind regards,

Martin




#2021 From: Martin Wunderlich <martin_wu@...>
Date: Sun May 25, 2008 11:28 am
Subject: Clay tablet
wundiman
 
Hi all,

Has anyone used a technology called Clay-Tablet?
(http://clay-tablet.com/home.asp)

It is a middleware type of thing that connects several CMSs or other
content sources to several translation management systems (TMS). In a way
it is interesting, because it would help avoid being locked into
one vendor's TMS and would aggregate all projects in one place. On the
other hand, one would be locked into Clay Tablet and add another piece of
technology to the chain, thus increasing the potential for problems.

I would love to hear from anyone who has actually used Clay Tablet
and what sort of experience they have had.

Kind regards,

Martin

#2020 From: "bryan.donaldsonga" <bryandonaldson@...>
Date: Mon May 19, 2008 11:51 am
Subject: Re: I18N Assessment
bryan.donald...
 
In addition to the other suggestions:

Review the code for places where sentences are being constructed in
code, such as "Welcome " + username +", what do you want to do?".
Replace them with a string format operation that will allow the
localizer to move the sentence parts around.
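
To illustrate the point, here is a minimal sketch in Python. The `MESSAGES` table, the `welcome` helper, and the German wording are all made up for this example; the only thing being shown is that a named placeholder lets each translation put the user name wherever its grammar wants it.

```python
# Hypothetical message table; a real product would load this from resource files.
MESSAGES = {
    "en": "Welcome {username}, what do you want to do?",
    "de": "{username}, willkommen! Was wollen Sie tun?",  # name moved to the front
}


def welcome(locale, username):
    """Build the greeting from a whole-sentence template, never by concatenation."""
    template = MESSAGES.get(locale, MESSAGES["en"])  # fall back to English
    return template.format(username=username)


print(welcome("en", "Anuj"))  # Welcome Anuj, what do you want to do?
print(welcome("de", "Anuj"))  # Anuj, willkommen! Was wollen Sie tun?
```

With `"Welcome " + username + ", what..."` the word order is fixed in code, and the localizer only ever sees two sentence fragments with no context.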

If your application uses a database, review how one assigns a different
collating sequence, and decide how you'll handle this in deployment.

If your target locales will (or may) include ones that require
multibyte characters, review your code for how you'll implement this.
My personal recommendation is full Unicode support, if it is possible.

Investigate the use of Translation tools.  Some of them will provide a
pseudo-translation feature that will allow you to easily check your
process in the development stages.
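
For anyone unfamiliar with pseudo-translation, the idea can be sketched in a few lines of Python. This is only an illustration of the technique, not how any particular tool does it; the function name, the bracket markers, and the 30% expansion figure are assumptions chosen for the example.

```python
import re

# Swap plain vowels for accented ones so untranslated strings stand out.
ACCENTED = str.maketrans(
    "aeiouAEIOU",
    "\u00e5\u00e9\u00ee\u00f6\u00fc\u00c5\u00c9\u00ce\u00d6\u00dc",
)
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudo_translate(text, expansion=0.3):
    """Accent the text, pad it to simulate longer translations, and bracket it."""
    parts = PLACEHOLDER.split(text)
    # With a capturing group, re.split keeps the placeholders at odd indexes;
    # those must pass through unchanged so formatting still works.
    converted = [p if i % 2 else p.translate(ACCENTED) for i, p in enumerate(parts)]
    padding = "~" * int(len(text) * expansion)
    return "[" + "".join(converted) + padding + "]"


print(pseudo_translate("Save {count} files"))
# [Såvé {count} fîlés~~~~~]
```

Running the UI with such strings shows hard-coded text (it stays unaccented), clipped fields (the closing bracket is missing), and encoding problems (the accents come out garbled).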

#2018 From: Keith Bennett <krbennettmd@...>
Date: Sat May 17, 2008 2:34 pm
Subject: Re: I18N Assessment
krbennettmd
 
Anuj -

As a start, you'll need to do the following:

1) Identify any strings that are presented to the user (as opposed to
being used only internally by the program).  They will need to be
replaced by lookup keys and the code to look up that key in the
appropriate resource.

2) Identify any instances of the formatting of dates, numbers,
currencies, etc.

3) Identify any culture-specific pieces such as logos and color use,
and make sure that they are appropriate (and even more important, not
offensive) in the target locales.
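
Step 1 can be sketched as follows, in Python. The `RESOURCES` table, the key names, and the helper functions are hypothetical; the sketch only shows user-visible strings replaced by lookup keys, with a fallback from a specific locale to its language and then to a default.

```python
# Hypothetical in-memory resources; a real product would load these from files.
RESOURCES = {
    "en": {"greeting": "Hello", "farewell": "Goodbye"},
    "fr": {"greeting": "Bonjour", "farewell": "Au revoir"},
    "fr_CA": {"greeting": "Allo"},
}


def fallback_chain(locale, default="en"):
    """Return the locales to try in order, e.g. ['fr_CA', 'fr', 'en']."""
    parts = locale.split("_")
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def lookup(key, locale):
    """Find the string for key in the most specific locale that defines it."""
    for candidate in fallback_chain(locale):
        value = RESOURCES.get(candidate, {}).get(key)
        if value is not None:
            return value
    raise KeyError(key)


print(lookup("greeting", "fr_CA"))  # Allo
print(lookup("farewell", "fr_CA"))  # Au revoir
print(lookup("greeting", "de"))     # Hello
```

Raising on a missing key, rather than returning the key itself, makes gaps in the resources visible during the assessment instead of in front of users.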

- Keith



--- Anuj Magazine <amagazine@...> wrote:

> Has anyone done an assessment from an I18N perspective for a software
> product which has never been localized/internationalized?
>
> I would appreciate any ideas/thoughts/sharing of experience around how
> this activity was performed.
>

#2017 From: "Anuj Magazine" <amagazine@...>
Date: Fri May 16, 2008 1:56 pm
Subject: I18N Assessment
anujmsqm
 
Has anyone done an assessment from an I18N perspective for a software product which has never been localized/internationalized?

I would appreciate any ideas/thoughts/sharing of experience around how this activity was performed.

#2016 From: "Lux" <aluxh24@...>
Date: Fri May 16, 2008 7:02 am
Subject: Localization for FOX Toolkit GUI
aluxh24
 
Hi all,

I would like to find out if anyone has tried using tools like Passolo
and Catalyst to localize GUI strings for FOX Toolkit (C++)?

It is impossible to have the dialogs and menus imported into the
tools, right? Because the GUI is created on the fly when the program
runs.

#2015 From: "chris_raulf" <craulf@...>
Date: Fri May 9, 2008 5:36 pm
Subject: I18n WebSeminar: Global-Ready Applications / Programming for the World
chris_raulf
 
Hello Suzanne,

This is Chris Raulf with ENLASO. Wanted to let you know that we
have joined forces with our partner Lingoport and we'll be hosting an
educational Internationalization WebSeminar on June 12th at 11 a.m.
PDT / 2 p.m. EDT.

This two-hour online course is led by Cary Clark, Lingoport's lead
Globalization Architect, and introduces engineers and technical
managers to software internationalization engineering for efficient
localization, including:

- Fundamentals of internationalization – string management and more.
- Implications of globalized programming on a variety of programming
languages.
- Unicode and how it's implemented on various platforms.
- Database refactoring.

WebSeminar: Global-Ready Applications: Programming for the World
When: Thursday, June 12, 2008, 11:00 a.m. – 1:00 p.m. PDT
Cost: $79 with code "YAHOOGroup" (a $149.00 value)
Presented by: Cary Clark, Globalization Architect

LIMITED SPACE AVAILABLE to facilitate interaction.

Visit http://www.translate.com for more information, or go to:
http://www.translate.com/language_tech/webinars/I18n_WebSeminar_Agenda.html

Best,
Chris

#2014 From: Martin Wunderlich <martin_wu@...>
Date: Sun Feb 17, 2008 1:30 pm
Subject: Re: Excel and encoding
wundiman
 
Hi Emen,

Be careful, though, if you are using bidi languages (Arabic, Hebrew).
The bidi flow is not handled correctly in Excel 2003. The string is
stored properly (e.g. you can copy and paste from/to Excel), but not
displayed correctly (bidi flow is messed up).

Kind regards,

Martin

> Hi All,

> I've been searching on the internet, but without any luck. I'm using
> Office 2003, and I'm using Excel to save my terminology translations,
> which means there are multiple languages being saved
> in a single file, e.g. English, Japanese, Korean, Simplified
> Chinese, etc. I'm wondering how Excel handles this scenario? Which
> encoding is it using?
>
> Can someone please shed some light on this topic? Any answer greatly
> appreciated!

> -- Emen






--
----------------------------------------------------------
Martin Wunderlich, M.A.
Translation/Localisation EN <-> DE

  www.martinwunderlich.com
----------------------------------------------------------
Free / open-source software for translation/localisation:
  www.martinwunderlich.com/foss-links.html
----------------------------------------------------------
Random aphorism:
  "Es ist ein Wunder, dass Neugier die Schulbildung überlebt."
"It is a miracle that curiosity survives formal education."
  - Albert Einstein
----------------------------------------------------------
