vim-multibyte · Vim (Vi IMproved) text editor special language list

Messages 980 - 1009 of 2761

#984 From: Aleksander Adamowski <aleksander.adamowski@...>
Date: Wed Sep 10, 2003 9:20 am
Subject: Need config option controlling addition of byte order mark to UTF-8 files
 
Hi!
I've noticed that with version 6.2 VIM started to add the BOM at the
beginning of files when I use a UTF-8 locale.

Unfortunately, even with the latest version (2.0.47) Apache doesn't
remove that mark and outputs it, which makes PHP scripts not work - when
output is sent to browser before I can issue output buffering command, I
cannot send headers to the browser.

--
   Aleksander Adamowski
     Jabber JID (to nie e-mail!): olo@...
     GG#: 274614
     ICQ UIN: 19780575
     http://olo.office.altkom.com.pl

#985 From: Bram Moolenaar <Bram@...>
Date: Wed Sep 10, 2003 12:36 pm
Subject: Re: Need config option controlling addition of byte order mark to UTF-8 files
 
Aleksander Adamowski wrote:

> I've noticed that with version 6.2 VIM started to add the BOM at the
> beginning of files when I use a UTF-8 locale.

Vim doesn't add a BOM, it only preserves it when it's already there.

> Unfortunately, even with the latest version (2.0.47) Apache doesn't
> remove that mark and outputs it, which makes PHP scripts not work - when
> output is sent to browser before I can issue output buffering command, I
> cannot send headers to the browser.

You can reset the 'bomb' option to remove a BOM when writing a file.
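
To illustrate what is at stake here: in UTF-8 the BOM is the three bytes EF BB BF at the very start of the file, so a web server passes them through as ordinary output ahead of any headers. A minimal Python sketch that detects and drops it (the helper name `strip_utf8_bom` is made up for this illustration and is not part of Vim, Apache or PHP):

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # b'\xef\xbb\xbf'
        return data[len(codecs.BOM_UTF8):]
    return data

# A PHP script saved with a BOM: the three extra bytes precede "<?php".
with_bom = codecs.BOM_UTF8 + b"<?php header('X-Test: 1'); ?>"
print(strip_utf8_bom(with_bom))
```

Data without a BOM is returned unchanged, so the helper is safe to apply to any file.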

--
My girlfriend told me I should be more affectionate.
So I got TWO girlfriends.

  /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net   \\\
///          Creator of Vim - Vi IMproved -- http://www.Vim.org          \\\
\\\              Project leader for A-A-P -- http://www.A-A-P.org        ///
  \\\  Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html  ///

#986 From: Rick Frankum <frankum@...>
Date: Sat Oct 4, 2003 6:08 pm
Subject: New to Vim, Japanese setup
 
Hello,

I'm trying to figure out how to configure vim on my system.
I'm running Windows XP (English menus, but with Japanese
support enabled) and would like to edit Japanese
files (S-JIS enc), but the documentation is slightly
confusing.  I'd appreciate any help that the list can offer.

1) I downloaded the self-extracting binary from vim.org.  The
documentation says to use :version to check if the multi-byte
setting is enabled, but +multi_byte_ime/dyn is reported.  Is this
the same setting?

2) If not, how can I build a multibyte enabled vim on my
machine?  The "do-it-yourself" files at vim.org seem to be
Unix-centric.

3) If I open the S-JIS file in vim, I get raw garbage.  No
amount of setting the termencoding or encoding will result in
readable characters.  Is there a way to fix this?

My current workaround is to load the file in Netscape just
so I can read it, but I'd love to have a Japanese-aware
editor.

Thanks for any help,
--Rick Frankum

#987 From: "Tony Mechelynck" <antoine.mechelynck@...>
Date: Sat Oct 4, 2003 6:41 pm
Subject: Re: New to Vim, Japanese setup
 
Rick Frankum <frankum@...> wrote:
> Hello,

Hello.

I have only some of the answers you are looking for. Probably someone else
will jump in to fill in what I didn't know.
>
> I'm trying to figure out how to configure vim on my system.
> I'm running Windows XP (English menus, but with Japanese
> support enabled) and would like to edit Japanese
> files (S-JIS enc), but the documentation is slightly
> confusing.  I'd appreciate any help that the list can offer.
>
> 1) I downloaded the self-extracting binary from vim.org.  The
> documentation says to use :version to check if the multi-byte
> setting is enabled, but +multi_byte_ime/dyn is reported.  Is this
> the same setting?

+multi_byte_ime/dyn means that your version of Vim can use an "input method"
to input multi-byte characters (for example, kanji) on W32 provided that
some external library (some .dll file, on Windows) is present at run-time.
As I understand it, such a capability can only be present if "ordinary"
multi-byte handling is (at least dynamically) included. But you can check
it: To check for one particular capability such as +multi_byte, do

     :echo has("multi_byte")

If the answer is non-zero (1, usually) the feature is present. If the answer
is 0 the feature is absent.

Note: The gvim version I am using also mentions +multi_byte_ime/dyn (and
neither +multi_byte nor -multi_byte) in the :version report. I know that it
has multi-byte support because the :echo statement above reports 1, and I
use this gvim version for Cyrillic editing in utf-8, which requires the
+multi_byte capability but not necessarily IME support.
>
> 2) If not, how can I build a multibyte enabled vim on my
> machine?  The "do-it-yourself" files at vim.org seem to be
> Unix-centric.

I don't know, I only use precompiled versions of gvim. But I think that
yours has the required capability.
>
> 3) If I open the S-JIS file in vim, I get raw garbage.  No
> amount of setting the termencoding or encoding will result in
> readable characters.  Is there a way to fix this?

try doing

     :if &termencoding == "" | let &termencoding = &encoding | endif
     :set encoding=sjis
     :edit ++enc=sjis filename

If it works, try

     :set fileencodings?

to see if vim is set to recognise sjis when opening an existing file for
editing
>
> My current workaround is to load the file in Netscape just
> so I can read it, but I'd love to have a Japanese-aware
> editor.
>
> Thanks for any help,
> --Rick Frankum

My pleasure.

Summary of where to look for help on the above:

     :help +multi_byte
     :help +multi_byte_ime
     :help has()
     :help 'termencoding'
     :help 'encoding'
     :help 'fileencoding'
     :help 'fileencodings'
     :help ++opt

and follow the links from there.

HTH,
Tony.

#988 From: Rick Frankum <frankum@...>
Date: Sun Oct 5, 2003 4:13 pm
Subject: Re: New to Vim, Japanese setup
 
Tony Mechelynck wrote:
> I have only some of the answers you are looking for. Probably someone else
> will jump in to fill in what I didn't know.

I appreciate it!
>>I'm trying to figure out how to configure vim on my system.
>>I'm running Windows XP (English menus, but with Japanese
>>support enabled) and would like to edit Japanese
>>files (S-JIS enc), but the documentation is slightly
>>confusing.  I'd appreciate any help that the list can offer.

To update: the vim I have is indeed multibyte-enabled.  The big
problem appears to be that I'm not editing S-JIS files but
in fact EUC-jp.  When I run vim under Windows it doesn't
recognize (I tried :set encoding=euc-jp) the encoding,
and when I run it under Cygwin I still get garbage (though
it _does_ recognize euc-jp but not cp932).

Hiroji Kimura writes:
  > http://www.kaoriya.net/#VIM6
  > Try the above link.  That's what I use to edit Japanese.

If I can't get the standard version working, I'll try this.  What's
a .bz2 extension, though?

Eric Long writes:
> When you say you get raw garbage, does that mean you see a lot of
> accented letters, or do you get a lot of stuff like ~B and ~K
> (or perhaps <82> and <8b>) ? If you have the accented letters it'll
> probably work with the right font, but if you have the ~B and ~K it
> means the encoding isn't getting set right.

The garbage I refer to is mostly half-width katakana and dots.

Thanks again for the help,
--Rick
frankum@...

#989 From: Glenn Maynard <glenn@...>
Date: Sun Oct 5, 2003 8:42 pm
Subject: Re: New to Vim, Japanese setup
 
On Mon, Oct 06, 2003 at 01:13:40AM +0900, Rick Frankum wrote:
> To update: the vim I have is indeed multibyte-enabled.  The big
> problem appears to be that I'm not editing S-JIS files but
> in fact EUC-jp.  When I run vim under Windows it doesn't
> recognize (I tried :set encoding=euc-jp) the encoding,
> and when I run it under Cygwin I still get garbage (though
> it _does_ recognize euc-jp but not cp932).

You need iconv.dll to handle euc-jp.

You also need to change "fileencoding", not "encoding".  "encoding"
represents the internal representation, and should be set to "utf-8".
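
The "half-width katakana and dots" symptom is what EUC-JP bytes look like when they are read as Shift-JIS/CP932, and the cure is a conversion at read time rather than a change of internal encoding. A rough Python sketch of the same idea (Python's own codecs stand in for iconv here, and the sample text is arbitrary):

```python
# "あい" stored as EUC-JP, as it would sit on disk.
raw = "あい".encode("euc_jp")          # b'\xa4\xa2\xa4\xa4'

# Read with the wrong encoding: every byte lands in CP932's
# single-byte half-width katakana/punctuation range.
garbled = raw.decode("cp932")

# Read with the right encoding, then keep it as UTF-8 internally.
text = raw.decode("euc_jp")
internal = text.encode("utf-8")

print(repr(garbled), text, internal)
```

The file bytes never change; only the encoding used to interpret them does, which is the role 'fileencoding' plays as opposed to 'encoding'.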

--
Glenn Maynard

#990 From: Camillo Särs <ged@...>
Date: Fri Oct 10, 2003 12:16 pm
Subject: Filename encodings under Win32
 
Hi,

(vim 6.2, WinXP)

If I use the UTF-8 encoding, and enter non-ascii characters in filenames,
they also use the UTF-8 encoding.  That's clearly wrong on Win32.  It's
equally clearly right on most unixes.

The windows APIs come in two flavors - "ANSI" and "Unicode".  The former
requires filenames to be in the correct codepage, the latter expects native
Unicode (UCS-2).

To avoid a lot of codepage mess, I would suggest that the "right way" to
fix this would be to internally convert all strings passed to the Windows
api into Unicode.  And of course then calling the unicode-versions of the
functions.  The alternative would be to call the ANSI versions, which would
be plain silly.  Firstly, because they only cover the current codepage, and
secondly because internally NT converts those strings to Unicode anyway.
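
A small Python sketch of the mismatch described above, assuming a Western system whose ANSI codepage is CP1252 (the filename is only an example): UTF-8 bytes handed to an ANSI call are read as CP1252, while the Unicode calls expect 16-bit code units.

```python
name = "Särs.txt"

utf8_bytes = name.encode("utf-8")                  # what an editor running in UTF-8 holds
as_seen_by_ansi_api = utf8_bytes.decode("cp1252")  # the same bytes misread as the codepage

ansi_bytes = name.encode("cp1252")                 # what an ANSI ("A") call expects
wide_bytes = name.encode("utf-16-le")              # what a Unicode ("W") call expects

print(as_seen_by_ansi_api)                         # the mangled filename
print(ansi_bytes, wide_bytes)
```

Either conversion avoids the mangled name; only the wide form also works for characters outside the current codepage.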

I'm not sure how much work this would actually be, but until this is
implemented, Unicode support on Win32 remains partially broken.  For many
users, using us-ascii only in filenames is not a problem, but for those who
need special characters and want utf-8, this is really a big issue.

Am I right in my diagnosis, or have I overlooked something essential?

Cheers,
Camillo

#991 From: Glenn Maynard <glenn@...>
Date: Sun Oct 12, 2003 7:48 pm
Subject: Re: Filename encodings under Win32
glenn@...
Send Email Send Email
 
Hmm.  My reply appears to have vanished without a trace.  I'd attached
os_win32.c (not noticing that it was an extremely oversized source
file--over 100k); I'm reposting with the source file linked.  It's
strange that I didn't see any kind of rejection notice.

On Fri, Oct 10, 2003 at 03:16:27PM +0300, Camillo Särs wrote:
> To avoid a lot of codepage mess, I would suggest that the "right way" to
> fix this would be to internally convert all strings passed to the Windows
> api into Unicode.  And of course then calling the unicode-versions of the
> functions.  The alternative would be to call the ANSI versions, which would
> be plain silly.  Firstly, because they only cover the current codepage, and
> secondly because internally NT converts those strings to Unicode anyway.

Most Unicode versions of functions aren't available in 9x, and codepage-
encoded strings must continue to work correctly.

> I'm not sure how much work this would actually be, but until this is
> implemented, Unicode support on Win32 remains partially broken.  For many
> users, using us-ascii only in filenames is not a problem, but for those who
> need special characters and want utf-8, this is really a big issue.

You can always use characters that are in your system encoding.  Western
(CP1252) users can always use ñ, for example.  NT can handle Unicode
filenames, but 9x can't.

Of course, Vim should handle Unicode for filenames (and message boxes,
and so on).  Not doing so is hardly up to the quality of Vim's i18n
support.  However, there's not enough demand for it, so it just hasn't
been done yet.

I wrote some code to change the filesystem layer to handle Unicode a
while back, including ANSI fallbacks.  I didn't bother spending the time
to get it cleaned, tested and applied, because so few other programs
support this (and, as Vim patch turnaround time is understandably slow,
I had more important patches to work on).  For example, an Ogg or MP3 with
Japanese in the filename simply can't be loaded by Winamp in Windows, at
all, unless you're in a Japanese codepage.

I found some old copy of this code and attached it[1].  If you want to cvs
diff it, I believe it's from CVS rev 1.60.  I don't know what state this
was in, but it should give you an idea of what needs to be done.

I've always wanted the default internal encoding of Vim to be UTF-8 in
Windows.  This is one thing that would need to be done to do that, along
with all other Windows API interactions.  (I've heard of printing
problems, too, but I don't know about those as I never print.)

[1] http://zewt.org/~glenn/os_win32.c
Search for "system_has_unicode".

--
Glenn Maynard

#992 From: "Tony Mechelynck" <antoine.mechelynck@...>
Date: Sun Oct 12, 2003 8:44 pm
Subject: Re: Filename encodings under Win32
 
Glenn Maynard <glenn@...> wrote:
[...]
> I've always wanted the default internal encoding of Vim to be UTF-8 in
> Windows.  This is one thing that would need to be done to do that,
> along with all other Windows API interactions.  (I've heard of
> printing problems, too, but I don't know about those as I never print.)
[...]

There are more problems than just printing.

As long as 'fileencoding', 'printencoding' and (most important)
'termencoding' default (when empty) to whatever is the current value of
'encoding', the latter must not (IMHO) be set to UTF-8 by default.

(Let's spell it out) In my humble opinion, Vim should require as little
"tuning" as possible to handle the language interfaces the same way as the
operating system does, and this means that, when the user sets nothing else
in his startup and configuration files, keyboard input, printer output and
file creation should default to whatever is set in the locale.

If the user wants to handle Unicode files, it is quite possible to set gvim
to do it, even in Win98 systems like mine; but this requires, among other
things, storing the previous value of 'encoding' into 'termencoding' because
the user cannot, by a mere snap of the fingers, change his keyboard input
from some national encoding to Unicode. Similarly, on systems where the
'printencoding' option is recognised, the user is not always able to change
how the printer will react to output strings, and therefore that setting
must also be preserved, unless of course one decides to always send fonts as
bitmaps, and converting them to bitmaps in gvim itself, which I don't think
desirable.

For all these reasons, I believe that the setting of the various encodings
used by (g)vim (namely, 'encoding', 'fileencoding', 'termencoding' and
'printencoding', as well as a possible 8-bit encoding at the end of
'fileencodings') should, as I believe they already do, default directly or
indirectly to whatever is set in the locale, and that a possible switchover
to Unicode should be left to the voluntary and reasoned choice of the user.
A few days ago, I sent (to the vim-at-vim.org mailing list) a snippet of
code (to be used as a vim script, or part of one) for such switchover, in
response to an inquiry by some Swedish user, and it seems to have proven
satisfactory; I may publish it at vim-online some day soon, if I don't
forget.

Best regards,
Tony
mailto:antoine.mechelynck@...
http://users.skynet.be/antoine.mechelynck/

#993 From: Glenn Maynard <glenn@...>
Date: Sun Oct 12, 2003 9:39 pm
Subject: Re: Filename encodings under Win32
 
On Sun, Oct 12, 2003 at 10:44:05PM +0200, Tony Mechelynck wrote:
> As long as 'fileencoding', 'printencoding' and (most important)
> 'termencoding' default (when empty) to whatever is the current value of
> 'encoding', the latter must not (IMHO) be set to UTF-8 by default.
>
> (Let's spell it out) In my humble opinion, Vim should require as little
> "tuning" as possible to handle the language interfaces the same way as the
> operating system does, and this means that, when the user sets nothing else
> in his startup and configuration files, keyboard input, printer output and
> file creation should default to whatever is set in the locale.

This is a trivial fix, which I already proposed many months ago: the
defaults in Windows should be the results of

exe "set fileencodings=ucs-bom,utf-8,cp" . getacp() . ",latin1"
exe "set fileencoding=cp" . getacp()

and now adding:

exe "set printencoding=cp" . getacp()

Note that "getacp" is a function in a patch I sent which was lost or
forgotten; it returns the ANSI codepage.

(A slightly safer default would be to remove "utf-8" from the search, to
prevent false matches.) I haven't found any problems with this; it's been
my default for a long time and I actively edit UTF-8 and CP932 files.
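
The ordering of 'fileencodings' matters because the candidates are tried from left to right and the first one that decodes without error wins. A hedged Python sketch of that kind of fallback (this is not Vim's actual code; `detect_encoding` and the candidate list are made up for illustration, with cp932 standing in for "cp" . getacp()):

```python
import codecs

def detect_encoding(data: bytes, candidates=("utf-8", "cp932", "latin-1")) -> str:
    """Return the first candidate that decodes data cleanly, BOM check first."""
    # Only the UTF-8 BOM is checked here; Vim's "ucs-bom" covers more BOM types.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return candidates[-1]  # only reached if no candidate decodes cleanly

print(detect_encoding("日本語".encode("utf-8")))
print(detect_encoding("日本語".encode("cp932")))
```

A "false match" in this scheme is a file in some other encoding whose bytes also happen to be valid UTF-8, so the earlier candidate wins by accident.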

> If the user wants to handle Unicode files, it is quite possible to set gvim
> to do it, even in Win98 systems like mine; but this requires, among other
> things, storing the previous value of 'encoding' into 'termencoding' because
> the user cannot, by a mere snap of the fingers, change his keyboard input
> from some national encoding to Unicode.

The input in a Windows window is well-defined; "termencoding" should not
even be needed in Windows.  Depending on which messages are trapped, the
input is always in the ANSI codepage or Unicode.

However, if it's being used anyway for some reason, then the solution is
the same:

exe "set termencoding=cp" . getacp()

The only reason I know of not to set "encoding" to "utf-8" is that Vim
doesn't do proper conversions for Win32 calls.

> used by (g)vim (namely, 'encoding', 'fileencoding', 'termencoding' and
> 'printencoding', as well as a possible 8-bit encoding at the end of
> 'fileencodings') should, as I believe they already do, default directly or
> indirectly to whatever is set in the locale, and that a possible switchover
> to Unicode should be left to the voluntary and reasoned choice of the user.

Switching "encoding" to "utf-8" should be transparent, once proper
conversions for win32 calls are in place.  Regular users don't care
about what encoding their editor uses internally, any more than they
care about what type of data structures they use.

On the other hand, if utf-8 internally is fully supported, then utf-8
can be the *only* internal encoding--which would make the rendering
code much simpler and more robust.  I remember finding lots of little
errors in the renderer (eg. underlining glitches for double-width
characters) that went away with utf-8, and I don't think Vim renders
correctly at all if eg.  "encoding" is set to "cp1252" and the ACP
is CP932 (needs a double conversion).

--
Glenn Maynard

#994 From: "Tony Mechelynck" <antoine.mechelynck@...>
Date: Mon Oct 13, 2003 12:41 am
Subject: Re: Filename encodings under Win32
 
Glenn Maynard <glenn@...> wrote:
> On Sun, Oct 12, 2003 at 10:44:05PM +0200, Tony Mechelynck wrote:
> > As long as 'fileencoding', 'printencoding' and (most important)
> > 'termencoding' default (when empty) to whatever is the current
> > value of 'encoding', the latter must not (IMHO) be set to UTF-8 by
> > default.
> >
> > (Let's spell it out) In my humble opinion, Vim should require as
> > little "tuning" as possible to handle the language interfaces the
> > same way as the operating system does, and this means that, when
> > the user sets nothing else in his startup and configuration files,
> > keyboard input, printer output and file creation should default to
> > whatever is set in the locale.
>
> This is a trivial fix, which I already proposed many months ago: the
> defaults in Windows should be the results of
>
> exe "set fileencodings=ucs-bom,utf-8,cp" . getacp() . ",latin1"
> exe "set fileencoding=cp" . getacp()
>
> and now adding:
>
> exe "set printencoding=cp" . getacp()
>
> Note that "getacp" is a function in a patch I sent which was lost or
> forgotten; it returns the ANSI codepage.
>
> (A slightly safer default would be to remove "utf-8" from the search,
> to prevent false matches.) I haven't found any problems with this;
> it's been my default for a long time and I actively edit UTF-8 and
> CP932 files.

Trivial or not, my opinion is that handling files and keypresses as per the
locale shouldn't be a "fix", it should be the (program) default. The "minor
fix" consists of making Unicode the (user's) default by means of a config
setting; but see below about that.
>
> > If the user wants to handle Unicode files, it is quite possible to
> > set gvim to do it, even in Win98 systems like mine; but this
> > requires, among other things, storing the previous value of
> > 'encoding' into 'termencoding' because the user cannot, by a mere
> > snap of the fingers, change his keyboard input from some national
> > encoding to Unicode.
>
> The input in a Windows window is well-defined; "termencoding" should
> not even be needed in Windows.  Depending on which messages are
> trapped, the input is always in the ANSI codepage or Unicode.

Sorry, but it is. AFAIK, leaving 'termencoding' empty when switching
'encoding' over from something else to Unicode produces dysfunctions in the
keyboard for all users whose actual keyboard encoding is other than 7-bit
ASCII -- roughly speaking, for all users with a keyboard for a language
other than English (even Dutchmen like Bram need, as a minimum, the
"lowercase e with diaeresis", which is over 128, and therefore receives a
different representation in UTF-8 and in other encodings -- the codepoint
number may be the same but it is not represented identically). That's why the
lines

     if &termencoding == ""
         let &termencoding = &encoding
     endif

have been put in my script set_utf8.vim (newly uploaded to vim-online),
before the actual switch of 'encoding' to utf-8. Thanks to this, any
accented keys (and my own keyboard has a lot of them) go on working
identically (i.e., transparently) after the switchover as they did before.
Of course, making utf-8 the vim default for 'encoding' would break the above
code, with (AFAIK) no possibility of repair in mainline Vim (which hasn't
got the getacp() function -- and don't talk to me about a patch, I don't
want to use other than standard binaries; for one thing, I don't have a
compiler and I don't want to get one: messing about with nonstandard
compilations is definitely not my cup o'tea). It would break it, I mean,
unless the vim default for 'termencoding' would change from the empty string
(i.e. use whatever is the current global Vim 'encoding' at the time a key is
pressed) to the user's locale (as found in $LANG at startup). But let's keep
things simple, not break existing scripts, reduce Bram and other people's
workloads, and keep Vim's handling of encodings as it is (the only change
I'd like to see is to add a functioning 'printencoding' option to Windows
versions of gvim, even though they don't print through PostScript).
>
> However, if it's being used anyway for some reason, then the solution
> is the same:
>
> exe "set termencoding=cp" . getacp()
>
> The only reason I know of not to set "encoding" to "utf-8" is that Vim
> doesn't do proper conversions for Win32 calls.

Users who only edit files in a single 8 bit encoding don't need to bother
about Unicode. For others, it is a useful choice, but I maintain that it
should remain a choice, and, if the locale set in the operating system is
not a Unicode one, it should IMHO remain a conscious choice (or at least a
voluntary one, that need not stay conscious once it has been written into
the vimrc).
>
> > used by (g)vim (namely, 'encoding', 'fileencoding', 'termencoding'
> > and 'printencoding', as well as a possible 8-bit encoding at the
> > end of 'fileencodings') should, as I believe they already do,
> > default directly or indirectly to whatever is set in the locale,
> > and that a possible switchover to Unicode should be left to the
> > voluntary and reasoned choice of the user.
>
> Switching "encoding" to "utf-8" should be transparent, once proper
> conversions for win32 calls are in place.  Regular users don't care
> about what encoding their editor uses internally, any more than they
> care about what type of data structures they use.
>
> On the other hand, if utf-8 internally is fully supported, then utf-8
> can be the *only* internal encoding--which would make the rendering
> code much simpler and more robust.  I remember finding lots of little
> errors in the renderer (eg. underlining glitches for double-width
> characters) that went away with utf-8, and I don't think Vim renders
> correctly at all if eg.  "encoding" is set to "cp1252" and the ACP
> is CP932 (needs a double conversion).
>
> --
> Glenn Maynard

UTF-8 is fully supported (well, almost fully: characterwise
bidirectionality, a Unicode property, isn't supported) internally by
multi-byte versions of gvim, but switching over "transparently" from
"locale-oriented" to "Unicode-oriented" working requires careful attention
to several options, foremost of which are 'termencoding' and
'fileencodings'. To help the ordinary Vim user make that switchover
"transparently" without (as we say in French) "getting his feet caught in
the carpet", I uploaded a few minutes ago a new script called set_utf8.vim :
go see it at http://vim.sourceforge.net/scripts/script.php?script_id=789 .
With it and a Unicode-enabled version of Vim (with no need for any special
patches), switching over from one's national locale to Unicode becomes a
one-liner (you may call it a "trivial fix"). The idea of that script is to
work as "transparently" as possible, e.g., to avoid messing up the existing
keyboard's or (if possible) printer's interpretation of accented characters.

Regards,
Tony.

#995 From: Glenn Maynard <glenn@...>
Date: Mon Oct 13, 2003 1:29 am
Subject: Re: Filename encodings under Win32
 
On Mon, Oct 13, 2003 at 02:41:25AM +0200, Tony Mechelynck wrote:
> Trivial or not, my opinion is that handling files and keypresses as per the
> locale shouldn't be a "fix", it should be the (program) default. The "minor
> fix" consists of making Unicode the (user's) default by means of a config
> setting; but see below about that.

My suggestion was that these be the default settings in Windows, not be
settings that the user has to fix.

> Sorry, but it is. AFAIK, leaving 'termencoding' empty when switching
> 'encoding' over from something else to Unicode produces dysfunctions in the
> keyboard for all users whose actual keyboard encoding is other than 7-bit
> ASCII -- roughly speaking, for all users with a keyboard for a language
> other than English (even Dutchmen like Bram need, as a minimum, the
> "lowercase e with diaeresis", which is over 128, and therefore receives a
> different representation in UTF-8 and in other encodings -- the codepoint
> number may be the same but it is not represented identically). That's why the
> lines

This sounds like a bug.  The input from Windows is always in the system
encoding (ACP) or Unicode.  So, either termencoding should be ignored,
or (if someone actually has a real use for changing it in Windows) it should
default to the appropriate codepage, as I suggested.

> code, with (AFAIK) no possibility of repair in mainline Vim (which hasn't
> got the getacp() function -- and don't talk to me about a patch, I don't
> want to use other than standard binaries; for one thing, I don't have a

Um, the entire purpose of a patch is for it to be integrated into
mainline Vim.

However, the "code" I showed was just to demonstrate what I believe the
defaults should look like.  They'd actually be set in the source, not as
Vim commands.  The "getacp()" call only makes it *possible* to do that
with Vim commands (which is useful itself).

> Users who only edit files in a single 8 bit encoding don't need to bother
> about Unicode. For others, it is a useful choice, but I maintain that it
> should remain a choice, and, if the locale set in the operating system is
> not a Unicode one, it should IMHO remain a conscious choice (or at least a
> voluntary one, that need not stay conscious once it has been written into
> the vimrc).

Users, for the most part, don't care what the internal representation
is.  Many users don't even know what an encoding is (and shouldn't have
to).  I've seen little reason for UTF-8 to not eventually be the default
internal encoding for Vim in Windows, once the remaining issues are
resolved.

The only interesting, fundamental reason I've seen is memory usage: UTF-8
uses more memory for many languages.
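
For a concrete sense of the memory point, a quick Python check (sample strings chosen arbitrarily) comparing byte counts in a legacy codepage and in UTF-8:

```python
samples = {
    "日本語": "cp932",    # Japanese: 2 bytes per character in CP932
    "Привет": "cp1251",   # Russian: 1 byte per character in CP1251
    "hello": "cp1252",    # ASCII: same size either way
}

# Print each sample with its size in the legacy codepage and in UTF-8.
for text, legacy in samples.items():
    print(text, len(text.encode(legacy)), len(text.encode("utf-8")))
```

The kanji and Cyrillic samples take more bytes in UTF-8 than in their codepages, while the ASCII sample is the same size in both.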

> UTF-8 is fully supported (well, almost fully: characterwise
> bidirectionality, a Unicode property, isn't supported) internally by

Not quite.  It won't convert from UTF-8 to the ACP or Unicode when
calling Windows API functions.  For example, if I open files with
kanji in the filename and enc=utf-8, the title bar has <12><34> garbage
in it.  Minimally, this should convert the string to CP932.

In any case, I'm not about to crusade for this.  I'm mostly interested in
seeing the bugs where functionality is broken when enc=utf-8 be fixed,
such as the title bar issue.  I'd like to be able to say "use enc=utf-8
internally and it'll fix your problems", which I can't--because it
introduces new ones.

--
Glenn Maynard

#996 From: "Tony Mechelynck" <antoine.mechelynck@...>
Date: Mon Oct 13, 2003 3:21 am
Subject: Re: Filename encodings under Win32
 
Glenn Maynard <glenn@...> wrote:
> On Mon, Oct 13, 2003 at 02:41:25AM +0200, Tony Mechelynck wrote:
> > Trivial or not, my opinion is that handling files and keypresses as
> > per the locale shouldn't be a "fix", it should be the (program)
> > default. The "minor fix" consists of making Unicode the (user's)
> > default by means of a config setting; but see below about that.
>
> My suggestion was that these be the default settings in Windows, not
> be settings that the user has to fix.

I understood you as meaning that the program-default setting should be
Unicode. I beg to differ, however. Or maybe I misunderstood what you were
saying. And whatever the program-default settings, Vim should (IMHO) work in
as constant a manner as possible across all platforms.
>
[...]
> > Sorry, but it is. AFAIK, leaving 'termencoding' empty when switching
> > 'encoding' over from something else to Unicode produces
> > dysfunctions in the keyboard for all users whose actual keyboard
> > encoding is other than 7-bit ASCII -- roughly speaking, for all
> > users with a keyboard for a language other than English (even
> > Dutchmen like Bram need, as a minimum, the "lowercase e with
> > diaeresis", which is over 128, and therefore receives a different
> > representation in UTF-8 and in other encodings -- the codepoint
> > number may be the same but it is not represented identically).
> > That's why the lines
>
> This sounds like a bug.  The input from Windows is always in the
> system encoding (ACP) or Unicode.  So, either termencoding should be
> ignored, or (if someone actually has a real use for changing it in
> Windows) it should default to the appropriate codepage, as I suggested.

It doesn't sound like a bug to me, but like a misunderstanding between Windows
and Vim, as they suddenly aren't "speaking the same language" anymore. Let's
spell out what I mean with an example:

Let's say I press a "lowercase e with acute accent" (by far the most
frequent accented letter in French, my mother language). On my keyboard it's
the unshifted 2 key above the alphabet keys, but that doesn't matter much.
Under (let's say) latin1 locale, Windows makes the byte 0xE9 available to
gvim. The latter (in Insert mode and with latin1 'encoding') writes an
e-acute into the buffer I'm currently editing. This is correct behaviour.

Now let's say I change 'encoding' to "utf-8". With 'termencoding' left empty
(the default), gvim now suddenly expects the keyboard to be sending UTF-8
byte sequences (because an empty 'termencoding' means it takes the same
value as whatever is the current value of 'encoding'). Windows, however, is
not aware of any changes. It still sends 0xE9 for e-acute. Vim sees this,
and since it is a valid header byte for a 3-byte UTF-8 sequence, it expects
2 bytes in the range 0x80-0xBF following it. When they are not forthcoming,
Vim puts the 0xE9 in the buffer, interprets it as invalid, and displays it
as <E9>.
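
The mismatch described above can be reproduced outside Vim. The following is a minimal Python sketch, not Vim's actual input code; the helper name `decode_keypress` and the `<XX>` fallback formatting are assumptions made for illustration.

```python
# The single byte a latin1 keyboard layer delivers for e-acute.
key_byte = b"\xe9"


def decode_keypress(raw: bytes, assumed_encoding: str) -> str:
    """Decode raw key bytes under an assumed encoding.

    Bytes that are not valid in that encoding are rendered as <XX>,
    imitating how an editor might display them (hypothetical helper).
    """
    try:
        return raw.decode(assumed_encoding)
    except UnicodeDecodeError:
        return "".join(f"<{b:02X}>" for b in raw)


# Receiver assumes latin1: the byte is e-acute.
as_latin1 = decode_keypress(key_byte, "latin-1")

# Receiver assumes UTF-8: 0xE9 opens a 3-byte sequence that never completes.
as_utf8 = decode_keypress(key_byte, "utf-8")

print(repr(as_latin1), repr(as_utf8))
```

The byte is the same in both calls; only the encoding the receiver assumes has changed.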

However, if I take the precaution of first saving the older 'encoding' in
'termencoding', then I may change 'encoding' to UTF-8 with no ill effects:
gvim still expects latin1 from the keyboard, and when it reads 0xE9, it
correctly interprets it as e-acute, and represents it internally as the
UTF-8 byte sequence 0xC3 0xA9, which represents the codepoint U+00E9 "LATIN
SMALL E WITH ACUTE".
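
As a rough illustration of that conversion step (a sketch only, assuming the keyboard delivers latin1 and the internal encoding is UTF-8; `keyboard_to_internal` is a hypothetical name, not a Vim function):

```python
def keyboard_to_internal(raw: bytes, termencoding: str, encoding: str) -> bytes:
    """Reinterpret keyboard bytes from 'termencoding' into the internal 'encoding'."""
    text = raw.decode(termencoding)   # bytes -> Unicode codepoints
    return text.encode(encoding)      # codepoints -> internal byte sequence


internal = keyboard_to_internal(b"\xe9", "latin-1", "utf-8")

# The codepoint is still U+00E9; only its byte representation differs.
codepoint = ord(internal.decode("utf-8"))
print(internal.hex(), hex(codepoint))
```

The codepoint number is unchanged by the conversion, which matches the earlier remark that the number may be the same while the representation is not.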

Note: My W98 system can set a variety of "national keyboards" -- I can even
type Arabic in WordPad -- but they're a hassle because there is no
correspondence between what is printed on the keys of my Belgian AZERTY
keyboard and what those "national keyboards" send. At least, with Vim's
keymaps, I can design any number of keymaps to suit me, and, for instance,
map the Russian deh or the Arabic daal to the Latin D key, which makes sense
to me but does not necessarily correspond to where Russian or Arabic people
expect their D key to be. AFAIK I cannot choose Unicode as the "national
keyboard" (and, in fact, I don't need to, since it's easier for me to keep
Windows set to French language with Belgian AZERTY keyboard, and let gvim
handle non-Latin encodings by means of keymaps, digraphs, and/or the
i_CTRL-V_digit capability).
>
> > code, with (AFAIK) no possibility of repair in mainline Vim (which
> > hasn't got the getacp() function -- and don't talk to me about a
> > patch, I don't want to use other than standard binaries; for one
> > thing, I don't have a
>
> Um, the entire purpose of a patch is for it to be integrated into
> mainline Vim.
>
> However, the "code" I showed was just to demonstrate what I believe
> the defaults should look like.  They'd actually be set in the source,
> not as
> Vim commands.  The "getacp()" call only makes it *possible* to do that
> with Vim commands (which is useful itself).

It may be useful in itself; but until and unless it is indeed (as you
suggest) incorporated in mainline Vim source (a possibility towards which
I'm not averse as long as it doesn't break something else), it "doesn't
exist" from where I sit.
>
> > Users who only edit files in a single 8 bit encoding don't need to
> > bother about Unicode. For others, it is a useful choice, but I
> > maintain that it should remain a choice, and, if the locale set in
> > the operating system is not a Unicode one, it should IMHO remain a
> > conscious choice (or at least a voluntary one, that need not stay
> > conscious once it has been written into the vimrc).
>
> Users, for the most part, don't care what the internal representation
> is.  Many users don't even know what an encoding is (and shouldn't
> have
> to).  I've seen little reason for UTF-8 to not eventually be the
> default internal encoding for Vim in Windows, once the remaining
> issues are
> resolved.
>
> The only interesting, fundamental reason I've seen is memory usage:
> UTF-8 uses more memory for many languages.

Indeed. The difference is virtually nil for English; it is small but nonzero
for other Latin-alphabet languages, it approaches 1 to 2 for other-alphabet
languages like Greek or Russian (a little less than that because of spaces,
commas, full stops, etc.); I don't know the ratio for languages like Hindi
(with nagari script) or Chinese (hanzi).
>
> > UTF-8 is fully supported (well, almost fully: characterwise
> > bidirectionality, a Unicode property, isn't supported) internally by
>
> Not quite.  It won't convert from UTF-8 to the ACP or Unicode when
> calling Windows API functions.  For example, if I open files with
> kanji in the filename and enc=utf-8, the title bar has <12><34>
> garbage
> in it.  Minimally, this should convert the string to CP932.
>
> In any case, I'm not about to crusade for this.  I'm mostly
> interested in seeing the bugs where functionality is broken when
> enc=utf-8 be fixed,
> such as the title bar issue.  I'd like to be able to say "use
> enc=utf-8 internally and it'll fix your problems", which I
> can't--because it
> introduces new ones.
>
> --
> Glenn Maynard

I see. My script won't fix the problems caused by kanji in filenames
(personally I tend to shy away from anything other than us-ascii in
filenames anyway; I have, however, some e-acutes in filenames automatically
generated by Windows) but if you look at it, you'll see that it will make
Unicode use easier (with, IMHO, little hassle and good transparency) for the
average user of currently existing out-of-the-box multibyte versions of Vim.
Having kanji in filenames display correctly on the titlebar (and, why not,
on the status bar too) should be a separate fix, which ought to have no
(positive or negative) influence on the workings of my script.

By the way: what do you mean by ACP? The currently "active code page" maybe?

Hm. Your "kanji in filenames" issue makes me think: could that be related to
the fact that my Netscape 7 cannot properly handle Cyrillic letters between
<title></title> HTML tags (what sits there displays on the title bar, and
anything out-of-the-way is accepted but doesn't display properly, IIRC not
even with a <meta> tag specifying that the page is in UTF-8) but can show
them with no problems in body text, for instance between <H1></H1> (where
the title could appear again, this time to be displayed on top of the text
inside the browser window)? But this paragraph may be drifting off-topic.

Best regards,
Tony.

#997 From: Glenn Maynard <glenn@...>
Date: Mon Oct 13, 2003 4:16 am
Subject: Re: Filename encodings under Win32
 
On Mon, Oct 13, 2003 at 05:21:04AM +0200, Tony Mechelynck wrote:
> I understood you as meaning that the program-default setting should be
> Unicode. I beg to differ, however. Or maybe I misunderstood what you were
> saying. And whatever the program-default settings, Vim should (IMHO) work in
> as constant a manner as possible across all platforms.

I believe that the *internal* encoding ("encoding") can, if the various
bugs are fixed, reasonably be UTF-8, unless there's outcry about memory
usage.  I agree that it's very important that keyboard input, file
reading and writing, and so on operate in the ACP by default.

> Now let's say I change 'encoding' to "utf-8". With 'termencoding' left empty
> (the default), gvim now suddenly expects the keyboard to be sending UTF-8
> byte sequences (because an empty 'termencoding' means it takes the same
> value as whatever is the current value of 'encoding'). Windows, however, is

Right: I believe this is poor behavior for Windows.  Windows input is
always in the ACP[1], and if it's not, it should always be possible to find
out what it is.  (That is, I don't know exactly what Windows does if you
have multiple keyboard mappings and change languages, but it shouldn't
require special changing of tenc.)

For example, Vim always expects data from the IME in the encoding it
sends (Unicode).  termencoding is not used.  If I set tenc=cp1252, I
can still enter Japanese kanji with the IME--Vim knows that data is
always in the same format, and handles it correctly, even though it's
not CP1252.  Keyboard input is the same: the encoding should always
be predictable.

(I don't know if anyone is using tenc in Windows to do weird things;
I can't think of any practical use for intentionally setting tenc to
a value that doesn't match the ACP.)

> It may be useful in itself; but until and unless it is indeed (as you
> suggest) incorporated in mainline Vim source (a possibility towards which
> I'm not averse as long as it doesn't break something else), it "doesn't
> exist" from where I sit.

That's nice, but not relevant.  :)  Again, I wasn't suggesting anyone
use the Vim script I supplied, but only using it to demonstrate what the
internal defaults could be.

> Indeed. The difference is virtually nil for English; it is small but nonzero
> for other Latin-alphabet languages, it approaches 1 to 2 for other-alphabet
> languages like Greek or Russian (a little less than that because of spaces,
> commas, full stops, etc.); I don't know the ratio for languages like Hindi
> (with nagari script) or Chinese (hanzi).

The penalty is about 50% for CJK languages (two byte encodings become
three byte sequences).
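
The ratios mentioned in this exchange can be checked with a short Python sketch. The sample strings and the legacy codepages chosen (latin1, cp1251, cp932) are assumptions for illustration; real text mixes in ASCII spaces and punctuation, so measured ratios will usually be somewhat lower.

```python
def utf8_ratio(text: str, legacy_encoding: str) -> float:
    """Size of 'text' in UTF-8 divided by its size in a legacy encoding."""
    return len(text.encode("utf-8")) / len(text.encode(legacy_encoding))


english = utf8_ratio("plain text", "latin-1")    # pure ASCII: no growth
french = utf8_ratio("d\u00e9j\u00e0 vu", "latin-1")  # "deja vu" with accents
russian = utf8_ratio("\u043f\u0440\u0438\u0432\u0435\u0442", "cp1251")  # Cyrillic word
japanese = utf8_ratio("\u65e5\u672c\u8a9e", "cp932")  # three kanji

for name, ratio in [("english", english), ("french", french),
                    ("russian", russian), ("japanese", japanese)]:
    print(f"{name}: {ratio:.2f}")
```

Cyrillic letters go from one byte to two, and the kanji from two bytes to three, which is where the "about 50%" figure comes from.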

> By the way: what do you mean by ACP? The currently "active code page" maybe?

ANSI codepage.  It's the system codepage, set in the "regional settings"
control panel (or whatever; MS changes the control panels weekly).  It's
the codepage that "*A" (ANSI) functions expect (which are the ones Vim
uses, for the most part).  Essentially, the ACP is to Windows 9x as
"encoding" is to Vim.  In NT, everything is UCS-2 internally--or
is it UTF-16?--and the "*A" functions convert to and from the ACP.

In a sense, MS did with NT what I wish Vim would do--standardize on Unicode
internally, to make the internals simpler, in a way that is transparent
to users.

> Hm. Your "kanji in filenames" issue makes me think: could that be related to
> the fact that my Netscape 7 cannot properly handle Cyrillic letters between
> <title></title> HTML tags (what sits there displays on the title bar, and
> anything out-of-the-way is accepted but doesn't display properly, IIRC not
> even with a <meta> tag specifying that the page is in UTF-8) but can show
> them with no problems in body text, for instance between <H1></H1> (where
> the title could appear again, this time to be displayed on top of the text
> inside the browser window)? But this paragraph may be drifting off-topic.

It's related, but not exactly the same.

Vim's problem with titlebars is that it's not converting titlebar
strings to the ACP.  ("ºù.txt" shows up as <8d><f7>.txt, and 8df7
looks like the Unicode value of ºù; I'm not entirely sure how that's
happening and haven't looked at the code.)  Fixing this will allow
displaying characters in the ANSI codepage: a system set to Japanese
will be able to display Kanji, but not Arabic.

For displaying full Unicode, it needs to test if Unicode is available,
create a Unicode window (instead of an ANSI window), and set the title
with the corresponding wide function.  This isn't too hard, but it does
take more work and a great deal more testing (to make sure it doesn't
break anything in 9x).  This would be nice, but it's above and beyond
"don't break anything in UTF-8 that works in the normal ANSI codepage".

Whoops.  I just tried saving "ºù.txt", and ended up with
"(garbage)¡à.txt".
That explains the "<8d><f7>.txt".  Looks like file saving isn't working
right when enc=utf-8.  This is a much more serious bug, but not one I'm
up to fixing right now, as, like you, I rarely edit files with non-ASCII
characters in the filename.  (I'm still using 6.1, though, so this might
well be fixed.)

[1] or in Unicode in NT if you use the correct Windows messages, but I
don't recall which of those work in 9x (probably none)

--
Glenn Maynard

#998 From: "Tony Mechelynck" <antoine.mechelynck@...>
Date: Mon Oct 13, 2003 5:28 am
Subject: Re: Filename encodings under Win32
 
Glenn Maynard <glenn@...> wrote:
> On Mon, Oct 13, 2003 at 05:21:04AM +0200, Tony Mechelynck wrote:
> > I understood you as meaning that the program-default setting should
> > be Unicode. I beg to differ, however. Or maybe I misunderstood what
> > you were saying. And whatever the program-default settings, Vim
> > should (IMHO) work in as constant a manner as possible across all
> > platforms.
>
> I believe that the *internal* encoding ("encoding") can, if the
> various
> bugs are fixed, reasonably be UTF-8, unless there's outcry about
> memory usage.  I agree that it's very important that keyboard input,
> file
> reading and writing, and so on operate in the ACP by default.

So, IIUC, if we want keyboard input, printer output, and file
creation to keep operating by default according to the geographic locale, then one
thing that I can see is that 'termencoding' cannot default to empty (as it
can when 'encoding' defaults to the encoding defined by $LANG), it must
default to the keyboard's national encoding. Similarly for 'printencoding'
(where present and functioning), for the global side of 'fileencoding', and
for the non-Unicode part of 'fileencodings', which could then for instance
be set by default to "ucs-bom,utf-8,cp937" if cp937 is the "national"
encoding as defined by the Windows country settings.
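
The lookup order that a value such as "ucs-bom,utf-8,<national codepage>" implies can be sketched as follows. This is an illustrative Python approximation, not Vim's detection code; `detect_and_decode` is a hypothetical helper, and cp1252 merely stands in for whatever the national codepage happens to be.

```python
import codecs


def detect_and_decode(data: bytes, national_encoding: str = "cp1252"):
    """Return (encoding_name, text), trying candidates in a fixed order.

    1. A byte order mark decides immediately.
    2. Otherwise strict UTF-8 is attempted.
    3. Otherwise the national codepage is the fallback.
    Simplification: UTF-32 BOMs are not handled here.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom", data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # Python's "utf-16" codec consumes the BOM and picks the byte order.
        return "utf-16", data.decode("utf-16")
    try:
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        return national_encoding, data.decode(national_encoding)


print(detect_and_decode(b"\xef\xbb\xbfabc"))   # BOM present
print(detect_and_decode(b"caf\xc3\xa9"))       # valid UTF-8
print(detect_and_decode(b"caf\xe9"))           # falls back to the codepage
```

The order matters: an 8-bit codepage accepts almost any byte sequence, so it only makes sense as the last candidate.
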
>
> > Now let's say I change 'encoding' to "utf-8". With 'termencoding'
> > left empty (the default), gvim now suddenly expects the keyboard to
> > be sending UTF-8 byte sequences (because an empty 'termencoding'
> > means it takes the same value as whatever is the current value of
> > 'encoding'). Windows, however, is
>
> Right: I believe this is poor behavior for Windows.  Windows input is
> always in the ACP[1], and if it's not, it should always be possible
> to find out what it is.  (That is, I don't know exactly what Windows
> does if you
> have multiple keyboard mappings and change languages, but it shouldn't
> require special changing of tenc.)

WordPad is somehow able to detect it "on the fly" when I change the setting
of the "international keyboard" feature. AFAIK, Vim isn't, so it's simpler
not to touch that feature when working with Vim. OTOH, as long as
'termencoding' is nonempty and consistent with what the keyboard driver is
sending to the program, the internal 'encoding' of gvim can be changed to
anything compatible with what I'm doing, and in particular to UTF-8, which
ought to be compatible with everything (within limits: I mustn't set
'fileencoding' to latin1, for instance, if I've typed kanji into the
buffer).
>
> For example, Vim always expects data from the IME in the encoding it
> sends (Unicode).  termencoding is not used.  If I set tenc=cp1252, I
> can still enter Japanese kanji with the IME--Vim knows that data is
> always in the same format, and handles it correctly, even though it's
> not CP1252.  Keyboard input is the same: the encoding should always
> be predictable.

I see. I think I have Windows' Global IME installed, but I don't know how to
use it -- how, for instance, to input an East-Asian ideogram, of which I
know the shape, and maybe the meaning or part of it, but not the sound. For
"ordinary" text input, or for keymapped text input, Vim interprets the keys
coming from the keyboard driver in the light of the current 'termencoding'.
>
> (I don't know if anyone is using tenc in Windows to do weird things;
> I can't think of any practical use for intentionally setting tenc to
> a value that doesn't match the ACP.)

Neither can I. That's why it shouldn't stay empty if and when 'encoding' is
changed away from the ACP.
>
> > It may be useful in itself; but until and unless it is indeed (as
> > you suggest) incorporated in mainline Vim source (a possibility
> > towards which I'm not averse as long as it doesn't break something
> > else), it "doesn't exist" from where I sit.
>
> That's nice, but not relevant.  :)  Again, I wasn't suggesting anyone
> use the Vim script I supplied, but only using it to demonstrate what
> the internal defaults could be.
> [...]
> > Indeed. The difference is virtually nil for English; it is small
> > but nonzero for other Latin-alphabet languages, it approaches 1 to
> > 2 for other-alphabet languages like Greek or Russian (a little less
> > than that because of spaces, commas, full stops, etc.); I don't
> > know the ratio for languages like Hindi (with nagari script) or
> > Chinese (hanzi).
>
> The penalty is about 50% for CJK languages (two byte encodings become
> three byte sequences).
>
> > By the way: what do you mean by ACP? The currently "active code
> > page" maybe?
>
> ANSI codepage.  It's the system codepage, set in the "regional
> settings" control panel (or whatever; MS changes the control panels
> weekly).  It's
> the codepage that "*A" (ANSI) functions expect (which are the ones Vim
> uses, for the most part).  Essentially, the ACP is to Windows 9x as
> "encoding" is to Vim.  In NT, everything is UCS-2 internally--or
> is it UTF-16?--and the "*A" functions convert to and from the ACP.

You can call it UCS-2 or UTF-16. I've been told there are a few differences
between the two, but IIUC they won't show themselves if you limit yourself
to valid codepoints not higher than U+FFFF.
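
A small Python sketch of where the two part ways: a codepoint up to U+FFFF occupies one 16-bit unit in either scheme, while anything above it needs a UTF-16 surrogate pair that UCS-2 cannot express. The helper name `utf16_units` and the sample characters are chosen purely for illustration.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to store one character in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


bmp = utf16_units("\u00e9")         # U+00E9, inside the 16-bit range
astral = utf16_units("\U0001D11E")  # U+1D11E, above U+FFFF

# The surrogate pair for U+1D11E, shown big-endian for readability.
pair = "\U0001D11E".encode("utf-16-be").hex()

print(bmp, astral, pair)
```

For text limited to U+FFFF and below, the two encodings produce identical bytes, which is why the difference rarely shows up in practice.
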
>
> In a sense, MS did with NT what I wish Vim would do--standardize on
> Unicode internally, to make the internals simpler, in a way that is
> transparent
> to users.
>
> > Hm. Your "kanji in filenames" issue makes me think: could that be
> > related to the fact that my Netscape 7 cannot properly handle
> > Cyrillic letters between <title></title> HTML tags (what sits there
> > displays on the title bar, and anything out-of-the-way is accepted
> > but doesn't display properly, IIRC not even with a <meta> tag
> > specifying that the page is in UTF-8) but can show them with no
> > problems in body text, for instance between <H1></H1> (where the
> > title could appear again, this time to be displayed on top of the
> > text inside the browser window)? But this paragraph may be drifting
> > off-topic.
>
> It's related, but not exactly the same.
>
> Vim's problem with titlebars is that it's not converting titlebar
> strings to the ACP.  ("ºù.txt" shows up as <8d><f7>.txt, and 8df7
> looks like the Unicode value of ºù; I'm not entirely sure how that's
> happening and haven't looked at the code.)  Fixing this will allow
> displaying characters in the ANSI codepage: a system set to Japanese
> will be able to display Kanji, but not Arabic.

...and a system set (like mine) to a Latin codepage will be able to display
French (with its accents), but not Russian. That sheds some light on what I
experienced.
>
> For displaying full Unicode, it needs to test if Unicode is available,
> create a Unicode window (instead of an ANSI window), and set the title
> with the corresponding wide function.  This isn't too hard, but it
> does
> take more work and a great deal more testing (to make sure it doesn't
> break anything in 9x).  This would be nice, but it's above and beyond
> "don't break anything in UTF-8 that works in the normal ANSI
> codepage".

...and it would probably add quite some lines of code for cross-platform
compatibility, since not every platform offers a full Unicode interface.
>
> Whoops.  I just tried saving "ºù.txt", and ended up with
> "(garbage)¡à.txt". That explains the "<8d><f7>.txt".  Looks like file
> saving isn't working
> right when enc=utf-8.  This is a much more serious bug, but not one
> I'm
> up to fixing right now, as, like you, I rarely edit files with
> non-ASCII characters in the filename.  (I'm still using 6.1, though,
> so this might
> well be fixed.)

Can you create that filename with Notepad.exe (Save As) or cmd.exe (copy NUL
filename.txt)? If not, then Vim is no worse than at least some native
Microsoft applications. I suppose you know (but I'm repeating) that a
full-featured gvim distribution for Win32 (currently gvim.exe 6.2.96 plus
runtime files as of 13 Sep 2003) is available from Steve Hall at
http://cream.sourceforge.net/vim.html . It's the most recent gvim
distribution for Windows known to me, with what I regard as quite a
user-friendly installer. It is also a "standard" gvim, not a "special Cream"
gvim, notwithstanding its hosting location. (And it's the one I'm using,
which doesn't say much, except that I can attest that I have found it to
work the way the help files say it should. Of course I haven't tested every
possible little thing though.) Finally, if it happens in the future as it
did in the past, Steve will continue to generate updated gvim builds from
time to time, and the above-mentioned page will be updated accordingly.
>
> [1] or in Unicode in NT if you use the correct Windows messages, but I
> don't recall which of those work in 9x (probably none)
>
> --
> Glenn Maynard

Best regards,
Tony.

#999 From: Glenn Maynard <glenn@...>
Date: Mon Oct 13, 2003 5:44 am
Subject: Re: Filename encodings under Win32
 
On Mon, Oct 13, 2003 at 07:28:24AM +0200, Tony Mechelynck wrote:
> so, IIUC, if we want to keep keyboard input, printer output, and file
> creation to operate by default according to the geographic locale, then one
> thing that I can see is that 'termencoding' cannot default to empty (as it
> can when 'encoding' defaults to the encoding defined by $LANG), it must
> default to the keyboard's national encoding. Similarly for 'printencoding'
> (where present and functioning), for the global side of 'fileencoding', and
> for the non-Unicode part of 'fileencodings', which could then for instance
> be set by default to "ucs-bom,utf-8,cp937" if cp937 is the "national"
> encoding as defined by the Windows country settings.

That's what I was suggesting originally, I just wasn't clear enough.

> You can call it UCS-2 or UTF-16. I've been told there are a few differences
> between the two, but IIUC they won't show themselves if you limit yourself
> to valid codepoints not higher than U+FFFF.

(Right, but the difference is significant, so I just wanted to make it
clear that I wasn't being precise.)

> ...and it would probably add quite some lines of code for cross-platform
> compatibility, since not every platform offers a full Unicode interface.

Vim already has the necessary code to convert between UTF-8 and the ACP,
without adding any dependencies like iconv.

> Can you create that filename with Notepad.exe (Save As) or cmd.exe (copy NUL
> filename.txt)? If not, then Vim is no worse than at least some native
> Microsoft applications. I suppose you know (but I'm repeating) that a

I can create it with notepad, and any other native graphical app that is
packaged with Windows.  (I can also create files with filenames in any
language; Windows-native apps in NT are completely Unicode-based.)

I can also create it with Vim if encoding is set to CP932; this only
happens with enc=utf-8.

--
Glenn Maynard

#1000 From: Camillo Särs <ged@...>
Date: Mon Oct 13, 2003 7:24 am
Subject: Re: Filename encodings under Win32
 
Glenn Maynard wrote:
>>Can you create that filename with Notepad.exe (Save As) or cmd.exe (copy NUL
>>filename.txt)? If not, then Vim is no worse than at least some native
>>Microsoft applications. I suppose you know (but I'm repeating) that a
>
> I can create it with notepad, and any other native graphical app that is
> packaged with Windows.  (I can also create files with filenames in any
> language; Windows-native apps in NT are completely Unicode-based.)

Correct.  Additionally, you can always enter any unicode character code
directly from the keyboard.  All that is needed is the numeric keypad in
numlock mode and the Alt key.  This does not seem to work with Vim.

> I can also create it with Vim if encoding is set to CP932; this only
> happens with enc=utf-8.

That's what I noted as well.  Basically vim works "ok" if you set the
termencoding and encoding to your codepage.  However, you don't get UTF-8
support that way.  Things break down on Windows when you use UTF-8 as your
encoding, as vim seems to use incorrect APIs.

And as noted, Win9x/ME are different.  I'm only concerned with NT-based
Windows here, as that's where you can expect Unicode support to work.

To summarize:
- Vim on NT does not work well with unicode/utf-8.
- The fixes are fairly straightforward (use Unicode API, UTF-8 internally)
- Win9x need to work in cp mode, but that's already supported

Camillo
--
Camillo Särs <+ged+@...>              **  Aim for the impossible and you
<http://www.iki.fi/+ged>                 **   will achieve the improbable.
PGP public key available                 **

#1001 From: Glenn Maynard <glenn@...>
Date: Mon Oct 13, 2003 7:47 am
Subject: Re: Filename encodings under Win32
 
On Mon, Oct 13, 2003 at 10:24:01AM +0300, Camillo Särs wrote:
> That's what I noted as well.  Basically vim works "ok" if you set the
> termencoding and encoding to your codepage.  However, you don't get UTF-8
> support that way.  Things break down on Windows when you use UTF-8 as your
> encoding, as vim seems to use incorrect APIs.

They don't break down, they're just imperfect.

> And as noted, Win9x/ME are different.  I'm only concerned with NT-based
> Windows here, as that's where you can expect Unicode support to work.

Vim should support UTF-8 in 9x, too.

> - Vim on NT does not work well with unicode/utf-8.

It works well for many uses; I use enc=utf-8 exclusively, to edit files
in both UTF-8 (with characters well beyond CP1252 and CP932) and other
encodings.

> - The fixes are fairly straightforward (use Unicode API, UTF-8 internally)
> - Win9x need to work in cp mode, but that's already supported

No, convert between UTF-8 and the ACP and use the ANSI API calls.  This
will make enc=utf-8 work in both 9x and NT.

Using Unicode calls when available is useful (eg. to display non-ACP
text in the titlebar), but that's "new feature" territory, not "bugfix".

--
Glenn Maynard

#1002 From: Bram Moolenaar <Bram@...>
Date: Mon Oct 13, 2003 9:39 am
Subject: Re: Filename encodings under Win32
 
Glenn Maynard wrote:

> On Sun, Oct 12, 2003 at 10:44:05PM +0200, Tony Mechelynck wrote:
> > As long as 'fileencoding', 'printencoding' and (most important)
> > 'termencoding' default (when empty) to whatever is the current value of
> > 'encoding', the latter must not (IMHO) be set to UTF-8 by default.
> >
> > (Let's spell it out) In my humble opinion, Vim should require as little
> > "tuning" as possible to handle the language interfaces the same way as the
> > operating system does, and this means that, when the user sets nothing else
> > in his startup and configuration files, keyboard input, printer output and
> > file creation should default to whatever is set in the locale.
>
> This is a trivial fix, which I already proposed many months ago: the
> defaults in Windows should be the results of
>
> exe "set fileencodings=ucs-bom,utf-8,cp" . getacp() . ",latin1"
> exe "set fileencoding=cp" . getacp()
>
> and now adding:
>
> exe "set printencoding=cp" . getacp()

The default that Vim starts with is 'encoding' set to the active
codepage and 'fileencodings' set to "ucs-bom".  This means it falls back
to 'encoding' when there is no BOM.  That should work almost the same
way as what you give here, but without the explicit use of the codepage
name.  When the user sets 'encoding' the other ones follow.  In your
example the user has to set all three options.

Perhaps setting 'termencoding' can be omitted if we can use the Unicode
functions for keyboard input.  Perhaps someone can figure out how to do
this properly.  And make sure the input methods still work!

> Note that "getacp" is a function in a patch I sent which was lost or
> forgotten: return the ANSI codepage.

Can't recall that patch.  I generally give OS-specific additions a low
priority.

> Switching "encoding" to "utf-8" should be transparent, once proper
> conversions for win32 calls are in place.  Regular users don't care
> about what encoding their editor uses internally, any more than they
> care about what type of data structures they use.

The problem still is that conversion from and to UTF-8 is not
transparent.  Especially when editing files with an unknown encoding.

> On the other hand, if utf-8 internally is fully supported, then utf-8
> can be the *only* internal encoding--which would make the rendering
> code much simpler and more robust.  I remember finding lots of little
> errors in the renderer (eg. underlining glitches for double-width
> characters) that went away with utf-8, and I don't think Vim renders
> correctly at all if eg.  "encoding" is set to "cp1252" and the ACP
> is CP932 (needs a double conversion).

UTF-8 is already fully supported in Vim.  There may be a few glitches on
the conversions though.  The clipboard also still doesn't work 100%.

--
hundred-and-one symptoms of being an internet addict:
182. You may not know what is happening in the world, but you know
      every bit of net-gossip there is.

  /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net   \\\
///          Creator of Vim - Vi IMproved -- http://www.Vim.org          \\\
\\\              Project leader for A-A-P -- http://www.A-A-P.org        ///
  \\\  Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html  ///

#1003 From: Camillo Särs <ged@...>
Date: Mon Oct 13, 2003 10:00 am
Subject: Re: Filename encodings under Win32
 
Glenn Maynard wrote:
> They don't break down, they're just imperfect.

Well, if I can't write a filename the way I need to write it, I have a
problem.  Fortunately this is mostly theoretical for me, but for some users
resorting to plain us-ascii is not a possibility.  These mails are more an
attempt at getting vim to work better than to improve my life.  After all,
I believe in contributing when I can, if only by highlighting problems and
proposing solutions.

> Vim should support UTF-8 in 9x, too.

Of course, but with the necessary restrictions.  Displaying unicode is a
problem, as is entering filenames.  Those functions are restricted to the
ACP on Win9x.

>>- Vim on NT does not work well with unicode/utf-8.
>
> It works well for many uses; I use enc=utf-8 exclusively, to edit files
> in both UTF-8 (with characters well beyond CP1252 and CP932) and other
> encodings.

Yes, editing is not the problem.  It's the system calls that cause the
trouble, as we have established.

>>- The fixes are fairly straightforward (use Unicode API, UTF-8 internally)
>>- Win9x need to work in cp mode, but that's already supported
>
> No, convert between UTF-8 and the ACP and use the ANSI API calls.  This
> will make enc=utf-8 work in both 9x and NT.

No it will not.  You would then restrict NT users to their local code page
only, and that's almost "reverting to DOS".  On Win9x we need to stick to
ACP, but on NT I don't see any reason not to go Unicode.  Also, the UTF-8
to UCS-2 mapping is quick and straightforward, with few hidden catches.
Mapping utf-8 to ACP is tricky and lossy.
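
A rough Python illustration of that asymmetry, assuming cp1252 as the ACP. Python's codecs stand in for the Windows conversion routines here, and both helper names are hypothetical.

```python
from typing import Optional


def to_utf16(name_utf8: bytes) -> bytes:
    """UTF-8 -> UTF-16: succeeds for every valid UTF-8 input."""
    return name_utf8.decode("utf-8").encode("utf-16-le")


def to_acp(name_utf8: bytes, acp: str = "cp1252") -> Optional[bytes]:
    """UTF-8 -> ANSI codepage: returns None when a character has no mapping."""
    try:
        return name_utf8.decode("utf-8").encode(acp)
    except UnicodeEncodeError:
        return None


latin_name = "\u00e5.txt".encode("utf-8")             # a-ring, present in cp1252
kanji_name = "\u65e5\u672c\u8a9e.txt".encode("utf-8")  # kanji, absent from cp1252

print(to_acp(latin_name))      # representable in the codepage
print(to_acp(kanji_name))      # not representable: None
print(to_utf16(kanji_name).decode("utf-16-le") == "\u65e5\u672c\u8a9e.txt")
```

The wide-character path round-trips both names, while the codepage path has to decide what to do with the second one.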

Also, the code you had implemented already used the "W" APIs correctly.  I
don't understand why you would now advocate dropping widechar and unicode
support.

> Using Unicode calls when available is useful (eg. to display non-ACP
> text in the titlebar), but that's "new feature" territory, not "bugfix".

It is a bugfix.  Currently, when using UTF-8 on WinNT, vim is broken in (at
least) the following regards:

- Opening non-ascii filenames, regardless of codepage
    å.txt internally becomes <e5>.txt

- Saving filenames
    å.txt is saved in UTF-8 format (Ã¥.txt) and displayed incorrectly in
title bar

- The default termencoding should be set intelligently, UTF-8 as
termencoding breaks input of non-ascii.

- The default fileencoding breaks when "going UTF-8", most probably a
better behavior would be to default to the ACP always.

- Also, my vim (6.2) defaults to "latin1", not my current codepage.  That
would indicate that the ACP detection does not work.
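
The first two symptoms in the list can be reproduced outside Vim.  The
following Python sketch assumes cp1252 as the ACP and uses Python's codecs
in place of the ANSI API's own interpretation of bytes; it is not Vim's
code.

```python
name = "å.txt"

# Saving: UTF-8 bytes are handed to a byte-oriented (ANSI) call,
# which interprets them as cp1252.
utf8_bytes = name.encode("utf-8")              # b'\xc3\xa5.txt'
seen_by_ansi_api = utf8_bytes.decode("cp1252")

# Opening: the ANSI call delivers cp1252 bytes, which are not valid UTF-8.
acp_bytes = name.encode("cp1252")              # b'\xe5.txt'
try:
    acp_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False                         # shown as <e5> in the buffer
```

Here `seen_by_ansi_api` is "Ã¥.txt", the name that ends up on disk and in
the title bar.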

OK, the list above sounds like whining, but earlier I did suggest that the
fixes are fairly straightforward.

On WinNT, vim should use unicode apis, essentially benefitting
automatically from NT native Unicode.  This only involves one additional
encoding/decoding step before calling the apis.

On Win9x, vim should use ANSI apis.  The only thing missing is again the
encoding/decoding, although it's trickier with the ANSI apis.  There are
many cases where a user would enter UTF-8 stuff that doesn't smoothly
convert to the current CP.  I think vim's current code should detect that
easily.

Camillo
--
Camillo Särs <+ged+@...>              **  Aim for the impossible and you
<http://www.iki.fi/+ged>                 **   will achieve the improbable.
PGP public key available                 **

#1004 From: Bram Moolenaar <Bram@...>
Date: Mon Oct 13, 2003 10:57 am
Subject: Re: Filename encodings under Win32
 
Camillo wrote:

> > Vim should support UTF-8 in 9x, too.
>
> Of course, but with the necessary restrictions.  Displaying unicode is a
> problem, as is entering filenames.  Those functions are restricted to the
> ACP on Win9x.

On Windows NT/XP there are also restrictions, especially when using
non-NTFS filesystems.  There was a discussion about this in the Linux
UTF-8 maillist a long time ago.  There was no good universal solution
for handling filenames that they could come up with.

Vim could use Unicode functions for accessing files, but this will be a
huge change.  Requires lots of testing.  Main problem is when 'encoding'
is not a Unicode encoding, then conversions need to be done, which may
fail.

If you use filenames that cannot be represented in the active codepage,
you probably have problems with other programs.  Thus sticking with the
active codepage functions isn't too bad.  But then Vim needs to convert
from 'encoding' to the active codepage!

> It is a bugfix.  Currently, when using UTF-8 on WinNT, vim is broken in (at
> least) the following regards:
>
> - Opening non-ascii filenames, regardless of codepage
>    å.txt internally becomes <e5>.txt
>
> - Saving filenames
>    å.txt is saved in UTF-8 format (Ã¥.txt) and displayed incorrectly in
> title bar

The file names are handled as byte strings.  Thus so long as you use the
right bytes it should work.  Problem is when you are typing/editing with
a different encoding from the active codepage.

> - The default termencoding should be set intelligently, UTF-8 as
> termencoding breaks input of non-ascii.

Why would 'termencoding' be "utf-8"?  This won't work, unless you are
using an xterm on MS-Windows.  The default 'termencoding' is empty,
which means 'encoding' is used.  There is no better default.  When you
change 'encoding' you might have to change 'termencoding' as well, but
this depends on your situation.

> - The default fileencoding breaks when "going UTF-8", most probably a
> better behavior would be to default to the ACP always.

'fileencoding' is set when reading a file.  Perhaps you mean
'fileencodings'?  This one needs to be tweaked by the user, because it
depends on what kind of files you edit.  Main problem is that an ASCII
file can be any encoding, Vim can't detect what it is, thus the user has
to specify what he wants Vim to do with it.
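
As a rough model of why the order in 'fileencodings' matters, here is a
Python sketch that tries ucs-bom, then utf-8, then latin1.  The function
name is hypothetical and the BOM handling is simplified (UTF-32 is ignored);
it only approximates what Vim does.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: try candidates in 'fileencodings' order."""
    # ucs-bom: accepted only if a byte order mark is present.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # utf-8: accepted only if the whole buffer is valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # latin1: never fails, so it is the catch-all and must come last.
    return "latin-1"
```

A pure ASCII buffer such as b"abc" is accepted by the first candidate that
does not require a BOM, whatever encoding its author had in mind.  That is
the ambiguity described above: the bytes alone cannot settle it.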

> - Also, my vim (6.2) defaults to "latin1", not my current codepage.  That
> would indicate that the ACP detection does not work.

Where does it use "latin1"?  Not in 'encoding', I suppose.

> OK, the list above sounds like whining, but earlier I did suggest that the
> fixes are fairly straightforward.

Mostly it's quite a bit more complicated.  Different users have different
situations, it is hard to think of solutions that work for most people.

> On WinNT, vim should use unicode apis, essentially benefitting
> automatically from NT native Unicode.  This only involves one additional
> encoding/decoding step before calling the apis.

The problem is that conversions to/from Unicode only work when you know
the encoding of the text you are converting.  The encoding isn't always
known.  Vim sometimes uses "latin1", so that you at least get 8-bit
clean editing, even though the actual encoding is unknown.
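
What "8-bit clean" means can be checked with a few lines of Python.  This
is a sketch using Python's codecs, not Vim's internals: latin1 maps byte N
to code point N, so arbitrary bytes survive a round trip, while strict
UTF-8 decoding rejects them.

```python
all_bytes = bytes(range(256))

# latin-1 maps byte N to code point N, so any byte string round-trips.
round_trip = all_bytes.decode("latin-1").encode("latin-1")
latin1_clean = round_trip == all_bytes

# Strict UTF-8 decoding rejects arbitrary byte sequences.
try:
    all_bytes.decode("utf-8")
    utf8_clean = True
except UnicodeDecodeError:
    utf8_clean = False
```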

> On Win9x, vim should use ANSI apis.  The only thing missing is again the
> encoding/decoding, although it's trickier with the ANSI apis.  There are
> many cases where a user would enter UTF-8 stuff that doesn't smoothly
> convert to the current CP.  I think vim's current code should detect that
> easily.

You can use a few Unicode functions on Win9x, we already do.  I don't
see a reason to change this.

--
I'm in shape.  Round IS a shape.

  /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net   \\\
///          Creator of Vim - Vi IMproved -- http://www.Vim.org          \\\
\\\              Project leader for A-A-P -- http://www.A-A-P.org        ///
  \\\  Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html  ///

#1005 From: Camillo Särs <ged@...>
Date: Mon Oct 13, 2003 11:51 am
Subject: Re: Filename encodings under Win32
 
Bram Moolenaar wrote:
> On Windows NT/XP there are also restrictions, especially when using
> non-NTFS filesystems.

Right, I forgot about those.  AFAIK, the functions do not fail silently in
those cases, so it's just (yet) more work.  Essentially, file names then
come from a restricted charset (code page limits).

> There was a discussion about this in the Linux UTF-8 maillist a
> long time ago.  There was no good universal solution
> for handling filenames that they could come up with.

I bet.  For many systems, the current behavior is adequate even if
technically speaking wrong.  I'm not trying to propose a universal
solution, I'm just advocating the view that on win32, vim should do the
"windows thing" with unicode/utf-8.

> Vim could use Unicode functions for accessing files, but this will be a
> huge change.

Why so?  The code earlier in this thread probably did much of what is
needed. It also involved numerous other changes, which I ignored.  I'm not
being nosy, I'm just curious why this would be a "huge change".  It's not
the file contents we are getting at, it's the filenames (and the GUI).

Also note that when using the native code page as the encoding (read:
latin1), the ANSI functions do work as expected.  So the fixes would
only need to concern the UTF-8 encoding, if you get picky. :)

> Requires lots of testing.

That's unicode for you.  However, deriving a decent test set using
available unicode test files should be a fairly straight-forward thing.

> Main problem is when 'encoding' is not a Unicode encoding, then conversions
> need to be done, which may fail.

But what I assume you are doing now is even worse, isn't it?  Essentially
you are feeding some user-selected encoding to functions that require
ANSI characters.  How's that for "a lot of testing"?

Conversions from almost any encoding to unicode should work.  I would not
expect major trouble there.  And note that if the conversion from the
encoding to unicode fails, I expect that the current usage would fail even
more severely.  And there haven't been reports of that, have there?

There certainly are tricky encodings that could cause problems.  However,
I'm mostly concerned with the basic use case of utf-8 and
"fileencodings=ucs-bom,utf-8,latin1".  This under a code page of cp1252.

> If you use filenames that cannot be represented in the active codepage,
> you probably have problems with other programs.

But I have filenames that can be represented in the active code page
(å.txt), but which get encoded into incompatible UTF-8 characters!

> Thus sticking with the active codepage functions isn't too bad.

If it worked that way, but it doesn't.  Setting "encoding=utf-8" changes
that behavior - only us-ascii is usable in filenames.

> But then Vim needs to convert from 'encoding' to the active codepage!

That would help most users.  Including me.  But it would not be the
"ultimate" solution to unicode on win32, as it would still cause trouble
with characters outside the codepage.  As I see it, the easiest fix is
actually using the unicode-api, as there are fewer (or no) conversion
failures that way.

> The file names are handled as byte strings.  Thus so long as you use the
> right bytes it should work.  Problem is when you are typing/editing with
> a different encoding from the active codepage.

My point exactly! :)

> Why would 'termencoding' be "utf-8"?  This won't work, unless you are
> using an xterm on MS-Windows.

Yeah, but that's what you get if you just blindly do "set encoding=utf-8".
Took me a while to figure that one out.  I need to do "set
termencoding=cp1252" first, or the "let &termencoding = &encoding".  Not
exactly transparent to non-experts.

> The default 'termencoding' is empty, which means 'encoding' is used.
> There is no better default.

On Windows, I'd say "detect active code page" is the right choice.

> When you change 'encoding' you might have to change 'termencoding' as
> well, but this depends on your situation.

As noted above, that's the unintuitive behavior I was getting at.  A
windows user, knowing that unicode is the native charset, does a "set
encoding=utf-8" and expects things to work.  They don't, but depending on
the language, it may take a while before a non-ascii character is entered.

>>- The default fileencoding breaks when "going UTF-8", most probably a
>>better behavior would be to default to the ACP always.
>
> 'fileencoding' is set when reading a file.  Perhaps you mean
> 'fileencodings'?  This one needs to be tweaked by the user, because it
> depends on what kind of files you edit.  Main problem is that an ASCII
> file can be any encoding, Vim can't detect what it is, thus the user has
> to specify what he wants Vim to do with it.

Yes, I was unclear.  Let me elaborate, although this point is rather
exotic, and you can safely ignore me. :)

When setting "encoding=utf-8", any new files will suddenly be utf-8 as
well.  For "ordinary" windows users, this may not be the desired result.
What I was getting at was that *perhaps* the default fileencoding should be
"cp####" in this case, unless the user explicitly sets it to something else
(presumably utf-8).  Before you object, yes, that's silly.

Why use "encoding=utf-8" if you still want to create new files as ANSI?
Well, quite a few windows applications don't do UTF-8.  But using UTF-8
internally still allows users to *transparently* edit existing
unicode/utf-8 files without conversions.

Anyway, I digress.  This thought of mine was not that bright.  Just forget it.

>>- Also, my vim (6.2) defaults to "latin1", not my current codepage.  That
>>would indicate that the ACP detection does not work.
>
> Where does it use "latin1"?  Not in 'encoding', I suppose.

Yes.  Without a _vimrc, I get:
encoding=latin1
fileencodings=ucs-bom
termencoding=

Thus changing the encoding only has funny effects.

> Mostly it's quite a bit more complicated.  Different users have different
> situations, it is hard to think of solutions that work for most people.

Well, if you decide to make the unicode implementation work as it should,
most people should be able to get what they want.  It might involve a bit
of tweaking, but nothing more.

> The problem is that conversions to/from Unicode only work when you know
> the encoding of the text you are converting.  The encoding isn't always
> known.  Vim sometimes uses "latin1", so that you at least get 8-bit
> clean editing, even though the actual encoding is unknown.

I claim that on Windows, you should always have a good idea of the
encoding.  It's either explicitly set by the user, "cp####", or unicode.
Windows has good support for converting ANSI to unicode, so this should be
a non-issue.  And again, as this is about non-UTF-8 data, you already have
this problem anyway, because you are calling the ANSI functions with the
"unknown" data.  That it works should prove my point. ;-)

But in the universal case, I agree with you.

>>On Win9x, vim should use ANSI apis.  The only thing missing is again the
>>encoding/decoding, although it's trickier with the ANSI apis.  There are
>>many cases where a user would enter UTF-8 stuff that doesn't smoothly
>>convert to the current CP.  I think vim's current code should detect that
>>easily.
>
> You can use a few Unicode functions on Win9x, we already do.  I don't
> see a reason to change this.

Sorry, I didn't want to imply that.  I agree that we should stick to the
unicode functions that are supported on Win9x, and only revert to ANSI
"when forced".

Camillo
--
Camillo Särs <+ged+@...>              **  Aim for the impossible and you
<http://www.iki.fi/+ged>                 **   will achieve the improbable.
PGP public key available                 **

#1006 From: Bram Moolenaar <Bram@...>
Date: Mon Oct 13, 2003 12:25 pm
Subject: Re: Filename encodings under Win32
 
Camillo wrote:

> > Vim could use Unicode functions for accessing files, but this will be a
> > huge change.
>
> Why so?  The code earlier in this thread probably did much of what is
> needed. It also involved numerous other changes, which I ignored.  I'm not
> being nosy, I'm just curious why this would be a "huge change".  It's not
> the file contents we are getting at, it's the filenames (and the GUI).

Because every fopen(), stat() etc. will have to be changed.

> Also note that when using the native code page as the encoding (read:
> latin1), the ANSI functions do work as expected.  So the fixes would
> only need to concern the UTF-8 encoding, if you get picky. :)

This only means extra work, since an "if (encoding == ...)" has to be
added to select between the traditional file access method and the
Unicode method.
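
The branch in question might look roughly like the Python sketch below.
The function name and its contract are hypothetical, and Python stands in
for the C call sites; the point is only that every fopen()/stat() wrapper
would need a decision of this shape.

```python
from typing import Optional, Union

def file_api_argument(name: bytes, encoding: Optional[str]) -> Union[str, bytes]:
    """Hypothetical wrapper: choose the argument for a file-access call.

    Returns text (for a wide/Unicode call) when the name can be decoded,
    otherwise the untouched bytes (for the traditional byte-oriented call).
    """
    if encoding is None:
        return name                      # encoding unknown: pass bytes through
    try:
        return name.decode(encoding)     # known encoding: take the Unicode path
    except UnicodeDecodeError:
        return name                      # undecodable: fall back to raw bytes
```

For example, `file_api_argument(b"\xc3\xa5.txt", "utf-8")` yields the text
"å.txt", while the cp1252 bytes `b"\xe5.txt"` under the same setting come
back unchanged.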

> > Requires lots of testing.
>
> That's unicode for you.  However, deriving a decent test set using
> available unicode test files should be a fairly straight-forward thing.

No, it's actually impossible to test this automatically.  It involves
creating various Win32 environments with code page settings, network
filesystems and installed libraries.  Only end-user tests can discover
the real problems.

> > Main problem is when 'encoding' is not a Unicode encoding, then conversions
> > need to be done, which may fail.
>
> But what I assume you are doing now is even worse, isn't it?  Essentially
> you are feeding some user-selected encoding to functions that require
> ANSI characters.  How's that for "a lot of testing"?

The currently used functions work fine for accessing existing files.
It's only when typing a new name or when displaying the name that
problems may occur.

> Conversions from almost any encoding to unicode should work.  I would not
> expect major trouble there.  And note that if the conversion from the
> encoding to unicode fails, I expect that the current usage would fail even
> more severely.  And there haven't been reports of that, have there?

Main problem is that sometimes we don't know what the encoding is.  In
that situation you can treat the filename as a sequence of bytes in most
places, but conversion is impossible.  This happens more often than you
would expect.  Put a floppy disk or CD into your computer...

There is also the situation that Vim uses the active codepage, but the
file is actually in another encoding that could not be detected.  Then
doing "gf" on a filename will work if you don't do conversion, but it
will fail if you try converting with the wrong encoding in mind.
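
A small Python sketch of that failure mode, with assumed encodings: the
name is really cp1252, and cp437 is picked arbitrarily as the wrong guess.
Python's codecs stand in for whatever conversion Vim would perform.

```python
# File name bytes as they appear in a buffer whose real encoding (cp1252)
# was not detected.
raw = "å.txt".encode("cp1252")           # b'\xe5.txt'

# No conversion: the bytes reach the byte-oriented call unchanged.
passed_through = raw

# Conversion with the wrong encoding in mind gives a different character,
# so the lookup would target a different name.
wrong_guess = raw.decode("cp437")
right_guess = raw.decode("cp1252")
```

Only `right_guess` equals "å.txt"; the wrong guess decodes without any
error, so nothing signals that the name has changed.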

> > Thus sticking with the active codepage functions isn't too bad.
>
> If it worked that way, but it doesn't.  Setting "encoding=utf-8" changes
> that behavior - only us-ascii is usable in filenames.

I don't see why.  You can use a file selector to open any file and write
it back under the same name.  Vim doesn't need to know the encoding of
the filename that way.

If you type a file name in utf-8 it won't work properly, thus you have
to use another method to obtain the file name.  It's clumsy, I know.

> > But then Vim needs to convert from 'encoding' to the active codepage!
>
> That would help most users.  Including me.  But it would not be the
> "ultimate" solution to unicode on win32, as it would still cause trouble
> with characters outside the codepage.  As I see it, the easiest fix is
> actually using the unicode-api, as there are fewer (or no) conversion
> failures that way.

As said above, this only works if we are 100% sure of what encoding the
text (filename) is in, and we don't always know that.

> > Why would 'termencoding' be "utf-8"?  This won't work, unless you are
> > using an xterm on MS-Windows.
>
> Yeah, but that's what you get if you just blindly do "set encoding=utf-8".
> Took me a while to figure that one out.  I need to do "set
> termencoding=cp1252" first, or the "let &termencoding = &encoding".  Not
> exactly transparent to non-experts.

Setting 'encoding' is full of side effects.  There is a clear warning in
the docs about this.

> > The default 'termencoding' is empty, which means 'encoding' is used.
> > There is no better default.
>
> On Windows, I'd say "detect active code page" is the right choice.

I remember this was proposed before, I can't remember why we didn't do
it this way.  Windows is different here, since we can find out what the
active codepage is.  On Unix it's not that clear (e.g., depends on what
options the xterm was started with).  Consistency between systems is
preferred.

> >>- Also, my vim (6.2) defaults to "latin1", not my current codepage.  That
> >>would indicate that the ACP detection does not work.
> >
> > Where does it use "latin1"?  Not in 'encoding', I suppose.
>
> Yes.  Without a _vimrc, I get:
> encoding=latin1
> fileencodings=ucs-bom
> termencoding=
>
> Thus changing the encoding only has funny effects.

Your active codepage must be latin1 then.  Vim gets the default from the
active codepage.

--
hundred-and-one symptoms of being an internet addict:
192. Your boss asks you to "go fer" coffee and you come up with 235 FTP sites.

  /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net   \\\
///          Creator of Vim - Vi IMproved -- http://www.Vim.org          \\\
\\\              Project leader for A-A-P -- http://www.A-A-P.org        ///
  \\\  Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html  ///

#1007 From: Camillo Särs <ged@...>
Date: Mon Oct 13, 2003 12:59 pm
Subject: Re: Filename encodings under Win32
 
Bram Moolenaar wrote:
> Because every fopen(), stat() etc. will have to be changed.

Right.  You're not using Windows apis, of course.  But to do things
correctly, you would have to make sure that the fopen() etc.
implementations [in Windows] either convert the strings they receive or
are only called with valid Windows file names.  Converting internally may
be risky, because you'd need a way to convey the encoding into the functions.

> Main problem is that sometimes we don't know what the encoding is.

On Windows?  I would disagree here.  Any filesystem mounted by Windows
should be mounted in a way that adheres to Windows naming conventions.
We're not discussing file contents here.

> In that situation you can treat the filename as a sequence of bytes in most
> places, but conversion is impossible.  This happens more often than you
> would expect.  Put a floppy disk or CD into your computer...

So why convert it? :) The current display/saving problems stem from the
fact that the file name is interpreted as UTF-8, a coding which Windows
does not recognize for file names or strings.

> There is also the situation that Vim uses the active codepage, but the
> file is actually in another encoding that could not be detected.  Then
> doing "gf" on a filename will work if you don't do conversion, but it
> will fail if you try converting with the wrong encoding in mind.

AFAIK, Windows will internally convert the path into Unicode if you call
the ANSI function.  Thus if gf succeeds as you describe, it should succeed
if you use the unicode api as well.  In both cases an 8-bit binary string
undergoes "cp2unicode" conversion.

> I don't see why.  You can use a file selector to open any file and write
> it back under the same name.

Uhm.  Thanks.  I'm so used to using :edit and :view that this alternative
hadn't even crossed my mind.

> If you type a file name in utf-8 it won't work properly, thus you have
> to use another method to obtain the file name.  It's clumsy, I know.

But it's a workaround.  But my title bar still is a mess.

> As said above, this only works if we are 100% sure of what encoding the
> text (filename) is in, and we don't always know that.

We should be sure.  And *if* we get it wrong, the user should be able to
correct it.

> I remember this was proposed before, I can't remember why we didn't do
> it this way.  Windows is different here, since we can find out what the
> active codepage is.  On Unix it's not that clear (e.g., depends on what
> options the xterm was started with).  Consistency between systems is
> preferred.

I would disagree on consistency here.  On windows, the encoding is either
ANSI or unicode, or else it has been explicitly set to something known.
And as long as we know the encoding, let's use it.

> Your active codepage must be latin1 then.  Vim gets the default from the
> active codepage.

My code page is cp1252.  It's not latin1 (iso-8859-1).  In practice, both
are 8-bit-raw.
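
For reference, the two differ only in the range 0x80-0x9F.  The Python
sketch below lists the differing bytes using Python's own codec tables,
which leave five cp1252 bytes undefined; Windows itself may treat those
five differently.

```python
# Bytes that cp1252 and latin-1 (ISO-8859-1) decode to different characters.
differing = []
for b in range(256):
    raw = bytes([b])
    as_latin1 = raw.decode("latin-1")
    try:
        as_cp1252 = raw.decode("cp1252")
    except UnicodeDecodeError:
        as_cp1252 = None             # undefined in Python's cp1252 table
    if as_cp1252 != as_latin1:
        differing.append(b)

euro = b"\x80".decode("cp1252")      # the euro sign, absent from latin-1
same_for_a_ring = "å".encode("cp1252") == "å".encode("latin-1")
```

So text such as "å" is byte-identical in both, while a euro sign or curly
quotes are where the two part ways.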

Camillo
--
Camillo Särs <+ged+@...>              **  Aim for the impossible and you
<http://www.iki.fi/+ged>                 **   will achieve the improbable.
PGP public key available                 **

#1008 From: "Tony Mechelynck" <antoine.mechelynck@...>
Date: Mon Oct 13, 2003 3:35 pm
Subject: Re: Filename encodings under Win32
 
Bram Moolenaar <Bram@...> wrote:
> Camillo wrote:
[...]
> > - The default termencoding should be set intelligently, UTF-8 as
> > termencoding breaks input of non-ascii.
>
> Why would 'termencoding' be "utf-8"?  This won't work, unless you are
> using an xterm on MS-Windows.  The default 'termencoding' is empty,
> which means 'encoding' is used.  There is no better default.  When you
> change 'encoding' you might have to change 'termencoding' as well, but
> this depends on your situation.
[...]

Glenn Maynard wants 'encoding' to default to "utf-8" regardless of the
active codepage. IMHO this would require 'termencoding' to default, not to
the empty string, but to what is currently the default 'encoding', namely
the active codepage. Such change in the 'termencoding' default would (again,
IMHO) be a GoodThing anyway, since it would allow the keyboard to go on
working whether or not the user alters 'encoding'. Of course it is already
possible to do

     if &termencoding == ""
         let &termencoding = &encoding
     endif

but wouldn't it make it easier for the user (more user friendly) to have
'termencoding' default to the ACP not implicitly (&termencoding == "" and
'encoding' set to the ACP) but explicitly (by defaulting 'termencoding' to a
nonempty value representing the active codepage)? -- And it would make the
above "if" statement unnecessary but not harmful, so existing scripts should
not be broken.
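
The effect of the empty default can be modelled in a few lines of Python.
This is a toy model, not Vim's input code: the function names are
hypothetical, cp1252 is assumed as the ACP, and invalid input is shown as
U+FFFD.

```python
def effective_termencoding(termencoding: str, encoding: str) -> str:
    """An empty 'termencoding' means that 'encoding' is used."""
    return termencoding or encoding

def decode_keypress(key: bytes, termencoding: str, encoding: str) -> str:
    """Hypothetical model: interpret keyboard bytes, U+FFFD where invalid."""
    enc = effective_termencoding(termencoding, encoding)
    return key.decode(enc, errors="replace")

key = "å".encode("cp1252")     # what a cp1252 keyboard layer delivers

before = decode_keypress(key, "", "cp1252")       # defaults: works
broken = decode_keypress(key, "", "utf-8")        # only 'encoding' changed
fixed = decode_keypress(key, "cp1252", "utf-8")   # 'termencoding' set to ACP
```

With 'termencoding' defaulting to the active codepage, the third case would
be what the user gets without any extra setting.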

Regards,
Tony.

#1009 From: "Tony Mechelynck" <antoine.mechelynck@...>
Date: Mon Oct 13, 2003 3:52 pm
Subject: Re: Filename encodings under Win32
 
Camillo Särs <ged@...> wrote:
> Bram Moolenaar wrote:
[...]
> > Why would 'termencoding' be "utf-8"?  This won't work, unless you
> > are using an xterm on MS-Windows.
>
> Yeah, but that's what you get if you just blindly do "set
> encoding=utf-8". Took me a while to figure that one out.  I need to
> do "set termencoding=cp1252" first, or the "let &termencoding =
> &encoding".  Not exactly transparent to non-experts.

Took me some figuring too. A few hours ago I uploaded my solution to
vim-online (set_utf8.vim,
http://vim.sourceforge.net/scripts/script.php?script_id=789 ). I hope it
will make it transparent to non-experts. Yet I still believe that defaulting
'termencoding' to the locale's charset would be better than leaving it
empty -- and such a change wouldn't break the above-mentioned script, you're
welcome to look at its source.
>
> > The default 'termencoding' is empty, which means 'encoding' is used.
> > There is no better default.
>
> On Windows, I'd say "detect active code page" is the right choice.
>
> > When you change 'encoding' you might have to change 'termencoding'
> > as well, but this depends on your situation.
>
> As noted above, that's the unintuitive behavior I was getting at.  A
> windows user, knowing that unicode is the native charset, does a "set
> encoding=utf-8" and expects things to work.  They don't, but depending
> on the language, it may take a while before a non-ascii character is
> entered.
[...]

Regards,
Tony.
