vim-multibyte · Vim (Vi IMproved) text editor special language list

Messages 717 - 746 of 2761

#717 From: Autrijus Tang <autrijus@...>
Date: Wed Sep 18, 2002 11:45 pm
Subject: [BUG] Ambiguous-width character handling
 
Greetings.  I'm a happy user of VIM's multi-byte editing environment under the
X terminal with 'mlterm', and combined they are pleasantly intelligent about
Chinese odds and ends.  My current settings are:

     set termencoding=big5
     set encoding=utf-8
     set fileencodings=big5,utf-8,big5-hkscs,gbk,euc-jp,euc-kr,utf-bom,iso8859-1

However, I have run into a bug concerning "Ambiguous width characters" within
Big5 files.  For example, the Big5 character \xA2\x69, when converted to its
utf-8 encoding, would be:

     2588;FULL BLOCK;So;0;ON;;;;;N;;;;;

Which is an 'A' (ambiguous-width) character in EastAsianWidth.txt:

     2588;A # FULL BLOCK

Whereas VIM correctly treats normal Chinese characters and punctuation as
occupying two on-screen columns, it considers U+2588 a single-width
character, which seriously disrupts the display.
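
To make the width question concrete, here is a minimal Python sketch (not Vim's code, which is written in C) that reads the East Asian Width class from the Unicode database and counts screen columns under both interpretations. The name `display_width` and the `ambiguous_wide` flag are illustrative assumptions; combining and control characters are ignored.

```python
import unicodedata

def display_width(text, ambiguous_wide=False):
    """Count terminal columns, treating class 'A' as wide only on request."""
    columns = 0
    for ch in text:
        eaw = unicodedata.east_asian_width(ch)  # 'W', 'F', 'A', 'Na', 'H' or 'N'
        if eaw in ("W", "F"):
            columns += 2          # always double width
        elif eaw == "A" and ambiguous_wide:
            columns += 2          # ambiguous, drawn full-width by East Asian fonts
        else:
            columns += 1          # narrow, neutral, or ambiguous treated as narrow
    return columns

sample = "\u25881234" * 11        # U+2588 FULL BLOCK followed by "1234", eleven times
narrow = display_width(sample)                       # single-width assumption
wide = display_width(sample, ambiguous_wide=True)    # double-width assumption
print(narrow, wide)
```

The two totals differ by one column for every U+2588, which is how the cursor position and the displayed text drift apart when the editor and the terminal disagree.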

To observe this bug, move the cursor to the end of the following line
by pressing '$':

    
     ???1234???1234???1234???1234???1234???1234???1234???1234???1234???1234???1234

You will notice that it stops at a point where there are still 9 characters
displayed to its right.  In 'GVIM', it displays incorrectly by treating the
full-width U+2588 as a single-width character with no corresponding fonts.
Both behaviours are arguably erroneous.

According to Unicode Technical Report #11
(http://www.unicode.org/unicode/reports/tr11/#Ambiguous), the recommended way
to handle ambiguous-width characters is:

     When mapping Unicode to legacy character encodings:
     * Ambiguous Unicode characters always map to full-width characters
     * Ambiguous Unicode characters always map to regular (narrow) characters
       in non-East Asian legacy character encodings

     When processing or displaying data:
     * Ambiguous characters behave like wide or narrow characters depending
       on context (language tag, script identification, associated font, source
       of data, or explicit markup; all can provide the context)

Therefore, may I suggest a new buffer-local option, 'ambiguouswidth', which
can have one of the following values:

     'h' denotes half-width (current) behaviour for ambiguous-width characters
     'f' denotes full-width behaviour for ambiguous-width characters
     'a' denotes automatic handling: full-width if either termencoding _or_
         fileencoding is one of [ uhc, johab, gbk, euc-cn, big5, big5-hkscs ],
         and half-width otherwise.
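
The three proposed values can be sketched as a small decision function. This is only an illustration in Python of the rule described above; the function name `ambiguous_cell_width` and its parameters are hypothetical and do not correspond to anything in Vim's source.

```python
# Encodings listed in the proposal for the automatic ('a') setting.
EAST_ASIAN_ENCODINGS = {"uhc", "johab", "gbk", "euc-cn", "big5", "big5-hkscs"}

def ambiguous_cell_width(option, termencoding="", fileencoding=""):
    """Return 1 or 2 columns for an ambiguous-width character.

    option: 'h' (half-width), 'f' (full-width) or 'a' (automatic).
    """
    if option == "h":
        return 1
    if option == "f":
        return 2
    if option == "a":
        # Full-width if either encoding is an East Asian legacy encoding.
        encodings = {termencoding.lower(), fileencoding.lower()}
        return 2 if encodings & EAST_ASIAN_ENCODINGS else 1
    raise ValueError("unknown 'ambiguouswidth' value: %r" % (option,))
```

With the settings quoted at the top of the message (termencoding=big5), the 'a' value would resolve to two columns, while a Latin-1 terminal editing a Latin-1 file would keep one.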

The 'a' option is good-to-have but not crucial to the operation.  All I want
is a way to override VIM's current behaviour.

Not being very familiar with VIM's internals, and possessing less-than-competent
C skills, I'd appreciate it if somebody could implement this idea, or at least
point me to the relevant portion of the source so I can hack on it.

Thanks,
/Autrijus/

#718 From: Autrijus Tang <autrijus@...>
Date: Thu Sep 19, 2002 12:00 am
Subject: Re: [BUG] Ambiguous-width character handling
 
On Thu, Sep 19, 2002 at 07:45:37AM +0800, Autrijus Tang wrote:
> To observe this bug, move the cursor to the end of the following line
> by pressing '$':
>
>     ???1234???1234???1234???1234???1234???1234???1234???1234???1234???1234???1234

Oops, make that:

     ¢i1234¢i1234¢i1234¢i1234¢i1234¢i1234¢i1234¢i1234¢i1234¢i1234¢i1234

> Therefore, may I suggest a new buffer-local option, 'ambiguouswidth', which
> can have one of the following values:

With a little digging I found this entry in :help todo:

     8   Some UTF-8 have an ambiguous width (single or double).
         Should inspect the font to find out what will be displayed. (Long)

However, I'm a little perplexed as to how you would inspect the font under the
console (with big5con, imcce or other multibyte vga terminals), or within
the X terminal?  It seems to me that the "inspect the font to find out"
way only works in GVIM; please correct me if I'm mistaken.

Thanks,
/Autrijus/

#719 From: "Tony Mechelynck" <antoine.mechelynck@...>
Date: Thu Sep 19, 2002 1:04 am
Subject: Re: [BUG] Ambiguous-width character handling
 
----- Original Message -----
From: "Autrijus Tang" <autrijus@...>
To: <vim-multibyte@...>
Cc: <whiteg@...>
Sent: Thursday, September 19, 2002 1:45 AM
Subject: [BUG] Ambiguous-width character handling

Greetings.  I'm a happy user of VIM's multi-byte editing environment under the
X terminal with 'mlterm', and combined they are pleasantly intelligent about
Chinese odds and ends.  My current settings are:

     set termencoding=big5
     set encoding=utf-8
     set fileencodings=big5,utf-8,big5-hkscs,gbk,euc-jp,euc-kr,utf-bom,iso8859-1

[...]

I don't feel competent to address most of the content of your message, but
about the above:

     - AFAIK, vim doesn't know "utf-bom" as an encoding
     - You can use "ucs-bom" to be able to recognise a Byte Order Mark in
       your input files, but it must come before any other Unicode encoding,
       including "utf-8", else it will not work properly.

see :help 'fileencodings'
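
Why the order matters can be shown with a simplified model of "try each entry in turn, first success wins". The Python sketch below is not Vim's detection code; the `detect` function and the `BOMS` table are assumptions made for illustration, the encoding names are Python codec names, and UTF-32 is left out for brevity.

```python
import codecs

# (BOM bytes, Python codec that consumes the BOM).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def detect(data, fileencodings):
    """Return (codec, text) for the first entry that decodes without error."""
    for name in fileencodings:
        if name == "ucs-bom":
            for bom, codec in BOMS:
                if data.startswith(bom):
                    return codec, data.decode(codec)
            continue                      # no BOM present: try the next entry
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue                      # conversion error: try the next entry
    return None, None

data = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
print(detect(data, ["ucs-bom", "utf-8"]))   # BOM recognised and removed
print(detect(data, ["utf-8", "ucs-bom"]))   # plain utf-8 matches first, BOM stays in the text
```

In this model a file that starts with a UTF-8 BOM is still valid UTF-8, so a "utf-8" entry placed first accepts it and the "ucs-bom" entry is never consulted.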

Regards,
Tony.

#720 From: Bram Moolenaar <Bram@...>
Date: Thu Sep 19, 2002 7:43 pm
Subject: Re: [BUG] Ambiguous-width character handling
 
Autrijus Tang wrote:

[about ambiguous characters being single width while the terminal
displays them as double width]

Note that Vim only supports one font for the whole Vim window.  I don't
expect that a single font has two glyphs for the same character,
depending on the context.  Therefore the choice for whether an ambiguous
character is single or double width should match the font.

If we can't obtain the info from the font, an option could be used.

> With a little digging I found this entry in :help todo:
>
>     8   Some UTF-8 have an ambiguous width (single or double).
>         Should inspect the font to find out what will be displayed. (Long)
>
> However, I'm a little perplexed as to how you would inspect the font under the
> console (with big5con, imcce or other multibyte vga terminals), or within
> the X terminal?  It seems to me that the "inspect the font to find out"
> way only works in GVIM; please correct me if I'm mistaken.

How does the terminal know how wide a character is?  I suppose any Asian
terminal emulator will prefer double-width characters for most
characters.

There are several alternatives:
1. Add an option that specifies the width of all ambiguous characters:
    single or double width.
2. Try obtaining the width from the font.  Won't work for terminal
    emulators.
3. A combination: use the option to specify single, double or auto,
    where auto works like the second alternative.

A user might set the option in his .vimrc based on the name of the
terminal:

	 if &term =~ "big5"
		 set ambiwidth=double
	 else
		 set ambiwidth=single
	 endif

--
The goal of science is to build better mousetraps.
The goal of nature is to build better mice.

  ///  Bram Moolenaar -- Bram@... -- http://www.moolenaar.net  \\\
///          Creator of Vim - Vi IMproved -- http://www.vim.org          \\\
\\\           Project leader for A-A-P -- http://www.a-a-p.org           ///
  \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///

#721 From: Bram Moolenaar <Bram@...>
Date: Thu Sep 19, 2002 7:43 pm
Subject: New page on Vim web site: script links
 
I have added a new page to www.vim.org that lists all the links found in
the runtime scripts.  You can use this to get the latest version of a
syntax file, indent script, filetype plugin, etc.

If you are the maintainer of a script file and there is no link for your
file(s) or it is not correct, please e-mail me a new version with the
URL in the header, like this:

	 " URL:  http://www.zellner.org/vim/indent/xml.vim

No other info in this line please, it confuses the script I'm using to
extract this info and generate the web page.

--
hundred-and-one symptoms of being an internet addict:
3. Your bookmark takes 15 minutes to scroll from top to bottom.

  ///  Bram Moolenaar -- Bram@... -- http://www.moolenaar.net  \\\
///          Creator of Vim - Vi IMproved -- http://www.vim.org          \\\
\\\           Project leader for A-A-P -- http://www.a-a-p.org           ///
  \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///

#722 From: Noah Levitt <nlevitt@...>
Date: Thu Sep 19, 2002 8:09 pm
Subject: Re: [BUG] Ambiguous-width character handling
 
On Thu, Sep 19, 2002 at 21:43:00 +0200, Bram Moolenaar wrote:
>
> Note that Vim only supports one font for the whole Vim window.  I don't
> expect that a single font has two glyphs for the same character,
> depending on the context.  Therefore the choice for whether an ambiguous
> character is single or double width should match the font.

What about guifont and guifontwide? The fonts I use have
some overlap.

Incidentally, Autrijus's sample line gave me no problems in
an utf8 xterm. It treated the characters as single-width.
xterm uses Markus Kuhn's wcwidth, I believe.

Noah

#723 From: Bram Moolenaar <Bram@...>
Date: Thu Sep 19, 2002 8:37 pm
Subject: Re: [BUG] Ambiguous-width character handling
 
Noah Levitt wrote:

> On Thu, Sep 19, 2002 at 21:43:00 +0200, Bram Moolenaar wrote:
> >
> > Note that Vim only supports one font for the whole Vim window.  I don't
> > expect that a single font has two glyphs for the same character,
> > depending on the context.  Therefore the choice for whether an ambiguous
> > character is single or double width should match the font.
>
> What about guifont and guifontwide? The fonts I use have
> some overlap.

That has a chicken-egg problem: the choice between the two fonts is made
based on the width of a character.  There could be a test if a glyph for
a character is available, but that's complicated.

> Incidentally, Autrijus's sample line gave me no problems in
> an utf8 xterm. It treated the characters as single-width.
> xterm uses Markus Kuhn's wcwidth, I believe.

The function I use has the same source, thus it's no surprise Vim and
Xterm work well together.  The problem probably only exists on Asian
terminals.

--
The Feynman problem solving Algorithm:
	 1) Write down the problem
	 2) Think real hard
	 3) Write down the answer

  ///  Bram Moolenaar -- Bram@... -- http://www.moolenaar.net  \\\
///          Creator of Vim - Vi IMproved -- http://www.vim.org          \\\
\\\           Project leader for A-A-P -- http://www.a-a-p.org           ///
  \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///

#724 From: Autrijus Tang <autrijus@...>
Date: Fri Sep 20, 2002 12:29 am
Subject: Re: [BUG] Ambiguous-width character handling
 
On Thu, Sep 19, 2002 at 10:37:04PM +0200, Bram Moolenaar wrote:
> Noah Levitt wrote:
> > On Thu, Sep 19, 2002 at 21:43:00 +0200, Bram Moolenaar wrote:
> > > Note that Vim only supports one font for the whole Vim window.  I don't
> > > expect that a single font has two glyphs for the same character,
> > > depending on the context.  Therefore the choice for whether an ambiguous
> > > character is single or double width should match the font.
> > What about guifont and guifontwide? The fonts I use have
> > some overlap.
> That has a chicken-egg problem: the choice between the two fonts is made
> based on the width of a character.  There could be a test if a glyph for
> a character is available, but that's complicated.

Yes, and I'd think that it is somehow "too smart" for Vim to probe
the glyph's width -- what if the guifont has the narrow glyph, and
guifontwide has the East Asian-fullwidth glyph?

A separate option, maybe set initially by some heuristic
(though that is not necessary), is IMHO more natural.

> > Incidentally, Autrijus's sample line gave me no problems in
> > an utf8 xterm. It treated the characters as single-width.
> > xterm uses Markus Kuhn's wcwidth, I believe.
> The function I use has the same source, thus it's no surprise Vim and
> Xterm work well together.  The problem probably only exists on Asian
> terminals.

Or rather, it only exists with East Asian fonts, like the one I use here:
     "ar pl mingti2l big5-iso10646-1;"

Switching to other fonts can certainly match the single-width results
given by wcwidth, but since text files prepared with other Unicode/Big5
editors assume a double-width layout, the resulting formatting
and display will be incorrect from the author's perspective.

Thanks,
/Autrijus/

#725 From: mxgl <mxgl@...>
Date: Fri Sep 27, 2002 6:50 pm
Subject: Ancient Greek, Installing Fonts, Win2000
 
I would like to be able to edit text files of Ancient Greek
encoded in UTF-8.

I found a monospaced font with the full character set for Ancient Greek
and installed it.  I am using Vim on Win2000, but when I opened Vim the font
was not available.

How does Vim identify a "regular" font among the fonts installed in the fonts
directory?

How should I proceed?

#726 From: Baptiste Calmès <calmes@...>
Date: Thu Oct 3, 2002 2:50 pm
Subject: encoding in latin1 with OSX
 
Hello,

I am a new vim user, so my question may seem fairly obvious, but I
cannot figure it out. I am running vim under Mac OS X 10.2.1, and would
like to save files using the ISO Latin-1 encoding. My keyboard is a French
keyboard. How can I do this? I have tried giving values to encoding,
termencoding, etc., but my files always seem to be saved in the Mac OS
Roman encoding. I cannot work out how to set things up properly.

Thanks for any help.

Baptiste Calmès

#727 From: Bram Moolenaar <Bram@...>
Date: Sun Oct 6, 2002 4:49 pm
Subject: Re: RevOut
 
Glenn Maynard wrote on Aug 18:

> In win32 gui_mch_draw_string:
>
> [...]
>
> RevOut claims that its code is needed for NT 5+ and 98+:
>
> [...]
> So, it's only being called in NT, but thinks the code is needed in NT5+
> and 98+.  So there are four cases: <= 95, >= 98, <= NT4, >= NT5, and
> it's not clear which of them need the per-character code and which can
> use ETO_IGNORELANGUAGE.  The selection should probably be done all in
> one place, rather than in two places (both RevOut and the call to
> RevOut).  I don't think the "special" optimization does anything; I'd just
> put it all in the call, so we'd have:

I didn't see a response or follow-up about this.  Do we need to change
anything, or should we just wait until someone complains there is a
problem?

> Other problems:
> 1: This needs to be done all the time, not just when curwin->w_p_rl is
> set, or we get screen corruption if the file happens to have Hebrew/Arabic
> text in it.  We should always render all Unicode text documents without
> screen corruption, even if it's not "right" for the language itself.

Would there ever be Hebrew/Arabic text without 'rightleft' set?

> 2: This needs to be done in Unicode, too.  If I paste Arabic text when
> enc=UTF-8, I get graphical glitches.  (Well, I do if I comment out
> ETO_IGNORELANGUAGE, which indicates that this would happen in 9x.)

This should be simple to solve.

> Perhaps we should just *always* draw character-by-character?  Is it
> ever noticeably slower?  (I can't tell; I'm on a K7/1000.)  There
> doesn't seem to be an easy way to find out if there are characters
> in the text that Windows is going to mess with.

I do think it would cause a slowdown.  Calling a windows system function
does have overhead.

--
hundred-and-one symptoms of being an internet addict:
134. You consider bandwidth to be more important than carats.

  ///  Bram Moolenaar -- Bram@... -- http://www.moolenaar.net  \\\
///          Creator of Vim - Vi IMproved -- http://www.vim.org          \\\
\\\           Project leader for A-A-P -- http://www.a-a-p.org           ///
  \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///

#728 From: Glenn Maynard <glenn@...>
Date: Sun Oct 6, 2002 7:03 pm
Subject: Re: RevOut
 
On Sun, Oct 06, 2002 at 06:49:09PM +0200, Bram Moolenaar wrote:
> I didn't see a response or follow-up about this.  Do we need to change
> anything, or should we just wait until someone complains there is a
> problem?

Well, I was trying to track down what was being done where, and what
problems happened when, and this complicated things a good bit.

> Would there ever be Hebrew/Arabic text without 'rightleft' set?

Editing multilingual text (eg. translation data); it'd be unpleasant
if I wanted to change some text in one language and saw screen
corruption for others.

> > Perhaps we should just *always* draw character-by-character?  Is it
> > ever noticeably slower?  (I can't tell; I'm on a K7/1000.)  There
> > doesn't seem to be an easy way to find out if there are characters
> > in the text that Windows is going to mess with.
>
> I do think it would cause a slowdown.  Calling a windows system function
> does have overhead.

It might be a measurable but meaningless slowdown.

Or it might be a real slowdown.  The only safe way to find out is to
test it, but I don't have access to any really slow machines.

FYI, I have all of these patches and I've summarized them at
http://zewt.org/vim .  I haven't thrown them out, even though I've gone silent
recently; I've just been working on some other projects.  (I don't think
this is complete; the file I/O CP code isn't mentioned, at least.)

--
Glenn Maynard

#729 From: "Antoine J. Mechelynck" <antoine.mechelynck@...>
Date: Tue Oct 8, 2002 10:13 pm
Subject: Re: J for multi_byte
 
It is more complicated than you think.

Mixing English, Russian and Greek, and possibly also Hebrew and Farsi,
within a single Unicode file produces a kind of multibyte output where
joining with an ascii space does make sense.

IIRC, I have seen something about automatic line breaking anywhere in
*doublewide* text and that could be related to what you are seeking. I
believe that it is for a future release though.
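
The distinction can be illustrated by deciding on the characters at the join rather than on whether the text is multibyte. The following is only a Python sketch of that idea, not an existing Vim feature; `is_wide` and `join_lines` are hypothetical names, and the rule "omit the space only between two Wide or Fullwidth characters" is an assumption chosen for the example.

```python
import unicodedata

def is_wide(ch):
    """True for characters whose East Asian Width class is Wide or Fullwidth."""
    return unicodedata.east_asian_width(ch) in ("W", "F")

def join_lines(first, second):
    """Join two lines roughly like J, but omit the space between two wide characters."""
    first = first.rstrip()
    second = second.lstrip()
    if not first or not second:
        return first + second
    if is_wide(first[-1]) and is_wide(second[0]):
        return first + second          # behaves like gJ (after trimming)
    return first + " " + second        # behaves like J
```

Russian and Greek letters are multibyte in UTF-8 but are not classed as Wide, so they still get the separating space; only adjacent CJK characters are joined directly.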

HTH,
Tony.

Xiangjiang Ma <maxiangjiang@...> wrote:
> Hi,
>
> How can I map J to gJ for multi_byte text only?
>
> Anything like
>
> if (multi_byte)
>   map J gJ
> endif
>
> By default, J will leave an ASCII space after joining,
> which makes no sense for multi_byte text.
>
> Thanks
>
>
>
> --
> Xiangjiang Ma
> maxiangjiang@...
> www.clarkson.edu/~maxi

#730 From: "Antoine J. Mechelynck" <antoine.mechelynck@...>
Date: Thu Oct 10, 2002 12:32 am
Subject: Keymap switching (Was: Re: Unicode)
 
Preben Randhol <randhol+vim@...> wrote:
[...]
> I have no hurry. I have a working solution. I was just pointing out that
> the utf-8 support could need improvements in the future. Perhaps
> somebody wants to write Hebrew and Russian in the same document, don't
> need English then. :-)
>
> --
> Preben Randhol ---------------- http://www.pvv.org/~randhol/ --
> iMy favorite editor is Emacs!<ESC>cbVim<ESC>
>                                          -- vim best-editor.txt

Nothing is perfect, so improvements can always be thought of. But let's
think big. Perhaps someone will want Hebrew, Arabic and English in the same
document (e.g. a treatise on Semitic philology). Or French, Russian, Greek and
Chinese. Or Norwegian, Czech, Turkish and Esperanto (e.g. in a list of
participants in a congress). Or...

IMO it does make sense to be able not only to switch between English and one
other map, not only to switch between two non-English keymaps, but to cycle
through any number, as I think someone in this thread proposed. I'm not
gonna write it, because I don't need it myself. But I see at least one way
to do it.

Regards,
Tony.

#731 From: Preben Randhol <randhol+vim@...>
Date: Thu Oct 10, 2002 6:34 am
Subject: Re: Keymap switching (Was: Re: Unicode)
 
"Antoine J. Mechelynck" <antoine.mechelynck@...> wrote on 10/10/2002
(07:58) :
>
> IMO it does make sense to be able not only to switch between English and one
> other map, not only to switch between two non-English keymaps, but to cycle
> through any number, as I think someone in this thread proposed. I'm not
> gonna write it, because I don't need it myself. But I see at least one way
> to do it.

This is exactly what I think too :-)

--
Preben Randhol ---------------- http://www.pvv.org/~randhol/ --
iMy favorite editor is Emacs!<ESC>cbVim<ESC>
                                          -- vim best-editor.txt

#732 From: mxgl <mxgl@...>
Date: Fri Oct 11, 2002 5:45 pm
Subject: Greek utf-8-Font
 
Please excuse the delay in my response. For a Greek UTF-8 font, check the
following URL:

http://bibliofile.mc.duke.edu/gww/fonts/Monospace/index.html

I hope this will help.

MfG mxgl

#733 From: Bram Moolenaar <Bram@...>
Date: Thu Oct 17, 2002 6:34 pm
Subject: And the winner is...
 
Every year the Linux Journal magazine organizes the Readers' Choice
Awards, where people can vote for their favorite Linux items.  One of
the categories is "Favorite text editor".  You can guess who won :-).

The list with results can be found at:

	 http://www.linuxjournal.com/article.php?sid=6380&mode=thread&order=0

Or read the November issue of Linux Journal.

Have fun with your award winning editor!

--
hundred-and-one symptoms of being an internet addict:
256. You are able to write down over 250 symptoms of being an internet
      addict, even though they only asked for 101.

  ///  Bram Moolenaar -- Bram@... -- http://www.moolenaar.net  \\\
///          Creator of Vim - Vi IMproved -- http://www.vim.org          \\\
\\\           Project leader for A-A-P -- http://www.a-a-p.org           ///
  \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///

#734 From: Motonobu Ichimura <famao@...>
Date: Fri Nov 1, 2002 7:08 pm
Subject: add multibyte support for hardcopy
 
Hi.

:hardcopy doesn't support multibyte characters,
so I wrote a patch to support it.

Currently this patch (can apply from 6.0 to 6.1.247) enables

Japanese EUC-JP
Japanese SJIS
Korean EUC-KR
Chinese Big5
Chinese GB2312

codesets to be printed to a PS file. (PS font names are still hardcoded.)

some screenshots (output ps file using :hardcopy and view with gv)
are available at

http://www.momonga-linux.org/~famao/vim/

thanks.

#735 From: Motonobu Ichimura <famao@...>
Date: Fri Nov 1, 2002 7:25 pm
Subject: Re: add multibyte support for hardcopy
 
sorry, failed to attach patch :-<

On Sat, 2 Nov 2002 04:08:29 +0900
Motonobu Ichimura <famao@...> wrote:


> :hardcopy doesn't support multibyte characters,
> so I wrote a patch to support it.

here it is.

#736 From: Motonobu Ichimura <famao@...>
Date: Fri Nov 1, 2002 7:40 pm
Subject: Re: add multibyte support for hardcopy
 
I don't know why...

I have uploaded this patch at
http://www.momonga-linux.org/~famao/vim/vim-6.0-hardcopy-0.1.patch

thanks

#737 From: Bram Moolenaar <Bram@...>
Date: Fri Nov 1, 2002 10:19 pm
Subject: Re: add multibyte support for hardcopy
 
Motonobu Ichimura wrote:

> :hardcopy doesn't support multibyte characters,
> so I wrote a patch to support it.
>
> Currently this patch (can apply from 6.0 to 6.1.247) enables
>
> Japanese EUC-JP
> Japanese SJIS
> Korean EUC-KR
> Chinese Big5
> Chinese GB2312
>
> codeset to print PS file. (PS fontnames are hardcoded yet.)
>
> some screenshots (output ps file using :hardcopy and view with gv)
> are available at
>
> http://www.momonga-linux.org/~famao/vim/

Thanks for making this patch.  I'll await comments.

--
    Arthur pulls Pin out.  The MONK blesses the grenade as ...
ARTHUR:  (quietly) One, two, five ...
GALAHAD: Three, sir!
ARTHUR:  Three.
                  "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

  ///  Bram Moolenaar -- Bram@... -- http://www.moolenaar.net  \\\
///          Creator of Vim - Vi IMproved -- http://www.vim.org          \\\
\\\           Project leader for A-A-P -- http://www.a-a-p.org           ///
  \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///

#738 From: Nam SungHyun <namsh@...>
Date: Mon Nov 4, 2002 12:50 am
Subject: Re: add multibyte support for hardcopy
 
On Fri, 01 Nov 2002 23:19:51 +0100, Bram Moolenaar wrote:
>
> Motonobu Ichimura wrote:
>
> > :hardcopy doesn't support multibyte characters,
> > so I wrote a patch to support it.
> >
> > Currently this patch (can apply from 6.0 to 6.1.247) enables
> >
> > Japanese EUC-JP
> > Japanese SJIS
> > Korean EUC-KR
> > Chinese Big5
> > Chinese GB2312
> >
> > codeset to print PS file. (PS fontnames are hardcoded yet.)
> >
> > some screenshots (output ps file using :hardcopy and view with gv)
> > are available at
> >
> > http://www.momonga-linux.org/~famao/vim/
>
> Thanks for making this patch.  I'll await comments.

Very good.
But I could not view the generated PS file because I do not have the
MingMT-.. and SMGothic-.. fonts.

Currently I can view the PS file using
     {"Gulim", "Gulim-Bold",
         "Gulim-Oblique", "Gulim-BoldOblique"},

There's a free TrueType font package (baekmuk-ttf).
(ftp://ftp.mizi.co.kr/pub/baekmuk)
So, how about using such a name for EUC-KR?
(E.g. fontconfig (XFree86 4.2.x) also uses the names from Baekmuk.)
Or at least, let the user select the fonts.

Regards,
namsh

#739 From: Bram Moolenaar <Bram@...>
Date: Mon Nov 4, 2002 8:22 pm
Subject: Re: add multibyte support for hardcopy
 
Namsh wrote:

> > Motonobu Ichimura wrote:
> >
> > > :hardcopy doesn't support multibyte characters,
> > > so I wrote a patch to support it.
> > >
> > > Currently this patch (can apply from 6.0 to 6.1.247) enables
> > >
> > > Japanese EUC-JP
> > > Japanese SJIS
> > > Korean EUC-KR
> > > Chinese Big5
> > > Chinese GB2312
> > >
> > > codeset to print PS file. (PS fontnames are hardcoded yet.)
> > >
> > > some screenshots (output ps file using :hardcopy and view with gv)
> > > are available at
> > >
> > > http://www.momonga-linux.org/~famao/vim/
> >
> > Thanks for making this patch.  I'll await comments.
>
> Very good.
> But I could not see generated ps file because I have no
> MingMT-.. and SMGothic-.. fonts.
>
> Currently I can see the ps file using
>  {"Gulim", "Gulim-Bold",
> 	 "Gulim-Oblique", "Gulim-BoldOblique"},
>
> There's a free truetype fonts (baekmuk-ttf) package.
> (ftp://ftp.mizi.co.kr/pub/baekmuk)
> So, how about use such a name for EUC-KR?
> (ex. fontconfig (XFree86 4.2.x) also use the name from Baekmuk.)
> Or at lease, let user select the fonts.

That is one of the problems to be solved: How to select the printer
fonts.  We need a good mechanism for this.  And then defaults that work
for most people.

--
Living on Earth includes an annual free trip around the Sun.

  ///  Bram Moolenaar -- Bram@... -- http://www.moolenaar.net  \\\
///          Creator of Vim - Vi IMproved -- http://www.vim.org          \\\
\\\           Project leader for A-A-P -- http://www.a-a-p.org           ///
  \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///

#740 From: Nam SungHyun <namsh@...>
Date: Mon Nov 4, 2002 11:11 pm
Subject: Re: add multibyte support for hardcopy
 
On Mon, 04 Nov 2002 21:22:47 +0100, Bram Moolenaar wrote:
>
> Namsh wrote:
>
> > > Motonobu Ichimura wrote:
> > >
> > > > :hardcopy doesn't support multibyte characters,
> > > > so I wrote a patch to support it.
> >
> > Currently I can see the ps file using
> >  {"Gulim", "Gulim-Bold",
> > 	 "Gulim-Oblique", "Gulim-BoldOblique"},
> >
> > There's a free truetype fonts (baekmuk-ttf) package.
> > (ftp://ftp.mizi.co.kr/pub/baekmuk)
> > So, how about use such a name for EUC-KR?
>
> That is one of the problems to be solved: How to select the printer
> fonts.  We need a good mechanism for this.  And then defaults that work
> for most people.

I think a user can print with 'gs' if he can view the file with 'gs'.
I guess that means there is no need to know about printer fonts.

At least, I can print PS files (including those generated by vim)
using ghostscript.

     inf="$1"        # PostScript file to print
     outf=tmp.lj     # temporary LaserJet (PCL) output
     gs -q -dSAFER -sDEVICE=ljet4 -sPAPERSIZE=a4 -r600 -dNOPAUSE \
         -sOutputFile="${outf}" "${inf}" -c quit
     lpr -h "${outf}"

regards,
namsh

#741 From: "Yasuhiro Matsumoto" <mattn_jp@...>
Date: Thu Nov 7, 2002 3:00 am
Subject: Re: add multibyte support for hardcopy
 
>Hi.
>
>:hardcopy doesn't support multibyte characters,
>so I wrote a patch to support it.
>
>Currently this patch (can apply from 6.0 to 6.1.247) enables
>
>Japanese EUC-JP
>Japanese SJIS
>Korean EUC-KR
>Chinese Big5
>Chinese GB2312
>
>codeset to print PS file. (PS fontnames are hardcoded yet.)
>
>some screenshots (output ps file using :hardcopy and view with gv) are
>available at

Hello Ichimura.

I tried your patch; it's great!
By coincidence, I was talking about this very problem
with Mike Williams and Bram just last month.

I think it needs a few small changes. :-)
With this patch I could output a PS file on win32.
(I made this patch against your patch.)

BTW:
As Bram said, there is the problem of
   "How to select the printer fonts."
I think it is very difficult for Vim to select a CMap.
Most users probably have another way to produce PS output, such as a2ps,
so I don't think Vim needs to change much more after this patch.
Thanks.

*** ex_cmds2.c~ Thu Nov 07 11:42:14 2002
--- ex_cmds2.c Thu Nov 07 11:39:51 2002
***************
*** 2632,2637 ****
--- 2632,2638 ----
   #ifdef FEAT_MBYTE
   static void prt_set_mfont __ARGS((int bold, int italic, int underline));
   static void prt_mfont_init __ARGS((void));
+ void mch_print_set_mfont __ARGS((int, int, int));
   #endif
   static void prt_line_number __ARGS((prt_settings_T *psettings, int page_line, linenr_T lnum));
   static void prt_header __ARGS((prt_settings_T *psettings, int pagenum, linenr_T lnum));
***************
*** 3465,3471 ****
  	 -250, 805,
  	 {"Ryumin-Light-RKSJ-H", "GothicBBB-Medium-RKSJ-H",
  		 "Ryumin-Light-RKSJ-H", "GothicBBB-Medium-RKSJ-H"},
!  "sjis"
       },
       {
  	 /* Korean EUC-KR */
--- 3466,3472 ----
  	 -250, 805,
  	 {"Ryumin-Light-RKSJ-H", "GothicBBB-Medium-RKSJ-H",
  		 "Ryumin-Light-RKSJ-H", "GothicBBB-Medium-RKSJ-H"},
!  "sjis,shift_jis,cp932"
       },
       {
  	 /* Korean EUC-KR */
***************
*** 3716,3727 ****
       }
       for (i = 0; i < SUPPORTED_ENCODINGS; i++)
       {
!  if (!STRICMP(p_enc,prt_ps_mfonts[i].encoding))
  	 {
! 	    prt_ps_mfont = prt_ps_mfonts[i];
! 	    prt_has_mfont = TRUE;
  	     return;
  	 }
       }
       prt_has_mfont = FALSE;
   }
--- 3717,3744 ----
       }
       for (i = 0; i < SUPPORTED_ENCODINGS; i++)
       {
!  char_u *name;
!  char_u *ptr = vim_strsave(prt_ps_mfonts[i].encoding);
!  char_u *old = ptr;
!  if (!ptr)
  	 {
! 	    prt_has_mfont = FALSE;
  	     return;
  	 }
+  while(*ptr)
+  {
+ 	    name = ptr;
+ 	    while(*ptr != ',' && *ptr != '\0')
+ 	 ptr++;
+ 	    *ptr++ = 0;
+ 	    if (!STRICMP(p_enc, name))
+ 	    {
+ 	 prt_ps_mfont = prt_ps_mfonts[i];
+ 	 prt_has_mfont = TRUE;
+ 	 return;
+ 	    }
+  }
+  vim_free(old);
       }
       prt_has_mfont = FALSE;
   }
***************
*** 4801,4807 ****
       if (len > 1) {
  	 int i;
  	 for (i = 0; i < len ; i++) {
! 	    ga_append (&prt_ps_buffer, p[i]);
  	 }
  	 goto done;
       }
--- 4818,4827 ----
       if (len > 1) {
  	 int i;
  	 for (i = 0; i < len ; i++) {
! 	    ch = p[i];
! 	    if (ch == '(' || ch == ')' || ch == '\\')
! 	 ga_append(&prt_ps_buffer, IF_EB('\\', 0134));
! 	    ga_append(&prt_ps_buffer, ch);
  	 }
  	 goto done;
       }
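
The second hunk matches 'encoding' against a comma-separated list of
names such as "sjis,shift_jis,cp932".  A minimal standalone sketch of
that lookup follows.  It is not the patch's code: the function names
are hypothetical, and it scans the list in place instead of splitting
a copy made with vim_strsave(), so there is no buffer to free on any
return path and no pointer is moved past the terminating NUL.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of enc against the n bytes starting at name.
 * Returns 1 only if enc has exactly n characters and all of them match. */
static int alias_equals(const char *name, size_t n, const char *enc)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (enc[i] == '\0'
                || tolower((unsigned char)enc[i])
                   != tolower((unsigned char)name[i]))
            return 0;
    }
    return enc[n] == '\0';
}

/* Return 1 if enc matches one of the comma-separated names in list,
 * 0 otherwise.  Neither string is modified. */
int enc_in_alias_list(const char *list, const char *enc)
{
    const char *name = list;

    assert(list != NULL && enc != NULL);
    for (;;) {
        const char *end = name;

        /* Find the end of the current name. */
        while (*end != ',' && *end != '\0')
            end++;
        if (alias_equals(name, (size_t)(end - name), enc))
            return 1;
        if (*end == '\0')
            return 0;       /* last name checked, no match */
        name = end + 1;     /* step over the comma */
    }
}
```

With the list above, "CP932" and "Shift_JIS" both match, while a
prefix such as "sji" does not.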
--

- Yasuhiro



#742 From: "Mike Williams" <mike.williams@...>
Date: Sun Nov 10, 2002 10:09 am
Subject: Re: add multibyte support for hardcopy
 
Hi,

I have a few comments.  I am no expert in CJKV typography (how to
lay out the characters) but have experience in the PostScript used to
print them out, so please excuse me if I ask any silly questions.  I
haven't seen the full set of patches so I may have missed some things
answered there.

I would suggest separating the CMap name from the font name.  It may
be useful to change the CMap name based on the platform - for example
CMap 90ms-RKSJ-H supports the Windows 3.1J and 95J character sets.  To
support the Apple Macintosh Traditional Chinese character set you
need to use the B5pc-H CMap.  The font and CMap names can be joined
together when generating the output file.
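
A minimal sketch of that separation, assuming the composite name is
simply the font name and the CMap name joined by a hyphen (which is
how a name like "Ryumin-Light-RKSJ-H" in the patch appears to be
built).  The struct and function names are hypothetical, not part of
the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: the CID font and the CMap are kept apart
 * so the CMap can be swapped per platform or per 'encoding'. */
struct ps_mbfont {
    const char *font;   /* e.g. "Ryumin-Light" */
    const char *cmap;   /* e.g. "RKSJ-H" or "90ms-RKSJ-H" */
};

/* Write "<font>-<cmap>" plus a terminating NUL into buf.
 * Returns 0 on success, -1 if buf is too small. */
int ps_composite_name(const struct ps_mbfont *f, char *buf, size_t buflen)
{
    size_t flen, clen;

    assert(f != NULL && buf != NULL);
    flen = strlen(f->font);
    clen = strlen(f->cmap);
    if (flen + 1 + clen + 1 > buflen)   /* font + '-' + cmap + NUL */
        return -1;
    memcpy(buf, f->font, flen);
    buf[flen] = '-';
    memcpy(buf + flen + 1, f->cmap, clen + 1);  /* copies the NUL too */
    return 0;
}
```

The font table would then only need one entry per typeface, with the
CMap chosen separately when the output file is written.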

A CMap defines the character collection to be used for printing, so
is also dependent on the display font being used if you want to be
able to print all the characters in the file.  For example, the CMap
Ext-RKSJ-H includes IBM extensions to the JIS X 0208 character set,
and ETen-B5-H covers the Big Five character set with the ETen extensions.

A CMap can also define the encoding.  While RKSJ-H uses Shift-JIS
(EUC-JP has its own CMap, EUC-H), UniJIS-UCS2-H uses UCS-2 and
UniJIS-UTF8-H uses UTF-8.
The Chinese and Korean character collections support additional
encodings as well.

The Courier font used for latin1 encoded files has a width of .6 of
the selected font height which is font specific (but common to Roman,
Bold, and Italic versions of Courier).  Have you been able to find
the font metrics for the CID fonts?  If not I guess you may be seeing
an odd spacing between lines (too big a gap is my guess).

Also, CID fonts can include half-width and proportional versions of
Latin characters.  Are you mapping these back to the full-width
versions? Or leaving them as they are?  Or is this not a problem?

Another thing to be aware of is that some Adobe documented CMaps may
not be available on a printer. The behaviour of a printer is
unpredictable when this happens - it may not print anything, it may
substitute a default CMap, or it may even revert to printing as
single byte encoded using Courier or Helvetica.  Has anyone tried
this?

There are a number of PDFs on CID fonts and CMaps publicly available
at http://partners.adobe.com/asn/developer/technotes/fonts.html
In particular Technical Note 5094 may be of interest as it lists the
current standard Adobe CMaps for Chinese, Japanese and Korean CID
fonts.

I am happy to help, do, or review any PostScript coding to support
multi-byte printing in VIM.  Just drop me a line.

TTFN

On 7 Nov 2002 at 12:00, Yasuhiro Matsumoto wrote:

> >Hi.
> >
> >:hardcopy doesn't support multibyte characters,
> >so I wrote a patch to support it.
> >
> >Currently this patch (can apply from 6.0 to 6.1.247) enables
> >
> >Japanese EUC-JP
> >Japanese SJIS
> >Korean EUC-KR
> >Chinese Big5
> >Chinese GB2312
> >
> >codeset to print PS file. (PS fontnames are hardcoded yet.)
> >
> >some screenshots (output ps file using :hardcopy and view with gv) are
> >available at
>
> Hello Ichimura.
>
> I tried your patch, it's great!.
> It is coincidence, I was just talking about this problem
> with Mike Williams and Bram at last month.
>
> I guess that it need bit's change. :-)
> I could output ps file on win32 with this patch.
> (I made this patch against your patch.)
>
> BTW)
> As Bram said, there is the problem of
>   "how to select the printer fonts."
> I think it is very difficult for Vim to select a CMap.
> Most users probably have another way to produce PS output, such as a2ps.
> I don't think Vim needs to change much more after this patch.
>
> Thanks.

Mike
--
One good thing about repeating your mistakes is that you know when to cringe.

#743 From: "Beck, Zak" <zak.beck@...>
Date: Thu Nov 14, 2002 3:52 pm
Subject: Win32 clipboard and pasting unicode from other apps
 
Hi

I think I know why gvim on Win32 breaks unicode cut/copy and pasting from
other apps, but I don't know enough (yet) about the vim internals and
especially multibyte to fix it.

I used ClipSpy from http://home.inreach.com/mdunn/ and copied from a
Japanese MS Word document to see exactly what was put on the clipboard.  The
text was put on the clipboard in two formats: CF_TEXT, which produced a
series of '?' characters (0x3F), presumably because the Japanese
characters do not exist in ANSI.  However, the correct text was put on the
clipboard in the CF_UNICODETEXT format, which is presumably UTF-8.

The problem is in os_mswin.c: the function clip_mch_request_selection (line
755) only requests the CF_TEXT format from the clipboard.  I would imagine
the best way around this would be to modify this function so that it uses
the CF_UNICODETEXT format in preference to the CF_TEXT format if 'encoding'
is set to a multibyte encoding.

My problems are:

1. I don't know how to determine whether or not I want the CF_TEXT format or
the CF_UNICODETEXT. Presumably I look at the value of encoding and determine
it somehow from that? Is there a function to do this?

2. I don't know what to do with the CF_UNICODETEXT once I've retrieved it
from the clipboard! Will it need converting to another format? Possibly the
code in that function will work as is, I don't know enough about the
internals.
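
On the second question: on Win32, CF_UNICODETEXT data is 16-bit
Unicode (UTF-16, little-endian), not UTF-8, so with 'encoding' set to
utf-8 a conversion step is needed after fetching the data.  In real
Win32 code WideCharToMultiByte() with CP_UTF8 would normally do this.
The sketch below is a portable illustration of the same conversion;
the function name is hypothetical and none of it is taken from
os_mswin.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert n UTF-16 code units from src into UTF-8 bytes in dst, which
 * can hold cap bytes.  Returns the number of bytes written, or
 * (size_t)-1 if dst is too small or src has an unpaired surrogate.
 * No terminating NUL is appended. */
size_t utf16_to_utf8(const uint16_t *src, size_t n,
                     unsigned char *dst, size_t cap)
{
    size_t i = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (i < n) {
        unsigned long c = src[i++];

        if (c >= 0xD800 && c <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i >= n || src[i] < 0xDC00 || src[i] > 0xDFFF)
                return (size_t)-1;
            c = 0x10000 + ((c - 0xD800) << 10)
                        + ((unsigned long)src[i++] - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return (size_t)-1;      /* low surrogate without a high one */
        }

        if (c < 0x80) {
            if (out + 1 > cap) return (size_t)-1;
            dst[out++] = (unsigned char)c;
        } else if (c < 0x800) {
            if (out + 2 > cap) return (size_t)-1;
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            if (out + 3 > cap) return (size_t)-1;
            dst[out++] = (unsigned char)(0xE0 | (c >> 12));
            dst[out++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            if (out + 4 > cap) return (size_t)-1;
            dst[out++] = (unsigned char)(0xF0 | (c >> 18));
            dst[out++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

For other multibyte values of 'encoding' (cp932, euc-jp, ...) the
UTF-8 result would still have to go through Vim's own conversion, so
this covers only the utf-8 case.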

Anyway, if this helps anyone or if anyone can help me, please let me know!

Zak Beck
Accenture HR Services (formerly e-peopleserve)
Learning Management Systems
Tel: 01785 762750
email: mailto:zak.beck@...


#744 From: Bram Moolenaar <Bram@...>
Date: Thu Nov 14, 2002 8:04 pm
Subject: Re: Win32 clipboard and pasting unicode from other apps
 
Zak Beck wrote:

> I think I know why gvim on Win32 breaks unicode cut/copy and pasting from
> other apps, but I don't know enough (yet) about the vim internals and
> especially multibyte to fix it.

There are a few patches that have been made, but not included yet.  If
I'm not mistaken then this one is the most recent:

     http://members.telocity.com/~seer26/UTF8_STRING.patch

I don't know if this solves all the problems you mention, please check
it out.

--
hundred-and-one symptoms of being an internet addict:
31. You code your homework in HTML and give your instructor the URL.

  ///  Bram Moolenaar -- Bram@... -- http://www.moolenaar.net  \\\
///          Creator of Vim - Vi IMproved -- http://www.vim.org          \\\
\\\           Project leader for A-A-P -- http://www.a-a-p.org           ///
  \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///

#745 From: "Maiorana, Jason" <jmaiorana@...>
Date: Thu Nov 14, 2002 8:36 pm
Subject: RE: Win32 clipboard and pasting unicode from other apps
 
> > I think I know why gvim on Win32 breaks unicode cut/copy and pasting
> > from other apps, but I don't know enough (yet) about the vim internals
> > and especially multibyte to fix it.
> There are a few patches that have been made, but not included yet.  If
> I'm not mistaken then this one is the most recent:
>   http://members.telocity.com/~seer26/UTF8_STRING.patch


GAK, that patch is for GTK gvim and console vim's connection to
the X server. I don't know the first bit about gvim on Windows.
(Does the Win32 version use GTK?)

The patch basically turns on support for the X clipboard format
known as "UTF8_STRING", which is used by GTK2, Mozilla, etc.; pretty
much any recent app uses it instead of "COMPOUND_TEXT". (Also,
in my apps I don't bother to support anything but UTF8_STRING,
so it helps me personally too.)

On my system "COMPOUND_TEXT" doesn't work because I muck around with
my locales etc. too much, so in order for me to copy/paste multilanguage
stuff I needed "UTF8_STRING" pretty badly. PS: I've been using the patch
with vim/gvim for several months now, with no problems.

So I think my patch is a bit X-Windows oriented.
As for MS-Windows, doesn't that use some sort of UTF-16 encoding
everywhere? I don't have any way to write Windows code, so I can't
help you with that...

#746 From: Bram Moolenaar <Bram@...>
Date: Thu Nov 14, 2002 9:18 pm
Subject: RE: Win32 clipboard and pasting unicode from other apps
 
Jason Maiorana wrote:

> > > I think I know why gvim on Win32 breaks unicode cut/copy and pasting
> > > from other apps, but I don't know enough (yet) about the vim internals
> > > and especially multibyte to fix it.
> >There are a few patches that have been made, but not included yet.  If
> >I'm not mistaken then this one is the most recent:
> >  http://members.telocity.com/~seer26/UTF8_STRING.patch
>
> GAK, that patch is for GTK gvim and console vim's connection to
> the X server. I don't know the first bit about gvim on Windows.
> (Does the Win32 version use GTK?)

Sorry, my mistake.  I can't seem to find a patch for Win32.  Maybe it
was only suggested, not implemented.

> As for MS-Windows, doesn't that use some sort of UTF-16 encoding
> everywhere? I don't have any way to write Windows code, so I can't
> help you with that...

MS-Windows uses UCS-2, 16-bit Unicode characters.  The problem is that
the format stored on the (default) clipboard is not specified, thus we
must do a few tricks when using multiple encodings.  Someone worked out
how it should work, but I can't find the reference right now...

--
He who laughs last, thinks slowest.

  ///  Bram Moolenaar -- Bram@... -- http://www.moolenaar.net  \\\
///          Creator of Vim - Vi IMproved -- http://www.vim.org          \\\
\\\           Project leader for A-A-P -- http://www.a-a-p.org           ///
  \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///
