Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Mime-Version: 1.0 (Apple Message framework v553)
Content-Transfer-Encoding: 8bit
Message-ID:  <BFBC483C-ACE5-11DA-B480-0003930D5AB6@residenset.net>
Date:         Mon, 6 Mar 2006 08:49:32 +0100
Reply-To: Mailing list for the LaTeX3 project
              <LATEX-L@LISTSERV.UNI-HEIDELBERG.DE>
Sender: Mailing list for the LaTeX3 project <LATEX-L@LISTSERV.UNI-HEIDELBERG.DE>
From: =?ISO-8859-1?Q?Lars_Hellstr=F6m?= <Lars.Hellstrom@RESIDENSET.NET>
Subject: Re: LICR objects
To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE
In-Reply-To:  <17419.19074.277514.24682@morse.mittelbach-online.de>
Precedence: list
Status: R

On Sunday, 5 March 2006, at 21:30, Frank Mittelbach wrote:
>
> that is not to say that  the line
>
>>>>  \DeclareUnicodeCharacter{02C6}{\textasciicircum}
>
> is probably wrong it should be most likely
>
>   \DeclareUnicodeCharacter{005E}{\textasciicircum}
>
> and several others have similar defects.  would be good if that got checked.

Is that even a legal definition? U+005E (^) is, as was mentioned
earlier in this thread, syntax in LaTeX, so you can't use inputenc to
map it to something. Or are you thinking about some attempt at
supporting verbatim input?

>> Example: Assuming there is a word "deaffish" and the
>> author does not want a ligature ffi spanning both word parts.
>> Therefore, having a good editor, he uses the Unicode sequence
>> U+0066 U+FB01 to specify the correct and desired ligature.
>> Using the latter case of \DeclareUnicodeCharacter{FB01}
>> TeX would get "ffi" and then form the wrong ligature.
>
> wrong example in my opinion. as Lars said: fi or ffi ligature ended up in
> unicode as legacy codes because they were in legacy 8-bit encodings. million
> other ligatures are not available as "chars" because UC like most other
> standards are heavily influenced by what is right for certain countries but
> not others. using "fi" in this way is like using tables in html to position
> elements on the page, ie it works for that example but ...
>
> so the right thing is not to use fi at all here but would be to use a generic
> method to denote subword boundaries or whatever to allow the formatter not to
> use the ligature. TeX's method would be \textcompwordmark ... but unicode
> never thought that such encoding of logical information is the task of the
> standard.

Actually, U+200C (ZERO WIDTH NON-JOINER) seems to me a perfect match to 
\textcompwordmark, and I've entered it as such in my "Draft 
specification for the T1 encoding".
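
A minimal sketch of such a mapping, assuming the utf8 option of inputenc
and T1-encoded fonts. The declaration is written out only for
illustration; the encoding's own definition files may already provide
it, in which case it is redundant:

    \documentclass{article}
    \usepackage[T1]{fontenc}
    \usepackage[utf8]{inputenc}
    % Map U+200C (ZERO WIDTH NON-JOINER) to the compound word mark.
    \DeclareUnicodeCharacter{200C}{\textcompwordmark}
    \begin{document}
    % Typing U+200C between the two f's should then be equivalent to:
    deaf\textcompwordmark fish
    \end{document}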

More pragmatically, one may of course write "deaf\-fish" to not only 
escape the ligature, but also point out the proper point of hyphenation.
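
For comparison, a small sketch of the two spellings (T1 fonts are an
assumption here, nothing depends on them in particular):

    \documentclass{article}
    \usepackage[T1]{fontenc}
    \begin{document}
    % Plain spelling: TeX is free to form an ffi ligature
    % across the boundary between the word parts.
    deaffish

    % Explicit discretionary hyphen: the first f should stay
    % separate, and the word may be broken as deaf-fish.
    deaf\-fish
    \end{document}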

Lars Hellström