Date: Tue, 21 Oct 1997 12:26:11 +0200
From: Roger Kehr
Sender: Mailing list for the LaTeX3 project
Reply-To: Mailing list for the LaTeX3 project
To: Multiple recipients of list LATEX-L
Subject: Re: von v. van der & other problems

Phillip Helbig writes:
> Maarten Gelderman :
>
> This is something which needs to be considered.
> There are many ways to treat things like ö (this is the same as the
> letter marked above if it gets garbled by some email software along
> the way).
>
>   o treat it as a separate letter (in Swedish it's the last letter of
>     the alphabet)
>
>   o treat it as o (common practice in English, mixing it with things
>     really written with o)
>
>   o treat it as oe, mixing it with things REALLY written with oe
>     (this is done in German telephone books)
>
>   o put it immediately before o
>
>   o put it immediately after o
>
>   o put it immediately before oe
>
>   o put it immediately after oe
>
> I've definitely seen the first three in use. The rest are thinkable. In
> German, sometimes (such as in address books, file collections etc) Th,
> Ph, Chr, Sch, St etc, either as the beginnings of words or as initials
> of names (one sees both), are treated as essentially separate letters,
> usually coming after the first letter of the group (corresponding to
> example 5 above).

The problem with these is that in German there are surnames such as
M\"uller as well as Mueller. Hence, both spellings are possible. Treating
\"u as ue still leaves no exact order defined between the two entries,
which might confuse the reader if several people have names like these,
especially in telephone books with dozens of entries. Then we need a
rule that says:

  Phase 1: Treat all \"o's as if they were oe's.
  Phase 2: Put all \"o's in front of oe (or vice versa).

Hence, specifying rules like this is not trivial. The French sorting
rules, for example, also require more than one sorting phase, because
diacritical marks are not considered at first, so e and \'e are equal in
the first phase. If words such as cote and cot\'e are then still tied,
the diacritical marks define the exact order, but compared
lexicographically from right to left. This is rather complicated to
implement.

I have done some work on the xindy index processor, which to a large
extent offers mechanisms to solve these problems.
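
To make the two-phase rules described above concrete, here is a minimal
sketch in Python. It is only an illustration under several assumptions:
the names arrive as Unicode strings rather than in TeX notation such as
M\"uller, only the umlauts a, o and u are expanded, and the French accent
order among different marks is simply the code point order. The function
names german_key and french_key are made up for this example.

```python
import unicodedata

# Assumption: only these three umlauts are expanded in phase 1.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def german_key(name, umlaut_first=True):
    """Two-phase key: phase 1 treats ö as oe, phase 2 breaks remaining ties."""
    lowered = unicodedata.normalize("NFC", name.lower())
    # Phase 1: expand every umlaut so that Müller and Mueller compare equal.
    phase1 = "".join(UMLAUTS.get(ch, ch) for ch in lowered)
    # Phase 2: one rank per original character, deciding whether the
    # umlaut spelling goes in front of the spelled-out form or after it.
    umlaut_rank, plain_rank = (0, 1) if umlaut_first else (1, 0)
    phase2 = tuple(umlaut_rank if ch in UMLAUTS else plain_rank for ch in lowered)
    return (phase1, phase2)


def french_key(word):
    """Two-phase key: base letters first, then accents from right to left."""
    lowered = unicodedata.normalize("NFC", word.lower())
    base_letters = []
    marks_per_letter = []
    for ch in lowered:
        decomposed = unicodedata.normalize("NFD", ch)
        # Phase 1 material: the character without its combining marks.
        base_letters.append(
            "".join(c for c in decomposed if not unicodedata.combining(c))
        )
        # Phase 2 material: the combining marks alone ("" if unaccented).
        marks_per_letter.append(
            "".join(c for c in decomposed if unicodedata.combining(c))
        )
    # Reversing the marks makes the LAST accent difference the decisive one.
    return ("".join(base_letters), tuple(reversed(marks_per_letter)))


german_sorted = sorted(["Mueller", "Muller", "Müller"], key=german_key)
# german_sorted == ["Müller", "Mueller", "Muller"]

french_sorted = sorted(["côté", "coté", "côte", "cote"], key=french_key)
# french_sorted == ["cote", "côte", "coté", "côté"]
```

Passing umlaut_first=False gives the "vice versa" variant of phase 2.
The sketch builds sort keys in one pass; it only illustrates the rules
and does not reproduce xindy.
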
The current xindy implementation uses a string rewriting mechanism that
operates in several stages. What I learned from that project is that it
requires a lot of effort to obtain a complete and consistent
specification of sorting rules.

There exists an ISO standard, "ISO/IEC CD 14651 - International String
Ordering - Method for comparing Character Strings and Description of a
Default Tailorable Ordering", about this topic. But it does not offer
solutions for the PhD. stuff Maarten mentioned above.

I hope I haven't frustrated you.

Cheers
--Roger

P.S.: The ISO standard is available at http://www.dkuug.dk/JTC1/SC22/WG20.

-- 
======================================================================
Roger Kehr                        kehr@iti.informatik.tu-darmstadt.de
Computer Science Department    Darmstadt University of Technology