From: Joseph Wright
Date: Tue, 17 Mar 2020 08:30:54 +0000
Subject: Re: Using Lua to Preprocess Unicode Data
To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

On 16/03/2020 22:54, Kelly Smith wrote:
>> As format-building is all about saving time for 'normal' runs, I'm not
>> seeing there is a massive need to speed up the process. I know there is
>> one engine in development that doesn't use format files, so that might
>> be a place to consider things, but I think we'd need a strong case to
>> alter the approach for XeTeX/LuaTeX (pdfTeX, ...).
>>
>> Joseph
>
> Sorry, I should’ve clarified: the point of preprocessing the data wouldn’t
> be to speed up anything, instead, the point would be to do complex
> processing that would be very difficult or even impossible in LaTeX.
>
> For example, if the l3regex module were extended so that precompiled
> regexes could be used as parts of other regexes, then Unicode properties
> could be simply implemented by referring to precompiled regexes whose
> content was created by running filters over the Unicode character database.
>
> Another example would be processing the very complex XML files that are
> used in supplementary Unicode files, like the Common Locale Data Repository,
> which could help with localization and language-specific date and number
> parsing/formatting.
>
> This idea of preprocessing could be applied to any complex data set that
> LaTeX3 may need to work with, but I used the example of Unicode data
> because that’s the one that immediately came to mind.

When there is a need to do complex pre-processing, Lua is the obvious way
to go nowadays, and that's reflected in a number of scripts. The issue
tends to be not how one scripts the extraction, but rather the target at
the TeX end.

For example, on the regex idea, I suspect performance would be the major
concern: regex processing is already a lot of work, and I have a feeling
Bruno would want to optimise how the data were stored inside TeX. (I'm
not sure where the balance between data extraction and storage lies here.)

Similarly, a lot of work is done by Javier for babel using the CLDR, but
many of the outcomes are not amenable to scripting: it's about how you
set up the TeX 'just right'.

So whilst there is no reason not to use Lua when it works, at the moment
there are no pressing areas where the pre-processing is the barrier.

Joseph