From: Joseph Wright
To: LATEX-L@LISTSERV.UNI-HEIDELBERG.DE (Mailing list for the LaTeX3 project)
Date: Mon, 16 Mar 2020 17:32:29 +0000
Subject: Re: Using Lua to Preprocess Unicode Data
On 16/03/2020 17:01, Kelly Smith wrote:
> Hello!
>
> I’ve been thinking: since Lua is already involved in the build process,
> by way of l3build, wouldn’t it be reasonable to use a lua script
> to preprocess Unicode data into forms that are easily consumed by LaTeX
> during the format-building process?
>
> Warmly,
> Kelly
>

It depends on the outcome you are after.

The original loading method for Unicode data in XeTeX was via a Perl script. That created a .tex file containing (for example) catcode data. To update the Unicode data, one had to run the Perl script, then send the processed files to CTAN.

There were two issues. First, any change required active work not only to get the data from Unicode but also to manipulate it. Second, and more significant, it was *slower* than just reading the files in TeX. (This only became apparent when I wrote some test parsers.)

Now, there is more data being loaded today than when I did that work, and some of it is LuaTeX-specific, so it could be handled purely in Lua.
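The kind of pre-processing described above turns raw Unicode data into a .tex file of catcode assignments. The following is a hypothetical Python sketch of such a step, not the original Perl script. Several details are assumptions made for illustration: the function name catcode_lines, the choice to give letters and combining marks catcode 11, and the sample input. The sketch relies only on the documented UnicodeData.txt layout, which uses semicolon-separated fields and writes large blocks as a "<..., First>" / "<..., Last>" pair.

```python
"""Hypothetical sketch: pre-digest UnicodeData.txt into TeX \\catcode lines."""
from typing import Iterable, Iterator, Optional

# Assumption: letters (L*) and combining marks (M*) get catcode 11 ("letter").
LETTER_LIKE = ("L", "M")


def catcode_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one \\catcode assignment per letter-like code point.

    UnicodeData.txt fields are separated by ';': field 0 is the code
    point in hex, field 1 the name, field 2 the general category.
    Large blocks appear as a '<..., First>' / '<..., Last>' pair.
    """
    range_start: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(";")
        code, name, category = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = code  # wait for the matching ', Last>' entry
            continue
        if name.endswith(", Last>") and range_start is not None:
            first, range_start = range_start, None  # expand the whole block
        else:
            first = code  # ordinary single code point
        if category.startswith(LETTER_LIKE):
            for cp in range(first, code + 1):
                # TeX reads "-prefixed numbers as hex (uppercase A-F).
                yield f'\\catcode"{cp:04X}=11'


if __name__ == "__main__":
    # Illustrative input only; the CJK range here is shortened and made up.
    sample = [
        "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
        "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
        "0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;",
        "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
        "4E02;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
    ]
    for out in catcode_lines(sample):
        print(out)
```

The generated lines would then be \input during format building. This also shows the maintenance cost noted above: every Unicode update means re-running the script and shipping the regenerated file.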
It's also possible that the Perl script was sub-optimal, or that, as part of a general 'install' function, the extra time would not really show. However, XeTeX needs the data too, and it cannot run Lua itself, so one is still looking at an explicit pre-processing step in Lua.

Moreover, most of the time taken for format-building is not spent reading Unicode data. With LuaTeX, pre-loading expl3 does cut out a slight 'stall' when loading everything for case-changing, but maintaining separate LuaTeX and XeTeX paths is not attractive.

With the current set-up, updating the Unicode data is just a question of copying the raw .txt files into a form that CTAN can accept. Pre-digesting would still leave us needing some way to co-ordinate between packages (format, luaotfload, expl3, specialist stuff), on top of having to do the explicit extraction.

As format-building is all about saving time for 'normal' runs, I don't see a massive need to speed up the process itself. I know there is one engine in development that doesn't use format files, so that might be a place to consider things. However, I think we'd need a strong case to alter the approach for XeTeX/LuaTeX (pdfTeX, ...).

Joseph