konsole/tools/uni2characterwidth/template.example
Mariusz Glebocki 5f32cb3c44 Add a tool for generating character width tables
Summary:
The uni2characterwidth tool converts Unicode Character Database files
into character width lookup tables. It uses a template file to place
the tables in a source code file together with a function for finding
the width of a specified character. It can also generate several forms
of lists with width data for debugging and testing, or for future use
as a replacement for the Unicode files.
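
The lookup function mentioned above could look roughly like the following sketch. It assumes the layout that `template.example` emits: a direct-access table for the first 256 code points plus sorted code point ranges searched with binary search. The names `DIRECT_LUT`, `RANGES`, and `char_width`, the sample values, the single combined range table, and the default width of 1 are all hypothetical; the generated code may be organized differently.

```python
from bisect import bisect_right

# Illustrative values only: real tables come from the generated file.
# Assumption: -1 marks non-printable characters (wcwidth convention).
DIRECT_LUT = [-1] * 32 + [1] * 224  # widths of code points 0..255

# Sorted, non-overlapping (first, last, width) ranges above U+00FF.
RANGES = [
    (0x0300, 0x036F, 0),  # combining diacritical marks
    (0x1100, 0x115F, 2),  # Hangul Jamo initial consonants
    (0x4E00, 0x9FFF, 2),  # CJK unified ideographs
]
STARTS = [first for first, _, _ in RANGES]


def char_width(cp, default=1):
    """Return the width of code point cp using the tables above."""
    if cp < len(DIRECT_LUT):
        return DIRECT_LUT[cp]  # direct array access for low code points
    i = bisect_right(STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0:
        first, last, width = RANGES[i]
        if cp <= last:
            return width
    return default  # assumption: unlisted code points have width 1
```

The direct table keeps the most common (Latin-1) characters on a constant-time path, while the binary search keeps the range tables compact.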

Set the `KONSOLE_BUILD_UNI2CHARACTERWIDTH` CMake flag to build the tool.
Use the `--help` argument for more detailed usage.

The tool can generate a separate "width" for Ambiguous characters. This
can be used to make the width of these characters configurable in
Konsole settings.

The `template.example` file contains all possible named tags, and some
additional tags to show how to use them.
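
To illustrate how named tags are expanded, here is a minimal sketch of a renderer that handles only `«NAME»`, `«NAME:content»` with map data, and the `\:` escape, as described in the comment block of `template.example`. It is not the tool's implementation: `render` and `find_close` are hypothetical names, and comments, lists, anonymous elements, and `!fmt`/`!repeat` are deliberately left out.

```python
OPEN, CLOSE = "«", "»"


def find_close(text, start):
    """Return the index of the » that matches the « at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == OPEN:
            depth += 1
        elif text[i] == CLOSE:
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced tag")


def render(text, data):
    """Expand named tags in text using the dict data."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # escaped character, e.g. \: -> :
            i += 2
        elif ch == OPEN:
            end = find_close(text, i)
            name, sep, content = text[i + 1:end].partition(":")
            value = data.get(name) if isinstance(data, dict) else None
            if sep and isinstance(value, dict):
                out.append(render(content, value))  # process nested tags
            elif value is not None:
                out.append(str(value))  # data replaces the whole tag
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


template = r"«exampleA:number\: «Number», string\: «String»»"
data = {"exampleA": {"Number": 42, "String": "hello"}}
result = render(template, data)
print(result)  # number: 42, string: hello
```

A tag whose name has no matching key produces no output in this sketch, which mirrors the "replaced with passed data or removed" behavior described for named tags.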

CCBUG: 396435

Depends on D15756

Test Plan:
Download the files listed below from the `11.0.0` and `emoji/11.0`
directories on `https://unicode.org/Public/`. You can also use URLs to
the files directly.

* UnicodeData.txt
* EastAsianWidth.txt
* emoji-data.txt
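
The three files can be fetched with a short script such as the sketch below. The `ucd/` subdirectory under `11.0.0` is an assumption about the layout of the Unicode server, and `SOURCES` and `download_all` are hypothetical names.

```python
import os
from urllib.request import urlretrieve

BASE = "https://unicode.org/Public/"

# Assumption: the 11.0.0 data files sit in a "ucd" subdirectory.
SOURCES = {
    "UnicodeData.txt": BASE + "11.0.0/ucd/UnicodeData.txt",
    "EastAsianWidth.txt": BASE + "11.0.0/ucd/EastAsianWidth.txt",
    "emoji-data.txt": BASE + "emoji/11.0/emoji-data.txt",
}


def download_all(dest_dir="."):
    """Download every file in SOURCES into dest_dir."""
    for name, url in SOURCES.items():
        urlretrieve(url, os.path.join(dest_dir, name))

# Usage (performs network requests): download_all(".")
```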

Generate any available list except compact-ranges (e.g. `details`):

```
uni2characterwidth \
    -U UnicodeData.txt  -A EastAsianWidth.txt  -E emoji-data.txt \
    -g details  result.txt
```

The list should contain ranges for all possible widths
(-2, -1, 0, 1, 2). You can pick some characters whose width you know
and check how they were classified. -2 is a special non-standard width
for ambiguous characters, which can be overridden by adding the `-a 1` or
`-a 2` parameter. With this flag, all ranges in the -2 group should
disappear and be assigned to the selected width (1 or 2).
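
The effect of `-a` on the range list can be pictured with the following sketch. It assumes ranges are stored as sorted `(first, last, width)` tuples and that neighboring ranges with equal width are merged afterwards; both are assumptions, and `resolve_ambiguous` is a hypothetical name, not a function of the tool.

```python
AMBIGUOUS = -2  # special non-standard width for ambiguous characters


def resolve_ambiguous(ranges, width):
    """Reassign all -2 ranges to width (1 or 2) and merge neighbors."""
    if width not in (1, 2):
        raise ValueError("width must be 1 or 2")
    merged = []
    for first, last, w in ranges:
        if w == AMBIGUOUS:
            w = width
        # Merge with the previous range when contiguous and equal in width.
        if merged and merged[-1][2] == w and merged[-1][1] + 1 == first:
            merged[-1] = (merged[-1][0], last, w)
        else:
            merged.append((first, last, w))
    return merged


# Illustrative input: U+00A1 and U+00A4 are East Asian Ambiguous.
sample = [
    (0x00A0, 0x00A0, 1),
    (0x00A1, 0x00A1, AMBIGUOUS),
    (0x00A2, 0x00A3, 1),
    (0x00A4, 0x00A4, AMBIGUOUS),
]
narrow = resolve_ambiguous(sample, 1)
wide = resolve_ambiguous(sample, 2)
```

In both results no range with width -2 remains, which is the property the test plan asks you to check in the generated list.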

Generate output using a template:

```
uni2characterwidth \
    -U UnicodeData.txt  -A EastAsianWidth.txt  -E emoji-data.txt \
    -g code,./template.example  result.txt
```

Reviewers: #konsole, hindenburg

Reviewed By: #konsole, hindenburg

Subscribers: hindenburg, konsole-devel

Tags: #konsole

Differential Revision: https://phabricator.kde.org/D15757
2018-09-30 12:22:30 -04:00


«*COMMENT:----------------------------------------------------------------------
Tags:
«*anything:comment where everything but closing sequence is allowed:anything*»
«NAME:any content, including other tags. A literal colon has to be escaped
as \:. The content is processed using data passed from the code() function
under the NAME key. It should contain other tags; without them this text will
be replaced with the passed data or removed.»
«NAME» - like before, used when data should replace it, so content is
unnecessary
EXAMPLE:
data: Map{ "exampleA", Map{ { "Number", 42 }, { "String", "hello" } } }
template: «exampleA:number\: «Number», string\: «String»»
result: number: 42, string: hello
«» - empty anonymous element. Used in named elements which receive lists.
The element will be replaced with list item, and duplicated if
«:anonymous container. It should contain some elements which receive data.
The element disappears when its child elements do not receive any value.
Useful for adding suffixes/prefixes to data»
EXAMPLE:
data: Map{ "exampleB", Vector{ 1, 2, 3, 4, 5, 6, 7 } }
template: «exampleB:«:[«»] »»
result: [1] [2] [3] [4] [5] [6] [7]
data: Map{ "exampleC", Vector{ "a", "b", "c" } }
template: «exampleC:«:first = «»»«:, second = «»»«:, third = «»»«:, fourth = «»»»
result: first = a, second = b, third = c
«!fmt "XXX":a wrapper which sets the printf-like format XXX for numbers and
strings inside it. The format starts with %.»
«!repeat N:repeats contents inside N times.»
EXAMPLE:
data: Map{ "exampleD", Vector{ 1, 2, 3, 4, 10, 11, 12, 13 } }
template: «exampleD:«!fmt "%#.2x":«!repeat 3:«» »«»; »»
result: 0x01 0x02 0x03 0x04; 0x0a 0x0b 0x0c 0x0d;
D: «exampleD:«!fmt "%#.2x":«!repeat 3:«» »«»; »»
----------------------------------------------------------------------:COMMENT*»
For available data see code() function. Below are usage examples
Warning about generated file - putting "this is a generated file" text in a
template file could be misleading.
«gen-file-warning»
Command used to generate the file:
«cmdline»
Direct LUT - widths of the first 256 code points in direct access array:
{«!fmt "% d":«direct-lut:
«!repeat 32:«:«»,»»
»»}
Arrays with code point ranges for every width:
«ranges-luts:«:
«name» = {«!fmt "%#.6x":«ranges:
«!repeat 8:«:{«first»,«last»},»»
»»}
Number of elements in the array: «size»
»»
List of array names, sizes, and widths:
{«ranges-lut-list:
«:{«!fmt "% d":«width»», «!fmt "%-16s":«name»», «size»},»
»}
Number of elements in the array: «ranges-lut-list-size»;
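
As a cross-check of the `exampleD` entry in the comment block, the combination of `!fmt "%#.2x"` with `!repeat 3` followed by one more element can be imitated in a few lines. This is only a sketch of the resulting layout, not of how the tool processes tags; `format_rows` is a hypothetical name.

```python
def format_rows(values, fmt="%#.2x", per_row=4):
    """Format values printf-style, ending every per_row-th item with ';'."""
    out = []
    for i, value in enumerate(values):
        last_in_row = (i + 1) % per_row == 0
        # Three items followed by " " and a fourth followed by "; ",
        # like «!repeat 3:«» »«»; ».
        out.append(fmt % value + ("; " if last_in_row else " "))
    return "".join(out)


line = format_rows([1, 2, 3, 4, 10, 11, 12, 13])
print(line)  # 0x01 0x02 0x03 0x04; 0x0a 0x0b 0x0c 0x0d;
```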