Korean Hangul can be represented in Unicode either as precomposed Hangul
syllables or as sequences of alphabetic components called Jamo.
Syllables should occupy 2 cells (there are halfwidth variants at
U+FFA0..U+FFDF). A fully decomposed syllable consists of an initial
jamo (choseong, the leading consonant, which may be the filler U+115F),
a medial jamo (jungseong, the vowel, which may be the filler U+1160),
and an optional final jamo (jongseong, the trailing consonant). Old
Korean can have more than one of each of those. To make the total width
2 in either representation, we assign width 2 to choseong and 0 to
jungseong and jongseong. Absent a context-aware wcswidth, this still
breaks for Old Korean syllables with more than one leading-consonant
jamo.
This aligns with glibc:
commit 7a79e321c6f85b204036c33d85f6b2aa794e7c76
Author: Thorsten Glaser <tg@mirbsd.de>
Date: Fri Jul 14 14:02:50 2017 +0200
Refresh generated charmap data and ChangeLog
[BZ #21750]
* charmaps/UTF-8: Refresh.
diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 04ef5ad071..9e05b4a652 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,17 @@
+2017-07-14 Thorsten Glaser <tg@mirbsd.de>
+
+ [BZ #21750]
+ * charmaps/UTF-8: Refresh.
+ * unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
+ * unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
+ * unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
+ * unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.
+ * unicode-gen/utf8_gen.py: Treat category Me and Mn as combining.
+ [BZ #19852]
+ * unicode-gen/utf8_gen.py: Process EastAsianWidth lines before
+ UnicodeData lines so the latter have precedence; remove hack
+ to group output by EastAsianWidth ranges.
+
[ ... snip ...]
commit 6e540caa21616d5ec5511fafb22819204525138e
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jun 16 08:29:40 2020 +0200
Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
<UABE8> 0
<UABED> 0
<UAC00>...<UD7A3> 2
+<UD7B0>...<UD7C6> 0
+<UD7CB>...<UD7FB> 0
<UF900>...<UFA6D> 2
<UFA70>...<UFAD9> 2
<UFB1E> 0
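For illustration only, the width assignment described above can be
sketched in Python. This is not Konsole's implementation: the names
`HANGUL_WIDTHS`, `char_width` and `text_width` are hypothetical, the
choseong range U+1100..U+115F is taken from the layout of the Unicode
Hangul Jamo block rather than from the glibc diff, and every character
outside the table is assumed to have width 1.

```python
import unicodedata

# (first, last, width) ranges. The zero-width jungseong/jongseong ranges and
# the precomposed syllable range follow the glibc data quoted above; the
# choseong range is an assumption based on the Hangul Jamo block layout.
HANGUL_WIDTHS = [
    (0x1100, 0x115F, 2),  # choseong, including the filler U+115F
    (0x1160, 0x11FF, 0),  # jungseong and jongseong, including the filler U+1160
    (0xAC00, 0xD7A3, 2),  # precomposed Hangul syllables
    (0xD7B0, 0xD7C6, 0),  # extended jungseong
    (0xD7CB, 0xD7FB, 0),  # extended jongseong
]


def char_width(ch: str) -> int:
    """Context-free width of one character, like wcwidth()."""
    cp = ord(ch)
    for first, last, width in HANGUL_WIDTHS:
        if first <= cp <= last:
            return width
    return 1  # simplification: everything else is treated as narrow


def text_width(text: str) -> int:
    """Sum of per-character widths, with no awareness of syllable context."""
    return sum(char_width(ch) for ch in text)


precomposed = "\ud55c"  # one precomposed syllable
decomposed = unicodedata.normalize("NFD", precomposed)  # three jamo
print(text_width(precomposed), text_width(decomposed))

# Old Korean syllable with two leading consonants: each choseong counts
# as 2, so the sum is 4 although the syllable should occupy 2 cells.
print(text_width("\u1100\u1100\u1161"))
```

The last call shows the limitation mentioned above: a per-character sum
cannot tell that two choseong belong to the same syllable.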
This option has been deprecated since Qt 5.12. Konsole's minimum Qt
version in CMake is 5.12, so it causes warning messages during the build.
A regular expression is now automatically optimized the first time it
is used.
This reverts commit f96deb39aa.
That change was an anti-optimization.
QStringLiteral is a QString created at build time. Initialization of
QString with it has no overhead.
QLatin1String is an 8-bit C string wrapper that needs run-time
conversion to the 16-bit encoding used in QString.
Summary:
Since the minimum Qt version is 5.9.7, code for older versions is no
longer needed.
Reviewers: #konsole, hindenburg
Reviewed By: #konsole, hindenburg
Subscribers: hindenburg, konsole-devel
Tags: #konsole
Differential Revision: https://phabricator.kde.org/D17746
Summary:
The uni2characterwidth tool converts Unicode Character Database files
into character width lookup tables. It uses a template file to place
the tables in a source code file, together with a function that finds
the width of a specified character. It can also generate several kinds
of lists with width data for debugging and testing, or for future use
as a replacement for the Unicode files.
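As a rough sketch of what a range-based lookup table with a width
function can look like (this is not the generated code; `RANGES`,
`STARTS` and `lookup_width` are hypothetical names, the table is a tiny
hand-written excerpt, and the default width of 1 is an assumption):

```python
from bisect import bisect_right

# Hypothetical excerpt of a width table: sorted, non-overlapping
# (first, last, width) ranges of code points.
RANGES = [
    (0x0300, 0x036F, 0),  # combining diacritical marks
    (0x1100, 0x115F, 2),  # Hangul choseong
    (0x1160, 0x11FF, 0),  # Hangul jungseong and jongseong
    (0xAC00, 0xD7A3, 2),  # precomposed Hangul syllables
]
STARTS = [first for first, _, _ in RANGES]


def lookup_width(cp: int, default: int = 1) -> int:
    """Binary-search the range whose start is the greatest one <= cp."""
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        _, last, width = RANGES[i]
        if cp <= last:
            return width
    return default  # code point not covered by any range


print(lookup_width(0x0301), lookup_width(ord("A")), lookup_width(0xAC00))
```

Keeping the ranges sorted lets the lookup run in logarithmic time
without storing one entry per code point.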
Set the `KONSOLE_BUILD_UNI2CHARACTERWIDTH` cmake flag to build the tool.
Use the `--help` argument for more detailed usage.
The tool can generate a separate "width" for Ambiguous characters. This
can be used to make the width of those characters configurable in
Konsole settings.
The `example.template` file contains all possible named tags, and some
additional tags to show how to use them.
CCBUG: 396435
Depends on D15756
Test Plan:
Download the files listed below from the `11.0.0` and `emoji/11.0`
directories on `https://unicode.org/Public/`. You can also directly use
URLs to the files.
* UnicodeData.txt
* EastAsianWidth.txt
* emoji-data.txt
Generate any available list except compact-ranges (e.g. `details`):
```
uni2characterwidth \
-U UnicodeData.txt -A EastAsianWidth.txt -E emoji-data.txt \
-g details result.txt
```
The list should contain ranges for all possible widths
(-2, -1, 0, 1, 2). You can pick some characters whose width you know
and check how they were classified. -2 is a special non-standard width
for ambiguous characters, which can be overridden by adding the `-a 1`
or `-a 2` parameter. With this flag, all ranges from the -2 group should
disappear and be assigned to the selected width (1 or 2).
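The expected effect of `-a` on a list of ranges can be modelled with a
small Python sketch. It only illustrates the check described above and
is not the tool's code: `resolve_ambiguous` and the sample ranges are
hypothetical, and merging adjacent ranges of equal width is an
assumption about how the output is grouped.

```python
AMBIGUOUS = -2  # special non-standard width for ambiguous characters


def resolve_ambiguous(ranges, ambiguous_width):
    """Replace width -2 by the chosen width and merge adjacent equal ranges."""
    if ambiguous_width not in (1, 2):
        raise ValueError("ambiguous width must be 1 or 2")
    resolved = []
    for first, last, width in ranges:
        if width == AMBIGUOUS:
            width = ambiguous_width
        prev = resolved[-1] if resolved else None
        if prev and prev[2] == width and prev[1] + 1 == first:
            # Contiguous with the previous range and same width: extend it.
            resolved[-1] = (prev[0], last, width)
        else:
            resolved.append((first, last, width))
    return resolved


# Hypothetical sample: (first, last, width) with one ambiguous character.
sample = [(0x00A0, 0x00A0, 1), (0x00A1, 0x00A1, -2), (0x00A2, 0x00A3, 1)]
print(resolve_ambiguous(sample, 1))
print(resolve_ambiguous(sample, 2))
```

In both results no range with width -2 remains, which is what the
generated list should show when `-a 1` or `-a 2` is given.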
Generate output using a template:
```
uni2characterwidth \
-U UnicodeData.txt -A EastAsianWidth.txt -E emoji-data.txt \
-g code,./template.example result.txt
```
Reviewers: #konsole, hindenburg
Reviewed By: #konsole, hindenburg
Subscribers: hindenburg, konsole-devel
Tags: #konsole
Differential Revision: https://phabricator.kde.org/D15757