mirror of
https://github.com/f-droid/fdroidclient.git
synced 2026-01-16 10:57:44 -05:00
The default `simple` tokenzier config used in Room/SQLite3 FTS4 makes only ASCII characters case insensitive. The SQLite FTS3/4 doc [1] says: > All uppercase characters within the ASCII range (Unicode codepoints less than 128), are transformed to their lowercase equivalents as part of the tokenization process. Thus, full-text queries are case-insensitive when using the simple tokenizer. > The remove_diacritics option may be set to "0", "1" or "2". The default value is "1". If it is set to "1" or "2", then diacritics are removed from Latin script characters as described above. However, if it is set to "1", then diacritics are not removed in the fairly uncommon case where a single unicode codepoint is used to represent a character with more that one diacritic. [...] This is technically a bug, but cannot be fixed without creating backwards compatibility problems. If this option is set to "2", then diacritics are correctly removed from all Latin characters. This change makes use of the intended behaviour by using the `unicode61` tokenizer with the `diacritics="0"` option to keep the behaviour similar to the current one. This replaces the previously used `simple` tokenizer. A migration is necessary to recreate the AppMetadataFts table to make use of the different tokenizer. [1] https://www.sqlite.org/fts3.html#tokenizer