fdroidclient/libs/sharedTest/src/main
Torsten Grote 23dde0bc9b [db] Better support search for CJK languages
by inserting zero-width spaces between their characters to help the existing SQLite FTS tokenizers split them up.

We considered splitting them up only at word boundaries, but after consulting native speakers we decided to split by character instead.

This is a hack, but given the limitations of the tokenizers currently available with SQLite, we saw no better solution. The ICU tokenizer is also available, but it doesn't handle diacritics in other languages.

The zero-width space is added for the zh, ja and ko locales when their text is saved to the database. This happens for app names, summaries and descriptions, both when loading a full index and when applying diffs. Tests have been added for both cases.
2025-11-04 08:50:51 -03:00
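
The character-level splitting described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `ZERO_WIDTH_SPACE`, `CJK_LANGUAGES`, `isCjkLocale`, `withZeroWidthSpaces` and `prepareForSearchIndex` are hypothetical, it assumes the separator is U+200B, and it assumes a separator goes between every code point of the text, which may differ from what the actual change does.

```kotlin
// Hypothetical names throughout; none of these are taken from the fdroidclient source.

// Assumption: the separator is U+200B (ZERO WIDTH SPACE).
const val ZERO_WIDTH_SPACE = '\u200B'

// The languages the commit message names: zh, ja and ko.
val CJK_LANGUAGES = setOf("zh", "ja", "ko")

/** Returns true if a locale tag such as "zh-CN", "zh_TW" or "ja" uses one of the CJK languages. */
fun isCjkLocale(localeTag: String): Boolean =
    localeTag.substringBefore('-').substringBefore('_').lowercase() in CJK_LANGUAGES

/**
 * Inserts a zero-width space between all code points of this string.
 * Iterating by code point keeps surrogate pairs (characters outside the BMP) intact.
 */
fun String.withZeroWidthSpaces(): String {
    val sb = StringBuilder(length * 2)
    var i = 0
    while (i < length) {
        val codePoint = codePointAt(i)
        if (i > 0) sb.append(ZERO_WIDTH_SPACE)
        sb.appendCodePoint(codePoint)
        i += Character.charCount(codePoint)
    }
    return sb.toString()
}

/** Applies the splitting only to text of CJK locales; other text is returned unchanged. */
fun prepareForSearchIndex(localeTag: String, text: String): String =
    if (isCjkLocale(localeTag)) text.withZeroWidthSpaces() else text
```

In this sketch, `prepareForSearchIndex("zh-CN", name)` would be called on app names, summaries and descriptions before they are written to the database, whether they come from a full index or from a diff. Text for other locales passes through untouched, so diacritics there are still handled by the existing tokenizer.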