by inserting zero-width spaces between their characters so that the existing SQLite FTS tokenizers split them up.
We considered splitting them only at word boundaries, but after consulting native speakers we decided to split by character instead.
Doing this is a hack, but given the limitations of the tokenizers currently available with SQLite, we saw no better solution. The ICU tokenizer is also available, but it doesn't handle diacritics in other languages.
Zero-width spaces are inserted into text for the zh, ja and ko locales when it is saved to the database. This applies to app names, summaries and descriptions, both when loading a full index and when applying diffs. Tests have been added for both cases.
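A minimal Kotlin sketch of the per-character splitting described above. The names `splitForFts`, `isCjkLocale` and `prepareLocalizedForFts` are hypothetical. The sketch also assumes that localized text is held as a map from locale tag to string, so the library's actual code may look different. It inserts U+200B ZERO WIDTH SPACE between adjacent code points and leaves non-CJK locales untouched.

```kotlin
// Hypothetical helpers illustrating the approach; not the library's actual API.
val ZERO_WIDTH_SPACE = '\u200B'

// Languages that get per-character splitting (zh, ja, ko as described above).
val CJK_LANGUAGES = setOf("zh", "ja", "ko")

/** Returns true if a locale tag such as "zh-CN", "ja" or "ko_KR" has a CJK language code. */
fun isCjkLocale(locale: String): Boolean =
    locale.substringBefore('-').substringBefore('_').lowercase() in CJK_LANGUAGES

/** Inserts a zero-width space between every pair of adjacent Unicode code points. */
fun splitForFts(text: String): String {
    val sb = StringBuilder()
    var i = 0
    while (i < text.length) {
        val cp = text.codePointAt(i)
        // No separator before the first code point, one before every later one.
        if (sb.isNotEmpty()) sb.append(ZERO_WIDTH_SPACE)
        sb.appendCodePoint(cp)
        i += Character.charCount(cp)
    }
    return sb.toString()
}

/** Applies the splitting only to CJK locales in a (locale -> text) map. */
fun prepareLocalizedForFts(localized: Map<String, String>): Map<String, String> =
    localized.mapValues { (locale, text) ->
        if (isCjkLocale(locale)) splitForFts(text) else text
    }
```

Iterating over code points rather than `Char`s avoids breaking surrogate pairs. Splitting such a pair would corrupt characters outside the Basic Multilingual Plane, such as the rarer CJK ideographs.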
The structure of the JSON (FeatureV2) differs from that of our internal class AppManifest: the latter uses a list of strings instead of a list of objects. The ReflectionDiffer didn't handle this difference and threw an exception when diffing changed features.
However, the impact of this bug should be small: a version is normally identified by its SHA256 hash, so its features shouldn't change between repo updates. Erratic repo creation software could still trigger it, though.
This affects anti-features and categories. In the process, reflection diffing has been made more robust: the earlier FileV2 hack was removed and error messages were improved.
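To illustrate the shape mismatch, here is a hedged sketch that applies a JSON-style `features` diff to a stored manifest that keeps plain strings. The diff is a list of objects, modeled here as maps with an assumed `name` key. `StoredManifest` and `applyFeaturesDiff` are hypothetical stand-ins for `AppManifest` and the `ReflectionDiffer`. The sketch assumes these diff semantics:

- a missing key means unchanged
- `null` means removed
- a list replaces the old value

```kotlin
// Hypothetical, simplified stand-in for the internal manifest: features are plain strings.
data class StoredManifest(val versionName: String, val features: List<String>)

/**
 * Applies the "features" entry of a JSON-like diff (list of objects) to [manifest].
 * Assumed semantics: absent key = unchanged, null = removed, list = replacement.
 */
fun applyFeaturesDiff(manifest: StoredManifest, diff: Map<String, Any?>): StoredManifest {
    if (!diff.containsKey("features")) return manifest
    val raw = diff["features"] ?: return manifest.copy(features = emptyList())
    val list = raw as? List<*>
        ?: throw IllegalArgumentException("features must be a list, was ${raw.javaClass.simpleName}")
    // Convert each JSON object into the plain string the stored manifest expects.
    val names = list.map { entry ->
        val obj = entry as? Map<*, *>
            ?: throw IllegalArgumentException("feature entry must be an object: $entry")
        obj["name"] as? String
            ?: throw IllegalArgumentException("feature object lacks a string 'name': $obj")
    }
    return manifest.copy(features = names)
}
```

The explicit error messages name the offending value. A generic reflection failure would not identify it.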
This refactors the library so that Downloaders receive the IndexFile directly, giving them access not only to the IPFS CID but also to the SHA256 hash and the file size. Mirrors can now be marked as IPFS gateways.
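The following sketch shows what a downloader could do once it receives the whole `IndexFile`. It chooses a URL depending on whether the mirror is an IPFS gateway, then checks the downloaded bytes against the recorded size and SHA256 hash. The field names (`ipfsCidV1`, `isIpfsGateway`, etc.) and the `/ipfs/<cid>` path-gateway URL scheme are illustrative assumptions, not the library's confirmed API.

```kotlin
import java.security.MessageDigest

// Hypothetical shapes modeled on the description above; field names are assumptions.
data class IndexFile(val name: String, val sha256: String?, val size: Long?, val ipfsCidV1: String?)
data class Mirror(val baseUrl: String, val isIpfsGateway: Boolean = false)

/** Picks a URL for [file] on [mirror]: gateways are addressed by CID, plain mirrors by path. */
fun downloadUrl(mirror: Mirror, file: IndexFile): String? {
    val base = mirror.baseUrl.trimEnd('/')
    return if (mirror.isIpfsGateway) {
        file.ipfsCidV1?.let { "$base/ipfs/$it" } // a gateway is unusable without a CID
    } else {
        "$base/${file.name.trimStart('/')}"
    }
}

/** Checks downloaded bytes against the size and SHA-256 recorded in [file], when present. */
fun verify(file: IndexFile, bytes: ByteArray): Boolean {
    if (file.size != null && bytes.size.toLong() != file.size) return false
    val expected = file.sha256 ?: return true
    val actual = MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) } // lowercase hex encoding of the digest
    return actual.equals(expected, ignoreCase = true)
}
```

Because the hash and size travel with the file, the same check works no matter which mirror or gateway served the bytes.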
and remove the sharedTest symlink hack. The shared tests are now a proper Gradle module, which avoids issues with using the same source files in different modules.
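As a rough illustration, consuming shared tests from a dedicated Gradle module could look like this in the Kotlin DSL. The module path `:libs:sharedTest` is a hypothetical placeholder, and a real setup may also need the module in other test configurations.

```kotlin
// settings.gradle.kts: register the shared test code as its own module (hypothetical path)
include(":libs:sharedTest")

// build.gradle.kts of a module whose tests reuse the shared test code
dependencies {
    testImplementation(project(":libs:sharedTest"))
}
```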