Commit Graph

869 Commits

Author SHA1 Message Date
Emmanuel Engelhart
950e742116 No metalink file on fs 2021-05-04 13:15:43 +02:00
Veloman Yunkan
3879b82112 const-correct kiwix::Library
- Made most methods of kiwix::Library const.
- Also added const versions of getBookById() and getBookByPath()
  methods.
2021-04-28 11:42:55 +04:00
Veloman Yunkan
63e9a09259 Cleaned up/beautified Library::updateBookDB() 2021-04-27 16:59:21 +04:00
Veloman Yunkan
4178c169dd Xapian documents in book DB store only the book id 2021-04-27 16:59:21 +04:00
Veloman Yunkan
f751aff2fb Full case/diacritics insensitivity in catalog filtering
Catalog filtering should now be case/diacritics insensitive for all
fields. However it is not validated for language, name and category
fields, and is validated for tags, creator & publisher only for text
supplied in the filter (but not for values read from the book).
2021-04-27 16:59:21 +04:00
Veloman Yunkan
87dc9d2723 Made catalog filtering by query diacritics insensitive
Catalog filtering by titles/description was sensitive to diacritics
present in the query string. Fixed that.

Also enhanced the unit test to validate the insensitivity to diacritics
present in either the title/description or the query string.
2021-04-27 16:59:21 +04:00
Veloman Yunkan
9c7366890d Catalog filtering by tags works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan
19e195cb7d Filter::Tags typedef 2021-04-27 16:59:21 +04:00
Veloman Yunkan
3d5fd8f585 Catalog filtering by creator works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan
d3d5abe14d Handling of non-words in publisher query
This change fixes the failure of the LibraryTest.filterByPublisher
unit-test broken by the previous commit.

The previous approach used in `publisherQuery()` for building a phrase
query enforcing the specified prefix for all terms fails if

1. the input phrase contains a non-word term that Xapian's query parser
   doesn't like (e.g. a standalone ampersand character, 1/2, a#1, etc);
2. the input phrase contains at least three terms that Xapian's query
   parser has no issue with.

Using the `quest` tool (coming with xapian-tools under Ubuntu) the
issue can be demonstrated as follows:

```
$ quest -o phrase -d some_xapian_db "Energy & security"
Parsed Query: Query((energy@1 PHRASE 11 Zsecur@2))
Exactly 0 matches
MSet:

$ quest -o phrase -d some_xapian_db "Energy & security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries

$ quest -o phrase -d some_xapian_db 'Energy 1/2 security act'
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries

$ quest -o phrase -d some_xapian_db "Energy a#1 security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
```

The problem comes from parsing the query with the default operation set
to `OP_PHRASE` (exemplified by the `-o phrase` option in above
invocations of `quest`). A workaround is to parse the phrase with a
default operation of `OP_OR` and then combine all the terms with
`OP_PHRASE`.

Besides stemming should be disabled in order to target an exact phrase
match (save for the non-word terms, if any, that are ignored by the
query parser).
2021-04-27 16:59:21 +04:00
Veloman Yunkan
a759ab989f Catalog filtering by publisher works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan
7ccd9ffcce Catalog filtering by language works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan
0c0a37073b Catalog filtering by category works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan
415c65cf03 Catalog filtering by book name works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan
8287f351e7 Final logic of Library::filterViaBookDB()
Moved the `filter.hasQuery()` check inside `buildXapianQuery()`.
`Library::filterViaBookDB()` only cares if the query that is going to be
run on the book DB would match all documents. The rest of changes
related to enhancing the usage of Xapian for the catalog search will
happen inside `buildXapianQuery()` and `updateBookDB()`.
2021-04-27 16:59:21 +04:00
Veloman Yunkan
ea779ac200 Extracted buildXapianQuery() 2021-04-27 16:59:21 +04:00
Veloman Yunkan
80cd1fc989 Renamed 2 functions in Filter and Library 2021-04-27 16:59:21 +04:00
Veloman Yunkan
2d76f8395e Dropped unused functions from Filter's private API
This should have been done back in PR #460
2021-04-27 16:59:21 +04:00
Manan Jethwani
965b9622c2 removed redirect to articles in search 2021-04-20 20:23:42 +05:30
Veloman Yunkan
9d4370403b get_url() was renamed in zim::search_iterator 2021-04-16 13:30:36 +04:00
Vertigo
611146aa37 Added Search Link for bad bookName/articleName on 404 2021-04-12 21:31:47 +05:30
Veloman Yunkan
b54215f146 Manager::readOpds() doesn't modify its input 2021-04-12 15:14:12 +02:00
Veloman Yunkan
9033f2f28e Manager::readXml() doesn't modify its input 2021-04-12 15:14:12 +02:00
Veloman Yunkan
ec9186b174 Library::removeBookById() updates the search DB
This fix makes the `XmlLibraryTest.removeBookByIdUpdatesTheSearchDB`
unit-test pass.
2021-04-09 17:06:45 +04:00
Veloman Yunkan
aaaa5a637e Library::filter() doesn't create empty books
This changes how the `XmlLibraryTest.removeBookByIdUpdatesTheSearchDB`
unit-test fails.
2021-04-09 17:06:45 +04:00
Veloman Yunkan
24ed96a38c Library.removeBookById() drops the reader too
This fix makes the `XmlLibraryTest.removeBookByIdDropsTheReader`
unit-test pass.
2021-04-09 17:05:56 +04:00
Manan Jethwani
5cb276a933 adding kind and path attributes to suggest response object and using it in autocomplete 2021-04-07 21:04:33 +05:30
Veloman Yunkan
aa2a031ba4 Xapian headers are not exposed through libkiwix 2021-04-07 18:24:33 +04:00
Manan Jethwani
7872734f44 changed method of injecting root link 2021-03-24 14:17:58 +05:30
Manan Jethwani
c557bb271b injecting root link directly and renamed head_part to head_taskbar 2021-03-24 02:10:16 +05:30
Manan Jethwani
93264f7409 added root functionality for block external link feature 2021-03-23 03:17:14 +05:30
Veloman Yunkan
e214efecd4 Language code conversion via ICU
Language code is converted from ISO 639-3 to ISO 639 (which is
understood by Xapian) via ICU. The previous approach via an explicit
map had its advantages since Xapian has more than one stemmer
implementations for some languages (selectable via Xapian-specific
identifiers). This commit relies on the defaults associated with the
ISO 639 language codes.
2021-03-17 14:32:03 +01:00
Veloman Yunkan
09233bf4f3 Support for partial queries in catalog search
The search text in the catalog query is interpreted as partial by
default, but partial query mode can be disabled in C++. The latter
possibility is not exposed via the /catalog/search kiwix-serve endpoint,
though.
2021-03-17 14:32:03 +01:00
Veloman Yunkan
a599fb3892 Initial version of Xapian-based catalog search 2021-03-17 14:32:03 +01:00
Veloman Yunkan
a17fc0ef2d Library::getBooksByTitleOrDescription() 2021-03-17 14:32:03 +01:00
Veloman Yunkan
db06b2c7ca Library::BookIdCollection typedef 2021-03-17 14:32:03 +01:00
Veloman Yunkan
a20f9e2ce1 Library::filter() works in two stages
1. Get the subset of books matching the q (title/description) parameter
   of the search

2. Filter out books not matching the other parameters of the search.

Stage 1. currently works in the old way, but will be replaced by Xapian
based search in subsequent commits.
2021-03-17 14:32:03 +01:00
Veloman Yunkan
b7b0bdbdd8 Both Book::update() methods update the category 2021-03-17 14:10:57 +04:00
Veloman Yunkan
4abc4f8518 Support for book category attribute in library.xml 2021-03-17 14:10:57 +04:00
Veloman Yunkan
6b2067c236 Reading category element from OPDS stream 2021-03-17 14:10:57 +04:00
Veloman Yunkan
e55bf514e8 Dedicated 'category' parameter in catalog search 2021-03-17 14:10:57 +04:00
Veloman Yunkan
80d4f7e349 Extracted InternalServer::search_catalog() 2021-03-17 14:10:57 +04:00
Veloman Yunkan
58186ffb26 kiwix::Book::getCategory() 2021-03-17 14:09:48 +04:00
Veloman Yunkan
ae32ff40c0 Dropped an extra colon from book <updated> dates 2021-03-17 14:02:27 +04:00
Veloman Yunkan
26331b401e Fixed the month in OPDS feed <updated> date
`tm::tm_mon` varies in the [0, 11] range.
2021-03-17 14:02:27 +04:00
Matthieu Gautier
67caae6c32 Use the new libzim's getRandomEntry instead of implementing it ourselves. 2021-03-02 14:16:09 +01:00
Veloman Yunkan
839fc10a4f Fixed the Windows build
Opening ZIM archives by file descriptor (as well as embedded
ZIM archives) is not supported under Windows.
2021-02-10 14:19:47 +01:00
Veloman Yunkan
5a8b825c70 Testing of JNIKiwixReader.getDirectAccessInformation() 2021-02-10 14:19:47 +01:00
Veloman Yunkan
7a465e66d7 Renamed org.kiwix.kiwixlib.{Pair->DirectAccessInfo} 2021-02-10 14:19:47 +01:00
Veloman Yunkan
5a99634dfd Java wrapper test checks favicon.png too 2021-02-10 14:19:47 +01:00