This is the initial version of using nuspell for spelling correction,
which yet has to be tuned.
Note that libnuspell must be available as a dependency.
Xapian-based code for spelling correction is not deleted.
Under Windows and Packages CI workflows the previous approach to testing
that an existing spellings DB file is reused didn't work since it relied
on an auxiliary test ensuring that a spellings database cannot be
created in a read-only directory, whereas
1. under Windows a temporary directory couldn't be made read-only
(leading to the failure of the auxiliary test)
2. in the Packages workflow the build was run with root privileges
and the read-onliness of the target directory was ignored
(leading to the same failure).
So the test was rewritten to actually check the content of the target
directory as well as the modifications times of the target directory and
the database file.
Packages workflow jobs run under Ubuntu 22.04 Jammy and 24.04 Noble with
different versions of libxapian.so. So the spelling correction unit test
must adapt accordingly.
The path parameter of the SpellingsDB constructor has been changed to
denote the path of the cache directory where spellings databases for
different ZIM archive should be stored. The filename of the spellings
database is generated from the ZIM archive UUID and the current version
of the spellings database implementation.
The ZIM file test/data/spelling_correction_test.zim was generated using
the script test/data/create_zim_file_for_testing_spelling_correction
included in this commit.
Renamed contentServerUrl in LibraryDumper to contentAccessUrl to avoid
confusion with contentServerUrl concept at Server/InternalServer level
(roughly, contentAccessUrl = contentServerUrl + "/content").
Running kiwix-serve without --catalogOnly and
--urlRootLocation resulted in non-clickable book tiles since
empty root was confused with empty contentServerURL which controlled
whether book preview links should be activated.
This commit fixes that bug and adds respective unit-tests.
Now the book preview link is derived from the content URL link found in
the OPDS entry for a book. If no content URL is present in the OPDS
entry for a book, then preview link is disabled.
The dependence of the welcome page on viewer_settings.js was added
in commit cc6aa9b162 as a hack which
stayed there after the need for it was removed by PR#1044.
The test data was modified so that support for catalog only mode
of kiwix-serve can be properly tested.
The effect of this change in the test data on the library_server unit
test demonstrates that although the new entry does not appear in the
catalog (for example, no catalog_v2_entries* test cases were affected)
the category and language of this ghost entry slipped into the
observable output.
The only way to detect change of the iframe location performed via
`History.pushState()` or `History.replaceState()` is to constantly
monitor it, since those methods don't trigger any events.
Clicking intrapage links (of the form <a href="#anchor">) inside the
viewer iframe is detected by the viewer and reflected in the URL in the
address bar.
The solution only works if following the link is performed by the
browser as a default action. It doesn't work if the changed URL in the
address bar after clicking a link is a result of `History.pushState()`
or `History.replaceState()` being called by javascript code installed as
an event handled on the link (which is the case in single page
applications).
Viewer now rewrites internal links so that opening them in a new
tab/window keeps the viewer around. Thus the viewer acts as a chaperon
for the users preventing them from finding themselves out of the
viewer's supervision. Of course there are ways to circumvent such
oversight, however it has always been the case with chaperons in all
cultures in all epochs.
Additional error info (text of the exception thrown by low level C++
code) is shown inside a text box of the same style as used for the
advice on the 404 error page (we either need to change the name of the
CSS style, or introduce a separate style for this piece of information).
The external link blocker page isn't actually translated since it is not
managed by the viewer. Will port the translation code from the viewer.js
in next commit.
The failing test point in the ServerTest.Http404HtmlError unit-test
has been superseded by the enhanced ServerTest.HttpSexy404HtmlError
unit-test, resulting in a clean test-suite.
Translation of the multi-line/multi-paragraph advice is done under the
assumption that its structure (5 paragraphs, two of which serve as
entries in a bulleted list) can be preserved in the translations by
proper phrasing, i.e. the advice must be translated as a whole rather
than each of its sentences (which act as units of translation) separately.
- Enabled translation on the new 404 error page
- Translated its title & heading
This commit also fixes the failure of the ServerTest.UserLanguageControl
unit-test.
The only remaining failing test case is ServerTest.Http404HtmlError
(only for url=/ROOT%23%3F/content/zimfile/invalid-article?userlang=test).
Converted a few failing test points of the ServerTest.Http404HtmlError
unit-test into a new unit-test ServerTest.HttpSexy404HtmlError.
Some broken test cases still remain.
The page doesn't support translation, yet.
The new 404 error page is used only when accessing ZIM file content
(i.e. as a response from the `/content` API endpoint).
One notable difference from the previous error page is that now no hint
is provided about whether the error is due to trying to access a
non-existent book/ZIM-file or non-existent resource inside a valid
book/ZIM-file (previously such a hint was present in the suggested
search URL). However, when displayed in the viewer this difference can
be seen in the viewer toolbar - book related buttons are hidden if the
URL points to a non-existent book.
This change breaks some unit tests. They will be fixed in a separate
commit.
In our CI a quite old version of gcc (6.3.0) is used for the aarch64
configs and it was confused by the (previous) code of the test program
intended to find out if libatomics must be explicitly passed to the
linker.
- Implemented a spinner to improve user experience while ZIM content is loading in the viewer iframe.
- Added .loader and .spinner styles in kiwix.css.
- The iframe content is initially hidden (visibility: hidden) and will be displayed once loading completes.
- Used CSS animations (@keyframes spin) for a smooth rotating effect.
- Refactored language code handling to replace multiple comma-separated values with 'mul'.
- Single-language entries remain unchanged.
- created the helper function to get the lang tag
- ensured the consistent behaviour for js and nojs version for the kiwix library view
Chrome on Android doesn't support displaying PDF documents inline so an
attempt to load a PDF into the Kiwix viewer iframe fails in a way that
may be confusing to the users. In such situations it is better to offer
to the users to download the PDF file so that they can view it with a
dedicated application. Making the clicked PDF link to open in a new
tab/window achieves exactly that effect.
External links with target="_blank" attribute are opened in a new tab/window.
Also included metaKey (Command key under Mac) in the list of click
modifiers that make the link to be opened in a new tab.
Note however that no upper limit is set on the width of the
autocompleter box - it is possible, but I don't see how we could
come up with a good value for it.
Added to test book descriptions words that serve as keywords
for query syntax with boolean operators (or, and, not, xor, near, adj) enabled.
Note that the change in indexed text has lead to the change in the order
of returned results.
Below are the contents of libkiwix.pc and kiwix.pc files generated using
the new and old approaches, respectively, for native_dyn and
native_static configurations:
```
$ cat BUILD_native_dyn/INSTALL/lib/x86_64-linux-gnu/pkgconfig/libkiwix.pc
prefix=<REDACTED>
includedir=${prefix}/include
libdir=${prefix}/lib/x86_64-linux-gnu
Name: libkiwix
Description: A library that contains useful primitives that Kiwix readers have in common
Version: 14.0.0
Requires.private: icu-i18n, libzim < 10.0.0, libzim >= 9.0.0, pugixml, libcurl, libmicrohttpd, zlib, xapian-core
Libs: -L${prefix}/lib/x86_64-linux-gnu -lkiwix
Libs.private: -pthread
Cflags: -I${includedir}
$ cat BUILD_native_dyn/INSTALL/lib/x86_64-linux-gnu/pkgconfig/kiwix.pc
prefix=<REDACTED>
libdir=${prefix}/lib64
includedir=${prefix}/include
Name: libkiwix
Description: A library that contains a lot of things used by used by other kiwix programs
Version: 14.0.0
Requires: libzim icu-i18n pugixml libcurl libmicrohttpd xapian-core
Libs: -L${libdir} -lkiwix
Cflags: -I${includedir}/
$ cat BUILD_native_static/INSTALL/lib/x86_64-linux-gnu/pkgconfig/libkiwix.pc
prefix=<REDACTED>
includedir=${prefix}/include
libdir=${prefix}/lib/x86_64-linux-gnu
Name: libkiwix
Description: A library that contains useful primitives that Kiwix readers have in common
Version: 14.0.0
Requires: icu-i18n, libzim < 10.0.0, libzim >= 9.0.0, pugixml, libcurl, libmicrohttpd, zlib, xapian-core
Libs: -L${prefix}/lib/x86_64-linux-gnu -lkiwix -pthread
Cflags: -I${includedir} -pthread
$ cat BUILD_native_static/INSTALL/lib/x86_64-linux-gnu/pkgconfig/kiwix.pc
prefix=<REDACTED>
libdir=${prefix}/lib64
includedir=${prefix}/include
Name: libkiwix
Description: A library that contains a lot of things used by used by other kiwix programs
Version: 14.0.0
Requires: libzim icu-i18n pugixml libcurl libmicrohttpd xapian-core
Libs: -L${libdir} -lkiwix
Cflags: -I${includedir}/
```
The notable differences are:
- libdir changed from `${prefix}/lib64` to `${prefix}/lib/x86_64-linux-gnu`
- for native_dyn configuration Requires.private is used
- pthread has appeared in Libs/Libs.private and/or Cflags
- version information was added to the libzim requirement
Added a test that checks that the static resources returned by the
server have content that matches their cacheid. This test currently
fails because some binary resources (e.g. png images) are garbled by the
dos2unix conversion.
- Changed the position of download button to the end of tile and
added a proper download icon to it. When the button is hovered it
becomes darker.
- Also internationalized the "Download" text on the modal download widget
and added download size information to it.
Zimit2 ZIMs employ Wombat for client-side rewriting of URLs. The latter
interferes with our approach of handling in the viewer CTRL/SHIFT-clicks
on links inside articles. This commit disables Wombat temporarily while
changing the href attribute of the clicked link.
Links that should be handled/opened by external applications - such as email
addresses (mailto:), phone numbers (tel:), etc - are opened by the
viewer in a new tab/window, thus avoiding any issues with content
security policy.
ICU package contains a special code "mul" with a self-name of "multiple
languages". libkiwix now suppresses that. As a result the self-name of
"mul" (like for any other unknown language code) is the code itself
(i.e. "mul").
The most prominent user-visible effect of this change is that the
language filter in the library page no longer contains a "Multiple
languages" entry if there is a legacy ZIM file with the language set to
"mul" - that entry now shows up as "Mul".
- Tags in the OPDS feed are HTML encoded and must be decoded.
- Tag values must be HTML encoded when injected into the DOM:
* When the injection is done by setting the innerHTML attribute of a
DOM element, HTML encoding must be done explicitly (since that text
is going to be parsed as HTML).
* When the tag value is expanded into a string that is then set as an
attribute of a DOM element via the setAttribute() method, no HTML
encoding must be done (since Element.setAttribute() directly sets
that value and no HTML decoding is involved in that operation).
`value.end() - 3` may be "before start of string" if string length is < 3.
On Windows debug, is throw an exception.
On other platform it is probably an undefined behavior.
Rewrite the test to avoid such invalid substraction.
Downloader constructor may get stuck if the check for the aria2c RPC
being up gets stuck due to curl_easy_perform() never returning (or, at
least, taking longer than I was willing to wait). Currently it may
happen, for example, after an application crashes with active downloads
being saved to slow media. Then the next creation of a Downloader object
will deal with aria2c immediately resuming those downloads and becoming
unresponsive as it struggles flushing incoming data to slow storage (or
because of some other unfortunate timing of the RPC request being
received while it cannot yet be served).
Otherwise, creating a Downloader object next time may take very long (or
that operation may get stuck) if an active download is being saved to slow
media.
Switched display of ZIM sizes from decimal (MB) to binary (MiB) units.
Also made the size value to be formatted in a more human friendly way
(fractional part is shown only to provide at least three significant
digits).
- Restored kiwix::getNetworkInterfaces() API to the version before
support for IPv6 was introduced
- Renamed the new API method to kiwix::getNetworkInterfacesIPv4Or6()
This was mainly done to prevent further duplication of test data as more
test points around the same query are added next but is also useful on
its own.
Also this change leads to the change in the mapping (since conflicts
that previously went undetected and just overwrote the existing entry
are now rejected).
The extended setup of the NameMapper unit test demonstrates (by the fact
that this change doesn't break the tests that check the stderr) that
certain naming conflicts escape NameMapper's attention.
Two entries in test library.xml used to point to the same file path.
Note that though the third entry pointed to a different file name
it is a symbolic link to the same file.
Now all three entries point to pseudo-different files (having the
same content behind them).
This change is needed so that tests don't break when detection of
conflicting book names is made stricter.
`MigrationMode` was kind of defined in the context of an internal mode
used by `migrateBookmark(...)`.
But now, with `getBestTargetBookId`, it is broken.
This commit fix that and the associated implementation.
Now `UPGRADE_ONLY` will make `getBestTargetBookId` return only newer books.
and `ALLOW_DOWNGRADE` will return older books only if current book is
invalid.
At indexation time, double quote are ignored, so a title as
`TED "talks" - Business` is indexed as `ted talks business`.
By removing the quotes, we ensure that our title "phrase" is not closed
too early and we correctly search for `ted PHRASE talks PHRASE business`
instead of `ted AND talks AND business`.
On top of modifying the existing test, the commit also make
`MigrateBookmark` test fails as `migrateBookmarks` now migrates
from `wrong-book-id-noname` to `Dummy id`.
Fix will be provided in next commit.
Now the search results page is presented by the backend in the language
controlled by the value of the `userlang` URL query parameter (or, if
the latter is missing, the value of the `Accept-Language:` HTTP header).
Note that the front-end doesn't yet take advantage of this
functionality.
Note that static/skin/languages.js must be generated/updated manually
by running the static/generate_i18n_resources_list.py script. Previously
it had to be done only when new languages were added. Now the
translation counts will also need to be updated when new entries are
added to static/skin/i18n/en.json or upon merging a few translatewiki PRs.
... so that extra info about the count of translated strings can be
added.
Note that due to increased size skin/languages.js lost its
too-small-to-be-worth-compressing status.
Added a test case demonstrating how a bad error response could be
generated if </script> appears inside KIWIX_RESPONSE_DATA. That seems to
be the only problematic interaction between HTML-like syntax inside
javascript code (hence the deleted XXX comments on the other two test
cases).
But the question is do we need all of them to be translatable in the
frontend? Maybe only responses to /random, /content and /search endpoints (that
are displayed in the viewer) should be translatable?
Also, the test cases against vulnerabilities in kiwix-serve seem to suggest
that KIWIX_RESPONSE_DATA should be HTML-encoded too.
This commit demonstrates front-end-side translation of an error page
for a URL like /viewer#INVALIDBOOK/whatever (where INVALIDBOOK should
be a book name NOT present in the library).
Known issues:
- This change breaks a couple of subtests in the
ServerTest.Http404HtmlError unit test.
- Changing the UI language while an error page is displayed in the
viewer doesn't retranslate it.
Non-HTML-encoded HTML-template data causes problems in HTML
even when it appears inside JS string (resulting in the <script> tag being
closed by a </script> appearing inside a JS string).
Besides, the KIWIX_RESPONSE_DATA and KIWIX_RESPONSE_TEMPLATE variables
are set on the window object so that they can be accessed from the top
context.
This commit eliminates the need for the `escapeQuote` parameter in
`escapeForJSON()` (that was introduced earlier in this PR) since now it
is set to false in all call contexts. However from the consistency point
of view, the default and intuitive behaviour of `escapeForJSON()` should
be to escape the quote symbols, which justifies the existence of that
parameter.
This is a shortcut change since it doesn't make sense to send the error
page template with every error response (the viewer can fetch it from
the server once but that's slightly more work).
- More familiar escape sequences for tab, newline and carriage return
symbols.
- Quote symbol is escaped by default too, however that behaviour can
be disabled for uses in HTML-related contexts where quotes should then
be replaced with the character entity "
Now when parameterized messages are added to an error response, they are
not immediately instantiated (translated). Instead the message id and
the parameters of the message are recorded. The instantiation of the
messages happens right before generating the final content of the
response.
ContentResponseBlueprint::m_data is now an opaque data member
implemented in the .cpp and ready to be switched from
kainjow::mustache::data to a different implementation.
Next to come warc2zim archive will come with "wombat" embedded.
The purpose of wombat is to be an interface with js code to mask that
we are in a scrapped/zim context to the js.
So it rewrite the `.href` attributes to the original url (ie, an
absolute url to the original website), even if the local relative url
is valid.
Let's ask to wombat to not rewrite href in our special case.
It turned out that ContentResponse::m_root is no longer used.
At this point, the root parameter is dropped only from the 3-ary variant
of ContentResponse::build(), so that its all call sites are
automatically discovered by the compiler (and updated manually).
Including the other (4-ary) variant of ContentResponse::build() in this
change might result in the semantic change of expressions like
`ContentResponse::build(x, y, z)` and failure to update them.
On media (screens) narrower than 420 pixels, the toolbar buttons
are hidden. Before this change, when made visible they were laid out
in two rows. This change places them in a single row and provides
some vertical spacing from the search-box.
Attempts to use the same color for buttons yielded poor results: viewer
toolbar buttons don't look nice on the dark background used for the
filter controls on the library page, whereas the light background of the
viewer toolbar buttons doesn't play well with the filters on the library
page which seem to be designed around the contrast effect.
There was a slight difference (between index.css and taskbar.css) in the
margin values of the UI language selector button, however the values
taken from taskbar.css don't seem to have any visible impact on the
welcome/library page (controlled by index.css).
Moved from index.css into kiwix.css some CSS with global effect thus
making it apply to the viewer too.
Extra font-size directives in taskbar.css are needed to undo the effect
of 'font-size: 62.5%' now applied to the 'html' element type.
The new file kiwix.css is intended to host the intersection of index.css
and taskbar.css. In this commit only font definitions have been moved
into it.
Missing parameters are added to the magnet link for it to work properly
given it is mainly based on webseeds.
Each mirror serving the file is added as a webseed.
Params that requires URIEncoding are now encoded.
Introduces `makeURLSearchString()` to turn SearchParams into a string but only
URIEncoding some specific params.
By moving the nameMapper/library arguments in `getHtml`/`getXml` we avoid
any potential "use after free" of name mapper or library as they are not
stored.
As we now always use Library through a shared_ptr, we can make `Library`
not copiable and not movable.
So we can move back the data in `Library`, we don't care about move
constructor.
The why of this mutex is in `Library` is a bit complex.
It has been introduced in c2927ce when there was only `Library` and no
`std::unique_ptr<Impl>`.
As introducing the mutex imply implementing the move constructor, we have
split all data in `LibraryBase` (and keep a default move constructor here)
and add the mutex in `Library` (and implement a simple move constructor).
Later, in 090c2fd, we have move the `LibraryBase` to `Library::Impl`
(which should have been `Library::Data`).
So at the end, `Library::Impl` is never moved. We can move the `mutex` in
it and still simply implement move constructor for `Library`.
We want to be sure that `Library` actually exists when we use it.
While it is not a silver bullet (user can still create a shared_ptr on
a raw pointer), making the `Server` keep `shared_ptr` on the library
help us a lot here.
We want to be sure that `Library` actually exists when we modify it.
While it is not a silver bullet (user can still create a shared_ptr on
a raw pointer), making the `Manager` keep `shared_ptr` on the library
help us a lot here.
This fix contains a small hack - in order to detect the default language
from browser language preference during the first visit, the library
page has to load /viewer_settings.js which contains that information.
This commit drops the usage of the userlang cookie in the backend but
not in the frontend. UI language control should be broken at this point
and will be fixed in the next few commits.
After upgrading my OS to Ubuntu 22.04 the language selector button
didn't show up in the viewer taskbar. Investigation shows that the id
used in the CSS was applied to the wrong HTML element (the enclosing
<a> rather than <img>).
Created a generic function multipleQuery which takes:
1. string (representing a comma separated list)
2. param (the value to query on using Xapian).
Category and language query will use this function.
Before this fix suggestion links were built out of fully URI-encoded
book name and article path components despite the fact that this measure
was taken against only a few dangerous symbols such as '#', '?', '"' and
'\'. However, URI-encoding the slash symbols in the path has some
undesirable side-effects (see #958).
Henceforth only the problematic symbols are encoded in the article path
component. The book name is still fully URI-encoded since I don't see
any counter-arguments.
meson and ninja both depends on python3 which received an update.
This python3 update fails to install when linking.
This temp fix removes those files. Hopefully a future update will remove the need
for this hack
Now (in a library.xml flow) a book is considered to contain an
illustration only if both "faviconMimeType" and "favicon" attributes
are set to non-empty values.
Presence of the "faviconMimeType" attribute in a book entry in library.xml
file is enough for libkiwix to assume that the book contains an illustration
(even if the "favicon" attribute is missing).
This is a quickfix for the problem observed with external link blocking
during certain history navigation actions (when the cached iframe content is
loaded/restored before the viewer setup is completed).
Since external link blocking doesn't depend on the translations (that
are asynchronously loaded during the viewer setup) it can be performed
unconditionally. However, the current dependence of `on_content_load()`
on viewer setup has to be addressed too.
Before this fix clicking an external link in the viewer iframe had no
effect (other than an error being reported in the browser dev tools
console) because the attempt to navigate the top browser context was
suppressed due to sandboxing - the click handling code changed the
target of the link but navigating to that target was blocked. Now the
click handler works as follows:
1. Changes the target of the link to the catch page only if the
link is going to be opened in a new tab or window (in this case
sandboxing restrictions do not apply).
2. Otherwise directly navigates the viewer window to external URL
or the catch page.
An unhandled scenario is opening an external link in a new tab/window
via a middle click or context menu - such events cannot be intercepted
and therefore there is no way of blocking external links accessed in
the said way.
In firefox, when PDF content is displayed in the viewer, changing the
viewer URL in the address bar had no effect. The most prominent
manifestation of this bug was the broken book home button but the same
issue was present even if the fragment component of the viewer URL was
edited manually. The bug was a result of
1. an optimization preventing any actions if the new content URL is the
same as the old content URL (this was needed to break the infinite loop
of mutual updates of the top-window and content window/iframe URLs when
any one of them changes).
2. sandboxing of the iframe and inability to access the content URL in
iframe because of cross-origin restrictions when the content is a PDF
displayed by the builtin viewer.
Now that issue is fixed. A slight remaining defect is that the
addressbar URL is still not updated when a PDF file is loaded/displayed
in the viewer.
The dependency on libzim must be specified in compiler.has_header_symbol() for
it to find the header in all cases.
Fixes a configure error on FreeBSD:
"""
Header "zim/zim.h" has symbol "LIBZIM_WITH_XAPIAN" : NO
meson.build:33:2: ERROR: Problem encountered: Libzim seems to be compiled without xapian. Xapian support is mandatory.
"""
According to POSIX, sockaddr_in is declared in netinet/in.h.
Some POSIX systems (notably OpenBSD and FreeBSD) declare it in
only this header, so including it is required. Others, like Linux,
are are more lax in exposing symbols to the namespace, providing
sockaddr_in via additional headers, but it does no harm to include
the standard header on such systems.
meson and ninja both depends on python3 which received an update.
This python3 update fails to install when linking.
This temp fix removes those files. Hopefully a future update will remove the need
for this hack
The previous "fix" (merged PR #924) was buggy. It not only didn't work
in Chromium v90, but in more recent versions too.
I verified that this fix works in Firefox (v111) and Chromium (v90):
- Attempts by the ZIM content to break out of the viewer iframe are
blocked.
- PDFs are displayed in the viewer.
Added a <noscript> elements which hides everything on the welcome page if javascript is not enabled.
It displays a text to tell the user to navigate to /nojs endpoint
Added 4 tests for /nojs endpoint
Test 1: no_js_default - Without any filters
Test 2: no_js_eng_lang - With lang=eng as filter
Test 3: no_js_no_books - For 0 results case
Test 4: no_js_download - To test download page
Added cursor type and hints to the UI language selection button. The
hints are always in English since seeing a hint in an unfamiliar language
doesn't help and English is the current lingua franca.
This prevents scripts running inside an iframe from inadvertently
manipulating the top browsing context. However a malicious script could
still remove the sandboxing imposed on it (because the combination of
"allow-same-origin" and "allow-scripts" is vulnerable).
Since kiwixNav is sticky for larger screens now, the tiles area on mobile devices is incredibly low.
This change hides kiwixNav if the screen is scrolled.
If a translation JSON file doesn't contain the 'name' (self-name)
attribute of the translation language then that language is not included
in the list of languages available in the UI language selector.
The language selector on the welcome page has been replaced with
a smaller button that opens a modal language selector. Though the
code for introducing such a modal language selector has been added
in i18n.js, its appearance relies on styles defined in index.css.
Once this new UI for changing the UI language is approved, it must be
used in the ZIM viewer too.
Known issues:
- selecting the language with arrow keys (using the keyboard only,
without pressing space first, so that the full list of languages is
shown) doesn't work because as soon as the current language is changed
the modal language selector disappears.
If the userlang query param is present in the URL it is used to set the
UI language and then is removed from the URL.
Unlike the ZIM viewer, changing the UI language on the welcome page
isn't recorded in the navigation history (and probably it should work
the same way in the ZIM viewer where the appearance of the web page is
affected by the UI language changes to a significantly smaller extent).
With non-empty root location, the canonic form of the root URL for a
kiwix server is now required to end with a slash (to match the situation
for an empty root location). This requirement enables usage of relative
URLs on the welcome page and resources/scripts loaded through that page.
A slashless root URL is redirected to the slashful version.
This translation has to deal with handling of plural forms which is a
tricky part of internationalization, but we are not going to complicate
things in our code and will offload the headache to translators (they
will have to invent a single message for all numbers).
This change adds a <link> element in the head node of welcome page.
Browsers with extensions for RSS will show a sign to navigate to the feed.
The link changes based on current set filters.
Now the root location is URI-encoded too.
In order to properly test this change the root location in the tests was
changed from "/ROOT" to "/ROOT#?" (or "/ROOT%23%3F" in URI-encoded form),
which is why this commit is so big.
This change doesn't make much sense on its own - the real goal is to
prepare some ground for easier implementation of URI-encoding of the root
location.
Testing of this functionality revealed that the query part containing +
symbols (as replacement for spaces in the parameter values) isn't
forwarded properly as the + symbols are URI-encoded (this is a bug on
the part of the `RequestContext::get_query()` the result of which
already contains URI-encoded +'s).
- Before this change `InternalServer::build_redirect()` only URI-encoded the
article path, ignoring the book name and/or the root location components of
the URL.
- In order to be able to test this fix, corner_cases.zim was renamed to
contain a couple of special URL symbols in its filename. The
`create_corner_cases_zim_file` script was updated accordingly.
`false` is a pretty bad default value as most user want to track
the real download.
By removing the default value, we force user to make a choice.
We could have change the default value to true but it would have been
a silent API change and we don't want that.
User may already have a pointer to the `Download` and it is not protected
against concurrent access.
We could update the status of new created `Download` as by definition,
no one have a pointer on it.
But it better to not do it neither :
- For consistency
- Because the first call on update status may be long on windows (because
of file preallocation). It is better to not block the downloader for that.
The recently introduced ZIM viewer UI language selector looked
adequately nice under Firefox without any explicit styling applied.
Under SeaMonkey, however, its default look and feel was intolerable, so
I used this opportunity to make the UI language selector comply with the
current fashion of the ZIM viewer toolbar.
SeaMonkey doesn't yet support [Window.visualViewport][1]. As a result the
height of the content iframe element was initialized to the default 150
pixels and never changed. Fortunately there is [Window.innerHeight][2]
which is supported from the very first days of the Gecko layout engine.
The difference between `Window.visualViewport.height` and
`Window.innerHeight` is that the latter also includes
- the height of the horizontal scroll bar, if present (but in a correctly
implemented ZIM viewer there shouldn't be a horizontal scroll bar for the
full web-page, so it's OK)
- the height of the on-screen keyboard (which is mostly used on mobile
devices where SeaMonkey doesn't run). And it is also arguable if the
appearing on-screen keyboard should squeeze the iframe or slide over
it (in which latter case it may make more sense to always use `innerHeight`
instead of `visualViewport.height`).
[1]: https://developer.mozilla.org/en-US/docs/Web/API/Window/visualViewport
[2]: https://developer.mozilla.org/en-US/docs/Web/API/Window/innerHeight
Now that we have proper UI for user language selection, we don't need
the `?userlang=` query parameter present in the URL. If `?userlang=` is
explicitly provided in the URL, it sets the requested language and
disappears.
Known issues
- styling / placement
- language changes via the selector UI are not recorded in the
navigation history
- changing the language via the UI doesn't update the `?userlang=` URL
query parameter
ZIM viewer is now internally internationalized but the UI language
can only be set by providing the `userlang` query parameter in the URL:
Example:
/viewer?userlang=fr#wikipedia_en_climate_change_mini_2021-03/A/index
^^^^^^^^^^^^
Serving the language list as a JS file rather than JSON simplifies
a few things:
- cacheid management;
- having to manually delay the UI initialization until the JSON file
is loaded.
static/skin/languages.js must be generated/updated manually by running
the static/generate_i18n_resources_list.py script.
Before this change, some of the actions related to the initialization of
the viewer were run in the global scope as a side effect of loading
/skin/viewer.js. This change moves those actions into setupViewer().
This is a quick workaround (at the expense of data duplication) for
having to generate the i18n data in JSON format from the embedded i18n
resource data.
Note, however, that at this point i18n resources are not included in
the list of regular static resources. This will change in the next
commit.
Special URI symbols occurring in the item path part of the search result
link were NOT encoded, because that would also encode the path separator (/)
symbol. Now that `urlEncode()` never encodes the / symbol, it is safe to
encode all other URI-special symbols in the path.
This change is a quick hack solving known issues with URI-encoding in
libkiwix.
This change removes the slash character from the list of URL separator
symbols in URL encoding/decoding utilities, and makes it a symbol that
is safe to leave unencoded.
Effects:
- `urlEncode()` never encodes the '/' symbol (even when it is requested
to encode the URL separator symbols too).
- `urlDecode(str)`/`urlDecode(..., false)` will now decode %2F to '/';
other encoded URL separator symbols are NOT decoded when the second
argument of `urlDecode()` is set to false (which is the default).
Without specifying the "Path" attribute of the cookie in the "Set-Cookie" header
we end up with multiple instances of the cookie for different URLs. We
want a single "global" cookie for kiwix-serve. Besides we want it to be
"permanent" rather than a session cookie, hence the large (1-year-long)
TTL value for the "Max-Age" attribute.
- Description of a test point was not updated in an earlier commit
that added proper handling of the Accept-Language header. Also
after enhancing the limited implementation it made sense to
add another test point demonstrating that the most suitable language
(rather than just the first one in the list) is selected.
- Now failures of the test case because of a missing Set-Cookie header
are more informative.
The question mark (?) is not a valid filename character on Windows.
Changing to a the pound sign (#) so that this repository can still be
cloned on Windows.
Specifying the = symbol with single-character options makes that
character included in the option value (e.g. -l=en results in the
language of the ZIM file being set to =en).
Directly pointing the suggestion link to a /content/... URL avoids
an unnecessary redirection by the server (and an associated bug
related to redirection of URLs with URI-encoded special symbols in
them that - in the current implementation - go into the target URL
in decoded form).
This change fixes two issues:
1. Presence of URL-specific special symbols (such as ? or #) in the book
and/or article name resulted in a wrong suggestion link. This is
fixed by URI-encoding the book name and the path, too.
2. Presence of a single quote symbol in the book and/or article name
resulted in invalid javascript code in the href attribute of the
suggestion link.
The single quote (') symbol is not URL-encoded (unlike its double quote
counterpart). As a result, enclosing a URL-encoded string in single
quotes may result in invalid javascript. Using double quotes instead is
safe, since both double quote (") and backslash (\) symbols (which are
the only special symbols for such quoting) undergo URL-encoding.
* [API Break] Remove wrapper around libzim (@mgautierfr #789)
* Allow kiwix-serve to use custom resource files (@veloman-yunkan #779)
* Properly handle searchProtocolPrefix when rendering search result (@veloman-yunkan #823)
* Prevent search on multi language content (@veloman-yunkan #838)
* Use new `zim::Archive::getMediaCount` from libzim (@mgautierfr #836)
* Catalog:
- Include tags in free text catalog search (@veloman-yunkan #802)
- Illustration's url is based on book's uuid (@veloman-yunkan #804)
- Cleanup of the opds-dumper (@veloman-yunkan #829)
- Allow filtering of catalog content using multiple languages (@veloman-yunkan #841)
- Make opds-dumper respect the namemapper (@mgautierfr #837)
* Server:
- Correctly handle `\` in suggestion json generation (@veloman-yunkan #843)
- Better http caching (@veloman-yunkan #833)
- Make `/suggest` endpoint thread-safe (@veloman-yunkan #834)
- Better redirection of main page (@veloman-yunkan #827)
- Remove jquery (@mgautierfr @juuz0 #796)
- Better Viewer of zim content :
. Introduce `/content` endpoints (@veloman-yunkan #806)
. Switch to iframe based content viewer (@veloman-yunkan #716)
- Optimised design of the welcome page:
. Alignement (@juuz0 @kelson42 #786)
. Exit download modal on pressing escape key (@juzz0 #800)
. Add favicon for different devices (@juzz0 #805)
. Fix auto hidding of the toolbar (@veloman-yunkan #821)
. Allow user to filter books by tags in the front page (@juuz0 #711)
* CI :
- Trigger CI on pull_request (@kelson42 #791)
- Drop Ubuntu Impish packaging (@legoktm #825)
- Add Ubuntu Kinetic packaging (@legoktm #801)
* Testing:
- Test ICULanguageInfo (@veloman-yunkan #795)
- Introduce fake `test` language to test i18n (@veloman-yunkan #848)
* Fix documentation (@kelson42 #816)
* Udpate translation (#787#839#847)
We need a fake language for tests that won't be affected by
modifications made by 3rd party translators (see kiwix/libkiwix#749).
- static/i18n/hy.json was cloned as static/i18n/test.json
- usage of "hy" in unit-tests was replaced with "test"
From now on, the `lang` parameter of the /catalog/search,
/catalog/v2/entries, and /catalog/v2/partial_entries endpoints is
interpreted as a comma-separated list of languages.
Before this change RequestContext::get_query() returned a reordered
query string (alphabetically sorted by the parameter names).
This fix facilitiates testing of responses where the request URL appears
in the response.
Multizim search requires that all selected books be in the same
language.
No new URL query parameter was introduced for specifying the intended
search language - `books.filter.lang` can be used for that purpose.
The server_search unit-test was updated to use a slightly cheating
library xml file where the language of example.zim was tweaked from "en"
to "eng" in order to match that of zimfile.zim. Note that this change
drops from the tested server two other goofy ZIM files corner_cases.zim
and poor.zim that have been/are included in ServerTest.
Before this change cacheids were computed only for those static
resources that were referenced from other resources via KIWIXCACHEID.
A few static resources without such references existed.
Now all resources under skin/ have their cacheids computed.
During static resource preprocessing and compilation their cacheid
values are embedded into libkiwix and can be accessed at runtime.
If a static resource is requsted without specifying any cacheid
it is served as dynamic content (with short TTL and the library id
used for the ETag, though using the cacheid for the ETag would
be better).
If a cacheid is supplied in the request it must match the cacheid of the
resource (otherwise a 404 Not Found error is returned) whereupon the
resource is served as immutable content.
Known issues:
- One issue is caused by the fact that some static resources don't get a
cacheid; this is resolved in the next commit.
- Interaction of this change with the support for dynamically customizing
static resources (via KIWIX_SERVE_CUSTOMIZED_RESOURCES env var) was
not addressed.
One (hopefully, last) remaining relative URL to a static resource
is the reference to ./search-icon.svg found in skin/index.css to which
KIWIXCACHEID could not be applied because of the limitations of the
resource preprocessing script `kiwix-resources`.
Before this fix the root URL for a book was assumed to resolve to the
main page. This was not true for ZIM files containing an entry at an
empty path or with a path equal to "/", resulting in issue #826. The
logic behind this behaviour is found in `kiwix::getEntryFromPath()`.
The fix to that issue is a little more general and will result in an
HTTP redirect in any case where `kiwix::getEntryFromPath(zim, path)`
returns an entry with a real path different from the requested one. In
particular, this will affect the behaviour on ZIM files with the old
namespace scheme, where the requested resource - if not found - is also
looked up in the 'A', 'I', 'J', and/or '-' namespaces. Now instead of
returning the contents of that other resource an HTTP redirect response
will be sent.
A rebase invalidated the cacheids in the previous commits of the
iframe_based_content_viewer branch. This commit fixes only the current
state leaving the history with wrong cacheids - this can be an issue
for `git bisect` being executed on a commit range overlapping with
the iframe_based_content_viewer branch.
If `kiwix-serve` is run with the `--nosearchbar` option the toolbar is
disabled (hidden) in its viewer.
Note however that certain actions performed by the viewer merely with
the purpose of keeping the toolbar up-to-date are still carried out.
Auto hiding of the toolbars on narrow screens works only for the first
page loaded in the viewer. Navigating to other pages interferes with
autohiding as follows:
- If the toolbar was hidden, it stays hidden.
- If the toolbar was not hidden, it loses the ability to autohide.
`--nosearchbar` option of `kiwix-serve` (despite its misleading name)
was used to disable the entire taskbar. This commit accounts for the
existence of that option only partially:
1. Links to books on the welcome/library page are affected - by default
books are displayed in the viewer, but in a kiwix-serve instance run
with --nosearchbar books are loaded in the top window.
2. The `/viewer` endpoint is enabled unconditionally, so if anyone
enters the viewer URL in the address bar they will see books in the
viewer.
Made the viewer respect the `--blockexternal` and `--nolibrarybutton`
options of `kiwix-serve`. Those options are passed to the viewer
via the dynamically generated resource `/viewer_settings.js`.
The only place that the root link is now used is in /skin/index.js,
so added it in static/templates/index.html. But it seems that nothing
prevents us from from switching from aboslute paths to relative paths
in /skin/index.js, which will eliminate the need for the root link
altogether.
As a result of this change content is never decorated by kiwix serve.
This resulted in compiler aided discovery of all call sites where the
default values were used. For OPDS/catalog requests now passing true for the
`raw` parameter, since XML content isn't supposed to undergo any
transformations.
Removed the isHomePage param from one of the variants of
`ContentResponse::build()`. The other overload is dangerous since
failing to review&update all of its call site may result in changed
semantics. Will do it in a couple of separate commits.
Before this fix there were two issues with the taskbar search box:
1. The book used for the suggestions API was resolved only once during
the page load and didn't change during navigation.
2. The current book could not be resolved from a search URL.
Now both issues are fixed.
The greenish taskbar placeholder is gone. The appearance of the old taskbar
is restored. However the taskbar currently contains only the library
button (but the latter leads to the currently blank welcome page).
Foundation for never-ending friendship between viewer_taskbar.js and
viewer.html has been established by a slight change in how the book name
is obtained and commenting out the rest of the code.
Before this fix, browsing history didn't work at all. Now it mostly
works but there are still some quirks that must be debugged further.
Since session history handling turns out to be a rather complex topic
(see https://html.spec.whatwg.org/multipage/history.html) the work in
that direction will be postponed until other features reach a comparable
level of readiness.
The next goal is to redirect old-style /book/path/to/entry URLs to
/content/book/path/to/entry, which seemed pretty trivial.
However, given the current handling of some endpoint URLs, more work was
required to ensure that invalid endpoint URLs (e.g. "/random/number" or
"/suggest/fr") are not interpreted as content URLs. Previously, that was
not a user-observable issue, since the result would be an immediate 404
error (except in certain edge cases, like handling the request for
"/random/number" when there is a book with name "random" containing an
article at path "/number"). With redirection of URLs that were assumed
to refer to content a 404 error would be issued for the
transformed URL ("/content/random/number") which may be confusing.
Therefore this change is to ensure the correct routing of endpoint URL
handling.
Book content is now served under /content/book/...
The old access to book content via a top-level URL /book/... is so far
preserved for backward compatibility.
Redirects were changed to use the new URL scheme. Links in the search results
still use the old scheme.
If the server is initialized with a library.xml file, then the id
specified in the XML file is used (rather than the UUID recorded in the
ZIM file).
Note that in test/data/library.xml the book ids are fake and
different from the real ZIM IDs; that file was created for testing
of the /catalog endpoint which doesn't access ZIM content, so the
the same ZIM file zimfile.zim was added to library.xml three times as
three different books (with unique human-friendly ids). This explains
the diff in test/library_server.cpp.
Now after porting index.js and taskbar.js to vanilla JS, it is time to remove files.
Deleted static/skin/jquery-ui
Updated customIndexPage template in README.md.
Thank you for your service, jQuery :)
This change centers tiles on welcome page to give a more consistent whitespace look on both sides.
For this, the layout in Isotope JS is changed to masonry.
This change introduces filtering by tags.
To filter, the user can click on the tag name and it will filter it.
A label is added (clickable) to show the tag filter, it can be clicked to remove the filter
This removes the onclick handler around the reset-filter link which redirected to '/?lang='
Everything under the handler was already done on window.onload
Previously, if the following steps were executed:
1. Click a book tile/visit an unrelated link from the address bar
2. Press back button
Then forward history was discarded (forward button gets disabled).
This happened because of the window.history.pushState on every window.onload event. This led to the same link being added in history and thus discarding the previous "forward-history"
This change adds a condition to only push the current state if the queries are not same.
One important missing test is that the content of the customized
resource is read from storage every time rather than once. Testing
that requirement would involve creating temporary files which is a
little more work.
During work on the kiwix-serve front-end, the edit-save-test cycle is
a multistep procedure:
1. build and install libkiwix
2. build kiwix-tools
3. run kiwix-serve
4. reload the web-page in the browser
When making changes in static resources that are served by kiwix-serve
unmodified, the steps 1-3 can be eliminated if kiwix-serve is capable of
serving resources from the file-system. This commit adds such a
functionality to kiwix-serve. Now, if during startup of kiwix-serve the
environment variable `KIWIX_SERVE_CUSTOMIZED_RESOURCES` is defined it is
assumed to point to a file where every line has the following format:
URL MIMETYPE RESOURCE_FILE_PATH
When a request is received by kiwix-serve and its URL matches any of the
URLs read from the customized resource file, then the resource data is
read from the respective file RESOURCE_FILE_PATH and served with
mime-type MIMETYPE.
Though this feature was introduced in order to facilitate the
development of the iframe-based content viewer, it can also be useful to
users who would like to customize the kiwix-serve front-end on their own
(without re-building all of kiwix-serve).
There is some overlap with a feature of the kiwix-compile-resources
script that also allows to override resources. The differences are:
1. The new way of customizing front-end resources has all such resources
listed in a text file and there is a single environment variable
from which the path of that file is read. kiwix-compile-resources
associates a separate environment variable with each resource.
2. The new way uses regular paths to identify a resource. The
kiwix-compile-resources method encodes the resource path by replacing
any non-alphanumeric characters (including the path separator) with
underscores (so that the resulting resource identifier can be used
to construct the name of the environment variable controlling that
resource).
3. The new method allows adding new front-end resources. The old method
only allows to modify existing resources.
4. The new method allows (actually requires) to specify the URL at which
the overriden resource should be served (similarly, the MIME-type can/must
be specified, too). The old method only allows to override the contents of
a resource.
5. The new method only allows to override front-end resources that are
served without any preprocessing by kiwix-serve at runtime. The old
method allows to override template resources as well (note that
internationalization/translation resources cannot be overriden using the
old method, either).
Now that ServerTest.searchResults is in a separate cpp file, there are
no reasons for hiding its test data definition inside the unit test
function.
The diff is much-much simpler if whitespace changes are ignored.
Unit-tests of search results in XML format should work the same way with
a server that would inject a taskbar into HTML responses. This small
change actually validates that taskbar injection is disabled for XML
responses.
The file starts now to be too long.
- Move testing of the search html result in `test/server_html_search.cpp`
- Move common code used to launch server and so
in `test/server_testing_tools.h'
This is mainly code move with a small change:
Instead of setting the default PORT (8001) as a const int in the
`ServerTest` class, we now use SERVER_PORT.
SERVER_PORT must be defined before include `server_testing_tools.h`.
This allow several test to be run in parallele without trying to open
the same port.
If we keep a reference to a `Reader` it is better to (share) owning
the reference. Else the reader may be deleted after we create the searcher.
This is especially the case now we are creating the `Reader` at demand
and we don't store it in the library's cache.
We have to reuse the query the user give us to generate the
pagination links.
At search result rendering step we don't have access to the query object.
The best place to know which arguments are used to select books
(and so which arguments to keep in the pagination links) is when we
parse the query to select books.
Fix tests (pagination links) with book selector other than "books.id="
(pattern=jazz&books.query.lang=eng)
libzim's search is not thread safe (mainly because xapian is not).
So we must protect our search objects from multi thread calls.
The best way to do this is to associate a mutex to the `zim::Searcher`
and lock the searcher each time we access object derivated from the
searcher (search, results, iterator, ...)
When ConcurrentCache store a shared_ptr we may have shared_ptr in used
while the ConcurrentCache has drop it.
When we "recreate" a value to put in the cache, we don't want to recreate
it, but copying the shared_ptr in use.
To do so we use a (unlimited) store of weak_ptr (aka `WeakStore`)
Every created shared_ptr added to the cache has a weak_ptr ref also stored
in the WeakStore, and we check the WeakStore before creating the value.
The prefix will be used to parse a "query to select book" in different context.
For now we have only one context : selecting books for the catalog search.
But we will want to select books to do fulltext search on them
(will be done in later commit)
- Throw a exception if we cannot extract from string.
(We throw the same exception as `std::sto*`)
- Add a specialization to extract string from string
- Add some unit test
Xapian version 1.4.18 contains a bug in snippet generation caused by
incorrect handling of stemming.
The test-point with a search pattern "beatles" produced snippets with no
highlights of the search term. Debugging showed that the search pattern
"beatles" was transformed to a search term "beatl" which then didn't
match the word "beatles" in the text from which a snippet had to be
extracted.
The test case passed on my development machine as well as for most CI
configurations. However the "Packages / build-deb (ubuntu-bionic)"
variant failed because of a slightly different handling of punctuation
at the snippet boundaries:
Test context:
url: /ROOT/search?pattern=beatles&content=zimfile
actual snippet: ...side "Yellow Submarine" ...........
expected snippet: ...-side "Yellow Submarine" ...........
Above mismatch resulted in a looser comparison of the snippet contents
and failed the requirement that the snippet MUST contain highlights
(this is how the said bug in Xapian was discovered).
An attempt to change the search pattern to "field" didn't eliminate the
problem. Despite the search pattern itself being in singular form (i.e.
identical to its stemmed version) the plural form "fields" in the
snippet was still not highlighted.
Using for a search pattern an adjective instead of a noun achieved the
desired outcome.
The "expected" snippets in the test data must be a union of all possible
snippets produced at runtime for a given (document, search terms) pair
on all platforms of interest:
- Overlapping snippets must be properly merged
- Non-overlapping snippets can be joined with a " ... " in between.
This is a preliminary implementation checking only the following
cases:
- no search results
- all search results fitting on a single page
The second test-case fails because of a bug in search renderer (leading
to the pagination footer being pointlessly enabled). Will fix it in the
next commit.
Taskbar injected by a server adds distraction to unit-tests focusing
on the HTML contents of the returned pages. The new test-suite
TaskbarlessServerTest will have taskbar disabled.
In #727 inline CSS [was extracted](e4a4b2f961)
from `static/templates/no_search_result.html` into a separate stylesheet
resource. The purpose was to later
1. get rid of the custom `static/templates/no_search_result.html` error
template and use a general purpose error template instead (this was
accomplished by PR #744).
2. deduplicate the CSS code between `static/templates/no_search_result.html` and
`static/templates/search_result.html` by making the latter to also refer to
an internal CSS resource rather than containing inline stylesheet code.
While preparing to implement the 2nd point, I figured out that
`kiwix::SearchRenderer` is used as a component in `kiwix-desktop` too,
which probably would be upset by a link to a libkiwix's internal CSS resource.
This commit documents that finding.
Revert to the plain old 'i18n_resources_list.txt' file.
Auto discovering of i18n file has a main flaw (and a small bug):
- The main flaw is that rerun the configure will not detect new
translation files. It means that if we use cache in our CI,
new translation will not be included.
- The bug is that on Windows, meson fails with a error about a non existent
`` (empty) file name. I suppose it is because python replace
`\n` by `\r\n` on Windows, and the the `.strip().split('\n')` keeps empty
lines.
The small bug could be fixed, but the main flaw make the whole better if
we use a script to generate the listing.
This commit is somehow a half revert of 2eff5b55a6
Previously, on clicking Magnet, we were redirecting to a different site:
https://download.kiwix.org/zim/other/xyzBookWithDate.zim.magnet
This had the real magnet link as page content
Now we use the real magnet link in the href, thus not redirecting and starting the download right away.
Fix#767
Excluding qqq.json any .json file under static/i18n is now considered to
be a i18n resource. This eliminates the need to update the
i18n_resources_list.txt file every time a new language json file is
added. Thus Translatewiki PRs will not require extra work.
Now the whole content of a resource is preprocessed with a single
invocation of `re.sub()` rather than line-by-line.
Also, the function `get_preprocessed_resource()` returns a single value
rather than a (preprocessed_content, modification_count) pair; the
situation when the preprocessed resource is identical to the source
version is signalled by a return value of None.
The cache-id of resources now includes dependency information. This commit
illustrates that property with the changed cache-id of skin/index.js which
depends on skin/{download,hash,magnet,bittorent}.png.
The implementation is not fool-proof - cyclic dependency between
resources is not detected and will lead to infinite recursion.
The current implementation of resource preprocessing contains a bug
(with respect to the problem that it tries to solve): it doesn't take
into account the dependence of static resources on each other. If
resource A refers to B and B refers to C, then a change in C would
result in its cache id being updated in the preprocessed version of B.
However the cache id of B won't change since the cache id is derived
from the source rather than from the preprocessed output.
This commit is the first step towards addressing the described issue.
Now cache-id of a resource is computed on demand rather than precomputed
for all resources. The only thing remaining is to compute the cache-id
from the preprocessed content.
If during an earlier build a resource was symlinked in the build
directory (because it wasn't modified by preprocessing) and later
changes are made to the resource that result in its preprocessing no
longer being a no-op, then the preprocessing is performed (in place) on
the original resource directly (via the symlink). Therefore any symlinks
must be removed before preprocessing a resource.
kiwix-resources preprocesses all resources rather than only templates. At
this point this doesn't change anything since only (some) template resources
contain KIWIXCACHEID placeholders. But this enhancement opens the door
to the preprocessing of static/skin/index.js (after preprocessing is
able to handle relative links, which comes in the next commit).
The story of search_results.css
static/skin/search_results.css was extracted from
static/templates/no_search_result.html before the latter was dropped.
static/templates/no_search_result.html in turn seems to be a copied and
edited version of static/templates/search_result.html.
In the context of exploratory work on the internationalization of
kiwix-serve (PR #679) I noticed duplication of inline CSS across those
two templates and intended to eliminated it. That goal was not fully
accomplished (static/templates/search_result.html remained untouched)
because by that time PR #679 grew too big and the efforts were diverted
into splitting it into smaller ones. Thus search_results.css slipped
into one of those small PRs, without making much sense because nothing
really justifies preserving custom CSS in the "Fulltext search unavailable"
error page.
At the same time, it served as the only case where a link to a cacheable
resource is generated in C++ code (rather than found in a template).
This poses certain problems to the handling of cache-ids. A workaround
is to expel the URL into a template so that it is processed by
`kiwix-resources`. This commit merely demonstrates that solution. But
whether it should be preserved (or rather the "Fulltext search
unavailable" page should be deprived of CSS) is questionable.
In template resources (found under static/templates), strings of the
form "PATH/TO/STATIC/RESOURCE?KIWIXCACHEID" are expanded into
"PATH/TO/STATIC/RESOURCE?cacheid=CACHEIDVAL" where CACHEIDVAL is a
8-digit hexadecimal hash digest of the file at
static/PATH/TO/STATIC/RESOURCE.
Introduced a new unit-test which will ensure that static resources of
kiwix-serve have the cache ids applied to them in the links embedded into
the HTML code.
At this point there are no cache ids. The new unit-test will help to
visualize how they come into existence.
In the absence of the "userlang" query parameter in the URL, the value
of the "Accept-Language" header is used. However, it is assumed that
"Accept-Language" specifies a single language (rather than a comma
separated list of languages possibly weighted with quality values).
Example:
Accept-Language: fr
// should work
Accept-Language: fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5
// The requested language will be considered to be
// "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5".
// The i18n code will fail to find resources for such a language
// and will use the default "en" instead.
At this point a potential issue has been revealed. Now we produce
the final HTML via 2-level template expansion
1. Render parameterized messages
2. Render the HTML template
In which templates we should use double mustache "{{}}" (HTML-escaping)
tags and where we may use triple mustache "{{{}}}" (non-escaping) tags?
Introduced a new resource compiler script kiwix-compile-i18n that
processes i18n string data stored in JSON files and generates sorted C++
tables of string keys and values for all languages.
The "Fulltext search unavailable" error page is now generated using the
static/templates/error.html template. Also added two test cases checking
that error page.
Currently an internal server error can be triggered by accessing an
entry belonging to a redirect loop. The ZIM file (test/data/poor.zim)
containing such a loop was copied from openzim/zim-tools repository
(test/data/zimfiles/poor.zim).
404.html no longer contains anything specific to the 404 error and will
henceforth serve (with some enhancements) as a general purpose error
page template.
It is better to directly try to get the `Search` from the cache instead
of getting the `Searcher` first which could be useless in Search already
exist.
SearchInfo is a small helper structure to store information about the
queried search. It regroup already existing information (`patternString`,
geo query, ...) in one structure.
It is also used as key in the cache instead of using a generated string.
Instead of passing the `bookName` and `bookTitle` parameters to
`Response::build_404()`, `withTaskbarInfo()` is applied to its result
when needed. Note, that in `InternalServer::handle_raw()`
`withTaskbarInfo()` was not utilized since the results of the `/raw`
endpoint are not supposed to be decorated with a taskbar.
This was done in preparation for removing the `bookName` and `bookTitle`
parameters from `Response::build_404()`, but since the new function
could already be put to some use in this commit that was done too.
Before this change the meaning of test data in the ServerTest.404WithBodyTesting
unit test entirely depended on the number of entries:
- 2 entries: url, expected body
- 4 entries: url, book name, book title and expected body
This was fragile and non scalable (if other combinations of expected
response data are needed).
This commit defines a mini-DSL taking advantage of operator overloading
that allows to define test data in a robust way with the help of the compiler.
Some code in `TestContentIn404HtmlResponse` is obsoleted by this change
however it is not removed in this commit so that the change is easier to
understand. That will be done next.
Previously, the seachURL was not encoded.
This resulted in an XSS vulnerability, a concept of proof is:
start kiwix-serve
visit - http://192.168.18.1:8081/"><svg onload="alert(1)">
This would display an alert message.
This encodes the searchURL before passing it to searchSuggestionHtml
We create a cache for SuggestionSearcher very similar to that of FT
searcher. User can specify a custom cache size using the environment
variable SUGGESTION_SEARCHER_CACHE_SIZE. It has a default value of 10%
of the number of books in the library.
We use the new cache template to implement two kind of cache.
1: The Searcher cache is more general in terms of its usage. A Searcher
can be used for multiple searches without much change to itself. We
try to retrieve the searcher and perform searches using it whenever
possible, and if not we put a searcher into the cache. User can
specify a custom cache length by manipulating the environment
variable SEARCHER_CACHE_SIZE. It's default value is 10% of all the
books available.
2: The search cache is much more restricted in terms of usage. It's main
purpose is to avoid re-searching on the searcher during page changes
to generate SearchResultSet of various ranges. User can specify a
custom cache length using the environment variable SEARCH_CACHE_SIZE
with a default value of 2;
Adds a std::map<std::string, std::string> with display names for language codes not given by libicu
Fault language codes are taken from library.kiwix.org
The `ServerTest.RandomOnNonExistentBook` unit test was replaced with a
more general one testing multiple 404 scenarios where the content of the
body is checked too.
This test was introduced with the purpose of testing the error message
in the 404 page returned by /random for a non-existent book. The actual
expected output currently present in this new unit-test is too much for
that purpose and may become a maintenance burden if more tests of that
kind are added.
Packages/build-deb CI flows failed on ubuntu-bionic and ubuntu-focal
with the following mismatch in the ServerTest.suggestions unit-test:
```
[ RUN ] ServerTest.suggestions
../test/server.cpp:715: Failure
Expected equality of these values:
r->body
Which is: ...
removeEOLWhitespaceMarkers(expectedResponse)
Which is: ...
With diff:
@@ -2,5 +2,5 @@
{
\"value\" : \"Ray (movie)\",
- \"label\" : \"Ray (<b>movie</b>...\",
+ \"label\" : \"Ray (<b>movie</b>)\",
\"kind\" : \"path\"
, \"path\" : \"A/Ray_(movie)\"
Test context:
url: /ROOT/suggest?content=zimfile&term=movie
```
For some reason (probably, a bug), the implementation of
`Xapian::MSet::snippet()` on those platforms decided that a single closing
parenthesis is more than is appropriate for inclusion in the snippet and
replaced it with a (longer) ellipsis.
Taking advantage of the necessity to work around that bug, the
ServerTest.suggestions's functional coverage was enhanced - the
problematic test point was replaced with a new one using a phrase
instead of a single term.
Earlier querySelector for download button was returning a div, on which we called the getAttribute function hence returning null
This now returns a <span> element which returns the correct link with getAttribute
Our libzim packages are "7.2.0~focal" but the ~ means that "7.2.0" is greater than
"7.2.0~focal" so the dependency can't be satisfied. Depending on "7.2.0~" will
allow "7.2.0~focal" to satisfy the dependency.
As we still create a `Reader` in the deprecated code of `Library`,
we need a way to create a reader without raising a deprecated warning.
So we create a another constructor with a dummy argument and we use it.
As the `Entry` is still created by `Reader` we need a way to create a
entry without raising a deprecated warning.
To do so we create a second constructor with a dummy argument.
This second constructor is private and is not marked as deprecated so we
can use it.
The HumanReadableId can contains special char (`&`/`=`/...)
As it is used as to create a url in the opds template,
we must url encode it.
- We don't need to encode the book id as it is a uuid, it never contains
special char.
- We don't need to encode the book url as it is read from the library and
the url must already be correctly encoded in the library.xml.
(tests modified accordingly)
kiwix::fileExists only checks for file existence now
kiwix::fileReadable will check if the file is readable (implicitly checking for file existence also)
As the name suggests it, this endpoint is not smart :
It returns the content as it is and only if it is present
(no compatibility or whatever).
The only "smart" thing is to return a redirect if the entry is a redirect.
As we render the entry's xml in a separated steps, we need to pass the
rootLocation to all the internal rendering.
Testing with and without root is not so easy.
I've simply made all server tests using a ROOT prefix.
We can assume that if the ROOT is present everywhere we need it, it will not
when we don't need. (As long as we don't hardcode "ROOT" in the server.)
At least on Retina Macbook Pros but most likely on other configurations,
the viewport's sizes is not exactly consistent to integer.
For instance, on a maximized Firefox, document.body.offsetHeight is 1,600.
When looking at the <html> on the inspector, I'd get 1,599.6, so **roughly** the same
but not exactly. Those inconsistencies are present on every level so being too strict
about those is probably not adequate.
This fixes#603 but allowing a 2% margin on the scroll position
to match the _end of screen_ and thus trigger the loading of additional cards.
This means that for the example above, it triggers at 1,568 instead of never reaching 1,600.
2% might be too large but it seems safe considering the potential of various resolutions
we may encounter and I don't see any side effect.
As a result of this clean-up the /suggest endpoint too stopped
generating confusing 404 Not Found errors (which, like in /meta's case
is not that important). Another functional change is that the "term"
parameter became optional.
Before this fix the /meta endpoint could return a 404 Not Found page
saying
The requested URL "/meta" was not found on this server.
Error cases producing such a result were:
- `/meta?content=NON-EXISTING-BOOK&name=metaname`
- `/meta?content=book&name=BAD-META-NAME`
Now a proper message is shown for each of those cases.
This fix is being done just for consistency (the /meta endpoint is not
a user-facing one and the scripts don't bother about error texts).
Now Response::build_404() takes the URL instead of the entire
RequestContext object. An empty url suppresses the
The requested URL "url" was not found on this server.
part of the error text.
Before this fix the /random endpoint could return a 404 Not Found page
saying
The requested URL "/random" was not found on this server.
Error cases producing such a result were:
- `/random?content=NON-EXISTING-BOOK` (can happen when a server is
restarted or the library is reloaded and the current book is no longer
available).
- Failure of the libkiwix routine for picking a random article.
Now a proper message is shown for each of those cases.
Library became thread-safe with the exception of `getBookById()`
and `getBookByPath()` methods - thread safety in those accessors is
rendered meaningless by their return type (they return a reference
to a book which can be removed any time later by another thread).
Introducing a mutex in `Library` necessitates manually implementing the
move constructor and assignment operator. It's better to still delegate
that work to the compiler to eliminate any possibility of bugs when new
data members are added to `Library`. The trick is to move the data into
an auxiliary class `LibraryBase` and derive `Library` from it.
Originally `LibraryManipulator` was an abstract class completely decoupled
from `Library`. Its `addBookToLibrary()` and `addBookmarkToLibrary()`
methods could be defined in an arbitrary way. Now `LibraryManipulator` has to be
bound to a library object, those methods are no longer virtual, they always
update the library and allow for some additional actions via virtual
functions `bookWasAddedToLibrary()` and `bookmarkWasAddedToLibrary()`.
Book.updateTest is going to be modified so that it relies on
functionality tested by Book.updateFromXMLTest. Hence the order of the
tests better reflect that dependency.
Deduplicated the mustache templates static/templates/catalog_v2_entries.xml
and static/templates/catalog_v2_complete_entry.xml (the latter was
renamed to static/templates/catalog_v2_entry.xml).
This will allow handle_suggest API to accept two arguments `start` and
`suggestionLength` that will allow handle_suggest to retrieve
suggestions in the given range rather than the default 0-10 range.
With this, we eventually want to see the usage of getResults giving
a FAILING TEST. This happens because the second argument to
getResults is NOT `end` of the range, but `maxResultCount` to retrieve.
This will be fixed in the next commit.
Language code to human friendly name translation is now done with the
help of the ICU library. It works if the line
```
-include $(LANGSRCDIR)/resfiles.mk
```
in the file `source/data/Makefile.in` of the icu4c dependency is not
commented out. Currently, the said line is commented out (along with
some other include's) by the `icu4c_custom_data.patch` patch of the
`kiwix-build` tool.
Introduces a new member mp_search that houses the zim::Search object,
adds a new constructor for this purpose. This commit also add an
overload for getHtml that takes start and end integers as arguments
since they are not part of the search object we include.
With openzim/libzim#540 we now have a new function to get
illustration(previously favicon in 48x48 size and unity scale) in
multiple sizes. We need to replace getFaviconEntry with this new
getIllustrationItem method.
This changes the output of `/catalog/search` as follows:
- Entire search query (rather than only the value of the `q` parameter)
is put in the <title> node.
- Search performed with an empty query presents itself as "All zims".
- The feed id remains stable for identical searches on the same
library.
/catalog/v2/entries is intended to play the combined role of
/catalog/root.xml and /catalog/search of the old OPDS API. Currently,
the latter role is not yet implemented.
Implementation note: instead of tweaking and reusing
`OPDSDumper::dumpOPDSFeed()`, the generation of the OPDS feed is done via `mustache`
and a new template `static/catalog_v2_entries.xml`.
Note: This commit somewhat relaxes validation of non variable
`<updated>` elements in the OPDS feed - the contents of any `<updated>`
element is replaced with the YYYY-MM-DDThh:mm:ssZ placeholder.
Each sugestions used to be stored as vector of strings to hold various values
such as title, path etc inside them. With this commit, we use the new
dedicated class `SuggestionItem` to do the same.
With openzim/libzim#545 we now support snippet generation of titles
which can be used as the display label on the ui for highlighted titles
via the "label" field.
The old version used plain title which is still available in the value
field.
In certain pages like the search result page, bookName is not of the
form `/bookName/endpoint?parameters`. Rather it is available as a query
parameter. From these pages bookName should be assigned from parameters.
After switching to Xapian-based search in the library/catalog, an empty
query stopped acting as a match-all query. This commit restores the old
behaviour in that regard.
Returning status code 204 in case of an empty results doesn't show the
empty results page as described in #466. Reverting the changes in #396
fixes the issue.
The existing example.zim has an improper main page entry hence an
unusable index as reported in openzim/libzim#521
To replace the buggy zim, a new zim has been generated using the latest
zimwriterfs with latest libzim.
-------------------------------------------------------------------------
The directory used to create zim is given below and are two pages from
wikibooks site:
htmlContent
├── favicon.png
├── FreedomBox for Communities_Offline Wikipedia - Wikibooks, open books for an open world_files
│ ├── index.php
│ ├── load.php
│ ├── poweredby_mediawiki_88x31.png
│ └── wikimedia-button.png
├── FreedomBox for Communities_Offline Wikipedia - Wikibooks, open books for an open world.html
├── Wikibooks_files
│ ├── 234px-Megakaryocyte1.svg.png
│ ├── 287px-ChewyGingerCookies.jpg
│ ├── 36px-Commons-logo.svg.png
│ ├── 40px-Wikiquote-logo.svg.png
│ ├── 41px-Wikispecies-logo.svg.png
│ ├── 46px-Wikisource-logo.svg.png
│ ├── 48px-MediaWiki-2020-icon.svg.png
│ ├── 48px-Phacility_phabricator_logo.png
│ ├── 48px-Wikimedia_Cloud_Services_logo.svg.png
│ ├── 48px-Wikimedia_Community_Logo.svg.png
│ ├── 48px-Wikivoyage-Logo-v3-icon.svg.png
│ ├── 51px-Wiktionary-logo.svg.png
│ ├── 53px-Wikipedia-logo-v2.svg.png
│ ├── 59px-Wikiversity_logo_2017.svg.png
│ ├── 86px-Wikidata-logo.svg.png
│ ├── 88px-Wikinews-logo.svg.png
│ ├── Haskell-logo.png
│ ├── index.php
│ ├── load.php
│ ├── poweredby_mediawiki_88x31.png
│ └── wikimedia-button.png
└── Wikibooks.html
The command for writing the zim:
$ zimwriterfs --welcome=Wikibooks.html --favicon=favicon.png --language=en --title=Wikibooks --description=testZim --creator=test --publisher=test --verbose ./htmlContent ./out/example.zim
Catalog filtering should now be case/diacritics insensitive for all
fields. However it is not validated for language, name and category
fields, and is validated for tags, creator & publisher only for text
supplied in the filter (but not for values read from the book).
Catalog filtering by titles/description was sensitive to diacritics
present in the query string. Fixed that.
Also enhanced the unit test to validate the insensitivity to diacritics
present in either the title/description or the query string.
This change fixes the failure of the LibraryTest.filterByPublisher
unit-test broken by the previous commit.
The previous approach used in `publisherQuery()` for building a phrase
query enforcing the specified prefix for all terms fails if
1. the input phrase contains a non-word term that Xapian's query parser
doesn't like (e.g. a standalone ampersand character, 1/2, a#1, etc);
2. the input phrase contains at least three terms that Xapian's query
parser has no issue with.
Using the `quest` tool (coming with xapian-tools under Ubuntu) the
issue can be demonstrated as follows:
```
$ quest -o phrase -d some_xapian_db "Energy & security"
Parsed Query: Query((energy@1 PHRASE 11 Zsecur@2))
Exactly 0 matches
MSet:
$ quest -o phrase -d some_xapian_db "Energy & security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
$ quest -o phrase -d some_xapian_db 'Energy 1/2 security act'
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
$ quest -o phrase -d some_xapian_db "Energy a#1 security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
```
The problem comes from parsing the query with the default operation set
to `OP_PHRASE` (exemplified by the `-o phrase` option in above
invocations of `quest`). A workaround is to parse the phrase with a
default operation of `OP_OR` and then combine all the terms with
`OP_PHRASE`.
Besides stemming should be disabled in order to target an exact phrase
match (save for the non-word terms, if any, that are ignored by the
query parser).
Moved the `filter.hasQuery()` check inside `buildXapianQuery()`.
`Library::filterViaBookDB()` only cares if the query that is going to be
run on the book DB would match all documents. The rest of changes
related to enhancing the usage of Xapian for the catalog search will
happen inside `buildXapianQuery()` and `updateBookDB()`.
Now the LibraryTest.filterCheck unit-test validates the actual entries
returned by `Library::filter` (previously only the count of the results
was checked).
The new unit-test fails with a reason not expected before it was
written. The `Library::filter()` operation returns a correct result
after the call to `removeBookById()` (this was a surprise!) but it has
a side-effect of re-adding an empty book with the id still surviving
in the search DB (the emptiness of this re-created book doesn't allow
it to pass the other filtering criteria, which explains why the result
of `Library::filter()` is correct). Had to add a special check
to the new unit-test against that hidden side-effect of
`Library::removeBookById()` + `Library::filter()` combination.
Language code is converted from ISO 639-3 to ISO 639 (which is
understood by Xapian) via ICU. The previous approach via an explicit
map had its advantages since Xapian has more than one stemmer
implementations for some languages (selectable via Xapian-specific
identifiers). This commit relies on the defaults associated with the
ISO 639 language codes.
The search text in the catalog query is interpreted as partial by
default, but partial query mode can be disabled in C++. The latter
possibility is not exposed via the /catalog/search kiwix-serve endpoint,
though.
1. Get the subset of books matching the q (title/description) parameter
of the search
2. Filter out books not matching the other parameters of the search.
Stage 1. currently works in the old way, but will be replaced by Xapian
based search in subsequent commits.
Since the `value` field of the search suggestion results is HTML
escaped/encoded in the backend (see static/templates/suggestion.json) it
must be decoded in the frontend.
The kiwixlib java wrapper unit test can be run manually via the
src/wrapper/java/org/kiwix/testing/compile_test.sh script.
The test ZIM files in src/wrapper/java/org/kiwix/testing were created
using the create_test_zimfiles. They must be updated/re-generated and
committed in git whenever their source data or the create_test_zimfiles
script changes. Note: small.zim.embedded is not used at this point, it
was created for testing the enhancement coming in a few commits.
libzim were dependent of zlib and we were (wrongly) using its
dependency declaration to link with zlib.
Now that libzim doesn't depends on zlib, we need to fix our build system
and explicitly depend on it.
taskbar is placed *above* content using a `padding-top: 3em;` rule
Currently, in regular case, padding-top is too large and leaves ~4/5px between the
taskbar and the content.
This fixes it by using a `calc()` rule to eliminate this extra space
Mimetype may contain a parameters.
Then, the mimetype would be something like "text/html;foo=bar;foz=baz"
It will contains a `;` and `=` and it conflicts with the same operators
we use to separate the items in our list.
We have to use a more advanced algorithm which takes the context into
account.
Fix#416
Use a heap allocated buffer (with lifetime of Aria2 class) instead of
a stack allocated one.
Original fix made by @ZaWertun. Kudos to him.
Fix #kiwix/kiwix-desktop#123, kiwix/kiwix-desktop#513
and kiwix/kiwix-desktop#423
WaitingThread read some shared memory with the SubProcess
(`mutex`, `m_running`).
When we destroy the SubProcess, we must be sure that WaitingThread has
correctly finished else we may have invalid read/write on freed memory.
warc2zim's service worker captures all requests and then decides what to do based on
availability of the URL in the ZIM or not.
To allow the external URL blocking mechanism, it needs to known whether this was
enabled or not (as those in-iframe links won't be caught).
It detects this by looking for the `window.block_path` variable that is set in the
`block_external.js` script.
As this is fragile, we're adding a comment so that a future maintainer knows that
a third party tools relies on it.
On the CI, the native_dyn docker image is setup with a packaged version
on libmicrohttpd for which `MHD_HTTP_RANGE_NOT_SATISFIABLE` is not
defined.
When the CI will be fixed, we can revert this commit.
Android clang complains about the fact it cannot move the
`std::unique_ptr<ContentResponse>` into a `std::unique_ptr<Response>&&`
(for the implicit `std::unique_ptr<Response>` constructor).
Let's help him a bit.
This is only an "interface" for now as other type of response (entry) may
be "transformed" to a ContentResponse.
We cannot move all the code in the class.
While [spec](https://wiki.openzim.org/wiki/Metadata#Favicon) says that the favicon
should be a 48x48 image, ZIM creators might not respect it.
If a ZIM contains a larger favicon, the UI is broken. This fixes it ans ensures all
favicon have equal sizes, removing the unpleasing lack of harmony that we can see sometimes.
Note that ZIM will smaller size favicon would get blurry as those would be upscaled.
With #403, the article mimetype may be different than "text/html".
It can also be "text/html; raw=true".
(And in fact it already could have any kind of optional argument).
The response detect if taskbar must be added depending of the mimetype.
Now, `set_taskbar` can be call unconditionally
(no need to check for the mimetype)
And we don't need to call set_taskbar if we have no information to set.
Some HTML articles are meant to be displayed through a viewer. In this case,
we know we don't want the server to inject the taskbar nor the link blocker
because the content is not a user-ready web page but a partial element of it.
Such articles still need to be `text/html` to be parsed properly by browsers.
This changes the way we decide to display the tasbar or not.
Previously, we were adding it to every article with a MIME __starting with__ `text/html`.
Now, we're additionally preventing it on `text/html` MIME if there is a `;raw=true` string inside.
This leaves articles with MIME `text/html;raw=true` (warc2zim convention) outside
of the taskbar target.
For similar reasons, the external-link blocker is set to apply to the same set of articles.
Previously, it was applied to all articles which was an (unoticable) mistake.
Originally reported against case sensitivity of the Range header
(see issue #387), this fix applies to all request headers (since
according to RFC 7230 all header fields are case-insensitive, see
https://tools.ietf.org/html/rfc7230#section-3.2). However, a
corresponding unit-test was added only for the Range header.
* Switch to debhelper compat 13
* Increase test timeout via `meson test`
* Depend on same version of libzim as meson.build specifies
* Restore mistakenly deleted libkiwix-dev.manpages
* We still need this file to instruct Debian to associate it with the
libkiwix-dev package.
Although the Kiwix library can be (cross-)compiled on/for many
sytems, the following documentation explains how to do it on POSIX
ones. It is primarly thought for GNU/Linux systems and has been tested
on recent releases of Ubuntu and Fedora.
Although the Libkiwix can be (cross-)compiled on/for many systems, the
following documentation explains how to do it on POSIX ones. It is
primarily thought for GNU/Linux systems and has been tested on recent
releases of Ubuntu and Fedora.
Dependencies
------------
The Kiwix library relies on many third parts software libraries. They
are prerequisites to the Kiwix library compilation. Following
libraries need to be available:
The Libkiwix relies on many third party software libraries. They are
prerequisites to the Libkiwix compilation. Following libraries need to
be available:
* [ICU](https://site.icu-project.org/) (package `libicu-dev` on Ubuntu)
* [ZIM](https://openzim.org/) (package `libzim-dev` on Ubuntu)
* [Pugixml](https://pugixml.org/) (package `libpugixml-dev` on Ubuntu)
* [Aria2](https://aria2.github.io/) (package `aria2` on Ubuntu)
* [Mustache](https://github.com/kainjow/Mustache) (Just copy the
header `mustache.hpp` somewhere it can be found by the compiler and/or
set CPPFLAGS with correct `-I` option). Use Mustache version 3 only.
set CPPFLAGS with correct `-I` option). Use Mustache version 4.1 or above.
* [Libcurl](https://curl.se/libcurl) (`libcurl4-gnutls-dev`, `libcurl4-nss-dev` or `libcurl4-openssl-dev` on Ubuntu)
* [Microhttpd](https://www.gnu.org/software/libmicrohttpd) (package `libmicrohttpd-dev` on Ubuntu)
* [Zlib](https://zlib.net/) (package `zlib1g-dev` on Ubuntu)
To test the code:
* [Google Test](https://github.com/google/googletest) (package `googletest` on Ubuntu)
The following dependency needs to be available at runtime:
* [Aria2](https://aria2.github.io/) (package `aria2` on Ubuntu)
These dependencies may or may not be packaged by your operating
system. They may also be packaged but only in an older version. The
compilation script will tell you if one of them is missing or too old.
In the worse case, you will have to download and compile bleeding edge
In the worst case, you will have to download and compile bleeding edge
version by hand.
If you want to install these dependencies locally, then use the
`kiwix-lib` directory as install prefix.
`libkiwix` directory as install prefix.
Environment
-------------
The Kiwix library builds using [Meson](https://mesonbuild.com/) version
0.43 or higher. Meson relies itself on Ninja, pkg-config and few other
The Libkiwix builds using [Meson](https://mesonbuild.com/) version
0.45 or higher. Meson relies itself on Ninja, pkg-config and few other
compilation tools.
Install first the few common compilation tools:
@@ -71,7 +79,7 @@ section.
Compilation
-----------
Once all dependencies are installed, you can compile the Kiwix library
Once all dependencies are installed, you can compile the Libkiwix
with:
```bash
meson . build
@@ -79,12 +87,47 @@ ninja -C build
```
By default, it will compile dynamic linked libraries. All binary files
will be created in the "build" directory created automatically by
will be created in the `build` directory created automatically by
Meson. If you want statically linked libraries, you can add
`--default-library=static` option to the Meson command.
Depending of you system, `ninja` may be called `ninja-build`.
The android wrapper uses deprecated methods of libkiwix so it cannot be compiled
with `werror=true` (the default). So you must pass `-Dwerror=false` to meson:
```bash
meson . build -Dwrapper=android -Dwerror=false
ninja -C build
```
Static files compilation
------------------------
Libkiwix has a few static files 'compiled' within the binary
code. This is mostly Javascript/HTML/pictures necessary for the HTTP
daemon.
These static files are available in the `static` directory and are
compiled by custom Python code available in this repository `scripts`
directory. This happens automatically at compilation time without any
additional command to run.
To avoid HTTP caching issues, the URLs (to the static content) are
appended with a `cacheid` parameter (this is called "cache
busting"). This `cacheid` value derived from the
[sha1sum](https://en.wikipedia.org/wiki/Sha1sum) of each targeted
static file. As a consequence, each time you change a static file, the
corresponding `cacheid` value will change.
To properly test this feature, this `cacheid` needs to be added
manually to the automated tests and has to be commited. After
modifying the needed static file, [run the automated
tests](#Testing). They will fail, but the inspection of the testing
log will give you the new `cacheid` value(s). Finally update
`test/server.cpp` with the appropriate `cacheid` value(s) which have
changed.
Testing
-------
@@ -97,7 +140,7 @@ meson test
Installation
------------
If you want to install the Kiwix library and the headers you just have
If you want to install the Libkiwix and the headers you just have
compiled on your system, here we go:
```bash
ninja -C build install
@@ -108,7 +151,7 @@ where you want to install the libraries. After the installation
succeeded, you may need to run `ldconfig` (as `root`).
Uninstallation
------------
--------------
If you want to uninstall the Kiwix library:
```bash
@@ -118,6 +161,55 @@ ninja -C build uninstall
Like for the installation, you might need to run the command as `root`
(or using `sudo`).
Custom Index Page
-----------------
to use custom welcome page mention `customIndexPage` argument in `kiwix::internalServer()` or use `kiwix::server->setCustomIndexTemplate()`.
(note - while using custom html file please mention all external links as absolute path.)
to create a HTML template with custom JS you need to have a look at various OPDS based endpoints as mentioned [here](https://wiki.kiwix.org/wiki/OPDS) to load books.
To use JS provided by kiwix-serve you can use the following template to start with ->
if not BINARY_RESOURCE_EXTENSIONS.isdisjoint(TEXT_RESOURCE_EXTENSIONS):
raise RuntimeError(f"The following file type extensions are declared to be both binary and text: {BINARY_RESOURCE_EXTENSIONS.intersection(TEXT_RESOURCE_EXTENSIONS)}")
def is_binary_resource(filename):
_, extension = os.path.splitext(filename)
is_binary = extension in BINARY_RESOURCE_EXTENSIONS
is_text = extension in TEXT_RESOURCE_EXTENSIONS
if not is_binary and not is_text:
# all file type extensions of static resources must be listed
# in either BINARY_RESOURCE_EXTENSIONS or TEXT_RESOURCE_EXTENSIONS
raise RuntimeError(f"Unknown file type extension: {extension}")
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.