Added an event listener for message event.
The idea is the website which embeds the widget will send a message using postMessage().
The expected message is:
{
css: // custom CSS code
js: // custom JS code
}
The next goal is to redirect old-style /book/path/to/entry URLs to
/content/book/path/to/entry, which seemed pretty trivial.
However, given the current handling of some endpoint URLs, more work was
required to ensure that invalid endpoint URLs (e.g. "/random/number" or
"/suggest/fr") are not interpreted as content URLs. Previously, that was
not a user-observable issue, since the result would be an immediate 404
error (except in certain edge cases, like handling the request for
"/random/number" when there is a book with name "random" containing an
article at path "/number"). With redirection of URLs that were assumed
to refer to content a 404 error would be issued for the
transformed URL ("/content/random/number") which may be confusing.
Therefore this change is to ensure the correct routing of endpoint URL
handling.
Book content is now served under /content/book/...
The old access to book content via a top-level URL /book/... is so far
preserved for backward compatibility.
Redirects were changed to use the new URL scheme. Links in the search results
still use the old scheme.
If the server is initialized with a library.xml file, then the id
specified in the XML file is used (rather than the UUID recorded in the
ZIM file).
Note that in test/data/library.xml the book ids are fake and
different from the real ZIM IDs; that file was created for testing
of the /catalog endpoint which doesn't access ZIM content, so the
the same ZIM file zimfile.zim was added to library.xml three times as
three different books (with unique human-friendly ids). This explains
the diff in test/library_server.cpp.
Now after porting index.js and taskbar.js to vanilla JS, it is time to remove files.
Deleted static/skin/jquery-ui
Updated customIndexPage template in README.md.
Thank you for your service, jQuery :)
This change centers tiles on welcome page to give a more consistent whitespace look on both sides.
For this, the layout in Isotope JS is changed to masonry.
This change introduces filtering by tags.
To filter, the user can click on the tag name and it will filter it.
A label is added (clickable) to show the tag filter, it can be clicked to remove the filter
This removes the onclick handler around the reset-filter link which redirected to '/?lang='
Everything under the handler was already done on window.onload
Previously, if the following steps were executed:
1. Click a book tile/visit an unrelated link from the address bar
2. Press back button
Then forward history was discarded (forward button gets disabled).
This happened because of the window.history.pushState on every window.onload event. This led to the same link being added in history and thus discarding the previous "forward-history"
This change adds a condition to only push the current state if the queries are not same.
One important missing test is that the content of the customized
resource is read from storage every time rather than once. Testing
that requirement would involve creating temporary files which is a
little more work.
During work on the kiwix-serve front-end, the edit-save-test cycle is
a multistep procedure:
1. build and install libkiwix
2. build kiwix-tools
3. run kiwix-serve
4. reload the web-page in the browser
When making changes in static resources that are served by kiwix-serve
unmodified, the steps 1-3 can be eliminated if kiwix-serve is capable of
serving resources from the file-system. This commit adds such a
functionality to kiwix-serve. Now, if during startup of kiwix-serve the
environment variable `KIWIX_SERVE_CUSTOMIZED_RESOURCES` is defined it is
assumed to point to a file where every line has the following format:
URL MIMETYPE RESOURCE_FILE_PATH
When a request is received by kiwix-serve and its URL matches any of the
URLs read from the customized resource file, then the resource data is
read from the respective file RESOURCE_FILE_PATH and served with
mime-type MIMETYPE.
Though this feature was introduced in order to facilitate the
development of the iframe-based content viewer, it can also be useful to
users who would like to customize the kiwix-serve front-end on their own
(without re-building all of kiwix-serve).
There is some overlap with a feature of the kiwix-compile-resources
script that also allows to override resources. The differences are:
1. The new way of customizing front-end resources has all such resources
listed in a text file and there is a single environment variable
from which the path of that file is read. kiwix-compile-resources
associates a separate environment variable with each resource.
2. The new way uses regular paths to identify a resource. The
kiwix-compile-resources method encodes the resource path by replacing
any non-alphanumeric characters (including the path separator) with
underscores (so that the resulting resource identifier can be used
to construct the name of the environment variable controlling that
resource).
3. The new method allows adding new front-end resources. The old method
only allows to modify existing resources.
4. The new method allows (actually requires) to specify the URL at which
the overriden resource should be served (similarly, the MIME-type can/must
be specified, too). The old method only allows to override the contents of
a resource.
5. The new method only allows to override front-end resources that are
served without any preprocessing by kiwix-serve at runtime. The old
method allows to override template resources as well (note that
internationalization/translation resources cannot be overriden using the
old method, either).
Now that ServerTest.searchResults is in a separate cpp file, there are
no reasons for hiding its test data definition inside the unit test
function.
The diff is much-much simpler if whitespace changes are ignored.
Unit-tests of search results in XML format should work the same way with
a server that would inject a taskbar into HTML responses. This small
change actually validates that taskbar injection is disabled for XML
responses.
The file starts now to be too long.
- Move testing of the search html result in `test/server_html_search.cpp`
- Move common code used to launch server and so
in `test/server_testing_tools.h'
This is mainly code move with a small change:
Instead of setting the default PORT (8001) as a const int in the
`ServerTest` class, we now use SERVER_PORT.
SERVER_PORT must be defined before include `server_testing_tools.h`.
This allow several test to be run in parallele without trying to open
the same port.
If we keep a reference to a `Reader` it is better to (share) owning
the reference. Else the reader may be deleted after we create the searcher.
This is especially the case now we are creating the `Reader` at demand
and we don't store it in the library's cache.
We have to reuse the query the user give us to generate the
pagination links.
At search result rendering step we don't have access to the query object.
The best place to know which arguments are used to select books
(and so which arguments to keep in the pagination links) is when we
parse the query to select books.
Fix tests (pagination links) with book selector other than "books.id="
(pattern=jazz&books.query.lang=eng)
libzim's search is not thread safe (mainly because xapian is not).
So we must protect our search objects from multi thread calls.
The best way to do this is to associate a mutex to the `zim::Searcher`
and lock the searcher each time we access object derivated from the
searcher (search, results, iterator, ...)
When ConcurrentCache store a shared_ptr we may have shared_ptr in used
while the ConcurrentCache has drop it.
When we "recreate" a value to put in the cache, we don't want to recreate
it, but copying the shared_ptr in use.
To do so we use a (unlimited) store of weak_ptr (aka `WeakStore`)
Every created shared_ptr added to the cache has a weak_ptr ref also stored
in the WeakStore, and we check the WeakStore before creating the value.
The prefix will be used to parse a "query to select book" in different context.
For now we have only one context : selecting books for the catalog search.
But we will want to select books to do fulltext search on them
(will be done in later commit)
- Throw a exception if we cannot extract from string.
(We throw the same exception as `std::sto*`)
- Add a specialization to extract string from string
- Add some unit test
Xapian version 1.4.18 contains a bug in snippet generation caused by
incorrect handling of stemming.
The test-point with a search pattern "beatles" produced snippets with no
highlights of the search term. Debugging showed that the search pattern
"beatles" was transformed to a search term "beatl" which then didn't
match the word "beatles" in the text from which a snippet had to be
extracted.
The test case passed on my development machine as well as for most CI
configurations. However the "Packages / build-deb (ubuntu-bionic)"
variant failed because of a slightly different handling of punctuation
at the snippet boundaries:
Test context:
url: /ROOT/search?pattern=beatles&content=zimfile
actual snippet: ...side "Yellow Submarine" ...........
expected snippet: ...-side "Yellow Submarine" ...........
Above mismatch resulted in a looser comparison of the snippet contents
and failed the requirement that the snippet MUST contain highlights
(this is how the said bug in Xapian was discovered).
An attempt to change the search pattern to "field" didn't eliminate the
problem. Despite the search pattern itself being in singular form (i.e.
identical to its stemmed version) the plural form "fields" in the
snippet was still not highlighted.
Using for a search pattern an adjective instead of a noun achieved the
desired outcome.
The "expected" snippets in the test data must be a union of all possible
snippets produced at runtime for a given (document, search terms) pair
on all platforms of interest:
- Overlapping snippets must be properly merged
- Non-overlapping snippets can be joined with a " ... " in between.
This is a preliminary implementation checking only the following
cases:
- no search results
- all search results fitting on a single page
The second test-case fails because of a bug in search renderer (leading
to the pagination footer being pointlessly enabled). Will fix it in the
next commit.
Taskbar injected by a server adds distraction to unit-tests focusing
on the HTML contents of the returned pages. The new test-suite
TaskbarlessServerTest will have taskbar disabled.
In #727 inline CSS [was extracted](e4a4b2f961)
from `static/templates/no_search_result.html` into a separate stylesheet
resource. The purpose was to later
1. get rid of the custom `static/templates/no_search_result.html` error
template and use a general purpose error template instead (this was
accomplished by PR #744).
2. deduplicate the CSS code between `static/templates/no_search_result.html` and
`static/templates/search_result.html` by making the latter to also refer to
an internal CSS resource rather than containing inline stylesheet code.
While preparing to implement the 2nd point, I figured out that
`kiwix::SearchRenderer` is used as a component in `kiwix-desktop` too,
which probably would be upset by a link to a libkiwix's internal CSS resource.
This commit documents that finding.
Revert to the plain old 'i18n_resources_list.txt' file.
Auto discovering of i18n file has a main flaw (and a small bug):
- The main flaw is that rerun the configure will not detect new
translation files. It means that if we use cache in our CI,
new translation will not be included.
- The bug is that on Windows, meson fails with a error about a non existent
`` (empty) file name. I suppose it is because python replace
`\n` by `\r\n` on Windows, and the the `.strip().split('\n')` keeps empty
lines.
The small bug could be fixed, but the main flaw make the whole better if
we use a script to generate the listing.
This commit is somehow a half revert of 2eff5b55a6
Previously, on clicking Magnet, we were redirecting to a different site:
https://download.kiwix.org/zim/other/xyzBookWithDate.zim.magnet
This had the real magnet link as page content
Now we use the real magnet link in the href, thus not redirecting and starting the download right away.
Fix#767
Excluding qqq.json any .json file under static/i18n is now considered to
be a i18n resource. This eliminates the need to update the
i18n_resources_list.txt file every time a new language json file is
added. Thus Translatewiki PRs will not require extra work.
Now the whole content of a resource is preprocessed with a single
invocation of `re.sub()` rather than line-by-line.
Also, the function `get_preprocessed_resource()` returns a single value
rather than a (preprocessed_content, modification_count) pair; the
situation when the preprocessed resource is identical to the source
version is signalled by a return value of None.
The cache-id of resources now includes dependency information. This commit
illustrates that property with the changed cache-id of skin/index.js which
depends on skin/{download,hash,magnet,bittorent}.png.
The implementation is not fool-proof - cyclic dependency between
resources is not detected and will lead to infinite recursion.
The current implementation of resource preprocessing contains a bug
(with respect to the problem that it tries to solve): it doesn't take
into account the dependence of static resources on each other. If
resource A refers to B and B refers to C, then a change in C would
result in its cache id being updated in the preprocessed version of B.
However the cache id of B won't change since the cache id is derived
from the source rather than from the preprocessed output.
This commit is the first step towards addressing the described issue.
Now cache-id of a resource is computed on demand rather than precomputed
for all resources. The only thing remaining is to compute the cache-id
from the preprocessed content.
If during an earlier build a resource was symlinked in the build
directory (because it wasn't modified by preprocessing) and later
changes are made to the resource that result in its preprocessing no
longer being a no-op, then the preprocessing is performed (in place) on
the original resource directly (via the symlink). Therefore any symlinks
must be removed before preprocessing a resource.
kiwix-resources preprocesses all resources rather than only templates. At
this point this doesn't change anything since only (some) template resources
contain KIWIXCACHEID placeholders. But this enhancement opens the door
to the preprocessing of static/skin/index.js (after preprocessing is
able to handle relative links, which comes in the next commit).
The story of search_results.css
static/skin/search_results.css was extracted from
static/templates/no_search_result.html before the latter was dropped.
static/templates/no_search_result.html in turn seems to be a copied and
edited version of static/templates/search_result.html.
In the context of exploratory work on the internationalization of
kiwix-serve (PR #679) I noticed duplication of inline CSS across those
two templates and intended to eliminated it. That goal was not fully
accomplished (static/templates/search_result.html remained untouched)
because by that time PR #679 grew too big and the efforts were diverted
into splitting it into smaller ones. Thus search_results.css slipped
into one of those small PRs, without making much sense because nothing
really justifies preserving custom CSS in the "Fulltext search unavailable"
error page.
At the same time, it served as the only case where a link to a cacheable
resource is generated in C++ code (rather than found in a template).
This poses certain problems to the handling of cache-ids. A workaround
is to expel the URL into a template so that it is processed by
`kiwix-resources`. This commit merely demonstrates that solution. But
whether it should be preserved (or rather the "Fulltext search
unavailable" page should be deprived of CSS) is questionable.
In template resources (found under static/templates), strings of the
form "PATH/TO/STATIC/RESOURCE?KIWIXCACHEID" are expanded into
"PATH/TO/STATIC/RESOURCE?cacheid=CACHEIDVAL" where CACHEIDVAL is a
8-digit hexadecimal hash digest of the file at
static/PATH/TO/STATIC/RESOURCE.
Introduced a new unit-test which will ensure that static resources of
kiwix-serve have the cache ids applied to them in the links embedded into
the HTML code.
At this point there are no cache ids. The new unit-test will help to
visualize how they come into existence.
In the absence of the "userlang" query parameter in the URL, the value
of the "Accept-Language" header is used. However, it is assumed that
"Accept-Language" specifies a single language (rather than a comma
separated list of languages possibly weighted with quality values).
Example:
Accept-Language: fr
// should work
Accept-Language: fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5
// The requested language will be considered to be
// "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5".
// The i18n code will fail to find resources for such a language
// and will use the default "en" instead.
At this point a potential issue has been revealed. Now we produce
the final HTML via 2-level template expansion
1. Render parameterized messages
2. Render the HTML template
In which templates we should use double mustache "{{}}" (HTML-escaping)
tags and where we may use triple mustache "{{{}}}" (non-escaping) tags?
Introduced a new resource compiler script kiwix-compile-i18n that
processes i18n string data stored in JSON files and generates sorted C++
tables of string keys and values for all languages.
The "Fulltext search unavailable" error page is now generated using the
static/templates/error.html template. Also added two test cases checking
that error page.
Currently an internal server error can be triggered by accessing an
entry belonging to a redirect loop. The ZIM file (test/data/poor.zim)
containing such a loop was copied from openzim/zim-tools repository
(test/data/zimfiles/poor.zim).
404.html no longer contains anything specific to the 404 error and will
henceforth serve (with some enhancements) as a general purpose error
page template.
It is better to directly try to get the `Search` from the cache instead
of getting the `Searcher` first which could be useless in Search already
exist.
SearchInfo is a small helper structure to store information about the
queried search. It regroup already existing information (`patternString`,
geo query, ...) in one structure.
It is also used as key in the cache instead of using a generated string.
Instead of passing the `bookName` and `bookTitle` parameters to
`Response::build_404()`, `withTaskbarInfo()` is applied to its result
when needed. Note, that in `InternalServer::handle_raw()`
`withTaskbarInfo()` was not utilized since the results of the `/raw`
endpoint are not supposed to be decorated with a taskbar.
This was done in preparation for removing the `bookName` and `bookTitle`
parameters from `Response::build_404()`, but since the new function
could already be put to some use in this commit that was done too.
Before this change the meaning of test data in the ServerTest.404WithBodyTesting
unit test entirely depended on the number of entries:
- 2 entries: url, expected body
- 4 entries: url, book name, book title and expected body
This was fragile and non scalable (if other combinations of expected
response data are needed).
This commit defines a mini-DSL taking advantage of operator overloading
that allows to define test data in a robust way with the help of the compiler.
Some code in `TestContentIn404HtmlResponse` is obsoleted by this change
however it is not removed in this commit so that the change is easier to
understand. That will be done next.
Previously, the seachURL was not encoded.
This resulted in an XSS vulnerability, a concept of proof is:
start kiwix-serve
visit - http://192.168.18.1:8081/"><svg onload="alert(1)">
This would display an alert message.
This encodes the searchURL before passing it to searchSuggestionHtml
We create a cache for SuggestionSearcher very similar to that of FT
searcher. User can specify a custom cache size using the environment
variable SUGGESTION_SEARCHER_CACHE_SIZE. It has a default value of 10%
of the number of books in the library.
We use the new cache template to implement two kind of cache.
1: The Searcher cache is more general in terms of its usage. A Searcher
can be used for multiple searches without much change to itself. We
try to retrieve the searcher and perform searches using it whenever
possible, and if not we put a searcher into the cache. User can
specify a custom cache length by manipulating the environment
variable SEARCHER_CACHE_SIZE. It's default value is 10% of all the
books available.
2: The search cache is much more restricted in terms of usage. It's main
purpose is to avoid re-searching on the searcher during page changes
to generate SearchResultSet of various ranges. User can specify a
custom cache length using the environment variable SEARCH_CACHE_SIZE
with a default value of 2;
Adds a std::map<std::string, std::string> with display names for language codes not given by libicu
Fault language codes are taken from library.kiwix.org
The `ServerTest.RandomOnNonExistentBook` unit test was replaced with a
more general one testing multiple 404 scenarios where the content of the
body is checked too.
This test was introduced with the purpose of testing the error message
in the 404 page returned by /random for a non-existent book. The actual
expected output currently present in this new unit-test is too much for
that purpose and may become a maintenance burden if more tests of that
kind are added.
Packages/build-deb CI flows failed on ubuntu-bionic and ubuntu-focal
with the following mismatch in the ServerTest.suggestions unit-test:
```
[ RUN ] ServerTest.suggestions
../test/server.cpp:715: Failure
Expected equality of these values:
r->body
Which is: ...
removeEOLWhitespaceMarkers(expectedResponse)
Which is: ...
With diff:
@@ -2,5 +2,5 @@
{
\"value\" : \"Ray (movie)\",
- \"label\" : \"Ray (<b>movie</b>...\",
+ \"label\" : \"Ray (<b>movie</b>)\",
\"kind\" : \"path\"
, \"path\" : \"A/Ray_(movie)\"
Test context:
url: /ROOT/suggest?content=zimfile&term=movie
```
For some reason (probably, a bug), the implementation of
`Xapian::MSet::snippet()` on those platforms decided that a single closing
parenthesis is more than is appropriate for inclusion in the snippet and
replaced it with a (longer) ellipsis.
Taking advantage of the necessity to work around that bug, the
ServerTest.suggestions's functional coverage was enhanced - the
problematic test point was replaced with a new one using a phrase
instead of a single term.
Earlier querySelector for download button was returning a div, on which we called the getAttribute function hence returning null
This now returns a <span> element which returns the correct link with getAttribute
Our libzim packages are "7.2.0~focal" but the ~ means that "7.2.0" is greater than
"7.2.0~focal" so the dependency can't be satisfied. Depending on "7.2.0~" will
allow "7.2.0~focal" to satisfy the dependency.
As we still create a `Reader` in the deprecated code of `Library`,
we need a way to create a reader without raising a deprecated warning.
So we create a another constructor with a dummy argument and we use it.
As the `Entry` is still created by `Reader` we need a way to create a
entry without raising a deprecated warning.
To do so we create a second constructor with a dummy argument.
This second constructor is private and is not marked as deprecated so we
can use it.
The HumanReadableId can contains special char (`&`/`=`/...)
As it is used as to create a url in the opds template,
we must url encode it.
- We don't need to encode the book id as it is a uuid, it never contains
special char.
- We don't need to encode the book url as it is read from the library and
the url must already be correctly encoded in the library.xml.
(tests modified accordingly)
kiwix::fileExists only checks for file existence now
kiwix::fileReadable will check if the file is readable (implicitly checking for file existence also)
As the name suggests it, this endpoint is not smart :
It returns the content as it is and only if it is present
(no compatibility or whatever).
The only "smart" thing is to return a redirect if the entry is a redirect.
As we render the entry's xml in a separated steps, we need to pass the
rootLocation to all the internal rendering.
Testing with and without root is not so easy.
I've simply made all server tests using a ROOT prefix.
We can assume that if the ROOT is present everywhere we need it, it will not
when we don't need. (As long as we don't hardcode "ROOT" in the server.)
At least on Retina Macbook Pros but most likely on other configurations,
the viewport's sizes is not exactly consistent to integer.
For instance, on a maximized Firefox, document.body.offsetHeight is 1,600.
When looking at the <html> on the inspector, I'd get 1,599.6, so **roughly** the same
but not exactly. Those inconsistencies are present on every level so being too strict
about those is probably not adequate.
This fixes#603 but allowing a 2% margin on the scroll position
to match the _end of screen_ and thus trigger the loading of additional cards.
This means that for the example above, it triggers at 1,568 instead of never reaching 1,600.
2% might be too large but it seems safe considering the potential of various resolutions
we may encounter and I don't see any side effect.
As a result of this clean-up the /suggest endpoint too stopped
generating confusing 404 Not Found errors (which, like in /meta's case
is not that important). Another functional change is that the "term"
parameter became optional.
Before this fix the /meta endpoint could return a 404 Not Found page
saying
The requested URL "/meta" was not found on this server.
Error cases producing such a result were:
- `/meta?content=NON-EXISTING-BOOK&name=metaname`
- `/meta?content=book&name=BAD-META-NAME`
Now a proper message is shown for each of those cases.
This fix is being done just for consistency (the /meta endpoint is not
a user-facing one and the scripts don't bother about error texts).
Now Response::build_404() takes the URL instead of the entire
RequestContext object. An empty url suppresses the
The requested URL "url" was not found on this server.
part of the error text.
Before this fix the /random endpoint could return a 404 Not Found page
saying
The requested URL "/random" was not found on this server.
Error cases producing such a result were:
- `/random?content=NON-EXISTING-BOOK` (can happen when a server is
restarted or the library is reloaded and the current book is no longer
available).
- Failure of the libkiwix routine for picking a random article.
Now a proper message is shown for each of those cases.
Library became thread-safe with the exception of `getBookById()`
and `getBookByPath()` methods - thread safety in those accessors is
rendered meaningless by their return type (they return a reference
to a book which can be removed any time later by another thread).
Introducing a mutex in `Library` necessitates manually implementing the
move constructor and assignment operator. It's better to still delegate
that work to the compiler to eliminate any possibility of bugs when new
data members are added to `Library`. The trick is to move the data into
an auxiliary class `LibraryBase` and derive `Library` from it.
Originally `LibraryManipulator` was an abstract class completely decoupled
from `Library`. Its `addBookToLibrary()` and `addBookmarkToLibrary()`
methods could be defined in an arbitrary way. Now `LibraryManipulator` has to be
bound to a library object, those methods are no longer virtual, they always
update the library and allow for some additional actions via virtual
functions `bookWasAddedToLibrary()` and `bookmarkWasAddedToLibrary()`.
Book.updateTest is going to be modified so that it relies on
functionality tested by Book.updateFromXMLTest. Hence the order of the
tests better reflect that dependency.
Deduplicated the mustache templates static/templates/catalog_v2_entries.xml
and static/templates/catalog_v2_complete_entry.xml (the latter was
renamed to static/templates/catalog_v2_entry.xml).
This will allow handle_suggest API to accept two arguments `start` and
`suggestionLength` that will allow handle_suggest to retrieve
suggestions in the given range rather than the default 0-10 range.
With this, we eventually want to see the usage of getResults giving
a FAILING TEST. This happens because the second argument to
getResults is NOT `end` of the range, but `maxResultCount` to retrieve.
This will be fixed in the next commit.
Language code to human friendly name translation is now done with the
help of the ICU library. It works if the line
```
-include $(LANGSRCDIR)/resfiles.mk
```
in the file `source/data/Makefile.in` of the icu4c dependency is not
commented out. Currently, the said line is commented out (along with
some other include's) by the `icu4c_custom_data.patch` patch of the
`kiwix-build` tool.
Introduces a new member mp_search that houses the zim::Search object,
adds a new constructor for this purpose. This commit also add an
overload for getHtml that takes start and end integers as arguments
since they are not part of the search object we include.
With openzim/libzim#540 we now have a new function to get
illustration(previously favicon in 48x48 size and unity scale) in
multiple sizes. We need to replace getFaviconEntry with this new
getIllustrationItem method.
This changes the output of `/catalog/search` as follows:
- Entire search query (rather than only the value of the `q` parameter)
is put in the <title> node.
- Search performed with an empty query presents itself as "All zims".
- The feed id remains stable for identical searches on the same
library.
/catalog/v2/entries is intended to play the combined role of
/catalog/root.xml and /catalog/search of the old OPDS API. Currently,
the latter role is not yet implemented.
Implementation note: instead of tweaking and reusing
`OPDSDumper::dumpOPDSFeed()`, the generation of the OPDS feed is done via `mustache`
and a new template `static/catalog_v2_entries.xml`.
Note: This commit somewhat relaxes validation of non variable
`<updated>` elements in the OPDS feed - the contents of any `<updated>`
element is replaced with the YYYY-MM-DDThh:mm:ssZ placeholder.
Each sugestions used to be stored as vector of strings to hold various values
such as title, path etc inside them. With this commit, we use the new
dedicated class `SuggestionItem` to do the same.
With openzim/libzim#545 we now support snippet generation of titles
which can be used as the display label on the ui for highlighted titles
via the "label" field.
The old version used plain title which is still available in the value
field.
In certain pages like the search result page, bookName is not of the
form `/bookName/endpoint?parameters`. Rather it is available as a query
parameter. From these pages bookName should be assigned from parameters.
After switching to Xapian-based search in the library/catalog, an empty
query stopped acting as a match-all query. This commit restores the old
behaviour in that regard.
Returning status code 204 in case of an empty results doesn't show the
empty results page as described in #466. Reverting the changes in #396
fixes the issue.
The existing example.zim has an improper main page entry hence an
unusable index as reported in openzim/libzim#521
To replace the buggy zim, a new zim has been generated using the latest
zimwriterfs with latest libzim.
-------------------------------------------------------------------------
The directory used to create zim is given below and are two pages from
wikibooks site:
htmlContent
├── favicon.png
├── FreedomBox for Communities_Offline Wikipedia - Wikibooks, open books for an open world_files
│ ├── index.php
│ ├── load.php
│ ├── poweredby_mediawiki_88x31.png
│ └── wikimedia-button.png
├── FreedomBox for Communities_Offline Wikipedia - Wikibooks, open books for an open world.html
├── Wikibooks_files
│ ├── 234px-Megakaryocyte1.svg.png
│ ├── 287px-ChewyGingerCookies.jpg
│ ├── 36px-Commons-logo.svg.png
│ ├── 40px-Wikiquote-logo.svg.png
│ ├── 41px-Wikispecies-logo.svg.png
│ ├── 46px-Wikisource-logo.svg.png
│ ├── 48px-MediaWiki-2020-icon.svg.png
│ ├── 48px-Phacility_phabricator_logo.png
│ ├── 48px-Wikimedia_Cloud_Services_logo.svg.png
│ ├── 48px-Wikimedia_Community_Logo.svg.png
│ ├── 48px-Wikivoyage-Logo-v3-icon.svg.png
│ ├── 51px-Wiktionary-logo.svg.png
│ ├── 53px-Wikipedia-logo-v2.svg.png
│ ├── 59px-Wikiversity_logo_2017.svg.png
│ ├── 86px-Wikidata-logo.svg.png
│ ├── 88px-Wikinews-logo.svg.png
│ ├── Haskell-logo.png
│ ├── index.php
│ ├── load.php
│ ├── poweredby_mediawiki_88x31.png
│ └── wikimedia-button.png
└── Wikibooks.html
The command for writing the zim:
$ zimwriterfs --welcome=Wikibooks.html --favicon=favicon.png --language=en --title=Wikibooks --description=testZim --creator=test --publisher=test --verbose ./htmlContent ./out/example.zim
Catalog filtering should now be case/diacritics insensitive for all
fields. However it is not validated for language, name and category
fields, and is validated for tags, creator & publisher only for text
supplied in the filter (but not for values read from the book).
Catalog filtering by titles/description was sensitive to diacritics
present in the query string. Fixed that.
Also enhanced the unit test to validate the insensitivity to diacritics
present in either the title/description or the query string.
This change fixes the failure of the LibraryTest.filterByPublisher
unit-test broken by the previous commit.
The previous approach used in `publisherQuery()` for building a phrase
query enforcing the specified prefix for all terms fails if
1. the input phrase contains a non-word term that Xapian's query parser
doesn't like (e.g. a standalone ampersand character, 1/2, a#1, etc);
2. the input phrase contains at least three terms that Xapian's query
parser has no issue with.
Using the `quest` tool (coming with xapian-tools under Ubuntu) the
issue can be demonstrated as follows:
```
$ quest -o phrase -d some_xapian_db "Energy & security"
Parsed Query: Query((energy@1 PHRASE 11 Zsecur@2))
Exactly 0 matches
MSet:
$ quest -o phrase -d some_xapian_db "Energy & security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
$ quest -o phrase -d some_xapian_db 'Energy 1/2 security act'
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
$ quest -o phrase -d some_xapian_db "Energy a#1 security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
```
The problem comes from parsing the query with the default operation set
to `OP_PHRASE` (exemplified by the `-o phrase` option in above
invocations of `quest`). A workaround is to parse the phrase with a
default operation of `OP_OR` and then combine all the terms with
`OP_PHRASE`.
Besides stemming should be disabled in order to target an exact phrase
match (save for the non-word terms, if any, that are ignored by the
query parser).
Moved the `filter.hasQuery()` check inside `buildXapianQuery()`.
`Library::filterViaBookDB()` only cares if the query that is going to be
run on the book DB would match all documents. The rest of changes
related to enhancing the usage of Xapian for the catalog search will
happen inside `buildXapianQuery()` and `updateBookDB()`.
Now the LibraryTest.filterCheck unit-test validates the actual entries
returned by `Library::filter` (previously only the count of the results
was checked).
The new unit-test fails with a reason not expected before it was
written. The `Library::filter()` operation returns a correct result
after the call to `removeBookById()` (this was a surprise!) but it has
a side-effect of re-adding an empty book with the id still surviving
in the search DB (the emptiness of this re-created book doesn't allow
it to pass the other filtering criteria, which explains why the result
of `Library::filter()` is correct). Had to add a special check
to the new unit-test against that hidden side-effect of
`Library::removeBookById()` + `Library::filter()` combination.
Language code is converted from ISO 639-3 to ISO 639 (which is
understood by Xapian) via ICU. The previous approach via an explicit
map had its advantages since Xapian has more than one stemmer
implementations for some languages (selectable via Xapian-specific
identifiers). This commit relies on the defaults associated with the
ISO 639 language codes.
The search text in the catalog query is interpreted as partial by
default, but partial query mode can be disabled in C++. The latter
possibility is not exposed via the /catalog/search kiwix-serve endpoint,
though.
1. Get the subset of books matching the q (title/description) parameter
of the search
2. Filter out books not matching the other parameters of the search.
Stage 1. currently works in the old way, but will be replaced by Xapian
based search in subsequent commits.
Since the `value` field of the search suggestion results is HTML
escaped/encoded in the backend (see static/templates/suggestion.json) it
must be decoded in the frontend.
The kiwixlib java wrapper unit test can be run manually via the
src/wrapper/java/org/kiwix/testing/compile_test.sh script.
The test ZIM files in src/wrapper/java/org/kiwix/testing were created
using the create_test_zimfiles. They must be updated/re-generated and
committed in git whenever their source data or the create_test_zimfiles
script changes. Note: small.zim.embedded is not used at this point, it
was created for testing the enhancement coming in a few commits.
libzim were dependent of zlib and we were (wrongly) using its
dependency declaration to link with zlib.
Now that libzim doesn't depends on zlib, we need to fix our build system
and explicitly depend on it.
taskbar is placed *above* content using a `padding-top: 3em;` rule
Currently, in regular case, padding-top is too large and leaves ~4/5px between the
taskbar and the content.
This fixes it by using a `calc()` rule to eliminate this extra space
Mimetype may contain a parameters.
Then, the mimetype would be something like "text/html;foo=bar;foz=baz"
It will contains a `;` and `=` and it conflicts with the same operators
we use to separate the items in our list.
We have to use a more advanced algorithm which takes the context into
account.
Fix#416
Use a heap allocated buffer (with lifetime of Aria2 class) instead of
a stack allocated one.
Original fix made by @ZaWertun. Kudos to him.
Fix #kiwix/kiwix-desktop#123, kiwix/kiwix-desktop#513
and kiwix/kiwix-desktop#423
WaitingThread read some shared memory with the SubProcess
(`mutex`, `m_running`).
When we destroy the SubProcess, we must be sure that WaitingThread has
correctly finished else we may have invalid read/write on freed memory.
warc2zim's service worker captures all requests and then decides what to do based on
availability of the URL in the ZIM or not.
To allow the external URL blocking mechanism, it needs to known whether this was
enabled or not (as those in-iframe links won't be caught).
It detects this by looking for the `window.block_path` variable that is set in the
`block_external.js` script.
As this is fragile, we're adding a comment so that a future maintainer knows that
a third party tools relies on it.
On the CI, the native_dyn docker image is setup with a packaged version
on libmicrohttpd for which `MHD_HTTP_RANGE_NOT_SATISFIABLE` is not
defined.
When the CI will be fixed, we can revert this commit.
Android clang complains about the fact it cannot move the
`std::unique_ptr<ContentResponse>` into a `std::unique_ptr<Response>&&`
(for the implicit `std::unique_ptr<Response>` constructor).
Let's help him a bit.
This is only an "interface" for now as other type of response (entry) may
be "transformed" to a ContentResponse.
We cannot move all the code in the class.
While [spec](https://wiki.openzim.org/wiki/Metadata#Favicon) says that the favicon
should be a 48x48 image, ZIM creators might not respect it.
If a ZIM contains a larger favicon, the UI is broken. This fixes it ans ensures all
favicon have equal sizes, removing the unpleasing lack of harmony that we can see sometimes.
Note that ZIM will smaller size favicon would get blurry as those would be upscaled.
With #403, the article mimetype may be different than "text/html".
It can also be "text/html; raw=true".
(And in fact it already could have any kind of optional argument).
The response detect if taskbar must be added depending of the mimetype.
Now, `set_taskbar` can be call unconditionally
(no need to check for the mimetype)
And we don't need to call set_taskbar if we have no information to set.
Some HTML articles are meant to be displayed through a viewer. In this case,
we know we don't want the server to inject the taskbar nor the link blocker
because the content is not a user-ready web page but a partial element of it.
Such articles still need to be `text/html` to be parsed properly by browsers.
This changes the way we decide to display the tasbar or not.
Previously, we were adding it to every article with a MIME __starting with__ `text/html`.
Now, we're additionally preventing it on `text/html` MIME if there is a `;raw=true` string inside.
This leaves articles with MIME `text/html;raw=true` (warc2zim convention) outside
of the taskbar target.
For similar reasons, the external-link blocker is set to apply to the same set of articles.
Previously, it was applied to all articles which was an (unoticable) mistake.
Originally reported against case sensitivity of the Range header
(see issue #387), this fix applies to all request headers (since
according to RFC 7230 all header fields are case-insensitive, see
https://tools.ietf.org/html/rfc7230#section-3.2). However, a
corresponding unit-test was added only for the Range header.
* Switch to debhelper compat 13
* Increase test timeout via `meson test`
* Depend on same version of libzim as meson.build specifies
* Restore mistakenly deleted libkiwix-dev.manpages
* We still need this file to instruct Debian to associate it with the
libkiwix-dev package.
Same as #372.
I originally missed this because meson didn't know about sh4 until
recently (c.f. https://github.com/mesonbuild/meson/pull/7359), but
this should work fine on older meson versions too.
Debian wants to have the source files for minified scripts. Also the
jqueryui.com downloader is broken, so if we ever needed to modify/fix the
library, it would be good to have a non-minified version on hand.
The non-minified version comes from
<https://github.com/components/jqueryui/blob/1.11.0/jquery-ui.js>, which
was the source for the now-deprecated bower package manager.
We can currently only build for Ubuntu versions because we're not yet
publishing libzim for Debian.
Development builds (on commits to master) will build against master libzim
while release builds (on tag pushes) will build against the most recent
release of libzim.
This define `VERSION` and may conflict with dependent projects.
If some want to get the version of kiwix-lib they can include
`kiwix_config.h` directly.
Previous API were using an internal vector to store the suggestions search
results.
The new API takes a vector as out argument. So user can call the functions
without having to protect the search.
We should change the android API to reflect the change but it is a bit
more complex to do at JNI level. As android do not call it multithreaded
we are safe for now. And we need the new API asap for kiwix-desktop.
So we keep the same API on android for now, the new api will be made
in next version.
Some architectures, specifically armel, mipsel, m68k & powerpc in
Debian, need to explicitly link to atomic.
Use meson to see if the target's CPU family is one of those, and if so,
pass -latomic to the linker.
Tested on armel and mipsel machines to verify passing -latomic works, and
on armhf and amd64 to ensure normal builds aren't broken.
Fixes#371.
The Debian/Ubuntu package for mustache.hpp installs it to
/usr/include/kainjow/mustache.hpp. Have meson look for it in that include
directory as well before erroring out.
Fixes#318.
On windows, `httplib.h` must be included before `windows.h`
We do not include directly `windows.h` in the test but it is included
indirectly by other headers.
Let's include httplib first.
libmicrohttpd handles HEAD requests by dropping the body of the response
(if any). Hence letting a HEAD request through into the code that
processes GET requests is safe.
Also added server unit-tests related to the handling of HEAD requests.
`python` binary is not installed on all platform.
But `python3` is (because meson is python3).
And the script we launch is python3 so use the correct version.
Response::set_entry() was upgraded from a simple setter to a method
performing certain business logic that was previously taken care of by
InternalServer::handle_content().
This change failed to build under the following platforms
due to an older version of libmicrohttpd missing support for
the MHD_DAEMON_INFO_BIND_PORT query:
- Linux (native_dyn)
- Linux (win32_dyn)
Added a new unit test 'test/server.cpp'. The pre-existing unit test
test/kiwixserve.cpp tests kiwixlib indirectly by running the kiwix-serve
executable which may be built with another version of the library.
Included with this commit is the cpp-httplib C++ HTTP/HTTPS library
(https://github.com/yhirose/cpp-httplib, v0.5.11).
The new file test/httplib.h was copied from
https://raw.githubusercontent.com/yhirose/cpp-httplib/v0.5.11/httplib.h
This is surprising, but C++11 fstream doesn't have a constructor
that take wchar as path.
So, on windows, we cannot open a stream on a path containing non ascii
char. VC++ provide an extension for that, but it is not standard and
g++ mingwin doesn't provide it.
So move all our write/read tools function to the plain old c versions,
using _wopen to open wide path on windows.
We must use the wide version of the getenv to correctly handle the case
we have accents in the user directory.
This also change the default dataDirectory on windows from $APPDATA to
$APPDATA/kiwix.
Fixed a regression introduced in block-external-links feature.
For cleaner source, the taskbar (and the block-external JS file) were both
attached to `<head>\n`.
Unfortunately, this isn't safe enough as some ZIM files might have all kinds of HTML
syntax. Sotoki for instance have no CR after head, rendering the attachment impossible.
Note: realizing this method is somehow fragile as any HTML content with extra attribute
on the `<head>` tag or without a `<head>` tag would break the taskbar and the block external feature.
Mustache 4 was release in late october 2019 and is not compatible with kiwix-lib as is.
Replacement in taskbar is messed-up and breaks suggestions.
Until this is fixed/adapted, inform users to use version 3.
if a link contains nested elements like `<a href="http://somewhere"><strong>goto</strong></a>`
then the link is trigger by the `<strong />` element which has no `href` attribute.
We were thus releasing the event in this case, resulting in legitimate external links
not blocked.
We are now looking for the closest `<a />` parent (might be self) to retrieve the `href`
attribute and capture if necessary.
- `setBlockExternalLinks()` on server
- zero-dependency JS code
- JS script added in `inject_externallinks_blocker()`
- changed URL to `/catch/external?source=<source>`
In many use cases, it is not wanted to have user accidentaly click on external links
and leave the served ZIM content.
This could be because the result is unpredictible (reader not implementing this properly)
or because the serve user knows there's no backup internet connexion or because there is
an induced cost behind external links that doesn't affect served content.
using a new flag (`blockExternalLinks`) on `Response`/`setTaskBar`, a piece of JS code
is injected into the taskbar code.
This code adds a JS handler on all link click events and verifies the destination.
If the destination appears to be an external link (1), the link target is changed to
a specific URL:
```
/external?source=<original_uri>
```
(1) external is a link that's not on the same origin and starts with either `http:` `https:` or `//`.
Server implements a new handler on `/external` that displays a new page (`captured_external.html`)
which returns a generic message explaining the situation and offering to click on the link
again should the user really want to.
This is done by specifically asking `set_taskbar` to not block external requests on that page.
This approach allows integrators using a reverse proxy to handle that endpoint differently (rebrand it)
1. `Server` now has an `m_blockExternalLinks` defaulting to `false`
1. `Server.setTaskbar` is extended to support an additional bool to set the variable.
1. `Response` now has an `m_blockExternalLinks`
1. `Response` constr expects an additional bool for `blockExternalLinks`.
1. `Response.set_taskbar` is extended to support an additional bool to set the variable.
1. JNI/Java Wrapper reflects the extensions.
1. New resource file `templates/block_external.js` (included in head_part). Should it be in skin?
1. New resource file `templates/captured_external.html` for `handle_captured_external()`
1. Added a comment on `head_part.html` to help with JS insertion at the right place
1. `introduce_taskbar()` conditionnaly inserts the JS inside the taskbar
We must correctly quote path with space on windows.
This is needed as we can't launch command using a array of string on
windows but by giving only one string using space as separator.
Fixkiwix/kiwix-desktop#268
No real change. Reordering setting and dumping of attribute in the same
order (mostly) they are declared in book.h make it easier to detect missing
attribute.
Downloader::startDownload has a new parameter option which is a vector of
pair that represents the options that can be set for adding a uri with aria2
with the function Aria2::addUri.
Aria2::addUri uses this parameter to set the struct of parameters for the
aria2 command
This mainly use the "new memory system". No need to call dispose function.
Rename the class to Library to conform with the naming semantics
(JNIKiwix* use old memory system)
The JNIKiwixManager is used to manage (insertion of book in) the library.
It is created, as needed, using an existing Library as input.
It is then used to add books, parse library.xml or opds content.
Then it can be destruct (and must be) with the `dispose` method.
```java
library = JNILibrary(...);
manager = JNIManager(library);
manager.parseOpds(opdscontent);
manager.dispose();
// library contains the books declared in the opds content.
// Use the library methods to get the books' info.
```
As we use the main library of gtest (if already installed)
we don't need to (and must not) define a main function for the tests.
Does the same (use main library) if we use the subproject.
All path must be utf8. This is already the case in all our project.
(If this not the case, this is a bug)
So we don't need to have a version with a native and utf8 path.
For some reason, `homebrew install gcovr` now install python3 as
dependency.
This is a dependency since a long time, I don't know why it was not
installing it before.
But the installation fails because it cannot link correctly python3
because python2 is already installed.
Then meson, when it tries to run the conversion script, fails because
the script cannot find python3.
The solution I've found is to unlink python2 and relink python3 to have
a correct installation of python3.
Set the different filter's fields only when we are requested to filter
them. Else, we ends to requests that some fields are empty.
If the request has no argument, we raise an exception (catched) and so
we don't set the corresponding field in the filter.
Fix#303
The default value of this parameter is false, in this case all the bookmarks
are returned, otherwise only those who are related to books of the library.
Api changes :
- removeLastPathElement do not takes extra arguments
`removePreSeparator` and `removePostSeparator`.
This is not needed as path do not need special tailing separator.
- Only one function `split`. Arguments can be implicitly convert to
string. No need for overloading functions to explicitly cast them.
- `split` function takes another argument `trimEmpty`. If true, empty
element are removed.
Path manipulation now almost pass trough a vector<string> to store each
path's part.
Most of the complex works is now made in the normalizeParts function.
There are two executable path :
- The user one (the appimage path)
- The real one (in the appimage archive)
When we search of `library.xml` we need the user one.
But when we search of `aria2c` or `kiwix-serve` we need the real one.
Fixkiwix/kiwix-desktop#256
If kiwix-desktop use a `library.xml` in the same directory than the
executable, we need to use it instead of the default one.
Instead of detect again the `library.xml` to use, let `kiwix-desktop` set
the library to use.
This also fix a issue when `/` is not a valid path separator in windows.
AppImage works by decompressing the "program" in a temporary directory.
So the executable path is not the path of the AppImage file.
By using the environment variables set by appimage we can find the correct
"path" of the executable.
Fixkiwix/kiwix-desktop#46
Android need to handle the redirection by doing a redirection in the web
view, not by providing the content of the targeted article.
This is already what we do in kiwix-serve or ios.
The API should be far better by returning a Entry but for now,
we just change the given url if the article is a redirection.
The server will be running some code on the behalf of the calling code.
We really don't what to crash the library (and the binary) because
of a wrong request.
This code is mainly copied from kiwix-tools.
But :
- Move all the response thing in a new class Response.
- This Response class is responsible to handle all the MHD_response
configuration. This way the server handle a global object and do
no call to MHD_response*
- Server uses a lot more the templating system with mustache.
There are still few regex operations (because we need to
change a content already existing).
- By default, the server serves the content using the id as name.
- Server creates a new Searcher per request. This way, we don't have
to protect the search for multi-thread and we can do several search
in the same time.
- search results are not cached, this will allow future improvement in the
search algorithm.
- the home page is not cached.
- Few more verbose information (number of request served, time spend to
respond to a request).
TOOD:
- Readd interface selection.
- Do Android wrapper.
- Remove KiwixServer (who use a external process).
-
This class is used to map an id (uuid) to a name (potentially human
readable).
This will be use by the server to or renderer to use a different "name"
than the id.
The default NameMapper provided use the id as a name.
Instead of having a single method `listBooksIds` that tries to be
exhaustive about all the filter and sort option, split the method in
two separated methods `filter` and `sort`.
The `filter` method takes a `Filter` object that represent on what we are
filtering. This object has to be construct before calling `filter`.
```cpp
Filter filter;
filter.query("Astring");
filter.acceptTags({"nopic"});
// return all book in eng and with "Astring" in the tile or description".
library.filter(filter);
//equivalent to
library.listBooksIds(ALL, UNSORTED, "Astring", "", "", "", {"nopic"});
// or better
library.filter(Filter().query("Astring").acceptTags({"nopic"}));
```
The method `listBooksIds` has been marked as deprecated.
Add a small test on the library.
The `exit` function will call the functions registered.
Qt may (and does) register function and may (and does) hangup waiting
for some mutex to be free.
Here we are in a forked process and we want to process to exit without
doing any cleanup (because we are a clone of a process that will continue
and we don't want to mess it).
This functions enable to stop and resume download with aria2.
The Downloader's constructor now checks the paused downloads with the
function "tellWaiting()" to get them at the start of kiwix-desktop.
Mingw doesn't implement it. So, we should not use it.
I suppose that it was working before because mingw package for debian trusty
simply no provides a "thread" header.
We may face to include the native "thread" header.
Rpath is not needed to run installed binaries. Most of GNU/Linux
distributions strictly forbid it.
Signed-off-by: Vitaly Zaitsev <vitaly@easycoding.org>
They are not used at all and the windows version need some extra libs
that complexify the code.
Remove them for now. If it happens that they are needed, we will readd
them.
We must use absolute path whenever possible.
Relative path has sense only related to the "interaction" with the user
(current directory, library location, ...).
The `common` name is from the time where kiwix was only one repository
for all the project (android, desktop, server...).
Now we have split the repositories and kiwix-lib is the "common" repo,
the "common" directory is somehow nonsense.
There is a bug in meson 0.43.0 about the option depend_files
(mesonbuild/meson#2633)
By using the `files('search_result.tmpl')`, we workaround the bug and
have everything working whatever the meson version is.
Mustache templating system is a bit simpler than ctpp2 and ctpp2 is no
more maintained (see #189).
We are moving to the kainjow's Mustache project
(https://github.com/kainjow/Mustache).
It simplify a lot our system has it is header only and we don't have to
precompile the template.
Fix#21
This feature is considered obsolete for a while.
In fact, it was already not supported since June 2018 as we were compiling
xapian without the chert backend support.
Assume that we don't support it and remove it from the code.
See kiwix/kiwix-tools#245
This is a API break. library.xml files will still work but the indexPath
and indexType will be dropped silently from the file.
The library now contains (simple) methods to handle bookmarks.
The bookmark are stored in a separate xml file.
Bookmark are mainly a couple (`zimId`, `articleUrl`).
However, in the xml we store a bit more data :
- The article's title (for display)
- The book's title, lang and date (for potential update of zim files)
By default, we are searching in the PATH env var.
However, with an appImage, the executable directory is not in the PATH,
so we have to use an absolute path if we can.
If we cannot find the aria2c executable in the executable directory let's
try to use the system one.
`reader.getFileSize()` return the size of the zim in Kbyte in a
`unsigned int` (32 bits). This is ok as it would overflow if the size
of the size is greater than 4294967295 kbytes (so ~4Tbytes).
However, we need to convert the return size into a unsigned 64 bits integer
else, when converting to bytes, we will overflow at 4Gbytes.
Even in `m_size` is a uint64_t.
When we do a search and paging the result, we need to display to the
user the total number of book, not only the `itemsPerPage`.
So, we need to parse correctly the xml to keep information of the total
number of book.
The value is store as a string in in the xml, so we cannot use getAsI.
We have to get the string and parse it to an int.
We cannot use strtoull because android stdc++ lib doesn't have it.
We have to implement our how parseFromString function using a
istringstream.
When parsing a opds feed, the favicon is a url, not a dataurl.
If we download the favicon all the times, it may take a lot of time to
parse the feed.
We store the url and download the favicon only when needed (when displayed)
- "winsock2.h" needs to be included before "windows.h". But if a
compilation unit include "windows.h" and after "networkTools.h", we
fails and it is complicated to handle. The include must not be in the
header but in the cpp
- windows define some ERROR macro. It is a pitty but we cannot use `ERROR`
in our enum.
- If build statically using mingw we need to define `CURL_STATICLIB`
Library client (kiwix-desktop) need to know when a book is added to
library by the manager. By using a LibraryManipulator, we can do
dependency injection.
It is up to the book to manage its attribute.
Also remove the `absolutePath` (and `indexAbsolutePath`). The `Book::path` is always stored
absolute.
The fact that the path can be stored absolute or relative in the
`library.xml` is not relevant for the book.
With jdk8, `javac` has an option `-h` to generate the header files of
native classes.
So there is no need to run `javah` several times.
As there is now only one command to run (`javac`), there is no need for
the wrapper script `gen_kiwix.sh`.
Fix#167
The correct path for xapian database should be "X/fulltext/xapian",
not "Z//fulltextIndex/xapian".
So lets check for the right path and fallback to the wrong one (but
used in old zims).
The double '/' in the path is a bug of zimwriterfs and is specific
to the xapian database.
We must handle this correctly in `hasFulltextIndex` and not (buggly) in
`pathExists`.
(Hopefully, it seems that pathExists were used only by hasFulltextIndex)
The previous API suffer different problems:
- It was difficult to handle articles redirecting to other article.
- It was not possible to get few information (title) without getting
the whole content.
The new API introduce the new class `Entry` that act as a proxy to an
article in the zim file.
Methods of `Reader` now return an `Entry` and the user has to call
`Entry`'s methods to get useful information.
No redirection is made explicitly.
If an entry is not found, an exception is raised instead of returning
an invalid `Entry`.
The common pattern to get the content of an entry become :
```
std::string content;
try {
auto entry = reader.getEntryFromPath(path);
entry = entry.getFinalEntry();
content = entry.getContent();
} catch (NoEntry& e) {
...
}
```
Older methods are keep (with the same behavior) but are marked as
deprecated.
The library's books are created in the metadata in the opds.
As the opds stream is by definition a distant "library", there is no
zim to read to complete missing information.
This can lead to incomplete `library.xml`.
The downloader is using libaria2.
For now, only one download can be run a the time.
A download will start only if (and as soon as) no download is running.
If we check the later the `loopCounter` with 42, we must pre-increment the
content. Else, in case of infinite loop, the `loopCounter` will be 43.
Related to kiwix/kiwix-tools#168
`libzim` will not search in zim file without embedded fulltext index.
If we don't want to mess up with result index, we must not store "wrong"
reader.
Fix#111
If the `nativeHandle` is null, the JNIKiwixReader is invalid and we must
not use it.
Throwing an exception for the caller code to handle this properly.
And previously, user code has no way to detect something went wrong :/
For binary content (not compressed), it could be interesting to
directly read the content in the zim file instead of using `kiwix-lib`.
This method returns the needed information to do so (if possible).
Even if we use the add_reader method to search into embedded full text
index, we need to specify the global `contentHumanReadableId` as it will
be used to generate "page links".
Add the jni method `getContentPart` to get only a part of the artcicle
content.
The method can be used to get a part of the content or to know the size
of the full content.
This is a major API break. User code will have to be rewritten.
Before this commit, API was a unique object wrapping the library and
handle a global state with one `Reader` and one `Writer` at the time.
Now, the API is axed around three main objects :
- The `JNIKiwixReader`, a wrapper around a `kiwix::Reader` (who allow to
read one zim)
- The `JNIKiwixSearcher`, a wrapper around a `kiwix::Searcher` (who allow
to search through one or more reader(s))
- The `JNIKiwixSearcher.Result` a result of a search. Allowing to get all
information about a result (title, url, content, snippet, ...)
This is a small quick and dirty API to do geo query.
It is not possible with this API to do a query search and a geo search.
It's either one or the other.
We should think about a better global API to do searching and provide
both of them in the same time (libzim does it).
70 is a too small limit for the number of results.
Users need at least 100.
As the html rendering will fails with more than 144 results,
explicitly limits the number of search to 140.
Fixeskiwix/kiwix-tools#92
Ctpp2 templates have a limit step number. If the template to render is
too big, the rendering fails, throwing an exception.
From our tests, it seems that, with the template we have, the default
step limit allow us to render 68 results only.
By doubling the limit, we can render up to 144 results.
2017-11-06 12:10:48 +01:00
267 changed files with 31620 additions and 6937 deletions
These tools should be packaged if you use a cutting edge operating
system. If not, have a look to the [Troubleshooting](#Troubleshooting)
section.
Compilation
-----------
Once all dependencies are installed, you can compile kiwix-lib with:
```
mkdir build
Once all dependencies are installed, you can compile the Libkiwix
with:
```bash
meson . build
cd build
ninja
ninja -C build
```
By default, it will compile dynamic linked libraries. If you want
statically linked libraries, you can add `--default-library=static`
option to the Meson command.
By default, it will compile dynamic linked libraries. All binary files
will be created in the `build` directory created automatically by
Meson. If you want statically linked libraries, you can add
`--default-library=static` option to the Meson command.
Depending of you system, `ninja` may be called `ninja-build`.
The android wrapper uses deprecated methods of libkiwix so it cannot be compiled
with `werror=true` (the default). So you must pass `-Dwerror=false` to meson:
```bash
meson . build -Dwrapper=android -Dwerror=false
ninja -C build
```
Testing
-------
To run the automated tests:
```bash
cd build
meson test
```
Installation
------------
If you want to install the libraries you just have compiled on your
system, here we go:
If you want to install the Libkiwix and the headers you just have
compiled on your system, here we go:
```bash
ninja -C build install
```
ninja install
You might need to run the command as root (or using `sudo`), depending
where you want to install the libraries. After the installation
succeeded, you may need to run `ldconfig` (as `root`).
Uninstallation
------------
If you want to uninstall the Kiwix library:
```bash
ninja -C build uninstall
```
Like for the installation, you might need to run the command as `root`
(or using `sudo`).
Troubleshooting
---------------
If you need to install Meson "manually":
```bash
virtualenv -p python3 ./ # Create virtualenv
source bin/activate # Activate the virtualenv
pip3 install meson # Install Meson
hash -r # Refresh bash paths
```
If you need to install Ninja "manually":
```bash
git clone git://github.com/ninja-build/ninja.git
cd ninja
git checkout release
./configure.py --bootstrap
mkdir ../bin
cp ninja ../bin
cd ..
```
You might need to run the command as root, depending where you want to
install the libraries.
Custom Index Page
-----------------
to use custom welcome page mention `customIndexPage` argument in `kiwix::internalServer()` or use `kiwix::server->setCustomIndexTemplate()`.
(note - while using custom html file please mention all external links as absolute path.)
to create a HTML template with custom JS you need to have a look at various OPDS based endpoints as mentioned [here](https://wiki.kiwix.org/wiki/OPDS) to load books.
To use JS provided by kiwix-serve you can use the following template to start with ->
methodCall.newParamValue().set(99);// max number of downloads to be returned, don't know how to set this properly assumed that there will not be more than 99 paused downloads.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.