Compare commits

326 Commits
0.1 ... 4.1.0

Author SHA1 Message Date
Matthieu Gautier
42b7692f9b Merge pull request #208 from kiwix/new_version
Version 4.1.0
2019-03-19 16:35:57 +01:00
Matthieu Gautier
cd654b9cae Version 4.1.0 2019-03-19 16:26:14 +01:00
Matthieu Gautier
15dafcaa80 Force use of meson 0.49.2 2019-03-19 15:27:03 +01:00
Matthieu Gautier
f71f2935e0 Merge pull request #204 from kiwix/library_filter_tag
Allow the library to be filtered by tags.
2019-03-07 17:19:31 +01:00
Matthieu Gautier
c6254d9504 Allow the library to be filtered by tags.
This adds an argument to `listBooksIds` to filter by tags.
So, this is an API break.
2019-03-07 17:08:39 +01:00
Matthieu Gautier
f1a046757e Merge pull request #203 from kiwix/fix_lang_mapping
Fix the language mapping.
2019-03-05 18:43:00 +01:00
Matthieu Gautier
93af3aa2d1 Fix the language mapping.
The previous mapping was taken from an unknown (:/) source.

The new mapping is generated with a script taking
https://www.loc.gov/standards/iso639-2/php/code_list.php as source.

The source list is sanitized to keep only the languages for which we
(http://library.kiwix.org/) have content.
2019-03-05 17:53:37 +01:00
Matthieu Gautier
336a987bb2 Merge pull request #202 from kiwix/update_readme_mustache
Add information about mustache dependency in the README.
2019-03-04 17:03:56 +01:00
Matthieu Gautier
72b4af4d65 Add information about mustache dependency in the README. 2019-03-04 14:26:40 +01:00
Matthieu Gautier
9aa1c65d7a Merge pull request #200 from kiwix/new_version
New version 4.0.1
2019-02-22 11:18:55 +01:00
Matthieu Gautier
ad6b20a530 New version 4.0.1 2019-02-22 10:29:16 +01:00
Matthieu Gautier
c1d04cc5b5 Merge pull request #199 from kiwix/fix_warning_android
Correctly initialize variable.
2019-02-19 14:15:20 +01:00
Matthieu Gautier
af9734c87f Correctly initialize variable. 2019-02-19 14:05:37 +01:00
Matthieu Gautier
a7a0798f99 Merge pull request #198 from kiwix/use_correct_dep_archive
Use new xz archive.
2019-02-19 14:05:08 +01:00
Matthieu Gautier
0154fdd190 Use new xz archive. 2019-02-19 13:31:26 +01:00
Matthieu Gautier
788d16ec01 Merge pull request #197 from kiwix/ensure_path_abs
Ensure path abs
2019-02-07 15:55:51 +01:00
Matthieu Gautier
35d812a5f7 Ensure the book's path is absolute.
We must use an absolute path whenever possible.
A relative path only makes sense in relation to the "interaction" with the user
(current directory, library location, ...).
2019-02-07 15:22:33 +01:00
Matthieu Gautier
432f9c30a3 Remove unused variable url. 2019-02-07 15:20:18 +01:00
Matthieu Gautier
ab94ac0ee8 Merge pull request #195 from kiwix/new_version
New version
2019-01-29 11:38:06 +01:00
Matthieu Gautier
1ac6d4cb20 New version 4.0.0 2019-01-29 11:29:59 +01:00
Matthieu Gautier
26b61a2d09 We do not need the exact version 0.43.0 for meson. 2019-01-29 11:29:59 +01:00
Matthieu Gautier
aab88c9022 Merge pull request #194 from kiwix/common2tools
[API break] Move all the tools in the tools directory instead of common.
2019-01-23 16:55:09 +01:00
Matthieu Gautier
af7689e3e8 [API break] Move all the tools in the tools directory instead of common.
The `common` name dates from the time when kiwix was a single repository
for the whole project (android, desktop, server...).

Now that we have split the repositories and kiwix-lib is the "common" repo,
the "common" directory no longer makes sense.
2019-01-23 15:31:38 +01:00
Matthieu Gautier
ecb2a80baf Merge pull request #193 from kiwix/fix_uninitalized
Correctly initialize retVal.
2019-01-23 12:05:34 +01:00
Matthieu Gautier
b996a2877c Correctly initialize retVal. 2019-01-23 11:51:30 +01:00
Matthieu Gautier
a98594c084 Merge pull request #192 from kiwix/workaround_depend_files
Workaround a bug in meson 0.43.0 about custom_target's depend_files option.
2019-01-10 16:08:24 +01:00
Matthieu Gautier
b9696dceac Workaround a bug in meson 0.43.0 about custom_target's depend_files option.
There is a bug in meson 0.43.0 about the option depend_files
(mesonbuild/meson#2633)

By using `files('search_result.tmpl')`, we work around the bug and
everything works whatever the meson version is.
2019-01-10 15:59:42 +01:00
Matthieu Gautier
550b6df414 Merge pull request #191 from kiwix/mustache_template
Move the templating system to mustache instead of ctpp2.
2019-01-10 11:45:56 +01:00
Matthieu Gautier
be498c3b16 Make the string Tools functions available in android. 2019-01-09 18:29:20 +01:00
Matthieu Gautier
92c9a47a0d Move the templating system to mustache instead of ctpp2.
The Mustache templating system is a bit simpler than ctpp2, and ctpp2 is no
longer maintained (see #189).
We are moving to kainjow's Mustache project
(https://github.com/kainjow/Mustache).

This simplifies our system a lot, as it is header-only and we don't have to
precompile the templates.

Fix #21
2019-01-09 18:28:48 +01:00
Matthieu Gautier
c73ac9f2cd Merge pull request #190 from kiwix/no_external_index
Remove support for external index.
2019-01-08 16:13:54 +01:00
Matthieu Gautier
5159d985c6 Remove support for external index.
This feature has been considered obsolete for a while.
In fact, it has not been supported since June 2018, as we were compiling
xapian without the chert backend support.

Assume that we don't support it and remove it from the code.
See kiwix/kiwix-tools#245

This is an API break. library.xml files will still work but the indexPath
and indexType will be dropped silently from the file.
2019-01-07 16:47:08 +01:00
Matthieu Gautier
cb98f11ddc Merge pull request #188 from kiwix/create_directory
Create the datadirectory to not fail to write the aria2 session file.
2018-12-14 16:44:46 +01:00
Matthieu Gautier
29046bfc05 Create the datadirectory to not fail to write the aria2 session file.
Fix kiwix/kiwix-desktop#69
2018-12-14 15:24:13 +01:00
Matthieu Gautier
dd5dd14ec9 Merge pull request #187 from kiwix/new_version
new version 3.1.1
2018-12-13 18:05:35 +01:00
Matthieu Gautier
49a606a043 new version 3.1.1 2018-12-13 17:29:21 +01:00
Matthieu Gautier
b641f7b116 Merge pull request #186 from kiwix/fix_library
Fix library
2018-12-11 17:08:30 +01:00
Matthieu Gautier
e6d7ba06fb Convert the standard opds date to our format (YYYY-MM-DD) 2018-12-11 17:02:02 +01:00
Matthieu Gautier
0f812c6584 The update entry of the book should be the date of the book, not the feed. 2018-12-11 17:01:33 +01:00
Matthieu Gautier
716c87dd20 Remove duplicate language attribute in the libxml dumper.
Silly copy/paste.
2018-12-11 17:00:56 +01:00
Matthieu Gautier
090c4f5970 Merge pull request #185 from kiwix/new_version
new version 3.1.0
2018-12-03 11:21:15 +01:00
Matthieu Gautier
cf28af4439 new version 3.1.0 2018-12-02 15:56:00 +01:00
Matthieu Gautier
6777bfeecf Merge pull request #184 from kiwix/bookmarks
Bookmarks
2018-12-02 15:52:52 +01:00
Matthieu Gautier
12498e2cfe Add bookmarks support.
The library now contains (simple) methods to handle bookmarks.
The bookmarks are stored in a separate xml file.

A bookmark is mainly a pair (`zimId`, `articleUrl`).
However, in the xml we store a bit more data:
- The article's title (for display)
- The book's title, lang and date (for potential update of zim files)
2018-12-02 15:47:29 +01:00
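As a rough illustration of the data described in the commit above, a bookmark record could look like the struct below. The `Bookmark` struct, its field names and the `samePlace` helper are hypothetical and are not taken from the kiwix-lib sources; only the (`zimId`, `articleUrl`) pair and the extra title, lang and date fields come from the commit message.

```cpp
#include <string>

// Hypothetical record mirroring the fields listed in the commit message.
struct Bookmark {
  std::string zimId;         // identifies the zim file (book)
  std::string articleUrl;    // identifies the article inside the book
  std::string articleTitle;  // kept for display
  std::string bookTitle;     // kept to match an updated zim file later
  std::string lang;          // book language
  std::string date;          // book date
};

// Two bookmarks designate the same place when the (zimId, articleUrl)
// pair matches; the other fields are only descriptive.
bool samePlace(const Bookmark& a, const Bookmark& b) {
  return a.zimId == b.zimId && a.articleUrl == b.articleUrl;
}
```

Keeping the descriptive fields out of the comparison reflects the idea that they exist for display and for a potential update of the zim file, not for identity.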
Matthieu Gautier
b5ce60a627 Move the dump of the library into library.xml in a specific class.
In the same way, the dump into an opds feed is in a specific class.
2018-11-28 12:09:28 +01:00
Matthieu Gautier
c9cc58973c Merge pull request #183 from kiwix/book_faviconUrl
Add Book::getFaviconUrl
2018-11-15 17:53:24 +01:00
Matthieu Gautier
062124a2a0 Add Book::getFaviconUrl 2018-11-15 17:47:41 +01:00
Matthieu Gautier
622b22b2cc Merge pull request #180 from kiwix/new_version
New version 3.0.3
2018-11-12 18:05:43 +01:00
Matthieu Gautier
2821b9e06a New version 3.0.3 2018-11-12 16:48:35 +01:00
Matthieu Gautier
ac49776792 Merge pull request #182 from kiwix/fix_aria2c_launch
Wait a bit more between attempts to connect to aria2c rpc.
2018-11-12 16:28:21 +01:00
Matthieu Gautier
94a053e821 Wait a bit more between attempts to connect to aria2c rpc. 2018-11-12 16:14:21 +01:00
Matthieu Gautier
84e831eae9 Merge pull request #181 from kiwix/fix_aria2c_launch
Correctly run aria2c when packaged with kiwix-desktop in appimage.
2018-11-12 14:41:57 +01:00
Matthieu Gautier
4b9692bbd5 Correctly run aria2c when packaged with kiwix-desktop in appimage.
By default, we are searching in the PATH env var.
However, with an appImage, the executable directory is not in the PATH,
so we have to use an absolute path if we can.

If we cannot find the aria2c executable in the executable directory let's
try to use the system one.
2018-11-12 14:35:17 +01:00
Matthieu Gautier
be6f96adc0 Merge pull request #179 from kiwix/fix_library
Fix library
2018-11-12 12:21:54 +01:00
Matthieu Gautier
4b31842c4a Correctly convert filesize from Kbyte to byte.
`reader.getFileSize()` returns the size of the zim in Kbyte in an
`unsigned int` (32 bits). This is OK, as it would only overflow if the size
of the file is greater than 4294967295 kbytes (so ~4 Tbytes).

However, we need to convert the returned size into an unsigned 64-bit integer;
otherwise, when converting to bytes, we will overflow at 4 Gbytes,
even if `m_size` is a uint64_t.
2018-11-12 12:16:05 +01:00
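The overflow described in the commit above can be reproduced in isolation. The sketch below is not the kiwix-lib code; `kbToBytes` and `kbToBytesNarrow` are hypothetical helper names, and it assumes a platform where `unsigned int` is 32 bits wide.

```cpp
#include <cstdint>

// Widen to 64 bits first, then multiply: the product fits.
std::uint64_t kbToBytes(std::uint32_t sizeInKb) {
  return static_cast<std::uint64_t>(sizeInKb) * 1024;
}

// The problematic shape: the multiplication is done in 32 bits and wraps
// around before the result is stored in the 64-bit return value.
std::uint64_t kbToBytesNarrow(std::uint32_t sizeInKb) {
  return sizeInKb * static_cast<std::uint32_t>(1024);
}
```

With a 4 Gbyte file (4194304 Kbyte), the first function yields 4294967296 while the second wraps around to 0, even though both return a 64-bit value.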
Matthieu Gautier
cf1cfe774e Correctly check for ArticleCount and MediaCount before writing them. 2018-11-12 10:58:10 +01:00
Matthieu Gautier
82b38b96e2 Merge pull request #178 from kiwix/fix_en_mapping
Fix en mapping
2018-11-12 10:57:18 +01:00
Matthieu Gautier
8c4b9fbe95 Add missing en->eng mapping to codeisomapping.
The most commonly used language was missing :/

Fix kiwix/kiwix-desktop#51
2018-11-12 10:36:45 +01:00
Matthieu Gautier
ab63cb2fb8 Sort codeisomapping alphabetically.
This is only code formatting, no real change.
2018-11-12 10:34:19 +01:00
Matthieu Gautier
3958b2a06f Make the internal map codeisomapping static.
The symbol should not be visible outside of the compilation unit.
2018-11-12 10:33:35 +01:00
Matthieu Gautier
9fa7d78ba1 Merge pull request #176 from kiwix/win_relpath
Win relpath
2018-11-03 12:33:14 +01:00
Matthieu Gautier
57d3552b97 New version 3.0.2 2018-11-03 12:20:13 +01:00
Matthieu Gautier
d4ecda40ff Use the correct separator when computing relativePath. 2018-11-03 12:18:54 +01:00
Matthieu Gautier
802df71410 Merge pull request #175 from kiwix/fix
Fix
2018-11-02 17:32:22 +01:00
Matthieu Gautier
4d904c4d8b New version 3.0.1 2018-11-02 17:10:05 +01:00
Matthieu Gautier
9ab44e6a5f Get information about the total number of books of a search.
When we do a search and page the results, we need to display to the
user the total number of books, not only the `itemsPerPage`.

So, we need to parse the xml correctly to keep the information about the total
number of books.
2018-11-02 17:04:55 +01:00
Matthieu Gautier
5f4c04e79e Fix use of getAsI when parsing download rpc.
The value is stored as a string in the xml, so we cannot use getAsI.
We have to get the string and parse it to an int.
We cannot use strtoull because android stdc++ lib doesn't have it.

We have to implement our own parseFromString function using an
istringstream.
2018-11-02 17:03:03 +01:00
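A minimal sketch of such a stream-based parser is shown below. The name `parseFromString` comes from the commit message, but the template signature and the behavior on invalid input (returning a value-initialized `T`) are assumptions, not the actual kiwix-lib implementation.

```cpp
#include <sstream>
#include <string>

// Parse a value of type T from a string using stream extraction.
// Assumption: on failure the value-initialized T (0 for integers) is returned.
template <typename T>
T parseFromString(const std::string& str) {
  std::istringstream iss(str);
  T value{};
  iss >> value;
  return value;
}
```

For example, `parseFromString<unsigned long long>("1234567890123")` extracts the number without relying on `strtoull`.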
Matthieu Gautier
360c913230 Merge pull request #174 from kiwix/new_version
New version 3.0.0
2018-10-31 14:47:48 +01:00
Matthieu Gautier
a60ffe78d5 New version 3.0.0 2018-10-31 14:35:22 +01:00
Matthieu Gautier
b977b08683 Merge pull request #173 from kiwix/subprocess_windows
Subprocess windows
2018-10-31 14:04:21 +01:00
Matthieu Gautier
bb07ff5610 Do not add NULL at end of commandLine on Windows. 2018-10-31 13:56:42 +01:00
Matthieu Gautier
1787e30440 Better launch of the aria2 process.
By setting the ApplicationName to NULL, CreateProcessW will
search for the application in the path.
2018-10-31 13:56:42 +01:00
Matthieu Gautier
ccb3d8639d Use correct name for aria2c on windows. 2018-10-30 18:43:30 +01:00
Matthieu Gautier
5ed095531e Correctly set pkgconfig file for static curl linking. 2018-10-30 12:59:30 +01:00
Matthieu Gautier
29e554b47b Include pthread 2018-10-29 14:30:35 +01:00
Matthieu Gautier
68dc4d40b5 Include windows.h before synchapi.h 2018-10-29 12:20:00 +01:00
Matthieu Gautier
8dbc34e9ae Merge pull request #172 from kiwix/alpha2toalpha3
Alpha2toalpha3
2018-10-26 14:27:55 +02:00
Matthieu Gautier
2682fa8f9c Remove unnecessary variable or output. 2018-10-26 14:19:10 +02:00
Matthieu Gautier
a22f962722 Correctly store the size of the book in the library.
`reader.getFileSize()` returns the size in KB.
2018-10-26 14:18:40 +02:00
Matthieu Gautier
a1876e3b27 Add a method converta2toa3 to convert language code alpha2 to alpha3.
Qt gives us alpha2 language codes but we use alpha3.
2018-10-26 14:18:06 +02:00
Matthieu Gautier
50b7e5664a Merge pull request #171 from kiwix/remoteContentManager
Remote content manager
2018-10-24 16:48:52 +02:00
Matthieu Gautier
ad654ead08 Do not force the download port to be 80.
We may want to use a url with a port != 80.
2018-10-24 11:56:38 +02:00
Matthieu Gautier
c6206edfb4 Do not always download the favicon of a book. Download as needed.
When parsing an opds feed, the favicon is a url, not a dataurl.
If we download the favicon every time, it may take a lot of time to
parse the feed.

We store the url and download the favicon only when needed (when displayed)
2018-10-24 11:56:05 +02:00
Matthieu Gautier
c20ae18bff An opds feed can also be the openSearch result.
We must be able to set the correct entry in the feed for a searchResult.
2018-10-24 11:51:38 +02:00
Matthieu Gautier
b1508c0b98 Better listBooksIds supported mode.
Having only REMOTE or LOCAL is a bit restrictive. By using flags, a user
can specify more complex requests.
2018-10-24 11:50:11 +02:00
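The flag-based approach mentioned above can be sketched as follows. Everything here is illustrative: the `SupportedListMode` enum values, the `BookInfo` struct and this free-standing `listBooksIds` signature are assumptions made for the example and may not match the kiwix-lib API.

```cpp
#include <string>
#include <vector>

// Hypothetical flag values that can be combined with bitwise OR.
enum SupportedListMode : unsigned {
  LOCAL = 1u << 0,
  REMOTE = 1u << 1
};

// Hypothetical, reduced view of a book for the purpose of the example.
struct BookInfo {
  std::string id;
  bool hasLocalPath;
  bool hasRemoteUrl;
};

// Return the ids of the books matching at least one of the requested modes.
std::vector<std::string> listBooksIds(const std::vector<BookInfo>& books,
                                      unsigned mode) {
  std::vector<std::string> ids;
  for (const auto& book : books) {
    const bool wanted = ((mode & LOCAL) && book.hasLocalPath) ||
                        ((mode & REMOTE) && book.hasRemoteUrl);
    if (wanted) {
      ids.push_back(book.id);
    }
  }
  return ids;
}
```

A caller can then pass `LOCAL | REMOTE` to get both kinds of books in a single request, which a plain two-value enum could not express.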
Matthieu Gautier
2d59e12a4d Merge pull request #170 from kiwix/content_manager
Content manager
2018-10-24 11:18:14 +02:00
Matthieu Gautier
1b44eb33f3 [TRAVIS] Last osx version of travis already have python3 installed. 2018-10-24 11:07:10 +02:00
Matthieu Gautier
34021994cd Fix for Android
- No std::to_string. We have to implement it with an ostringstream
- No pthread_cancel. So we use pthread_kill to send a signal to the thread.
2018-10-24 10:48:53 +02:00
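A replacement for `std::to_string` built on an output string stream could look like the sketch below. The `toString` name and template signature are hypothetical; the commit message only states that an ostringstream is used.

```cpp
#include <sstream>
#include <string>

// Convert any streamable value to its textual representation.
template <typename T>
std::string toString(const T& value) {
  std::ostringstream oss;
  oss << value;
  return oss.str();
}
```

Because it relies only on `operator<<`, the same helper works for integers and for any other streamable type.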
Matthieu Gautier
910ce5f10d Fix for Windows
- "winsock2.h" needs to be included before "windows.h". But if a
  compilation unit includes "windows.h" and then "networkTools.h", the
  build fails and this is complicated to handle. The include must not be in the
  header but in the cpp.
- Windows defines an ERROR macro. It is a pity, but we cannot use `ERROR`
  in our enum.
- If building statically using mingw, we need to define `CURL_STATICLIB`.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
c66c7e9c20 Store the size of the book in OPDSFeed. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
ad69fdd8c0 Move the download method from the downloader to networkTools.
The download method is a simple method to download content.
It uses curl to download the content instead of aria.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
a73ef23f6e Keep the book size in byte in memory (instead of in kb)
We keep the size in kb in library.xml for compatibility.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
fe6d5fa93e Store the downloadId in the book (and in the library). 2018-10-24 10:47:12 +02:00
Matthieu Gautier
43ff8565d1 Add a Download class to encapsulate an aria2 download. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
f718c4c472 Add a LibraryManipulator.
Library clients (kiwix-desktop) need to know when a book is added to
the library by the manager. By using a LibraryManipulator, we can do
dependency injection.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
8176a6eded Be more resilient to potential aria2 error. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
bb1f777078 Store the aria2 session and recover from it. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
829c34dd69 Store in the book instance if the given path is valid.
The path may exist and not be valid if the zim file is truncated
(i.e., interrupted download).
2018-10-24 10:47:12 +02:00
Matthieu Gautier
9c0f9696ed Better beautifyInteger and beautifyFileSize. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
be6dc01b4f Add a few helper methods to xmlrpc objects. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
18fc5cb4df Correctly set the aria2 secret rpc. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
996829e4d7 Allow an OPDSDumper to dump only a subset of the library. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
5128861136 Remove default value for book pointer of readBookFromPath.
It makes no sense to accept a NULL pointer here.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
7804bf2276 Reimplement listBooksIds.
No real improvement.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
99e313f915 Clean includes of manager.h 2018-10-24 10:47:12 +02:00
Matthieu Gautier
839320d5e7 Move the Book class in its own source file. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
1e8f85eaff Rename methods title() into getTitle().
Same for all attributes.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
e0704b3b21 Move the initialization code of a book from xml|opds into Book. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
57fbb98bca Do not store the favicon base64 encoded in the book.
The fact that the favicon is base64 encoded is a storage detail.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
c7f9218350 base64_encode takes a string instead of a char* 2018-10-24 10:47:12 +02:00
Matthieu Gautier
66a9a69480 Move the code updating a book from a reader in the Book class. 2018-09-06 18:30:37 +02:00
Matthieu Gautier
04b05dd68b Remove removeBookById from the Manager.
Use the same method of the `Library`.
2018-09-06 18:30:37 +02:00
Matthieu Gautier
aa6772b345 Remove the "last" book functionality.
- This is not used by any application.
- This is application specific and should not be stored in the library
  (which is a list of books).
2018-09-06 18:30:37 +02:00
Matthieu Gautier
efae3e0d2f Do not make the Manager responsible to create the Library.
The `Manager` manages an already existing library.
This avoids the Library clone stuff.
2018-09-06 18:30:37 +02:00
Matthieu Gautier
bba3c252e4 Make the members of the book protected.
It is up to the book to manage its attributes.

Also remove the `absolutePath` (and `indexAbsolutePath`). The `Book::path` is always stored
absolute.
The fact that the path can be stored absolute or relative in the
`library.xml` is not relevant for the book.
2018-09-06 18:30:37 +02:00
Matthieu Gautier
57ac6f0305 Use a map to store the Library's books.
Having the books sorted is useless.
We handle books by id not by index.
2018-09-06 18:30:37 +02:00
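Storing books in a map keyed by id, as described in the commit above, can be sketched as follows. The `Book` and `Library` types here are reduced, hypothetical stand-ins written for this example; the method names echo ones mentioned elsewhere in this log but their signatures are assumptions.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, reduced book type: only what the example needs.
struct Book {
  std::string id;
  std::string title;
};

class Library {
 public:
  // Returns false if a book with the same id is already stored.
  bool addBook(const Book& book) {
    return m_books.emplace(book.id, book).second;
  }

  // Throws std::out_of_range if the id is unknown.
  const Book& getBookById(const std::string& id) const {
    return m_books.at(id);
  }

  // Returns true if a book was actually removed.
  bool removeBookById(const std::string& id) {
    return m_books.erase(id) == 1;
  }

  std::size_t size() const { return m_books.size(); }

 private:
  std::map<std::string, Book> m_books;  // books are looked up by id
};
```

Lookups, insertions and removals all go through the id, so no positional index into a vector is needed.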
Matthieu Gautier
541fb0cfd1 Remove the "current" book functionality.
- This is not used by any application.
- This is application specific and should not be stored in the library
  (which is a list of books).
2018-09-06 18:30:37 +02:00
Matthieu Gautier
c9eac04050 Make the Library's book vector private.
Move a lot of methods from Manager to Library, because books is private
and those methods are better in Library.
2018-09-06 18:30:37 +02:00
Matthieu Gautier
741c67786a Add update method to Book. 2018-09-06 18:30:37 +02:00
Matthieu Gautier
db9000f706 Make the downloader use the aria2c wrapper instead of the aria2 library. 2018-09-06 18:30:34 +02:00
Matthieu Gautier
0a93cb0872 Add aria2 downloader using subprocess aria2c. 2018-09-06 18:29:49 +02:00
Matthieu Gautier
f4846c1ac8 Add a tool's function to get the data directory.
The data directory is where kiwix application should store data.
2018-08-29 15:28:52 +02:00
Matthieu Gautier
9b516ac35d Add a small wrapper around pugixml to handle xmlrpc. 2018-08-29 15:28:52 +02:00
Matthieu Gautier
79b780b75b Move the function to convert from xml_node to string in otherTools.
This can be useful elsewhere than in opds_dumper
2018-08-29 15:28:52 +02:00
Matthieu Gautier
f3dd83907d Add backend to launch subprocess.
The windows backend is not tested.
2018-08-29 15:28:52 +02:00
Matthieu Gautier
c351e7ccf1 Merge pull request #168 from kiwix/java_jdk8+
Update jni build script to java jdk 8+.
2018-08-20 17:53:39 +02:00
Matthieu Gautier
7c634738dd Update jni build script to java jdk 8+.
With jdk8, `javac` has an option `-h` to generate the header files of
native classes.
So there is no need to run `javah` several times.

As there is now only one command to run (`javac`), there is no need for
the wrapper script `gen_kiwix.sh`.

Fix #167
2018-08-20 12:17:29 +02:00
Matthieu Gautier
4378c52c27 Merge pull request #165 from kiwix/new_version
New version 2.0.2
2018-08-03 21:39:11 +02:00
Matthieu Gautier
790fa99143 New version 2.0.2 2018-08-03 19:37:21 +02:00
Matthieu Gautier
db6717e199 Merge pull request #164 from kiwix/windows
Windows
2018-08-03 19:31:31 +02:00
Matthieu Gautier
bf2188af14 [Windows] Add extra link arguments to build test on windows. 2018-08-03 19:22:39 +02:00
Matthieu Gautier
fd9b6569af Include unistd.h only on unix platform. 2018-08-03 19:22:39 +02:00
Matthieu Gautier
3cf58b5f5b Make libaria2 an optional dependency.
We don't compile libaria2 on Windows.
2018-08-03 19:22:39 +02:00
Matthieu Gautier
182be5d124 Merge pull request #163 from kiwix/android_better_log
[Android] Better error message when failing to read in zim file.
2018-07-31 13:48:11 +02:00
Matthieu Gautier
dbcc9140b9 [Android] Better error message when failing to read in zim file.
Let's print the exception's message to allow us to better understand
what went wrong.
2018-07-31 11:49:44 +02:00
Matthieu Gautier
d46aff00d1 Merge pull request #159 from kiwix/mhutti1/kiwixlib-fixes
Various kiwixlib fixes
2018-07-27 10:47:21 +02:00
mhutti1
d61580f599 Set JNI values to NULL on error 2018-07-27 10:11:24 +02:00
mhutti1
3227b29c90 Follow redirects in favicons 2018-07-27 10:11:24 +02:00
Matthieu Gautier
4cb55e1eef Merge pull request #161 from kiwix/new_dep_arrchive_root
New deps archives now contains the BUILD_${PLATFORM} directory.
2018-07-27 10:10:42 +02:00
Matthieu Gautier
9ec3358119 New deps archives now contains the BUILD_${PLATFORM} directory. 2018-07-27 09:36:35 +02:00
Kelson
cf21f1793c Merge pull request #158 from kiwix/compilation_fix
Use -llog only for Android
2018-07-07 22:08:11 +02:00
Kelson
c0d5e091d3 Small update of the README 2018-07-07 21:15:48 +02:00
Kelson
620f1b5e13 Use -llog only for Android 2018-07-07 20:19:34 +02:00
Kelson
1e8e897f4a Merge pull request #155 from kiwix/mhutti1/jni-corrupt-zim
JNI better log & stop crashing if exception thrown at ZIM file opening
2018-06-29 22:13:16 +02:00
Isaac Hutt
76ca4b0cee Add -llog 2018-06-29 16:57:32 +02:00
mhutti1
709baae934 Convert all JNI cerrs to Android log messages 2018-06-29 16:35:11 +02:00
mhutti1
ea8cd9f1a9 Correctly pass 0 through JNI if ZIM file is corrupted 2018-06-29 15:30:13 +02:00
Kelson
452e7f8883 Merge pull request #152 from kiwix/updated_readme
Update README
2018-06-25 06:14:36 +02:00
Emmanuel Engelhart
a66b178633 Update README 2018-06-24 22:16:16 +02:00
Matthieu Gautier
0c26b08dce Merge pull request #149 from kiwix/version_2.0.1
New version 2.0.1
2018-06-15 18:22:04 +02:00
Matthieu Gautier
2a03147662 New version 2.0.1 2018-06-15 18:07:51 +02:00
Matthieu Gautier
1164cf7444 Merge pull request #150 from kiwix/gcc4.8
[TRAVIS] Compile using the default compiler version 4.8.
2018-06-15 18:07:22 +02:00
Matthieu Gautier
6ef2d5ff4b [TRAVIS] Compile using the default compiler version 4.8. 2018-06-15 08:47:28 +02:00
Matthieu Gautier
3a00c4d671 Merge pull request #147 from kiwix/icu_namespace
Icu namespace
2018-06-11 15:21:30 +02:00
Matthieu Gautier
5025ee4963 Fix icudt version.
We moved to icu version 58 a while ago.
2018-06-11 14:38:32 +02:00
Matthieu Gautier
9aaf82a36d Explicitly use icu namespace.
Fix #145
2018-06-11 14:36:34 +02:00
Kelson
2e38aa796f Merge pull request #141 from kiwix/mhutti1/url-decoding
Decode reserved characters in URLs
2018-06-01 11:28:37 +02:00
mhutti1
fa99cce68d Decode reserved characters in URLs 2018-05-20 18:41:39 +01:00
Matthieu Gautier
fc6a0bcea2 Merge pull request #140 from kiwix/no_stopwords_resources
Remove unused static resources.
2018-05-15 11:37:35 +02:00
Matthieu Gautier
622d2fc23d Remove unused static resources.
Stop words have not been used for a long time, now that indexing has
been moved to libzim. No need to embed them in kiwix-lib.
2018-05-15 11:30:30 +02:00
Matthieu Gautier
48933a3b3e Merge pull request #139 from kiwix/fix_parseUrl
Fix parse url
2018-05-14 18:33:59 +02:00
Matthieu Gautier
c0b1c6013e Fix parsing of url
Fix kiwix/kiwix-tools#193
2018-05-14 17:41:05 +02:00
Matthieu Gautier
433a47c3fe Add unittest structure.
No tests, just everything to add tests later.
2018-05-14 17:40:43 +02:00
Matthieu Gautier
e9ab074b5d Merge pull request #136 from kiwix/2.0.0
2.0.0
2018-04-23 20:15:45 +02:00
Matthieu Gautier
45a000edaa New version 2.0.0 2018-04-23 18:06:49 +02:00
Matthieu Gautier
e216c44034 kiwix-lib needs libzim>=3.3.0 2018-04-23 18:06:49 +02:00
Matthieu Gautier
59661626e9 Merge pull request #135 from kiwix/update_README
Add dependency `libaria2` in the README.
2018-04-23 18:06:24 +02:00
Matthieu Gautier
6b0d2788aa Add dependency libaria2 in the README. 2018-04-23 17:40:21 +02:00
Matthieu Gautier
1b49c632b3 Merge pull request #123 from kiwix/new_api
New api
2018-04-23 17:07:45 +02:00
Chris Li
68665693c5 fixed some typos in the docs string 2018-04-19 18:04:07 +02:00
Matthieu Gautier
1dd828e79c Fix pathExists and check for correct path for xapian index.
The correct path for xapian database should be "X/fulltext/xapian",
not "Z//fulltextIndex/xapian".

So let's check for the right path and fall back to the wrong one (but
used in old zims).

The double '/' in the path is a bug of zimwriterfs and is specific
to the xapian database.
We must handle this correctly in `hasFulltextIndex` and not (buggily) in
`pathExists`.
(Fortunately, it seems that pathExists was used only by hasFulltextIndex)
2018-04-19 18:04:07 +02:00
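The lookup order described in the commit above (correct path first, legacy path as a fallback) can be sketched like this. The two path strings come from the commit message; the free function taking a `pathExists` callback is a hypothetical shape chosen to keep the example self-contained, not the real `Reader` method.

```cpp
#include <functional>
#include <string>

// Check the correct location first, then fall back to the legacy one
// (with its double '/') that is found in old zim files.
bool hasFulltextIndex(
    const std::function<bool(const std::string&)>& pathExists) {
  return pathExists("X/fulltext/xapian") ||
         pathExists("Z//fulltextIndex/xapian");
}
```

Keeping the legacy special case in `hasFulltextIndex` means the generic `pathExists` check does not need to know about the double '/'.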
Matthieu Gautier
135028c16a Introduce better API to manipulate entries in a zim file.
The previous API suffered from several problems:
- It was difficult to handle articles redirecting to other articles.
- It was not possible to get some information (the title) without getting
  the whole content.

The new API introduces the new class `Entry` that acts as a proxy to an
article in the zim file.

Methods of `Reader` now return an `Entry` and the user has to call
`Entry`'s methods to get useful information.
No redirection is followed implicitly.
If an entry is not found, an exception is raised instead of returning
an invalid `Entry`.

The common pattern to get the content of an entry becomes:

```
std::string content;
try {
  auto entry = reader.getEntryFromPath(path);
  entry = entry.getFinalEntry();
  content = entry.getContent();
} catch (NoEntry& e) {
  ...
}
```

Older methods are kept (with the same behavior) but are marked as
deprecated.
2018-04-19 18:04:07 +02:00
Matthieu Gautier
1f3fcd85a0 Allow us to declare method to be deprecated. 2018-04-19 18:04:07 +02:00
Matthieu Gautier
6e13d44459 Merge pull request #129 from kiwix/opds
Opds
2018-04-19 18:02:59 +02:00
Matthieu Gautier
47ce044e3e Add method to Manager to populate the library from a opds stream.
The library's books are created from the metadata in the opds.
As the opds stream is by definition a distant "library", there is no
zim to read to complete missing information.

This can lead to incomplete `library.xml`.
2018-04-19 17:53:08 +02:00
Matthieu Gautier
1f091da3f4 Add a downloader tool to download files.
The downloader uses libaria2.

For now, only one download can run at a time.
A download will start only if (and as soon as) no download is running.
2018-04-19 17:53:08 +02:00
Matthieu Gautier
d4fefd1a57 Add a function to create a temporary directory. 2018-04-19 17:53:05 +02:00
Matthieu Gautier
9f86b59d1d Add a function to get the content of a file. 2018-04-19 17:53:02 +02:00
Matthieu Gautier
2164faba44 Add a potential search description link in the opds stream. 2018-04-19 17:08:01 +02:00
Matthieu Gautier
b48428e443 Be able to create a OPDSDumper without library and associate it later. 2018-04-19 17:08:01 +02:00
Matthieu Gautier
ad92af928b Be able to filter a library.
This generates a new library containing only the corresponding books.
2018-04-19 17:08:01 +02:00
Matthieu Gautier
ee51c470b4 Allow the manager to dump the opds feed of the whole library. 2018-04-19 17:08:01 +02:00
Matthieu Gautier
5398d69231 Merge pull request #134 from kiwix/macos
Build kiwix-lib on macos.
2018-04-19 15:37:22 +02:00
Matthieu Gautier
c0bc2ed111 Build kiwix-lib on macos.
Also try to speed up the build a bit by:
- installing packages using the travis apt plugin and not using sudo
- using a prebuilt ninja binary.
2018-04-19 15:29:48 +02:00
Matthieu Gautier
10893ae19f Merge pull request #125 from kiwix/no_warning
Try to compile kiwix-lib without warning.
2018-04-18 17:05:50 +02:00
Matthieu Gautier
ec097ab267 Try to compile kiwix-lib without warning. 2018-04-18 16:57:27 +02:00
Matthieu Gautier
32ad40a5b0 Merge pull request #133 from kiwix/rpath
Set the RPATH of kiwix-lib.
2018-04-17 17:09:43 +02:00
Matthieu Gautier
d686de7ec3 Set the RPATH of kiwix-lib.
As we cannot change (DY)LD_LIBRARY_PATH on macos, we have to use rpath.
2018-04-17 16:27:31 +02:00
Matthieu Gautier
8d6f1196de Merge pull request #132 from kiwix/ctpp2_lib_dir
Find ctpp2 lib in the normal lib dir and fallback to 'lib'.
2018-04-17 15:35:58 +02:00
Matthieu Gautier
a216ad5a6f Find ctpp2 lib in the normal lib dir and fallback to 'lib'.
ctpp2 libs should be in the "normal" lib dir, so search in it.
The 'lib' dir should only be used as a fallback.
2018-04-17 14:37:19 +02:00
Matthieu Gautier
3849f0ae8b Merge pull request #128 from kiwix/fix_version
New version 1.1.1
2018-03-29 17:49:10 +02:00
Matthieu Gautier
f2413f6680 New version 1.1.1 2018-03-27 17:22:38 +02:00
Matthieu Gautier
8ae388562e Merge pull request #127 from kiwix/new_version
New version 1.1.0.
2018-03-27 12:01:40 +02:00
Matthieu Gautier
a55824acc7 New version 1.1.0. 2018-03-27 11:05:02 +02:00
Matthieu Gautier
58395d266c Merge pull request #126 from kiwix/infinite_loop
Correctly pre-increment loopCounter.
2018-03-26 10:03:56 +02:00
Matthieu Gautier
313f6731b0 Correctly pre-increment loopCounter.
If we later compare the `loopCounter` with 42, we must pre-increment the
counter. Otherwise, in case of an infinite loop, the `loopCounter` will be 43.

Related to kiwix/kiwix-tools#168
2018-03-25 17:21:40 +02:00
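The off-by-one described in the commit above can be shown with two stand-alone loops. This is only a sketch of the counting behavior, assuming a loop that is bounded solely by the counter; the function names are hypothetical and the loop bodies stand in for following one redirection.

```cpp
// Pre-increment: the counter is bumped, then compared.
// The loop exits with the counter exactly at the limit.
int countWithPreIncrement() {
  int loopCounter = 0;
  while (++loopCounter < 42) {
    // follow one more redirection (never terminates on its own)
  }
  return loopCounter;
}

// Post-increment: the old value is compared, then the counter is bumped
// once more on the final failing test, so it overshoots the limit by one.
int countWithPostIncrement() {
  int loopCounter = 0;
  while (loopCounter++ < 42) {
    // follow one more redirection (never terminates on its own)
  }
  return loopCounter;
}
```

A later check of the form `loopCounter == 42` therefore only detects the endless case with the pre-increment variant.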
Matthieu Gautier
e23949a9fa Merge pull request #121 from kiwix/check_internal_search
Check `internal->_search` before using it.
2018-03-12 18:53:56 +01:00
Matthieu Gautier
ee6831d665 Check internal->_search before using it.
If a search has been set and a user tries to get the nextResult or
restart the search, `internal->_search` will be NULL.
2018-03-12 17:45:18 +01:00
Matthieu Gautier
14653c6958 Merge pull request #120 from kiwix/doc
Doc
2018-03-12 17:43:47 +01:00
Matthieu Gautier
f8a2e4c503 Only add a reader to the searcher if the reader has a fulltext index.
`libzim` will not search in zim files without an embedded fulltext index.
If we don't want to mess up the result index, we must not store a "wrong"
reader.

Fix #111
2018-03-12 17:34:45 +01:00
Matthieu Gautier
57a197d38d Make getCurrentBookId const. 2018-03-12 17:34:45 +01:00
Matthieu Gautier
cc38d0e5e4 Make searcher's method reset private. 2018-03-12 17:34:45 +01:00
Matthieu Gautier
b6ba10af2a Remove unnecessary currentArticleOffset.
This protected member is never used.
2018-03-12 17:34:45 +01:00
Matthieu Gautier
f93f50087b Remove unnecessary setBookIndex.
We can use default argument instead of creating a new method.
2018-03-12 17:34:45 +01:00
Matthieu Gautier
63339793d2 Add some documentation to kiwix-lib API
Fix #116
2018-03-12 17:34:45 +01:00
Kelson
5ee5929714 Merge pull request #119 from RohanBh/fix-meson-installation
Fix meson installation error by using pip3
2018-03-10 09:13:35 +01:00
RohanBh
683b5249a2 Fix meson installation error by using pip3 2018-03-10 03:10:09 +05:30
Matthieu Gautier
698578ee73 Merge pull request #113 from kiwix/JNI_Reader_exception
Make JNIKiwixReader throw an exception if something goes wrong at creation.
2018-02-01 18:03:05 +01:00
Matthieu Gautier
6adf95c329 Make JNIKiwixReader throw an exception if something goes wrong at creation.
If the `nativeHandle` is null, the JNIKiwixReader is invalid and we must
not use it.

Throw an exception so that the calling code can handle this properly.
Previously, user code had no way to detect that something went wrong :/
2018-02-01 17:18:54 +01:00
Matthieu Gautier
9fc840b377 Merge pull request #104 from kiwix/mhutti1/search-snippet
Allow JNI to access search snippets
2017-12-18 14:27:57 +01:00
mhutti1
97bcf57d53 Allow JNI to access search snippets 2017-12-15 16:02:49 +00:00
Kelson
3c614ae47f Merge pull request #103 from kiwix/mhutti1/videofix
Fix JNI to work with kiwix-android
2017-12-14 20:07:02 +01:00
mhutti1
f303c7502d Fix JNI to work with kiwix-android 2017-12-14 17:32:03 +00:00
Matthieu Gautier
0c8c19a6fb Merge pull request #102 from kiwix/direct_access
Direct access
2017-12-13 16:31:51 +00:00
Matthieu Gautier
16bd34e6a6 Add a method in the JNI API to get direct access information.
For binary content (not compressed), it could be interesting to
directly read the content in the zim file instead of using `kiwix-lib`.

This method returns the needed information to do so (if possible).
2017-12-13 17:22:26 +01:00
Matthieu Gautier
5a953f191b Remove a small warning. 2017-12-13 17:11:10 +01:00
Matthieu Gautier
c947cceac8 Merge pull request #101 from kiwix/compilation-fixes
Force usage of meson 0.43.0.
2017-12-13 16:10:22 +00:00
Matthieu Gautier
35859a3689 Force usage of meson 0.43.0.
Static compilation is broken with meson 0.44.0
2017-12-13 16:48:12 +01:00
Matthieu Gautier
9b3da52f00 Merge pull request #100 from kiwix/gcc5
Compile using gcc-5 on native ubuntu.
2017-12-04 11:17:41 +00:00
Matthieu Gautier
dee482b2dc Compile using gcc-5 on native ubuntu.
As dependencies prepared by kiwix-build are built using gcc-5
(kiwix/kiwix-build@7fc557d),
we need to also compile libzim using gcc-5.
2017-12-04 11:06:44 +00:00
Matthieu Gautier
281b136ea8 Merge pull request #99 from kiwix/better_search_result_html
Better search result html
2017-11-27 12:46:17 +00:00
Matthieu Gautier
41c92cfc3c Better calculate the start of the last search page.
The increment between pages should always be a multiple of
`resultCountPerPage`.
2017-11-27 12:39:04 +00:00
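The pagination rule in commit 41c92cfc3c can be sketched as follows. This is only an illustration under assumptions: the helper name `lastPageStart` and its parameters are hypothetical and are not taken from the kiwix-lib sources.

```cpp
#include <cassert>

// Hypothetical helper (not part of kiwix-lib): returns the offset of the
// first result displayed on the last page of a paginated search.
// The returned offset is always a multiple of resultCountPerPage.
unsigned int lastPageStart(unsigned int totalResults,
                           unsigned int resultCountPerPage)
{
  assert(resultCountPerPage > 0);
  if (totalResults == 0) {
    return 0;
  }
  // Take the index of the last result and round it down to a page boundary.
  return ((totalResults - 1) / resultCountPerPage) * resultCountPerPage;
}
```

Rounding down from the index of the last result (rather than from the result count) avoids pointing at an empty page when the total is an exact multiple of the page size.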
Matthieu Gautier
64dc5131c0 Be able to specify the global contentHumanReadableId without an index.
Even if we use the add_reader method to search into embedded full text
index, we need to specify the global `contentHumanReadableId` as it will
be used to generate "page links".
2017-11-27 12:37:13 +00:00
Kelson
189c972d17 Merge pull request #97 from kiwix/better_url_encoding
Better URL encoding
2017-11-26 16:01:41 +01:00
kelson42
28b0588df4 Better URL encoding 2017-11-23 19:26:41 +01:00
Matthieu Gautier
2357af8f58 Merge pull request #98 from kiwix/jni_byte_range
Add an API to get only a part of an article's content.
2017-11-23 12:40:50 +01:00
Matthieu Gautier
4e5d9f0360 Add an API to get only a part of an article's content.
Add the jni method `getContentPart` to get only a part of the article
content.

The method can be used to get a part of the content or to know the size
of the full content.
2017-11-22 19:06:54 +00:00
Matthieu Gautier
2125cd65fa Merge pull request #78 from kiwix/multisearch_jni
Multisearch jni
2017-11-22 17:13:45 +01:00
mhutti1
520c1edf31 Fix JNI android integration 2017-11-22 14:54:03 +00:00
mhutti1
d2f7503cfa Fix JNI for android integration 2017-11-22 14:54:03 +00:00
Matthieu Gautier
7a59779b77 Change JNI API of kiwix-lib (mainly to support multi-zims search)
This is a major API break. User code will have to be rewritten.

Before this commit, the API was a single object wrapping the library and
handling a global state with one `Reader` and one `Writer` at a time.

Now, the API is organized around three main objects:
 - The `JNIKiwixReader`, a wrapper around a `kiwix::Reader` (which allows
   reading one zim)
 - The `JNIKiwixSearcher`, a wrapper around a `kiwix::Searcher` (which allows
   searching through one or more readers)
 - The `JNIKiwixSearcher.Result`, a result of a search, giving access to all
   information about a result (title, url, content, snippet, ...)
2017-11-22 14:54:03 +00:00
Matthieu Gautier
766b64dddc Update gen_kiwix.sh to not be dependent of the number of arguments. 2017-11-22 14:46:01 +00:00
Matthieu Gautier
e2f16f6030 Merge pull request #95 from kiwix/geo_loc
Add small API to do geo query.
2017-11-20 16:33:10 +01:00
Matthieu Gautier
b9ac7084ac Add small API to do geo query.
This is a small, quick and dirty API to do geo queries.

It is not possible with this API to do a query search and a geo search.
It's either one or the other.

We should think about a better global API to do searching and provide
both of them at the same time (libzim does it).
2017-11-14 17:32:06 +01:00
Matthieu Gautier
0bd2a15651 Merge pull request #94 from kiwix/bigger_search
Bigger search
2017-11-06 12:30:09 +01:00
Matthieu Gautier
0e8c8f68c5 Extend search limits to 140.
70 is too small a limit for the number of results.
Users need at least 100.

As the html rendering will fail with more than 144 results,
explicitly limit the number of search results to 140.

Fixes kiwix/kiwix-tools#92
2017-11-06 12:23:13 +01:00
Matthieu Gautier
382655d83c Explicitly set ctpp2 iIMaxSteps to extend search beyond 68 results.
Ctpp2 templates have a limit on the number of steps. If the template to render is
too big, the rendering fails, throwing an exception.

From our tests, it seems that, with the template we have, the default
step limit allows us to render 68 results only.

By doubling the limit, we can render up to 144 results.
2017-11-06 12:10:48 +01:00
Matthieu Gautier
f0bcb1960b Merge pull request #93 from kiwix/pkg_config_version
Fix version in pkg_config.
2017-10-23 18:14:56 +02:00
Matthieu Gautier
d4f0344d9d Fix version in pkg_config. 2017-10-23 15:20:56 +02:00
Matthieu Gautier
48078c809b Merge pull request #92 from kiwix/new_version
New release 1.0.0
2017-10-23 10:11:38 +02:00
Matthieu Gautier
3134ab6b56 New release 1.0.0 2017-10-20 15:19:59 +02:00
Matthieu Gautier
41e3707f1b Merge pull request #90 from kiwix/fix_resource_script
[resource_compiler] Make the exception public.
2017-10-10 14:17:44 +02:00
Matthieu Gautier
d801ff36f6 [resource_compiler] Make the exception public.
It is useless to raise an exception if the exception is not published
in the header.
2017-10-10 13:55:43 +02:00
Matthieu Gautier
5623fedfd0 Merge pull request #91 from kiwix/static_deps
Build with static argument when building for android.
2017-10-10 13:55:20 +02:00
Matthieu Gautier
25a05cc64a Build with static dependencies when building for android or static. 2017-10-10 10:48:48 +02:00
Matthieu Gautier
192a249d23 Merge pull request #88 from kiwix/legoktm-patch2
Rename compile_resources.py to less generic name
2017-09-26 18:02:42 +02:00
Kunal Mehta
5c118a87a1 Rename compile_resources.py to less generic name 2017-09-26 17:56:55 +02:00
Matthieu Gautier
ba35f097d9 Merge pull request #89 from kiwix/use_sudo
Use sudo to install pip3 packages.
2017-09-26 17:56:27 +02:00
Matthieu Gautier
093e8c0498 Use sudo to install pip3 packages. 2017-09-26 17:50:05 +02:00
Matthieu Gautier
8b90221866 Merge pull request #80 from kiwix/no_search_on_splitted
Claims that multi part zim has no embedded full text index.
2017-08-15 14:33:35 -04:00
Matthieu Gautier
5c2280e7c7 Claims that multi part zim has no embedded full text index.
We cannot search into an embedded fulltext index if the zim is multipart.
Instead of crashing, let's pretend we have no fulltext index.
2017-08-15 14:19:53 -04:00
Matthieu Gautier
ebd3f622ff Merge pull request #81 from kiwix/no_ctpp2
Allow kiwix-lib to compile without ctpp2c.
2017-08-14 11:19:43 -04:00
Chris Li
cf93c8719f Allow kiwix-lib to compile without ctpp2c.
ctpp2c is used to pre-compile the template resource.
However, ctpp2c seems to be difficult to compile on OSX. As we don't need
ctpp2 at all on OSX/iOS, let's just stop forcing the use of ctpp2c.
2017-08-14 10:42:16 -04:00
Matthieu Gautier
a794849993 Merge pull request #79 from kiwix/fix_get_html
Always set the humanReadableName with the readable in kiwix-search.
2017-08-10 09:45:35 -04:00
Matthieu Gautier
1ff1bf6168 Always set the humanReadableName with the readable in kiwix-search.
We always need a humanReadableName associated with a content to search in.
Do not separate the two values (human readable name and zim) in two
different functions.

This way, we avoid misuse of the Searcher, which could lead to a segfault.
2017-08-10 09:20:11 -04:00
Matthieu Gautier
b6e51055a3 Merge pull request #75 from kiwix/kelson42-patch-licensing
Fix license header #73
2017-08-07 14:00:56 +01:00
Kelson
d17e94fd9c Fix license header #73 2017-08-07 11:49:48 +02:00
Kelson
44a282fa4c Merge pull request #74 from kiwix/get_content_title
getContent* methods also allow to get the title.
2017-08-02 21:02:27 +02:00
Matthieu Gautier
d3acae1fd2 getContent* methods also allow to get the title.
Add a `title` output argument to `getContent*` methods.
This argument is filled with the title of the retrieved content.

Also update the JNI accordingly.

Related to kiwix/kiwix-android#214
2017-07-26 11:03:32 +02:00
Kelson
cbb1018a02 Merge pull request #72 from kiwix/jni_licensing_cleaning
Jni licensing cleaning
2017-07-22 21:02:12 +02:00
kelson42
1d1dfbf4da Fix JNI licensing #71 2017-07-22 09:29:28 +02:00
kelson42
b163351b2e Merge branch 'master' of https://github.com/kiwix/kiwix-lib 2017-07-19 22:04:45 +02:00
Kelson
e531c353a6 Merge pull request #69 from kiwix/new_authors
New authors
2017-07-19 22:04:22 +02:00
kelson42
c363933bf4 Create AUTHORS file #48 2017-07-19 21:56:20 +02:00
kelson42
5d46f28926 Create AUTHORS file #48 2017-07-19 21:55:05 +02:00
Kelson
9fa2cfc66b Merge pull request #68 from kiwix/workding_fix
Fix wording problem
2017-07-19 21:49:04 +02:00
kelson42
b6a58d1684 Fix wording problem 2017-07-19 21:39:38 +02:00
kelson42
e3780a2d77 Fix workding problem 2017-07-19 21:38:28 +02:00
Matthieu Gautier
473b62c9b8 Merge pull request #66 from kiwix/multisearch
Multisearch
2017-07-18 16:07:46 +02:00
Matthieu Gautier
bc5f4f5de4 Use right contentId to generate the article url in search template.
As we do multisearch, we must use the associated contentID of the result
to generate the url.
2017-07-18 10:04:40 +02:00
Matthieu Gautier
9cc329dbd2 Support multi-zims search in kiwix-lib.
All the code was already in zimlib.
It is mainly an update of the code using zimlib.

No JNI change for now to not break the API.
2017-07-18 10:04:40 +02:00
Matthieu Gautier
3991e648ed Be able to get the reader index from a search result. 2017-07-17 18:16:11 +02:00
Matthieu Gautier
8d39b0b343 Search result objects now have a get_content method.
This was not necessary when searching in only one zim file as `url` was
enough to get the article (and so the content).

If we want to search in several zim files at the same time, we need a way to get
the content directly.
2017-07-17 18:16:11 +02:00
Matthieu Gautier
4a51dd9e00 Fix memory leak.
If a `searcher` is already created we must delete it.
If we set the pointer to NULL before, we will never delete it.
2017-07-17 18:16:11 +02:00
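The ordering issue behind commit 4a51dd9e00 can be illustrated with a small sketch. The types below (`Searcher`, `SearcherHolder`) are hypothetical stand-ins written for this example, not the kiwix-lib classes; the instance counter exists only to make a leak observable.

```cpp
// Hypothetical stand-in for a searcher object; it only counts live instances
// so that a leak becomes observable.
struct Searcher {
  static int alive;
  Searcher() { ++alive; }
  ~Searcher() { --alive; }
};
int Searcher::alive = 0;

// Hypothetical owner of a heap-allocated Searcher.
class SearcherHolder
{
 public:
  SearcherHolder() : searcher(nullptr) {}
  ~SearcherHolder() { delete searcher; }
  SearcherHolder(const SearcherHolder&) = delete;
  SearcherHolder& operator=(const SearcherHolder&) = delete;

  // Replace the current searcher with a fresh one.
  void reset()
  {
    // Delete the old object while we still hold its address; `delete` on a
    // null pointer is a no-op. Setting the pointer to NULL first would lose
    // the only reference to the old object and leak it.
    delete searcher;
    searcher = new Searcher();
  }

 private:
  Searcher* searcher;
};
```

In newer code a `std::unique_ptr<Searcher>` member would make this ordering automatic; the raw pointer is kept here only to mirror the situation the commit message describes.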
Matthieu Gautier
c56e1f0446 Merge pull request #62 from kiwix/suggestion
Suggestions now use xapian database when available.
2017-07-17 17:57:36 +02:00
Matthieu Gautier
d0371cd133 Suggestions now use xapian database when available.
If an embedded fulltext database is present, suggestions will search in it:
 - case-insensitive search.
 - search for terms in the middle of the title.
 - xapian will try to complete the last word of the query (as if a '*'
   were added at the end)
2017-07-17 17:17:13 +02:00
Matthieu Gautier
57720ca57b Merge pull request #65 from kiwix/generate_ctpp2_template
Do not crash if no source_dir is given.
2017-07-17 09:59:06 +02:00
Matthieu Gautier
c5b291e1ed Do not crash if no source_dir is given. 2017-07-12 18:35:38 +02:00
Matthieu Gautier
baf254f1aa Merge pull request #64 from kiwix/generate_ctpp2_template
Use ctpp2c to generate template from source instead of use generated one.
2017-07-12 15:50:24 +02:00
Matthieu Gautier
64cc69f6ae Use ctpp2c to generate template from source instead of use generated one.
Fixes #50.
2017-07-12 15:45:44 +02:00
Matthieu Gautier
6da3604df6 Merge pull request #63 from kiwix/remove_unused_tree_h
Removed unused tree.h
2017-07-12 10:19:05 +02:00
Emmanuel Engelhart
89afabc4cd Removed unused tree.h 2017-07-11 20:15:11 +02:00
Matthieu Gautier
80f6d0bf46 Merge pull request #61 from kiwix/code_format
Format all the code using clang-format.
2017-07-11 17:24:19 +02:00
Matthieu Gautier
f76e9d2dbf Format all the code using clang-format.
Add a script `format_code.sh` to easily format the code.
2017-07-05 15:22:34 +02:00
Matthieu Gautier
a205ff00c8 Merge pull request #59 from kiwix/v0.2
Bump the version to 0.2.0
2017-06-27 14:41:27 +02:00
Matthieu Gautier
96f199a327 Bump the version to 0.2.0
Time to make a release.
2017-06-27 14:26:13 +02:00
Matthieu Gautier
0be3aa9d38 Merge pull request #56 from swills/src_reader.cpp_build_fix
Fix type error in build
2017-06-16 15:30:11 +02:00
Steve Wills
4f57e765e5 Fix type error in build
Compilation fails on clang 3.4.1 (and presumably later, though I haven't tested) with

```
src/reader.cpp:131:59: error: no viable conversion from 'iterator' (aka '__map_iterator<typename __base::iterator>') to 'std::map<std::string, unsigned int>::const_iterator' (aka '__map_const_iterator<typename __base::const_iterator>')
      std::map<std::string, unsigned int>::const_iterator it = counterMap.find("text/html");
                                                          ^    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/v1/map:713:29: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'iterator' (aka '__map_iterator<typename __base::iterator>') to 'const std::__1::__map_const_iterator<std::__1::__tree_const_iterator<std::__1::__value_type<std::__1::basic_string<char>, unsigned int>, std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char>, unsigned int>, void *> *, long> > &' for 1st argument
class _LIBCPP_TYPE_VIS_ONLY __map_const_iterator
                            ^
```

because we are not using the right type for the map iterator. As we are using
C++11, let's use `auto` and let the compiler deduce the right type for us.
2017-06-16 08:47:59 -04:00
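A minimal sketch of the `auto` approach described in commit 4f57e765e5. The function name `htmlCount` is an assumption made for this example; only the map type and the `"text/html"` key come from the error message above.

```cpp
#include <map>
#include <string>

// Hypothetical helper: returns the counter stored under "text/html",
// or 0 when the key is absent.
unsigned int htmlCount(const std::map<std::string, unsigned int>& counterMap)
{
  // `auto` deduces whatever iterator type find() returns for this map
  // (a const_iterator here, since the map is const), so no conversion
  // between iterator types is ever requested.
  auto it = counterMap.find("text/html");
  return it != counterMap.end() ? it->second : 0;
}
```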
Matthieu Gautier
2bcd43af98 Merge pull request #55 from kiwix/fix_android_dirname
Use the cpu name instead of cpu_family as install dir name.
2017-06-14 10:26:28 +02:00
Matthieu Gautier
eb2c750431 Use the android abi instead of cpu_family as install dir name.
Android will look in specific directories to find native libs.
We need to use the `android_abi` name defined in the cross_compilation
file instead of `cpu_family`.
2017-06-14 10:20:18 +02:00
Matthieu Gautier
7132775d67 Merge pull request #54 from kiwix/searchinxapian
Re-add xapian searcher in kiwix-lib.
2017-05-24 18:28:39 +02:00
Matthieu Gautier
c44b2acb56 Re-add xapian searcher in kiwix-lib.
libzim only knows how to read an embedded full text index in a zim file.
This is nice as we want to embed the full text index in the zim file and
not have a separate full text index.

However, we still have some zim files with a separate index that we have to read.
So we have to support searching in a separate index for a while.
2017-05-24 16:08:00 +02:00
Matthieu Gautier
0343c23f82 Merge pull request #53 from kiwix/zim_no_countermeta
Check that 'M/Counter' exists before trying to read it.
2017-05-23 17:37:07 +02:00
Matthieu Gautier
7005b65901 Check that 'M/Counter' exists before trying to read it.
Some old zim files may not have a 'Counter' metadata article.
We have to handle this correctly.
2017-05-23 15:26:01 +02:00
Kelson
d360b9143c Merge pull request #51 from kiwix/declare_ctpp2_include_path
Meson 'ctpp2_include_path' has to be declared
2017-05-19 09:10:31 +02:00
Kelson
9963c73150 Meson 'ctpp2_include_path' has to be declared 2017-05-16 16:23:50 +02:00
Matthieu Gautier
41d6f9884c Merge pull request #47 from kiwix/no_ssh_key
Get dependencies from http server, not from ssh.
2017-04-24 17:17:44 +02:00
Matthieu Gautier
8823880348 Get dependencies from http server, not from ssh.
`kiwix-build` now publishes intermediate dependency archives in an
http accessible location.

Let's use this location instead of fetching the archives with `scp`.
2017-04-24 17:10:41 +02:00
Matthieu Gautier
ac169558c4 Merge pull request #46 from kiwix/update_android
Update kiwix-lib to new kiwix-android way of building.
2017-04-24 17:01:39 +02:00
Matthieu Gautier
2e43b7e82d Update kiwix-lib to new kiwix-android way of building.
`kiwix-android` is using `kiwix-lib` as an external java application now.
So we need `kiwix-lib` build system to also install application files
(manifest, resources, ..).
2017-04-24 16:37:31 +02:00
Matthieu Gautier
4485cc8d0f Merge pull request #42 from kiwix/search_in_libzim
Search in libzim
2017-04-11 13:26:23 +02:00
Matthieu Gautier
3be4d92c53 Correctly check if we are compiling for linux or not.
In C++11 `linux` is not a reserved word, so compilers do not define it.
The correct way to check if we are compiling for linux is to check for
`__linux__`.
2017-04-10 14:28:25 +02:00
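The check described in commit 3be4d92c53 can be sketched as below. The function name `compiledForLinux` is hypothetical and only serves this example.

```cpp
// Hypothetical helper: reports whether this translation unit was compiled
// for a Linux target. `__linux__` is predefined by GCC and Clang on Linux in
// every language mode, whereas the legacy `linux` macro is only defined in
// GNU modes (e.g. -std=gnu++11) and not in strict modes such as -std=c++11.
bool compiledForLinux()
{
#ifdef __linux__
  return true;
#else
  return false;
#endif
}
```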
Matthieu Gautier
44a77f5846 Update android jni wrapper to new API. 2017-04-10 14:28:25 +02:00
Matthieu Gautier
9abdc6ce02 Move to c++11.
Zimlib moved to c++11 and so we need a c++11 compiler.
2017-04-10 14:28:25 +02:00
Matthieu Gautier
5ca419bee7 Use the new search API in zimlib.
We do not use xapian anymore. This is all handled by zimlib.
2017-04-10 14:28:25 +02:00
Matthieu Gautier
37f29da63e Beautify a bit the code.
No real change. Just less code or use of higher-level APIs.
2017-04-10 14:28:25 +02:00
Matthieu Gautier
94670847ef Use const when possible in the reader.
Most read operations do not modify the content. So let's use const
as far as possible.
2017-04-10 14:28:25 +02:00
Matthieu Gautier
93b53cc6d0 Merge pull request #43 from kiwix/travisci
Add travis
2017-04-10 14:27:36 +02:00
Matthieu Gautier
cf273a06b4 Add TravisCI.
Now the project is built on every PR using TravisCI.

The project dependencies are fetched from the archive generated by kiwix-build.
2017-04-10 14:05:58 +02:00
Julian Harty
43e9763091 Merge pull request #41 from kiwix/less_header
Move unicode headers in cpp.
2017-04-06 16:18:02 +02:00
Matthieu Gautier
ef661a2e25 Move unicode headers in cpp.
Unicode headers end by defining the DONE symbol in an enum.
It can clash with other includes
(for instance the httpd.h from Apache, which uses `#define DONE -2`).

Both projects should not declare such common symbols publicly, but we have
to deal with them anyway.
2017-04-06 16:17:00 +02:00
Matthieu Gautier
7baa1b9e62 Merge pull request #40 from kiwix/no_indexer
Remove the indexer functionality from kiwix-lib.
2017-04-06 15:38:43 +02:00
Matthieu Gautier
e28dbe7c7e Remove the indexer functionality from kiwix-lib.
This is not used anymore.
2017-04-06 15:35:30 +02:00
Matthieu Gautier
2906202056 Merge pull request #39 from kiwix/fix_indexer
Do not use the removed readStopWords method.
2017-04-06 13:24:53 +02:00
Matthieu Gautier
ce6c782b66 Do not use the removed readStopWords method.
Commit b8d950c removes this symbol.
The indexer is not used anymore and will soon be removed.
So for now, just remove the call to readStopWords until we totally
remove the indexer code.
2017-04-06 13:20:59 +02:00
Matthieu Gautier
9771506985 Merge pull request #35 from kiwix/stem_stop
Let's use stem and stop words information (if) present in the database.
2017-04-04 17:07:48 +02:00
Matthieu Gautier
b8d950c1a0 Use the stop words stored in the database to configure the queryparser.
To properly search in the xapian database, we need to use the same
stop words as the ones used during the indexing.
2017-04-04 17:06:49 +02:00
Matthieu Gautier
998db0eb2b Use the language stored in the database to configure the queryparser.
To properly search in the xapian database, we need a stemmer using the
same language as the one used during the indexing.
2017-04-04 17:06:49 +02:00
Kelson
46fab22a73 Merge pull request #37 from kiwix/fix_android
The `Result` class is not in the `kiwix` namespace. (fix android build)
2017-03-30 07:44:44 +02:00
Matthieu Gautier
72e41082ca The Result class is not in the kiwix namespace.
Commit 83d2725 adapted the jni wrapper to the new search API but tried to
use the `Result` class from the `kiwix` namespace, and `Result` is not in
that namespace.

A correct fix would be to move `Result` into `kiwix`, but that would also change the
API for other tools (kiwix-tools). As we will move the search
functionality into `zimlib`, it is better to just do this silly fix and
update the API later when moving the search functionality.
2017-03-29 17:12:23 +02:00
Kelson
c06a041100 Merge pull request #36 from kiwix/no_cpp11
Remove C++11 syntax introduced by commit 9be2abe.
2017-03-28 19:48:30 +02:00
Matthieu Gautier
cecb65e314 Remove C++11 syntax introduced by commit 9be2abe.
The `for( auto elem: elems)` syntax is a C++11 syntax.
We are not using C++11 (even if it would be a good idea).
This works on recent compilers (on Fedora 25) but fails on older ones
(on Travis).
2017-03-28 17:14:25 +02:00
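For comparison, here is a sketch of the pre-C++11 loop form that commit cecb65e314 falls back to. The function `totalLength` and its data are invented for this example and do not come from the kiwix-lib sources.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example: sums the lengths of all strings in a vector using
// an explicit iterator, which compiles with pre-C++11 compilers.
// The C++11 form would be: for (auto elem : elems) { total += elem.size(); }
std::size_t totalLength(const std::vector<std::string>& elems)
{
  std::size_t total = 0;
  for (std::vector<std::string>::const_iterator it = elems.begin();
       it != elems.end();
       ++it) {
    total += it->size();
  }
  return total;
}
```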
Matthieu Gautier
62d26c27ff Merge pull request #33 from kiwix/snippets
Snippets
2017-03-28 11:37:45 +02:00
Matthieu Gautier
074c1bcffa Try to generate the snippet if it is not present in the database.
We generate the snippet from the content of the article in the zim, so
we need to have access to the reader.
2017-03-21 16:28:03 +01:00
Matthieu Gautier
9be2abedf3 Check if a valuemaps metadata is available in the database and use it.
This way, we do not make assumptions about where the values are stored.
2017-03-21 16:26:03 +01:00
Matthieu Gautier
83d27255cf Do not create all the results at once. Be a bit lazy.
We don't need to generate a vector of results when we do a search.
It is better to just keep the handle to the current MSetIterator and
generate the wanted values when needed.
2017-03-21 16:20:17 +01:00
427 changed files with 161228 additions and 9239 deletions

12
.clang-format Normal file
View File

@@ -0,0 +1,12 @@
BasedOnStyle: Google
BinPackArguments: false
BinPackParameters: false
BreakBeforeBinaryOperators: All
BreakBeforeBraces: Linux
DerivePointerAlignment: false
SpacesInContainerLiterals: false
Standard: Cpp11
AllowShortFunctionsOnASingleLine: Inline
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false

37
.travis.yml Normal file
View File

@@ -0,0 +1,37 @@
language: cpp
dist: trusty
sudo: false
cache: ccache
before_install:
- PATH=$PATH:$HOME/bin
install: travis/install_deps.sh
script: travis/compile.sh
env:
  matrix:
  - PLATFORM="native_static"
  - PLATFORM="native_dyn"
  - PLATFORM="win32_static"
  - PLATFORM="win32_dyn"
  - PLATFORM="android_arm"
  - PLATFORM="android_arm64"
addons:
  apt:
    packages:
    - cmake
    - python3.5
    - python3-pip
    - libbz2-dev
    - ccache
    - zlib1g-dev
    - uuid-dev
    - libctpp2-dev
    - ctpp2-utils
    - libmicrohttpd-dev
    - g++-mingw-w64-i686
    - gcc-mingw-w64-i686
    - gcc-mingw-w64-base
    - mingw-w64-tools
matrix:
  include:
  - env: PLATFORM="native_dyn"
    os: osx

17
AUTHORS Normal file
View File

@@ -0,0 +1,17 @@
Automactic <christopherliqd@gmail.com>
Ayoub DARDORY <ayoubuto@gmail.com>
Cristian Patrasciuc <cristip@google.com>
Dattaz <taz@dattaz.fr>
Elad Keyshawn <elad.keyshawn@gmail.com>
Emmanuel Engelhart <kelson@kiwix.org>
Isaac <mhutti1@gmail.com>
jleow00 <leow.yonghan.jerome@gmail.com>
Julian Harty <julianharty@gmail.com>
Kiran Mathew Koshy <kiranmathewkoshy@gmail.com>
Kunal Mehta <legoktm@member.fsf.org>
Matthieu Gautier <mgautier@kymeria.fr>
Rashiq Ahmad <rashiq.z@gmail.com>
Renaud Gaudin <reg@kiwix.org>
Shivam <ssarodia@gmail.com>
Steve Wills <steve@mouf.net>
Synhershko <synhershko@users.sourceforge.net>

197
ChangeLog Normal file
View File

@@ -0,0 +1,197 @@
kiwix-lib 4.1.0
===============
* Allow the library to be filtered by tags.
* Fix language mapping.
* Update README about mustache dependency.
kiwix-lib 4.0.1
===============
* Fix "maybe uninitialized variable" issue.
* Ensure paths are stored correctly (absolute path) in the library.
* [CI] Use the new deps archive xz
kiwix-lib 4.0.0
===============
* [API break] Remove support for external index.
* Move to the mustache templating system instead of ctpp2.
* Make meson.build work for meson>=0.43.0
* [API break] Move the basic tools from the `common` directory to `tools`.
kiwix-lib 3.1.1
===============
* The OPDS feed book's date must be the date of the book, not the date of the
feed generation.
* Convert the standard opds date to our format (YYYY-MM-DD)
* Remove duplicate language attribute in the libxml dumper.
* Create the datadirectory to not fail to write a file in a non-existent
directory
kiwix-lib 3.1.0
===============
* Add a method to get the favicon url of book (if available).
* Move dump code of library.xml in a specific class.
* Add a first support to bookmarks
kiwix-lib 3.0.3
===============
* Add the 'en' language to the mapping alpha2-code ('en') to alpha3-code
('eng').
* Correctly write the 'ArticleCount' and 'MediaCount' in the library.xml.
* Correctly fill the book size for the zim file size.
* Fix launch of aria2c.
kiwix-lib 3.0.2
===============
* Use the correct path separator when computing relativePath on Windows.
kiwix-lib 3.0.1
===============
* Small fix about parsing the opdsStream.
kiwix-lib 3.0.0
===============
* Change the downloader to use aria2 in a separate process (with rpc)
  instead of using libaria2. This greatly simplifies the link process
  on Windows.
  - kiwix-lib doesn't depend on libaria2 anymore.
  - kiwix-lib now depends on libcurl.
* [API break] Library class API has been updated:
  - Books are referenced by id, not index. A lot of methods have been
    updated this way.
  - Books "list" is now private.
  - There is no more "current" book.
  - listBooksIds's filters have been updated.
* [API break] Book class API has been updated:
  - Move the definition of Book to `book.h`.
  - Use getter/setter methods instead of public members.
  - Size (getSize/setSize) is now returned in bytes, not kB.
  - Depending on how the book has been initialized (opdsfeed), the
    faviconUrl may be stored in the book, the favicon being downloaded when
    using `getFavicon`.
  - The path (and indexPath) are always absolute paths.
  - Book now has a downloadId, corresponding to the aria2 download id (if
    it exists).
* [API break] Manager class API has been updated:
  - The manager is mainly used to fill a Library from a "library.xml" file or
    opds feed. Other operations (such as removeBookById, setBookPath, filter, ...)
    have been removed.
  - The manager uses an intermediate class (LibraryManipulator) to add books to
    the library. This dependency injection allows caller code to hook the
    addition of a book to the library.
  - The manager works on an existing Library. It doesn't own an internal
    Library.
* [API break] OpdsDumper class API has been updated:
  - dumpOPDSFeed method now takes the list of bookIds to dump instead of
    dumping all books in the library.
  - OpdsDumper can now dump openSearch result information (total result
    count, start index, ...).
* [API break] Common tools API has been updated:
  - `base64_encode` and `base64_decode` take std::string as arguments.
  - New `download` function in networkTools.h using libcurl.
  - New `getDataDirectory` function in pathTools.
  - Better `beautifyInteger` and `beautifyFileSize` functions.
  - New `nodeToString` function serializing a pugi::xml_node to a string.
  - New `converta2toa3` function to convert alpha2 language code to alpha3
    language code.
kiwix-lib 2.0.2
===============
* [Android] Forward C++ error messages to the Java world.
* Follow redirection of favicon.
* Make aria2 dependency optional.
* Include unistd.h only on unix platforms.
kiwix-lib 2.0.1
===============
* Fix parsing of url.
* Remove unused static resources.
* Correctly decode reserved characters in URLs.
* Explicitly use icu namespace to allow use of packaged icu lib.
kiwix-lib 2.0.0
===============
* Introduce a new API to retrieve content from a reader.
* Introduce the `Entry` class.
* Reader's methods return an `Entry`.
* Content and other information can be retrieved from the `Entry`.
* Older Reader's methods are deprecated.
* Add an `OPDSDumper` class to dump a whole `Library` as an OPDS feed.
* Add a tool function to get the content of a file.
* Add a tool function to create a temporary directory.
* Add a `Downloader` class to download a file.
* Allow the manager to populate a `Library` from an OPDS feed.
* Try to locate libctpp2 in default system libdir and then fallback in 'lib'
directory.
* Build kiwix-lib setting RPATH.
* Build kiwix-lib without warning (werror=true)
* Build kiwix-lib on macos.
kiwix-lib 1.1.1
===============
* Correct the name of kiwix-lib (from `kiwixlib`) in meson.build to generate
dist archive with the correct name.
* Libzim version needs to be at least 3.2.0
kiwix-lib 1.1.0
===============
* Allow for more than 70 search result per page in html results rendering
(kiwix/kiwix-tools#92)
* Add a small api to do geo queries.
* Add multi-search support in the JNI (#67)
* Add an API to get only one part of an article.
* Add an API to get direct location of an article content in the zim file.
* Improve urlencoding
* Fix pagination in html results rendering.
* Compile using gcc-5 on Travis.
* Allow JNI to access search snippets.
* JNI throw an exception instead of returning an invalid object if something
goes wrong.
* Add doctext documentation. (#116)
* Various bug fixes.
kiwix-lib 1.0.0
===============
* Correctly regenerate template resource using ctpp2c at compilation time.
* Suggestions use xapian database when available
* Support multi-zim search in kiwix-lib (a search can now search in several
  embedded databases in zims at the same time)
* Fix some wording
* Fix license issues
* Add out argument to jni getContent* method to get the title of the article
  at the same time as we get the content
* Rename `compile_resources.py` script to `kiwix-compile-resources`
* Use static lib when building for android or in "static mode"
* Make the ResourceNotFound exception public
kiwix-lib 0.2.0
===============
* Generate the snippet from the article content if the snippet is not
  directly in the database.
  This provides better snippets, as they now depend on the query.
* Use the stopwords and the language stored in the fulltext index database to
  parse the user query.
* Remove the indexer functionality.
* Move to C++11 standard.
* Use the fulltext search of the zimlib.
  We still have the fulltext search code in kiwix-lib to be able to search in
  a fulltext index alongside a zim file. (To be removed in the future)
* A few API changes.
* Change a lot of `Reader` methods to const methods.
* Fix some crashes.

110
README.md
View File

@@ -15,9 +15,9 @@ to [kiwix-build](https://github.com/kiwix/kiwix-build).
Preamble
--------
Although the Kiwix library can be compiled/cross-compiled on/for many
Although the Kiwix library can be (cross-)compiled on/for many
systems, the following documentation explains how to do it on POSIX
ones. It is primarily though for GNU/Linux systems and has been tested
ones. It is primarily thought for GNU/Linux systems and has been tested
on recent releases of Ubuntu and Fedora.
Dependencies
@@ -33,10 +33,11 @@ libraries need to be available:
(package libzim-dev on Ubuntu)
* Pugixml ........................................ http://pugixml.org/
(package libpugixml-dev on Ubuntu)
* ctpp2 ........................................ http://ctpp.havoc.ru/
(package libctpp2-dev on Ubuntu)
* Xapian ......................................... https://xapian.org/
(package libxapian-dev on Ubuntu)
* libaria2 .................................. https://aria2.github.io/
(no package on Ubuntu)
* Mustache ....................... https://github.com/kainjow/Mustache
(Just copy the header mustache.hpp somewhere it can be found by the
compiler and/or set CPPFLAGS with correct '-I' option)
These dependencies may or may not be packaged by your operating
system. They may also be packaged but only in an older version. The
@@ -47,72 +48,91 @@ version by hand.
If you want to install these dependencies locally, then use the
kiwix-lib directory as install prefix.
If you compile ctpp2 from source and want to compile the Kiwix library
statically then you will probably need to rename ctpp2 static library
from ctpp2-st.a to ctpp2.a.
Environnement
Environment
-------------
The Kiwix library builds using [Meson](http://mesonbuild.com/) version
0.34 or higher. Meson relies itself on Ninja, pkg-config and few other
0.39 or higher. Meson relies itself on Ninja, pkg-config and few other
compilation tools.
Install first the few common compilation tools:
* Automake
* Libtool
* Virtualenv
* Meson
* Ninja
* Pkg-config
Then install Meson itself:
```
virtualenv -p python3 ./ # Create virtualenv
source bin/activate # Activate the virtualenv
pip install meson # Install Meson
hash -r # Refresh bash paths
```
Finally download and build Ninja locally:
```
git clone git://github.com/ninja-build/ninja.git
cd ninja
git checkout release
./configure.py --bootstrap
mkdir ../bin
cp ninja ../bin
cd ..
```
These tools should be packaged if you use a cutting edge operating
system. If not, have a look to the "Troubleshooting" section.
Compilation
-----------
Once all dependencies are installed, you can compile kiwix-lib with:
Once all dependencies are installed, you can compile the Kiwix library
with:
```
mkdir build
meson . build
cd build
ninja
ninja -C build
```
By default, it will compile dynamic linked libraries. If you want
statically linked libraries, you can add `--default-library=static`
option to the Meson command.
By default, it will compile dynamic linked libraries. All binary files
will be created in the "build" directory created automatically by
Meson. If you want statically linked libraries, you can add
`--default-library=static` option to the Meson command.
Depending on your system, `ninja` may be called `ninja-build`.
Installation
------------
If you want to install the libraries you just have compiled on your
system, here we go:
If you want to install the Kiwix library and the headers you just have
compiled on your system, here we go:
```
ninja install
ninja -C build install
```
You might need to run the command as root (or using 'sudo'), depending
where you want to install the libraries. After the installation
succeeded, you may need to run ldconfig (as root).
Uninstallation
------------
If you want to uninstall the Kiwix library:
```
ninja -C build uninstall
```
Like for the installation, you might need to run the command as root
(or using 'sudo').
Troubleshooting
---------------
If you need to install Meson "manually":
```
virtualenv -p python3 ./ # Create virtualenv
source bin/activate # Activate the virtualenv
pip3 install meson # Install Meson
hash -r # Refresh bash paths
```
If you need to install Ninja "manually":
```
git clone git://github.com/ninja-build/ninja.git
cd ninja
git checkout release
./configure.py --bootstrap
mkdir ../bin
cp ninja ../bin
cd ..
```
You might need to run the command as root, depending where you want to
install the libraries.
If the compilation still fails, you might need to get a more recent
version of a dependency than the one packaged by your Linux
distribution. Try then with a source tarball distributed by the
problematic upstream project or even directly from the source code
repository.
License
-------

36
format_code.sh Executable file

@@ -0,0 +1,36 @@
#!/usr/bin/bash
files=(
"include/library.h"
"include/common/stringTools.h"
"include/common/pathTools.h"
"include/common/otherTools.h"
"include/common/regexTools.h"
"include/common/networkTools.h"
"include/manager.h"
"include/reader.h"
"include/kiwix.h"
"include/xapianSearcher.h"
"include/searcher.h"
"src/library.cpp"
"src/android/kiwix.cpp"
"src/android/org/kiwix/kiwixlib/JNIKiwixBool.java"
"src/android/org/kiwix/kiwixlib/JNIKiwix.java"
"src/android/org/kiwix/kiwixlib/JNIKiwixString.java"
"src/android/org/kiwix/kiwixlib/JNIKiwixInt.java"
"src/searcher.cpp"
"src/common/pathTools.cpp"
"src/common/regexTools.cpp"
"src/common/otherTools.cpp"
"src/common/networkTools.cpp"
"src/common/stringTools.cpp"
"src/xapianSearcher.cpp"
"src/manager.cpp"
"src/reader.cpp"
)
for i in "${files[@]}"
do
echo "$i"
clang-format -i -style=file "$i"
done

119
include/book.h Normal file

@@ -0,0 +1,119 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_BOOK_H
#define KIWIX_BOOK_H
#include <cstdint>
#include <string>
namespace pugi {
class xml_node;
}
namespace kiwix
{
class OPDSDumper;
class Reader;
/**
* A class to store information about a book (a zim file)
*/
class Book
{
public:
Book();
~Book();
bool update(const Book& other);
void update(const Reader& reader);
void updateFromXml(const pugi::xml_node& node, const std::string& baseDir);
void updateFromOpds(const pugi::xml_node& node, const std::string& urlHost);
std::string getHumanReadableIdFromPath();
bool readOnly() const { return m_readOnly; }
const std::string& getId() const { return m_id; }
const std::string& getPath() const { return m_path; }
bool isPathValid() const { return m_pathValid; }
const std::string& getTitle() const { return m_title; }
const std::string& getDescription() const { return m_description; }
const std::string& getLanguage() const { return m_language; }
const std::string& getCreator() const { return m_creator; }
const std::string& getPublisher() const { return m_publisher; }
const std::string& getDate() const { return m_date; }
const std::string& getUrl() const { return m_url; }
const std::string& getName() const { return m_name; }
const std::string& getTags() const { return m_tags; }
const std::string& getOrigId() const { return m_origId; }
const uint64_t& getArticleCount() const { return m_articleCount; }
const uint64_t& getMediaCount() const { return m_mediaCount; }
const uint64_t& getSize() const { return m_size; }
const std::string& getFavicon() const;
const std::string& getFaviconUrl() const { return m_faviconUrl; }
const std::string& getFaviconMimeType() const { return m_faviconMimeType; }
const std::string& getDownloadId() const { return m_downloadId; }
void setReadOnly(bool readOnly) { m_readOnly = readOnly; }
void setId(const std::string& id) { m_id = id; }
void setPath(const std::string& path);
void setPathValid(bool valid) { m_pathValid = valid; }
void setTitle(const std::string& title) { m_title = title; }
void setDescription(const std::string& description) { m_description = description; }
void setLanguage(const std::string& language) { m_language = language; }
void setCreator(const std::string& creator) { m_creator = creator; }
void setPublisher(const std::string& publisher) { m_publisher = publisher; }
void setDate(const std::string& date) { m_date = date; }
void setUrl(const std::string& url) { m_url = url; }
void setName(const std::string& name) { m_name = name; }
void setTags(const std::string& tags) { m_tags = tags; }
void setOrigId(const std::string& origId) { m_origId = origId; }
void setArticleCount(uint64_t articleCount) { m_articleCount = articleCount; }
void setMediaCount(uint64_t mediaCount) { m_mediaCount = mediaCount; }
void setSize(uint64_t size) { m_size = size; }
void setFavicon(const std::string& favicon) { m_favicon = favicon; }
void setFaviconMimeType(const std::string& faviconMimeType) { m_faviconMimeType = faviconMimeType; }
void setDownloadId(const std::string& downloadId) { m_downloadId = downloadId; }
protected:
std::string m_id;
std::string m_downloadId;
std::string m_path;
bool m_pathValid;
std::string m_title;
std::string m_description;
std::string m_language;
std::string m_creator;
std::string m_publisher;
std::string m_date;
std::string m_url;
std::string m_name;
std::string m_tags;
std::string m_origId;
uint64_t m_articleCount;
uint64_t m_mediaCount;
bool m_readOnly;
uint64_t m_size;
mutable std::string m_favicon;
std::string m_faviconUrl;
std::string m_faviconMimeType;
};
}
#endif

68
include/bookmark.h Normal file

@@ -0,0 +1,68 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_BOOKMARK_H
#define KIWIX_BOOKMARK_H
#include <string>
namespace pugi {
class xml_node;
}
namespace kiwix
{
/**
* A class to store information about a bookmark (an article in a book)
*/
class Bookmark
{
public:
Bookmark();
~Bookmark();
void updateFromXml(const pugi::xml_node& node);
const std::string& getBookId() const { return m_bookId; }
const std::string& getBookTitle() const { return m_bookTitle; }
const std::string& getUrl() const { return m_url; }
const std::string& getTitle() const { return m_title; }
const std::string& getLanguage() const { return m_language; }
const std::string& getDate() const { return m_date; }
void setBookId(const std::string& bookId) { m_bookId = bookId; }
void setBookTitle(const std::string& bookTitle) { m_bookTitle = bookTitle; }
void setUrl(const std::string& url) { m_url = url; }
void setTitle(const std::string& title) { m_title = title; }
void setLanguage(const std::string& language) { m_language = language; }
void setDate(const std::string& date) { m_date = date; }
protected:
std::string m_bookId;
std::string m_bookTitle;
std::string m_url;
std::string m_title;
std::string m_language;
std::string m_date;
};
}
#endif

24
include/common.h Normal file

@@ -0,0 +1,24 @@
#ifndef _KIWIX_COMMON_H_
#define _KIWIX_COMMON_H_
#include <zim/zim.h>
#ifdef __GNUC__
#define DEPRECATED __attribute__((deprecated))
#elif defined(_MSC_VER)
#define DEPRECATED __declspec(deprecated)
#else
#pragma message("WARNING: You need to implement DEPRECATED for this compiler")
#define DEPRECATED
#endif
namespace kiwix {
typedef zim::size_type size_type;
typedef zim::offset_type offset_type;
}
#endif //_KIWIX_COMMON_H_
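
A minimal sketch of how the `DEPRECATED` macro above might be applied, assuming the same compiler-detection pattern. The functions `newSearch` and `oldSearch` are hypothetical names used only for illustration; they are not part of kiwix-lib.

```cpp
// Sketch only: the macro mirrors the pattern in common.h; the two functions
// below are hypothetical and exist purely to show where the macro goes.
#ifdef __GNUC__
#define DEPRECATED __attribute__((deprecated))
#elif defined(_MSC_VER)
#define DEPRECATED __declspec(deprecated)
#else
#define DEPRECATED
#endif

#include <string>

// Hypothetical replacement API.
inline std::string newSearch(const std::string& query) {
  return "result:" + query;
}

// Hypothetical legacy entry point. On GCC, Clang and MSVC, any caller of
// this function gets a deprecation warning at compile time.
DEPRECATED inline std::string oldSearch(const std::string& query) {
  return newSearch(query);
}
```

Code that still calls `oldSearch` keeps compiling, but the warning points callers to the replacement before the old function is removed.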


@@ -1,4 +0,0 @@
#include <string>
std::string base64_encode(unsigned char const* , unsigned int len);
std::string base64_decode(std::string const& s);


@@ -1,71 +0,0 @@
/*
* Copyright 2011-2012 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_STRINGTOOLS_H
#define KIWIX_STRINGTOOLS_H
#include <unicode/translit.h>
#include <unicode/normlzr.h>
#include <unicode/unistr.h>
#include <unicode/rep.h>
#include <unicode/uniset.h>
#include <unicode/ustring.h>
#include <unicode/ucnv.h>
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <sstream>
#include "pathTools.h"
namespace kiwix {
#ifndef __ANDROID__
std::string beautifyInteger(const unsigned int number);
std::string beautifyFileSize(const unsigned int number);
std::string urlEncode(const std::string &c);
void printStringInHexadecimal(const char *s);
void printStringInHexadecimal(UnicodeString s);
void stringReplacement(std::string& str, const std::string& oldStr, const std::string& newStr);
std::string encodeDiples(const std::string& str);
#endif
std::string removeAccents(const std::string &text);
void loadICUExternalTables();
std::string urlDecode(const std::string &c);
std::vector<std::string> split(const std::string&, const std::string&);
std::vector<std::string> split(const char*, const char*);
std::vector<std::string> split(const std::string&, const char*);
std::vector<std::string> split(const char*, const std::string&);
std::string ucAll(const std::string &word);
std::string lcAll(const std::string &word);
std::string ucFirst(const std::string &word);
std::string lcFirst(const std::string &word);
std::string toTitle(const std::string &word);
std::string normalize(const std::string &word);
}
#endif


File diff suppressed because it is too large


@@ -1,79 +0,0 @@
/*
* Copyright 2013 Renaud Gaudin <reg@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef _CTPP2_VM_STRING_LOADER_HPP__
#define _CTPP2_VM_STRING_LOADER_HPP__ 1
#include <ctpp2/CTPP2VMLoader.hpp>
#include <ctpp2/CTPP2Util.hpp>
#include <ctpp2/CTPP2Exception.hpp>
#include <ctpp2/CTPP2VMExecutable.hpp>
#include <ctpp2/CTPP2VMInstruction.hpp>
#include <ctpp2/CTPP2VMMemoryCore.hpp>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <string>
/**
@file VMStringLoader.hpp
@brief Load program core from file
*/
namespace CTPP // C++ Template Engine
{
// FWD
struct VMExecutable;
/**
@class VMStringLoader CTPP2VMStringLoader.hpp <CTPP2VMStringLoader.hpp>
@brief Load program core from file
*/
class CTPP2DECL VMStringLoader:
public VMLoader
{
public:
/**
*/
VMStringLoader(CCHAR_P rawContent, size_t rawContentSize);
/**
@brief Get ready-to-run program
*/
const VMMemoryCore * GetCore() const;
/**
@brief A destructor
*/
~VMStringLoader() throw();
private:
/** Program core */
VMExecutable * oCore;
/** Ready-to-run program */
VMMemoryCore * pVMMemoryCore;
};
} // namespace CTPP
#endif // _CTPP2_VM_STRING_LOADER_HPP__
// End.

104
include/downloader.h Normal file

@@ -0,0 +1,104 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_DOWNLOADER_H
#define KIWIX_DOWNLOADER_H
#include <string>
#include <vector>
#include <map>
#include <pthread.h>
#include <memory>
#include <stdexcept>
#include <cstdint>
namespace kiwix
{
class Aria2;
struct DownloadedFile {
DownloadedFile()
: success(false) {}
bool success;
std::string path;
};
class AriaError : public std::runtime_error {
public:
AriaError(const std::string& message) : std::runtime_error(message) {}
};
class Download {
public:
typedef enum { K_ACTIVE, K_WAITING, K_PAUSED, K_ERROR, K_COMPLETE, K_REMOVED, K_UNKNOWN } StatusResult;
Download() :
m_status(K_UNKNOWN) {}
Download(std::shared_ptr<Aria2> p_aria, std::string did)
: mp_aria(p_aria),
m_status(K_UNKNOWN),
m_did(did) {}
void updateStatus(bool follow=false);
StatusResult getStatus() { return m_status; }
std::string getDid() { return m_did; }
std::string getFollowedBy() { return m_followedBy; }
uint64_t getTotalLength() { return m_totalLength; }
uint64_t getCompletedLength() { return m_completedLength; }
uint64_t getDownloadSpeed() { return m_downloadSpeed; }
uint64_t getVerifiedLength() { return m_verifiedLength; }
std::string getPath() { return m_path; }
std::vector<std::string>& getUris() { return m_uris; }
protected:
std::shared_ptr<Aria2> mp_aria;
StatusResult m_status;
std::string m_did = "";
std::string m_followedBy = "";
uint64_t m_totalLength;
uint64_t m_completedLength;
uint64_t m_downloadSpeed;
uint64_t m_verifiedLength;
std::vector<std::string> m_uris;
std::string m_path;
};
/**
* A tool to download things.
*
*/
class Downloader
{
public:
Downloader();
virtual ~Downloader();
void close();
Download* startDownload(const std::string& uri);
Download* getDownload(const std::string& did);
size_t getNbDownload() { return m_knownDownloads.size(); }
std::vector<std::string> getDownloadIds();
private:
std::map<std::string, std::unique_ptr<Download>> m_knownDownloads;
std::shared_ptr<Aria2> mp_aria;
};
}
#endif
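
The `StatusResult` values above line up with the status strings that aria2 reports for a download (`active`, `waiting`, `paused`, `error`, `complete`, `removed`). As a hedged sketch, the mapping could look like the following. The enum `SketchStatus` and the function `statusFromAria2` are hypothetical; the real conversion lives in the implementation file, which is not shown in this diff.

```cpp
#include <string>

// Same enumerators as Download::StatusResult, redeclared here so the sketch
// is self-contained.
enum SketchStatus {
  K_ACTIVE, K_WAITING, K_PAUSED, K_ERROR, K_COMPLETE, K_REMOVED, K_UNKNOWN
};

// Hypothetical helper: translate an aria2 status string into the enum.
// Anything unrecognised falls back to K_UNKNOWN, the default used by the
// Download constructors.
inline SketchStatus statusFromAria2(const std::string& status) {
  if (status == "active")   return K_ACTIVE;
  if (status == "waiting")  return K_WAITING;
  if (status == "paused")   return K_PAUSED;
  if (status == "error")    return K_ERROR;
  if (status == "complete") return K_COMPLETE;
  if (status == "removed")  return K_REMOVED;
  return K_UNKNOWN;
}
```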

190
include/entry.h Normal file

@@ -0,0 +1,190 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_ENTRY_H
#define KIWIX_ENTRY_H
#include <stdio.h>
#include <zim/article.h>
#include <exception>
#include <string>
#include <utility>
#include "common.h"
using namespace std;
namespace kiwix
{
class NoEntry : public std::exception {};
/**
* An entry represents an entry in a zim file.
*/
class Entry
{
public:
/**
* Default constructor.
*
* Construct an invalid entry.
*/
Entry() = default;
/**
* Construct an entry referencing a zim article.
*
* @param article a zim::Article object
*/
Entry(zim::Article article);
virtual ~Entry() = default;
/**
* Get the path of the entry.
*
* The path is the "key" of an entry.
*
* @return the path of the entry.
*/
std::string getPath() const;
/**
* Get the title of the entry.
*
* @return the title of the entry.
*/
std::string getTitle() const;
/**
* Get the content of the entry.
*
* The string is a copy of the content.
* If you don't want to make a copy, use getBlob.
*
* @return the content of the entry.
*/
std::string getContent() const;
/**
* Get the blob of the entry.
*
* A blob references the content without copying it.
*
* @param offset The starting offset of the blob.
* @return the blob of the entry.
*/
zim::Blob getBlob(offset_type offset = 0) const;
/**
* Get the blob of the entry.
*
* A blob references the content without copying it.
*
* @param offset The starting offset of the blob.
* @param size The size of the blob.
* @return the blob of the entry.
*/
zim::Blob getBlob(offset_type offset, size_type size) const;
/**
* Get the info for direct access to the content of the entry.
*
* Some entries (i.e. binary ones) have their content stored as plain data
* in the zim file. Knowing the offset where the content is stored,
* a user can read the content directly from the zim file, bypassing
* kiwix-lib/libzim.
*
* @return A pair specifying where to read the content.
* The string is the real file to read (it may differ from the .zim
* file if the zim is split).
* The offset is the position to read from in the file.
* Return <"",0> if it is not possible to read directly.
*/
std::pair<std::string, offset_type> getDirectAccessInfo() const;
/**
* Get the size of the entry.
*
* @return the size of the entry.
*/
size_type getSize() const;
/**
* Get the mime_type of the entry.
*
* @return the mime_type of the entry.
*/
std::string getMimetype() const;
/**
* Get if the entry is a redirect entry.
*
* @return True if the entry is a redirect.
*/
bool isRedirect() const;
/**
* Get if the entry is a link target entry.
*
* @return True if the entry is a link target.
*/
bool isLinkTarget() const;
/**
* Get if the entry is a deleted entry.
*
* @return True if the entry is a deleted entry.
*/
bool isDeleted() const;
/**
* Get the entry pointed to by this entry.
*
* @return the entry pointed to.
* @throw NoEntry if the entry is not a redirected entry.
*/
Entry getRedirectEntry() const;
/**
* Get the final entry pointed to by this entry.
*
* Follow the redirection until a "not redirecting" entry is found.
* If the entry is not a redirected entry, return the entry itself.
*
* @return the final entry.
*/
Entry getFinalEntry() const;
/**
* Convert the entry to a boolean value.
*
* @return True if the entry is valid.
*/
explicit operator bool() const { return good(); }
private:
zim::Article article;
mutable zim::Article final_article;
bool good() const { return article.good(); }
};
}
#endif // KIWIX_ENTRY_H
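
The redirect-following behaviour documented for `getFinalEntry` can be sketched without libzim. The example below is an illustration under stated assumptions, not the library's implementation: `SketchEntry`, `resolveFinal` and the hop limit are all hypothetical, and an in-memory map stands in for the zim file.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an entry: a path plus an optional redirect target.
struct SketchEntry {
  std::string path;
  std::string redirectTo;  // empty when the entry is not a redirect
  bool isRedirect() const { return !redirectTo.empty(); }
};

// Follow redirects until a non-redirecting entry is found. If the starting
// entry is not a redirect, it is returned unchanged.
// The hop limit is an assumption of this sketch to guard against cycles.
inline SketchEntry resolveFinal(
    const std::map<std::string, SketchEntry>& entries,
    const std::string& start,
    int maxHops = 32) {
  SketchEntry current = entries.at(start);  // throws std::out_of_range if missing
  for (int hops = 0; current.isRedirect(); ++hops) {
    if (hops >= maxHops) {
      throw std::runtime_error("too many redirects");
    }
    current = entries.at(current.redirectTo);
  }
  return current;
}
```

Resolving a chain such as `A/Home` to `A/Main` to `A/Final` yields the `A/Final` entry, while resolving `A/Final` directly returns it as is.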


@@ -1,173 +0,0 @@
/*
* Copyright 2014 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_INDEXER_H
#define KIWIX_INDEXER_H
#include <string>
#include <vector>
#include <stack>
#include <queue>
#include <fstream>
#include <iostream>
#include <sstream>
#include <pthread.h>
#include "common/stringTools.h"
#include "common/otherTools.h"
#include <zim/file.h>
#include <zim/article.h>
#include <zim/fileiterator.h>
#include "reader.h"
using namespace std;
namespace kiwix {
struct indexerToken {
string url;
string accentedTitle;
string title;
string keywords;
string content;
string snippet;
string size;
string wordCount;
};
class Indexer {
typedef void (* ProgressCallback)(const unsigned int processedArticleCount, const unsigned int totalArticleCount);
public:
Indexer();
virtual ~Indexer();
bool start(const string zimPath, const string indexPath, ProgressCallback callback = NULL);
bool stop();
bool isRunning();
unsigned int getProgression();
void setVerboseFlag(const bool value);
protected:
virtual void indexingPrelude(const string indexPath) = 0;
virtual void index(const string &url,
const string &title,
const string &unaccentedTitle,
const string &keywords,
const string &content,
const string &snippet,
const string &size,
const string &wordCount) = 0;
virtual void flush() = 0;
virtual void indexingPostlude(const string indexPath) = 0;
/* Stop words */
std::vector<std::string> stopWords;
void readStopWords(const string languageCode);
/* Others */
unsigned int countWords(const string &text);
/* Boost factor */
unsigned int keywordsBoostFactor;
inline unsigned int getTitleBoostFactor(const unsigned int contentLength) {
return contentLength / 500 + 1;
}
/* Verbose */
pthread_mutex_t verboseMutex;
bool getVerboseFlag();
bool verboseFlag;
private:
ProgressCallback progressCallback;
pthread_mutex_t threadIdsMutex;
/* Article extraction */
pthread_t articleExtractor;
pthread_mutex_t articleExtractorRunningMutex;
static void *extractArticles(void *ptr);
bool articleExtractorRunningFlag;
bool isArticleExtractorRunning();
void articleExtractorRunning(bool value);
/* Article parsing */
pthread_t articleParser;
pthread_mutex_t articleParserRunningMutex;
static void *parseArticles(void *ptr);
bool articleParserRunningFlag;
bool isArticleParserRunning();
void articleParserRunning(bool value);
/* Index writting */
pthread_t articleIndexer;
pthread_mutex_t articleIndexerRunningMutex;
static void *indexArticles(void *ptr);
bool articleIndexerRunningFlag;
bool isArticleIndexerRunning();
void articleIndexerRunning(bool value);
/* To parse queue */
std::queue<indexerToken> toParseQueue;
pthread_mutex_t toParseQueueMutex;
void pushToParseQueue(indexerToken &token);
bool popFromToParseQueue(indexerToken &token);
bool isToParseQueueEmpty();
/* To index queue */
std::queue<indexerToken> toIndexQueue;
pthread_mutex_t toIndexQueueMutex;
void pushToIndexQueue(indexerToken &token);
bool popFromToIndexQueue(indexerToken &token);
bool isToIndexQueueEmpty();
/* Article Count & Progression */
unsigned int articleCount;
pthread_mutex_t articleCountMutex;
void setArticleCount(const unsigned int articleCount);
unsigned int getArticleCount();
/* Progression */
unsigned int progression;
pthread_mutex_t progressionMutex;
void setProgression(const unsigned int progression);
/* getProgression() is public */
/* ZIM path */
pthread_mutex_t zimPathMutex;
string zimPath;
void setZimPath(const string path);
string getZimPath();
/* Index path */
pthread_mutex_t indexPathMutex;
string indexPath;
void setIndexPath(const string path);
string getIndexPath();
/* ZIM id */
pthread_mutex_t zimIdMutex;
string zimId;
void setZimId(const string id);
string getZimId();
};
}
#endif


@@ -22,5 +22,4 @@
#include "library.h"
#endif


@@ -20,88 +20,186 @@
#ifndef KIWIX_LIBRARY_H
#define KIWIX_LIBRARY_H
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <string.h>
#include <vector>
#include <stack>
#include <map>
#include "common/stringTools.h"
#include "common/regexTools.h"
#include "book.h"
#include "bookmark.h"
#define KIWIX_LIBRARY_VERSION "20110515"
using namespace std;
namespace kiwix
{
namespace kiwix {
class OPDSDumper;
enum supportedIndexType { UNKNOWN, XAPIAN };
enum supportedListSortBy { UNSORTED, TITLE, SIZE, DATE, CREATOR, PUBLISHER };
enum supportedListMode {
ALL = 0,
LOCAL = 1,
REMOTE = 1 << 1,
NOLOCAL = 1 << 2,
NOREMOTE = 1 << 3,
VALID = 1 << 4,
NOVALID = 1 << 5
};
/**
* A Library store several books.
*/
class Library
{
std::map<std::string, kiwix::Book> m_books;
std::vector<kiwix::Bookmark> m_bookmarks;
class Book {
public:
Library();
~Library();
public:
Book();
~Book();
/**
* Add a book to the library.
*
* If a book already exist in the library with the same id, update
* the existing book instead of adding a new one.
*
* @param book The book to add.
* @return True if the book has been added.
* False if a book has been updated.
*/
bool addBook(const Book& book);
static bool sortByLastOpen(const Book &a, const Book &b);
static bool sortByTitle(const Book &a, const Book &b);
static bool sortBySize(const Book &a, const Book &b);
static bool sortByDate(const Book &a, const Book &b);
static bool sortByCreator(const Book &a, const Book &b);
static bool sortByPublisher(const Book &a, const Book &b);
static bool sortByLanguage(const Book &a, const Book &b);
string getHumanReadableIdFromPath();
/**
* Add a bookmark to the library.
*
* @param bookmark the bookmark to add.
*/
void addBookmark(const Bookmark& bookmark);
string id;
string path;
string pathAbsolute;
string last;
string indexPath;
string indexPathAbsolute;
supportedIndexType indexType;
string title;
string description;
string language;
string creator;
string publisher;
string date;
string url;
string name;
string tags;
string origId;
string articleCount;
string mediaCount;
bool readOnly;
string size;
string favicon;
string faviconMimeType;
};
/**
* Remove a bookmark.
*
* @param zimId The zimId of the bookmark.
* @param url The url of the bookmark.
* @return True if the bookmark has been removed.
*/
bool removeBookmark(const std::string& zimId, const std::string& url);
class Library {
Book& getBookById(const std::string& id);
public:
Library();
~Library();
/**
* Remove a book from the library.
*
* @param id the id of the book to remove.
* @return True if the book was in the library and has been removed.
*/
bool removeBookById(const std::string& id);
string version;
bool addBook(const Book &book);
bool removeBookByIndex(const unsigned int bookIndex);
vector <kiwix::Book> books;
/**
* Write the library to a file.
*
* @param path the path of the file to write to.
* @return True if the library has been correctly saved.
*/
bool writeToFile(const std::string& path);
/*
* 'current' is the variable storing the current content/book id
* in the library. This is used to be able to load per default a
* content. As Kiwix may work with many library XML files, you may
* have "current" defined many time with different values. The
* last XML file read has the priority, Although we do not have an
* library object for each file, we want to be able to fallback to
* an 'old' current book if the one which should be load
* failed. That is the reason why we need a stack here
*/
stack<string> current;
};
/**
* Write the library bookmarks to a file.
*
* @param path the path of the file to write to.
* @return True if the library has been correctly saved.
*/
bool writeBookmarksToFile(const std::string& path);
/**
* Get the number of books in the library.
*
* @param localBooks If we must count local books (books with a path).
* @param remoteBooks If we must count remote books (books with an url)
* @return The number of books.
*/
unsigned int getBookCount(const bool localBooks, const bool remoteBooks);
/**
* Get all languages of the books in the library.
*
* @return A list of languages.
*/
std::vector<std::string> getBooksLanguages();
/**
* Get all book creators of the books in the library.
*
* @return A list of book creators.
*/
std::vector<std::string> getBooksCreators();
/**
* Get all book publishers of the books in the library.
*
* @return A list of book publishers.
*/
std::vector<std::string> getBooksPublishers();
/**
* Get all bookmarks.
*
* @return A list of bookmarks
*/
const std::vector<kiwix::Bookmark>& getBookmarks() { return m_bookmarks; }
/**
* Get all book ids of the books in the library.
*
* @return A list of book ids.
*/
std::vector<std::string> getBooksIds();
/**
* Filter the library and return the ids of the matching books.
*
* This is equivalent to `listBooksIds(ALL, UNSORTED, search)`.
*
* @param search List only books with search in the title or description.
* @return The list of bookIds corresponding to the query.
*/
std::vector<std::string> filter(const std::string& search);
/**
* List books in the library.
*
* @param mode The mode of listing :
* - LOCAL  : list only local books (with a path).
* - REMOTE : list only remote books (with an url).
* - VALID  : list only valid books (without a path or with a
* path pointing to a valid zim file).
* - NOLOCAL : list only books without valid path.
* - NOREMOTE : list only books without url.
* - NOVALID : list only books not valid.
* - ALL : Do not do any filter (LOCAL or REMOTE)
* - Flags can be combined.
* @param sortBy Attribute to sort the book list by.
* @param search List only books with search in the title or description.
* @param language List only books in this language.
* @param creator List only books of this creator.
* @param publisher List only books of this publisher.
* @param tags List only books matching these tags.
* @param maxSize Do not list books bigger than maxSize.
* Set to 0 to cancel this filter.
* @return The list of bookIds corresponding to the query.
*/
std::vector<std::string> listBooksIds(
int supportedListMode = ALL,
supportedListSortBy sortBy = UNSORTED,
const std::string& search = "",
const std::string& language = "",
const std::string& creator = "",
const std::string& publisher = "",
const std::vector<std::string>& tags = {},
size_t maxSize = 0);
friend class OPDSDumper;
friend class libXMLDumper;
};
}
#endif
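The list modes documented for `listBooksIds` are flags that can be combined. The sketch below illustrates one way such a combined filter could behave. It is not the library's implementation: the flag values, the `BookInfo` struct and the `matches`/`listIds` helpers are hypothetical stand-ins, and NOLOCAL is simplified to "has no path at all".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flag values: the real enum is not shown in this diff.
enum ListMode : int {
  ALL = 0,            // no filtering
  LOCAL = 1,          // keep only books with a path
  REMOTE = 1 << 1,    // keep only books with a url
  NOLOCAL = 1 << 2,   // keep only books without a path (simplified)
  NOREMOTE = 1 << 3   // keep only books without a url
};

// Hypothetical stand-in for kiwix::Book, reduced to what the filter needs.
struct BookInfo {
  std::string id;
  std::string path;  // empty if the book is not local
  std::string url;   // empty if the book is not remote
  std::size_t size;  // same unit as maxSize
};

// Return true if the book passes every flag set in `mode` and the size limit.
bool matches(const BookInfo& book, int mode, std::size_t maxSize)
{
  if ((mode & LOCAL) && book.path.empty()) return false;
  if ((mode & NOLOCAL) && !book.path.empty()) return false;
  if ((mode & REMOTE) && book.url.empty()) return false;
  if ((mode & NOREMOTE) && !book.url.empty()) return false;
  // As documented above, maxSize == 0 disables the size filter.
  if (maxSize != 0 && book.size > maxSize) return false;
  return true;
}

// Collect the ids of the books that pass the filter, in input order.
std::vector<std::string> listIds(const std::vector<BookInfo>& books,
                                 int mode = ALL,
                                 std::size_t maxSize = 0)
{
  std::vector<std::string> ids;
  for (const auto& book : books) {
    if (matches(book, mode, maxSize)) {
      ids.push_back(book.id);
    }
  }
  return ids;
}
```

With these assumptions, `listIds(books, LOCAL | NOREMOTE)` keeps only books that have a path and no url.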

83
include/libxml_dumper.h Normal file

@@ -0,0 +1,83 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_LIBXML_DUMPER_H
#define KIWIX_LIBXML_DUMPER_H
#include <string>
#include <vector>
#include <pugixml.hpp>
#include "library.h"
namespace kiwix
{
/**
* A tool to dump a `Library` into a basic library.xml
*
*/
class LibXMLDumper
{
public:
LibXMLDumper() = default;
LibXMLDumper(Library* library);
~LibXMLDumper();
/**
* Dump the library.xml
*
* @param bookIds The ids of the books to dump.
* @return The library.xml content.
*/
std::string dumpLibXMLContent(const std::vector<std::string>& bookIds);
/**
* Dump the bookmarks of the library.
*
* @return The bookmark.xml content.
*/
std::string dumpLibXMLBookmark();
/**
* Set the base directory used.
*
* @param baseDir the base directory to use.
*/
void setBaseDir(const std::string& baseDir) { this->baseDir = baseDir; }
/**
* Set the library to dump.
*
* @param library The library to dump.
*/
void setLibrary(Library* library) { this->library = library; }
protected:
kiwix::Library* library;
std::string baseDir;
private:
void handleBook(Book book, pugi::xml_node root_node);
void handleBookmark(Bookmark bookmark, pugi::xml_node root_node);
};
}
#endif // KIWIX_LIBXML_DUMPER_H


@@ -20,73 +20,214 @@
#ifndef KIWIX_MANAGER_H
#define KIWIX_MANAGER_H
#include <string>
#include <sstream>
#include <time.h>
#include <pugixml.hpp>
#include "common/base64.h"
#include "common/regexTools.h"
#include "common/pathTools.h"
#include "book.h"
#include "library.h"
#include "reader.h"
using namespace std;
#include <string>
#include <vector>
namespace kiwix {
namespace pugi {
class xml_document;
}
enum supportedListMode { LASTOPEN, REMOTE, LOCAL };
enum supportedListSortBy { TITLE, SIZE, DATE, CREATOR, PUBLISHER };
namespace kiwix
{
class Manager {
class LibraryManipulator {
public:
virtual ~LibraryManipulator() {}
virtual bool addBookToLibrary(Book book) = 0;
virtual void addBookmarkToLibrary(Bookmark bookmark) = 0;
};
public:
Manager();
~Manager();
class DefaultLibraryManipulator : public LibraryManipulator {
public:
DefaultLibraryManipulator(Library* library) :
library(library) {}
virtual ~DefaultLibraryManipulator() {}
bool addBookToLibrary(Book book) {
return library->addBook(book);
}
void addBookmarkToLibrary(Bookmark bookmark) {
library->addBookmark(bookmark);
}
private:
kiwix::Library* library;
};
bool readFile(const string path, const bool readOnly = true);
bool readFile(const string nativePath, const string UTF8Path, const bool readOnly = true);
bool readXml(const string xml, const bool readOnly = true, const string libraryPath = "");
bool writeFile(const string path);
bool removeBookByIndex(const unsigned int bookIndex);
bool removeBookById(const string id);
bool setCurrentBookId(const string id);
string getCurrentBookId();
bool setBookIndex(const string id, const string path, const supportedIndexType type);
bool setBookIndex(const string id, const string path);
bool setBookPath(const string id, const string path);
string addBookFromPathAndGetId(const string pathToOpen, const string pathToSave = "", const string url = "",
const bool checkMetaData = false);
bool addBookFromPath(const string pathToOpen, const string pathToSave = "", const string url = "",
const bool checkMetaData = false);
Library cloneLibrary();
bool getBookById(const string id, Book &book);
bool getCurrentBook(Book &book);
unsigned int getBookCount(const bool localBooks, const bool remoteBooks);
bool updateBookLastOpenDateById(const string id);
void removeBookPaths();
bool listBooks(const supportedListMode mode, const supportedListSortBy sortBy, const unsigned int maxSize,
const string language, const string creator, const string publisher, const string search);
vector<string> getBooksLanguages();
vector<string> getBooksCreators();
vector<string> getBooksPublishers();
vector<string> getBooksIds();
/**
* A tool to manage a `Library`.
*
* A `Manager` handles an internal `Library`.
* This `Library` can be retrieved with the `cloneLibrary` method.
*/
class Manager
{
public:
Manager(LibraryManipulator* manipulator);
Manager(Library* library);
~Manager();
string writableLibraryPath;
/**
* Read a `library.xml` and add book in the file to the library.
*
* @param path The path to the `library.xml`.
* @param readOnly Set if the library path could be overwritten later with
* updated content.
* @return True if file has been properly parsed.
*/
bool readFile(const std::string& path, const bool readOnly = true);
vector<std::string> bookIdList;
/**
* Read a `library.xml` and add book in the file to the library.
*
* @param nativePath The path of the `library.xml`
* @param UTF8Path The utf8 version (?) of the path. Also the path where the
* library will be written if readOnly is false.
* @param readOnly Set if the library path could be overwritten later with
* updated content.
* @return True if file has been properly parsed.
*/
bool readFile(const std::string& nativePath,
const std::string& UTF8Path,
const bool readOnly = true);
protected:
kiwix::Library library;
/**
* Load library content stored in a string.
*
* @param xml The content of the library xml.
* @param readOnly Set if the library path could be overwritten later with
* updated content.
* @param libraryPath The library path (used to resolve relative path)
* @return True if the content has been properly parsed.
*/
bool readXml(const std::string& xml,
const bool readOnly = true,
const std::string& libraryPath = "");
bool readBookFromPath(const string path, Book *book = NULL);
bool parseXmlDom(const pugi::xml_document &doc, const bool readOnly, const string libraryPath);
/**
* Load library content stored in an OPDS stream.
*
* @param content The content of the OPDS stream.
* @param urlHost The URL host of the OPDS stream.
* @return True if the content has been properly parsed.
*/
bool readOpds(const std::string& content, const std::string& urlHost);
private:
void checkAndCleanBookPaths(Book &book, const string &libraryPath);
};
/**
* Load a bookmark file.
*
* @param path The path of the file to read.
* @return True if the content has been properly parsed.
*/
bool readBookmarkFile(const std::string& path);
/**
* Add a book to the library.
*
* @param pathToOpen The path to the zim file to add.
* @param pathToSave The path to store in the library in place of pathToOpen.
* @param url The url of the book to store in the library.
* @param checkMetaData Tell if we check metadata before adding book to the
* library.
* @return The id of the book if the book has been added to the library.
* Else, an empty string.
*/
std::string addBookFromPathAndGetId(const std::string& pathToOpen,
const std::string& pathToSave = "",
const std::string& url = "",
const bool checkMetaData = false);
/**
* Add a book to the library.
*
* @param pathToOpen The path to the zim file to add.
* @param pathToSave The path to store in the library in place of pathToOpen.
* @param url The url of the book to store in the library.
* @param checkMetaData Tell if we check metadata before adding book to the
* library.
* @return True if the book has been added to the library.
*/
bool addBookFromPath(const std::string& pathToOpen,
const std::string& pathToSave = "",
const std::string& url = "",
const bool checkMetaData = false);
/**
* Get the book corresponding to an id.
*
* @param[in] id The id of the book
* @param[out] book The book corresponding to the id.
* @return True if the book has been found.
*/
bool getBookById(const std::string& id, Book& book);
/**
* Update the "last open date" of a book
*
* @param id the id of the book.
* @return True if the book is in the library.
*/
bool updateBookLastOpenDateById(const std::string& id);
/**
* Remove (set to empty) paths of all books in the library.
*/
void removeBookPaths();
/**
* List books in the library.
*
* The books list will be available in public vector member `bookIdList`.
*
* @param mode The mode of listing :
* - LASTOPEN sort by last opened book.
* - LOCAL list only local files.
* - REMOTE list only remote files.
* @param sortBy Attribute to sort the book list by.
* @param maxSize Do not list books bigger than maxSize MiB.
* Set to 0 to cancel this filter.
* @param language List only books in this language.
* @param creator List only books of this creator.
* @param publisher List only books of this publisher.
* @param search List only books with search in the title, description or
* language.
* @return True
*/
bool listBooks(const supportedListMode mode,
const supportedListSortBy sortBy,
const unsigned int maxSize,
const std::string& language,
const std::string& creator,
const std::string& publisher,
const std::string& search);
std::string writableLibraryPath;
bool m_hasSearchResult = false;
uint64_t m_totalBooks = 0;
uint64_t m_startIndex = 0;
uint64_t m_itemsPerPage = 0;
protected:
kiwix::LibraryManipulator* manipulator;
bool mustDeleteManipulator;
bool readBookFromPath(const std::string& path, Book* book);
bool parseXmlDom(const pugi::xml_document& doc,
const bool readOnly,
const std::string& libraryPath);
bool parseOpdsDom(const pugi::xml_document& doc,
const std::string& urlHost);
private:
void checkAndCleanBookPaths(Book& book, const std::string& libraryPath);
};
}
#endif
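Since `Manager` can be built from a `LibraryManipulator*`, a caller can supply its own manipulator to observe or intercept additions. The following is a minimal sketch of that idea. It assumes simplified stand-ins for `kiwix::Book` and `kiwix::Bookmark`, redeclares the interface so the example is self-contained, and uses a hypothetical `RecordingManipulator` whose duplicate check is this sketch's own choice, not documented library behaviour.

```cpp
#include <string>
#include <vector>

// Hypothetical, reduced stand-ins for kiwix::Book and kiwix::Bookmark.
struct Book {
  std::string id;
};
struct Bookmark {
  std::string bookId;
  std::string url;
};

// Same shape as the interface declared in manager.h, repeated here only to
// keep the sketch self-contained.
class LibraryManipulator
{
 public:
  virtual ~LibraryManipulator() {}
  virtual bool addBookToLibrary(Book book) = 0;
  virtual void addBookmarkToLibrary(Bookmark bookmark) = 0;
};

// Hypothetical manipulator that records what it receives instead of
// forwarding to a Library. Refusing duplicate ids is an assumption.
class RecordingManipulator : public LibraryManipulator
{
 public:
  bool addBookToLibrary(Book book) override
  {
    for (const auto& known : books) {
      if (known.id == book.id) {
        return false;  // already recorded, nothing added
      }
    }
    books.push_back(book);
    return true;
  }

  void addBookmarkToLibrary(Bookmark bookmark) override
  {
    bookmarks.push_back(bookmark);
  }

  std::vector<Book> books;
  std::vector<Bookmark> bookmarks;
};
```

A user interface could use the same pattern to refresh a view each time a book is added, while `DefaultLibraryManipulator` remains the plain forwarding case.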


@@ -1,35 +1,26 @@
headers = [
'book.h',
'bookmark.h',
'common.h',
'library.h',
'manager.h',
'libxml_dumper.h',
'opds_dumper.h',
'downloader.h',
'reader.h',
'entry.h',
'searcher.h'
]
if not get_option('android')
headers += ['indexer.h']
endif
if xapian_dep.found()
headers += ['xapianIndexer.h', 'xapianSearcher.h']
endif
install_headers(headers, subdir:'kiwix')
install_headers(
'common/base64.h',
'common/networkTools.h',
'common/otherTools.h',
'common/pathTools.h',
'common/regexTools.h',
'common/stringTools.h',
'common/tree.h',
subdir:'kiwix/common'
'tools/base64.h',
'tools/networkTools.h',
'tools/otherTools.h',
'tools/pathTools.h',
'tools/regexTools.h',
'tools/stringTools.h',
subdir:'kiwix/tools'
)
if has_ctpp2_dep
install_headers(
'ctpp2/CTPP2VMStringLoader.hpp',
subdir:'kiwix/ctpp2'
)
endif

120
include/opds_dumper.h Normal file

@@ -0,0 +1,120 @@
/*
* Copyright 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_OPDS_DUMPER_H
#define KIWIX_OPDS_DUMPER_H
#include <time.h>
#include <sstream>
#include <string>
#include <pugixml.hpp>
#include "tools/base64.h"
#include "tools/pathTools.h"
#include "tools/regexTools.h"
#include "library.h"
#include "reader.h"
using namespace std;
namespace kiwix
{
/**
* A tool to dump a `Library` into an OPDS stream.
*
*/
class OPDSDumper
{
public:
OPDSDumper() = default;
OPDSDumper(Library* library);
~OPDSDumper();
/**
* Dump the OPDS feed.
*
* @param bookIds The ids of the books to dump.
* @return The OPDS feed.
*/
std::string dumpOPDSFeed(const std::vector<std::string>& bookIds);
/**
* Set the id of the opds stream.
*
* @param id the id to use.
*/
void setId(const std::string& id) { this->id = id;}
/**
* Set the title of the opds stream.
*
* @param title the title to use.
*/
void setTitle(const std::string& title) { this->title = title; }
/**
* Set the root location used when generating url.
*
* @param rootLocation the root location to use.
*/
void setRootLocation(const std::string& rootLocation) { this->rootLocation = rootLocation; }
/**
* Set the search url.
*
* @param searchDescriptionUrl the search description url to use.
*/
void setSearchDescriptionUrl(const std::string& searchDescriptionUrl) { this->searchDescriptionUrl = searchDescriptionUrl; }
/**
* Set some information about the search results.
*
* @param totalResult the total number of results of the search.
* @param startIndex the start index of the result.
* @param count the number of result of the current set (or page).
*/
void setOpenSearchInfo(int totalResult, int startIndex, int count);
/**
* Set the library to dump.
*
* @param library The library to dump.
*/
void setLibrary(Library* library) { this->library = library; }
protected:
kiwix::Library* library;
std::string id;
std::string title;
std::string date;
std::string rootLocation;
std::string searchDescriptionUrl;
int m_totalResults;
int m_startIndex;
int m_count;
bool m_isSearchResult = false;
private:
pugi::xml_node handleBook(Book book, pugi::xml_node root_node);
};
}
#endif // KIWIX_OPDS_DUMPER_H
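`setOpenSearchInfo` takes a total result count, a start index and the number of results in the current page. The sketch below shows one way a caller might derive a consistent triple before passing it on. The `OpenSearchInfo` struct and the `pageInfo` helper are hypothetical and not part of the header, and the clamping policy is an assumption.

```cpp
#include <algorithm>

// Hypothetical holder for the three values expected by setOpenSearchInfo.
struct OpenSearchInfo {
  int totalResults;
  int startIndex;
  int count;
};

// Clamp the requested page to the available results.
// `itemsPerPage` is the requested page size; `count` is what the page
// really contains (smaller on the last page, 0 past the end).
OpenSearchInfo pageInfo(int totalResults, int startIndex, int itemsPerPage)
{
  const int total = std::max(0, totalResults);
  const int start = std::max(0, std::min(startIndex, total));
  const int count = std::max(0, std::min(itemsPerPage, total - start));
  return OpenSearchInfo{total, start, count};
}
```

For example, with 25 results and pages of 10, a start index of 20 gives a last page of 5 items.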


@@ -20,85 +20,486 @@
#ifndef KIWIX_READER_H
#define KIWIX_READER_H
#include <zim/zim.h>
#include <zim/file.h>
#include <zim/article.h>
#include <zim/fileiterator.h>
#include <stdio.h>
#include <string>
#include <zim/article.h>
#include <zim/file.h>
#include <zim/fileiterator.h>
#include <zim/zim.h>
#include <exception>
#include <sstream>
#include <map>
#include "common/pathTools.h"
#include "common/stringTools.h"
#include <sstream>
#include <string>
#include "common.h"
#include "entry.h"
#include "tools/pathTools.h"
#include "tools/stringTools.h"
using namespace std;
namespace kiwix {
namespace kiwix
{
class Reader {
/**
* The Reader class gives access to the content of the entries of a zim
* file.
*/
class Reader
{
public:
/**
* Create a Reader to read a zim file specified by zimFilePath.
*
* @param zimFilePath The path to the zim file to read.
* The zim file can be split (.zimaa, .zimab, ...).
* In this case, the file path must still point to the
* unsplit path, as if the file were not split
* (.zim extension).
*/
Reader(const string zimFilePath);
~Reader();
public:
Reader(const string zimFilePath);
~Reader();
/**
* Get the number of "displayable" entries in the zim file.
*
* @return If the zim file has a /M/Counter metadata, return the number of
* entries with the 'text/html' MIMEtype specified in the metadata.
* Else return the number of entries in the 'A' namespace.
*/
unsigned int getArticleCount() const;
void reset();
unsigned int getArticleCount();
unsigned int getMediaCount();
unsigned int getGlobalCount();
string getZimFilePath();
string getId();
string getRandomPageUrl();
string getFirstPageUrl();
string getMainPageUrl();
bool getMetatag(const string &url, string &content);
string getTitle();
string getDescription();
string getLanguage();
string getName();
string getTags();
string getDate();
string getCreator();
string getPublisher();
string getOrigId();
bool getFavicon(string &content, string &mimeType);
bool getPageUrlFromTitle(const string &title, string &url);
bool getMimeTypeByUrl(const string &url, string &mimeType);
bool getContentByUrl(const string &url, string &content, unsigned int &contentLength, string &contentType);
bool getContentByEncodedUrl(const string &url, string &content, unsigned int &contentLength, string &contentType, string &baseUrl);
bool getContentByEncodedUrl(const string &url, string &content, unsigned int &contentLength, string &contentType);
bool getContentByDecodedUrl(const string &url, string &content, unsigned int &contentLength, string &contentType, string &baseUrl);
bool getContentByDecodedUrl(const string &url, string &content, unsigned int &contentLength, string &contentType);
bool searchSuggestions(const string &prefix, unsigned int suggestionsCount, const bool reset = true);
bool searchSuggestionsSmart(const string &prefix, unsigned int suggestionsCount);
bool urlExists(const string &url);
bool hasFulltextIndex();
std::vector<std::string> getTitleVariants(const std::string &title);
bool getNextSuggestion(string &title);
bool getNextSuggestion(string &title, string &url);
bool canCheckIntegrity();
bool isCorrupted();
bool parseUrl(const string &url, char *ns, string &title);
unsigned int getFileSize();
zim::File* getZimFileHandler();
bool getArticleObjectByDecodedUrl(const string &url, zim::Article &article);
/**
* Get the number of media in the zim file.
*
* @return If the zim file has a /M/Counter metadata, return the number of
* entries with the 'image/jpeg', 'image/gif' and 'image/png'
* MIME types specified in the metadata.
* Else return the number of entries in the 'I' namespace.
*/
unsigned int getMediaCount() const;
protected:
zim::File* zimFileHandler;
zim::size_type firstArticleOffset;
zim::size_type lastArticleOffset;
zim::size_type currentArticleOffset;
zim::size_type nsACount;
zim::size_type nsICount;
std::string zimFilePath;
std::vector< std::vector<std::string> > suggestions;
std::vector< std::vector<std::string> >::iterator suggestionsOffset;
/**
* Get the number of all entries in the zim file.
*
* @return Return the number of all the entries, whatever their MIMEtype or
* their namespace.
*/
unsigned int getGlobalCount() const;
private:
std::map<std::string, unsigned int> parseCounterMetadata();
};
/**
* Get the path of the zim file.
*
* @return the path of the zim file as given in the constructor.
*/
string getZimFilePath() const;
/**
* Get the Id of the zim file.
*
* @return The uuid stored in the zim file.
*/
string getId() const;
/**
* Get the url of a random page.
*
* Deprecated : Use `getRandomPage` instead.
*
* @return Url of a random page. The page is picked from all entries in
* the 'A' namespace.
* The main page is excluded from the potential results.
*/
DEPRECATED string getRandomPageUrl() const;
/**
* Get a random page.
*
* @return A random Entry. The entry is picked from all entries in
* the 'A' namespace.
* The main entry is excluded from the potential results.
*/
Entry getRandomPage() const;
/**
* Get the url of the first page.
*
* Deprecated : Use `getFirstPage` instead.
*
* @return Url of the first entry in the 'A' namespace.
*/
DEPRECATED string getFirstPageUrl() const;
/**
* Get the entry of the first page.
*
* @return The first entry in the 'A' namespace.
*/
Entry getFirstPage() const;
/**
* Get the url of the main page.
*
* Deprecated : Use `getMainPage` instead.
*
* @return Url of the main page as specified in the zim file.
*/
DEPRECATED string getMainPageUrl() const;
/**
* Get the entry of the main page.
*
* @return Entry of the main page as specified in the zim file.
*/
Entry getMainPage() const;
/**
* Get the content of a metadata.
*
* @param[in] name The name of the metadata.
* @param[out] value The value will be set to the content of the metadata.
* @return True if it was possible to get the content of the metadata.
*/
bool getMetatag(const string& name, string& value) const;
/**
* Get the title of the zim file.
*
* @return The title of zim file as specified in the zim metadata.
* If no title has been set, return a title computed from the
* file path.
*/
string getTitle() const;
/**
* Get the description of the zim file.
*
* @return The description of the zim file as specified in the zim metadata.
* If no description has been set, return the subtitle.
*/
string getDescription() const;
/**
* Get the language of the zim file.
*
* @return The language of the zim file as specified in the zim metadata.
*/
string getLanguage() const;
/**
* Get the name of the zim file.
*
* @return The name of the zim file as specified in the zim metadata.
*/
string getName() const;
/**
* Get the tags of the zim file.
*
* @return The tags of the zim file as specified in the zim metadata.
*/
string getTags() const;
/**
* Get the date of the zim file.
*
* @return The date of the zim file as specified in the zim metadata.
*/
string getDate() const;
/**
* Get the creator of the zim file.
*
* @return The creator of the zim file as specified in the zim metadata.
*/
string getCreator() const;
/**
* Get the publisher of the zim file.
*
* @return The publisher of the zim file as specified in the zim metadata.
*/
string getPublisher() const;
/**
* Get the origId of the zim file.
*
* The origId is only used in the case of patch zim file and is the Id
* of the original zim file.
*
* @return The origId of the zim file as specified in the zim metadata.
*/
string getOrigId() const;
/**
* Get the favicon of the zim file.
*
* @param[out] content The content of the favicon.
* @param[out] mimeType The mimeType of the favicon.
* @return True if a favicon has been found.
*/
bool getFavicon(string& content, string& mimeType) const;
/**
* Get an entry associated to a path.
*
* @param path The path of the entry.
* @return The entry.
* @throw NoEntry If no entry corresponds to the path.
*/
Entry getEntryFromPath(const std::string& path) const;
/**
* Get an entry associated to a URL-encoded path.
*
* Equivalent to `getEntryFromPath(urlDecode(path));`
*
* @param path The URL-encoded path.
* @return The entry.
* @throw NoEntry If no entry corresponds to the path.
*/
Entry getEntryFromEncodedPath(const std::string& path) const;
/**
* Get an entry associated to a title.
*
* @param title The title.
* @return The entry.
* @throw NoEntry If no entry corresponds to the title.
*/
Entry getEntryFromTitle(const std::string& title) const;
/**
* Get the url of a page specified by a title.
*
* @param[in] title the title of the page.
* @param[out] url the url of the page.
* @return True if the page can be found.
*/
DEPRECATED bool getPageUrlFromTitle(const string& title, string& url) const;
/**
* Get the mimetype of an entry specified by a url.
*
* @param[in] url the url of the entry.
* @param[out] mimeType the mimeType of the entry.
* @return True if the mimeType has been found.
*/
DEPRECATED bool getMimeTypeByUrl(const string& url, string& mimeType) const;
/**
* Get the content of an entry specified by a url.
*
* Alias to `getContentByEncodedUrl`
*/
DEPRECATED bool getContentByUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType) const;
/**
* Get the content of an entry specified by a URL-encoded url.
*
* Equivalent to getContentByDecodedUrl(urlDecode(url), ...).
*/
DEPRECATED bool getContentByEncodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType,
string& baseUrl) const;
/**
* Get the content of an entry specified by a URL-encoded url.
*
* Equivalent to getContentByEncodedUrl but without baseUrl.
*/
DEPRECATED bool getContentByEncodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType) const;
/**
* Get the content of an entry specified by a url.
*
* @param[in] url The url of the entry.
* @param[out] content The content of the entry.
* @param[out] title the title of the entry.
* @param[out] contentLength The size of the entry (size of content).
* @param[out] contentType The mimeType of the entry.
* @param[out] baseUrl Return the true url of the entry.
* If the specified entry is a redirection, contains
* the url of the targeted entry.
* @return True if the entry has been found.
*/
DEPRECATED bool getContentByDecodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType,
string& baseUrl) const;
/**
* Get the content of an entry specified by a url.
*
* Equivalent to getContentByDecodedUrl but without the baseUrl.
*/
DEPRECATED bool getContentByDecodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType) const;
/**
* Search for entries with title starting with prefix (case sensitive).
*
* Suggestions are stored in an internal vector and can be retrieved using
* `getNextSuggestion` method.
*
* @param prefix The prefix to search.
* @param suggestionsCount How many suggestions to search for.
* @param reset If true, remove previous suggestions in the internal vector.
* If false, add suggestions to the internal vector
* (until the internal vector size is suggestionsCount or there
* are no more suggestions).
* @return True if some suggestions were added to the internal vector.
*/
bool searchSuggestions(const string& prefix,
unsigned int suggestionsCount,
const bool reset = true);
/**
* Search for entries for the given prefix.
*
* If the zim file has an internal fulltext index, the suggestions will be
* searched using it.
* Else the suggestions will be searched using `searchSuggestions` while trying
* to be smart about case sensitivity (using `getTitleVariants`).
*
* In any case, suggestions are stored in an internal vector and can be
* retrieved using `getNextSuggestion` method.
* The internal vector will be reset.
*
* @param prefix The prefix to search for.
* @param suggestionsCount How many suggestions to search for.
*/
bool searchSuggestionsSmart(const string& prefix,
unsigned int suggestionsCount);
/**
* Check if the url exists in the zim file.
*
* Deprecated : Use `pathExists` instead.
*
* @param url the url to check.
* @return True if the url exists in the zim file.
*/
DEPRECATED bool urlExists(const string& url) const;
/**
* Check if the path exists in the zim file.
*
* @param path the path to check.
* @return True if the path exists in the zim file.
*/
bool pathExists(const string& path) const;
/**
* Check if the zim file has an embedded fulltext index.
*
* @return True if the zim file has an embedded fulltext index
* and is not split (else the fulltext is not accessible).
*/
bool hasFulltextIndex() const;
/**
* Get potential case title variations for a title.
*
* @param title a title.
* @return the list of variations.
*/
std::vector<std::string> getTitleVariants(const std::string& title) const;
/**
* Get the next suggestion title.
*
* @param[out] title the title of the suggestion.
* @return True if title has been set.
*/
bool getNextSuggestion(string& title);
/**
* Get the next suggestion title and url.
*
* @param[out] title the title of the suggestion.
* @param[out] url the url of the suggestion.
* @return True if title and url have been set.
*/
bool getNextSuggestion(string& title, string& url);
/**
* Get if we can check zim file integrity (has a checksum).
*
* @return True if zim file has a checksum.
*/
bool canCheckIntegrity() const;
/**
* Check if zim file is corrupted.
*
* @return True if zim file is corrupted.
*/
bool isCorrupted() const;
/**
* Parse a full url into a namespace and url.
*
* @param[in] url The full url ("/N/url").
* @param[out] ns The namespace (N).
* @param[out] title The url (url).
* @return True
*/
DEPRECATED bool parseUrl(const string& url, char* ns, string& title) const;
/**
* Return the total size of the zim file.
*
* If zim file is split, return the sum of all parts' size.
*
* @return Size of the zim file in KiB.
*/
unsigned int getFileSize() const;
/**
* Get the zim file handler.
*
* @return The libzim file handler.
*/
zim::File* getZimFileHandler() const;
/**
* Get the zim article object associated to a url.
*
* @param[in] url The url of the article.
* @param[out] article The libzim article object.
* @return True if the url is good (article.good()).
*/
DEPRECATED bool getArticleObjectByDecodedUrl(const string& url,
zim::Article& article) const;
protected:
zim::File* zimFileHandler;
zim::size_type firstArticleOffset;
zim::size_type lastArticleOffset;
zim::size_type nsACount;
zim::size_type nsICount;
std::string zimFilePath;
std::vector<std::vector<std::string>> suggestions;
std::vector<std::vector<std::string>>::iterator suggestionsOffset;
private:
std::map<const std::string, unsigned int> parseCounterMetadata() const;
};
}
#endif
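The `searchSuggestions` / `getNextSuggestion` pair documented above keeps results in an internal vector: `reset` decides whether earlier results are dropped, and the vector never grows beyond `suggestionsCount`. The sketch below imitates that contract on an in-memory list of titles. The `SuggestionList` class and its methods are hypothetical, and skipping titles that are already stored is an assumption of this sketch rather than documented behaviour.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the suggestion part of Reader, working on a
// plain list of titles instead of a zim file.
class SuggestionList
{
 public:
  explicit SuggestionList(std::vector<std::string> titles)
      : titles_(std::move(titles)) {}

  // Case-sensitive prefix search. Returns true if at least one suggestion
  // was added to the internal vector.
  bool search(const std::string& prefix, std::size_t count, bool reset = true)
  {
    if (reset) {
      found_.clear();
      next_ = 0;
    }
    bool added = false;
    for (const auto& title : titles_) {
      if (found_.size() >= count) {
        break;  // the internal vector is full
      }
      const bool hasPrefix = title.compare(0, prefix.size(), prefix) == 0;
      const bool known =
          std::find(found_.begin(), found_.end(), title) != found_.end();
      if (hasPrefix && !known) {
        found_.push_back(title);
        added = true;
      }
    }
    return added;
  }

  // Hand out stored suggestions one at a time; false when exhausted.
  bool next(std::string& title)
  {
    if (next_ >= found_.size()) {
      return false;
    }
    title = found_[next_++];
    return true;
  }

 private:
  std::vector<std::string> titles_;
  std::vector<std::string> found_;
  std::size_t next_ = 0;
};
```

An index is used instead of a stored iterator so that appending with `reset = false` cannot invalidate the read position.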


@@ -22,68 +22,177 @@
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <algorithm>
#include <vector>
#include <locale>
#include <cctype>
#include <vector>
#include "common/pathTools.h"
#include "common/stringTools.h"
#include <unicode/putil.h>
#include <algorithm>
#include <cctype>
#include <locale>
#include <string>
#include <vector>
#include <vector>
#include "tools/pathTools.h"
#include "tools/stringTools.h"
#include "kiwix_config.h"
using namespace std;
struct Result
namespace kiwix
{
string url;
string title;
int score;
string snippet;
int wordCount;
int size;
class Reader;
class Result
{
public:
virtual ~Result(){};
virtual std::string get_url() = 0;
virtual std::string get_title() = 0;
virtual int get_score() = 0;
virtual std::string get_snippet() = 0;
virtual std::string get_content() = 0;
virtual int get_wordCount() = 0;
virtual int get_size() = 0;
virtual int get_readerIndex() = 0;
};
namespace kiwix {
struct SearcherInternal;
/**
* The Searcher class is responsible for doing different kinds of searches using
* the fulltext index.
*
* Searcher may (if compiled with ctpp2) be used to
* generate an html page for the search results. This uses a template that needs a
* humanReaderName. This feature is only used by kiwix-serve and should be
* moved outside of Searcher (and given a better API). If you don't use the html
* rendering (getHtml method), you can simply ignore the different
* humanReadableName attributes (or give an empty string).
*/
class Searcher
{
public:
/**
* The default constructor.
*
* @param humanReadableName The global zim's humanReadableName.
* Used to generate pagination links.
*/
Searcher(const string& humanReadableName = "");
class Searcher {
~Searcher();
public:
Searcher();
virtual ~Searcher();
/**
* Add a reader (containing embedded fulltext index) to the search.
*
* @param reader The Reader for the zim containing the fulltext index.
* @param humanReaderName The human readable name of the reader.
* @return true if the reader has been added.
* false if the reader cannot be added (no embedded fulltext index present)
*/
bool add_reader(Reader* reader, const std::string& humanReaderName);
void search(std::string &search, unsigned int resultStart,
unsigned int resultEnd, const bool verbose=false);
bool getNextResult(string &url, string &title, unsigned int &score);
unsigned int getEstimatedResultCount();
bool setProtocolPrefix(const std::string prefix);
bool setSearchProtocolPrefix(const std::string prefix);
void reset();
void setContentHumanReadableId(const string &contentHumanReadableId);
/**
* Start a search on the zim associated to the Searcher.
*
* Search results should be retrieved using the getNextResult method.
*
* @param search The search query.
* @param resultStart the start offset of the search results (used for pagination).
* @param resultEnd the end offset of the search results (used for pagination).
* @param verbose print some info on stdout if true.
*/
void search(std::string& search,
unsigned int resultStart,
unsigned int resultEnd,
const bool verbose = false);
#ifdef ENABLE_CTPP2
string getHtml();
#endif
protected:
std::string beautifyInteger(const unsigned int number);
virtual void closeIndex() = 0;
virtual void searchInIndex(string &search, const unsigned int resultStart,
const unsigned int resultEnd, const bool verbose=false) = 0;
/**
* Start a geographic search.
* The search returns results for entries located in a disc centered on
* latitude/longitude with a radius of distance.
*
* Search results should be retrieved using the getNextResult method.
*
* @param latitude The latitude of the center point.
* @param longitude The longitude of the center point.
* @param distance The radius of the disc.
* @param resultStart the start offset of the search results (used for pagination).
* @param resultEnd the end offset of the search results (used for pagination).
* @param verbose print some info on stdout if true.
*/
void geo_search(float latitude, float longitude, float distance,
unsigned int resultStart,
unsigned int resultEnd,
const bool verbose = false);
/**
* Start a suggestion search.
* The search performed depends on the "version" of the embedded index.
* - If the index is recent enough and has a title namespace, the search is
* made in the titles only.
* - Otherwise the search is made on the whole article content.
* In any case, the search is "partial" (as if '*' were appended to the query).
*
* @param search The search query.
* @param verbose print some info on stdout if true.
*/
void suggestions(std::string& search, const bool verbose = false);
/**
* Get the next result of a started search.
* This is the method to use to loop over the search results.
*/
Result* getNextResult();
/**
* Restart the previous search.
* Next call to getNextResult will return the first result.
*/
void restart_search();
/**
* Get an estimation of the result count.
*/
unsigned int getEstimatedResultCount();
/**
* Set protocol prefix.
* Only used by getHtml.
*/
bool setProtocolPrefix(const std::string prefix);
/**
* Set search protocol prefix.
* Only used by getHtml.
*/
bool setSearchProtocolPrefix(const std::string prefix);
/**
* Generate the HTML page with the results of the search.
*/
string getHtml();
protected:
std::string beautifyInteger(const unsigned int number);
void closeIndex();
void searchInIndex(string& search,
const unsigned int resultStart,
const unsigned int resultEnd,
const bool verbose = false);
std::vector<Reader*> readers;
std::vector<std::string> humanReaderNames;
SearcherInternal* internal;
std::string searchPattern;
std::string protocolPrefix;
std::string searchProtocolPrefix;
unsigned int resultCountPerPage;
unsigned int estimatedResultCount;
unsigned int resultStart;
unsigned int resultEnd;
std::string contentHumanReadableId;
private:
void reset();
};
std::vector<Result> results;
std::vector<Result>::iterator resultOffset;
std::string searchPattern;
std::string protocolPrefix;
std::string searchProtocolPrefix;
std::string template_ct2;
unsigned int resultCountPerPage;
unsigned int estimatedResultCount;
unsigned int resultStart;
unsigned int resultEnd;
std::string contentHumanReadableId;
};
}

4
include/tools/base64.h Normal file

@@ -0,0 +1,4 @@
#include <string>
std::string base64_encode(const std::string& inString);
std::string base64_decode(const std::string& s);


@@ -20,29 +20,14 @@
#ifndef KIWIX_NETWORKTOOLS_H
#define KIWIX_NETWORKTOOLS_H
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
#else
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <net/if.h>
#include <netdb.h>
#endif
#include <iostream>
#include <vector>
#include <string>
#include <map>
#include <string>
namespace kiwix {
std::map<std::string, std::string> getNetworkInterfaces();
std::string getBestPublicIp();
namespace kiwix
{
std::map<std::string, std::string> getNetworkInterfaces();
std::string getBestPublicIp();
std::string download(const std::string& url);
}
#endif


@@ -26,8 +26,13 @@
#include <unistd.h>
#endif
namespace kiwix {
void sleep(unsigned int milliseconds);
#include <pugixml.hpp>
namespace kiwix
{
void sleep(unsigned int milliseconds);
std::string nodeToString(pugi::xml_node node);
std::string converta2toa3(const std::string& a2code);
}
#endif
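The `converta2toa3` declaration added above maps a two-letter language code to its three-letter form. The real table is generated from the ISO 639-2 code list and is not part of this hunk, so the following is only a minimal sketch under stated assumptions: the three table entries are illustrative, the `sketch` namespace is hypothetical, and throwing `std::out_of_range` on an unknown code is an assumption about the error handling, not something this diff confirms.

```cpp
#include <map>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical three-entry table; the real mapping is generated by a script
// and is not visible in this diff.
std::string converta2toa3(const std::string& a2code)
{
  static const std::map<std::string, std::string> table = {
    {"en", "eng"},
    {"fr", "fra"},
    {"de", "deu"},
  };
  auto it = table.find(a2code);
  if (it == table.end()) {
    // Assumption: unknown codes are reported by throwing.
    throw std::out_of_range("Unknown two-letter language code: " + a2code);
  }
  return it->second;
}

}  // namespace sketch
```

A caller that cannot tolerate exceptions would wrap the call in a `try` block or check the code against its own list first.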


@@ -20,18 +20,18 @@
#ifndef KIWIX_PATHTOOLS_H
#define KIWIX_PATHTOOLS_H
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fstream>
#include <ios>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#include <fstream>
#include <string.h>
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <ios>
#include <limits.h>
#ifdef _WIN32
#include <direct.h>
@@ -41,20 +41,24 @@
using namespace std;
bool isRelativePath(const string &path);
bool isRelativePath(const string& path);
string computeAbsolutePath(const string path, const string relativePath);
string computeRelativePath(const string path, const string absolutePath);
string removeLastPathElement(const string path, const bool removePreSeparator = false,
const bool removePostSeparator = false);
string appendToDirectory(const string &directoryPath, const string &filename);
string removeLastPathElement(const string path,
const bool removePreSeparator = false,
const bool removePostSeparator = false);
string appendToDirectory(const string& directoryPath, const string& filename);
unsigned int getFileSize(const string &path);
string getFileSizeAsString(const string &path);
bool fileExists(const string &path);
bool makeDirectory(const string &path);
bool copyFile(const string &sourcePath, const string &destPath);
string getLastPathElement(const string &path);
unsigned int getFileSize(const string& path);
string getFileSizeAsString(const string& path);
string getFileContent(const string& path);
bool fileExists(const string& path);
bool makeDirectory(const string& path);
string makeTmpDirectory();
bool copyFile(const string& sourcePath, const string& destPath);
string getLastPathElement(const string& path);
string getExecutablePath();
string getCurrentDirectory();
bool writeTextFile(const string &path, const string &content);
string getDataDirectory();
bool writeTextFile(const string& path, const string& content);
#endif


@@ -22,11 +22,15 @@
#include <unicode/regex.h>
#include <unicode/ucnv.h>
#include <string>
#include <map>
#include <string>
bool matchRegex(const std::string &content, const std::string &regex);
std::string replaceRegex(const std::string &content, const std::string &replacement, const std::string &regex);
std::string appendToFirstOccurence(const std::string &content, const std::string regex, const std::string &replacement);
bool matchRegex(const std::string& content, const std::string& regex);
std::string replaceRegex(const std::string& content,
const std::string& replacement,
const std::string& regex);
std::string appendToFirstOccurence(const std::string& content,
const std::string regex,
const std::string& replacement);
#endif


@@ -0,0 +1,79 @@
/*
* Copyright 2011-2012 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_STRINGTOOLS_H
#define KIWIX_STRINGTOOLS_H
#include <unicode/unistr.h>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include "pathTools.h"
namespace kiwix
{
std::string beautifyInteger(uint64_t number);
std::string beautifyFileSize(uint64_t number);
void printStringInHexadecimal(const char* s);
void printStringInHexadecimal(icu::UnicodeString s);
void stringReplacement(std::string& str,
const std::string& oldStr,
const std::string& newStr);
std::string encodeDiples(const std::string& str);
std::string removeAccents(const std::string& text);
void loadICUExternalTables();
std::string urlEncode(const std::string& value, bool encodeReserved = false);
std::string urlDecode(const std::string& value, bool component = false);
std::vector<std::string> split(const std::string&, const std::string&);
std::vector<std::string> split(const char*, const char*);
std::vector<std::string> split(const std::string&, const char*);
std::vector<std::string> split(const char*, const std::string&);
std::string ucAll(const std::string& word);
std::string lcAll(const std::string& word);
std::string ucFirst(const std::string& word);
std::string lcFirst(const std::string& word);
std::string toTitle(const std::string& word);
std::string normalize(const std::string& word);
template<typename T>
std::string to_string(T value)
{
std::ostringstream oss;
oss << value;
return oss.str();
}
template<typename T>
T extractFromString(const std::string& str) {
std::istringstream iss(str);
T ret;
iss >> ret;
return ret;
}
} //namespace kiwix
#endif
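The two templates at the end of this header convert values to and from strings through string streams. As a minimal sketch of how they behave, the block below copies both templates into a hypothetical `sketch` namespace (used here only so the example compiles without the rest of the library).

```cpp
#include <sstream>
#include <string>

namespace sketch {

// Same body as the to_string template in the header above.
template<typename T>
std::string to_string(T value)
{
  std::ostringstream oss;
  oss << value;
  return oss.str();
}

// Same body as the extractFromString template in the header above.
// Extraction uses operator>>, so for std::string it stops at the first
// whitespace character.
template<typename T>
T extractFromString(const std::string& str)
{
  std::istringstream iss(str);
  T ret;
  iss >> ret;
  return ret;
}

}  // namespace sketch
```

For example, `sketch::extractFromString<int>(sketch::to_string(42))` round-trips to `42`, while `sketch::extractFromString<std::string>("hello world")` yields only `"hello"` because stream extraction is whitespace-delimited.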


@@ -1,56 +0,0 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_XAPIAN_INDEXER_H
#define KIWIX_XAPIAN_INDEXER_H
#include <xapian.h>
#include "indexer.h"
using namespace std;
namespace kiwix {
class XapianIndexer : public Indexer {
public:
XapianIndexer();
protected:
void indexingPrelude(const string indexPath);
void index(const string &url,
const string &title,
const string &unaccentedTitle,
const string &keywords,
const string &content,
const string &snippet,
const string &size,
const string &wordCount);
void flush();
void indexingPostlude(const string indexPath);
Xapian::WritableDatabase writableDatabase;
Xapian::Stem stemmer;
Xapian::SimpleStopper stopper;
Xapian::TermGenerator indexer;
};
}
#endif


@@ -1,54 +0,0 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_XAPIAN_SEARCHER_H
#define KIWIX_XAPIAN_SEARCHER_H
#include <xapian.h>
#include "searcher.h"
using namespace std;
namespace kiwix {
class NoXapianIndexInZim: public exception {
virtual const char* what() const throw() {
return "There is no fulltext index in the zim file";
}
};
class XapianSearcher : public Searcher {
public:
XapianSearcher(const string &xapianDirectoryPath);
virtual ~XapianSearcher() {};
void searchInIndex(string &search, const unsigned int resultStart, const unsigned int resultEnd,
const bool verbose=false);
protected:
void closeIndex();
void openIndex(const string &xapianDirectoryPath);
Xapian::Database readableDatabase;
Xapian::Stem stemmer;
};
}
#endif


@@ -4,7 +4,7 @@ includedir=${prefix}/include
Name: libkiwix
Description: A library that contains a lot of things used by other kiwix programs
Version: 1.0
Version: @version@
Requires: @requires@
Libs: -L${libdir} -lkiwix @extra_libs@
Cflags: -I${includedir}/ @extra_cflags@


@@ -1,103 +1,60 @@
project('kiwixlib', 'cpp',
version : '0.1.0',
license : 'GPL')
project('kiwix-lib', 'cpp',
version : '4.1.0',
license : 'GPL',
default_options : ['c_std=c11', 'cpp_std=c++11', 'werror=true'])
compiler = meson.get_compiler('cpp')
find_library_in_compiler = meson.version().version_compare('>=0.31.0')
static_deps = get_option('android') or get_option('default_library') == 'static'
if get_option('android')
extra_libs = ['-llog']
else
extra_libs = []
endif
thread_dep = dependency('threads')
libicu_dep = dependency('icu-i18n')
libzim_dep = dependency('libzim')
pugixml_dep = dependency('pugixml')
libicu_dep = dependency('icu-i18n', static:static_deps)
libzim_dep = dependency('libzim', version : '>=4.0.0', static:static_deps)
pugixml_dep = dependency('pugixml', static:static_deps)
libcurl_dep = dependency('libcurl', static:static_deps)
has_ctpp2_dep = false
ctpp2_prefix_install = get_option('ctpp2-install-prefix')
ctpp2_link_args = []
if ctpp2_prefix_install == ''
if compiler.has_header('ctpp2/CTPP2Logger.hpp')
if find_library_in_compiler
ctpp2_lib = compiler.find_library('ctpp2')
else
ctpp2_lib = find_library('ctpp2')
endif
ctpp2_link_args = ['-lctpp2']
if meson.is_cross_build() and host_machine.system() == 'windows'
if find_library_in_compiler
iconv_lib = compiler.find_library('iconv', required:false)
else
iconv_lib = find_library('iconv', required:false)
endif
if iconv_lib.found()
ctpp2_link_args += ['-liconv']
endif
endif
has_ctpp2_dep = true
ctpp2_dep = declare_dependency(link_args:ctpp2_link_args)
else
message('ctpp2/CTPP2Logger.hpp not found. Compiling without CTPP2 support')
endif
else
if not find_library_in_compiler
error('For custom ctpp2_prefix_install you need a meson version >=0.31.0')
endif
ctpp2_include_path = ctpp2_prefix_install + '/include'
ctpp2_include_args = ['-I'+ctpp2_include_path]
if compiler.has_header('ctpp2/CTPP2Logger.hpp', args:ctpp2_include_args)
ctpp2_include_dir = include_directories(ctpp2_include_path, is_system:true)
ctpp2_lib_path = ctpp2_prefix_install+'/lib'
ctpp2_lib = compiler.find_library('ctpp2', dirs:ctpp2_lib_path)
ctpp2_link_args = ['-L'+ctpp2_lib_path, '-lctpp2']
if meson.is_cross_build() and host_machine.system() == 'windows'
iconv_lib = compiler.find_library('iconv', required:false)
if iconv_lib.found()
ctpp2_link_args += ['-liconv']
endif
endif
has_ctpp2_dep = true
ctpp2_dep = declare_dependency(include_directories:ctpp2_include_dir, link_args:ctpp2_link_args)
else
message('ctpp2/CTPP2Logger.hpp not found. Compiling without CTPP2 support')
endif
if not compiler.has_header('mustache.hpp')
error('Cannot found header mustache.hpp')
endif
xapian_dep = dependency('xapian-core', required:false)
all_deps = [thread_dep, libicu_dep, libzim_dep, xapian_dep, pugixml_dep]
if has_ctpp2_dep
all_deps += [ctpp2_dep]
extra_cflags = ''
if target_machine.system() == 'windows' and static_deps
add_project_arguments('-DCURL_STATICLIB', language : 'cpp')
extra_cflags += '-DCURL_STATICLIB'
endif
all_deps = [thread_dep, libicu_dep, libzim_dep, pugixml_dep, libcurl_dep]
inc = include_directories('include')
conf = configuration_data()
conf.set('VERSION', '"@0@"'.format(meson.project_version()))
conf.set('ENABLE_CTPP2', has_ctpp2_dep)
if build_machine.system() == 'windows'
extra_link_args = ['-lshlwapi', '-lwinmm']
else
extra_link_args = []
endif
subdir('include')
subdir('scripts')
subdir('static')
subdir('src')
subdir('test')
pkg_requires = ['libzim', 'icu-i18n', 'pugixml']
if xapian_dep.found()
pkg_requires += ['xapian-core']
endif
extra_libs = []
extra_cflags = ''
if has_ctpp2_dep
extra_libs += ctpp2_link_args
if ctpp2_include_path != ''
extra_cflags = '-I'+ctpp2_include_path
endif
endif
pkg_requires = ['libzim', 'icu-i18n', 'pugixml', 'libcurl']
pkg_conf = configuration_data()
pkg_conf.set('prefix', get_option('prefix'))
pkg_conf.set('requires', ' '.join(pkg_requires))
pkg_conf.set('extra_libs', ' '.join(extra_libs))
pkg_conf.set('extra_cflags', extra_cflags)
pkg_conf.set('version', meson.project_version())
configure_file(output : 'kiwix.pc',
configuration : pkg_conf,
input : 'kiwix.pc.in',


@@ -1,4 +1,2 @@
option('ctpp2-install-prefix', type : 'string', value : '',
description : 'Prefix where ctpp libs has been installed')
option('android', type : 'boolean', value : false,
description : 'Do we make a kiwix-lib for android')


@@ -1,5 +1,24 @@
#!/usr/bin/env python3
'''
Copyright 2016 Matthieu Gautier <mgautier@kymeria.fr>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or any
later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA.
'''
import argparse
import os.path
import re
@@ -38,12 +57,21 @@ extern const std::string {identifier};
{namespaces_close}"""
class Resource:
def __init__(self, base_dir, filename):
def __init__(self, base_dirs, filename):
filename = filename.strip()
self.filename = filename
self.identifier = full_identifier(filename)
with open(os.path.join(base_dir, filename), 'rb') as f:
self.data = f.read()
found = False
for base_dir in base_dirs:
try:
with open(os.path.join(base_dir, filename), 'rb') as f:
self.data = f.read()
found = True
break
except FileNotFoundError:
continue
if not found:
raise Exception("Impossible to found {}".format(filename))
def dump_impl(self):
nb_row = len(self.data)//16 + (1 if len(self.data) % 16 else 0)
@@ -78,16 +106,8 @@ master_c_template = """//This file is automaically generated. Do not modify it.
#include <stdlib.h>
#include <fstream>
#include <stdexcept>
#include "{include_file}"
class ResourceNotFound : public std::runtime_error {{
public:
ResourceNotFound(const std::string& what_arg):
std::runtime_error(what_arg)
{{ }};
}};
static std::string init_resource(const char* name, const unsigned char* content, int len)
{{
char * resPath = getenv(name);
@@ -125,11 +145,19 @@ master_h_template = """//This file is automaically generated. Do not modify it.
#define KIWIX_{BASENAME}
#include <string>
#include <stdexcept>
namespace RESOURCE {{
{RESOURCES}
}};
class ResourceNotFound : public std::runtime_error {{
public:
ResourceNotFound(const std::string& what_arg):
std::runtime_error(what_arg)
{{ }};
}};
const std::string& getResource_{basename}(const std::string& name);
#define getResource(a) (getResource_{basename}(a))
@@ -151,13 +179,18 @@ if __name__ == "__main__":
help='The Cpp file name to generate')
parser.add_argument('--hfile',
help='The h file name to generate')
parser.add_argument('--source_dir',
help="Additional directory where to look for resources.",
action='append')
parser.add_argument('resource_file',
help='The list of resources to compile.')
args = parser.parse_args()
base_dir = os.path.dirname(os.path.realpath(args.resource_file))
source_dir = args.source_dir or []
with open(args.resource_file, 'r') as f:
resources = [Resource(base_dir, filename) for filename in f.readlines()]
resources = [Resource([base_dir]+source_dir, filename)
for filename in f.readlines()]
h_identifier = to_identifier(os.path.basename(args.hfile))
with open(args.hfile, 'w') as f:


@@ -1,4 +1,4 @@
res_compiler = find_program('compile_resources.py')
res_compiler = find_program('kiwix-compile-resources')
install_data(res_compiler.path(), install_dir:get_option('bindir'))


@@ -0,0 +1,13 @@
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
package="kiwix.org.kiwixlib"
>
<application android:allowBackup="true"
android:label="@string/app_name"
android:supportsRtl="true"
>
</application>
</manifest>


@@ -1,11 +0,0 @@
#!/usr/bin/env bash
set -e
BUILD_PATH=$(pwd)
javac -d $BUILD_PATH/src/android $1 $2 $3 $4
cd $BUILD_PATH/src/android
javah -jni org.kiwix.kiwixlib.JNIKiwix
cd $BUILD_PATH


@@ -1,486 +1,44 @@
/*
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
* Copyright (C) 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <jni.h>
#include "org_kiwix_kiwixlib_JNIKiwix.h"
#include <stdio.h>
#include <string.h>
#include <iostream>
#include <string>
#include "unicode/putil.h"
#include "reader.h"
#include "xapianSearcher.h"
#include "common/base64.h"
#include <android/log.h>
#define LOGI(...) __android_log_print(ANDROID_LOG_INFO, "kiwix", __VA_ARGS__)
#include "utils.h"
#include <xapian.h>
#include <zim/zim.h>
#include <zim/file.h>
#include <zim/article.h>
#include <zim/error.h>
pthread_mutex_t globalLock = PTHREAD_RECURSIVE_MUTEX_INITIALIZER;
/* global variables */
kiwix::Reader *reader = NULL;
kiwix::XapianSearcher *searcher = NULL;
static pthread_mutex_t readerLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t searcherLock = PTHREAD_MUTEX_INITIALIZER;
/* c2jni type conversion functions */
jboolean c2jni(const bool &val) {
return val ? JNI_TRUE : JNI_FALSE;
}
jstring c2jni(const std::string &val, JNIEnv *env) {
return env->NewStringUTF(val.c_str());
}
jint c2jni(const int val) {
return (jint)val;
}
jint c2jni(const unsigned val) {
return (jint)val;
}
/* jni2c type conversion functions */
bool jni2c(const jboolean &val) {
return val == JNI_TRUE;
}
std::string jni2c(const jstring &val, JNIEnv *env) {
return std::string(env->GetStringUTFChars(val, 0));
}
int jni2c(const jint val) {
return (int)val;
}
/* Method to deal with variable passed by reference */
void setStringObjValue(const std::string &value, const jobject obj, JNIEnv *env) {
jclass objClass = env->GetObjectClass(obj);
jfieldID objFid = env->GetFieldID(objClass, "value", "Ljava/lang/String;");
env->SetObjectField(obj, objFid, c2jni(value, env));
}
void setIntObjValue(const int value, const jobject obj, JNIEnv *env) {
jclass objClass = env->GetObjectClass(obj);
jfieldID objFid = env->GetFieldID(objClass, "value", "I");
env->SetIntField(obj, objFid, value);
}
void setBoolObjValue(const bool value, const jobject obj, JNIEnv *env) {
jclass objClass = env->GetObjectClass(obj);
jfieldID objFid = env->GetFieldID(objClass, "value", "Z");
env->SetBooleanField(obj, objFid, c2jni(value));
}
/* Kiwix library functions */
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getMainPage(JNIEnv *env, jobject obj) {
jstring url;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
std::string cUrl = reader->getMainPageUrl();
url = c2jni(cUrl, env);
} catch (...) {
std::cerr << "Unable to get ZIM main page" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return url;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getId(JNIEnv *env, jobject obj) {
jstring id;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
std::string cId = reader->getId();
id = c2jni(cId, env);
} catch (...) {
std::cerr << "Unable to get ZIM id" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return id;
}
JNIEXPORT jint JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getFileSize(JNIEnv *env, jobject obj) {
jint size;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
int cSize = reader->getFileSize();
size = c2jni(cSize);
} catch (...) {
std::cerr << "Unable to get ZIM file size" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return size;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getCreator(JNIEnv *env, jobject obj) {
jstring creator;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
std::string cCreator = reader->getCreator();
creator = c2jni(cCreator, env);
} catch (...) {
std::cerr << "Unable to get ZIM creator" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return creator;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getPublisher(JNIEnv *env, jobject obj) {
jstring publisher;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
std::string cPublisher = reader->getPublisher();
publisher = c2jni(cPublisher, env);
} catch (...) {
std::cerr << "Unable to get ZIM publisher" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return publisher;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getName(JNIEnv *env, jobject obj) {
jstring name;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
std::string cName = reader->getName();
name = c2jni(cName, env);
} catch (...) {
std::cerr << "Unable to get ZIM name" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return name;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getFavicon(JNIEnv *env, jobject obj) {
jstring favicon;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
std::string cContent;
std::string cMime;
reader->getFavicon(cContent, cMime);
favicon = c2jni(base64_encode(reinterpret_cast<const unsigned char*>(cContent.c_str()), cContent.length()), env);
} catch (...) {
std::cerr << "Unable to get ZIM favicon" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return favicon;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getDate(JNIEnv *env, jobject obj) {
jstring date;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
std::string cDate = reader->getDate();
date = c2jni(cDate, env);
} catch (...) {
std::cerr << "Unable to get ZIM date" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return date;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getLanguage(JNIEnv *env, jobject obj) {
jstring language;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
std::string cLanguage = reader->getLanguage();
language = c2jni(cLanguage, env);
} catch (...) {
std::cerr << "Unable to get ZIM language" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return language;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getMimeType(JNIEnv *env, jobject obj, jstring url) {
jstring mimeType;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
std::string cUrl = jni2c(url, env);
try {
std::string cMimeType;
reader->getMimeTypeByUrl(cUrl, cMimeType);
mimeType = c2jni(cMimeType, env);
} catch (...) {
std::cerr << "Unable to get mime-type for url " << cUrl << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return mimeType;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_loadZIM(JNIEnv *env, jobject obj, jstring path) {
jboolean retVal = JNI_TRUE;
std::string cPath = jni2c(path, env);
pthread_mutex_lock(&readerLock);
try {
if (reader != NULL) delete reader;
reader = new kiwix::Reader(cPath);
} catch (...) {
std::cerr << "Unable to load ZIM " << cPath << std::endl;
reader = NULL;
retVal = JNI_FALSE;
}
pthread_mutex_unlock(&readerLock);
return retVal;
}
JNIEXPORT jbyteArray JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getContent(JNIEnv *env, jobject obj, jstring url, jobject mimeTypeObj, jobject sizeObj) {
/* Default values */
setStringObjValue("", mimeTypeObj, env);
setIntObjValue(0, sizeObj, env);
jbyteArray data = env->NewByteArray(0);
/* Retrieve the content */
if (reader != NULL) {
std::string cUrl = jni2c(url, env);
std::string cData;
std::string cMimeType;
unsigned int cSize = 0;
pthread_mutex_lock(&readerLock);
try {
if (reader->getContentByUrl(cUrl, cData, cSize, cMimeType)) {
data = env->NewByteArray(cSize);
env->SetByteArrayRegion(data, 0, cSize, reinterpret_cast<const jbyte*>(cData.c_str()));
setStringObjValue(cMimeType, mimeTypeObj, env);
setIntObjValue(cSize, sizeObj, env);
}
} catch (...) {
std::cerr << "Unable to get content for url " << cUrl << std::endl;
}
pthread_mutex_unlock(&readerLock);
}
return data;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_searchSuggestions
(JNIEnv *env, jobject obj, jstring prefix, jint count) {
jboolean retVal = JNI_FALSE;
std::string cPrefix = jni2c(prefix, env);
unsigned int cCount = jni2c(count);
pthread_mutex_lock(&readerLock);
try {
if (reader != NULL) {
if (reader->searchSuggestionsSmart(cPrefix, cCount)) {
retVal = JNI_TRUE;
}
}
} catch (...) {
std::cerr << "Unable to search suggestions for pattern " << cPrefix << std::endl;
}
pthread_mutex_unlock(&readerLock);
return retVal;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getNextSuggestion
(JNIEnv *env, jobject obj, jobject titleObj) {
jboolean retVal = JNI_FALSE;
std::string cTitle;
pthread_mutex_lock(&readerLock);
try {
if (reader != NULL) {
if (reader->getNextSuggestion(cTitle)) {
setStringObjValue(cTitle, titleObj, env);
retVal = JNI_TRUE;
}
}
} catch (...) {
std::cerr << "Unable to get next suggestion" << std::endl;
}
pthread_mutex_unlock(&readerLock);
return retVal;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getPageUrlFromTitle
(JNIEnv *env, jobject obj, jstring title, jobject urlObj) {
jboolean retVal = JNI_FALSE;
std::string cTitle = jni2c(title, env);
std::string cUrl;
pthread_mutex_lock(&readerLock);
try {
if (reader != NULL) {
if (reader->getPageUrlFromTitle(cTitle, cUrl)) {
setStringObjValue(cUrl, urlObj, env);
retVal = JNI_TRUE;
}
}
} catch (...) {
std::cerr << "Unable to get URL for title " << cTitle << std::endl;
}
pthread_mutex_unlock(&readerLock);
return retVal;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getTitle
(JNIEnv *env , jobject obj, jobject titleObj) {
jboolean retVal = JNI_FALSE;
std::string cTitle;
pthread_mutex_lock(&readerLock);
try {
if (reader != NULL) {
std::string cTitle = reader->getTitle();
setStringObjValue(cTitle, titleObj, env);
retVal = JNI_TRUE;
}
} catch (...) {
std::cerr << "Unable to get ZIM title" << std::endl;
}
pthread_mutex_unlock(&readerLock);
return retVal;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getDescription(JNIEnv *env, jobject obj) {
jstring description;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
std::string cDescription = reader->getDescription();
description = c2jni(cDescription, env);
} catch (...) {
std::cerr << "Unable to get ZIM description" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return description;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getRandomPage
(JNIEnv *env, jobject obj, jobject urlObj) {
jboolean retVal = JNI_FALSE;
std::string cUrl;
pthread_mutex_lock(&readerLock);
try {
if (reader != NULL) {
std::string cUrl = reader->getRandomPageUrl();
setStringObjValue(cUrl, urlObj, env);
retVal = JNI_TRUE;
}
} catch (...) {
std::cerr << "Unable to get random page" << std::endl;
}
pthread_mutex_unlock(&readerLock);
return retVal;
}
JNIEXPORT void JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_setDataDirectory
(JNIEnv *env, jobject obj, jstring dirStr) {
JNIEXPORT void JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_setDataDirectory(
JNIEnv* env, jobject obj, jstring dirStr)
{
std::string cPath = jni2c(dirStr, env);
pthread_mutex_lock(&readerLock);
Lock l;
try {
u_setDataDirectory(cPath.c_str());
} catch (...) {
std::cerr << "Unable to set data directory " << cPath << std::endl;
}
pthread_mutex_unlock(&readerLock);
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_loadFulltextIndex(JNIEnv *env, jobject obj, jstring path) {
jboolean retVal = JNI_TRUE;
std::string cPath = jni2c(path, env);
pthread_mutex_lock(&searcherLock);
searcher = NULL;
try {
if (searcher != NULL) delete searcher;
searcher = new kiwix::XapianSearcher(cPath);
} catch (...) {
searcher = NULL;
retVal = JNI_FALSE;
std::cerr << "Unable to load full text index " << cPath << std::endl;
}
pthread_mutex_unlock(&searcherLock);
return retVal;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_indexedQuery
(JNIEnv *env, jclass obj, jstring query, jint count) {
std::string cQuery = jni2c(query, env);
unsigned int cCount = jni2c(count);
std::string url;
std::string title;
std::string result;
unsigned int score;
pthread_mutex_lock(&searcherLock);
try {
if (searcher != NULL) {
searcher->search(cQuery, 0, count);
while (searcher->getNextResult(url, title, score) &&
!title.empty() &&
!url.empty()) {
result += title + "\n";
}
}
} catch (...) {
std::cerr << "Unable to make indexed query " << cQuery << std::endl;
}
pthread_mutex_unlock(&searcherLock);
return env->NewStringUTF(result.c_str());
}

424
src/android/kiwixreader.cpp Normal file

@@ -0,0 +1,424 @@
/*
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
* Copyright (C) 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <jni.h>
#include <zim/file.h>
#include <android/log.h>
#include "org_kiwix_kiwixlib_JNIKiwixReader.h"
#include "tools/base64.h"
#include "reader.h"
#include "utils.h"
/* Kiwix Reader JNI functions */
JNIEXPORT jlong JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getNativeReader(
JNIEnv* env, jobject obj, jstring filename)
{
std::string cPath = jni2c(filename, env);
__android_log_print(ANDROID_LOG_INFO, "kiwix", "Attempting to create reader with: %s", cPath.c_str());
Lock l;
try {
kiwix::Reader* reader = new kiwix::Reader(cPath);
return reinterpret_cast<jlong>(new Handle<kiwix::Reader>(reader));
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_WARN, "kiwix", "Error opening ZIM file");
__android_log_print(ANDROID_LOG_WARN, "kiwix", "%s", e.what());
return 0;
}
}
JNIEXPORT void JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_dispose(JNIEnv* env, jobject obj)
{
Handle<kiwix::Reader>::dispose(env, obj);
}
#define READER (Handle<kiwix::Reader>::getHandle(env, obj))
/* Kiwix library functions */
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getMainPage(JNIEnv* env, jobject obj)
{
jstring url;
try {
std::string cUrl = READER->getMainPage().getPath();
url = c2jni(cUrl, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get ZIM main page");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
url = NULL;
}
return url;
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getId(JNIEnv* env, jobject obj)
{
jstring id;
try {
std::string cId = READER->getId();
id = c2jni(cId, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get ZIM id");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
id = NULL;
}
return id;
}
JNIEXPORT jint JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getFileSize(JNIEnv* env, jobject obj)
{
jint size = 0;
try {
int cSize = READER->getFileSize();
size = c2jni(cSize);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get ZIM file size");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
}
return size;
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getCreator(JNIEnv* env, jobject obj)
{
jstring creator;
try {
std::string cCreator = READER->getCreator();
creator = c2jni(cCreator, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get ZIM creator");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
creator = NULL;
}
return creator;
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getPublisher(JNIEnv* env, jobject obj)
{
jstring publisher;
try {
std::string cPublisher = READER->getPublisher();
publisher = c2jni(cPublisher, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get ZIM publisher");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
publisher = NULL;
}
return publisher;
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getName(JNIEnv* env, jobject obj)
{
jstring name;
try {
std::string cName = READER->getName();
name = c2jni(cName, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get ZIM name");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
name = NULL;
}
return name;
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getFavicon(JNIEnv* env, jobject obj)
{
jstring favicon;
try {
std::string cContent;
std::string cMime;
READER->getFavicon(cContent, cMime);
favicon = c2jni(
base64_encode(cContent),
env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get ZIM favicon");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
favicon = NULL;
}
return favicon;
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getDate(JNIEnv* env, jobject obj)
{
jstring date;
try {
std::string cDate = READER->getDate();
date = c2jni(cDate, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get ZIM date");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
date = NULL;
}
return date;
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getLanguage(JNIEnv* env, jobject obj)
{
jstring language;
try {
std::string cLanguage = READER->getLanguage();
language = c2jni(cLanguage, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get ZIM language");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
language = NULL;
}
return language;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getMimeType(
JNIEnv* env, jobject obj, jstring url)
{
jstring mimeType;
std::string cUrl = jni2c(url, env);
try {
auto entry = READER->getEntryFromEncodedPath(cUrl);
auto cMimeType = entry.getMimetype();
mimeType = c2jni(cMimeType, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get mime-type for url: %s", cUrl.c_str());
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
mimeType = NULL;
}
return mimeType;
}
JNIEXPORT jbyteArray JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getContent(
JNIEnv* env, jobject obj, jstring url, jobject titleObj, jobject mimeTypeObj, jobject sizeObj)
{
/* Default values */
setStringObjValue("", titleObj, env);
setStringObjValue("", mimeTypeObj, env);
setIntObjValue(0, sizeObj, env);
jbyteArray data = env->NewByteArray(0);
/* Retrieve the content */
std::string cUrl = jni2c(url, env);
unsigned int cSize = 0;
try {
auto entry = READER->getEntryFromEncodedPath(cUrl);
entry = entry.getFinalEntry();
cSize = entry.getSize();
setIntObjValue(cSize, sizeObj, env);
data = env->NewByteArray(cSize);
env->SetByteArrayRegion(
data, 0, cSize, reinterpret_cast<const jbyte*>(entry.getBlob().data()));
setStringObjValue(entry.getMimetype(), mimeTypeObj, env);
setStringObjValue(entry.getTitle(), titleObj, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get content for url: %s", cUrl.c_str());
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
}
return data;
}
JNIEXPORT jbyteArray JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getContentPart(
JNIEnv* env, jobject obj, jstring url, jint offset, jint len, jobject sizeObj)
{
/* Default values */
jbyteArray data = env->NewByteArray(0);
setIntObjValue(0, sizeObj, env);
/* Retrieve the content */
std::string cUrl = jni2c(url, env);
unsigned int cOffset = jni2c(offset);
unsigned int cLen = jni2c(len);
try {
auto entry = READER->getEntryFromEncodedPath(cUrl);
entry = entry.getFinalEntry();
if (cLen == 0) {
setIntObjValue(entry.getSize(), sizeObj, env);
} else if (cOffset+cLen <= entry.getSize()) {
auto blob = entry.getBlob(cOffset, cLen);
data = env->NewByteArray(cLen);
env->SetByteArrayRegion(
data, 0, cLen, reinterpret_cast<const jbyte*>(blob.data()));
setIntObjValue(cLen, sizeObj, env);
}
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get partial content for url: %s (%u : %u)", cUrl.c_str(), cOffset, cLen);
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
}
return data;
}
JNIEXPORT jobject JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getDirectAccessInformation(
JNIEnv* env, jobject obj, jstring url)
{
jclass classPair = env->FindClass("org/kiwix/kiwixlib/Pair");
jmethodID midPairinit = env->GetMethodID(classPair, "<init>", "()V");
jobject pair = env->NewObject(classPair, midPairinit);
setPairObjValue("", 0, pair, env);
std::string cUrl = jni2c(url, env);
try {
auto entry = READER->getEntryFromEncodedPath(cUrl);
entry = entry.getFinalEntry();
auto part_info = entry.getDirectAccessInfo();
setPairObjValue(part_info.first, part_info.second, pair, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get direct access info for url: %s", cUrl.c_str());
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
}
return pair;
}
JNIEXPORT jboolean JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_searchSuggestions(JNIEnv* env,
jobject obj,
jstring prefix,
jint count)
{
jboolean retVal = JNI_FALSE;
std::string cPrefix = jni2c(prefix, env);
unsigned int cCount = jni2c(count);
try {
if (READER->searchSuggestionsSmart(cPrefix, cCount)) {
retVal = JNI_TRUE;
}
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_WARN, "kiwix", "Unable to get search results for pattern: %s", cPrefix.c_str());
__android_log_print(ANDROID_LOG_WARN, "kiwix", "%s", e.what());
}
return retVal;
}
JNIEXPORT jboolean JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getNextSuggestion(JNIEnv* env,
jobject obj,
jobject titleObj)
{
jboolean retVal = JNI_FALSE;
std::string cTitle;
try {
if (READER->getNextSuggestion(cTitle)) {
setStringObjValue(cTitle, titleObj, env);
retVal = JNI_TRUE;
}
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_WARN, "kiwix", "Unable to get next suggestion");
__android_log_print(ANDROID_LOG_WARN, "kiwix", "%s", e.what());
}
return retVal;
}
JNIEXPORT jboolean JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getPageUrlFromTitle(JNIEnv* env,
jobject obj,
jstring title,
jobject urlObj)
{
std::string cTitle = jni2c(title, env);
try {
auto entry = READER->getEntryFromTitle(cTitle);
entry = entry.getFinalEntry();
setStringObjValue(entry.getPath(), urlObj, env);
return JNI_TRUE;
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_WARN, "kiwix", "Unable to get url for title: %s", cTitle.c_str());
__android_log_print(ANDROID_LOG_WARN, "kiwix", "%s", e.what());
}
return JNI_FALSE;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getTitle(
JNIEnv* env, jobject obj)
{
jstring title;
try {
std::string cTitle = READER->getTitle();
title = c2jni(cTitle, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get zim title");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
title = NULL;
}
return title;
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixReader_getDescription(JNIEnv* env, jobject obj)
{
jstring description;
try {
std::string cDescription = READER->getDescription();
description = c2jni(cDescription, env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get zim description");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
description = NULL;
}
return description;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getRandomPage(
JNIEnv* env, jobject obj, jobject urlObj)
{
jboolean retVal = JNI_FALSE;
std::string cUrl;
try {
std::string cUrl = READER->getRandomPage().getPath();
setStringObjValue(cUrl, urlObj, env);
retVal = JNI_TRUE;
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get random page");
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "%s", e.what());
}
return retVal;
}
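
The range handling of getContentPart in the file above can be illustrated without JNI. What follows is a minimal sketch, not the code from this change: `readPart` is a hypothetical helper, a `std::string` stands in for the entry's blob, and the `size` reference plays the role of the `JNIKiwixInt` out-parameter. The sketch assumes that a range ending exactly at the end of the content is valid.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for getContentPart (not part of kiwix-lib).
// `content` replaces the entry's blob, `size` replaces the JNIKiwixInt.
std::vector<char> readPart(const std::string& content,
                           std::size_t offset,
                           std::size_t len,
                           std::size_t& size)
{
  size = 0;
  if (len == 0) {
    // Size query: nothing is read, only the total size is reported.
    size = content.size();
    return {};
  }
  // Written as a subtraction so that `offset + len` cannot overflow.
  if (offset > content.size() || len > content.size() - offset) {
    return {};  // Out of range: empty result, `size` stays 0.
  }
  size = len;
  const char* begin = content.data() + offset;
  return std::vector<char>(begin, begin + len);
}
```

A caller would typically first call with `len` equal to 0 to learn the total size, then request fixed-size chunks until the end of the content is reached.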


@@ -0,0 +1,124 @@
/*
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
* Copyright (C) 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <zim/file.h>
#include "org_kiwix_kiwixlib_JNIKiwixSearcher.h"
#include "org_kiwix_kiwixlib_JNIKiwixSearcher_Result.h"
#include "reader.h"
#include "searcher.h"
#include "utils.h"
#define SEARCHER (Handle<kiwix::Searcher>::getHandle(env, obj))
#define RESULT (Handle<kiwix::Result>::getHandle(env, obj))
JNIEXPORT void JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixSearcher_dispose(JNIEnv* env, jobject obj)
{
Handle<kiwix::Searcher>::dispose(env, obj);
}
/* Kiwix Searcher JNI functions */
JNIEXPORT jlong JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixSearcher_getNativeHandle(JNIEnv* env,
jobject obj)
{
kiwix::Searcher* searcher = new kiwix::Searcher();
return reinterpret_cast<jlong>(new Handle<kiwix::Searcher>(searcher));
}
/* Kiwix library functions */
JNIEXPORT void JNICALL Java_org_kiwix_kiwixlib_JNIKiwixSearcher_addReader(
JNIEnv* env, jobject obj, jobject reader)
{
auto searcher = SEARCHER;
searcher->add_reader(*(Handle<kiwix::Reader>::getHandle(env, reader)), "");
}
JNIEXPORT void JNICALL Java_org_kiwix_kiwixlib_JNIKiwixSearcher_search(
JNIEnv* env, jobject obj, jstring query, jint count)
{
std::string cquery = jni2c(query, env);
unsigned int ccount = jni2c(count);
SEARCHER->search(cquery, 0, ccount);
}
JNIEXPORT jobject JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixSearcher_getNextResult(JNIEnv* env,
jobject obj)
{
jobject result = nullptr;
kiwix::Result* cresult = SEARCHER->getNextResult();
if (cresult != nullptr) {
jclass resultclass
= env->FindClass("org/kiwix/kiwixlib/JNIKiwixSearcher$Result");
jmethodID ctor = env->GetMethodID(
resultclass, "<init>", "(Lorg/kiwix/kiwixlib/JNIKiwixSearcher;JLorg/kiwix/kiwixlib/JNIKiwixSearcher;)V");
result = env->NewObject(resultclass, ctor, obj, reinterpret_cast<jlong>(new Handle<kiwix::Result>(cresult)), obj);
}
return result;
}
JNIEXPORT void JNICALL Java_org_kiwix_kiwixlib_JNIKiwixSearcher_00024Result_dispose(
JNIEnv* env, jobject obj)
{
Handle<kiwix::Result>::dispose(env, obj);
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixSearcher_00024Result_getUrl(JNIEnv* env,
jobject obj)
{
try {
return c2jni(RESULT->get_url(), env);
} catch (...) {
return nullptr;
}
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixSearcher_00024Result_getTitle(JNIEnv* env,
jobject obj)
{
try {
return c2jni(RESULT->get_title(), env);
} catch (...) {
return nullptr;
}
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixSearcher_00024Result_getSnippet(JNIEnv* env,
jobject obj)
{
return c2jni(RESULT->get_snippet(), env);
}
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwixSearcher_00024Result_getContent(JNIEnv* env,
jobject obj)
{
return c2jni(RESULT->get_content(), env);
}


@@ -1,13 +1,26 @@
jni_generator = find_program('gen_kiwix.sh')
kiwix_jni = custom_target('jni',
input: ['org/kiwix/kiwixlib/JNIKiwix.java',
'org/kiwix/kiwixlib/JNIKiwixReader.java',
'org/kiwix/kiwixlib/JNIKiwixSearcher.java',
'org/kiwix/kiwixlib/JNIKiwixInt.java',
'org/kiwix/kiwixlib/JNIKiwixString.java',
'org/kiwix/kiwixlib/JNIKiwixBool.java'],
output: ['org_kiwix_kiwixlib_JNIKiwix.h'],
command:[jni_generator, '@INPUT@']
'org/kiwix/kiwixlib/JNIKiwixBool.java',
'org/kiwix/kiwixlib/JNIKiwixException.java',
'org/kiwix/kiwixlib/Pair.java'],
output: ['org_kiwix_kiwixlib_JNIKiwix.h',
'org_kiwix_kiwixlib_JNIKiwixReader.h',
'org_kiwix_kiwixlib_JNIKiwixSearcher.h',
'org_kiwix_kiwixlib_JNIKiwixSearcher_Result.h'],
command:['javac', '-d', '@OUTDIR@', '-h', '@OUTDIR@', '@INPUT@']
)
kiwix_sources += ['android/kiwix.cpp', kiwix_jni]
kiwix_sources += [
'android/kiwix.cpp',
'android/kiwixreader.cpp',
'android/kiwixsearcher.cpp',
kiwix_jni]
install_subdir('org', install_dir: 'kiwix-lib/java')
install_subdir('res', install_dir: 'kiwix-lib')
install_data('AndroidManifest.xml', install_dir: 'kiwix-lib')


@@ -1,5 +1,6 @@
/*
* Copyright 2013
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
* Copyright (C) 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -19,59 +20,12 @@
package org.kiwix.kiwixlib;
import org.kiwix.kiwixlib.JNIKiwixReader;
import org.kiwix.kiwixlib.JNIKiwixString;
import org.kiwix.kiwixlib.JNIKiwixBool;
import org.kiwix.kiwixlib.JNIKiwixInt;
public class JNIKiwix {
static {
System.loadLibrary("kiwix");
}
public native String getMainPage();
public native String getId();
public native String getLanguage();
public native String getMimeType(String url);
public native boolean loadZIM(String path);
public native boolean loadFulltextIndex(String path);
public native byte[] getContent(String url, JNIKiwixString mimeType, JNIKiwixInt size);
public native boolean searchSuggestions(String prefix, int count);
public native boolean getNextSuggestion(JNIKiwixString title);
public native boolean getPageUrlFromTitle(String title, JNIKiwixString url);
public native boolean getTitle(JNIKiwixString title);
public native String getDescription();
public native String getDate();
public native String getFavicon();
public native String getCreator();
public native String getPublisher();
public native String getName();
public native int getFileSize();
public native int getArticleCount();
public native int getMediaCount();
public native boolean getRandomPage(JNIKiwixString url);
public class JNIKiwix
{
static { System.loadLibrary("kiwix"); }
public native void setDataDirectory(String icuDataDir);
public static native String indexedQuery(String db, int count);
}


@@ -1,5 +1,5 @@
/*
* Copyright 2013
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -19,7 +19,7 @@
package org.kiwix.kiwixlib;
public class JNIKiwixBool {
public class JNIKiwixBool
{
public boolean value;
}


@@ -1,5 +1,5 @@
/*
* Copyright 2014 Emmanuel Engelhart <kelson@kiwix.org>
* Copyright (C) 2017 Matthieu Gautier <mgautier@kymeria.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -17,12 +17,11 @@
* MA 02110-1301, USA.
*/
#include <common/otherTools.h>
package org.kiwix.kiwixlib;
void kiwix::sleep(unsigned int milliseconds) {
#ifdef _WIN32
Sleep(milliseconds);
#else
usleep(1000 * milliseconds);
#endif
public class JNIKiwixException extends Exception
{
public JNIKiwixException(String message) {
super(message);
}
}


@@ -1,5 +1,5 @@
/*
* Copyright 2013
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -19,8 +19,7 @@
package org.kiwix.kiwixlib;
public class JNIKiwixInt {
public class JNIKiwixInt
{
public int value;
}


@@ -0,0 +1,127 @@
/*
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
* Copyright (C) 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
package org.kiwix.kiwixlib;
import org.kiwix.kiwixlib.JNIKiwixException;
import org.kiwix.kiwixlib.JNIKiwixString;
import org.kiwix.kiwixlib.JNIKiwixInt;
import org.kiwix.kiwixlib.JNIKiwixSearcher;
import org.kiwix.kiwixlib.Pair;
public class JNIKiwixReader
{
public native String getMainPage();
public native String getTitle();
public native String getId();
public native String getLanguage();
public native String getMimeType(String url);
public native byte[] getContent(String url,
JNIKiwixString title,
JNIKiwixString mimeType,
JNIKiwixInt size);
/**
* getContentPart.
*
* Get only a part of the content of the article.
* Return a byte array of `len` size starting from offset `offset`.
* Set `size` to the number of bytes read
* (`len` if everything is ok, 0 in case of error).
* If `len` == 0, no bytes are read but `size` is set to the total size of the
* article.
*/
public native byte[] getContentPart(String url,
int offset,
int len,
JNIKiwixInt size);
/**
* getDirectAccessInformation.
*
* Return information about where the content is located in the zim file.
*
* Some content (binary content) is stored uncompressed in the zim file.
* With this information, a caller can open the zim file (or zim part)
* and read the content directly from it, bypassing libzim.
*
* Return a `Pair` (filename, offset) giving where the content is located.
*
* If the content cannot be directly accessed (content is compressed or zim
* file is cut in the middle of the content), the filename is an empty string
* and offset is zero.
*/
public native Pair getDirectAccessInformation(String url);
public native boolean searchSuggestions(String prefix, int count);
public native boolean getNextSuggestion(JNIKiwixString title);
public native boolean getPageUrlFromTitle(String title, JNIKiwixString url);
public native String getDescription();
public native String getDate();
public native String getFavicon();
public native String getCreator();
public native String getPublisher();
public native String getName();
public native int getFileSize();
public native int getArticleCount();
public native int getMediaCount();
public native boolean getRandomPage(JNIKiwixString url);
public JNIKiwixSearcher search(String query, int count)
{
JNIKiwixSearcher searcher = new JNIKiwixSearcher();
searcher.addKiwixReader(this);
searcher.search(query, count);
return searcher;
}
public JNIKiwixReader(String filename) throws JNIKiwixException
{
nativeHandle = getNativeReader(filename);
if (nativeHandle == 0) {
throw new JNIKiwixException("Cannot open zimfile "+filename);
}
}
public JNIKiwixReader() {
}
public native void dispose();
private native long getNativeReader(String filename);
private long nativeHandle;
}


@@ -0,0 +1,67 @@
/*
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
* Copyright (C) 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
package org.kiwix.kiwixlib;
import org.kiwix.kiwixlib.JNIKiwixReader;
import java.util.Vector;
public class JNIKiwixSearcher
{
public class Result
{
private long nativeHandle;
private JNIKiwixSearcher searcher;
public Result(long handle, JNIKiwixSearcher _searcher)
{
nativeHandle = handle;
searcher = _searcher;
}
public native String getUrl();
public native String getTitle();
public native String getContent();
public native String getSnippet();
public native void dispose();
}
public JNIKiwixSearcher()
{
nativeHandle = getNativeHandle();
usedReaders = new Vector();
}
public native void dispose();
private native long getNativeHandle();
private long nativeHandle;
private Vector usedReaders;
public native void addReader(JNIKiwixReader reader);
public void addKiwixReader(JNIKiwixReader reader)
{
addReader(reader);
usedReaders.addElement(reader);
}
public native void search(String query, int count);
public native Result getNextResult();
public native boolean hasMoreResult();
}


@@ -1,5 +1,5 @@
/*
* Copyright 2013
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -19,7 +19,7 @@
package org.kiwix.kiwixlib;
public class JNIKiwixString {
public class JNIKiwixString
{
public String value;
}


@@ -0,0 +1,26 @@
/*
* Copyright (C) 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
package org.kiwix.kiwixlib;
public class Pair
{
public String filename;
public int offset;
}


@@ -0,0 +1,3 @@
<resources>
<string name="app_name">Kiwix Lib</string>
</resources>

150
src/android/utils.h Normal file

@@ -0,0 +1,150 @@
/*
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
* Copyright (C) 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef _ANDROID_JNI_UTILS_H
#define _ANDROID_JNI_UTILS_H
#include <jni.h>
#include <pthread.h>
#include <string>
extern pthread_mutex_t globalLock;
inline jfieldID getHandleField(JNIEnv* env, jobject obj)
{
jclass c = env->GetObjectClass(obj);
// J is the type signature for long:
return env->GetFieldID(c, "nativeHandle", "J");
}
class Lock
{
protected:
pthread_mutex_t* lock;
public:
Lock() : lock(&globalLock) { pthread_mutex_lock(lock); }
Lock(const Lock&) = delete;
Lock& operator=(const Lock&) = delete;
Lock(Lock&& other) : lock(other.lock) { other.lock = nullptr; }
virtual ~Lock()
{
if (lock) {
pthread_mutex_unlock(lock);
}
}
};
template <class T>
class LockedHandle;
template <class T>
class Handle
{
protected:
T* h;
public:
Handle(T* h) : h(h){};
// No destructor. This must and will be handled by dispose method.
static LockedHandle<T> getHandle(JNIEnv* env, jobject obj)
{
jlong handle = env->GetLongField(obj, getHandleField(env, obj));
return LockedHandle<T>(reinterpret_cast<Handle<T>*>(handle));
}
static void dispose(JNIEnv* env, jobject obj)
{
auto lHandle = getHandle(env, obj);
auto handle = lHandle.h;
delete handle->h;
delete handle;
}
friend class LockedHandle<T>;
};
template <class T>
struct LockedHandle : public Lock {
Handle<T>* h;
LockedHandle(Handle<T>* h) : h(h) {}
T* operator->() { return h->h; }
T* operator*() { return h->h; }
operator bool() const { return (h->h != nullptr); }
};
/* c2jni type conversion functions */
inline jboolean c2jni(const bool& val) { return val ? JNI_TRUE : JNI_FALSE; }
inline jstring c2jni(const std::string& val, JNIEnv* env)
{
return env->NewStringUTF(val.c_str());
}
inline jint c2jni(const int val) { return (jint)val; }
inline jint c2jni(const unsigned val) { return (jint)val; }
/* jni2c type conversion functions */
inline bool jni2c(const jboolean& val) { return val == JNI_TRUE; }
inline std::string jni2c(const jstring& val, JNIEnv* env)
{
const char* chars = env->GetStringUTFChars(val, 0);
std::string ret(chars);
env->ReleaseStringUTFChars(val, chars);
return ret;
}
inline int jni2c(const jint val) { return (int)val; }
/* Methods to deal with variables passed by reference */
inline void setStringObjValue(const std::string& value,
const jobject obj,
JNIEnv* env)
{
jclass objClass = env->GetObjectClass(obj);
jfieldID objFid = env->GetFieldID(objClass, "value", "Ljava/lang/String;");
env->SetObjectField(obj, objFid, c2jni(value, env));
}
inline void setIntObjValue(const int value, const jobject obj, JNIEnv* env)
{
jclass objClass = env->GetObjectClass(obj);
jfieldID objFid = env->GetFieldID(objClass, "value", "I");
env->SetIntField(obj, objFid, value);
}
inline void setBoolObjValue(const bool value, const jobject obj, JNIEnv* env)
{
jclass objClass = env->GetObjectClass(obj);
jfieldID objFid = env->GetFieldID(objClass, "value", "Z");
env->SetBooleanField(obj, objFid, c2jni(value));
}
inline void setPairObjValue(const std::string& filename, const int offset,
const jobject obj, JNIEnv* env)
{
jclass objClass = env->GetObjectClass(obj);
jfieldID filenameFid = env->GetFieldID(objClass, "filename", "Ljava/lang/String;");
env->SetObjectField(obj, filenameFid, c2jni(filename, env));
jfieldID offsetFid = env->GetFieldID(objClass, "offset", "I");
env->SetIntField(obj, offsetFid, offset);
}
#endif // _ANDROID_JNI_UTILS_H

203
src/aria2.cpp Normal file

@@ -0,0 +1,203 @@
#include "aria2.h"
#include "xmlrpc.h"
#include <sstream>
#include <thread>
#include <chrono>
#include <tools/otherTools.h>
#include <tools/pathTools.h>
#include <downloader.h> // For AriaError
#ifdef _WIN32
# define ARIA2_CMD "aria2c.exe"
#else
# define ARIA2_CMD "aria2c"
#endif
namespace kiwix {
Aria2::Aria2():
mp_aria(nullptr),
m_port(42042),
m_secret("kiwixariarpc"),
mp_curl(nullptr),
m_lock(PTHREAD_MUTEX_INITIALIZER)
{
m_downloadDir = getDataDirectory();
makeDirectory(m_downloadDir);
std::vector<const char*> callCmd;
std::string rpc_port = "--rpc-listen-port=" + to_string(m_port);
std::string download_dir = "--dir=" + getDataDirectory();
std::string session_file = appendToDirectory(getDataDirectory(), "kiwix.session");
std::string session = "--save-session=" + session_file;
std::string inputFile = "--input-file=" + session_file;
// std::string log_dir = "--log=\"" + logDir + "\"";
#ifdef _WIN32
int pid = GetCurrentProcessId();
#else
pid_t pid = getpid();
#endif
std::string stop_with_pid = "--stop-with-process=" + to_string(pid);
std::string rpc_secret = "--rpc-secret=" + m_secret;
m_secret = "token:"+m_secret;
std::string aria2cmd = appendToDirectory(
removeLastPathElement(getExecutablePath(), true, true),
ARIA2_CMD);
if (fileExists(aria2cmd)) {
// A local aria2c exe exists (packaged with kiwix-desktop), use it.
callCmd.push_back(aria2cmd.c_str());
} else {
// Otherwise, fall back to an aria2c found on the PATH, if any.
callCmd.push_back(ARIA2_CMD);
}
callCmd.push_back("--enable-rpc");
callCmd.push_back(rpc_secret.c_str());
callCmd.push_back(rpc_port.c_str());
callCmd.push_back(download_dir.c_str());
if (fileExists(session_file)) {
callCmd.push_back(inputFile.c_str());
}
callCmd.push_back(session.c_str());
// callCmd.push_back(log_dir.c_str());
callCmd.push_back("--auto-save-interval=10");
callCmd.push_back(stop_with_pid.c_str());
callCmd.push_back("--allow-overwrite=true");
callCmd.push_back("--dht-entry-point=router.bittorrent.com:6881");
callCmd.push_back("--dht-entry-point6=router.bittorrent.com:6881");
callCmd.push_back("--quiet=true");
callCmd.push_back("--bt-enable-lpd=true");
callCmd.push_back("--always-resume=true");
callCmd.push_back("--max-concurrent-downloads=42");
callCmd.push_back("--rpc-max-request-size=6M");
callCmd.push_back("--file-allocation=none");
mp_aria = Subprocess::run(callCmd);
mp_curl = curl_easy_init();
curl_easy_setopt(mp_curl, CURLOPT_URL, "http://localhost/rpc");
curl_easy_setopt(mp_curl, CURLOPT_PORT, m_port);
curl_easy_setopt(mp_curl, CURLOPT_POST, 1L);
int watchdog = 50;
while(--watchdog) {
std::this_thread::sleep_for(std::chrono::microseconds(10000));
auto res = curl_easy_perform(mp_curl);
if (res == CURLE_OK) {
break;
}
}
if (!watchdog) {
curl_easy_cleanup(mp_curl);
throw std::runtime_error("Cannot connect to aria2c rpc");
}
}
Aria2::~Aria2()
{
curl_easy_cleanup(mp_curl);
}
void Aria2::close()
{
saveSession();
shutdown();
}
size_t write_callback_to_iss(char* ptr, size_t size, size_t nmemb, void* userdata)
{
auto str = static_cast<std::stringstream*>(userdata);
// libcurl hands over size * nmemb bytes and expects that count back.
str->write(ptr, size * nmemb);
return size * nmemb;
}
std::string Aria2::doRequest(const MethodCall& methodCall)
{
pthread_mutex_lock(&m_lock);
auto requestContent = methodCall.toString();
std::stringstream stringstream;
CURLcode res;
curl_easy_setopt(mp_curl, CURLOPT_POSTFIELDSIZE, requestContent.size());
curl_easy_setopt(mp_curl, CURLOPT_POSTFIELDS, requestContent.c_str());
curl_easy_setopt(mp_curl, CURLOPT_WRITEFUNCTION, &write_callback_to_iss);
curl_easy_setopt(mp_curl, CURLOPT_WRITEDATA, &stringstream);
res = curl_easy_perform(mp_curl);
if (res == CURLE_OK) {
long response_code;
curl_easy_getinfo(mp_curl, CURLINFO_RESPONSE_CODE, &response_code);
pthread_mutex_unlock(&m_lock);
if (response_code != 200) {
throw std::runtime_error("Invalid return code from aria");
}
auto responseContent = stringstream.str();
MethodResponse response(responseContent);
if (response.isFault()) {
throw AriaError(response.getFault().getFaultString());
}
return responseContent;
}
pthread_mutex_unlock(&m_lock);
throw std::runtime_error("Cannot perform request");
}
std::string Aria2::addUri(const std::vector<std::string>& uris)
{
MethodCall methodCall("aria2.addUri", m_secret);
auto uriParams = methodCall.newParamValue().getArray();
for (auto& uri : uris) {
uriParams.addValue().set(uri);
}
auto ret = doRequest(methodCall);
MethodResponse response(ret);
return response.getParamValue(0).getAsS();
}
std::string Aria2::tellStatus(const std::string& gid, const std::vector<std::string>& statusKey)
{
MethodCall methodCall("aria2.tellStatus", m_secret);
methodCall.newParamValue().set(gid);
if (!statusKey.empty()) {
auto statusArray = methodCall.newParamValue().getArray();
for (auto& key : statusKey) {
statusArray.addValue().set(key);
}
}
return doRequest(methodCall);
}
std::vector<std::string> Aria2::tellActive()
{
MethodCall methodCall("aria2.tellActive", m_secret);
auto statusArray = methodCall.newParamValue().getArray();
statusArray.addValue().set(std::string("gid"));
statusArray.addValue().set(std::string("following"));
auto responseContent = doRequest(methodCall);
MethodResponse response(responseContent);
std::vector<std::string> activeGID;
int index = 0;
while(true) {
try {
auto structNode = response.getParamValue(0).getArray().getValue(index++).getStruct();
auto gidNode = structNode.getMember("gid");
activeGID.push_back(gidNode.getValue().getAsS());
} catch (InvalidRPCNode& e) { break; }
}
return activeGID;
}
void Aria2::saveSession()
{
MethodCall methodCall("aria2.saveSession", m_secret);
doRequest(methodCall);
std::cout << "session saved" << std::endl;
}
void Aria2::shutdown()
{
MethodCall methodCall("aria2.shutdown", m_secret);
doRequest(methodCall);
}
} // end namespace kiwix
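
`Aria2::doRequest` above unlocks `m_lock` by hand on each exit path. Below is a minimal sketch of the same critical section written with `std::mutex` and `std::lock_guard`, so the lock is released even if the transfer throws. `RpcChannel`, `perform`, `idle` and the `Transfer` callback are hypothetical names used for illustration only; they stand in for the curl handle and are not part of kiwix-lib.

```cpp
#include <functional>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the Aria2 class: only the locking is modelled.
class RpcChannel {
 public:
  // `Transfer` plays the role of curl_easy_perform(): it returns the HTTP
  // status code and fills `body` with the response (assumption).
  using Transfer =
      std::function<long(const std::string& request, std::string& body)>;

  explicit RpcChannel(Transfer transfer) : m_transfer(std::move(transfer)) {}

  std::string perform(const std::string& request) {
    std::string body;
    long code = 0;
    {
      // The guard unlocks on scope exit, including when m_transfer throws.
      std::lock_guard<std::mutex> guard(m_lock);
      code = m_transfer(request, body);
    }
    if (code != 200) {
      throw std::runtime_error("Invalid return code from aria");
    }
    return body;
  }

  // Helper for checks: true when no request currently holds the lock.
  bool idle() {
    if (!m_lock.try_lock()) return false;
    m_lock.unlock();
    return true;
  }

 private:
  Transfer m_transfer;
  std::mutex m_lock;
};
```

With a scoped guard the early-return and error paths no longer need their own unlock call, which is the main point of the design.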

46
src/aria2.h Normal file

@@ -0,0 +1,46 @@
#ifndef KIWIXLIB_ARIA2_H_
#define KIWIXLIB_ARIA2_H_
#ifdef _WIN32
// winsock2.h needs to be included before windows.h (included by curl.h)
# include <winsock2.h>
#endif
#include "subprocess.h"
#include "xmlrpc.h"
#include <memory>
#include <curl/curl.h>
#include <pthread.h>
namespace kiwix {
class Aria2
{
private:
std::unique_ptr<Subprocess> mp_aria;
int m_port;
std::string m_secret;
std::string m_downloadDir;
CURL* mp_curl;
pthread_mutex_t m_lock;
std::string doRequest(const MethodCall& methodCall);
public:
Aria2();
virtual ~Aria2();
void close();
std::string addUri(const std::vector<std::string>& uri);
std::string tellStatus(const std::string& gid, const std::vector<std::string>& statusKey);
std::vector<std::string> tellActive();
void saveSession();
void shutdown();
};
}; //end namespace kiwix
#endif // KIWIXLIB_ARIA2_H_

195
src/book.cpp Normal file

@@ -0,0 +1,195 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "book.h"
#include "reader.h"
#include "tools/base64.h"
#include "tools/regexTools.h"
#include "tools/networkTools.h"
#include <pugixml.hpp>
namespace kiwix
{
/* Constructor */
Book::Book() : m_readOnly(false)
{
}
/* Destructor */
Book::~Book()
{
}
bool Book::update(const kiwix::Book& other)
{
if (m_readOnly)
return false;
m_readOnly = other.m_readOnly;
if (m_path.empty()) {
m_path = other.m_path;
}
if (m_url.empty()) {
m_url = other.m_url;
}
if (m_tags.empty()) {
m_tags = other.m_tags;
}
if (m_name.empty()) {
m_name = other.m_name;
}
if (m_faviconMimeType.empty()) {
m_favicon = other.m_favicon;
m_faviconMimeType = other.m_faviconMimeType;
}
return true;
}
void Book::update(const kiwix::Reader& reader)
{
m_path = reader.getZimFilePath();
m_id = reader.getId();
m_description = reader.getDescription();
m_language = reader.getLanguage();
m_date = reader.getDate();
m_creator = reader.getCreator();
m_publisher = reader.getPublisher();
m_title = reader.getTitle();
m_name = reader.getName();
m_tags = reader.getTags();
m_origId = reader.getOrigId();
m_articleCount = reader.getArticleCount();
m_mediaCount = reader.getMediaCount();
m_size = static_cast<uint64_t>(reader.getFileSize()) << 10;
reader.getFavicon(m_favicon, m_faviconMimeType);
}
#define ATTR(name) node.attribute(name).value()
void Book::updateFromXml(const pugi::xml_node& node, const std::string& baseDir)
{
m_id = ATTR("id");
std::string path = ATTR("path");
if (isRelativePath(path)) {
path = computeAbsolutePath(baseDir, path);
}
m_path = path;
m_title = ATTR("title");
m_name = ATTR("name");
m_tags = ATTR("tags");
m_description = ATTR("description");
m_language = ATTR("language");
m_date = ATTR("date");
m_creator = ATTR("creator");
m_publisher = ATTR("publisher");
m_url = ATTR("url");
m_origId = ATTR("origId");
m_articleCount = strtoull(ATTR("articleCount"), 0, 0);
m_mediaCount = strtoull(ATTR("mediaCount"), 0, 0);
m_size = strtoull(ATTR("size"), 0, 0) << 10;
m_favicon = base64_decode(ATTR("favicon"));
m_faviconMimeType = ATTR("faviconMimeType");
try {
m_downloadId = ATTR("downloadId");
} catch(...) {}
}
#undef ATTR
static std::string fromOpdsDate(const std::string& date)
{
// OPDS dates use the format <YYYY>-<MM>-<DD>T<HH>:<mm>:<SS>Z
// and we want <YYYY>-<MM>-<DD>, so we keep the first 10 characters.
return date.substr(0, 10);
}
#define VALUE(name) node.child(name).child_value()
void Book::updateFromOpds(const pugi::xml_node& node, const std::string& urlHost)
{
m_id = VALUE("id");
if (!m_id.compare(0, 9, "urn:uuid:")) {
m_id.erase(0, 9);
}
m_title = VALUE("title");
m_description = VALUE("description");
m_language = VALUE("language");
m_date = fromOpdsDate(VALUE("updated"));
m_creator = node.child("author").child("name").child_value();
for(auto linkNode = node.child("link"); linkNode;
linkNode = linkNode.next_sibling("link")) {
std::string rel = linkNode.attribute("rel").value();
if (rel == "http://opds-spec.org/acquisition/open-access") {
m_url = linkNode.attribute("href").value();
m_size = strtoull(linkNode.attribute("length").value(), 0, 0);
}
if (rel == "http://opds-spec.org/image/thumbnail") {
m_faviconUrl = urlHost + linkNode.attribute("href").value();
m_faviconMimeType = linkNode.attribute("type").value();
}
}
}
#undef VALUE
std::string Book::getHumanReadableIdFromPath()
{
std::string id = m_path;
if (!id.empty()) {
kiwix::removeAccents(id);
#ifdef _WIN32
id = replaceRegex(id, "", "^.*\\\\");
#else
id = replaceRegex(id, "", "^.*/");
#endif
id = replaceRegex(id, "", "\\.zim[a-z]*$");
id = replaceRegex(id, "_", " ");
id = replaceRegex(id, "plus", "\\+");
}
return id;
}
void Book::setPath(const std::string& path)
{
m_path = isRelativePath(path)
? computeAbsolutePath(getCurrentDirectory(), path)
: path;
}
const std::string& Book::getFavicon() const {
if (m_favicon.empty() && !m_faviconUrl.empty()) {
try {
m_favicon = download(m_faviconUrl);
} catch(...) {
std::cerr << "Cannot download favicon from " << m_faviconUrl << std::endl;
}
}
return m_favicon;
}
}
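
The two normalisation steps in `Book::updateFromOpds` (dropping the `urn:uuid:` prefix from the id and truncating the timestamp to a day) can be sketched in isolation. The function names `stripUrnPrefix` and `opdsDateToDay` are hypothetical and chosen for this example; the logic mirrors the `compare`/`erase` and `substr(0, 10)` calls above.

```cpp
#include <string>

// Strip the "urn:uuid:" prefix that OPDS feeds put in front of the book id.
inline std::string stripUrnPrefix(std::string id) {
  const std::string prefix = "urn:uuid:";
  if (id.compare(0, prefix.size(), prefix) == 0) {
    id.erase(0, prefix.size());
  }
  return id;
}

// Keep the <YYYY>-<MM>-<DD> part of an OPDS timestamp. Strings shorter than
// 10 characters are returned unchanged.
inline std::string opdsDateToDay(const std::string& date) {
  return date.substr(0, 10);
}
```

An id without the prefix passes through untouched, so the helper is safe to apply to ids from either the XML library file or an OPDS feed.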

47
src/bookmark.cpp Normal file

@@ -0,0 +1,47 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "bookmark.h"
#include <pugixml.hpp>
namespace kiwix
{
/* Constructor */
Bookmark::Bookmark()
{
}
/* Destructor */
Bookmark::~Bookmark()
{
}
void Bookmark::updateFromXml(const pugi::xml_node& node)
{
auto bookNode = node.child("book");
m_bookId = bookNode.child("id").child_value();
m_bookTitle = bookNode.child("title").child_value();
m_language = bookNode.child("language").child_value();
m_date = bookNode.child("date").child_value();
m_title = node.child("title").child_value();
m_url = node.child("url").child_value();
}
}


@@ -1,137 +0,0 @@
/*
* Copyright 2012 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <common/networkTools.h>
std::map<std::string, std::string> kiwix::getNetworkInterfaces() {
std::map<std::string, std::string> interfaces;
#ifdef _WIN32
SOCKET sd = WSASocket(AF_INET, SOCK_DGRAM, 0, 0, 0, 0);
if (sd == SOCKET_ERROR) {
std::cerr << "Failed to get a socket. Error " << WSAGetLastError() <<
std::endl;
return interfaces;
}
INTERFACE_INFO InterfaceList[20];
unsigned long nBytesReturned;
if (WSAIoctl(sd, SIO_GET_INTERFACE_LIST, 0, 0, &InterfaceList,
sizeof(InterfaceList), &nBytesReturned, 0, 0) == SOCKET_ERROR) {
std::cerr << "Failed calling WSAIoctl: error " << WSAGetLastError() <<
std::endl;
return interfaces;
}
int nNumInterfaces = nBytesReturned / sizeof(INTERFACE_INFO);
for (int i = 0; i < nNumInterfaces; ++i) {
sockaddr_in *pAddress;
pAddress = (sockaddr_in *) & (InterfaceList[i].iiAddress);
/* Add to the map */
std::string interfaceName = std::string(inet_ntoa(pAddress->sin_addr));
std::string interfaceIp = std::string(inet_ntoa(pAddress->sin_addr));
interfaces.insert(std::pair<std::string, std::string>(interfaceName, interfaceIp));
}
#else
/* Get Network interfaces information */
char buf[16384];
struct ifconf ifconf;
int fd = socket(PF_INET, SOCK_DGRAM, 0); /* Only IPV4 */
ifconf.ifc_len=sizeof buf;
ifconf.ifc_buf=buf;
if(ioctl(fd, SIOCGIFCONF, &ifconf)!=0) {
perror("ioctl(SIOCGIFCONF)");
exit(EXIT_FAILURE);
}
/* Go through each interface */
int i;
size_t len;
struct ifreq *ifreq;
ifreq = ifconf.ifc_req;
for (i = 0; i < ifconf.ifc_len; ) {
if (ifreq->ifr_addr.sa_family == AF_INET) {
/* Get the network interface ip */
char host[128] = { 0 };
const int error = getnameinfo(&(ifreq->ifr_addr), sizeof ifreq->ifr_addr,
host, sizeof host,
0, 0, NI_NUMERICHOST);
if (!error) {
std::string interfaceName = std::string(ifreq->ifr_name);
std::string interfaceIp = std::string(host);
/* Add to the map */
interfaces.insert(std::pair<std::string, std::string>(interfaceName, interfaceIp));
} else {
perror("getnameinfo()");
}
}
/* some systems have ifr_addr.sa_len and adjust the length that
* way, but not mine. weird */
#ifndef linux
len=IFNAMSIZ + ifreq->ifr_addr.sa_len;
#else
len=sizeof *ifreq;
#endif
ifreq=(struct ifreq*)((char*)ifreq+len);
i+=len;
}
#endif
return interfaces;
}
std::string kiwix::getBestPublicIp() {
std::map<std::string, std::string> interfaces = kiwix::getNetworkInterfaces();
#ifndef _WIN32
const char* const prioritizedNames[] =
{ "eth0", "eth1", "wlan0", "wlan1", "en0", "en1" };
const int count = (sizeof prioritizedNames) / (sizeof prioritizedNames[0]);
for (int i = 0; i < count; ++i) {
std::map<std::string, std::string>::const_iterator it =
interfaces.find(prioritizedNames[i]);
if (it != interfaces.end())
return it->second;
}
#endif
for (std::map<std::string, std::string>::iterator iter = interfaces.begin();
iter != interfaces.end(); ++iter) {
std::string interfaceIp = iter->second;
if (interfaceIp.length() >= 7 && interfaceIp.substr(0, 7) == "192.168")
return interfaceIp;
}
for (std::map<std::string, std::string>::iterator iter = interfaces.begin();
iter != interfaces.end(); ++iter) {
std::string interfaceIp = iter->second;
if (interfaceIp.length() >= 7 && interfaceIp.substr(0, 7) == "172.16.")
return interfaceIp;
}
for (std::map<std::string, std::string>::iterator iter = interfaces.begin();
iter != interfaces.end(); ++iter) {
std::string interfaceIp = iter->second;
if (interfaceIp.length() >= 3 && interfaceIp.substr(0, 3) == "10.")
return interfaceIp;
}
return "127.0.0.1";
}
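
The three near-identical loops at the end of `getBestPublicIp` amount to a prefix search in priority order. A minimal sketch of that part follows, assuming the interface map has already been collected. `pickPrivateIp` is a hypothetical name, and the named-interface lookup (`eth0`, `wlan0`, ...) of the original is deliberately left out.

```cpp
#include <map>
#include <string>
#include <vector>

// Return the first interface address that starts with one of the prefixes,
// trying the prefixes in order; fall back to the loopback address.
inline std::string pickPrivateIp(
    const std::map<std::string, std::string>& interfaces) {
  // Same prefixes and order as the removed code. Note that "172.16." covers
  // only part of the 172.16.0.0/12 private range.
  const std::vector<std::string> prefixes = {"192.168", "172.16.", "10."};
  for (const auto& prefix : prefixes) {
    for (const auto& entry : interfaces) {
      const std::string& ip = entry.second;
      if (ip.compare(0, prefix.size(), prefix) == 0) {
        return ip;
      }
    }
  }
  return "127.0.0.1";
}
```

Keeping the prefixes in one list makes the priority order explicit and would make widening the `172.16.` check a one-line change.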


@@ -1,252 +0,0 @@
/*
* Copyright 2011-2014 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <common/pathTools.h>
#ifdef __APPLE__
#include <mach-o/dyld.h>
#include <limits.h>
#elif _WIN32
#include <windows.h>
#include "shlwapi.h"
#include <direct.h>
#define getcwd _getcwd // stupid MSFT "deprecation" warning
#endif
#ifdef _WIN32
#else
#include <unistd.h>
#endif
#ifdef _WIN32
#define SEPARATOR "\\"
#else
#define SEPARATOR "/"
#include <unistd.h>
#endif
#include <stdlib.h>
#ifndef PATH_MAX
#define PATH_MAX 1024
#endif
bool isRelativePath(const string &path) {
#ifdef _WIN32
return path.empty() || path.substr(1, 2) == ":\\" ? false : true;
#else
return path.empty() || path.substr(0, 1) == "/" ? false : true;
#endif
}
string computeRelativePath(const string path, const string absolutePath) {
std::vector<std::string> pathParts = kiwix::split(path, SEPARATOR);
std::vector<std::string> absolutePathParts = kiwix::split(absolutePath, SEPARATOR);
unsigned int commonCount = 0;
while (commonCount < pathParts.size() &&
commonCount < absolutePathParts.size() &&
pathParts[commonCount] == absolutePathParts[commonCount]) {
if (!pathParts[commonCount].empty()) {
commonCount++;
}
}
string relativePath;
#ifdef _WIN32
/* On Windows you have a token more because the root is represented
by a letter */
if (commonCount == 0) {
relativePath = "../";
}
#endif
for (unsigned int i = commonCount ; i < pathParts.size() ; i++) {
relativePath += "../";
}
for (unsigned int i = commonCount ; i < absolutePathParts.size() ; i++) {
relativePath += absolutePathParts[i];
relativePath += i + 1 < absolutePathParts.size() ? "/" : "";
}
return relativePath;
}
/* Warning: the relative path must be with slashes */
string computeAbsolutePath(const string path, const string relativePath) {
string absolutePath;
if (path.empty()) {
char *path=NULL;
size_t size = 0;
#ifdef _WIN32
path = _getcwd(path, size);
#else
path = getcwd(path, size);
#endif
absolutePath = string(path) + SEPARATOR;
} else {
absolutePath = path.substr(path.length() - 1, 1) == SEPARATOR ? path : path + SEPARATOR;
}
#if _WIN32
char *cRelativePath = _strdup(relativePath.c_str());
#else
char *cRelativePath = strdup(relativePath.c_str());
#endif
char *token = strtok(cRelativePath, "/");
while (token != NULL) {
if (string(token) == "..") {
absolutePath = removeLastPathElement(absolutePath, true, false);
token = strtok(NULL, "/");
} else if (strcmp(token, ".") && strcmp(token, "")) {
absolutePath += string(token);
token = strtok(NULL, "/");
if (token != NULL)
absolutePath += SEPARATOR;
} else {
token = strtok(NULL, "/");
}
}
return absolutePath;
}
string removeLastPathElement(const string path, const bool removePreSeparator, const bool removePostSeparator) {
string newPath = path;
size_t offset = newPath.find_last_of(SEPARATOR);
if (removePreSeparator &&
#ifndef _WIN32
offset != newPath.find_first_of(SEPARATOR) &&
#endif
offset == newPath.length()-1) {
newPath = newPath.substr(0, offset);
offset = newPath.find_last_of(SEPARATOR);
}
newPath = removePostSeparator ? newPath.substr(0, offset) : newPath.substr(0, offset+1);
return newPath;
}
string appendToDirectory(const string &directoryPath, const string &filename) {
string newPath = directoryPath + SEPARATOR + filename;
return newPath;
}
string getLastPathElement(const string &path) {
return path.substr(path.find_last_of(SEPARATOR) + 1);
}
unsigned int getFileSize(const string &path) {
#ifdef _WIN32
struct _stat filestatus;
_stat(path.c_str(), &filestatus);
#else
struct stat filestatus;
stat(path.c_str(), &filestatus);
#endif
return filestatus.st_size / 1024;
}
string getFileSizeAsString(const string &path) {
ostringstream convert; convert << getFileSize(path);
return convert.str();
}
bool fileExists(const string &path) {
#ifdef _WIN32
return PathFileExists(path.c_str());
#else
bool flag = false;
fstream fin;
fin.open(path.c_str(), ios::in);
if (fin.is_open()) {
flag = true;
}
fin.close();
return flag;
#endif
}
bool makeDirectory(const string &path) {
#ifdef _WIN32
int status = _mkdir(path.c_str());
#else
int status = mkdir(path.c_str(), S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
#endif
return status == 0;
}
/* Try to create a hard link; if that does not work, make a copy */
bool copyFile(const string &sourcePath, const string &destPath) {
try {
#ifndef _WIN32
if (link(sourcePath.c_str(), destPath.c_str()) != 0) {
#endif
std::ifstream infile(sourcePath.c_str(), std::ios_base::binary);
std::ofstream outfile(destPath.c_str(), std::ios_base::binary);
outfile << infile.rdbuf();
#ifndef _WIN32
}
#endif
} catch (exception &e) {
cerr << e.what() << endl;
return false;
}
return true;
}
string getExecutablePath() {
char binRootPath[PATH_MAX];
#ifdef _WIN32
GetModuleFileName( NULL, binRootPath, PATH_MAX);
return std::string(binRootPath);
#elif __APPLE__
uint32_t max = (uint32_t)PATH_MAX;
_NSGetExecutablePath(binRootPath, &max);
return std::string(binRootPath);
#else
ssize_t size = readlink("/proc/self/exe", binRootPath, PATH_MAX);
if (size != -1) {
return std::string(binRootPath, size);
}
#endif
return "";
}
bool writeTextFile(const string &path, const string &content) {
std::ofstream file;
file.open(path.c_str());
file << content;
file.close();
return true;
}
string getCurrentDirectory() {
char* a_cwd = getcwd(NULL,0);
string s_cwd(a_cwd);
free(a_cwd);
return s_cwd;
}
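
For reference, the POSIX branches of `isRelativePath` and `getLastPathElement` can be reduced to the sketch below. The names `isRelativePosixPath` and `lastPathElement` are hypothetical, and the Windows branch (drive letter plus backslash) is not covered.

```cpp
#include <string>

// POSIX-only sketch: a path is relative when it is non-empty and does not
// start with '/'. As in the removed code, the empty path is not relative.
inline bool isRelativePosixPath(const std::string& path) {
  return !path.empty() && path[0] != '/';
}

// Everything after the last '/', or the whole string when there is none.
// find_last_of returns npos in that case, and npos + 1 wraps around to 0.
inline std::string lastPathElement(const std::string& path) {
  return path.substr(path.find_last_of('/') + 1);
}
```

A path that ends in a separator yields an empty last element, which matches the behaviour of the `substr`/`find_last_of` expression in the removed file.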


@@ -1,272 +0,0 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <common/stringTools.h>
/* tell ICU where to find its dat file (tables) */
void kiwix::loadICUExternalTables() {
#ifdef __APPLE__
std::string executablePath = getExecutablePath();
std::string executableDirectory = removeLastPathElement(executablePath);
std::string datPath = computeAbsolutePath(executableDirectory, "icudt49l.dat");
try {
u_setDataDirectory(datPath.c_str());
} catch (exception &e) {
std::cerr << e.what() << std::endl;
}
#endif
}
std::string kiwix::removeAccents(const std::string &text) {
loadICUExternalTables();
ucnv_setDefaultName("UTF-8");
UErrorCode status = U_ZERO_ERROR;
Transliterator *removeAccentsTrans = Transliterator::createInstance("Lower; NFD; [:M:] remove; NFC", UTRANS_FORWARD, status);
UnicodeString ustring = UnicodeString(text.c_str());
removeAccentsTrans->transliterate(ustring);
delete removeAccentsTrans;
std::string unaccentedText;
ustring.toUTF8String(unaccentedText);
return unaccentedText;
}
#ifndef __ANDROID__
/* Prepare integer for display */
std::string kiwix::beautifyInteger(const unsigned int number) {
std::stringstream numberStream;
numberStream << number;
std::string numberString = numberStream.str();
signed int offset = numberString.size() - 3;
while (offset > 0) {
numberString.insert(offset, ",");
offset -= 3;
}
return numberString;
}
std::string kiwix::beautifyFileSize(const unsigned int number) {
if (number > 1024*1024) {
return kiwix::beautifyInteger(number/(1024*1024)) + " GB";
} else {
return kiwix::beautifyInteger(number/1024 !=
0 ? number/1024 : 1) + " MB";
}
}
void kiwix::printStringInHexadecimal(UnicodeString s) {
std::cout << std::showbase << std::hex;
for (int i=0; i<s.length(); i++) {
char c = (char)((s.getTerminatedBuffer())[i]);
if (c & 0x80)
std::cout << (c & 0xffff) << " ";
else
std::cout << c << " ";
}
std::cout << std::endl;
}
void kiwix::printStringInHexadecimal(const char *s) {
std::cout << std::showbase << std::hex;
for (char const* pc = s; *pc; ++pc) {
if (*pc & 0x80)
std::cout << (*pc & 0xffff);
else
std::cout << *pc;
std::cout << ' ';
}
std::cout << std::endl;
}
void kiwix::stringReplacement(std::string& str, const std::string& oldStr, const std::string& newStr) {
size_t pos = 0;
while((pos = str.find(oldStr, pos)) != std::string::npos) {
str.replace(pos, oldStr.length(), newStr);
pos += newStr.length();
}
}
/* Encode string to avoid XSS attacks */
std::string kiwix::encodeDiples(const std::string& str) {
std::string result = str;
kiwix::stringReplacement(result, "<", "&lt;");
kiwix::stringReplacement(result, ">", "&gt;");
return result;
}
// Urlencode
//based on javascript encodeURIComponent()
std::string char2hex(char dec) {
char dig1 = (dec&0xF0)>>4;
char dig2 = (dec&0x0F);
if ( 0<= dig1 && dig1<= 9) dig1+=48; // '0' is 48 in ASCII
if (10<= dig1 && dig1<=15) dig1+=97-10; // 'a' is 97 in ASCII
if ( 0<= dig2 && dig2<= 9) dig2+=48;
if (10<= dig2 && dig2<=15) dig2+=97-10;
std::string r;
r.append( &dig1, 1);
r.append( &dig2, 1);
return r;
}
std::string kiwix::urlEncode(const std::string &c) {
std::string escaped="";
int max = c.length();
for(int i=0; i<max; i++)
{
if ( (48 <= c[i] && c[i] <= 57) ||//0-9
(65 <= c[i] && c[i] <= 90) ||//ABC...XYZ
(97 <= c[i] && c[i] <= 122) || //abc...xyz
(c[i]=='~' || c[i]=='!' || c[i]=='*' || c[i]=='(' || c[i]==')' || c[i]=='\'')
)
{
escaped.append( &c[i], 1);
}
else
{
escaped.append("%");
escaped.append( char2hex(c[i]) );//converts char 255 to string "ff"
}
}
return escaped;
}
#endif
static char charFromHex(std::string a) {
std::istringstream Blat(a);
int Z;
Blat >> std::hex >> Z;
return char (Z);
}
std::string kiwix::urlDecode(const std::string &originalUrl) {
std::string url = originalUrl;
std::string::size_type pos = 0;
while ((pos = url.find('%', pos)) != std::string::npos &&
pos + 2 < url.length()) {
url.replace(pos, 3, 1, charFromHex(url.substr(pos + 1, 2)));
++pos;
}
return url;
}
/* Split string in a token array */
std::vector<std::string> kiwix::split(const std::string & str,
const std::string & delims=" *-")
{
std::string::size_type lastPos = str.find_first_not_of(delims, 0);
std::string::size_type pos = str.find_first_of(delims, lastPos);
std::vector<std::string> tokens;
while (std::string::npos != pos || std::string::npos != lastPos)
{
tokens.push_back(str.substr(lastPos, pos - lastPos));
lastPos = str.find_first_not_of(delims, pos);
pos = str.find_first_of(delims, lastPos);
}
return tokens;
}
std::vector<std::string> kiwix::split(const char* lhs, const char* rhs){
const std::string m1 (lhs), m2 (rhs);
return split(m1, m2);
}
std::vector<std::string> kiwix::split(const char* lhs, const std::string& rhs){
return split(lhs, rhs.c_str());
}
std::vector<std::string> kiwix::split(const std::string& lhs, const char* rhs){
return split(lhs.c_str(), rhs);
}
std::string kiwix::ucFirst (const std::string &word) {
if (word.empty())
return "";
std::string result;
UnicodeString unicodeWord(word.c_str());
UnicodeString unicodeFirstLetter = UnicodeString(unicodeWord, 0, 1).toUpper();
unicodeWord.replace(0, 1, unicodeFirstLetter);
unicodeWord.toUTF8String(result);
return result;
}
std::string kiwix::ucAll (const std::string &word) {
if (word.empty())
return "";
std::string result;
UnicodeString unicodeWord(word.c_str());
unicodeWord.toUpper().toUTF8String(result);
return result;
}
std::string kiwix::lcFirst (const std::string &word) {
if (word.empty())
return "";
std::string result;
UnicodeString unicodeWord(word.c_str());
UnicodeString unicodeFirstLetter = UnicodeString(unicodeWord, 0, 1).toLower();
unicodeWord.replace(0, 1, unicodeFirstLetter);
unicodeWord.toUTF8String(result);
return result;
}
std::string kiwix::lcAll (const std::string &word) {
if (word.empty())
return "";
std::string result;
UnicodeString unicodeWord(word.c_str());
unicodeWord.toLower().toUTF8String(result);
return result;
}
std::string kiwix::toTitle (const std::string &word) {
if (word.empty())
return "";
std::string result;
UnicodeString unicodeWord(word.c_str());
unicodeWord = unicodeWord.toTitle(0);
unicodeWord.toUTF8String(result);
return result;
}
std::string kiwix::normalize (const std::string &word) {
return kiwix::lcAll(word);
}
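
The percent-decoding done by `urlDecode` and `charFromHex` can also be written as a single pass over the input. This is a sketch under one stated difference: escapes that are not followed by two hexadecimal digits are copied through unchanged, whereas the removed code feeds them to the stream parser. `percentDecode` is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Decode %XX escapes; sequences that are not two hex digits are kept as-is.
inline std::string percentDecode(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (std::string::size_type i = 0; i < in.size(); ++i) {
    const bool isEscape =
        in[i] == '%' && i + 2 < in.size() &&
        std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
        std::isxdigit(static_cast<unsigned char>(in[i + 2]));
    if (isEscape) {
      // Two validated hex digits, so std::stoi cannot throw here.
      out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
      i += 2;  // skip the two digits just consumed
    } else {
      out += in[i];
    }
  }
  return out;
}
```

As in the removed implementation, a decoded `%` is not scanned again, so `%2541` decodes to `%41` rather than `A`.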


@@ -1,4 +1,3 @@
#mesondefine VERSION
#mesondefine ENABLE_CTPP2


@@ -1,210 +0,0 @@
/*
* Copyright 2013 Renaud Gaudin <reg@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <ctpp2/CTPP2VMStringLoader.hpp>
namespace CTPP // C++ Template Engine
{
//
// Convert byte order
//
static void ConvertExecutable(VMExecutable * oCore)
{
// Code entry point
oCore -> entry_point = Swap32(oCore -> entry_point);
// Offset of code segment
oCore -> code_offset = Swap32(oCore -> code_offset);
// Code segment size
oCore -> code_size = Swap32(oCore -> code_size);
// Offset of syscalls segment
oCore -> syscalls_offset = Swap32(oCore -> syscalls_offset);
// Syscalls segment size
oCore -> syscalls_data_size = Swap32(oCore -> syscalls_data_size);
// Offset of syscalls index segment
oCore -> syscalls_index_offset = Swap32(oCore -> syscalls_index_offset);
// Syscalls index segment size
oCore -> syscalls_index_size = Swap32(oCore -> syscalls_index_size);
// Offset of static data segment
oCore -> static_data_offset = Swap32(oCore -> static_data_offset);
// Static data segment size
oCore -> static_data_data_size = Swap32(oCore -> static_data_data_size);
// Offset of static text segment
oCore -> static_text_offset = Swap32(oCore -> static_text_offset);
// Static text segment size
oCore -> static_text_data_size = Swap32(oCore -> static_text_data_size);
// Offset of static text index segment
oCore -> static_text_index_offset = Swap32(oCore -> static_text_index_offset);
// Static text index segment size
oCore -> static_text_index_size = Swap32(oCore -> static_text_index_size);
// Version 2.2+
// Offset of static data bit index
oCore -> static_data_bit_index_offset = Swap32(oCore -> static_data_bit_index_offset);
// Size of static data bit index
oCore -> static_data_bit_index_size = Swap32(oCore -> static_data_bit_index_size);
// Platform
oCore -> platform = Swap64(oCore -> platform);
// Ugly-jolly hack!
// ... dereferencing type-punned pointer will break strict-aliasing rules ...
UINT_64 iTMP;
memcpy(&iTMP, &(oCore -> ieee754double), sizeof(UINT_64));
iTMP = Swap64(iTMP);
memcpy(&(oCore -> ieee754double), &iTMP, sizeof(UINT_64));
// Cyclic Redundancy Check
oCore -> crc = 0;
// Convert data structures
// Convert code segment
VMInstruction * pInstructions = const_cast<VMInstruction *>(VMExecutable::GetCodeSeg(oCore));
UINT_32 iI = 0;
UINT_32 iSteps = oCore -> code_size / sizeof(VMInstruction);
for(iI = 0; iI < iSteps; ++iI)
{
pInstructions -> instruction = Swap32(pInstructions -> instruction);
pInstructions -> argument = Swap32(pInstructions -> argument);
pInstructions -> reserved = Swap64(pInstructions -> reserved);
++pInstructions;
}
// Convert syscalls index
TextDataIndex * pTextIndex = const_cast<TextDataIndex *>(VMExecutable::GetSyscallsIndexSeg(oCore));
iSteps = oCore -> syscalls_index_size / sizeof(TextDataIndex);
for(iI = 0; iI < iSteps; ++iI)
{
pTextIndex -> offset = Swap32(pTextIndex -> offset);
pTextIndex -> length = Swap32(pTextIndex -> length);
++pTextIndex;
}
// Convert static text index
pTextIndex = const_cast<TextDataIndex *>(VMExecutable::GetStaticTextIndexSeg(oCore));
iSteps = oCore -> static_text_index_size / sizeof(TextDataIndex);
for(iI = 0; iI < iSteps; ++iI)
{
pTextIndex -> offset = Swap32(pTextIndex -> offset);
pTextIndex -> length = Swap32(pTextIndex -> length);
++pTextIndex;
}
// Convert static data
StaticDataVar * pStaticDataVar = const_cast<StaticDataVar *>(VMExecutable::GetStaticDataSeg(oCore));
iSteps = oCore -> static_data_data_size / sizeof(StaticDataVar);
for(iI = 0; iI < iSteps; ++iI)
{
(*pStaticDataVar).i_data = Swap64((*pStaticDataVar).i_data);
++pStaticDataVar;
}
}
//
// Constructor
//
VMStringLoader::VMStringLoader(CCHAR_P rawContent, size_t rawContentSize)
{
oCore = (VMExecutable *)malloc(rawContentSize + 1);
memcpy(oCore, rawContent, rawContentSize);
if (oCore -> magic[0] == 'C' &&
oCore -> magic[1] == 'T' &&
oCore -> magic[2] == 'P' &&
oCore -> magic[3] == 'P')
{
// Check version
if (oCore -> version[0] >= 1)
{
// Platform-dependent data (byte order)
if (oCore -> platform == 0x4142434445464748ull)
{
#ifdef _DEBUG
fprintf(stderr, "Big/Little Endian conversion: Nothing to do\n");
#endif
// Nothing to do, only check crc
UINT_32 iCRC = oCore -> crc;
oCore -> crc = 0;
// Calculate CRC of file
// KELSON: next line used to refer to oStat.st_size
// changed it to rawContentSize
if (iCRC != crc32((UCCHAR_P)oCore, rawContentSize))
{
free(oCore);
throw CTPPLogicError("CRC checksum invalid");
}
}
// Platform-dependent data (byte order)
else if (oCore -> platform == 0x4847464544434241ull)
{
// Need to reconvert data
#ifdef _DEBUG
fprintf(stderr, "Big/Little Endian conversion: Need to reconvert core\n");
#endif
ConvertExecutable(oCore);
}
else
{
free(oCore);
throw CTPPLogicError("Conversion of middle-end architecture does not supported.");
}
// Check IEEE 754 format
if (oCore -> ieee754double != 15839800103804824402926068484019465486336.0)
{
free(oCore);
throw CTPPLogicError("IEEE 754 format is broken, cannot convert file");
}
}
pVMMemoryCore = new VMMemoryCore(oCore);
}
else
{
free(oCore);
throw CTPPLogicError("Not an CTPP bytecode file.");
}
}
//
// Get ready-to-run program
//
const VMMemoryCore * VMStringLoader::GetCore() const { return pVMMemoryCore; }
//
// A destructor
//
VMStringLoader::~VMStringLoader() throw()
{
delete pVMMemoryCore;
free(oCore);
}
} // namespace CTPP
// End.

154
src/downloader.cpp Normal file
View File

@@ -0,0 +1,154 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "downloader.h"
#include "tools/pathTools.h"
#include <algorithm>
#include <thread>
#include <chrono>
#include <iostream>
#include "aria2.h"
#include "xmlrpc.h"
#include "tools/otherTools.h"
#include <pugixml.hpp>
namespace kiwix
{
void Download::updateStatus(bool follow)
{
static std::vector<std::string> statusKey = {"status", "files", "totalLength",
"completedLength", "followedBy",
"downloadSpeed", "verifiedLength"};
std::string strStatus;
if(follow && !m_followedBy.empty()) {
strStatus = mp_aria->tellStatus(m_followedBy, statusKey);
} else {
strStatus = mp_aria->tellStatus(m_did, statusKey);
}
// std::cout << strStatus << std::endl;
MethodResponse response(strStatus);
if (response.isFault()) {
m_status = Download::K_UNKNOWN;
return;
}
auto structNode = response.getParams().getParam(0).getValue().getStruct();
auto _status = structNode.getMember("status").getValue().getAsS();
auto status = _status == "active" ? Download::K_ACTIVE
: _status == "waiting" ? Download::K_WAITING
: _status == "paused" ? Download::K_PAUSED
: _status == "error" ? Download::K_ERROR
: _status == "complete" ? Download::K_COMPLETE
: _status == "removed" ? Download::K_REMOVED
: Download::K_UNKNOWN;
if (status == K_COMPLETE) {
try {
auto followedByMember = structNode.getMember("followedBy");
m_followedBy = followedByMember.getValue().getArray().getValue(0).getAsS();
if (follow) {
status = K_ACTIVE;
updateStatus(true);
return;
}
} catch (InvalidRPCNode& e) { }
}
m_status = status;
m_totalLength = extractFromString<uint64_t>(structNode.getMember("totalLength").getValue().getAsS());
m_completedLength = extractFromString<uint64_t>(structNode.getMember("completedLength").getValue().getAsS());
m_downloadSpeed = extractFromString<uint64_t>(structNode.getMember("downloadSpeed").getValue().getAsS());
try {
auto verifiedLengthValue = structNode.getMember("verifiedLength").getValue();
m_verifiedLength = extractFromString<uint64_t>(verifiedLengthValue.getAsS());
} catch (InvalidRPCNode& e) { m_verifiedLength = 0; }
auto filesMember = structNode.getMember("files");
auto fileStruct = filesMember.getValue().getArray().getValue(0).getStruct();
m_path = fileStruct.getMember("path").getValue().getAsS();
auto urisArray = fileStruct.getMember("uris").getValue().getArray();
int index = 0;
m_uris.clear();
while(true) {
try {
auto uriNode = urisArray.getValue(index++).getStruct().getMember("uri");
m_uris.push_back(uriNode.getValue().getAsS());
} catch(InvalidRPCNode& e) { break; }
}
}
/* Constructor */
Downloader::Downloader() :
mp_aria(new Aria2())
{
for (auto gid : mp_aria->tellActive()) {
m_knownDownloads[gid] = std::unique_ptr<Download>(new Download(mp_aria, gid));
m_knownDownloads[gid]->updateStatus();
}
}
/* Destructor */
Downloader::~Downloader()
{
}
void Downloader::close()
{
mp_aria->close();
}
std::vector<std::string> Downloader::getDownloadIds() {
std::vector<std::string> ret;
for(auto& p:m_knownDownloads) {
ret.push_back(p.first);
}
return ret;
}
Download* Downloader::startDownload(const std::string& uri)
{
for (auto& p: m_knownDownloads) {
auto& d = p.second;
auto& uris = d->getUris();
if (std::find(uris.begin(), uris.end(), uri) != uris.end())
return d.get();
}
std::vector<std::string> uris = {uri};
auto gid = mp_aria->addUri(uris);
m_knownDownloads[gid] = std::unique_ptr<Download>(new Download(mp_aria, gid));
return m_knownDownloads[gid].get();
}
Download* Downloader::getDownload(const std::string& did)
{
try {
return m_knownDownloads.at(did).get();
} catch(exception& e) {
for (auto gid : mp_aria->tellActive()) {
if (gid == did) {
m_knownDownloads[gid] = std::unique_ptr<Download>(new Download(mp_aria, gid));
return m_knownDownloads[gid].get();
}
}
// Rethrow the original exception; `throw e;` would copy it and slice it to the base type.
throw;
}
}
}
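
The chained conditional in `Download::updateStatus` maps the `status` string returned by aria2 to a `Download::K_*` value. A minimal sketch of the same mapping as a lookup table follows. It is not the code from this change: the `DownloadStatus` enum and the `parseStatus` function are hypothetical names introduced here for illustration only.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the Download::K_* values used in the diff.
enum class DownloadStatus {
  Active, Waiting, Paused, Error, Complete, Removed, Unknown
};

// Map an aria2 "status" string to the enum.
// Any string that is not in the table maps to Unknown.
DownloadStatus parseStatus(const std::string& status)
{
  static const std::unordered_map<std::string, DownloadStatus> table = {
    {"active",   DownloadStatus::Active},
    {"waiting",  DownloadStatus::Waiting},
    {"paused",   DownloadStatus::Paused},
    {"error",    DownloadStatus::Error},
    {"complete", DownloadStatus::Complete},
    {"removed",  DownloadStatus::Removed},
  };
  const auto it = table.find(status);
  return it == table.end() ? DownloadStatus::Unknown : it->second;
}
```

With this sketch, `parseStatus("paused")` yields `DownloadStatus::Paused`, and an unrecognised string falls back to `DownloadStatus::Unknown`, which matches the final branch of the original conditional.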

138
src/entry.cpp Normal file
View File

@@ -0,0 +1,138 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "reader.h"
#include <time.h>
#include <zim/search.h>
namespace kiwix
{
Entry::Entry(zim::Article article)
: article(article)
{
}
#define RETURN_IF_INVALID(WHAT) if(!good()) { return (WHAT); }
std::string Entry::getPath() const
{
RETURN_IF_INVALID("");
return article.getLongUrl();
}
std::string Entry::getTitle() const
{
RETURN_IF_INVALID("");
return article.getTitle();
}
std::string Entry::getContent() const
{
RETURN_IF_INVALID("");
return article.getData();
}
zim::Blob Entry::getBlob(offset_type offset) const
{
RETURN_IF_INVALID(zim::Blob());
return article.getData(offset);
}
zim::Blob Entry::getBlob(offset_type offset, size_type size) const
{
RETURN_IF_INVALID(zim::Blob());
return article.getData(offset, size);
}
std::pair<std::string, offset_type> Entry::getDirectAccessInfo() const
{
RETURN_IF_INVALID(std::make_pair("", 0));
return article.getDirectAccessInformation();
}
size_type Entry::getSize() const
{
RETURN_IF_INVALID(0);
return article.getArticleSize();
}
std::string Entry::getMimetype() const
{
RETURN_IF_INVALID("");
try {
return article.getMimeType();
} catch (exception& e) {
return "application/octet-stream";
}
}
bool Entry::isRedirect() const
{
RETURN_IF_INVALID(false);
return article.isRedirect();
}
bool Entry::isLinkTarget() const
{
RETURN_IF_INVALID(false);
return article.isLinktarget();
}
bool Entry::isDeleted() const
{
RETURN_IF_INVALID(false);
return article.isDeleted();
}
Entry Entry::getRedirectEntry() const
{
RETURN_IF_INVALID(Entry());
if ( !article.isRedirect() ) {
throw NoEntry();
}
auto targeted_article = article.getRedirectArticle();
if ( !targeted_article.good()) {
throw NoEntry();
}
return targeted_article;
}
Entry Entry::getFinalEntry() const
{
RETURN_IF_INVALID(Entry());
if (final_article.good()) {
return final_article;
}
int loopCounter = 42;
final_article = article;
while (final_article.isRedirect() && loopCounter--) {
final_article = final_article.getRedirectArticle();
if ( !final_article.good()) {
throw NoEntry();
}
}
return final_article;
}
}
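
`Entry::getFinalEntry` follows redirects with a hop budget of 42 so that a redirect cycle cannot loop forever. The sketch below shows the same idea on a plain redirect table. `RedirectTable` and `resolveFinal` are hypothetical names, not part of kiwix-lib or libzim. One deliberate difference, stated as an assumption: the sketch throws when the budget runs out, whereas the code in the diff leaves the loop and returns the last entry it reached.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical redirect table: each key redirects to its value.
// A path that is not a key is a final (non-redirect) entry.
using RedirectTable = std::map<std::string, std::string>;

// Follow redirects starting at `path`, giving up after `maxHops` hops.
std::string resolveFinal(const RedirectTable& redirects,
                         std::string path,
                         int maxHops = 42)
{
  auto it = redirects.find(path);
  while (it != redirects.end()) {
    if (maxHops-- == 0) {
      // Budget exhausted: most likely a redirect cycle.
      throw std::runtime_error("too many redirects");
    }
    path = it->second;
    it = redirects.find(path);
  }
  return path;
}
```

For a table where `a` redirects to `b` and `b` to `c`, `resolveFinal(table, "a")` returns `"c"`. A table where two paths redirect to each other makes the call throw once the budget is used up.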

View File

@@ -1,528 +0,0 @@
/*
* Copyright 2011-2014 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "indexer.h"
#include "xapian/myhtmlparse.h"
#include "kiwixlib-resources.h"
namespace kiwix {
/* Count word */
unsigned int Indexer::countWords(const string &text) {
unsigned int numWords = 1;
unsigned int length = text.size();
for(unsigned int i=0; i<length;) {
while(i<length && text[i] != ' ') {
i++;
}
numWords++;
i++;
}
return numWords;
}
/* Constructor */
Indexer::Indexer() :
keywordsBoostFactor(3),
verboseFlag(false) {
/* Initialize mutex */
pthread_mutex_init(&threadIdsMutex, NULL);
pthread_mutex_init(&toParseQueueMutex, NULL);
pthread_mutex_init(&toIndexQueueMutex, NULL);
pthread_mutex_init(&articleExtractorRunningMutex, NULL);
pthread_mutex_init(&articleParserRunningMutex, NULL);
pthread_mutex_init(&articleIndexerRunningMutex, NULL);
pthread_mutex_init(&articleCountMutex, NULL);
pthread_mutex_init(&zimPathMutex, NULL);
pthread_mutex_init(&zimIdMutex, NULL);
pthread_mutex_init(&indexPathMutex, NULL);
pthread_mutex_init(&progressionMutex, NULL);
pthread_mutex_init(&verboseMutex, NULL);
}
/* Destructor */
Indexer::~Indexer() {
}
/* Read the stopwords */
void Indexer::readStopWords(const string languageCode) {
std::string stopWord;
std::istringstream file(getResource("stopwords/" + languageCode));
this->stopWords.clear();
while (getline(file, stopWord, '\n')) {
this->stopWords.push_back(stopWord);
}
if (this->verboseFlag) {
std::cout << "Read stop words, lang code:" << languageCode << ", count:" << this->stopWords.size() << std::endl;
}
}
#pragma mark - Extractor
/* Article extractor methods */
void *Indexer::extractArticles(void *ptr) {
pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
kiwix::Indexer *self = (kiwix::Indexer *)ptr;
/* Get the number of articles to index and the ZIM id */
kiwix::Reader reader(self->getZimPath());
unsigned int articleCount = reader.getArticleCount();
self->setArticleCount(articleCount);
string zimId = reader.getId();
self->setZimId(zimId);
/* Progression */
unsigned int readArticleCount = 0;
unsigned int currentProgression = 0;
self->setProgression(currentProgression);
unsigned int newProgress;
/* StopWords */
self->readStopWords(reader.getLanguage());
/* Go through all articles */
zim::File *zimHandler = reader.getZimFileHandler();
unsigned int currentOffset = zimHandler->getNamespaceBeginOffset('A');
unsigned int lastOffset = zimHandler->getNamespaceEndOffset('A');
zim::Article currentArticle;
while (currentOffset < lastOffset) {
currentArticle = zimHandler->getArticle(currentOffset);
if (!currentArticle.isRedirect()) {
/* Add articles to the queue */
indexerToken token;
token.title = currentArticle.getTitle();
token.url = currentArticle.getLongUrl();
token.content = string(currentArticle.getData().data(), currentArticle.getData().size());
self->pushToParseQueue(token);
readArticleCount += 1;
/* Update progress */
if (self->progressCallback) {
self->progressCallback(readArticleCount, articleCount);
}
newProgress = (unsigned int)((float)readArticleCount / (float)articleCount * 100);
if (newProgress != currentProgression) {
self->setProgression(newProgress);
}
}
currentOffset += 1;
/* Test if the thread should be cancelled */
pthread_testcancel();
}
self->articleExtractorRunning(false);
pthread_exit(NULL);
return NULL;
}
void Indexer::articleExtractorRunning(bool value) {
pthread_mutex_lock(&articleExtractorRunningMutex);
this->articleExtractorRunningFlag = value;
pthread_mutex_unlock(&articleExtractorRunningMutex);
}
bool Indexer::isArticleExtractorRunning() {
pthread_mutex_lock(&articleExtractorRunningMutex);
bool retVal = this->articleExtractorRunningFlag;
pthread_mutex_unlock(&articleExtractorRunningMutex);
return retVal;
}
#pragma mark - Parser
/* Article parser methods */
void *Indexer::parseArticles(void *ptr) {
pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
kiwix::Indexer *self = (kiwix::Indexer *)ptr;
size_t found;
indexerToken token;
while (self->popFromToParseQueue(token)) {
MyHtmlParser htmlParser;
/* The parser throws a lot of exceptions, which are ignored here */
try {
htmlParser.parse_html(token.content, "UTF-8", true);
} catch (...) {
}
/* If content does not have the noindex meta tag */
/* Seems that the parser generates an exception in such case */
found = htmlParser.dump.find("NOINDEX");
if (found == string::npos) {
/* Get the accented title */
token.accentedTitle = (htmlParser.title.empty() ? token.title : htmlParser.title);
/* count words */
stringstream countWordStringStream;
countWordStringStream << self->countWords(htmlParser.dump);
token.wordCount = countWordStringStream.str();
/* snippet */
std::string snippet = std::string(htmlParser.dump, 0, 300);
std::string::size_type last = snippet.find_last_of('.');
if (last == snippet.npos)
last = snippet.find_last_of(' ');
if (last != snippet.npos)
snippet = snippet.substr(0, last);
token.snippet = snippet;
/* size */
stringstream sizeStringStream;
sizeStringStream << token.content.size() / 1024;
token.size = sizeStringStream.str();
/* Remove accent */
token.title = kiwix::removeAccents(token.accentedTitle);
token.keywords = kiwix::removeAccents(htmlParser.keywords);
token.content = kiwix::removeAccents(htmlParser.dump);
self->pushToIndexQueue(token);
}
/* Test if the thread should be cancelled */
pthread_testcancel();
}
self->articleParserRunning(false);
pthread_exit(NULL);
return NULL;
}
void Indexer::articleParserRunning(bool value) {
pthread_mutex_lock(&articleParserRunningMutex);
this->articleParserRunningFlag = value;
pthread_mutex_unlock(&articleParserRunningMutex);
}
bool Indexer::isArticleParserRunning() {
pthread_mutex_lock(&articleParserRunningMutex);
bool retVal = this->articleParserRunningFlag;
pthread_mutex_unlock(&articleParserRunningMutex);
return retVal;
}
#pragma mark - Indexer
/* Article indexer methods */
void *Indexer::indexArticles(void *ptr) {
pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
kiwix::Indexer *self = (kiwix::Indexer *)ptr;
unsigned int indexedArticleCount = 0;
indexerToken token;
self->indexingPrelude(self->getIndexPath());
while (self->popFromToIndexQueue(token)) {
self->index(token.url,
token.accentedTitle,
token.title,
token.keywords,
token.content,
token.snippet,
token.size,
token.wordCount
);
indexedArticleCount += 1;
/* Flush to disk every 5000 articles */
if (indexedArticleCount % 5000 == 0) {
self->flush();
}
/* Test if the thread should be cancelled */
pthread_testcancel();
}
self->indexingPostlude(self->getIndexPath());
/* Write content id file */
string path = appendToDirectory(self->getIndexPath(), "content.id");
writeTextFile(path, self->getZimId());
self->setProgression(100);
kiwix::sleep(100);
self->articleIndexerRunning(false);
pthread_exit(NULL);
return NULL;
}
void Indexer::articleIndexerRunning(bool value) {
pthread_mutex_lock(&articleIndexerRunningMutex);
this->articleIndexerRunningFlag = value;
pthread_mutex_unlock(&articleIndexerRunningMutex);
}
bool Indexer::isArticleIndexerRunning() {
pthread_mutex_lock(&articleIndexerRunningMutex);
bool retVal = this->articleIndexerRunningFlag;
pthread_mutex_unlock(&articleIndexerRunningMutex);
return retVal;
}
#pragma mark - Parse Queue
/* ToParseQueue methods */
bool Indexer::isToParseQueueEmpty() {
pthread_mutex_lock(&toParseQueueMutex);
bool retVal = this->toParseQueue.empty();
pthread_mutex_unlock(&toParseQueueMutex);
return retVal;
}
void Indexer::pushToParseQueue(indexerToken &token) {
pthread_mutex_lock(&toParseQueueMutex);
this->toParseQueue.push(token);
pthread_mutex_unlock(&toParseQueueMutex);
kiwix::sleep(int(this->toParseQueue.size() / 200) / 10 * 1000);
}
bool Indexer::popFromToParseQueue(indexerToken &token) {
while (this->isToParseQueueEmpty() && this->isArticleExtractorRunning()) {
kiwix::sleep(500);
if (this->getVerboseFlag()) {
std::cout << "Waiting... ToParseQueue is empty for now..." << std::endl;
}
pthread_testcancel();
}
if (!this->isToParseQueueEmpty()) {
pthread_mutex_lock(&toParseQueueMutex);
token = this->toParseQueue.front();
this->toParseQueue.pop();
pthread_mutex_unlock(&toParseQueueMutex);
} else {
return false;
}
return true;
}
#pragma mark - Index Queue
/* ToIndexQueue methods */
bool Indexer::isToIndexQueueEmpty() {
pthread_mutex_lock(&toIndexQueueMutex);
bool retVal = this->toIndexQueue.empty();
pthread_mutex_unlock(&toIndexQueueMutex);
return retVal;
}
void Indexer::pushToIndexQueue(indexerToken &token) {
pthread_mutex_lock(&toIndexQueueMutex);
this->toIndexQueue.push(token);
pthread_mutex_unlock(&toIndexQueueMutex);
kiwix::sleep(int(this->toIndexQueue.size() / 200) / 10 * 1000);
}
bool Indexer::popFromToIndexQueue(indexerToken &token) {
while (this->isToIndexQueueEmpty() && this->isArticleParserRunning()) {
kiwix::sleep(500);
if (this->getVerboseFlag()) {
std::cout << "Waiting... ToIndexQueue is empty for now..." << std::endl;
}
pthread_testcancel();
}
if (!this->isToIndexQueueEmpty()) {
pthread_mutex_lock(&toIndexQueueMutex);
token = this->toIndexQueue.front();
this->toIndexQueue.pop();
pthread_mutex_unlock(&toIndexQueueMutex);
} else {
return false;
}
return true;
}
#pragma mark - Properties Getter & Setter
/* ZIM & Index methods */
void Indexer::setZimPath(const string path) {
pthread_mutex_lock(&zimPathMutex);
this->zimPath = path;
pthread_mutex_unlock(&zimPathMutex);
}
string Indexer::getZimPath() {
pthread_mutex_lock(&zimPathMutex);
string retVal = this->zimPath;
pthread_mutex_unlock(&zimPathMutex);
return retVal;
}
void Indexer::setIndexPath(const string path) {
pthread_mutex_lock(&indexPathMutex);
this->indexPath = path;
pthread_mutex_unlock(&indexPathMutex);
}
string Indexer::getIndexPath() {
pthread_mutex_lock(&indexPathMutex);
string retVal = this->indexPath;
pthread_mutex_unlock(&indexPathMutex);
return retVal;
}
void Indexer::setArticleCount(const unsigned int articleCount) {
pthread_mutex_lock(&articleCountMutex);
this->articleCount = articleCount;
pthread_mutex_unlock(&articleCountMutex);
}
unsigned int Indexer::getArticleCount() {
pthread_mutex_lock(&articleCountMutex);
unsigned int retVal = this->articleCount;
pthread_mutex_unlock(&articleCountMutex);
return retVal;
}
void Indexer::setProgression(const unsigned int progression) {
pthread_mutex_lock(&progressionMutex);
this->progression = progression;
pthread_mutex_unlock(&progressionMutex);
}
unsigned int Indexer::getProgression() {
pthread_mutex_lock(&progressionMutex);
unsigned int retVal = this->progression;
pthread_mutex_unlock(&progressionMutex);
return retVal;
}
void Indexer::setZimId(const string id) {
pthread_mutex_lock(&zimIdMutex);
this->zimId = id;
pthread_mutex_unlock(&zimIdMutex);
}
string Indexer::getZimId() {
pthread_mutex_lock(&zimIdMutex);
string retVal = this->zimId;
pthread_mutex_unlock(&zimIdMutex);
return retVal;
}
#pragma mark - Status Management
/* Manage */
bool Indexer::start(const string zimPath, const string indexPath, ProgressCallback callback) {
if (this->getVerboseFlag()) {
std::cout << "Indexing of '" << zimPath << "' starting..." <<std::endl;
}
if (callback) {
this->progressCallback = callback;
}
this->setArticleCount(0);
this->setProgression(0);
this->setZimPath(zimPath);
this->setIndexPath(indexPath);
pthread_mutex_lock(&threadIdsMutex);
this->articleExtractorRunning(true);
pthread_create(&(this->articleExtractor), NULL, Indexer::extractArticles, (void*)this);
pthread_detach(this->articleExtractor);
while(this->isArticleExtractorRunning() && this->getArticleCount() == 0) {
kiwix::sleep(100);
}
this->articleParserRunning(true);
pthread_create(&(this->articleParser), NULL, Indexer::parseArticles, (void*)this);
pthread_detach(this->articleParser);
this->articleIndexerRunning(true);
pthread_create(&(this->articleIndexer), NULL, Indexer::indexArticles, (void*)this);
pthread_detach(this->articleIndexer);
pthread_mutex_unlock(&threadIdsMutex);
return true;
}
bool Indexer::isRunning() {
if (this->getVerboseFlag()) {
std::cout << "isArticleExtractor running: " << (this->isArticleExtractorRunning() ? "yes" : "no") << std::endl;
std::cout << "isArticleParser running: " << (this->isArticleParserRunning() ? "yes" : "no") << std::endl;
std::cout << "isArticleIndexer running: " << (this->isArticleIndexerRunning() ? "yes" : "no") << std::endl;
}
return this->isArticleExtractorRunning() || this->isArticleIndexerRunning() || this->isArticleParserRunning();
}
bool Indexer::stop() {
if (this->isRunning()) {
bool isArticleExtractorRunning = this->isArticleExtractorRunning();
bool isArticleIndexerRunning = this->isArticleIndexerRunning();
bool isArticleParserRunning = this->isArticleParserRunning();
pthread_mutex_lock(&threadIdsMutex);
if (isArticleIndexerRunning) {
pthread_cancel(this->articleIndexer);
this->articleIndexerRunning(false);
}
if (isArticleParserRunning) {
pthread_cancel(this->articleParser);
this->articleParserRunning(false);
}
if (isArticleExtractorRunning) {
pthread_cancel(this->articleExtractor);
this->articleExtractorRunning(false);
}
pthread_mutex_unlock(&threadIdsMutex);
}
return true;
}
#pragma mark - verbose
/* Manage the verboseFlag */
void Indexer::setVerboseFlag(const bool value) {
pthread_mutex_lock(&verboseMutex);
this->verboseFlag = value;
pthread_mutex_unlock(&verboseMutex);
}
bool Indexer::getVerboseFlag() {
bool value;
pthread_mutex_lock(&verboseMutex);
value = this->verboseFlag;
pthread_mutex_unlock(&verboseMutex);
return value;
}
}

View File

@@ -18,126 +18,310 @@
*/
#include "library.h"
#include "book.h"
#include "libxml_dumper.h"
namespace kiwix {
#include "tools/base64.h"
#include "tools/regexTools.h"
#include "tools/pathTools.h"
/* Constructor */
Book::Book():
readOnly(false) {
}
/* Destructor */
Book::~Book() {
#include <pugixml.hpp>
#include <algorithm>
#include <set>
namespace kiwix
{
/* Constructor */
Library::Library()
{
}
/* Destructor */
Library::~Library()
{
}
bool Library::addBook(const Book& book)
{
/* Try to find it */
try {
auto& oldbook = m_books.at(book.getId());
oldbook.update(book);
return false;
} catch (std::out_of_range&) {
m_books[book.getId()] = book;
return true;
}
}
/* Sort functions */
bool Book::sortByLastOpen(const kiwix::Book &a, const kiwix::Book &b) {
return atoi(a.last.c_str()) > atoi(b.last.c_str());
}
void Library::addBookmark(const Bookmark& bookmark)
{
m_bookmarks.push_back(bookmark);
}
bool Book::sortByTitle(const kiwix::Book &a, const kiwix::Book &b) {
return strcmp(a.title.c_str(), b.title.c_str()) < 0;
}
bool Book::sortByDate(const kiwix::Book &a, const kiwix::Book &b) {
return strcmp(a.date.c_str(), b.date.c_str()) > 0;
}
bool Book::sortBySize(const kiwix::Book &a, const kiwix::Book &b) {
return atoi(a.size.c_str()) < atoi(b.size.c_str());
}
bool Book::sortByPublisher(const kiwix::Book &a, const kiwix::Book &b) {
return strcmp(a.publisher.c_str(), b.publisher.c_str()) < 0;
}
bool Book::sortByCreator(const kiwix::Book &a, const kiwix::Book &b) {
return strcmp(a.creator.c_str(), b.creator.c_str()) < 0;
}
bool Book::sortByLanguage(const kiwix::Book &a, const kiwix::Book &b) {
return strcmp(a.language.c_str(), b.language.c_str()) < 0;
}
std::string Book::getHumanReadableIdFromPath() {
std::string id = pathAbsolute;
if (!id.empty()) {
kiwix::removeAccents(id);
#ifdef _WIN32
id = replaceRegex(id, "", "^.*\\\\");
#else
id = replaceRegex(id, "", "^.*/");
#endif
id = replaceRegex(id, "", "\\.zim[a-z]*$");
id = replaceRegex(id, "_", " ");
id = replaceRegex(id, "plus", "\\+");
bool Library::removeBookmark(const std::string& zimId, const std::string& url)
{
for(auto it=m_bookmarks.begin(); it!=m_bookmarks.end(); it++) {
if (it->getBookId() == zimId && it->getUrl() == url) {
m_bookmarks.erase(it);
return true;
}
return id;
}
return false;
}
/* Constructor */
Library::Library():
version(KIWIX_LIBRARY_VERSION) {
}
/* Destructor */
Library::~Library() {
bool Library::removeBookById(const std::string& id)
{
return m_books.erase(id) == 1;
}
Book& Library::getBookById(const std::string& id)
{
return m_books.at(id);
}
unsigned int Library::getBookCount(const bool localBooks,
const bool remoteBooks)
{
unsigned int result = 0;
for (auto& pair: m_books) {
auto& book = pair.second;
if ((!book.getPath().empty() && localBooks)
|| (book.getPath().empty() && remoteBooks)) {
result++;
}
}
return result;
}
bool Library::addBook(const Book &book) {
bool Library::writeToFile(const std::string& path) {
auto baseDir = removeLastPathElement(path, true, false);
LibXMLDumper dumper(this);
dumper.setBaseDir(baseDir);
return writeTextFile(path, dumper.dumpLibXMLContent(getBooksIds()));
}
/* Try to find it */
std::vector<kiwix::Book>::iterator itr;
for ( itr = this->books.begin(); itr != this->books.end(); ++itr ) {
if (itr->id == book.id) {
if (!itr->readOnly) {
itr->readOnly = book.readOnly;
if (itr->path.empty())
itr->path = book.path;
if (itr->pathAbsolute.empty())
itr->pathAbsolute = book.pathAbsolute;
if (itr->url.empty())
itr->url = book.url;
bool Library::writeBookmarksToFile(const std::string& path) {
LibXMLDumper dumper(this);
return writeTextFile(path, dumper.dumpLibXMLBookmark());
}
if (itr->tags.empty())
itr->tags = book.tags;
std::vector<std::string> Library::getBooksLanguages()
{
std::vector<std::string> booksLanguages;
std::map<std::string, bool> booksLanguagesMap;
if (itr->name.empty())
itr->name = book.name;
if (itr->indexPath.empty()) {
itr->indexPath = book.indexPath;
itr->indexType = book.indexType;
}
if (itr->indexPathAbsolute.empty()) {
itr->indexPathAbsolute = book.indexPathAbsolute;
itr->indexType = book.indexType;
}
if (itr->faviconMimeType.empty()) {
itr->favicon = book.favicon;
itr->faviconMimeType = book.faviconMimeType;
}
}
return false;
for (auto& pair: m_books) {
auto& book = pair.second;
auto& language = book.getLanguage();
if (booksLanguagesMap.find(language) == booksLanguagesMap.end()) {
if (book.getOrigId().empty()) {
booksLanguagesMap[language] = true;
booksLanguages.push_back(language);
}
}
/* otherwise */
this->books.push_back(book);
return true;
}
bool Library::removeBookByIndex(const unsigned int bookIndex) {
books.erase(books.begin()+bookIndex);
return true;
}
return booksLanguages;
}
std::vector<std::string> Library::getBooksCreators()
{
std::vector<std::string> booksCreators;
std::map<std::string, bool> booksCreatorsMap;
for (auto& pair: m_books) {
auto& book = pair.second;
auto& creator = book.getCreator();
if (booksCreatorsMap.find(creator) == booksCreatorsMap.end()) {
if (book.getOrigId().empty()) {
booksCreatorsMap[creator] = true;
booksCreators.push_back(creator);
}
}
}
return booksCreators;
}
std::vector<std::string> Library::getBooksPublishers()
{
std::vector<std::string> booksPublishers;
std::map<std::string, bool> booksPublishersMap;
for (auto& pair:m_books) {
auto& book = pair.second;
auto& publisher = book.getPublisher();
if (booksPublishersMap.find(publisher) == booksPublishersMap.end()) {
if (book.getOrigId().empty()) {
booksPublishersMap[publisher] = true;
booksPublishers.push_back(publisher);
}
}
}
return booksPublishers;
}
std::vector<std::string> Library::getBooksIds()
{
std::vector<std::string> bookIds;
for (auto& pair: m_books) {
bookIds.push_back(pair.first);
}
return bookIds;
}
std::vector<std::string> Library::filter(const std::string& search)
{
if (search.empty()) {
return getBooksIds();
}
std::vector<std::string> bookIds;
for(auto& pair:m_books) {
auto& book = pair.second;
if (matchRegex(book.getTitle(), "\\Q" + search + "\\E")
|| matchRegex(book.getDescription(), "\\Q" + search + "\\E")) {
bookIds.push_back(pair.first);
}
}
return bookIds;
}
template<supportedListSortBy sort>
struct Comparator {
Library* lib;
Comparator(Library* lib) : lib(lib) {}
bool operator() (const std::string& id1, const std::string& id2) {
return get_keys(id1) < get_keys(id2);
}
std::string get_keys(const std::string& id);
unsigned int get_keyi(const std::string& id);
};
template<>
std::string Comparator<TITLE>::get_keys(const std::string& id)
{
return lib->getBookById(id).getTitle();
}
template<>
unsigned int Comparator<SIZE>::get_keyi(const std::string& id)
{
return lib->getBookById(id).getSize();
}
template<>
bool Comparator<SIZE>::operator() (const std::string& id1, const std::string& id2)
{
return get_keyi(id1) < get_keyi(id2);
}
template<>
std::string Comparator<DATE>::get_keys(const std::string& id)
{
return lib->getBookById(id).getDate();
}
template<>
std::string Comparator<CREATOR>::get_keys(const std::string& id)
{
return lib->getBookById(id).getCreator();
}
template<>
std::string Comparator<PUBLISHER>::get_keys(const std::string& id)
{
return lib->getBookById(id).getPublisher();
}
std::vector<std::string> Library::listBooksIds(
int mode,
supportedListSortBy sortBy,
const std::string& search,
const std::string& language,
const std::string& creator,
const std::string& publisher,
const std::vector<std::string>& tags,
size_t maxSize) {
std::vector<std::string> bookIds;
for(auto& pair:m_books) {
auto& book = pair.second;
auto local = !book.getPath().empty();
if (mode & LOCAL && !local)
continue;
if (mode & NOLOCAL && local)
continue;
auto valid = book.isPathValid();
if (mode & VALID && !valid)
continue;
if (mode & NOVALID && valid)
continue;
auto remote = !book.getUrl().empty();
if (mode & REMOTE && !remote)
continue;
if (mode & NOREMOTE && remote)
continue;
if (!tags.empty()) {
auto vBookTags = split(book.getTags(), ";");
std::set<std::string> sBookTags(vBookTags.begin(), vBookTags.end());
bool ok = true;
for (auto& t: tags) {
if (sBookTags.find(t) == sBookTags.end()) {
// A filter tag is missing from the book's tags,
// so there is no need to check the remaining filter tags.
ok = false;
break;
}
}
if (! ok ) {
// Skip the book
continue;
}
}
if (maxSize != 0 && book.getSize() > maxSize)
continue;
if (!language.empty() && book.getLanguage() != language)
continue;
if (!publisher.empty() && book.getPublisher() != publisher)
continue;
if (!creator.empty() && book.getCreator() != creator)
continue;
if (!search.empty() && !(matchRegex(book.getTitle(), "\\Q" + search + "\\E")
|| matchRegex(book.getDescription(), "\\Q" + search + "\\E")))
continue;
bookIds.push_back(pair.first);
}
switch(sortBy) {
case TITLE:
std::sort(bookIds.begin(), bookIds.end(), Comparator<TITLE>(this));
break;
case SIZE:
std::sort(bookIds.begin(), bookIds.end(), Comparator<SIZE>(this));
break;
case DATE:
std::sort(bookIds.begin(), bookIds.end(), Comparator<DATE>(this));
break;
case CREATOR:
std::sort(bookIds.begin(), bookIds.end(), Comparator<CREATOR>(this));
break;
case PUBLISHER:
std::sort(bookIds.begin(), bookIds.end(), Comparator<PUBLISHER>(this));
break;
default:
break;
}
return bookIds;
}
}
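
The tag filter in `listBooksIds` keeps a book only when every requested tag appears in the book's `;`-separated tag string. The following is a minimal standalone sketch of that rule. `splitTags` and `hasAllTags` are hypothetical helpers written for illustration (they are not part of the kiwix API), and the sketch assumes empty tags are ignored.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a ';'-separated tag string into a set.
// Empty tags are skipped (an assumption, not taken from the diff).
std::set<std::string> splitTags(const std::string& tags)
{
  std::set<std::string> result;
  std::istringstream stream(tags);
  std::string tag;
  while (std::getline(stream, tag, ';')) {
    if (!tag.empty()) {
      result.insert(tag);
    }
  }
  return result;
}

// Hypothetical helper: true when every filter tag is present in the
// book's tags. An empty filter list matches every book.
bool hasAllTags(const std::string& bookTags,
                const std::vector<std::string>& filterTags)
{
  const std::set<std::string> tagSet = splitTags(bookTags);
  for (const auto& tag : filterTags) {
    if (tagSet.find(tag) == tagSet.end()) {
      return false;  // One missing tag is enough to reject the book.
    }
  }
  return true;
}
```

Building the set once per book keeps each lookup logarithmic, which mirrors the `std::set` used in the diff.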

139
src/libxml_dumper.cpp Normal file

@@ -0,0 +1,139 @@
/*
* Copyright 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "libxml_dumper.h"
#include "book.h"
#include <tools/base64.h>
#include <tools/stringTools.h>
#include <tools/otherTools.h>
namespace kiwix
{
/* Constructor */
LibXMLDumper::LibXMLDumper(Library* library)
: library(library)
{
}
/* Destructor */
LibXMLDumper::~LibXMLDumper()
{
}
#define ADD_ATTRIBUTE(node, name, value) { (node).append_attribute((name)) = (value).c_str(); }
#define ADD_ATTR_NOT_EMPTY(node, name, value) { if (!(value).empty()) ADD_ATTRIBUTE(node, name, value); }
void LibXMLDumper::handleBook(Book book, pugi::xml_node root_node) {
if (book.readOnly())
return;
auto entry_node = root_node.append_child("book");
ADD_ATTRIBUTE(entry_node, "id", book.getId());
if (!book.getPath().empty()) {
ADD_ATTRIBUTE(entry_node, "path", computeRelativePath(baseDir, book.getPath()));
}
if (book.getOrigId().empty()) {
ADD_ATTR_NOT_EMPTY(entry_node, "title", book.getTitle());
ADD_ATTR_NOT_EMPTY(entry_node, "name", book.getName());
ADD_ATTR_NOT_EMPTY(entry_node, "tags", book.getTags());
ADD_ATTR_NOT_EMPTY(entry_node, "description", book.getDescription());
ADD_ATTR_NOT_EMPTY(entry_node, "language", book.getLanguage());
ADD_ATTR_NOT_EMPTY(entry_node, "creator", book.getCreator());
ADD_ATTR_NOT_EMPTY(entry_node, "publisher", book.getPublisher());
ADD_ATTR_NOT_EMPTY(entry_node, "faviconMimeType", book.getFaviconMimeType());
if (!book.getFavicon().empty())
ADD_ATTRIBUTE(entry_node, "favicon", base64_encode(book.getFavicon()));
} else {
ADD_ATTRIBUTE(entry_node, "origId", book.getOrigId());
}
ADD_ATTR_NOT_EMPTY(entry_node, "date", book.getDate());
ADD_ATTR_NOT_EMPTY(entry_node, "url", book.getUrl());
if (book.getArticleCount())
ADD_ATTRIBUTE(entry_node, "articleCount", to_string(book.getArticleCount()));
if (book.getMediaCount())
ADD_ATTRIBUTE(entry_node, "mediaCount", to_string(book.getMediaCount()));
if (book.getSize())
ADD_ATTRIBUTE(entry_node, "size", to_string(book.getSize()>>10));
ADD_ATTR_NOT_EMPTY(entry_node, "downloadId", book.getDownloadId());
}
#define ADD_TEXT_ENTRY(node, child, value) (node).append_child((child)).append_child(pugi::node_pcdata).set_value((value).c_str())
void LibXMLDumper::handleBookmark(Bookmark bookmark, pugi::xml_node root_node) {
auto entry_node = root_node.append_child("bookmark");
auto book_node = entry_node.append_child("book");
try {
auto book = library->getBookById(bookmark.getBookId());
ADD_TEXT_ENTRY(book_node, "id", book.getId());
ADD_TEXT_ENTRY(book_node, "title", book.getTitle());
ADD_TEXT_ENTRY(book_node, "language", book.getLanguage());
ADD_TEXT_ENTRY(book_node, "date", book.getDate());
} catch (...) {
ADD_TEXT_ENTRY(book_node, "id", bookmark.getBookId());
ADD_TEXT_ENTRY(book_node, "title", bookmark.getBookTitle());
ADD_TEXT_ENTRY(book_node, "language", bookmark.getLanguage());
ADD_TEXT_ENTRY(book_node, "date", bookmark.getDate());
}
ADD_TEXT_ENTRY(entry_node, "title", bookmark.getTitle());
ADD_TEXT_ENTRY(entry_node, "url", bookmark.getUrl());
}
std::string LibXMLDumper::dumpLibXMLContent(const std::vector<std::string>& bookIds)
{
pugi::xml_document doc;
/* Add the library node */
pugi::xml_node libraryNode = doc.append_child("library");
libraryNode.append_attribute("version") = KIWIX_LIBRARY_VERSION;
if (library) {
for (auto& bookId: bookIds) {
handleBook(library->getBookById(bookId), libraryNode);
}
}
return nodeToString(libraryNode);
}
std::string LibXMLDumper::dumpLibXMLBookmark()
{
pugi::xml_document doc;
/* Add the bookmarks node */
pugi::xml_node bookmarksNode = doc.append_child("bookmarks");
if (library) {
for (auto& bookmark: library->getBookmarks()) {
handleBookmark(bookmark, bookmarksNode);
}
}
return nodeToString(bookmarksNode);
}
}
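
`handleBookmark` prefers the metadata of the book currently in the library and falls back to the copy cached in the bookmark when the lookup throws. The sketch below illustrates that fallback with standard containers only. `BookInfo`, `BookmarkInfo` and `resolveBookTitle` are hypothetical names, and a `std::map` stands in for the library, on the assumption that a failed lookup raises an exception as the `try`/`catch` in the diff suggests.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the library's book and bookmark records.
struct BookInfo {
  std::string title;
};

struct BookmarkInfo {
  std::string bookId;
  std::string bookTitle;  // Copy of the title cached in the bookmark.
};

// Prefer the title stored in the library; fall back to the cached copy
// when the book is no longer known.
std::string resolveBookTitle(const std::map<std::string, BookInfo>& books,
                             const BookmarkInfo& bookmark)
{
  try {
    // std::map::at throws std::out_of_range when the key is absent.
    return books.at(bookmark.bookId).title;
  } catch (const std::out_of_range&) {
    return bookmark.bookTitle;
  }
}
```

Catching the specific exception type, rather than `catch (...)` as in the diff, is a choice made for this sketch only.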


@@ -19,544 +19,233 @@
#include "manager.h"
namespace kiwix {
#include <pugixml.hpp>
/* Constructor */
Manager::Manager() :
writableLibraryPath("") {
namespace kiwix
{
/* Constructor */
Manager::Manager(LibraryManipulator* manipulator):
writableLibraryPath(""),
manipulator(manipulator),
mustDeleteManipulator(false)
{
}
Manager::Manager(Library* library) :
writableLibraryPath(""),
manipulator(new DefaultLibraryManipulator(library)),
mustDeleteManipulator(true)
{
}
/* Destructor */
Manager::~Manager()
{
if (mustDeleteManipulator) {
delete manipulator;
}
}
bool Manager::parseXmlDom(const pugi::xml_document& doc,
const bool readOnly,
const std::string& libraryPath)
{
pugi::xml_node libraryNode = doc.child("library");
/* Destructor */
Manager::~Manager() {
}
std::string libraryVersion = libraryNode.attribute("version").value();
bool Manager::parseXmlDom(const pugi::xml_document &doc, const bool readOnly, const string libraryPath) {
pugi::xml_node libraryNode = doc.child("library");
if (strlen(libraryNode.attribute("current").value()))
this->setCurrentBookId(libraryNode.attribute("current").value());
string libraryVersion = libraryNode.attribute("version").value();
for (pugi::xml_node bookNode = libraryNode.child("book"); bookNode; bookNode = bookNode.next_sibling("book")) {
bool ok = true;
kiwix::Book book;
book.readOnly = readOnly;
book.id = bookNode.attribute("id").value();
book.path = bookNode.attribute("path").value();
book.last = (std::string(bookNode.attribute("last").value()) != "undefined" ?
bookNode.attribute("last").value() : "");
book.indexPath = bookNode.attribute("indexPath").value();
book.indexType = XAPIAN;
book.title = bookNode.attribute("title").value();
book.name = bookNode.attribute("name").value();
book.tags = bookNode.attribute("tags").value();
book.description = bookNode.attribute("description").value();
book.language = bookNode.attribute("language").value();
book.date = bookNode.attribute("date").value();
book.creator = bookNode.attribute("creator").value();
book.publisher = bookNode.attribute("publisher").value();
book.url = bookNode.attribute("url").value();
book.origId = bookNode.attribute("origId").value();
book.articleCount = bookNode.attribute("articleCount").value();
book.mediaCount = bookNode.attribute("mediaCount").value();
book.size = bookNode.attribute("size").value();
book.favicon = bookNode.attribute("favicon").value();
book.faviconMimeType = bookNode.attribute("faviconMimeType").value();
/* Check absolute and relative paths */
this->checkAndCleanBookPaths(book, libraryPath);
/* Update the book properties with the new importer */
if (libraryVersion.empty() || atoi(libraryVersion.c_str()) <= atoi(KIWIX_LIBRARY_VERSION)) {
if (!book.path.empty()) {
ok = this->readBookFromPath(book.pathAbsolute);
}
}
if (ok) {
library.addBook(book);
}
}
return true;
}
bool Manager::readXml(const string xml, const bool readOnly, const string libraryPath) {
pugi::xml_document doc;
pugi::xml_parse_result result = doc.load_buffer_inplace((void*)xml.data(), xml.size());
if (result) {
this->parseXmlDom(doc, readOnly, libraryPath);
}
return true;
}
bool Manager::readFile(const string path, const bool readOnly) {
return this->readFile(path, path, readOnly);
}
bool Manager::readFile(const string nativePath, const string UTF8Path, const bool readOnly) {
bool retVal = true;
pugi::xml_document doc;
pugi::xml_parse_result result = doc.load_file(nativePath.c_str());
if (result) {
this->parseXmlDom(doc, readOnly, UTF8Path);
} else {
retVal = false;
}
/* This has to be set (although if the file does not exists) to be
* able to know where to save the library if new content are
* available */
if (!readOnly) {
this->writableLibraryPath = UTF8Path;
}
return retVal;
}
bool Manager::writeFile(const string path) {
pugi::xml_document doc;
/* Add the library node */
pugi::xml_node libraryNode = doc.append_child("library");
if (!getCurrentBookId().empty()) {
libraryNode.append_attribute("current") = getCurrentBookId().c_str();
}
if (!library.version.empty())
libraryNode.append_attribute("version") = library.version.c_str();
/* Add each book */
std::vector<kiwix::Book>::iterator itr;
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
if (!itr->readOnly) {
this->checkAndCleanBookPaths(*itr, path);
pugi::xml_node bookNode = libraryNode.append_child("book");
bookNode.append_attribute("id") = itr->id.c_str();
if (!itr->path.empty())
bookNode.append_attribute("path") = itr->path.c_str();
if (!itr->last.empty() && itr->last != "undefined") {
bookNode.append_attribute("last") = itr->last.c_str();
}
if (!itr->indexPath.empty())
bookNode.append_attribute("indexPath") = itr->indexPath.c_str();
if (!itr->indexPath.empty() || !itr->indexPathAbsolute.empty()) {
if (itr->indexType == XAPIAN)
bookNode.append_attribute("indexType") = "xapian";
}
if (itr->origId.empty()) {
if (!itr->title.empty())
bookNode.append_attribute("title") = itr->title.c_str();
if (!itr->name.empty())
bookNode.append_attribute("name") = itr->name.c_str();
if (!itr->tags.empty())
bookNode.append_attribute("tags") = itr->tags.c_str();
if (!itr->description.empty())
bookNode.append_attribute("description") = itr->description.c_str();
if (!itr->language.empty())
bookNode.append_attribute("language") = itr->language.c_str();
if (!itr->creator.empty())
bookNode.append_attribute("creator") = itr->creator.c_str();
if (!itr->publisher.empty())
bookNode.append_attribute("publisher") = itr->publisher.c_str();
if (!itr->favicon.empty())
bookNode.append_attribute("favicon") = itr->favicon.c_str();
if (!itr->faviconMimeType.empty())
bookNode.append_attribute("faviconMimeType") = itr->faviconMimeType.c_str();
}
if (!itr->date.empty())
bookNode.append_attribute("date") = itr->date.c_str();
if (!itr->url.empty())
bookNode.append_attribute("url") = itr->url.c_str();
if (!itr->origId.empty())
bookNode.append_attribute("origId") = itr->origId.c_str();
if (!itr->articleCount.empty())
bookNode.append_attribute("articleCount") = itr->articleCount.c_str();
if (!itr->mediaCount.empty())
bookNode.append_attribute("mediaCount") = itr->mediaCount.c_str();
if (!itr->size.empty())
bookNode.append_attribute("size") = itr->size.c_str();
}
}
/* saving file */
doc.save_file(path.c_str());
return true;
}
bool Manager::setCurrentBookId(const string id) {
if (library.current.empty() || library.current.top() != id) {
if (id.empty() && !library.current.empty())
library.current.pop();
else
library.current.push(id);
}
return true;
}
string Manager::getCurrentBookId() {
return library.current.empty() ?
"" : library.current.top();
}
/* Add a book to the library. Return empty string if failed, book id otherwise */
string Manager::addBookFromPathAndGetId(const string pathToOpen, const string pathToSave,
const string url, const bool checkMetaData) {
for (pugi::xml_node bookNode = libraryNode.child("book"); bookNode;
bookNode = bookNode.next_sibling("book")) {
kiwix::Book book;
if (this->readBookFromPath(pathToOpen, &book)) {
book.setReadOnly(readOnly);
book.updateFromXml(bookNode,
removeLastPathElement(libraryPath, true, false));
if (pathToSave != pathToOpen) {
book.path = pathToSave;
book.pathAbsolute = isRelativePath(pathToSave) ?
computeAbsolutePath(removeLastPathElement(writableLibraryPath, true, false), pathToSave) : pathToSave;
}
if (!checkMetaData ||
(checkMetaData && !book.title.empty() && !book.language.empty() && !book.date.empty())) {
book.url = url;
library.addBook(book);
return book.id;
/* Update the book properties with the new importer */
if (libraryVersion.empty()
|| atoi(libraryVersion.c_str()) <= atoi(KIWIX_LIBRARY_VERSION)) {
if (!book.getPath().empty()) {
this->readBookFromPath(book.getPath(), &book);
}
}
return "";
manipulator->addBookToLibrary(book);
}
/* Wrapper over Manager::addBookFromPath which return a bool instead of a string */
bool Manager::addBookFromPath(const string pathToOpen, const string pathToSave, const string url, const bool checkMetaData) {
return !(this->addBookFromPathAndGetId(pathToOpen, pathToSave, url, checkMetaData).empty());
return true;
}
bool Manager::readXml(const std::string& xml,
const bool readOnly,
const std::string& libraryPath)
{
pugi::xml_document doc;
pugi::xml_parse_result result
= doc.load_buffer(xml.data(), xml.size());
if (result) {
this->parseXmlDom(doc, readOnly, libraryPath);
}
bool Manager::readBookFromPath(const string path, kiwix::Book *book) {
try {
kiwix::Reader *reader = new kiwix::Reader(path);
return true;
}
if (book != NULL) {
book->path = path;
book->pathAbsolute = path;
book->id = reader->getId();
book->description = reader->getDescription();
book->language = reader->getLanguage();
book->date = reader->getDate();
book->creator = reader->getCreator();
book->publisher = reader->getPublisher();
book->title = reader->getTitle();
book->name = reader->getName();
book->tags = reader->getTags();
book->origId = reader->getOrigId();
std::ostringstream articleCountStream;
articleCountStream << reader->getArticleCount();
book->articleCount = articleCountStream.str();
std::ostringstream mediaCountStream;
mediaCountStream << reader->getMediaCount();
book->mediaCount = mediaCountStream.str();
ostringstream convert; convert << reader->getFileSize();
book->size = convert.str();
bool Manager::parseOpdsDom(const pugi::xml_document& doc, const std::string& urlHost)
{
pugi::xml_node libraryNode = doc.child("feed");
string favicon;
string faviconMimeType;
if (reader->getFavicon(favicon, faviconMimeType)) {
book->favicon = base64_encode(reinterpret_cast<const unsigned char*>(favicon.c_str()), favicon.length());
book->faviconMimeType = faviconMimeType;
}
}
try {
m_totalBooks = strtoull(libraryNode.child("totalResults").child_value(), 0, 0);
m_startIndex = strtoull(libraryNode.child("startIndex").child_value(), 0, 0);
m_itemsPerPage = strtoull(libraryNode.child("itemsPerPage").child_value(), 0, 0);
m_hasSearchResult = true;
} catch(...) {
m_hasSearchResult = false;
}
delete reader;
} catch (const std::exception& e) {
std::cerr << e.what() << std::endl;
return false;
}
for (pugi::xml_node entryNode = libraryNode.child("entry"); entryNode;
entryNode = entryNode.next_sibling("entry")) {
kiwix::Book book;
book.setReadOnly(false);
book.updateFromOpds(entryNode, urlHost);
/* Update the book properties with the new importer */
manipulator->addBookToLibrary(book);
}
return true;
}
bool Manager::readOpds(const std::string& content, const std::string& urlHost)
{
pugi::xml_document doc;
pugi::xml_parse_result result
= doc.load_buffer(content.data(), content.size());
if (result) {
this->parseOpdsDom(doc, urlHost);
return true;
}
bool Manager::removeBookByIndex(const unsigned int bookIndex) {
return this->library.removeBookByIndex(bookIndex);
return false;
}
bool Manager::readFile(const std::string& path, const bool readOnly)
{
return this->readFile(path, path, readOnly);
}
bool Manager::readFile(const std::string& nativePath,
const std::string& UTF8Path,
const bool readOnly)
{
bool retVal = true;
pugi::xml_document doc;
pugi::xml_parse_result result = doc.load_file(nativePath.c_str());
if (result) {
this->parseXmlDom(doc, readOnly, UTF8Path);
} else {
retVal = false;
}
bool Manager::removeBookById(const string id) {
unsigned int bookIndex = 0;
std::vector<kiwix::Book>::iterator itr;
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
if ( itr->id == id) {
return this->library.removeBookByIndex(bookIndex);
}
bookIndex++;
/* This has to be set (even if the file does not exist) so that we
* know where to save the library when new content becomes
* available */
if (!readOnly) {
this->writableLibraryPath = UTF8Path;
}
return retVal;
}
/* Add a book to the library. Return empty string if failed, book id otherwise
*/
std::string Manager::addBookFromPathAndGetId(const std::string& pathToOpen,
const std::string& pathToSave,
const std::string& url,
const bool checkMetaData)
{
kiwix::Book book;
if (this->readBookFromPath(pathToOpen, &book)) {
if (pathToSave != pathToOpen) {
book.setPath(isRelativePath(pathToSave)
? computeAbsolutePath(
removeLastPathElement(writableLibraryPath, true, false),
pathToSave)
: pathToSave);
}
if (!checkMetaData
|| (checkMetaData && !book.getTitle().empty() && !book.getLanguage().empty()
&& !book.getDate().empty())) {
book.setUrl(url);
manipulator->addBookToLibrary(book);
return book.getId();
}
}
return "";
}
/* Wrapper over Manager::addBookFromPathAndGetId which returns a bool instead of a string
*/
bool Manager::addBookFromPath(const std::string& pathToOpen,
const std::string& pathToSave,
const std::string& url,
const bool checkMetaData)
{
return !(
this->addBookFromPathAndGetId(pathToOpen, pathToSave, url, checkMetaData)
.empty());
}
bool Manager::readBookFromPath(const std::string& path, kiwix::Book* book)
{
std::string tmp_path = path;
if (isRelativePath(path)) {
tmp_path = computeAbsolutePath(getCurrentDirectory(), path);
}
try {
kiwix::Reader reader(tmp_path);
book->update(reader);
book->setPathValid(true);
} catch (const std::exception& e) {
std::cerr << "Invalid " << tmp_path << " : " << e.what() << std::endl;
book->setPathValid(false);
return false;
}
vector<string> Manager::getBooksLanguages() {
std::vector<string> booksLanguages;
std::vector<kiwix::Book>::iterator itr;
std::map<string, bool> booksLanguagesMap;
return true;
}
std::sort(library.books.begin(), library.books.end(), kiwix::Book::sortByLanguage);
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (booksLanguagesMap.find(itr->language) == booksLanguagesMap.end()) {
if (itr->origId.empty()) {
booksLanguagesMap[itr->language] = true;
booksLanguages.push_back(itr->language);
}
}
}
bool Manager::readBookmarkFile(const std::string& path)
{
pugi::xml_document doc;
pugi::xml_parse_result result = doc.load_file(path.c_str());
return booksLanguages;
}
vector<string> Manager::getBooksCreators() {
std::vector<string> booksCreators;
std::vector<kiwix::Book>::iterator itr;
std::map<string, bool> booksCreatorsMap;
std::sort(library.books.begin(), library.books.end(), kiwix::Book::sortByCreator);
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (booksCreatorsMap.find(itr->creator) == booksCreatorsMap.end()) {
if (itr->origId.empty()) {
booksCreatorsMap[itr->creator] = true;
booksCreators.push_back(itr->creator);
}
}
}
return booksCreators;
}
vector<string> Manager::getBooksIds() {
std::vector<string> booksIds;
std::vector<kiwix::Book>::iterator itr;
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
booksIds.push_back(itr->id);
}
return booksIds;
}
vector<string> Manager::getBooksPublishers() {
std::vector<string> booksPublishers;
std::vector<kiwix::Book>::iterator itr;
std::map<string, bool> booksPublishersMap;
std::sort(library.books.begin(), library.books.end(), kiwix::Book::sortByPublisher);
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
if (booksPublishersMap.find(itr->publisher) == booksPublishersMap.end()) {
if (itr->origId.empty()) {
booksPublishersMap[itr->publisher] = true;
booksPublishers.push_back(itr->publisher);
}
}
}
return booksPublishers;
}
kiwix::Library Manager::cloneLibrary() {
return this->library;
}
bool Manager::getCurrentBook(Book &book) {
string currentBookId = getCurrentBookId();
if (currentBookId.empty()) {
return false;
} else {
getBookById(currentBookId, book);
return true;
}
}
bool Manager::getBookById(const string id, Book &book) {
std::vector<kiwix::Book>::iterator itr;
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
if ( itr->id == id) {
book = *itr;
return true;
}
}
if (!result) {
return false;
}
bool Manager::updateBookLastOpenDateById(const string id) {
std::vector<kiwix::Book>::iterator itr;
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
if ( itr->id == id) {
char unixdate[12];
sprintf (unixdate, "%d", (int)time(NULL));
itr->last = unixdate;
return true;
}
}
pugi::xml_node libraryNode = doc.child("bookmarks");
return false;
for (pugi::xml_node node = libraryNode.child("bookmark"); node;
node = node.next_sibling("bookmark")) {
kiwix::Bookmark bookmark;
bookmark.updateFromXml(node);
manipulator->addBookmarkToLibrary(bookmark);
}
bool Manager::setBookIndex(const string id, const string path, const supportedIndexType type) {
std::vector<kiwix::Book>::iterator itr;
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
if ( itr->id == id) {
itr->indexPath = path;
itr->indexPathAbsolute = isRelativePath(path) ?
computeAbsolutePath(removeLastPathElement(writableLibraryPath, true, false), path) : path;
itr->indexType = type;
return true;
}
}
return false;
}
bool Manager::setBookIndex(const string id, const string path) {
return this->setBookIndex(id, path, XAPIAN);
}
bool Manager::setBookPath(const string id, const string path) {
std::vector<kiwix::Book>::iterator itr;
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
if ( itr->id == id) {
itr->path = path;
itr->pathAbsolute = isRelativePath(path) ?
computeAbsolutePath(removeLastPathElement(writableLibraryPath, true, false), path) : path;
return true;
}
}
return false;
}
void Manager::removeBookPaths() {
std::vector<kiwix::Book>::iterator itr;
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
itr->path = "";
itr->pathAbsolute = "";
}
}
unsigned int Manager::getBookCount(const bool localBooks, const bool remoteBooks) {
unsigned int result = 0;
std::vector<kiwix::Book>::iterator itr;
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
if ((!itr->path.empty() && localBooks) || (itr->path.empty() && remoteBooks))
result++;
}
return result;
}
bool Manager::listBooks(const supportedListMode mode, const supportedListSortBy sortBy,
const unsigned int maxSize, const string language, const string creator,
const string publisher, const string search) {
this->bookIdList.clear();
std::vector<kiwix::Book>::iterator itr;
/* Sort */
if (sortBy == TITLE) {
std::sort(library.books.begin(), library.books.end(), kiwix::Book::sortByTitle);
} else if (sortBy == SIZE) {
std::sort(library.books.begin(), library.books.end(), kiwix::Book::sortBySize);
} else if (sortBy == DATE) {
std::sort(library.books.begin(), library.books.end(), kiwix::Book::sortByDate);
} else if (sortBy == CREATOR) {
std::sort(library.books.begin(), library.books.end(), kiwix::Book::sortByCreator);
} else if (sortBy == PUBLISHER) {
std::sort(library.books.begin(), library.books.end(), kiwix::Book::sortByPublisher);
}
/* Special sort for LASTOPEN */
if (mode == LASTOPEN) {
std::sort(library.books.begin(), library.books.end(), kiwix::Book::sortByLastOpen);
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
if (!itr->last.empty())
this->bookIdList.push_back(itr->id);
}
} else {
/* Generate the list of book id */
for ( itr = library.books.begin(); itr != library.books.end(); ++itr ) {
bool ok = true;
if (mode == LOCAL && itr->path.empty())
ok = false;
if (ok == true && mode == REMOTE && (!itr->path.empty() || itr->url.empty()))
ok = false;
if (ok == true && maxSize != 0 && (unsigned int)atoi(itr->size.c_str()) > maxSize * 1024 * 1024)
ok = false;
if (ok == true && !language.empty() && !matchRegex(itr->language, language))
ok = false;
if (ok == true && !creator.empty() && itr->creator != creator)
ok = false;
if (ok == true && !publisher.empty() && itr->publisher != publisher)
ok = false;
if ((ok == true && !search.empty()) && !(matchRegex(itr->title, "\\Q" + search + "\\E") ||
matchRegex(itr->description, "\\Q" + search + "\\E") ||
matchRegex(itr->language, "\\Q" + search + "\\E")
))
ok = false;
if (ok == true) {
this->bookIdList.push_back(itr->id);
}
}
}
return true;
}
void Manager::checkAndCleanBookPaths(Book &book, const string &libraryPath) {
if (!book.path.empty()) {
if (isRelativePath(book.path)) {
book.pathAbsolute = computeAbsolutePath(removeLastPathElement(libraryPath, true, false), book.path);
} else {
book.pathAbsolute = book.path;
book.path = computeRelativePath(removeLastPathElement(libraryPath, true, false), book.pathAbsolute);
}
}
if (!book.indexPath.empty()) {
if (isRelativePath(book.indexPath)) {
book.indexPathAbsolute =
computeAbsolutePath(removeLastPathElement(libraryPath, true, false), book.indexPath);
} else {
book.indexPathAbsolute = book.indexPath;
book.indexPath =
computeRelativePath(removeLastPathElement(libraryPath, true, false), book.indexPathAbsolute);
}
}
}
return true;
}
}
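
In `parseXmlDom`, a book is re-read from its file when the library file carries no version or a version not newer than `KIWIX_LIBRARY_VERSION`. The sketch below isolates that comparison. `shouldRefreshFromFile` is a hypothetical helper and `kCurrentLibraryVersion` is a placeholder value standing in for the real constant, which this diff does not show.

```cpp
#include <cstdlib>
#include <string>

// Placeholder standing in for KIWIX_LIBRARY_VERSION (assumed value).
const char* const kCurrentLibraryVersion = "20110515";

// Hypothetical helper: true when book metadata should be refreshed from
// the file, i.e. the stored version is missing or not newer than the
// current one.
bool shouldRefreshFromFile(const std::string& libraryVersion)
{
  return libraryVersion.empty()
         || std::atoi(libraryVersion.c_str())
                <= std::atoi(kCurrentLibraryVersion);
}
```

Note that `atoi` returns 0 for non-numeric input, so a malformed version string is treated like an old one.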


@@ -1,34 +1,36 @@
kiwix_sources = [
'book.cpp',
'bookmark.cpp',
'library.cpp',
'manager.cpp',
'libxml_dumper.cpp',
'opds_dumper.cpp',
'downloader.cpp',
'reader.cpp',
'entry.cpp',
'searcher.cpp',
'common/base64.cpp',
'common/pathTools.cpp',
'common/regexTools.cpp',
'common/stringTools.cpp',
'common/networkTools.cpp',
'common/otherTools.cpp',
'xapian/htmlparse.cc',
'xapian/myhtmlparse.cc'
'subprocess.cpp',
'aria2.cpp',
'tools/base64.cpp',
'tools/pathTools.cpp',
'tools/regexTools.cpp',
'tools/stringTools.cpp',
'tools/networkTools.cpp',
'tools/otherTools.cpp',
]
kiwix_sources += lib_resources
if xapian_dep.found()
kiwix_sources += ['xapianSearcher.cpp']
if not get_option('android')
kiwix_sources += ['xapianIndexer.cpp']
endif
endif
if not get_option('android')
kiwix_sources += ['indexer.cpp']
if host_machine.system() == 'windows'
kiwix_sources += 'subprocess_windows.cpp'
else
subdir('android')
kiwix_sources += 'subprocess_unix.cpp'
endif
if has_ctpp2_dep
kiwix_sources += ['ctpp2/CTPP2VMStringLoader.cpp']
if get_option('android')
subdir('android')
install_dir = 'kiwix-lib/jniLibs/' + meson.get_cross_property('android_abi')
else
install_dir = get_option('libdir')
endif
config_h = configure_file(output : 'kiwix_config.h',
@@ -40,5 +42,7 @@ kiwixlib = library('kiwix',
kiwix_sources,
include_directories : inc,
dependencies : all_deps,
version: '1.0.0',
install : true)
version: meson.project_version(),
install: true,
install_dir: install_dir,
install_rpath: '$ORIGIN')

144
src/opds_dumper.cpp Normal file

@@ -0,0 +1,144 @@
/*
* Copyright 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "opds_dumper.h"
#include "book.h"
#include <tools/otherTools.h>
#include <ctime>
#include <iomanip>
#include <sstream>
namespace kiwix
{
/* Constructor */
OPDSDumper::OPDSDumper(Library* library)
: library(library)
{
}
/* Destructor */
OPDSDumper::~OPDSDumper()
{
}
std::string gen_date_str()
{
auto now = time(0);
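// Use UTC, since the generated timestamp carries the "Z" suffix.
auto tm = gmtime(&now);
std::stringstream is;
is << std::setw(4) << std::setfill('0')
<< 1900+tm->tm_year << "-"
// tm_mon is zero-based (0 = January), so add 1.
<< std::setw(2) << std::setfill('0') << tm->tm_mon + 1 << "-"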
<< std::setw(2) << std::setfill('0') << tm->tm_mday << "T"
<< std::setw(2) << std::setfill('0') << tm->tm_hour << ":"
<< std::setw(2) << std::setfill('0') << tm->tm_min << ":"
<< std::setw(2) << std::setfill('0') << tm->tm_sec << "Z";
return is.str();
}
static std::string gen_date_from_yyyy_mm_dd(const std::string& date)
{
std::stringstream is;
is << date << "T00:00:00Z";
return is.str();
}
void OPDSDumper::setOpenSearchInfo(int totalResults, int startIndex, int count)
{
m_totalResults = totalResults;
m_startIndex = startIndex;
m_count = count;
m_isSearchResult = true;
}
#define ADD_TEXT_ENTRY(node, child, value) (node).append_child((child)).append_child(pugi::node_pcdata).set_value((value).c_str())
pugi::xml_node OPDSDumper::handleBook(Book book, pugi::xml_node root_node) {
auto entry_node = root_node.append_child("entry");
ADD_TEXT_ENTRY(entry_node, "title", book.getTitle());
ADD_TEXT_ENTRY(entry_node, "id", "urn:uuid:"+book.getId());
ADD_TEXT_ENTRY(entry_node, "icon", rootLocation + "/meta?name=favicon&content=" + book.getHumanReadableIdFromPath());
ADD_TEXT_ENTRY(entry_node, "updated", gen_date_from_yyyy_mm_dd(book.getDate()));
ADD_TEXT_ENTRY(entry_node, "summary", book.getDescription());
auto content_node = entry_node.append_child("link");
content_node.append_attribute("type") = "text/html";
content_node.append_attribute("href") = (rootLocation + "/" + book.getHumanReadableIdFromPath()).c_str();
auto author_node = entry_node.append_child("author");
ADD_TEXT_ENTRY(author_node, "name", book.getCreator());
if (! book.getUrl().empty()) {
auto acquisition_link = entry_node.append_child("link");
acquisition_link.append_attribute("rel") = "http://opds-spec.org/acquisition/open-access";
acquisition_link.append_attribute("type") = "application/x-zim";
acquisition_link.append_attribute("href") = book.getUrl().c_str();
acquisition_link.append_attribute("length") = to_string(book.getSize()).c_str();
}
if (! book.getFaviconMimeType().empty() ) {
auto image_link = entry_node.append_child("link");
image_link.append_attribute("rel") = "http://opds-spec.org/image/thumbnail";
image_link.append_attribute("type") = book.getFaviconMimeType().c_str();
image_link.append_attribute("href") = (rootLocation + "/meta?name=favicon&content=" + book.getHumanReadableIdFromPath()).c_str();
}
return entry_node;
}
string OPDSDumper::dumpOPDSFeed(const std::vector<std::string>& bookIds)
{
date = gen_date_str();
pugi::xml_document doc;
auto root_node = doc.append_child("feed");
root_node.append_attribute("xmlns") = "http://www.w3.org/2005/Atom";
root_node.append_attribute("xmlns:opds") = "http://opds-spec.org/2010/catalog";
ADD_TEXT_ENTRY(root_node, "id", id);
ADD_TEXT_ENTRY(root_node, "title", title);
ADD_TEXT_ENTRY(root_node, "updated", date);
if (m_isSearchResult) {
ADD_TEXT_ENTRY(root_node, "totalResults", to_string(m_totalResults));
ADD_TEXT_ENTRY(root_node, "startIndex", to_string(m_startIndex));
ADD_TEXT_ENTRY(root_node, "itemsPerPage", to_string(m_count));
}
auto self_link_node = root_node.append_child("link");
self_link_node.append_attribute("rel") = "self";
self_link_node.append_attribute("href") = "";
self_link_node.append_attribute("type") = "application/atom+xml";
if (!searchDescriptionUrl.empty() ) {
auto search_link = root_node.append_child("link");
search_link.append_attribute("rel") = "search";
search_link.append_attribute("type") = "application/opensearchdescription+xml";
search_link.append_attribute("href") = searchDescriptionUrl.c_str();
}
if (library) {
for (auto& bookId: bookIds) {
handleBook(library->getBookById(bookId), root_node);
}
}
return nodeToString(root_node);
}
}
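
The Atom `updated` element built above expects an RFC 3339 timestamp. As a minimal sketch of one way to produce it, the helper below formats a `time_t` in UTC with `std::strftime`. The name `format_utc_timestamp` is hypothetical and is not part of kiwix-lib; `std::gmtime` is used for brevity and is not thread-safe.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper (not part of kiwix-lib): format a time_t as an
// RFC 3339 UTC timestamp such as "2019-03-19T15:35:57Z".
std::string format_utc_timestamp(std::time_t t)
{
  // std::gmtime converts to broken-down UTC time; it returns a pointer to
  // shared static storage, so it is not safe to call from several threads.
  const std::tm* utc = std::gmtime(&t);
  if (utc == nullptr) {
    return "";
  }
  char buf[32];
  // %m already yields a 1-based, zero-padded month.
  const std::size_t len
      = std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", utc);
  return std::string(buf, len);
}
```

Letting `strftime` handle padding and month numbering avoids the manual `setw`/`setfill` bookkeeping.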


@@ -17,206 +17,373 @@
* MA 02110-1301, USA.
*/
#include <cmath>
#include "searcher.h"
#include "reader.h"
#include <zim/search.h>
#include <mustache.hpp>
#include "kiwixlib-resources.h"
#ifdef ENABLE_CTPP2
#include <ctpp2/CDT.hpp>
#include <ctpp2/CTPP2FileLogger.hpp>
#include <ctpp2/CTPP2SimpleVM.hpp>
#include "ctpp2/CTPP2VMStringLoader.hpp"
#define MAX_SEARCH_LEN 140
using namespace CTPP;
#endif
namespace kiwix
{
class _Result : public Result
{
public:
_Result(zim::Search::iterator& iterator);
virtual ~_Result(){};
virtual std::string get_url();
virtual std::string get_title();
virtual int get_score();
virtual std::string get_snippet();
virtual std::string get_content();
virtual int get_wordCount();
virtual int get_size();
virtual int get_readerIndex();
namespace kiwix {
private:
zim::Search::iterator iterator;
};
/* Constructor */
Searcher::Searcher() :
searchPattern(""),
protocolPrefix("zim://"),
searchProtocolPrefix("search://?"),
resultCountPerPage(0),
estimatedResultCount(0),
resultStart(0),
resultEnd(0)
struct SearcherInternal {
const zim::Search* _search;
zim::Search::iterator current_iterator;
SearcherInternal() : _search(NULL) {}
~SearcherInternal()
{
template_ct2 = RESOURCE::results_ct2;
loadICUExternalTables();
if (_search != NULL) {
delete _search;
}
}
/* Destructor */
Searcher::~Searcher() {}
/* Search strings in the database */
void Searcher::search(std::string &search, unsigned int resultStart,
unsigned int resultEnd, const bool verbose) {
this->reset();
};
if (verbose == true) {
cout << "Performing query `" << search << "'" << endl;
/* Constructor */
Searcher::Searcher(const std::string& humanReadableName)
: internal(new SearcherInternal()),
searchPattern(""),
protocolPrefix("zim://"),
searchProtocolPrefix("search://?"),
resultCountPerPage(0),
estimatedResultCount(0),
resultStart(0),
resultEnd(0),
contentHumanReadableId(humanReadableName)
{
loadICUExternalTables();
}
/* Destructor */
Searcher::~Searcher()
{
delete internal;
}
bool Searcher::add_reader(Reader* reader, const std::string& humanReadableName)
{
if (!reader->hasFulltextIndex()) {
return false;
}
this->readers.push_back(reader);
this->humanReaderNames.push_back(humanReadableName);
return true;
}
/* Search strings in the database */
void Searcher::search(std::string& search,
unsigned int resultStart,
unsigned int resultEnd,
const bool verbose)
{
this->reset();
if (verbose == true) {
cout << "Performing query `" << search << "'" << endl;
}
/* If resultEnd & resultStart inverted */
if (resultStart > resultEnd) {
resultEnd += resultStart;
resultStart = resultEnd - resultStart;
resultEnd -= resultStart;
}
/* Try to find results */
if (resultStart != resultEnd) {
/* Avoid overly large searches */
this->resultCountPerPage = resultEnd - resultStart;
if (this->resultCountPerPage > MAX_SEARCH_LEN) {
resultEnd = resultStart + MAX_SEARCH_LEN;
this->resultCountPerPage = MAX_SEARCH_LEN;
}
/* If resultEnd & resultStart inverted */
if (resultStart > resultEnd) {
resultEnd += resultStart;
resultStart = resultEnd - resultStart;
resultEnd -= resultStart;
}
/* Try to find results */
if (resultStart != resultEnd) {
/* Avoid big researches */
this->resultCountPerPage = resultEnd - resultStart;
if (this->resultCountPerPage > 70) {
resultEnd = resultStart + 70;
this->resultCountPerPage = 70;
/* Perform the search */
this->searchPattern = search;
this->resultStart = resultStart;
this->resultEnd = resultEnd;
string unaccentedSearch = removeAccents(search);
std::vector<const zim::File*> zims;
for (auto current = this->readers.begin(); current != this->readers.end();
current++) {
if ( (*current)->hasFulltextIndex() ) {
zims.push_back((*current)->getZimFileHandler());
}
/* Perform the search */
this->searchPattern = search;
this->resultStart = resultStart;
this->resultEnd = resultEnd;
string unaccentedSearch = removeAccents(search);
searchInIndex(unaccentedSearch, resultStart, resultEnd, verbose);
this->resultOffset = this->results.begin();
}
zim::Search* search = new zim::Search(zims);
search->set_query(unaccentedSearch);
search->set_range(resultStart, resultEnd);
internal->_search = search;
internal->current_iterator = internal->_search->begin();
this->estimatedResultCount = internal->_search->get_matches_estimated();
}
return;
}
void Searcher::geo_search(float latitude, float longitude, float distance,
unsigned int resultStart,
unsigned int resultEnd,
const bool verbose)
{
this->reset();
if (verbose == true) {
cout << "Performing geo query `" << distance << "&(" << latitude << ";" << longitude << ")'" << endl;
}
/* If resultEnd & resultStart inverted */
if (resultStart > resultEnd) {
resultEnd += resultStart;
resultStart = resultEnd - resultStart;
resultEnd -= resultStart;
}
/* Try to find results */
if (resultStart == resultEnd) {
return;
}
/* Reset the results */
void Searcher::reset() {
this->results.clear();
this->resultOffset = this->results.begin();
this->estimatedResultCount = 0;
this->searchPattern = "";
return;
/* Avoid overly large searches */
this->resultCountPerPage = resultEnd - resultStart;
if (this->resultCountPerPage > MAX_SEARCH_LEN) {
resultEnd = resultStart + MAX_SEARCH_LEN;
this->resultCountPerPage = MAX_SEARCH_LEN;
}
/* Return the result count estimation */
unsigned int Searcher::getEstimatedResultCount() {
return this->estimatedResultCount;
/* Perform the search */
std::ostringstream oss;
oss << "Articles located less than " << distance << " meters of " << latitude << ";" << longitude;
this->searchPattern = oss.str();
this->resultStart = resultStart;
this->resultEnd = resultEnd;
std::vector<const zim::File*> zims;
for (auto current = this->readers.begin(); current != this->readers.end();
current++) {
zims.push_back((*current)->getZimFileHandler());
}
zim::Search* search = new zim::Search(zims);
search->set_query("");
search->set_georange(latitude, longitude, distance);
search->set_range(resultStart, resultEnd);
internal->_search = search;
internal->current_iterator = internal->_search->begin();
this->estimatedResultCount = internal->_search->get_matches_estimated();
}
void Searcher::restart_search()
{
if (internal->_search) {
internal->current_iterator = internal->_search->begin();
}
}
Result* Searcher::getNextResult()
{
if (internal->_search &&
internal->current_iterator != internal->_search->end()) {
Result* result = new _Result(internal->current_iterator);
internal->current_iterator++;
return result;
}
return NULL;
}
/* Reset the results */
void Searcher::reset()
{
this->estimatedResultCount = 0;
this->searchPattern = "";
return;
}
void Searcher::suggestions(std::string& searchPattern, const bool verbose)
{
this->reset();
if (verbose == true) {
cout << "Performing suggestion query `" << searchPattern << "`" << endl;
}
/* Get next result */
bool Searcher::getNextResult(string &url, string &title, unsigned int &score) {
bool retVal = false;
this->searchPattern = searchPattern;
this->resultStart = 0;
this->resultEnd = 10;
string unaccentedSearch = removeAccents(searchPattern);
if (this->resultOffset != this->results.end()) {
std::vector<const zim::File*> zims;
for (auto current = this->readers.begin(); current != this->readers.end();
current++) {
zims.push_back((*current)->getZimFileHandler());
}
zim::Search* search = new zim::Search(zims);
search->set_query(unaccentedSearch);
search->set_range(resultStart, resultEnd);
search->set_suggestion_mode(true);
internal->_search = search;
internal->current_iterator = internal->_search->begin();
this->estimatedResultCount = internal->_search->get_matches_estimated();
}
/* url */
url = this->resultOffset->url;
/* Return the result count estimation */
unsigned int Searcher::getEstimatedResultCount()
{
return this->estimatedResultCount;
}
/* title */
title = this->resultOffset->title;
bool Searcher::setProtocolPrefix(const std::string prefix)
{
this->protocolPrefix = prefix;
return true;
}
/* score */
score = this->resultOffset->score;
bool Searcher::setSearchProtocolPrefix(const std::string prefix)
{
this->searchProtocolPrefix = prefix;
return true;
}
/* increment the cursor for the next call */
this->resultOffset++;
_Result::_Result(zim::Search::iterator& iterator)
: iterator(iterator)
{
}
retVal = true;
std::string _Result::get_url()
{
return iterator.get_url();
}
std::string _Result::get_title()
{
return iterator.get_title();
}
int _Result::get_score()
{
return iterator.get_score();
}
std::string _Result::get_snippet()
{
return iterator.get_snippet();
}
std::string _Result::get_content()
{
if (iterator->good()) {
return iterator->getData();
}
return "";
}
int _Result::get_size()
{
return iterator.get_size();
}
int _Result::get_wordCount()
{
return iterator.get_wordCount();
}
int _Result::get_readerIndex()
{
return iterator.get_fileIndex();
}
string Searcher::getHtml()
{
kainjow::mustache::data results{kainjow::mustache::data::type::list};
this->restart_search();
Result* p_result = NULL;
while ((p_result = this->getNextResult())) {
kainjow::mustache::data result;
result.set("title", p_result->get_title());
result.set("url", p_result->get_url());
result.set("snippet", p_result->get_snippet());
result.set("resultContentId", humanReaderNames[p_result->get_readerIndex()]);
if (p_result->get_wordCount() >= 0) {
result.set("wordCount", kiwix::beautifyInteger(p_result->get_wordCount()));
}
return retVal;
results.push_back(result);
delete p_result;
}
bool Searcher::setProtocolPrefix(const std::string prefix) {
this->protocolPrefix = prefix;
return true;
// pages
kainjow::mustache::data pages{kainjow::mustache::data::type::list};
unsigned int pageStart
= this->resultStart / this->resultCountPerPage >= 5
? this->resultStart / this->resultCountPerPage - 4
: 0;
unsigned int pageCount
= this->estimatedResultCount / this->resultCountPerPage + 1 - pageStart;
if (pageCount > 10) {
pageCount = 10;
} else if (pageCount == 1) {
pageCount = 0;
}
bool Searcher::setSearchProtocolPrefix(const std::string prefix) {
this->searchProtocolPrefix = prefix;
return true;
}
for (unsigned int i = pageStart; i < pageStart + pageCount; i++) {
kainjow::mustache::data page;
page.set("label", to_string(i + 1));
page.set("start", to_string(i * this->resultCountPerPage));
page.set("end", to_string((i + 1) * this->resultCountPerPage));
void Searcher::setContentHumanReadableId(const string &contentHumanReadableId) {
this->contentHumanReadableId = contentHumanReadableId;
}
#ifdef ENABLE_CTPP2
string Searcher::getHtml() {
SimpleVM oSimpleVM;
// Fill data
CDT oData;
CDT resultsCDT(CDT::ARRAY_VAL);
this->resultOffset = this->results.begin();
while (this->resultOffset != this->results.end()) {
CDT result;
result["title"] = this->resultOffset->title;
result["url"] = this->resultOffset->url;
result["snippet"] = this->resultOffset->snippet;
if (this->resultOffset->size >= 0)
result["size"] = kiwix::beautifyInteger(this->resultOffset->size);
if (this->resultOffset->wordCount >= 0)
result["wordCount"] = kiwix::beautifyInteger(this->resultOffset->wordCount);
resultsCDT.PushBack(result);
this->resultOffset++;
if (i * this->resultCountPerPage == this->resultStart) {
page.set("selected", true);
}
this->resultOffset = this->results.begin();
oData["results"] = resultsCDT;
// pages
CDT pagesCDT(CDT::ARRAY_VAL);
unsigned int pageStart = this->resultStart / this->resultCountPerPage >= 5 ? this->resultStart / this->resultCountPerPage - 4 : 0;
unsigned int pageCount = this->estimatedResultCount / this->resultCountPerPage + 1 - pageStart;
if (pageCount > 10)
pageCount = 10;
else if (pageCount == 1)
pageCount = 0;
for (unsigned int i=pageStart; i<pageStart+pageCount; i++) {
CDT page;
page["label"] = i + 1;
page["start"] = i * this->resultCountPerPage;
page["end"] = (i+1) * this->resultCountPerPage;
if (i * this->resultCountPerPage == this->resultStart)
page["selected"] = true;
pagesCDT.PushBack(page);
}
oData["pages"] = pagesCDT;
oData["count"] = kiwix::beautifyInteger(this->estimatedResultCount);
oData["searchPattern"] = kiwix::encodeDiples(this->searchPattern);
oData["searchPatternEncoded"] = urlEncode(this->searchPattern);
oData["resultStart"] = this->resultStart + 1;
oData["resultEnd"] = (this->resultEnd > this->estimatedResultCount ? this->estimatedResultCount : this->resultEnd);
oData["resultRange"] = this->resultCountPerPage;
oData["resultLastPageStart"] = this->estimatedResultCount > this->resultCountPerPage ? this->estimatedResultCount - this->resultCountPerPage : 0;
oData["protocolPrefix"] = this->protocolPrefix;
oData["searchProtocolPrefix"] = this->searchProtocolPrefix;
oData["contentId"] = this->contentHumanReadableId;
VMStringLoader oLoader(template_ct2.c_str(), template_ct2.size());
FileLogger oLogger(stderr);
// DEBUG only (write output to stdout)
// oSimpleVM.Run(oData, oLoader, stdout, oLogger);
std::string sResult;
oSimpleVM.Run(oData, oLoader, sResult, oLogger);
return sResult;
pages.push_back(page);
}
#endif
std::string template_str = RESOURCE::search_result_tmpl;
kainjow::mustache::mustache tmpl(template_str);
kainjow::mustache::data allData;
allData.set("results", results);
allData.set("pages", pages);
allData.set("hasResult", this->estimatedResultCount != 0);
allData.set("count", kiwix::beautifyInteger(this->estimatedResultCount));
allData.set("searchPattern", kiwix::encodeDiples(this->searchPattern));
allData.set("searchPatternEncoded", urlEncode(this->searchPattern));
allData.set("resultStart", to_string(this->resultStart + 1));
allData.set("resultEnd", to_string(min(this->resultEnd, this->estimatedResultCount)));
allData.set("resultRange", to_string(this->resultCountPerPage));
allData.set("resultLastPageStart", to_string(this->estimatedResultCount > this->resultCountPerPage
? round(this->estimatedResultCount / this->resultCountPerPage) * this->resultCountPerPage
: 0));
allData.set("lastResult", to_string(this->estimatedResultCount));
allData.set("protocolPrefix", this->protocolPrefix);
allData.set("searchProtocolPrefix", this->searchProtocolPrefix);
allData.set("contentId", this->contentHumanReadableId);
std::stringstream ss;
tmpl.render(allData, [&ss](const std::string& str) { ss << str; });
return ss.str();
}
}
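
The page-link arithmetic in `Searcher::getHtml()` can be isolated as a small pure function. The sketch below mirrors that logic (at most 10 links, starting 4 pages before the current one) and, as an assumption beyond the original, guards against a zero page size and against unsigned underflow when the estimate is smaller than the requested offset. `PageWindow` and `compute_page_window` are illustrative names only.

```cpp
#include <algorithm>

// Illustrative type: which page links to render below the results.
struct PageWindow {
  unsigned int first;  // zero-based index of the first page link
  unsigned int count;  // number of page links to render
};

// Hypothetical helper mirroring the arithmetic in Searcher::getHtml().
PageWindow compute_page_window(unsigned int resultStart,
                               unsigned int perPage,
                               unsigned int estimatedTotal)
{
  if (perPage == 0) {
    return {0, 0};  // no search performed yet; avoid dividing by zero
  }
  const unsigned int currentPage = resultStart / perPage;
  const unsigned int first = currentPage >= 5 ? currentPage - 4 : 0;
  const unsigned int totalPages = estimatedTotal / perPage + 1;
  if (totalPages <= first) {
    return {first, 0};  // avoid unsigned underflow in the subtraction below
  }
  unsigned int count = std::min(totalPages - first, 10u);
  if (count == 1) {
    count = 0;  // a single page needs no pagination links
  }
  return {first, count};
}
```

Keeping this separate from template rendering makes the edge cases straightforward to unit-test.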

40
src/subprocess.cpp Normal file

@@ -0,0 +1,40 @@
#include "subprocess.h"
#ifdef _WIN32
# include "subprocess_windows.h"
#else
# include "subprocess_unix.h"
#endif
Subprocess::Subprocess(std::unique_ptr<SubprocessImpl> impl, commandLine_t& commandLine) :
mp_impl(std::move(impl))
{
mp_impl->run(commandLine);
}
Subprocess::~Subprocess()
{
mp_impl->kill();
}
std::unique_ptr<Subprocess> Subprocess::run(commandLine_t& commandLine)
{
#ifdef _WIN32
auto impl = std::unique_ptr<SubprocessImpl>(new WinImpl);
#else
auto impl = std::unique_ptr<UnixImpl>(new UnixImpl);
#endif
return std::unique_ptr<Subprocess>(new Subprocess(std::move(impl), commandLine));
}
bool Subprocess::isRunning()
{
return mp_impl->isRunning();
}
bool Subprocess::kill()
{
return mp_impl->kill();
}

36
src/subprocess.h Normal file

@@ -0,0 +1,36 @@
#ifndef KIWIX_SUBPROCESS_H_
#define KIWIX_SUBPROCESS_H_
#include <string>
#include <memory>
#include <vector>
typedef std::vector<const char *> commandLine_t;
class SubprocessImpl
{
public:
virtual void run(commandLine_t& commandLine) = 0;
virtual bool kill() = 0;
virtual bool isRunning() = 0;
virtual ~SubprocessImpl() = default;
};
class Subprocess
{
private:
// Impl depends on the system (Windows, Unix, ...)
std::unique_ptr<SubprocessImpl> mp_impl;
Subprocess(std::unique_ptr<SubprocessImpl> impl, commandLine_t& commandLine);
public:
static std::unique_ptr<Subprocess> run(commandLine_t& commandLine);
~Subprocess();
bool isRunning();
bool kill();
};
#endif // KIWIX_SUBPROCESS_H_

93
src/subprocess_unix.cpp Normal file

@@ -0,0 +1,93 @@
#include "subprocess_unix.h"
#include <sys/types.h>
#include <unistd.h>
#include <iostream>
#include <signal.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <string.h> // memset(), used in the Android signal-handler setup
UnixImpl::UnixImpl():
m_pid(0),
m_running(false),
m_mutex(PTHREAD_MUTEX_INITIALIZER),
m_waitingThread()
{
}
UnixImpl::~UnixImpl()
{
kill();
// Android has no pthread_cancel :(
#ifdef __ANDROID__
pthread_kill(m_waitingThread, SIGUSR1);
#else
pthread_cancel(m_waitingThread);
#endif
}
#ifdef __ANDROID__
void thread_exit_handler(int sig) {
pthread_exit(0);
}
#endif
void* UnixImpl::waitForPID(void* _self)
{
#ifdef __ANDROID__
struct sigaction actions;
memset(&actions, 0, sizeof(actions));
sigemptyset(&actions.sa_mask);
actions.sa_flags = 0;
actions.sa_handler = thread_exit_handler;
sigaction(SIGUSR1, &actions, NULL);
#endif
UnixImpl* self = static_cast<UnixImpl*>(_self);
// WEXITED is an option for waitid(), not waitpid(); 0 blocks until the child exits.
waitpid(self->m_pid, NULL, 0);
pthread_mutex_lock(&self->m_mutex);
self->m_running = false;
pthread_mutex_unlock(&self->m_mutex);
return self;
}
void UnixImpl::run(commandLine_t& commandLine)
{
const char* binary = commandLine[0];
int pid = fork();
switch(pid) {
case -1:
std::cerr << "cannot fork" << std::endl;
break;
case 0:
commandLine.push_back(NULL);
if (execvp(binary, const_cast<char* const*>(commandLine.data()))) {
perror("Cannot launch");
exit(-1);
}
break;
default:
m_pid = pid;
m_running = true;
pthread_create(&m_waitingThread, NULL, waitForPID, this);
break;
}
}
bool UnixImpl::kill()
{
return (::kill(m_pid, SIGKILL) == 0);
}
bool UnixImpl::isRunning()
{
pthread_mutex_lock(&m_mutex);
bool ret = m_running;
pthread_mutex_unlock(&m_mutex);
return ret;
}
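
The fork/exec/wait sequence used by `UnixImpl` can be tried in isolation. The following is a minimal synchronous sketch, assuming a POSIX system. `run_and_wait` is a hypothetical name, and unlike `UnixImpl` it blocks in the calling thread instead of spawning a waiting thread.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: run a command and return its exit status,
// or -1 if the process could not be started or did not exit normally.
int run_and_wait(std::vector<const char*> argv)
{
  if (argv.empty()) {
    return -1;
  }
  argv.push_back(nullptr);  // execvp expects a null-terminated array
  const pid_t pid = fork();
  if (pid == -1) {
    return -1;  // fork failed
  }
  if (pid == 0) {
    // Child: replace the process image. execvp only returns on failure.
    execvp(argv[0], const_cast<char* const*>(argv.data()));
    _exit(127);  // conventional "command not found" status
  }
  // Parent: options == 0 blocks until the child terminates.
  int status = 0;
  if (waitpid(pid, &status, 0) == -1) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `_exit` rather than `exit` in the child avoids running the parent's `atexit` handlers and flushing its stdio buffers a second time.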

28
src/subprocess_unix.h Normal file

@@ -0,0 +1,28 @@
#ifndef KIWIX_SUBPROCESS_UNIX_H_
#define KIWIX_SUBPROCESS_UNIX_H_
#include "subprocess.h"
#include <pthread.h>
class UnixImpl : public SubprocessImpl
{
private:
int m_pid;
bool m_running;
pthread_mutex_t m_mutex;
pthread_t m_waitingThread;
public:
UnixImpl();
virtual ~UnixImpl();
void run(commandLine_t& commandLine);
bool kill();
bool isRunning();
static void* waitForPID(void* self);
};
#endif //KIWIX_SUBPROCESS_UNIX_H_


@@ -0,0 +1,93 @@
#include "subprocess_windows.h"
#include <windows.h>
#include <winbase.h>
#include <iostream>
#include <sstream>
#include <stdexcept> // std::runtime_error
WinImpl::WinImpl():
m_pid(0),
m_running(false),
m_handle(INVALID_HANDLE_VALUE)
{
InitializeCriticalSection(&m_criticalSection);
}
WinImpl::~WinImpl()
{
kill();
CloseHandle(m_handle);
DeleteCriticalSection(&m_criticalSection);
}
DWORD WINAPI WinImpl::waitForPID(void* _self)
{
WinImpl* self = static_cast<WinImpl*>(_self);
WaitForSingleObject(self->m_handle, INFINITE);
EnterCriticalSection(&self->m_criticalSection);
self->m_running = false;
LeaveCriticalSection(&self->m_criticalSection);
return 0;
}
std::unique_ptr<wchar_t[]> toWideChar(const std::string& value)
{
auto size = MultiByteToWideChar(CP_UTF8, 0,
value.c_str(), -1, nullptr, 0);
auto wdata = std::unique_ptr<wchar_t[]>(new wchar_t[size]);
auto ret = MultiByteToWideChar(CP_UTF8, 0,
value.c_str(), -1, wdata.get(), size);
if (0 == ret) {
std::ostringstream oss;
oss << "Cannot convert to wchar : " << GetLastError();
throw std::runtime_error(oss.str());
}
return wdata;
}
void WinImpl::run(commandLine_t& commandLine)
{
STARTUPINFOW startInfo = {0};
PROCESS_INFORMATION procInfo;
startInfo.cb = sizeof(startInfo);
std::ostringstream oss;
for(auto& item: commandLine) {
oss << item << " ";
}
auto wCommandLine = toWideChar(oss.str());
if (CreateProcessW(
NULL,
wCommandLine.get(),
NULL,
NULL,
false,
CREATE_NO_WINDOW,
NULL,
NULL,
&startInfo,
&procInfo)) {
m_pid = procInfo.dwProcessId;
m_handle = procInfo.hProcess;
CloseHandle(procInfo.hThread);
m_running = true;
CreateThread(NULL, 0, &waitForPID, this, 0, NULL );
}
}
bool WinImpl::kill()
{
return TerminateProcess(m_handle, 0);
}
bool WinImpl::isRunning()
{
EnterCriticalSection(&m_criticalSection);
bool ret = m_running;
LeaveCriticalSection(&m_criticalSection);
return ret;
}

28
src/subprocess_windows.h Normal file

@@ -0,0 +1,28 @@
#ifndef KIWIX_SUBPROCESS_WINDOWS_H_
#define KIWIX_SUBPROCESS_WINDOWS_H_
#include "subprocess.h"
#include <windows.h>
#include <synchapi.h>
class WinImpl : public SubprocessImpl
{
private:
int m_pid;
bool m_running;
HANDLE m_handle;
CRITICAL_SECTION m_criticalSection;
public:
WinImpl();
virtual ~WinImpl();
void run(commandLine_t& commandLine);
bool kill();
bool isRunning();
static DWORD WINAPI waitForPID(void* self);
};
#endif //KIWIX_SUBPROCESS_WINDOWS_H_


@@ -24,7 +24,7 @@
René Nyffenegger rene.nyffenegger@adp-gmbh.ch
*/
#include <common/base64.h>
#include <tools/base64.h>
#include <iostream>
static const std::string base64_chars =
@@ -37,8 +37,10 @@ static inline bool is_base64(unsigned char c) {
return (isalnum(c) || (c == '+') || (c == '/'));
}
std::string base64_encode(unsigned char const* bytes_to_encode, unsigned int in_len) {
std::string base64_encode(const std::string& inString) {
std::string ret;
auto in_len = inString.size();
const unsigned char* bytes_to_encode = reinterpret_cast<const unsigned char*>(inString.data());
int i = 0;
int j = 0;
unsigned char char_array_3[3];

209
src/tools/networkTools.cpp Normal file

@@ -0,0 +1,209 @@
/*
* Copyright 2012 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <tools/networkTools.h>
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
#else
#include <net/if.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#endif
#include <curl/curl.h>
#include <sstream>
#include <iostream>
#include <stdexcept> // std::runtime_error
std::map<std::string, std::string> kiwix::getNetworkInterfaces()
{
std::map<std::string, std::string> interfaces;
#ifdef _WIN32
SOCKET sd = WSASocket(AF_INET, SOCK_DGRAM, 0, 0, 0, 0);
if (sd == (SOCKET)SOCKET_ERROR) {
std::cerr << "Failed to get a socket. Error " << WSAGetLastError()
<< std::endl;
return interfaces;
}
INTERFACE_INFO InterfaceList[20];
unsigned long nBytesReturned;
if (WSAIoctl(sd,
SIO_GET_INTERFACE_LIST,
0,
0,
&InterfaceList,
sizeof(InterfaceList),
&nBytesReturned,
0,
0)
== SOCKET_ERROR) {
std::cerr << "Failed calling WSAIoctl: error " << WSAGetLastError()
<< std::endl;
return interfaces;
}
int nNumInterfaces = nBytesReturned / sizeof(INTERFACE_INFO);
for (int i = 0; i < nNumInterfaces; ++i) {
sockaddr_in* pAddress;
pAddress = (sockaddr_in*)&(InterfaceList[i].iiAddress);
/* Add to the map */
std::string interfaceName = std::string(inet_ntoa(pAddress->sin_addr));
std::string interfaceIp = std::string(inet_ntoa(pAddress->sin_addr));
interfaces.insert(
std::pair<std::string, std::string>(interfaceName, interfaceIp));
}
#else
/* Get Network interfaces information */
char buf[16384];
struct ifconf ifconf;
int fd = socket(PF_INET, SOCK_DGRAM, 0); /* Only IPV4 */
ifconf.ifc_len = sizeof buf;
ifconf.ifc_buf = buf;
if (ioctl(fd, SIOCGIFCONF, &ifconf) != 0) {
perror("ioctl(SIOCGIFCONF)");
exit(EXIT_FAILURE);
}
/* Go through each interface */
int i;
size_t len;
struct ifreq* ifreq;
ifreq = ifconf.ifc_req;
for (i = 0; i < ifconf.ifc_len;) {
if (ifreq->ifr_addr.sa_family == AF_INET) {
/* Get the network interface ip */
char host[128] = {0};
const int error = getnameinfo(&(ifreq->ifr_addr),
sizeof ifreq->ifr_addr,
host,
sizeof host,
0,
0,
NI_NUMERICHOST);
if (!error) {
std::string interfaceName = std::string(ifreq->ifr_name);
std::string interfaceIp = std::string(host);
/* Add to the map */
interfaces.insert(
std::pair<std::string, std::string>(interfaceName, interfaceIp));
} else {
perror("getnameinfo()");
}
}
/* some systems have ifr_addr.sa_len and adjust the length that
* way, but not mine. weird */
#ifndef __linux__
len = IFNAMSIZ + ifreq->ifr_addr.sa_len;
#else
len = sizeof *ifreq;
#endif
ifreq = (struct ifreq*)((char*)ifreq + len);
i += len;
}
#endif
return interfaces;
}
std::string kiwix::getBestPublicIp()
{
std::map<std::string, std::string> interfaces = kiwix::getNetworkInterfaces();
#ifndef _WIN32
const char* const prioritizedNames[]
= {"eth0", "eth1", "wlan0", "wlan1", "en0", "en1"};
const int count = (sizeof prioritizedNames) / (sizeof prioritizedNames[0]);
for (int i = 0; i < count; ++i) {
std::map<std::string, std::string>::const_iterator it
= interfaces.find(prioritizedNames[i]);
if (it != interfaces.end()) {
return it->second;
}
}
#endif
for (std::map<std::string, std::string>::iterator iter = interfaces.begin();
iter != interfaces.end();
++iter) {
std::string interfaceIp = iter->second;
if (interfaceIp.length() >= 7 && interfaceIp.substr(0, 7) == "192.168") {
return interfaceIp;
}
}
for (std::map<std::string, std::string>::iterator iter = interfaces.begin();
iter != interfaces.end();
++iter) {
std::string interfaceIp = iter->second;
if (interfaceIp.length() >= 7 && interfaceIp.substr(0, 7) == "172.16.") {
return interfaceIp;
}
}
for (std::map<std::string, std::string>::iterator iter = interfaces.begin();
iter != interfaces.end();
++iter) {
std::string interfaceIp = iter->second;
if (interfaceIp.length() >= 3 && interfaceIp.substr(0, 3) == "10.") {
return interfaceIp;
}
}
return "127.0.0.1";
}
size_t write_callback_to_iss(char* ptr, size_t size, size_t nmemb, void* userdata)
{
auto str = static_cast<std::stringstream*>(userdata);
// libcurl delivers size * nmemb bytes and expects that count to be returned.
str->write(ptr, size * nmemb);
return size * nmemb;
}
std::string kiwix::download(const std::string& url) {
auto curl = curl_easy_init();
if (!curl) {
throw std::runtime_error("Cannot initialize curl");
}
std::stringstream ss;
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_setopt(curl, CURLOPT_HTTPGET, 1L);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &write_callback_to_iss);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &ss);
auto res = curl_easy_perform(curl);
if (res != CURLE_OK) {
curl_easy_cleanup(curl);
throw std::runtime_error("Cannot perform request");
}
long response_code;
curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
curl_easy_cleanup(curl);
if (response_code != 200) {
throw std::runtime_error("Invalid return code from server");
}
return ss.str();
}
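
`getBestPublicIp()` picks private addresses by string prefix, and the `"172.16."` prefix covers only part of the 172.16.0.0/12 block. As a sketch of a stricter alternative, the function below parses the dotted quad and tests the three RFC 1918 ranges numerically. `is_private_ipv4` is a hypothetical helper, not an existing kiwix-lib function.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: true if `ip` is a dotted-quad IPv4 address inside
// 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16 (the RFC 1918 private ranges).
bool is_private_ipv4(const std::string& ip)
{
  unsigned int a = 0, b = 0, c = 0, d = 0;
  char trailing = 0;
  // Exactly four numeric fields must match; a fifth conversion succeeding
  // would mean there are unexpected trailing characters.
  if (std::sscanf(ip.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &trailing) != 4) {
    return false;
  }
  if (a > 255 || b > 255 || c > 255 || d > 255) {
    return false;
  }
  if (a == 10) {
    return true;
  }
  if (a == 172 && b >= 16 && b <= 31) {
    return true;
  }
  return a == 192 && b == 168;
}
```

A numeric check also rejects look-alike prefixes such as `192.1689.0.1`, which a plain `substr` comparison on `"192.168"` would accept.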

197
src/tools/otherTools.cpp Normal file

@@ -0,0 +1,197 @@
/*
* Copyright 2014 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <tools/otherTools.h>
#include <map>
static std::map<std::string, std::string> codeisomapping {
{ "aa", "aar" },
{ "af", "afr" },
{ "ak", "aka" },
{ "am", "amh" },
{ "ar", "ara" },
{ "as", "asm" },
{ "az", "aze" },
{ "ba", "bak" },
{ "be", "bel" },
{ "bg", "bul" },
{ "bm", "bam" },
{ "bn", "ben" },
{ "bo", "bod" },
{ "br", "bre" },
{ "bs", "bos" },
{ "ca", "cat" },
{ "ce", "che" },
{ "co", "cos" },
{ "cs", "ces" },
{ "cu", "chu" },
{ "cv", "chv" },
{ "cy", "cym" },
{ "da", "dan" },
{ "de", "deu" },
{ "dv", "div" },
{ "dz", "dzo" },
{ "ee", "ewe" },
{ "el", "ell" },
{ "en", "eng" },
{ "es", "spa" },
{ "et", "est" },
{ "eu", "eus" },
{ "fa", "fas" },
{ "ff", "ful" },
{ "fi", "fin" },
{ "fo", "fao" },
{ "fr", "fra" },
{ "fy", "fry" },
{ "ga", "gle" },
{ "gd", "gla" },
{ "gl", "glg" },
{ "gn", "grn" },
{ "gu", "guj" },
{ "gv", "glv" },
{ "ha", "hau" },
{ "he", "heb" },
{ "hi", "hin" },
{ "hr", "hrv" },
{ "hu", "hun" },
{ "hy", "hye" },
{ "ia", "ina" },
{ "id", "ind" },
{ "ig", "ibo" },
{ "is", "isl" },
{ "it", "ita" },
{ "iu", "iku" },
{ "ja", "jpn" },
{ "jv", "jav" },
{ "ka", "kat" },
{ "ki", "kik" },
{ "kk", "kaz" },
{ "kl", "kal" },
{ "km", "khm" },
{ "kn", "kan" },
{ "ko", "kor" },
{ "ks", "kas" },
{ "ku", "kur" },
{ "kw", "cor" },
{ "ky", "kir" },
{ "lb", "ltz" },
{ "lg", "lug" },
{ "ln", "lin" },
{ "lo", "lao" },
{ "lt", "lit" },
{ "lv", "lav" },
{ "mg", "mlg" },
{ "mi", "mri" },
{ "mk", "mkd" },
{ "ml", "mal" },
{ "mn", "mon" },
{ "mr", "mar" },
{ "ms", "msa" },
{ "mt", "mlt" },
{ "my", "mya" },
{ "nb", "nob" },
{ "ne", "nep" },
{ "nl", "nld" },
{ "nn", "nno" },
{ "no", "nor" },
{ "ny", "nya" },
{ "oc", "oci" },
{ "om", "orm" },
{ "or", "ori" },
{ "os", "oss" },
{ "pa", "pan" },
{ "pl", "pol" },
{ "ps", "pus" },
{ "pt", "por" },
{ "qu", "que" },
{ "rm", "roh" },
{ "rn", "run" },
{ "ro", "ron" },
{ "ru", "rus" },
{ "rw", "kin" },
{ "sa", "san" },
{ "sd", "snd" },
{ "se", "sme" },
{ "sg", "sag" },
{ "si", "sin" },
{ "sk", "slk" },
{ "sl", "slv" },
{ "sn", "sna" },
{ "so", "som" },
{ "sq", "sqi" },
{ "sr", "srp" },
{ "ss", "ssw" },
{ "st", "sot" },
{ "sv", "swe" },
{ "sw", "swa" },
{ "ta", "tam" },
{ "te", "tel" },
{ "tg", "tgk" },
{ "th", "tha" },
{ "ti", "tir" },
{ "tk", "tuk" },
{ "tl", "tgl" },
{ "tn", "tsn" },
{ "to", "ton" },
{ "tr", "tur" },
{ "ts", "tso" },
{ "tt", "tat" },
{ "ug", "uig" },
{ "uk", "ukr" },
{ "ur", "urd" },
{ "uz", "uzb" },
{ "ve", "ven" },
{ "vi", "vie" },
{ "wa", "wln" },
{ "wo", "wol" },
{ "xh", "xho" },
{ "yo", "yor" },
{ "zh", "zho" },
{ "zu", "zul" }
};
void kiwix::sleep(unsigned int milliseconds)
{
#ifdef _WIN32
Sleep(milliseconds);
#else
usleep(1000 * milliseconds);
#endif
}
struct XmlStringWriter: pugi::xml_writer
{
std::string result;
virtual void write(const void* data, size_t size){
result.append(static_cast<const char*>(data), size);
}
};
std::string kiwix::nodeToString(pugi::xml_node node)
{
XmlStringWriter writer;
node.print(writer, " ");
return writer.result;
}
std::string kiwix::converta2toa3(const std::string& a2code){
return codeisomapping.at(a2code);
}
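
Note on converta2toa3 above: std::map::at() throws std::out_of_range when the two-letter code is not in the table, so callers need either a try/catch or a lookup that reports failure. A minimal sketch of the second option follows. The names sampleIsoMapping and tryConvertA2ToA3 are hypothetical and not part of kiwix-lib, and the three-entry map only stands in for the full table.

```cpp
#include <map>
#include <string>

// Illustrative subset of the mapping; the real table is far longer.
static const std::map<std::string, std::string> sampleIsoMapping = {
  { "fr", "fra" },
  { "ja", "jpn" },
  { "zh", "zho" }
};

// Hypothetical helper (an assumption, not kiwix-lib API): returns true and
// fills `a3code` when the two-letter code is known; returns false and leaves
// `a3code` untouched otherwise.
bool tryConvertA2ToA3(const std::string& a2code, std::string& a3code)
{
  auto it = sampleIsoMapping.find(a2code);
  if (it == sampleIsoMapping.end()) {
    return false;
  }
  a3code = it->second;
  return true;
}
```

A caller could then fall back to the original two-letter code, or skip the entry, when the helper returns false.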

335
src/tools/pathTools.cpp Normal file

@@ -0,0 +1,335 @@
/*
* Copyright 2011-2014 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <tools/pathTools.h>
#ifdef __APPLE__
#include <limits.h>
#include <mach-o/dyld.h>
#elif _WIN32
#include <direct.h>
#include <windows.h>
#include "shlwapi.h"
#define getcwd _getcwd // stupid MSFT "deprecation" warning
#endif
#ifdef _WIN32
const std::string SEPARATOR("\\");
#else
#include <unistd.h>
const std::string SEPARATOR("/");
#endif
#include <stdlib.h>
#ifndef PATH_MAX
#define PATH_MAX 1024
#endif
bool isRelativePath(const string& path)
{
#ifdef _WIN32
return path.empty() || path.substr(1, 2) == ":\\" ? false : true;
#else
return path.empty() || path.substr(0, 1) == "/" ? false : true;
#endif
}
string computeRelativePath(const string path, const string absolutePath)
{
std::vector<std::string> pathParts = kiwix::split(path, SEPARATOR);
std::vector<std::string> absolutePathParts
= kiwix::split(absolutePath, SEPARATOR);
unsigned int commonCount = 0;
while (commonCount < pathParts.size()
&& commonCount < absolutePathParts.size()
&& pathParts[commonCount] == absolutePathParts[commonCount]) {
commonCount++;
}
string relativePath;
#ifdef _WIN32
/* On Windows you have a token more because the root is represented
by a letter */
if (commonCount == 0) {
relativePath = ".." + SEPARATOR;
}
#endif
for (unsigned int i = commonCount; i < pathParts.size(); i++) {
relativePath += ".." + SEPARATOR;
}
for (unsigned int i = commonCount; i < absolutePathParts.size(); i++) {
relativePath += absolutePathParts[i];
relativePath += i + 1 < absolutePathParts.size() ? SEPARATOR : "";
}
return relativePath;
}
/* Warning: the relative path must use forward slashes ('/'), even on Windows */
string computeAbsolutePath(const string path, const string relativePath)
{
string absolutePath;
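if (path.empty()) {
/* getcwd(NULL, 0) allocates the buffer with malloc(); use a distinct name
so that the `path` parameter is not shadowed */
char* cwd = NULL;
#ifdef _WIN32
cwd = _getcwd(NULL, 0);
#else
cwd = getcwd(NULL, 0);
#endif
absolutePath = (cwd == NULL ? string() : string(cwd)) + SEPARATOR;
free(cwd);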
} else {
absolutePath = path.substr(path.length() - 1, 1) == SEPARATOR
? path
: path + SEPARATOR;
}
#if _WIN32
char* cRelativePath = _strdup(relativePath.c_str());
#else
char* cRelativePath = strdup(relativePath.c_str());
#endif
char* token = strtok(cRelativePath, "/");
while (token != NULL) {
if (string(token) == "..") {
absolutePath = removeLastPathElement(absolutePath, true, false);
token = strtok(NULL, "/");
} else if (strcmp(token, ".") && strcmp(token, "")) {
absolutePath += string(token);
token = strtok(NULL, "/");
if (token != NULL) {
absolutePath += SEPARATOR;
}
} else {
token = strtok(NULL, "/");
}
}
free(cRelativePath); /* allocated by strdup()/_strdup() above */
return absolutePath;
}
string removeLastPathElement(const string path,
const bool removePreSeparator,
const bool removePostSeparator)
{
string newPath = path;
size_t offset = newPath.find_last_of(SEPARATOR);
if (removePreSeparator &&
#ifndef _WIN32
offset != newPath.find_first_of(SEPARATOR) &&
#endif
offset == newPath.length() - 1) {
newPath = newPath.substr(0, offset);
offset = newPath.find_last_of(SEPARATOR);
}
newPath = removePostSeparator ? newPath.substr(0, offset)
: newPath.substr(0, offset + 1);
return newPath;
}
string appendToDirectory(const string& directoryPath, const string& filename)
{
string newPath = directoryPath + SEPARATOR + filename;
return newPath;
}
string getLastPathElement(const string& path)
{
return path.substr(path.find_last_of(SEPARATOR) + 1);
}
unsigned int getFileSize(const string& path)
{
#ifdef _WIN32
struct _stat filestatus;
if (_stat(path.c_str(), &filestatus) != 0) {
return 0;
}
#else
struct stat filestatus;
if (stat(path.c_str(), &filestatus) != 0) {
return 0;
}
#endif
/* The size is returned in KiB, not in bytes */
return filestatus.st_size / 1024;
}
string getFileSizeAsString(const string& path)
{
ostringstream convert;
convert << getFileSize(path);
return convert.str();
}
string getFileContent(const string& path)
{
std::ifstream f(path, std::ios::in|std::ios::ate);
std::string content;
if (f.is_open()) {
auto size = f.tellg();
content.reserve(size);
f.seekg(0, std::ios::beg);
content.assign((std::istreambuf_iterator<char>(f)),
std::istreambuf_iterator<char>());
}
return content;
}
bool fileExists(const string& path)
{
#ifdef _WIN32
return PathFileExists(path.c_str());
#else
bool flag = false;
fstream fin;
fin.open(path.c_str(), ios::in);
if (fin.is_open()) {
flag = true;
}
fin.close();
return flag;
#endif
}
bool makeDirectory(const string& path)
{
#ifdef _WIN32
int status = _mkdir(path.c_str());
#else
int status = mkdir(path.c_str(), S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
#endif
return status == 0;
}
string makeTmpDirectory()
{
#ifdef _WIN32
char cbase[MAX_PATH+1];
int base_len = GetTempPath(MAX_PATH+1, cbase);
UUID uuid;
UuidCreate(&uuid);
char* dir_name;
UuidToString(&uuid, reinterpret_cast<unsigned char**>(&dir_name));
string dir(cbase, base_len);
dir += dir_name;
_mkdir(dir.c_str());
RpcStringFree(reinterpret_cast<unsigned char**>(&dir_name));
#else
string base = "/tmp";
auto _template = base + "/kiwix-lib_XXXXXX";
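char* _template_array = new char[_template.size()+1];
/* Copy the terminating NUL as well: mkdtemp() expects a C string */
memcpy(_template_array, _template.c_str(), _template.size()+1);
char* created = mkdtemp(_template_array);
/* mkdtemp() returns NULL on failure; return an empty path in that case */
string dir = created == NULL ? "" : created;
delete[] _template_array;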
#endif
return dir;
}
/* Try to create a hard link; if that does not work, make a copy */
bool copyFile(const string& sourcePath, const string& destPath)
{
try {
#ifndef _WIN32
if (link(sourcePath.c_str(), destPath.c_str()) != 0) {
#endif
std::ifstream infile(sourcePath.c_str(), std::ios_base::binary);
std::ofstream outfile(destPath.c_str(), std::ios_base::binary);
outfile << infile.rdbuf();
#ifndef _WIN32
}
#endif
} catch (exception& e) {
cerr << e.what() << endl;
return false;
}
return true;
}
string getExecutablePath()
{
char binRootPath[PATH_MAX];
#ifdef _WIN32
GetModuleFileName(NULL, binRootPath, PATH_MAX);
return std::string(binRootPath);
#elif __APPLE__
uint32_t max = (uint32_t)PATH_MAX;
_NSGetExecutablePath(binRootPath, &max);
return std::string(binRootPath);
#else
ssize_t size = readlink("/proc/self/exe", binRootPath, PATH_MAX);
if (size != -1) {
return std::string(binRootPath, size);
}
#endif
return "";
}
bool writeTextFile(const string& path, const string& content)
{
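std::ofstream file(path.c_str());
if (!file.is_open()) {
return false;
}
file << content;
file.close();
/* Report write or flush errors instead of always returning true */
return !file.fail();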
}
string getCurrentDirectory()
{
char* a_cwd = getcwd(NULL, 0);
string s_cwd(a_cwd);
free(a_cwd);
return s_cwd;
}
string getDataDirectory()
{
#ifdef _WIN32
char* cDataDir = ::getenv("APPDATA");
#else
char* cDataDir = ::getenv("KIWIX_DATA_DIR");
#endif
std::string dataDir = cDataDir==nullptr ? "" : cDataDir;
if (!dataDir.empty())
return dataDir;
#ifdef _WIN32
cDataDir = ::getenv("USERPROFILE");
dataDir = cDataDir==nullptr ? getCurrentDirectory() : cDataDir;
#else
cDataDir = ::getenv("XDG_DATA_HOME");
dataDir = cDataDir==nullptr ? "" : cDataDir;
if (dataDir.empty()) {
cDataDir = ::getenv("HOME");
dataDir = cDataDir==nullptr ? getCurrentDirectory() : cDataDir;
dataDir = appendToDirectory(dataDir, ".local");
dataDir = appendToDirectory(dataDir, "share");
}
#endif
return appendToDirectory(dataDir, "kiwix");
}
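
Worked example for computeRelativePath above: the first argument is treated as a directory, both paths are split on the separator, the common leading components are dropped, one ".." is emitted per remaining component of the base, and the rest of the target is appended. The sketch below restates that logic in a self-contained form for POSIX-style paths only. splitPath and relativeTo are hypothetical names, and splitPath only approximates kiwix::split (it drops empty tokens).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for kiwix::split: cuts on '/' and drops empty tokens.
std::vector<std::string> splitPath(const std::string& path)
{
  std::vector<std::string> parts;
  std::string::size_type start = path.find_first_not_of('/');
  while (start != std::string::npos) {
    std::string::size_type end = path.find('/', start);
    parts.push_back(path.substr(start, end - start));
    start = path.find_first_not_of('/', end);
  }
  return parts;
}

// Sketch of the common-prefix algorithm (assumption: '/' separators only).
std::string relativeTo(const std::string& baseDir, const std::string& target)
{
  const std::vector<std::string> base = splitPath(baseDir);
  const std::vector<std::string> dest = splitPath(target);

  // Count the leading components shared by both paths.
  std::size_t common = 0;
  while (common < base.size() && common < dest.size()
         && base[common] == dest[common]) {
    ++common;
  }

  // Climb out of what is left of the base directory...
  std::string result;
  for (std::size_t i = common; i < base.size(); ++i) {
    result += "../";
  }
  // ...then descend into what is left of the target.
  for (std::size_t i = common; i < dest.size(); ++i) {
    result += dest[i];
    if (i + 1 < dest.size()) {
      result += "/";
    }
  }
  return result;
}
```

With these definitions, relativeTo("/home/user/docs", "/home/user/zim/wiki.zim") yields "../zim/wiki.zim".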


@@ -17,14 +17,15 @@
* MA 02110-1301, USA.
*/
#include <common/regexTools.h>
#include <tools/regexTools.h>
std::map<std::string, RegexMatcher*> regexCache;
std::map<std::string, icu::RegexMatcher*> regexCache;
icu::RegexMatcher* buildRegex(const std::string& regex)
{
icu::RegexMatcher* matcher;
auto itr = regexCache.find(regex);
RegexMatcher *buildRegex(const std::string &regex) {
RegexMatcher *matcher;
std::map<std::string, RegexMatcher*>::iterator itr = regexCache.find(regex);
/* Regex is in cache */
if (itr != regexCache.end()) {
matcher = itr->second;
@@ -33,8 +34,8 @@ RegexMatcher *buildRegex(const std::string &regex) {
/* Regex needs to be parsed (and cached) */
else {
UErrorCode status = U_ZERO_ERROR;
UnicodeString uregex = UnicodeString(regex.c_str());
matcher = new RegexMatcher(uregex, UREGEX_CASE_INSENSITIVE, status);
icu::UnicodeString uregex(regex.c_str());
matcher = new icu::RegexMatcher(uregex, UREGEX_CASE_INSENSITIVE, status);
regexCache[regex] = matcher;
}
@@ -42,40 +43,47 @@ RegexMatcher *buildRegex(const std::string &regex) {
}
/* todo */
void freeRegexCache() {
void freeRegexCache()
{
}
bool matchRegex(const std::string &content, const std::string &regex) {
bool matchRegex(const std::string& content, const std::string& regex)
{
ucnv_setDefaultName("UTF-8");
UnicodeString ucontent = UnicodeString(content.c_str());
RegexMatcher *matcher = buildRegex(regex);
icu::UnicodeString ucontent(content.c_str());
auto matcher = buildRegex(regex);
matcher->reset(ucontent);
return matcher->find();
}
std::string replaceRegex(const std::string &content, const std::string &replacement, const std::string &regex) {
std::string replaceRegex(const std::string& content,
const std::string& replacement,
const std::string& regex)
{
ucnv_setDefaultName("UTF-8");
UnicodeString ucontent = UnicodeString(content.c_str());
UnicodeString ureplacement = UnicodeString(replacement.c_str());
RegexMatcher *matcher = buildRegex(regex);
icu::UnicodeString ucontent(content.c_str());
icu::UnicodeString ureplacement(replacement.c_str());
auto matcher = buildRegex(regex);
matcher->reset(ucontent);
UErrorCode status = U_ZERO_ERROR;
UnicodeString uresult = matcher->replaceAll(ureplacement, status);
auto uresult = matcher->replaceAll(ureplacement, status);
std::string tmp;
uresult.toUTF8String(tmp);
return tmp;
}
std::string appendToFirstOccurence(const std::string &content, const std::string regex, const std::string &replacement) {
std::string appendToFirstOccurence(const std::string& content,
const std::string regex,
const std::string& replacement)
{
ucnv_setDefaultName("UTF-8");
UnicodeString ucontent = UnicodeString(content.c_str());
UnicodeString ureplacement = UnicodeString(replacement.c_str());
RegexMatcher *matcher = buildRegex(regex);
icu::UnicodeString ucontent(content.c_str());
icu::UnicodeString ureplacement(replacement.c_str());
auto matcher = buildRegex(regex);
matcher->reset(ucontent);
if (matcher->find()) {
UErrorCode status = U_ZERO_ERROR;
ucontent.insert(matcher->end(status), ureplacement);
ucontent.insert(matcher->end(status), ureplacement);
std::string tmp;
ucontent.toUTF8String(tmp);
return tmp;
@@ -83,4 +91,3 @@ std::string appendToFirstOccurence(const std::string &content, const std::strin
return content;
}

374
src/tools/stringTools.cpp Normal file

@@ -0,0 +1,374 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <tools/stringTools.h>
#include <unicode/normlzr.h>
#include <unicode/rep.h>
#include <unicode/translit.h>
#include <unicode/ucnv.h>
#include <unicode/uniset.h>
#include <unicode/ustring.h>
/* tell ICU where to find its dat file (tables) */
void kiwix::loadICUExternalTables()
{
#ifdef __APPLE__
std::string executablePath = getExecutablePath();
std::string executableDirectory = removeLastPathElement(executablePath);
std::string datPath
= computeAbsolutePath(executableDirectory, "icudt58l.dat");
try {
u_setDataDirectory(datPath.c_str());
} catch (exception& e) {
std::cerr << e.what() << std::endl;
}
#endif
}
std::string kiwix::removeAccents(const std::string& text)
{
loadICUExternalTables();
ucnv_setDefaultName("UTF-8");
UErrorCode status = U_ZERO_ERROR;
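auto removeAccentsTrans = icu::Transliterator::createInstance(
"Lower; NFD; [:M:] remove; NFC", UTRANS_FORWARD, status);
if (U_FAILURE(status) || removeAccentsTrans == nullptr) {
/* No transliterator available: return the input unchanged */
return text;
}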
icu::UnicodeString ustring(text.c_str());
removeAccentsTrans->transliterate(ustring);
delete removeAccentsTrans;
std::string unaccentedText;
ustring.toUTF8String(unaccentedText);
return unaccentedText;
}
/* Prepare integer for display */
std::string kiwix::beautifyInteger(uint64_t number)
{
std::stringstream numberStream;
numberStream << number;
std::string numberString = numberStream.str();
signed int offset = numberString.size() - 3;
while (offset > 0) {
numberString.insert(offset, ",");
offset -= 3;
}
return numberString;
}
std::string kiwix::beautifyFileSize(uint64_t number)
{
std::stringstream ss;
ss << std::fixed << std::setprecision(2);
if (number>>30)
ss << (number/(1024.0*1024*1024)) << " GB";
else if (number>>20)
ss << (number/(1024.0*1024)) << " MB";
else if (number>>10)
ss << (number/1024.0) << " KB";
else
ss << number << " B";
return ss.str();
}
void kiwix::printStringInHexadecimal(icu::UnicodeString s)
{
std::cout << std::showbase << std::hex;
for (int i = 0; i < s.length(); i++) {
char c = (char)((s.getTerminatedBuffer())[i]);
if (c & 0x80) {
std::cout << (c & 0xffff) << " ";
} else {
std::cout << c << " ";
}
}
std::cout << std::endl;
}
void kiwix::printStringInHexadecimal(const char* s)
{
std::cout << std::showbase << std::hex;
for (char const* pc = s; *pc; ++pc) {
if (*pc & 0x80) {
std::cout << (*pc & 0xffff);
} else {
std::cout << *pc;
}
std::cout << ' ';
}
std::cout << std::endl;
}
void kiwix::stringReplacement(std::string& str,
const std::string& oldStr,
const std::string& newStr)
{
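if (oldStr.empty()) {
/* find("") matches at every position, so the loop below would never end */
return;
}
size_t pos = 0;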
while ((pos = str.find(oldStr, pos)) != std::string::npos) {
str.replace(pos, oldStr.length(), newStr);
pos += newStr.length();
}
}
/* Escape angle brackets ('<' and '>') to reduce the risk of XSS attacks.
Note that '&' and quote characters are left untouched. */
std::string kiwix::encodeDiples(const std::string& str)
{
std::string result = str;
kiwix::stringReplacement(result, "<", "&lt;");
kiwix::stringReplacement(result, ">", "&gt;");
return result;
}
/* urlEncode() based on javascript encodeURI() &
encodeURIComponent(). Mostly code from rstudio/httpuv (GPLv3) */
bool isReservedUrlChar(char c)
{
switch (c) {
case ';':
case ',':
case '/':
case '?':
case ':':
case '@':
case '&':
case '=':
case '+':
case '$':
return true;
default:
return false;
}
}
bool needsEscape(char c, bool encodeReserved)
{
if (c >= 'a' && c <= 'z')
return false;
if (c >= 'A' && c <= 'Z')
return false;
if (c >= '0' && c <= '9')
return false;
if (isReservedUrlChar(c))
return encodeReserved;
switch (c) {
case '-':
case '_':
case '.':
case '!':
case '~':
case '*':
case '\'':
case '(':
case ')':
return false;
}
return true;
}
int hexToInt(char c) {
switch (c) {
case '0': return 0;
case '1': return 1;
case '2': return 2;
case '3': return 3;
case '4': return 4;
case '5': return 5;
case '6': return 6;
case '7': return 7;
case '8': return 8;
case '9': return 9;
case 'A': case 'a': return 10;
case 'B': case 'b': return 11;
case 'C': case 'c': return 12;
case 'D': case 'd': return 13;
case 'E': case 'e': return 14;
case 'F': case 'f': return 15;
default: return -1;
}
}
std::string kiwix::urlEncode(const std::string& value, bool encodeReserved)
{
std::ostringstream os;
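/* setfill('0') keeps escapes two digits wide, e.g. "%0A" rather than "% A" */
os << std::hex << std::uppercase << std::setfill('0');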
for (std::string::const_iterator it = value.begin();
it != value.end();
it++) {
if (!needsEscape(*it, encodeReserved)) {
os << *it;
} else {
os << '%' << std::setw(2) << static_cast<unsigned int>(static_cast<unsigned char>(*it));
}
}
return os.str();
}
std::string kiwix::urlDecode(const std::string& value, bool component)
{
std::ostringstream os;
for (std::string::const_iterator it = value.begin();
it != value.end();
it++) {
// If there aren't enough characters left for this to be a
// valid escape code, just use the character and move on
if (value.end() - it < 3) {
os << *it;
continue;
}
if (*it == '%') {
char hi = *(++it);
char lo = *(++it);
int iHi = hexToInt(hi);
int iLo = hexToInt(lo);
if (iHi < 0 || iLo < 0) {
// Invalid escape sequence
os << '%' << hi << lo;
continue;
}
char c = (char)(iHi << 4 | iLo);
if (!component && isReservedUrlChar(c)) {
os << '%' << hi << lo;
} else {
os << c;
}
} else {
os << *it;
}
}
return os.str();
}
/* Split a string into an array of tokens */
std::vector<std::string> kiwix::split(const std::string& str,
const std::string& delims = " *-")
{
std::string::size_type lastPos = str.find_first_not_of(delims, 0);
std::string::size_type pos = str.find_first_of(delims, lastPos);
std::vector<std::string> tokens;
while (std::string::npos != pos || std::string::npos != lastPos) {
tokens.push_back(str.substr(lastPos, pos - lastPos));
lastPos = str.find_first_not_of(delims, pos);
pos = str.find_first_of(delims, lastPos);
}
return tokens;
}
std::vector<std::string> kiwix::split(const char* lhs, const char* rhs)
{
const std::string m1(lhs), m2(rhs);
return split(m1, m2);
}
std::vector<std::string> kiwix::split(const char* lhs, const std::string& rhs)
{
return split(lhs, rhs.c_str());
}
std::vector<std::string> kiwix::split(const std::string& lhs, const char* rhs)
{
return split(lhs.c_str(), rhs);
}
std::string kiwix::ucFirst(const std::string& word)
{
if (word.empty()) {
return "";
}
std::string result;
icu::UnicodeString unicodeWord(word.c_str());
auto unicodeFirstLetter = icu::UnicodeString(unicodeWord, 0, 1).toUpper();
unicodeWord.replace(0, 1, unicodeFirstLetter);
unicodeWord.toUTF8String(result);
return result;
}
std::string kiwix::ucAll(const std::string& word)
{
if (word.empty()) {
return "";
}
std::string result;
icu::UnicodeString unicodeWord(word.c_str());
unicodeWord.toUpper().toUTF8String(result);
return result;
}
std::string kiwix::lcFirst(const std::string& word)
{
if (word.empty()) {
return "";
}
std::string result;
icu::UnicodeString unicodeWord(word.c_str());
auto unicodeFirstLetter = icu::UnicodeString(unicodeWord, 0, 1).toLower();
unicodeWord.replace(0, 1, unicodeFirstLetter);
unicodeWord.toUTF8String(result);
return result;
}
std::string kiwix::lcAll(const std::string& word)
{
if (word.empty()) {
return "";
}
std::string result;
icu::UnicodeString unicodeWord(word.c_str());
unicodeWord.toLower().toUTF8String(result);
return result;
}
std::string kiwix::toTitle(const std::string& word)
{
if (word.empty()) {
return "";
}
std::string result;
icu::UnicodeString unicodeWord(word.c_str());
unicodeWord = unicodeWord.toTitle(0);
unicodeWord.toUTF8String(result);
return result;
}
std::string kiwix::normalize(const std::string& word)
{
return kiwix::lcAll(word);
}
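
Note on kiwix::urlEncode above: each escaped byte is written with std::setw(2), and a percent escape must always be exactly two hexadecimal digits, so the stream also needs '0' as its fill character for bytes below 0x10. The sketch below shows the idea in isolation. percentEncode is a hypothetical helper, not the kiwix-lib function, and for brevity it escapes every byte that is not an ASCII letter or digit.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (an assumption, not kiwix-lib API): percent-encodes
// every byte except ASCII letters and digits.
std::string percentEncode(const std::string& value)
{
  std::ostringstream os;
  // Hex output, upper-case digits, and '0' padding for single-digit values.
  os << std::hex << std::uppercase << std::setfill('0');
  for (unsigned char c : value) {
    const bool alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
    if (alnum) {
      os << static_cast<char>(c);
    } else {
      // setw applies only to the next insertion, here the byte value.
      os << '%' << std::setw(2) << static_cast<unsigned int>(c);
    }
  }
  return os.str();
}
```

With this sketch a line feed is encoded as "%0A" and a space as "%20"; without the fill character the line feed would come out as "% A".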


@@ -1,373 +0,0 @@
/* htmlparse.cc: simple HTML parser for omega indexer
*
* Copyright 1999,2000,2001 BrightStation PLC
* Copyright 2001 Ananova Ltd
* Copyright 2002,2006,2007,2008 Olly Betts
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
* USA
*/
// #include <config.h>
#include "htmlparse.h"
#include <xapian.h>
// #include "utf8convert.h"
#include <algorithm>
#include <ctype.h>
#include <cstring>
#include <stdio.h>
#include <stdlib.h>
using namespace std;
inline void
lowercase_string(string &str)
{
for (string::iterator i = str.begin(); i != str.end(); ++i) {
*i = tolower(static_cast<unsigned char>(*i));
}
}
map<string, unsigned int> HtmlParser::named_ents;
inline static bool
p_notdigit(char c)
{
return !isdigit(static_cast<unsigned char>(c));
}
inline static bool
p_notxdigit(char c)
{
return !isxdigit(static_cast<unsigned char>(c));
}
inline static bool
p_notalnum(char c)
{
return !isalnum(static_cast<unsigned char>(c));
}
inline static bool
p_notwhitespace(char c)
{
return !isspace(static_cast<unsigned char>(c));
}
inline static bool
p_nottag(char c)
{
return !isalnum(static_cast<unsigned char>(c)) &&
c != '.' && c != '-' && c != ':'; // ':' for XML namespaces.
}
inline static bool
p_whitespacegt(char c)
{
return isspace(static_cast<unsigned char>(c)) || c == '>';
}
inline static bool
p_whitespaceeqgt(char c)
{
return isspace(static_cast<unsigned char>(c)) || c == '=' || c == '>';
}
bool
HtmlParser::get_parameter(const string & param, string & value)
{
map<string, string>::const_iterator i = parameters.find(param);
if (i == parameters.end()) return false;
value = i->second;
return true;
}
HtmlParser::HtmlParser()
{
static const struct ent { const char *n; unsigned int v; } ents[] = {
#include "namedentities.h"
{ NULL, 0 }
};
if (named_ents.empty()) {
const struct ent *i = ents;
while (i->n) {
named_ents[string(i->n)] = i->v;
++i;
}
}
}
void
HtmlParser::decode_entities(string &s)
{
// We need a const_iterator version of s.end() - otherwise the
// find() and find_if() templates don't work...
string::const_iterator amp = s.begin(), s_end = s.end();
while ((amp = find(amp, s_end, '&')) != s_end) {
unsigned int val = 0;
string::const_iterator end, p = amp + 1;
if (p != s_end && *p == '#') {
p++;
if (p != s_end && (*p == 'x' || *p == 'X')) {
// hex
p++;
end = find_if(p, s_end, p_notxdigit);
sscanf(s.substr(p - s.begin(), end - p).c_str(), "%x", &val);
} else {
// number
end = find_if(p, s_end, p_notdigit);
val = atoi(s.substr(p - s.begin(), end - p).c_str());
}
} else {
end = find_if(p, s_end, p_notalnum);
string code = s.substr(p - s.begin(), end - p);
map<string, unsigned int>::const_iterator i;
i = named_ents.find(code);
if (i != named_ents.end()) val = i->second;
}
if (end < s_end && *end == ';') end++;
if (val) {
string::size_type amp_pos = amp - s.begin();
if (val < 0x80) {
s.replace(amp_pos, end - amp, 1u, char(val));
} else {
// Convert unicode value val to UTF-8.
char seq[4];
unsigned len = Xapian::Unicode::nonascii_to_utf8(val, seq);
s.replace(amp_pos, end - amp, seq, len);
}
s_end = s.end();
// We've modified the string, so the iterators are no longer
// valid...
amp = s.begin() + amp_pos + 1;
} else {
amp = end;
}
}
}
void
HtmlParser::parse_html(const string &body)
{
in_script = false;
parameters.clear();
string::const_iterator start = body.begin();
while (true) {
// Skip through until we find an HTML tag, a comment, or the end of
// document. Ignore isolated occurrences of `<' which don't start
// a tag or comment.
string::const_iterator p = start;
while (true) {
p = find(p, body.end(), '<');
if (p == body.end()) break;
unsigned char ch = *(p + 1);
// Tag, closing tag, or comment (or SGML declaration).
if ((!in_script && isalpha(ch)) || ch == '/' || ch == '!') break;
if (ch == '?') {
// PHP code or XML declaration.
// XML declaration is only valid at the start of the first line.
// FIXME: need to deal with BOMs...
if (p != body.begin() || body.size() < 20) break;
// XML declaration looks something like this:
// <?xml version="1.0" encoding="UTF-8"?>
if (p[2] != 'x' || p[3] != 'm' || p[4] != 'l') break;
if (strchr(" \t\r\n", p[5]) == NULL) break;
string::const_iterator decl_end = find(p + 6, body.end(), '?');
if (decl_end == body.end()) break;
// Default charset for XML is UTF-8.
charset = "UTF-8";
string decl(p + 6, decl_end);
size_t enc = decl.find("encoding");
if (enc == string::npos) break;
enc = decl.find_first_not_of(" \t\r\n", enc + 8);
if (enc == string::npos || enc == decl.size()) break;
if (decl[enc] != '=') break;
enc = decl.find_first_not_of(" \t\r\n", enc + 1);
if (enc == string::npos || enc == decl.size()) break;
if (decl[enc] != '"' && decl[enc] != '\'') break;
char quote = decl[enc++];
size_t enc_end = decl.find(quote, enc);
if (enc != string::npos)
charset = decl.substr(enc, enc_end - enc);
break;
}
p++;
}
// Process text up to start of tag.
if (p > start) {
string text = body.substr(start - body.begin(), p - start);
// convert_to_utf8(text, charset);
decode_entities(text);
process_text(text);
}
if (p == body.end()) break;
start = p + 1;
if (start == body.end()) break;
if (*start == '!') {
if (++start == body.end()) break;
if (++start == body.end()) break;
// comment or SGML declaration
if (*(start - 1) == '-' && *start == '-') {
++start;
string::const_iterator close = find(start, body.end(), '>');
// An unterminated comment swallows rest of document
// (like Netscape, but unlike MSIE IIRC)
if (close == body.end()) break;
p = close;
// look for -->
while (p != body.end() && (*(p - 1) != '-' || *(p - 2) != '-'))
p = find(p + 1, body.end(), '>');
if (p != body.end()) {
// Check for htdig's "ignore this bit" comments.
if (p - start == 15 && string(start, p - 2) == "htdig_noindex") {
string::size_type i;
i = body.find("<!--/htdig_noindex-->", p + 1 - body.begin());
if (i == string::npos) break;
start = body.begin() + i + 21;
continue;
}
// If we found --> skip to there.
start = p;
} else {
// Otherwise skip to the first > we found (as Netscape does).
start = close;
}
} else {
// just an SGML declaration, perhaps giving the DTD - ignore it
start = find(start - 1, body.end(), '>');
if (start == body.end()) break;
}
++start;
} else if (*start == '?') {
if (++start == body.end()) break;
// PHP - swallow until ?> or EOF
start = find(start + 1, body.end(), '>');
// look for ?>
while (start != body.end() && *(start - 1) != '?')
start = find(start + 1, body.end(), '>');
// unterminated PHP swallows rest of document (rather arbitrarily
// but it avoids polluting the database when things go wrong)
if (start != body.end()) ++start;
} else {
// opening or closing tag
int closing = 0;
if (*start == '/') {
closing = 1;
start = find_if(start + 1, body.end(), p_notwhitespace);
}
p = start;
start = find_if(start, body.end(), p_nottag);
string tag = body.substr(p - body.begin(), start - p);
// convert tagname to lowercase
lowercase_string(tag);
if (closing) {
closing_tag(tag);
if (in_script && tag == "script") in_script = false;
/* ignore any bogus parameters on closing tags */
p = find(start, body.end(), '>');
if (p == body.end()) break;
start = p + 1;
} else {
// FIXME: parse parameters lazily.
while (start < body.end() && *start != '>') {
string name, value;
p = find_if(start, body.end(), p_whitespaceeqgt);
name.assign(body, start - body.begin(), p - start);
p = find_if(p, body.end(), p_notwhitespace);
start = p;
if (start != body.end() && *start == '=') {
start = find_if(start + 1, body.end(), p_notwhitespace);
p = body.end();
int quote = *start;
if (quote == '"' || quote == '\'') {
start++;
p = find(start, body.end(), quote);
}
if (p == body.end()) {
// unquoted or no closing quote
p = find_if(start, body.end(), p_whitespacegt);
}
value.assign(body, start - body.begin(), p - start);
start = find_if(p, body.end(), p_notwhitespace);
if (!name.empty()) {
// convert parameter name to lowercase
lowercase_string(name);
// in case of multiple entries, use the first
// (as Netscape does)
parameters.insert(make_pair(name, value));
}
}
}
#if 0
cout << "<" << tag;
map<string, string>::const_iterator x;
for (x = parameters.begin(); x != parameters.end(); x++) {
cout << " " << x->first << "=\"" << x->second << "\"";
}
cout << ">\n";
#endif
opening_tag(tag);
parameters.clear();
// In <script> tags we ignore opening tags to avoid problems
// with "a<b".
if (tag == "script") in_script = true;
if (start != body.end() && *start == '>') ++start;
}
}
}
}


@@ -1,49 +0,0 @@
/* htmlparse.h: simple HTML parser for omega indexer
*
* Copyright 1999,2000,2001 BrightStation PLC
* Copyright 2002,2006,2008 Olly Betts
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
* USA
*/
#ifndef OMEGA_INCLUDED_HTMLPARSE_H
#define OMEGA_INCLUDED_HTMLPARSE_H
#include <string>
#include <map>
using std::string;
using std::map;
class HtmlParser {
map<string, string> parameters;
protected:
void decode_entities(string &s);
bool in_script;
string charset;
static map<string, unsigned int> named_ents;
bool get_parameter(const string & param, string & value);
public:
virtual void process_text(const string &/*text*/) { }
virtual void opening_tag(const string &/*tag*/) { }
virtual void closing_tag(const string &/*tag*/) { }
virtual void parse_html(const string &text);
HtmlParser();
virtual ~HtmlParser() { }
};
#endif // OMEGA_INCLUDED_HTMLPARSE_H


@@ -1,302 +0,0 @@
/* myhtmlparse.cc: subclass of HtmlParser for extracting text.
*
* Copyright 1999,2000,2001 BrightStation PLC
* Copyright 2002,2003,2004,2006,2007,2008 Olly Betts
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
* USA
*/
// #include <config.h>
#include "myhtmlparse.h"
// #include "utf8convert.h"
#include <ctype.h>
#include <string.h>
inline void
lowercase_string(string &str)
{
for (string::iterator i = str.begin(); i != str.end(); ++i) {
*i = tolower(static_cast<unsigned char>(*i));
}
}
void
MyHtmlParser::parse_html(const string &text, const string &charset_,
bool charset_from_meta_)
{
charset = charset_;
charset_from_meta = charset_from_meta_;
HtmlParser::parse_html(text);
}
void
MyHtmlParser::process_text(const string &text)
{
if (!text.empty() && !in_script_tag && !in_style_tag) {
string::size_type b = text.find_first_not_of(WHITESPACE);
if (b) pending_space = true;
while (b != string::npos) {
if (pending_space && !dump.empty()) dump += ' ';
string::size_type e = text.find_first_of(WHITESPACE, b);
pending_space = (e != string::npos);
if (!pending_space) {
dump.append(text.data() + b, text.size() - b);
return;
}
dump.append(text.data() + b, e - b);
b = text.find_first_not_of(WHITESPACE, e + 1);
}
}
}
void
MyHtmlParser::opening_tag(const string &tag)
{
if (tag.empty()) return;
switch (tag[0]) {
case 'a':
if (tag == "address") pending_space = true;
break;
case 'b':
if (tag == "body") {
dump.resize(0);
break;
}
if (tag == "blockquote" || tag == "br") pending_space = true;
break;
case 'c':
if (tag == "center") pending_space = true;
break;
case 'd':
if (tag == "dd" || tag == "dir" || tag == "div" || tag == "dl" ||
tag == "dt") pending_space = true;
break;
case 'e':
if (tag == "embed") pending_space = true;
break;
case 'f':
if (tag == "fieldset" || tag == "form") pending_space = true;
break;
case 'h':
// hr, and h1, ..., h6
if (tag.length() == 2 && strchr("r123456", tag[1]))
pending_space = true;
break;
case 'i':
if (tag == "iframe" || tag == "img" || tag == "isindex" ||
tag == "input") pending_space = true;
break;
case 'k':
if (tag == "keygen") pending_space = true;
break;
case 'l':
if (tag == "legend" || tag == "li" || tag == "listing")
pending_space = true;
break;
case 'm':
if (tag == "meta") {
string content;
if (get_parameter("content", content)) {
string name;
if (get_parameter("name", name)) {
lowercase_string(name);
if (name == "description") {
if (sample.empty()) {
swap(sample, content);
// convert_to_utf8(sample, charset);
decode_entities(sample);
}
} else if (name == "keywords") {
if (!keywords.empty()) keywords += ' ';
// convert_to_utf8(content, charset);
decode_entities(content);
keywords += content;
} else if (name == "robots") {
decode_entities(content);
lowercase_string(content);
if (content.find("none") != string::npos ||
content.find("noindex") != string::npos) {
indexing_allowed = false;
throw true;
}
}
break;
}
// If the current charset came from a meta tag, don't
// force reparsing again!
if (charset_from_meta) break;
string hdr;
if (get_parameter("http-equiv", hdr)) {
lowercase_string(hdr);
if (hdr == "content-type") {
lowercase_string(content);
size_t start = content.find("charset=");
if (start == string::npos) break;
start += 8;
if (start == content.size()) break;
size_t end = start;
if (content[start] != '"') {
while (end < content.size()) {
unsigned char ch = content[end];
if (ch <= 32 || ch >= 127 ||
strchr(";()<>@,:\\\"/[]?={}", ch))
break;
++end;
}
} else {
++start;
++end;
while (end < content.size()) {
unsigned char ch = content[end];
if (ch == '"') break;
if (ch == '\\') content.erase(end, 1);
++end;
}
}
string newcharset(content, start, end - start);
if (charset != newcharset) {
throw newcharset;
}
}
}
break;
}
if (charset_from_meta) break;
string newcharset;
if (get_parameter("charset", newcharset)) {
// HTML5 added: <meta charset="...">
lowercase_string(newcharset);
if (charset != newcharset) {
throw newcharset;
}
}
break;
}
if (tag == "marquee" || tag == "menu" || tag == "multicol")
pending_space = true;
break;
case 'o':
if (tag == "ol" || tag == "option") pending_space = true;
break;
case 'p':
if (tag == "p" || tag == "pre" || tag == "plaintext")
pending_space = true;
break;
case 'q':
if (tag == "q") pending_space = true;
break;
case 's':
if (tag == "style") {
in_style_tag = true;
break;
}
if (tag == "script") {
in_script_tag = true;
break;
}
if (tag == "select") pending_space = true;
break;
case 't':
if (tag == "table" || tag == "td" || tag == "textarea" ||
tag == "th") pending_space = true;
break;
case 'u':
if (tag == "ul") pending_space = true;
break;
case 'x':
if (tag == "xmp") pending_space = true;
break;
}
}
void
MyHtmlParser::closing_tag(const string &tag)
{
if (tag.empty()) return;
switch (tag[0]) {
case 'a':
if (tag == "address") pending_space = true;
break;
case 'b':
if (tag == "body") {
throw true;
}
if (tag == "blockquote" || tag == "br") pending_space = true;
break;
case 'c':
if (tag == "center") pending_space = true;
break;
case 'd':
if (tag == "dd" || tag == "dir" || tag == "div" || tag == "dl" ||
tag == "dt") pending_space = true;
break;
case 'f':
if (tag == "fieldset" || tag == "form") pending_space = true;
break;
case 'h':
// hr, and h1, ..., h6
if (tag.length() == 2 && strchr("r123456", tag[1]))
pending_space = true;
break;
case 'i':
if (tag == "iframe") pending_space = true;
break;
case 'l':
if (tag == "legend" || tag == "li" || tag == "listing")
pending_space = true;
break;
case 'm':
if (tag == "marquee" || tag == "menu") pending_space = true;
break;
case 'o':
if (tag == "ol" || tag == "option") pending_space = true;
break;
case 'p':
if (tag == "p" || tag == "pre") pending_space = true;
break;
case 'q':
if (tag == "q") pending_space = true;
break;
case 's':
if (tag == "style") {
in_style_tag = false;
break;
}
if (tag == "script") {
in_script_tag = false;
break;
}
if (tag == "select") pending_space = true;
break;
case 't':
if (tag == "title") {
if (title.empty()) swap(title, dump);
break;
}
if (tag == "table" || tag == "td" || tag == "textarea" ||
tag == "th") pending_space = true;
break;
case 'u':
if (tag == "ul") pending_space = true;
break;
case 'x':
if (tag == "xmp") pending_space = true;
break;
}
}
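
The whitespace handling in `MyHtmlParser::process_text` above can be read in isolation. The following is a minimal sketch, assuming a free function named `append_collapsed` (a hypothetical name, not part of the original file) with `dump` and `pending_space` passed by reference in place of the parser's members. The `in_script_tag` / `in_style_tag` checks are left out.

```cpp
#include <string>

// Same whitespace set as the WHITESPACE macro in myhtmlparse.h.
static const char* const kWhitespace = " \t\n\r";

// Hypothetical stand-alone version of the loop in process_text():
// appends `text` to `dump`, turning each run of whitespace into one space.
// `pending_space` carries a "separator needed" flag from one call to the next.
void append_collapsed(const std::string& text, std::string& dump,
                      bool& pending_space)
{
    if (text.empty()) return;
    std::string::size_type b = text.find_first_not_of(kWhitespace);
    // Leading whitespace (or an all-whitespace chunk) requests a separator.
    if (b != 0) pending_space = true;
    while (b != std::string::npos) {
        // Emit the separator only between words, never at the very start.
        if (pending_space && !dump.empty()) dump += ' ';
        std::string::size_type e = text.find_first_of(kWhitespace, b);
        pending_space = (e != std::string::npos);
        if (!pending_space) {
            // Last word runs to the end of the chunk.
            dump.append(text, b, std::string::npos);
            return;
        }
        dump.append(text, b, e - b);
        b = text.find_first_not_of(kWhitespace, e + 1);
    }
}
```

Calling it with `"  Hello \n world  "` on an empty `dump` yields `"Hello world"` and leaves `pending_space` set, so a following call with `"again"` produces `"Hello world again"`. Keeping the flag outside the function is what lets block-level tags such as `<p>` or `<div>` request a separator without writing to `dump` themselves.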


@@ -1,65 +0,0 @@
/* myhtmlparse.h: subclass of HtmlParser for extracting text
*
* Copyright 1999,2000,2001 BrightStation PLC
* Copyright 2002,2003,2004,2006,2008 Olly Betts
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
* USA
*/
#ifndef OMEGA_INCLUDED_MYHTMLPARSE_H
#define OMEGA_INCLUDED_MYHTMLPARSE_H
#include "htmlparse.h"
// FIXME: Should we include \xa0 which is non-breaking space in iso-8859-1, but
// not in all charsets and perhaps spans of all \xa0 should become a single
// \xa0?
#define WHITESPACE " \t\n\r"
class MyHtmlParser : public HtmlParser {
public:
bool in_script_tag;
bool in_style_tag;
bool pending_space;
bool indexing_allowed;
bool charset_from_meta;
string title, sample, keywords, dump;
void process_text(const string &text);
void opening_tag(const string &tag);
void closing_tag(const string &tag);
void parse_html(const string &text, const string &charset_,
bool charset_from_meta_);
MyHtmlParser() :
in_script_tag(false),
in_style_tag(false),
pending_space(false),
indexing_allowed(true),
charset_from_meta(false) { }
void reset() {
in_script_tag = false;
in_style_tag = false;
pending_space = false;
indexing_allowed = true;
charset_from_meta = false;
title.resize(0);
sample.resize(0);
keywords.resize(0);
dump.resize(0);
}
};
#endif // OMEGA_INCLUDED_MYHTMLPARSE_H


@@ -1,279 +0,0 @@
/* namedentities.h: named HTML entities.
*
* Copyright (C) 2006,2007 Olly Betts
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#ifndef OMEGA_INCLUDED_NAMEDENTITIES_H
#define OMEGA_INCLUDED_NAMEDENTITIES_H
// Names and values from: "Character entity references in HTML 4"
// http://www.w3.org/TR/html4/sgml/entities.html
{ "quot", 34 },
{ "amp", 38 },
{ "apos", 39 }, // Not in HTML 4 list but used in OpenOffice XML.
{ "lt", 60 },
{ "gt", 62 },
{ "nbsp", 160 },
{ "iexcl", 161 },
{ "cent", 162 },
{ "pound", 163 },
{ "curren", 164 },
{ "yen", 165 },
{ "brvbar", 166 },
{ "sect", 167 },
{ "uml", 168 },
{ "copy", 169 },
{ "ordf", 170 },
{ "laquo", 171 },
{ "not", 172 },
{ "shy", 173 },
{ "reg", 174 },
{ "macr", 175 },
{ "deg", 176 },
{ "plusmn", 177 },
{ "sup2", 178 },
{ "sup3", 179 },
{ "acute", 180 },
{ "micro", 181 },
{ "para", 182 },
{ "middot", 183 },
{ "cedil", 184 },
{ "sup1", 185 },
{ "ordm", 186 },
{ "raquo", 187 },
{ "frac14", 188 },
{ "frac12", 189 },
{ "frac34", 190 },
{ "iquest", 191 },
{ "Agrave", 192 },
{ "Aacute", 193 },
{ "Acirc", 194 },
{ "Atilde", 195 },
{ "Auml", 196 },
{ "Aring", 197 },
{ "AElig", 198 },
{ "Ccedil", 199 },
{ "Egrave", 200 },
{ "Eacute", 201 },
{ "Ecirc", 202 },
{ "Euml", 203 },
{ "Igrave", 204 },
{ "Iacute", 205 },
{ "Icirc", 206 },
{ "Iuml", 207 },
{ "ETH", 208 },
{ "Ntilde", 209 },
{ "Ograve", 210 },
{ "Oacute", 211 },
{ "Ocirc", 212 },
{ "Otilde", 213 },
{ "Ouml", 214 },
{ "times", 215 },
{ "Oslash", 216 },
{ "Ugrave", 217 },
{ "Uacute", 218 },
{ "Ucirc", 219 },
{ "Uuml", 220 },
{ "Yacute", 221 },
{ "THORN", 222 },
{ "szlig", 223 },
{ "agrave", 224 },
{ "aacute", 225 },
{ "acirc", 226 },
{ "atilde", 227 },
{ "auml", 228 },
{ "aring", 229 },
{ "aelig", 230 },
{ "ccedil", 231 },
{ "egrave", 232 },
{ "eacute", 233 },
{ "ecirc", 234 },
{ "euml", 235 },
{ "igrave", 236 },
{ "iacute", 237 },
{ "icirc", 238 },
{ "iuml", 239 },
{ "eth", 240 },
{ "ntilde", 241 },
{ "ograve", 242 },
{ "oacute", 243 },
{ "ocirc", 244 },
{ "otilde", 245 },
{ "ouml", 246 },
{ "divide", 247 },
{ "oslash", 248 },
{ "ugrave", 249 },
{ "uacute", 250 },
{ "ucirc", 251 },
{ "uuml", 252 },
{ "yacute", 253 },
{ "thorn", 254 },
{ "yuml", 255 },
{ "OElig", 338 },
{ "oelig", 339 },
{ "Scaron", 352 },
{ "scaron", 353 },
{ "Yuml", 376 },
{ "fnof", 402 },
{ "circ", 710 },
{ "tilde", 732 },
{ "Alpha", 913 },
{ "Beta", 914 },
{ "Gamma", 915 },
{ "Delta", 916 },
{ "Epsilon", 917 },
{ "Zeta", 918 },
{ "Eta", 919 },
{ "Theta", 920 },
{ "Iota", 921 },
{ "Kappa", 922 },
{ "Lambda", 923 },
{ "Mu", 924 },
{ "Nu", 925 },
{ "Xi", 926 },
{ "Omicron", 927 },
{ "Pi", 928 },
{ "Rho", 929 },
{ "Sigma", 931 },
{ "Tau", 932 },
{ "Upsilon", 933 },
{ "Phi", 934 },
{ "Chi", 935 },
{ "Psi", 936 },
{ "Omega", 937 },
{ "alpha", 945 },
{ "beta", 946 },
{ "gamma", 947 },
{ "delta", 948 },
{ "epsilon", 949 },
{ "zeta", 950 },
{ "eta", 951 },
{ "theta", 952 },
{ "iota", 953 },
{ "kappa", 954 },
{ "lambda", 955 },
{ "mu", 956 },
{ "nu", 957 },
{ "xi", 958 },
{ "omicron", 959 },
{ "pi", 960 },
{ "rho", 961 },
{ "sigmaf", 962 },
{ "sigma", 963 },
{ "tau", 964 },
{ "upsilon", 965 },
{ "phi", 966 },
{ "chi", 967 },
{ "psi", 968 },
{ "omega", 969 },
{ "thetasym", 977 },
{ "upsih", 978 },
{ "piv", 982 },
{ "ensp", 8194 },
{ "emsp", 8195 },
{ "thinsp", 8201 },
{ "zwnj", 8204 },
{ "zwj", 8205 },
{ "lrm", 8206 },
{ "rlm", 8207 },
{ "ndash", 8211 },
{ "mdash", 8212 },
{ "lsquo", 8216 },
{ "rsquo", 8217 },
{ "sbquo", 8218 },
{ "ldquo", 8220 },
{ "rdquo", 8221 },
{ "bdquo", 8222 },
{ "dagger", 8224 },
{ "Dagger", 8225 },
{ "bull", 8226 },
{ "hellip", 8230 },
{ "permil", 8240 },
{ "prime", 8242 },
{ "Prime", 8243 },
{ "lsaquo", 8249 },
{ "rsaquo", 8250 },
{ "oline", 8254 },
{ "frasl", 8260 },
{ "euro", 8364 },
{ "image", 8465 },
{ "weierp", 8472 },
{ "real", 8476 },
{ "trade", 8482 },
{ "alefsym", 8501 },
{ "larr", 8592 },
{ "uarr", 8593 },
{ "rarr", 8594 },
{ "darr", 8595 },
{ "harr", 8596 },
{ "crarr", 8629 },
{ "lArr", 8656 },
{ "uArr", 8657 },
{ "rArr", 8658 },
{ "dArr", 8659 },
{ "hArr", 8660 },
{ "forall", 8704 },
{ "part", 8706 },
{ "exist", 8707 },
{ "empty", 8709 },
{ "nabla", 8711 },
{ "isin", 8712 },
{ "notin", 8713 },
{ "ni", 8715 },
{ "prod", 8719 },
{ "sum", 8721 },
{ "minus", 8722 },
{ "lowast", 8727 },
{ "radic", 8730 },
{ "prop", 8733 },
{ "infin", 8734 },
{ "ang", 8736 },
{ "and", 8743 },
{ "or", 8744 },
{ "cap", 8745 },
{ "cup", 8746 },
{ "int", 8747 },
{ "there4", 8756 },
{ "sim", 8764 },
{ "cong", 8773 },
{ "asymp", 8776 },
{ "ne", 8800 },
{ "equiv", 8801 },
{ "le", 8804 },
{ "ge", 8805 },
{ "sub", 8834 },
{ "sup", 8835 },
{ "nsub", 8836 },
{ "sube", 8838 },
{ "supe", 8839 },
{ "oplus", 8853 },
{ "otimes", 8855 },
{ "perp", 8869 },
{ "sdot", 8901 },
{ "lceil", 8968 },
{ "rceil", 8969 },
{ "lfloor", 8970 },
{ "rfloor", 8971 },
{ "lang", 9001 },
{ "rang", 9002 },
{ "loz", 9674 },
{ "spades", 9824 },
{ "clubs", 9827 },
{ "hearts", 9829 },
{ "diams", 9830 },
#endif // OMEGA_INCLUDED_NAMEDENTITIES_H


@@ -1,111 +0,0 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "xapianIndexer.h"
namespace kiwix {
/* Constructor */
XapianIndexer::XapianIndexer() {
/*
stemmer(Xapian::Stem("french")) {
this->indexer.set_stemmer(this->stemmer);
*/
}
void XapianIndexer::indexingPrelude(const string indexPath) {
this->writableDatabase = Xapian::WritableDatabase(indexPath+".tmp", Xapian::DB_CREATE_OR_OVERWRITE | Xapian::DB_BACKEND_GLASS);
this->writableDatabase.begin_transaction(true);
/* Insert the stopwords */
if (!this->stopWords.empty()) {
std::vector<std::string>::iterator it = this->stopWords.begin();
for( ; it != this->stopWords.end(); ++it) {
this->stopper.add(*it);
}
this->indexer.set_stopper(&(this->stopper));
}
}
void XapianIndexer::index(const string &url,
const string &title,
const string &unaccentedTitle,
const string &keywords,
const string &content,
const string &snippet,
const string &size,
const string &wordCount) {
/* Put the data in the document */
Xapian::Document currentDocument;
currentDocument.clear_values();
currentDocument.add_value(0, title);
currentDocument.add_value(1, snippet);
currentDocument.add_value(2, size);
currentDocument.add_value(3, wordCount);
currentDocument.set_data(url);
indexer.set_document(currentDocument);
/* Index the title */
if (!unaccentedTitle.empty()) {
this->indexer.index_text_without_positions(unaccentedTitle, this->getTitleBoostFactor(content.size()));
}
/* Index the keywords */
if (!keywords.empty()) {
this->indexer.index_text_without_positions(keywords, keywordsBoostFactor);
}
/* Index the content */
if (!content.empty()) {
this->indexer.index_text_without_positions(content);
}
/* add to the database */
this->writableDatabase.add_document(currentDocument);
}
void XapianIndexer::flush() {
this->writableDatabase.commit_transaction();
this->writableDatabase.begin_transaction(true);
}
void XapianIndexer::indexingPostlude(const string indexPath) {
this->flush();
this->writableDatabase.commit_transaction();
#ifdef _WIN32
this->writableDatabase.close();
#endif
/* Compacting the index */
Xapian::Compactor compactor;
try {
Xapian::Database src;
src.add_database(Xapian::Database(indexPath+".tmp"));
src.compact(indexPath, Xapian::Compactor::FULL | Xapian::DBCOMPACT_SINGLE_FILE, 0, compactor);
} catch (const Xapian::Error &error) {
cerr << indexPath << ": " << error.get_description() << endl;
exit(1);
} catch (const char * msg) {
cerr << indexPath << ": " << msg << endl;
exit(1);
}
}
}


@@ -1,99 +0,0 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "xapianSearcher.h"
#include <zim/zim.h>
#include <zim/file.h>
#include <zim/article.h>
#include <zim/error.h>
#include <sys/types.h>
#include <unistd.h>
namespace kiwix {
/* Constructor */
XapianSearcher::XapianSearcher(const string &xapianDirectoryPath)
: Searcher(),
stemmer(Xapian::Stem("english")) {
this->openIndex(xapianDirectoryPath);
}
/* Open Xapian readable database */
void XapianSearcher::openIndex(const string &directoryPath) {
try
{
zim::File zimFile = zim::File(directoryPath);
zim::Article xapianArticle = zimFile.getArticle('Z', "/fulltextIndex/xapian");
if (!xapianArticle.good())
throw NoXapianIndexInZim();
zim::offset_type dbOffset = xapianArticle.getOffset();
int databasefd = open(directoryPath.c_str(), O_RDONLY);
lseek(databasefd, dbOffset, SEEK_SET);
this->readableDatabase = Xapian::Database(databasefd);
} catch (...) {
this->readableDatabase = Xapian::Database(directoryPath);
}
}
/* Close Xapian readable database (nothing to do) */
void XapianSearcher::closeIndex() {
return;
}
/* Search strings in the database */
void XapianSearcher::searchInIndex(string &search, const unsigned int resultStart,
const unsigned int resultEnd, const bool verbose) {
/* Create the query */
Xapian::QueryParser queryParser;
Xapian::Query query = queryParser.parse_query(search);
/* Create the enquire object */
Xapian::Enquire enquire(this->readableDatabase);
enquire.set_query(query);
/* Get the results */
Xapian::MSet matches = enquire.get_mset(resultStart, resultEnd - resultStart);
Xapian::MSetIterator i;
for (i = matches.begin(); i != matches.end(); ++i) {
Xapian::Document doc = i.get_document();
Result result;
result.url = doc.get_data();
result.title = doc.get_value(0);
result.snippet = doc.get_value(1);
result.size = (doc.get_value(2).empty() == true ? -1 : atoi(doc.get_value(2).c_str()));
result.wordCount = (doc.get_value(3).empty() == true ? -1 : atoi(doc.get_value(3).c_str()));
result.score = i.get_percent();
this->results.push_back(result);
if (verbose) {
std::cout << "Document ID " << *i << " \t";
std::cout << i.get_percent() << "% ";
std::cout << "\t[" << doc.get_data() << "] - " << doc.get_value(0) << std::endl;
}
}
/* Update the estimated result count */
this->estimatedResultCount = matches.get_matches_estimated();
return;
}
}

src/xmlrpc.h Normal file

@@ -0,0 +1,264 @@
#ifndef KIWIX_XMLRPC_H_
#define KIWIX_XMLRPC_H_
#include <tools/otherTools.h>
namespace kiwix {
class InvalidRPCNode : public std::runtime_error {
public:
InvalidRPCNode(const std::string& msg) : std::runtime_error(msg) {};
};
class Struct;
class Array;
class Value {
pugi::xml_node m_value;
public:
Value(pugi::xml_node value) : m_value(value) { }
void set(int value) {
if (!m_value.child("int"))
m_value.append_child("int");
m_value.child("int").text().set(value);
};
int getAsI() const {
if (!m_value.child("int"))
throw InvalidRPCNode("Type Error");
return m_value.child("int").text().as_int();
}
void set(bool value) {
if (!m_value.child("boolean"))
m_value.append_child("boolean");
m_value.child("boolean").text().set(value);
};
bool getAsB() const {
if (!m_value.child("boolean"))
throw InvalidRPCNode("Type Error");
return m_value.child("boolean").text().as_bool();
}
void set(const std::string& value) {
if (!m_value.child("string"))
m_value.append_child("string");
m_value.child("string").text().set(value.c_str());
};
std::string getAsS() const {
if (!m_value.child("string"))
throw InvalidRPCNode("Type Error");
return m_value.child("string").text().as_string();
}
void set(double value) {
if (!m_value.child("double"))
m_value.append_child("double");
m_value.child("double").text().set(value);
};
double getAsD() const {
if (!m_value.child("double"))
throw InvalidRPCNode("Type Error");
return m_value.child("double").text().as_double();
}
inline Struct getStruct();
inline Array getArray();
};
class Array {
pugi::xml_node m_array;
public:
Array(pugi::xml_node array) : m_array(array) {
if (!m_array.child("data"))
m_array.append_child("data");
}
Value addValue() {
auto value = m_array.child("data").append_child("value");
return Value(value);
}
Value getValue(int index) const {
auto value = m_array.child("data").child("value");
while(index && value) {
value = value.next_sibling();
index--;
}
if (0 == index && value) {
return Value(value);
} else {
throw InvalidRPCNode("Index Error");
}
}
};
class Member {
pugi::xml_node m_member;
public:
Member(pugi::xml_node member) : m_member(member) { }
Value getValue() const {
return Value(m_member.child("value"));
};
};
class Struct {
pugi::xml_node m_struct;
public:
Struct(pugi::xml_node _struct) : m_struct(_struct) { }
Member getMember(const std::string& name) const {
for(auto member=m_struct.first_child(); member; member=member.next_sibling()) {
std::string member_name = member.child("name").text().get();
if (member_name == name) {
return Member(member);
}
}
throw InvalidRPCNode("Key Error");
}
Member addMember(const std::string& name) {
auto member = m_struct.append_child("member");
member.append_child("name").text().set(name.c_str());
member.append_child("value");
return Member(member);
}
};
class Fault : public Struct {
public:
Fault(pugi::xml_node fault) : Struct(fault) {};
int getFaultCode() const {
return getMember("faultCode").getValue().getAsI();
}
std::string getFaultString() const {
return getMember("faultString").getValue().getAsS();
}
};
Struct Value::getStruct() {
if (!m_value.child("struct"))
m_value.append_child("struct");
return Struct(m_value.child("struct"));
}
Array Value::getArray() {
if (!m_value.child("array"))
m_value.append_child("array");
return Array(m_value.child("array"));
}
class Param {
pugi::xml_node m_param;
public:
Param(pugi::xml_node param) : m_param(param) {
if (!m_param.child("value"))
m_param.append_child("value");
};
Value getValue() const {
return Value(m_param.child("value"));
};
};
class Params {
pugi::xml_node m_params;
public:
Params(pugi::xml_node params) : m_params(params) {};
Param addParam() {
auto param = m_params.append_child("param");
return Param(param);
}
Param getParam(int index) const {
auto param = m_params.child("param");
while(index && param) {
param = param.next_sibling();
index--;
}
if (0 == index && param) {
return Param(param);
} else {
throw InvalidRPCNode("Index Error");
}
}
};
class MethodCall {
pugi::xml_document m_doc;
public:
MethodCall(const std::string& methodName, const std::string& secret) {
auto mCall = m_doc.append_child("methodCall");
mCall.append_child("methodName").text().set(methodName.c_str());
mCall.append_child("params");
if (!secret.empty()) {
getParams().addParam().getValue().set(secret);
}
}
Params getParams() const {
return Params(m_doc.child("methodCall").child("params"));
}
Value newParamValue() {
return getParams().addParam().getValue();
}
std::string toString() const {
return nodeToString(m_doc);
}
};
class MethodResponse {
pugi::xml_document m_doc;
public:
MethodResponse(const std::string& content) {
m_doc.load_buffer(content.c_str(), content.size());
}
Params getParams() const {
auto params = m_doc.child("methodResponse").child("params");
if (!params)
throw InvalidRPCNode("No params");
return Params(params);
}
Value getParamValue(int index) const {
return getParams().getParam(index).getValue();
}
bool isFault() const {
return (!!m_doc.child("methodResponse").child("fault"));
}
Fault getFault() const {
auto fault = m_doc.child("methodResponse").child("fault");
if (!fault)
throw InvalidRPCNode("No fault");
return Fault(fault.child("value").child("struct"));
}
};
};
#endif // KIWIX_XMLRPC_H_
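
To make the element nesting produced by `MethodCall`, `Params`, `Param` and `Value` easier to see, here is a minimal sketch that builds the same structure as a plain string. It deliberately does not use pugixml. `build_method_call` is a hypothetical helper, not part of `xmlrpc.h`, and the exact serialization returned by `nodeToString` is not shown in this diff, so it may differ in whitespace or carry an XML declaration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: renders <methodCall> with every parameter sent as a
// <string> value, mirroring MethodCall + Params::addParam + Value::set(string).
// No XML escaping is done, so inputs must not contain '<', '>' or '&'.
std::string build_method_call(const std::string& methodName,
                              const std::vector<std::string>& stringParams)
{
    std::string xml =
        "<methodCall><methodName>" + methodName + "</methodName><params>";
    for (const std::string& p : stringParams) {
        // One <param> per argument, each wrapping a single typed <value>.
        xml += "<param><value><string>" + p + "</string></value></param>";
    }
    xml += "</params></methodCall>";
    return xml;
}
```

In the header above, a non-empty `secret` passed to the `MethodCall` constructor becomes the first `<param>`. In this sketch that corresponds to putting the secret at index 0 of `stringParams`.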


@@ -1,7 +1,11 @@
 lib_resources = custom_target('resources',
-input: 'resources_list.txt',
-output: ['kiwixlib-resources.cpp', 'kiwixlib-resources.h'],
-command:[res_compiler, '--cxxfile', '@OUTPUT0@', '--hfile', '@OUTPUT1@', '@INPUT@']
-)
+input: 'resources_list.txt',
+output: ['kiwixlib-resources.cpp', 'kiwixlib-resources.h'],
+command:[res_compiler,
+'--cxxfile', '@OUTPUT0@',
+'--hfile', '@OUTPUT1@',
+'--source_dir', '@OUTDIR@',
+'@INPUT@'],
+depend_files: files('search_result.tmpl')
+)


@@ -1,4 +1 @@
-results.ct2
-stopwords/en
-stopwords/he
-stopwords/fra
+search_result.tmpl


Binary file not shown.

static/search_result.tmpl Normal file

@@ -0,0 +1,158 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="content-type" />
<style type="text/css">
body{
color: #000000;
font: small/normal Arial,Helvetica,Sans-Serif;
margin-top: 0.5em;
font-size: 90%;
}
a{
color: #04c;
}
a:visited {
color: #639
}
a:hover {
text-decoration: underline
}
.header {
font-size: 120%;
}
ul {
margin:0;
padding:0
}
.results {
font-size: 110%;
}
.results li {
list-style-type:none;
margin-top: 0.5em;
}
.results a {
font-size: 110%;
text-decoration: underline
}
cite {
font-style:normal;
word-wrap:break-word;
display: block;
font-size: 100%;
}
.informations {
color: #388222;
font-size: 100%;
}
.footer {
padding: 0;
margin-top: 1em;
width: 100%;
float: left
}
.footer a, .footer span {
display: block;
padding: .3em .7em;
margin: 0 .38em 0 0;
text-align:center;
text-decoration: none;
}
.footer a:hover {
background: #ededed;
}
.footer ul, .footer li {
list-style:none;
margin: 0;
padding: 0;
}
.footer li {
float: left;
}
.selected {
background: #ededed;
}
</style>
<title>Search: {{searchPattern}}</title>
</head>
<body bgcolor="white">
<div class="header">
{{#hasResult}}
Results
<b>
{{resultStart}}-{{resultEnd}}
</b> of <b>
{{count}}
</b> for <b>
{{searchPattern}}
</b>
{{/hasResult}}
{{^hasResult}}
No results were found for <b>{{searchPattern}}</b>
{{/hasResult}}
</div>
<div class="results">
<ul>
{{#results}}
<li>
<a href="{{protocolPrefix}}{{resultContentId}}/{{url}}">
{{title}}
</a>
{{#snippet}}
<cite>{{>snippet}}...</cite>
{{/snippet}}
{{#wordCount}}
<div class="informations">{{wordCount}} words</div>
{{/wordCount}}
</li>
{{/results}}
</ul>
</div>
<div class="footer">
<ul>
{{#resultLastPageStart}}
<li>
<a href="{{searchProtocolPrefix}}pattern={{searchPatternEncoded}}{{#contentId}}&content={{.}}{{/contentId}}&start=0&end={{resultRange}}">
</a>
</li>
{{/resultLastPageStart}}
{{#pages}}
<li>
<a {{#selected}}class="selected"{{/selected}}
href="{{searchProtocolPrefix}}pattern={{searchPatternEncoded}}{{#contentId}}&content={{.}}{{/contentId}}&start={{start}}&end={{end}}">
{{label}}
</a>
</li>
{{/pages}}
{{#resultLastPageStart}}
<li>
<a href="{{searchProtocolPrefix}}pattern={{searchPatternEncoded}}{{#contentId}}&content={{.}}{{/contentId}}&start={{resultLastPageStart}}&end={{lastResult}}">
</a>
</li>
{{/resultLastPageStart}}
</ul>
</div>
</body>
</html>


@@ -1,671 +0,0 @@
a
able
about
above
abst
accordance
according
accordingly
across
act
actually
added
adj
adopted
affected
affecting
affects
after
afterwards
again
against
ah
all
almost
alone
along
already
also
although
always
am
among
amongst
an
and
announce
another
any
anybody
anyhow
anymore
anyone
anything
anyway
anyways
anywhere
apparently
approximately
are
aren
arent
arise
around
as
aside
ask
asking
at
auth
available
away
awfully
b
back
be
became
because
become
becomes
becoming
been
before
beforehand
begin
beginning
beginnings
begins
behind
being
believe
below
beside
besides
between
beyond
biol
both
brief
briefly
but
by
c
ca
came
can
cannot
can't
cause
causes
certain
certainly
co
com
come
comes
contain
containing
contains
could
couldnt
d
date
did
didn't
different
do
does
doesn't
doing
done
don't
down
downwards
due
during
e
each
ed
edu
effect
eg
eight
eighty
either
else
elsewhere
end
ending
enough
especially
et
et-al
etc
even
ever
every
everybody
everyone
everything
everywhere
ex
except
f
far
few
ff
fifth
first
five
fix
followed
following
follows
for
former
formerly
forth
found
four
from
further
furthermore
g
gave
get
gets
getting
give
given
gives
giving
go
goes
gone
got
gotten
h
had
happens
hardly
has
hasn't
have
haven't
having
he
hed
hence
her
here
hereafter
hereby
herein
heres
hereupon
hers
herself
hes
hi
hid
him
himself
his
hither
home
how
howbeit
however
hundred
i
id
ie
if
i'll
im
immediate
immediately
importance
important
in
inc
indeed
index
information
instead
into
invention
inward
is
isn't
it
itd
it'll
its
itself
i've
j
just
k
keep
keeps
kept
keys
kg
km
know
known
knows
l
largely
last
lately
later
latter
latterly
least
less
lest
let
lets
like
liked
likely
line
little
'll
look
looking
looks
ltd
m
made
mainly
make
makes
many
may
maybe
me
mean
means
meantime
meanwhile
merely
mg
might
million
miss
ml
more
moreover
most
mostly
mr
mrs
much
mug
must
my
myself
n
na
name
namely
nay
nd
near
nearly
necessarily
necessary
need
needs
neither
never
nevertheless
new
next
nine
ninety
no
nobody
non
none
nonetheless
noone
nor
normally
nos
not
noted
nothing
now
nowhere
o
obtain
obtained
obviously
of
off
often
oh
ok
okay
old
omitted
on
once
one
ones
only
onto
or
ord
other
others
otherwise
ought
our
ours
ourselves
out
outside
over
overall
owing
own
p
page
pages
part
particular
particularly
past
per
perhaps
placed
please
plus
poorly
possible
possibly
potentially
pp
predominantly
present
previously
primarily
probably
promptly
proud
provides
put
q
que
quickly
quite
qv
r
ran
rather
rd
re
readily
really
recent
recently
ref
refs
regarding
regardless
regards
related
relatively
research
respectively
resulted
resulting
results
right
run
s
said
same
saw
say
saying
says
sec
section
see
seeing
seem
seemed
seeming
seems
seen
self
selves
sent
seven
several
shall
she
shed
she'll
shes
should
shouldn't
show
showed
shown
showns
shows
significant
significantly
similar
similarly
since
six
slightly
so
some
somebody
somehow
someone
somethan
something
sometime
sometimes
somewhat
somewhere
soon
sorry
specifically
specified
specify
specifying
state
states
still
stop
strongly
sub
substantially
successfully
such
sufficiently
suggest
sup
sure
t
take
taken
taking
tell
tends
th
than
thank
thanks
thanx
that
that'll
thats
that've
the
their
theirs
them
themselves
then
thence
there
thereafter
thereby
thered
therefore
therein
there'll
thereof
therere
theres
thereto
thereupon
there've
these
they
theyd
they'll
theyre
they've
think
this
those
thou
though
thoughh
thousand
throug
through
throughout
thru
thus
til
tip
to
together
too
took
toward
towards
tried
tries
truly
try
trying
ts
twice
two
u
un
under
unfortunately
unless
unlike
unlikely
until
unto
up
upon
ups
us
use
used
useful
usefully
usefulness
uses
using
usually
v
value
various
've
very
via
viz
vol
vols
vs
w
want
wants
was
wasn't
way
we
wed
welcome
we'll
went
were
weren't
we've
what
whatever
what'll
whats
when
whence
whenever
where
whereafter
whereas
whereby
wherein
wheres
whereupon
wherever
whether
which
while
whim
whither
who
whod
whoever
whole
who'll
whom
whomever
whos
whose
why
widely
willing
wish
with
within
without
won't
words
world
would
wouldn't
www
x
y
yes
yet
you
youd
you'll
your
youre
yours
yourself
yourselves
you've
z
zero


@@ -1,124 +0,0 @@
alors
au
aucuns
aussi
autre
avant
avec
avoir
bon
car
ce
cela
ces
ceux
chaque
ci
comme
comment
dans
des
du
dedans
dehors
depuis
deux
devrait
doit
donc
dos
droite
début
elle
elles
en
encore
essai
est
et
eu
fait
faites
fois
font
force
haut
hors
ici
il
ils
je
la
le
les
leur
ma
maintenant
mais
mes
mine
moins
mon
mot
même
ni
nommés
notre
nous
nouveaux
ou
par
parce
parole
pas
personnes
peut
peu
pièce
plupart
pour
pourquoi
quand
que
quel
quelle
quelles
quels
qui
sa
sans
ses
seulement
si
sien
son
sont
sous
soyez
sur
ta
tandis
tellement
tels
tes
ton
tous
tout
trop
très
tu
valeur
voie
voient
vont
votre
vous
vu
ça
étaient
état
étions
été
être


@@ -1,87 +0,0 @@
של
את
על
לא
כי
עם
הוא
גם
ב
זה
היא
כל
יותר
או
אבל
בין
היה
אם
מיליון
יש
כך
אני
הם
דולר
אמר
עד
לאחר
ישראל
רק
שקל
כדי
מה
לפני
אחד
החברה
כמו
זאת
היום
אך
ל
ה
כ
אין
אתמול
שלא
כבר
עוד
לו
זו
אל
בן
אותו
שני
בית
ידי
כמה
ביותר
ולא
הממשלה
אחרי
חברת
היתה
שלו
היו
נגד
בכל
אביב
ראש
בישראל
לי
שנים
פי
בו
מ
מאוד
להיות
שהוא
מי
אלף
אלא
אף
אחר
הזה
אחת
בבית
אלה
אנחנו

subprojects/gtest/.gitignore vendored Normal file

@@ -0,0 +1,2 @@
# Ignore CI build directory
build/

Some files were not shown because too many files have changed in this diff