Compare commits

...

109 Commits
2.0.2 ... 4.0.0

Author SHA1 Message Date
Matthieu Gautier
ab94ac0ee8 Merge pull request #195 from kiwix/new_version
New version
2019-01-29 11:38:06 +01:00
Matthieu Gautier
1ac6d4cb20 New version 4.0.0 2019-01-29 11:29:59 +01:00
Matthieu Gautier
26b61a2d09 We do not need the exact version 0.43.0 for meson. 2019-01-29 11:29:59 +01:00
Matthieu Gautier
aab88c9022 Merge pull request #194 from kiwix/common2tools
[API break] Move all the tools in the tools directory instead of common.
2019-01-23 16:55:09 +01:00
Matthieu Gautier
af7689e3e8 [API break] Move all the tools in the tools directory instead of common.
The `common` name is from the time where kiwix was only one repository
for all the project (android, desktop, server...).

Now we have split the repositories and kiwix-lib is the "common" repo,
the "common" directory is somehow nonsense.
2019-01-23 15:31:38 +01:00
Matthieu Gautier
ecb2a80baf Merge pull request #193 from kiwix/fix_uninitalized
Correctly initialize retVal.
2019-01-23 12:05:34 +01:00
Matthieu Gautier
b996a2877c Correctly initialize retVal. 2019-01-23 11:51:30 +01:00
Matthieu Gautier
a98594c084 Merge pull request #192 from kiwix/workaround_depend_files
Workaround a bug in meson 0.43.0 about custom_target's depend_files option.
2019-01-10 16:08:24 +01:00
Matthieu Gautier
b9696dceac Workaround a bug in meson 0.43.0 about custom_target's depend_files option.
There is a bug in meson 0.43.0 about the option depend_files
(mesonbuild/meson#2633)

By using the `files('search_result.tmpl')`, we workaround the bug and
have everything working whatever the meson version is.
2019-01-10 15:59:42 +01:00
Matthieu Gautier
550b6df414 Merge pull request #191 from kiwix/mustache_template
Move the templating system to mustache instead of ctpp2.
2019-01-10 11:45:56 +01:00
Matthieu Gautier
be498c3b16 Make the string Tools functions available in android. 2019-01-09 18:29:20 +01:00
Matthieu Gautier
92c9a47a0d Move the templating system to mustache instead of ctpp2.
Mustache templating system is a bit simpler than ctpp2 and ctpp2 is no
more maintained (see #189).
We are moving to the kainjow's Mustache project
(https://github.com/kainjow/Mustache).

It simplify a lot our system has it is header only and we don't have to
precompile the template.

Fix #21
2019-01-09 18:28:48 +01:00
Matthieu Gautier
c73ac9f2cd Merge pull request #190 from kiwix/no_external_index
Remove support for external index.
2019-01-08 16:13:54 +01:00
Matthieu Gautier
5159d985c6 Remove support for external index.
This feature is considered obsolete for a while.
In fact, it was already not supported since June 2018 as we were compiling
xapian without the chert backend support.

Assume that we don't support it and remove it from the code.
See kiwix/kiwix-tools#245

This is a API break. library.xml files will still work but the indexPath
and indexType will be dropped silently from the file.
2019-01-07 16:47:08 +01:00
Matthieu Gautier
cb98f11ddc Merge pull request #188 from kiwix/create_directory
Create the datadirectory to not fail to write the aria2 session file.
2018-12-14 16:44:46 +01:00
Matthieu Gautier
29046bfc05 Create the datadirectory to not fail to write the aria2 session file.
Fix kiwix/kiwix-desktop#69
2018-12-14 15:24:13 +01:00
Matthieu Gautier
dd5dd14ec9 Merge pull request #187 from kiwix/new_version
new version 3.1.1
2018-12-13 18:05:35 +01:00
Matthieu Gautier
49a606a043 new version 3.1.1 2018-12-13 17:29:21 +01:00
Matthieu Gautier
b641f7b116 Merge pull request #186 from kiwix/fix_library
Fix library
2018-12-11 17:08:30 +01:00
Matthieu Gautier
e6d7ba06fb Convert the standard opds date to our format (YYYY-MM-DD) 2018-12-11 17:02:02 +01:00
Matthieu Gautier
0f812c6584 The update entry of the book should be the date of the book, not the feed. 2018-12-11 17:01:33 +01:00
Matthieu Gautier
716c87dd20 Remove duplicate language attribute in the libxml dumper.
Silly copy/paste.
2018-12-11 17:00:56 +01:00
Matthieu Gautier
090c4f5970 Merge pull request #185 from kiwix/new_version
new version 3.1.0
2018-12-03 11:21:15 +01:00
Matthieu Gautier
cf28af4439 new version 3.1.0 2018-12-02 15:56:00 +01:00
Matthieu Gautier
6777bfeecf Merge pull request #184 from kiwix/bookmarks
Bookmarks
2018-12-02 15:52:52 +01:00
Matthieu Gautier
12498e2cfe Add bookmarks support.
The library now contains (simple) methods to handle bookmarks.
The bookmark are stored in a separate xml file.

Bookmark are mainly a couple (`zimId`, `articleUrl`).
However, in the xml we store a bit more data :
- The article's title (for display)
- The book's title, lang and date (for potential update of zim files)
2018-12-02 15:47:29 +01:00
Matthieu Gautier
b5ce60a627 Move the dump of the library into library.xml in a specific class.
The same way the dump into a opds feed is in a specific class.
2018-11-28 12:09:28 +01:00
Matthieu Gautier
c9cc58973c Merge pull request #183 from kiwix/book_faviconUrl
Add Book::getFaviconUrl
2018-11-15 17:53:24 +01:00
Matthieu Gautier
062124a2a0 Add Book::getFaviconUrl 2018-11-15 17:47:41 +01:00
Matthieu Gautier
622b22b2cc Merge pull request #180 from kiwix/new_version
New version 3.0.3
2018-11-12 18:05:43 +01:00
Matthieu Gautier
2821b9e06a New version 3.0.3 2018-11-12 16:48:35 +01:00
Matthieu Gautier
ac49776792 Merge pull request #182 from kiwix/fix_aria2c_launch
Wait a bit more between attempts to connect to aria2c rpc.
2018-11-12 16:28:21 +01:00
Matthieu Gautier
94a053e821 Wait a bit more between attempts to connect to aria2c rpc. 2018-11-12 16:14:21 +01:00
Matthieu Gautier
84e831eae9 Merge pull request #181 from kiwix/fix_aria2c_launch
Correctly run aria2c when packaged with kiwix-desktop in appimage.
2018-11-12 14:41:57 +01:00
Matthieu Gautier
4b9692bbd5 Correctly run aria2c when packaged with kiwix-desktop in appimage.
By default, we are searching in the PATH env var.
However, with an appImage, the executable directory is not in the PATH,
so we have to use an absolute path if we can.

If we cannot find the aria2c executable in the executable directory let's
try to use the system one.
2018-11-12 14:35:17 +01:00
Matthieu Gautier
be6f96adc0 Merge pull request #179 from kiwix/fix_library
Fix library
2018-11-12 12:21:54 +01:00
Matthieu Gautier
4b31842c4a Correctly convert filesize from Kbyte to byte.
`reader.getFileSize()` return the size of the zim in Kbyte in a
`unsigned int` (32 bits). This is ok as it would overflow if the size
of the size is greater than 4294967295 kbytes (so ~4Tbytes).

However, we need to convert the return size into a unsigned 64 bits integer
else, when converting to bytes, we will overflow at 4Gbytes.
Even in `m_size` is a uint64_t.
2018-11-12 12:16:05 +01:00
Matthieu Gautier
cf1cfe774e Correctly check for ArticleCount and MediaCount before writing them. 2018-11-12 10:58:10 +01:00
Matthieu Gautier
82b38b96e2 Merge pull request #178 from kiwix/fix_en_mapping
Fix en mapping
2018-11-12 10:57:18 +01:00
Matthieu Gautier
8c4b9fbe95 Add missing en->eng mapping to codeisomapping.
The most common used language was missing :/

Fix kiwix/kiwix-desktop#51
2018-11-12 10:36:45 +01:00
Matthieu Gautier
ab63cb2fb8 Sort codeisomapping alphabetically.
This is only code formating, no real change.
2018-11-12 10:34:19 +01:00
Matthieu Gautier
3958b2a06f Make the internal map codeisomapping static.
Symbole should not be visible outside of the compilation unit.
2018-11-12 10:33:35 +01:00
Matthieu Gautier
9fa7d78ba1 Merge pull request #176 from kiwix/win_relpath
Win relpath
2018-11-03 12:33:14 +01:00
Matthieu Gautier
57d3552b97 New version 3.0.2 2018-11-03 12:20:13 +01:00
Matthieu Gautier
d4ecda40ff Use the correct separator when computing relativePath. 2018-11-03 12:18:54 +01:00
Matthieu Gautier
802df71410 Merge pull request #175 from kiwix/fix
Fix
2018-11-02 17:32:22 +01:00
Matthieu Gautier
4d904c4d8b New version 3.0.1 2018-11-02 17:10:05 +01:00
Matthieu Gautier
9ab44e6a5f Get information about the total number of book of a search.
When we do a search and paging the result, we need to display to the
user the total number of book, not only the `itemsPerPage`.

So, we need to parse correctly the xml to keep information of the total
number of book.
2018-11-02 17:04:55 +01:00
Matthieu Gautier
5f4c04e79e Fix use of getAsI when parsing download rpc.
The value is store as a string in in the xml, so we cannot use getAsI.
We have to get the string and parse it to an int.
We cannot use strtoull because android stdc++ lib doesn't have it.

We have to implement our how parseFromString function using a
istringstream.
2018-11-02 17:03:03 +01:00
Matthieu Gautier
360c913230 Merge pull request #174 from kiwix/new_version
New version 3.0.0
2018-10-31 14:47:48 +01:00
Matthieu Gautier
a60ffe78d5 New version 3.0.0 2018-10-31 14:35:22 +01:00
Matthieu Gautier
b977b08683 Merge pull request #173 from kiwix/subprocess_windows
Subprocess windows
2018-10-31 14:04:21 +01:00
Matthieu Gautier
bb07ff5610 Do not add NULL at end of commandLine on Windows. 2018-10-31 13:56:42 +01:00
Matthieu Gautier
1787e30440 Better launch of the aria2 process.
By setting the ApplicationName to NULL, CreateProcessW will
search for the application in the path.
2018-10-31 13:56:42 +01:00
Matthieu Gautier
ccb3d8639d Use correct name for aria2c on windows. 2018-10-30 18:43:30 +01:00
Matthieu Gautier
5ed095531e Correctly set pkgconfig file for static curl linking. 2018-10-30 12:59:30 +01:00
Matthieu Gautier
29e554b47b Include pthread 2018-10-29 14:30:35 +01:00
Matthieu Gautier
68dc4d40b5 Include windows.h before synchapi.h 2018-10-29 12:20:00 +01:00
Matthieu Gautier
8dbc34e9ae Merge pull request #172 from kiwix/alpha2toalpha3
Alpha2toalpha3
2018-10-26 14:27:55 +02:00
Matthieu Gautier
2682fa8f9c Remove unecessary variable or output. 2018-10-26 14:19:10 +02:00
Matthieu Gautier
a22f962722 Correctly store the size of the book in the library.
`reader.getFileSize()` return ko.
2018-10-26 14:18:40 +02:00
Matthieu Gautier
a1876e3b27 Add a method converta2toa3 to convert language code alpha2 to alpha3.
Qt give use alpha2 language code but we use alpha3.
2018-10-26 14:18:06 +02:00
Matthieu Gautier
50b7e5664a Merge pull request #171 from kiwix/remoteContentManager
Remote content manager
2018-10-24 16:48:52 +02:00
Matthieu Gautier
ad654ead08 Do not force the download port to be 80.
We may want to use url with port != 80.
2018-10-24 11:56:38 +02:00
Matthieu Gautier
c6206edfb4 Do not always download the favicon of a book. Download as needed.
When parsing a opds feed, the favicon is a url, not a dataurl.
If we download the favicon all the times, it may take a lot of time to
parse the feed.

We store the url and download the favicon only when needed (when displayed)
2018-10-24 11:56:05 +02:00
Matthieu Gautier
c20ae18bff An opds feed can also be the openSearch result.
We must be able to set the correct entry in the feed for a searchResult.
2018-10-24 11:51:38 +02:00
Matthieu Gautier
b1508c0b98 Better listBooksIds supported mode.
Only have REMOTE or LOCAL is a bit restrictive. By using flags a user
can specify for complex request.
2018-10-24 11:50:11 +02:00
Matthieu Gautier
2d59e12a4d Merge pull request #170 from kiwix/content_manager
Content manager
2018-10-24 11:18:14 +02:00
Matthieu Gautier
1b44eb33f3 [TRAVIS] Last osx version of travis already have python3 installed. 2018-10-24 11:07:10 +02:00
Matthieu Gautier
34021994cd Fix for Android
- No std::to_string. We have to implement it with a ostringstream
- No pthread_cancel. So we use pthread_kill to send a signal to the thread.
2018-10-24 10:48:53 +02:00
Matthieu Gautier
910ce5f10d Fix for Windows
- "winsock2.h" needs to be included before "windows.h". But if a
  compilation unit include "windows.h" and after "networkTools.h", we
  fails and it is complicated to handle. The include must not be in the
  header but in the cpp
- windows define some ERROR macro. It is a pitty but we cannot use `ERROR`
  in our enum.
- If build statically using mingw we need to define `CURL_STATICLIB`
2018-10-24 10:47:12 +02:00
Matthieu Gautier
c66c7e9c20 Store the size of the book in OPDSFeed. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
ad69fdd8c0 Move the download method from the downloader to networkTools.
The download method is a simple method to download content.
It use curl to download the content instead of aria.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
a73ef23f6e Keep the book size in byte in memory (instead of in kb)
We keep the size in kb in library.xml for compatibility.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
fe6d5fa93e Store the downloadId in the book (and in the library). 2018-10-24 10:47:12 +02:00
Matthieu Gautier
43ff8565d1 Add a Download class to encapsulate a aria2 download. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
f718c4c472 Add a LibraryManipulator.
Library client (kiwix-desktop) need to know when a book is added to
library by the manager. By using a LibraryManipulator, we can do
dependency injection.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
8176a6eded Be more resilient to potential aria2 error. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
bb1f777078 Store the aria2 session and recover from it. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
829c34dd69 Store in the book instance if the given path is valid.
The path may exist and not be valid if the zim file is not truncated
(ie, interrupted download)
2018-10-24 10:47:12 +02:00
Matthieu Gautier
9c0f9696ed Better beautifyInteger and beautifyFileSize. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
be6dc01b4f Add few helper methods to xmlrpc objects. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
18fc5cb4df Correctly set the aria2 secret rpc. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
996829e4d7 Allow a OPDSDumper to dump only a subset of the library. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
5128861136 Remove default value for book pointer of readBookFromPath.
This is a nonsense to accept NULL pointer here.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
7804bf2276 Reimplement listBooksIds.
No real improvement.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
99e313f915 Clean includes of manager.h 2018-10-24 10:47:12 +02:00
Matthieu Gautier
839320d5e7 Move the Book class in its own source file. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
1e8f85eaff Rename methods title() into getTitle().
Same for all attributes.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
e0704b3b21 Move the initialization code of a book from xml|opds into Book. 2018-10-24 10:47:12 +02:00
Matthieu Gautier
57fbb98bca Do not store the favicon base64 encoded in the book.
The fact that the favicon is base64 encoded in a storage detail.
2018-10-24 10:47:12 +02:00
Matthieu Gautier
c7f9218350 base64_encode takes a string instead of a char* 2018-10-24 10:47:12 +02:00
Matthieu Gautier
66a9a69480 Move the code updating a book from a reader in the Book class. 2018-09-06 18:30:37 +02:00
Matthieu Gautier
04b05dd68b Remove removeBookById from the Manager.
Use the same method of the `Library`.
2018-09-06 18:30:37 +02:00
Matthieu Gautier
aa6772b345 Remove the "last" book functionnality.
- This is not used by any application.
- This is application specific and should not be stored in the library
  (who is a list of book).
2018-09-06 18:30:37 +02:00
Matthieu Gautier
efae3e0d2f Do not make the Manager responsible to create the Library.
The `Manager` manage a library already existing.
This avoid the Library clone stuff.
2018-09-06 18:30:37 +02:00
Matthieu Gautier
bba3c252e4 Make the member of the book protected.
It is up to the book to manage its attribute.

Also remove the `absolutePath` (and `indexAbsolutePath`). The `Book::path` is always stored
absolute.
The fact that the path can be stored absolute or relative in the
`library.xml` is not relevant for the book.
2018-09-06 18:30:37 +02:00
Matthieu Gautier
57ac6f0305 Use a map to store the Library's books.
Having the books sorted is useless.
We handle books by id not by index.
2018-09-06 18:30:37 +02:00
Matthieu Gautier
541fb0cfd1 Remove the "current" book functionnality.
- This is not used by any application.
- This is application specific and should not be stored in the library
  (who is a list of book).
2018-09-06 18:30:37 +02:00
Matthieu Gautier
c9eac04050 Make the Library`s book vector private.
Move a lot of methods from Manager to Library. Because books is private
and thoses methods are better in Library.
2018-09-06 18:30:37 +02:00
Matthieu Gautier
741c67786a Add update method to Book. 2018-09-06 18:30:37 +02:00
Matthieu Gautier
db9000f706 Make the downloader use the aria2c wrapper instead of the aria2 library. 2018-09-06 18:30:34 +02:00
Matthieu Gautier
0a93cb0872 Add aria2 downloader using subprocess aria2c. 2018-09-06 18:29:49 +02:00
Matthieu Gautier
f4846c1ac8 Add a tool's function to get the data directory.
The data directory is where kiwix application should store data.
2018-08-29 15:28:52 +02:00
Matthieu Gautier
9b516ac35d Add a small wrapper around pugixml do handle xmprpc. 2018-08-29 15:28:52 +02:00
Matthieu Gautier
79b780b75b Move the function to convert from xml_node to string in otherTools.
This can be usefull elsewhere than in opds_dumper
2018-08-29 15:28:52 +02:00
Matthieu Gautier
f3dd83907d Add backend to launch subprocess.
The windows backend is not tested.
2018-08-29 15:28:52 +02:00
Matthieu Gautier
c351e7ccf1 Merge pull request #168 from kiwix/java_jdk8+
Update jni build script to java jdk 8+.
2018-08-20 17:53:39 +02:00
Matthieu Gautier
7c634738dd Update jni build script to java jdk 8+.
With jdk8, `javac` has an option `-h` to generate the header files of
native classes.
So there is no need to run `javah` several times.

As there is now only one command to run (`javac`), there is no need for
the wrapper script `gen_kiwix.sh`.

Fix #167
2018-08-20 12:17:29 +02:00
68 changed files with 2918 additions and 3163 deletions

View File

@@ -18,6 +18,7 @@ addons:
apt:
packages:
- cmake
- python3.5
- python3-pip
- libbz2-dev
- ccache

View File

@@ -1,3 +1,95 @@
kiwix-lib 4.0.0
===============
* [API break] Remove support for external index.
* Move to the mustache templating system instead of ctpp2.
* Make meson.build works for meson>=0.43.0
* [API break] Move the basic tools from the `common` directory to `tools`.
kiwix-lib 3.1.1
===============
* The OPDS feed book's date must be the date of the book, not the date of the
feed generation.
* Convert the standard opds date to our format (YYYY-MM-DD)
* Remove duplicate language attribute in the libxml dumper.
* Create the datadirectory to not fail to write a file in a non-existent
directory
kiwix-lib 3.1.0
===============
* Add a method to get the favicon url of book (if available).
* Move dump code of library.xml in a specific class.
* Add a first support to bookmarks
kiwix-lib 3.0.3
===============
* Add the 'en' language to the mapping alpha2-code ('en') to alpha3-code
('eng').
* Correctly write the 'ArticleCount' and 'MediaCount' in the library.xml.
* Correctly fill the book size for the zim file size.
* Fix launch of aria2c.
kiwix-lib 3.0.2
===============
* Use the correct path separator when computing relativePath on Windows.
kiwix-lib 3.0.1
===============
* Small fix about parsing the opdsStream.
kiwix-lib 3.0.0
===============
* Change the downloader to use aria2 using a separated process (with rpc)
instead of using the libaria2. This simplify a lot the link process to
libaria2 on Windows.
- kiwix-lib doesn't depend on libaria2 anymore.
- kiwix-lib now depends on libcurl.
* [API break] Library class API has been updated :
- Books are referenced by id, not index. A lot of methods have been
updated this way.
- Books "list" is now private.
- There is no more "current" book.
- listBooksIds's filters have been updated.
* [API break] Book class API has been updated :
- Move the definition of Book in `book.h`.
- Use getter/setter methods instead public members.
- Size (getSize/setSize) is now returned in bytes, not kB.
- Dependending of how the book has been initialized (opdsfeed), the
faviconUrl may be stored in the book, the favicon being downloaded when
using `getFavicon`.
- The path (and indexPath) are always absolute path.
- Book has now a downloadId, corresponding to the aria2 download id (if
exists)
* [API break] Manager class API has been updated :
- The manager is mainly use to fill a Libray from a "library.xml" file or
opds feed. Other operations (has removeBookById, setBookPath, filter, ...)
have been removed.
- The manager use a intermediate class (LibraryManipulator) to add book to
the library. This dependency injection allow caller code to hook the add
of a book to the library.
- The manager work on a existing Library. It doesn't how a internal
Library.
* [API break] OpdsDumper class API has been updated :
- dumpOPDSFeed method now take the list of bookIds to dump instead of
dumping all books in the library.
- OpdsDumper can now dump openSearch result information (total result
count, start index, ...).
* [API break] Common tools API has been updated :
- `base64_encode` and `base64_decode` take std::string as arguments.
- New `download` function in networkTools.h using libcurl.
- New `getDataDirectory` function in pathTools.
- Better `beautifyInteger` and `beautifyFileSize` functions.
- New `nodeToString` function serializing a pugi::xml_node to a string.
- New `converta2toa3` function to convert alpha2 language code to aplha3
language code.
kiwix-lib 2.0.2
===============

View File

@@ -33,10 +33,6 @@ libraries need to be available:
(package libzim-dev on Ubuntu)
* Pugixml ........................................ http://pugixml.org/
(package libpugixml-dev on Ubuntu)
* ctpp2 ........................................ http://ctpp.havoc.ru/
(package libctpp2-dev on Ubuntu)
* Xapian ......................................... https://xapian.org/
(package libxapian-dev on Ubuntu)
* libaria2 .................................. https://aria2.github.io/
(no package on Ubuntu)
@@ -49,10 +45,6 @@ version by hand.
If you want to install these dependencies locally, then use the
kiwix-lib directory as install prefix.
If you compile ctpp2 from source and want to compile the Kiwix library
statically then you will probably need to rename ctpp2 static library
from ctpp2-st.a to ctpp2.a.
Environment
-------------

119
include/book.h Normal file
View File

@@ -0,0 +1,119 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_BOOK_H
#define KIWIX_BOOK_H
#include <string>
namespace pugi {
class xml_node;
}
namespace kiwix
{
class OPDSDumper;
class Reader;
/**
* A class to store information about a book (a zim file)
*/
class Book
{
public:
Book();
~Book();
bool update(const Book& other);
void update(const Reader& reader);
void updateFromXml(const pugi::xml_node& node, const std::string& baseDir);
void updateFromOpds(const pugi::xml_node& node, const std::string& urlHost);
std::string getHumanReadableIdFromPath();
bool readOnly() const { return m_readOnly; }
const std::string& getId() const { return m_id; }
const std::string& getPath() const { return m_path; }
bool isPathValid() const { return m_pathValid; }
const std::string& getTitle() const { return m_title; }
const std::string& getDescription() const { return m_description; }
const std::string& getLanguage() const { return m_language; }
const std::string& getCreator() const { return m_creator; }
const std::string& getPublisher() const { return m_publisher; }
const std::string& getDate() const { return m_date; }
const std::string& getUrl() const { return m_url; }
const std::string& getName() const { return m_name; }
const std::string& getTags() const { return m_tags; }
const std::string& getOrigId() const { return m_origId; }
const uint64_t& getArticleCount() const { return m_articleCount; }
const uint64_t& getMediaCount() const { return m_mediaCount; }
const uint64_t& getSize() const { return m_size; }
const std::string& getFavicon() const;
const std::string& getFaviconUrl() const { return m_faviconUrl; }
const std::string& getFaviconMimeType() const { return m_faviconMimeType; }
const std::string& getDownloadId() const { return m_downloadId; }
void setReadOnly(bool readOnly) { m_readOnly = readOnly; }
void setId(const std::string& id) { m_id = id; }
void setPath(const std::string& path);
void setPathValid(bool valid) { m_pathValid = valid; }
void setTitle(const std::string& title) { m_title = title; }
void setDescription(const std::string& description) { m_description = description; }
void setLanguage(const std::string& language) { m_language = language; }
void setCreator(const std::string& creator) { m_creator = creator; }
void setPublisher(const std::string& publisher) { m_publisher = publisher; }
void setDate(const std::string& date) { m_date = date; }
void setUrl(const std::string& url) { m_url = url; }
void setName(const std::string& name) { m_name = name; }
void setTags(const std::string& tags) { m_tags = tags; }
void setOrigId(const std::string& origId) { m_origId = origId; }
void setArticleCount(uint64_t articleCount) { m_articleCount = articleCount; }
void setMediaCount(uint64_t mediaCount) { m_mediaCount = mediaCount; }
void setSize(uint64_t size) { m_size = size; }
void setFavicon(const std::string& favicon) { m_favicon = favicon; }
void setFaviconMimeType(const std::string& faviconMimeType) { m_faviconMimeType = faviconMimeType; }
void setDownloadId(const std::string& downloadId) { m_downloadId = downloadId; }
protected:
std::string m_id;
std::string m_downloadId;
std::string m_path;
bool m_pathValid;
std::string m_title;
std::string m_description;
std::string m_language;
std::string m_creator;
std::string m_publisher;
std::string m_date;
std::string m_url;
std::string m_name;
std::string m_tags;
std::string m_origId;
uint64_t m_articleCount;
uint64_t m_mediaCount;
bool m_readOnly;
uint64_t m_size;
mutable std::string m_favicon;
std::string m_faviconUrl;
std::string m_faviconMimeType;
};
}
#endif

68
include/bookmark.h Normal file
View File

@@ -0,0 +1,68 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_BOOKMARK_H
#define KIWIX_BOOKMARK_H
#include <string>
namespace pugi {
class xml_node;
}
namespace kiwix
{
/**
* A class to store information about a bookmark (an article in a book)
*/
class Bookmark
{
public:
Bookmark();
~Bookmark();
void updateFromXml(const pugi::xml_node& node);
const std::string& getBookId() const { return m_bookId; }
const std::string& getBookTitle() const { return m_bookTitle; }
const std::string& getUrl() const { return m_url; }
const std::string& getTitle() const { return m_title; }
const std::string& getLanguage() const { return m_language; }
const std::string& getDate() const { return m_date; }
void setBookId(const std::string& bookId) { m_bookId = bookId; }
void setBookTitle(const std::string& bookTitle) { m_bookTitle = bookTitle; }
void setUrl(const std::string& url) { m_url = url; }
void setTitle(const std::string& title) { m_title = title; }
void setLanguage(const std::string& language) { m_language = language; }
void setDate(const std::string& date) { m_date = date; }
protected:
std::string m_bookId;
std::string m_bookTitle;
std::string m_url;
std::string m_title;
std::string m_language;
std::string m_date;
};
}
#endif

View File

@@ -1,4 +0,0 @@
#include <string>
std::string base64_encode(unsigned char const* , unsigned int len);
std::string base64_decode(std::string const& s);

View File

@@ -1,79 +0,0 @@
/*
* Copyright 2013 Renaud Gaudin <reg@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef _CTPP2_VM_STRING_LOADER_HPP__
#define _CTPP2_VM_STRING_LOADER_HPP__ 1
#include <ctpp2/CTPP2VMLoader.hpp>
#include <ctpp2/CTPP2Util.hpp>
#include <ctpp2/CTPP2Exception.hpp>
#include <ctpp2/CTPP2VMExecutable.hpp>
#include <ctpp2/CTPP2VMInstruction.hpp>
#include <ctpp2/CTPP2VMMemoryCore.hpp>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <string>
/**
@file VMStringLoader.hpp
@brief Load program core from file
*/
namespace CTPP // C++ Template Engine
{
// FWD
struct VMExecutable;
/**
@class VMStringLoader CTPP2VMStringLoader.hpp <CTPP2VMStringLoader.hpp>
@brief Load program core from file
*/
class CTPP2DECL VMStringLoader:
public VMLoader
{
public:
/**
*/
VMStringLoader(CCHAR_P rawContent, size_t rawContentSize);
/**
@brief Get ready-to-run program
*/
const VMMemoryCore * GetCore() const;
/**
@brief A destructor
*/
~VMStringLoader() throw();
private:
/** Program core */
VMExecutable * oCore;
/** Ready-to-run program */
VMMemoryCore * pVMMemoryCore;
};
} // namespace CTPP
#endif // _CTPP2_VM_STRING_LOADER_HPP__
// End.

View File

@@ -21,15 +21,15 @@
#define KIWIX_DOWNLOADER_H
#include <string>
#ifdef ENABLE_LIBARIA2
# include <aria2/aria2.h>
#endif
#include <vector>
#include <map>
#include <pthread.h>
#include <memory>
namespace kiwix
{
class Aria2;
struct DownloadedFile {
DownloadedFile()
: success(false) {}
@@ -37,6 +37,46 @@ struct DownloadedFile {
std::string path;
};
class AriaError : public std::runtime_error {
public:
AriaError(const std::string& message) : std::runtime_error(message) {}
};
class Download {
public:
typedef enum { K_ACTIVE, K_WAITING, K_PAUSED, K_ERROR, K_COMPLETE, K_REMOVED, K_UNKNOWN } StatusResult;
Download() :
m_status(K_UNKNOWN) {}
Download(std::shared_ptr<Aria2> p_aria, std::string did)
: mp_aria(p_aria),
m_status(K_UNKNOWN),
m_did(did) {};
void updateStatus(bool follow=false);
StatusResult getStatus() { return m_status; }
std::string getDid() { return m_did; }
std::string getFollowedBy() { return m_followedBy; }
uint64_t getTotalLength() { return m_totalLength; }
uint64_t getCompletedLength() { return m_completedLength; }
uint64_t getDownloadSpeed() { return m_downloadSpeed; }
uint64_t getVerifiedLength() { return m_verifiedLength; }
std::string getPath() { return m_path; }
std::vector<std::string>& getUris() { return m_uris; }
protected:
std::shared_ptr<Aria2> mp_aria;
StatusResult m_status;
std::string m_did = "";
std::string m_followedBy = "";
uint64_t m_totalLength;
uint64_t m_completedLength;
uint64_t m_downloadSpeed;
uint64_t m_verifiedLength;
std::vector<std::string> m_uris;
std::string m_path;
};
/**
* A tool to download things.
*
@@ -45,28 +85,19 @@ class Downloader
{
public:
Downloader();
~Downloader();
virtual ~Downloader();
/**
* Download a content.
*
* @param url the url to download
* @return the content downloaded.
*/
DownloadedFile download(const std::string& url);
void close();
Download* startDownload(const std::string& uri);
Download* getDownload(const std::string& did);
size_t getNbDownload() { return m_knownDownloads.size(); }
std::vector<std::string> getDownloadIds();
private:
static pthread_mutex_t globalLock;
std::string tmpDir;
#ifdef ENABLE_LIBARIA2
DownloadedFile* fileHandle;
aria2::Session* session;
static int downloadEventCallback(aria2::Session* session,
aria2::DownloadEvent event,
aria2::A2Gid gid,
void* userData);
#endif
std::map<std::string, std::unique_ptr<Download>> m_knownDownloads;
std::shared_ptr<Aria2> mp_aria;
};
}

View File

@@ -24,7 +24,6 @@
#include <zim/article.h>
#include <exception>
#include <string>
#include "common.h"
using namespace std;

View File

@@ -20,78 +20,42 @@
#ifndef KIWIX_LIBRARY_H
#define KIWIX_LIBRARY_H
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stack>
#include <string>
#include <vector>
#include <map>
#include "common/regexTools.h"
#include "common/stringTools.h"
#include "book.h"
#include "bookmark.h"
#define KIWIX_LIBRARY_VERSION "20110515"
using namespace std;
namespace kiwix
{
enum supportedIndexType { UNKNOWN, XAPIAN };
class OPDSDumper;
/**
* A class to store information about a book (a zim file)
*/
class Book
{
public:
Book();
~Book();
static bool sortByLastOpen(const Book& a, const Book& b);
static bool sortByTitle(const Book& a, const Book& b);
static bool sortBySize(const Book& a, const Book& b);
static bool sortByDate(const Book& a, const Book& b);
static bool sortByCreator(const Book& a, const Book& b);
static bool sortByPublisher(const Book& a, const Book& b);
static bool sortByLanguage(const Book& a, const Book& b);
string getHumanReadableIdFromPath();
string id;
string path;
string pathAbsolute;
string last;
string indexPath;
string indexPathAbsolute;
supportedIndexType indexType;
string title;
string description;
string language;
string creator;
string publisher;
string date;
string url;
string name;
string tags;
string origId;
string articleCount;
string mediaCount;
bool readOnly;
string size;
string favicon;
string faviconMimeType;
enum supportedListSortBy { UNSORTED, TITLE, SIZE, DATE, CREATOR, PUBLISHER };
enum supportedListMode {
ALL = 0,
LOCAL = 1,
REMOTE = 1 << 1,
NOLOCAL = 1 << 2,
NOREMOTE = 1 << 3,
VALID = 1 << 4,
NOVALID = 1 << 5
};
/**
* A Library store several books.
*/
class Library
{
std::map<std::string, kiwix::Book> m_books;
std::vector<kiwix::Bookmark> m_bookmarks;
public:
Library();
~Library();
string version;
/**
* Add a book to the library.
*
@@ -104,26 +68,136 @@ class Library
*/
bool addBook(const Book& book);
/**
* Add a bookmark to the library.
*
* @param bookmark the book to add.
*/
void addBookmark(const Bookmark& bookmark);
/**
* Remove a bookmarkk
*
* @param zimId The zimId of the bookmark.
* @param url The url of the bookmark.
* @return True if the bookmark has been removed.
*/
bool removeBookmark(const std::string& zimId, const std::string& url);
Book& getBookById(const std::string& id);
/**
* Remove a book from the library.
*
* @param bookIndex the index of the book to remove.
* @return True
* @param id the id of the book to remove.
* @return True if the book were in the lirbrary and has been removed.
*/
bool removeBookByIndex(const unsigned int bookIndex);
vector<kiwix::Book> books;
bool removeBookById(const std::string& id);
/*
* 'current' is the variable storing the current content/book id
* in the library. This is used to be able to load per default a
* content. As Kiwix may work with many library XML files, you may
* have "current" defined many time with different values. The
* last XML file read has the priority, Although we do not have an
* library object for each file, we want to be able to fallback to
* an 'old' current book if the one which should be load
* failed. That is the reason why we need a stack here
/**
* Write the library to a file.
*
* @param path the path of the file to write to.
* @return True if the library has been correctly saved.
*/
stack<string> current;
bool writeToFile(const std::string& path);
/**
* Write the library bookmarks to a file.
*
* @param path the path of the file to write to.
* @return True if the library has been correctly saved.
*/
bool writeBookmarksToFile(const std::string& path);
/**
* Get the number of book in the library.
*
* @param localBooks If we must count local books (books with a path).
* @param remoteBooks If we must count remote books (books with an url)
* @return The number of books.
*/
unsigned int getBookCount(const bool localBooks, const bool remoteBooks);
/**
* Get all langagues of the books in the library.
*
* @return A list of languages.
*/
std::vector<std::string> getBooksLanguages();
/**
* Get all book creators of the books in the library.
*
* @return A list of book creators.
*/
std::vector<std::string> getBooksCreators();
/**
* Get all book publishers of the books in the library.
*
* @return A list of book publishers.
*/
std::vector<std::string> getBooksPublishers();
/**
* Get all bookmarks.
*
* @return A list of bookmarks
*/
const std::vector<kiwix::Bookmark>& getBookmarks() { return m_bookmarks; }
/**
* Get all book ids of the books in the library.
*
* @return A list of book ids.
*/
std::vector<std::string> getBooksIds();
/**
* Filter the library and generate a new one with the keep elements.
*
* This is equivalent to `listBookIds(ALL, UNSORTED, search)`.
*
* @param search List only books with search in the title or description.
* @return The list of bookIds corresponding to the query.
*/
std::vector<std::string> filter(const std::string& search);
/**
* List books in the library.
*
* @param mode The mode of listing :
* - LOCAL  : list only local books (with a path).
* - REMOTE : list only remote books (with an url).
* - VALID  : list only valid books (without a path or with a
* path pointing to a valid zim file).
* - NOLOCAL : list only books without valid path.
* - NOREMOTE : list only books without url.
* - NOVALID : list only books not valid.
* - ALL : Do not do any filter (LOCAL or REMOTE)
* - Flags can be combined.
* @param sortBy Attribute to sort by the book list.
* @param search List only books with search in the title, description.
* @param language List only books in this language.
* @param creator List only books of this creator.
* @param publisher List only books of this publisher.
* @param maxSize Do not list book bigger than maxSize.
* Set to 0 to cancel this filter.
* @return The list of bookIds corresponding to the query.
*/
std::vector<std::string> listBooksIds(
int supportedListMode = ALL,
supportedListSortBy sortBy = UNSORTED,
const std::string& search = "",
const std::string& language = "",
const std::string& creator = "",
const std::string& publisher = "",
size_t maxSize = 0);
friend class OPDSDumper;
friend class libXMLDumper;
};
}

83
include/libxml_dumper.h Normal file
View File

@@ -0,0 +1,83 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_LIBXML_DUMPER_H
#define KIWIX_LIBXML_DUMPER_H
#include <string>
#include <vector>
#include <pugixml.hpp>
#include "library.h"
namespace kiwix
{
/**
* A tool to dump a `Library` into a basic library.xml
*
*/
class LibXMLDumper
{
public:
LibXMLDumper() = default;
LibXMLDumper(Library* library);
~LibXMLDumper();
/**
* Dump the library.xml
*
* @param id The id of the library.
* @return The library.xml content.
*/
std::string dumpLibXMLContent(const std::vector<std::string>& bookIds);
/**
* Dump the bookmark of the library.
*
* @return The bookmark.xml content.
*/
std::string dumpLibXMLBookmark();
/**
* Set the base directory used.
*
* @param baseDir the base directory to use.
*/
void setBaseDir(const std::string& baseDir) { this->baseDir = baseDir; }
/**
* Set the library to dump.
*
* @param library The library to dump.
*/
void setLibrary(Library* library) { this->library = library; }
protected:
kiwix::Library* library;
std::string baseDir;
private:
void handleBook(Book book, pugi::xml_node root_node);
void handleBookmark(Bookmark bookmark, pugi::xml_node root_node);
};
}
#endif // KIWIX_OPDS_DUMPER_H

View File

@@ -20,24 +20,41 @@
#ifndef KIWIX_MANAGER_H
#define KIWIX_MANAGER_H
#include <time.h>
#include <sstream>
#include <string>
#include <pugixml.hpp>
#include "common/base64.h"
#include "common/pathTools.h"
#include "common/regexTools.h"
#include "book.h"
#include "library.h"
#include "reader.h"
using namespace std;
#include <string>
#include <vector>
namespace pugi {
class xml_document;
}
namespace kiwix
{
enum supportedListMode { LASTOPEN, REMOTE, LOCAL };
enum supportedListSortBy { TITLE, SIZE, DATE, CREATOR, PUBLISHER };
class LibraryManipulator {
public:
virtual ~LibraryManipulator() {}
virtual bool addBookToLibrary(Book book) = 0;
virtual void addBookmarkToLibrary(Bookmark bookmark) = 0;
};
class DefaultLibraryManipulator : public LibraryManipulator {
public:
DefaultLibraryManipulator(Library* library) :
library(library) {}
virtual ~DefaultLibraryManipulator() {}
bool addBookToLibrary(Book book) {
return library->addBook(book);
}
void addBookmarkToLibrary(Bookmark bookmark) {
library->addBookmark(bookmark);
}
private:
kiwix::Library* library;
};
/**
* A tool to manage a `Library`.
@@ -48,7 +65,8 @@ enum supportedListSortBy { TITLE, SIZE, DATE, CREATOR, PUBLISHER };
class Manager
{
public:
Manager();
Manager(LibraryManipulator* manipulator);
Manager(Library* library);
~Manager();
/**
@@ -59,7 +77,7 @@ class Manager
* updated content.
* @return True if file has been properly parsed.
*/
bool readFile(const string path, const bool readOnly = true);
bool readFile(const std::string& path, const bool readOnly = true);
/**
* Read a `library.xml` and add book in the file to the library.
@@ -71,8 +89,8 @@ class Manager
* updated content.
* @return True if file has been properly parsed.
*/
bool readFile(const string nativePath,
const string UTF8Path,
bool readFile(const std::string& nativePath,
const std::string& UTF8Path,
const bool readOnly = true);
/**
@@ -84,9 +102,9 @@ class Manager
* @param libraryPath The library path (used to resolve relative path)
* @return True if the content has been properly parsed.
*/
bool readXml(const string& xml,
bool readXml(const std::string& xml,
const bool readOnly = true,
const string libraryPath = "");
const std::string& libraryPath = "");
/**
* Load a library content stored in a OPDS stream.
@@ -97,69 +115,16 @@ class Manager
* @param libraryPath The library path (used to resolve relative path)
* @return True if the content has been properly parsed.
*/
bool readOpds(const string& content, const std::string& urlHost);
/**
* Write the library to a file.
*
* @param path the path of the file to write.
* @return True.
*/
bool writeFile(const string path);
bool readOpds(const std::string& content, const std::string& urlHost);
/**
* Remove a book from the library.
* Load a bookmark file.
*
* @param bookIndex the index of the book to remove
* @return True
* @param path The path of the file to read.
* @return True if the content has been properly parsed.
*/
bool removeBookByIndex(const unsigned int bookIndex);
/**
* Remove a book from the library.
*
* @param id the id of the book to remove.
* @return True if the book were in the library.
*/
bool removeBookById(const string id);
/**
* Set the current book.
*
* @param id The id to add to the stack of current books.
* If id is empty, remove the current book from the stack.
* @return True
*/
bool setCurrentBookId(const string id);
/**
* Get the current book id.
*
* @return The id of the current book (or empty string if no current book).
*/
string getCurrentBookId() const;
/**
* Set the path of the external fulltext index associated to a book.
*
* @param id The id of the book to set.
* @param path The path of the external fullext index.
* @param supportedIndexType The type of the fulltext index.
* @return True if the book is in the library.
*/
bool setBookIndex(const string id,
const string path,
const supportedIndexType type = XAPIAN);
/**
* Set the path of the zim file associated to a book.
*
* @param id The id of the book to set.
* @param path The path of the zim file.
* @return True if the book is in the library.
*/
bool setBookPath(const string id, const string path);
bool readBookmarkFile(const std::string& path);
/**
* Add a book to the library.
@@ -172,9 +137,9 @@ class Manager
* @return The id of the book if the book has been added to the library.
* Else, an empty string.
*/
string addBookFromPathAndGetId(const string pathToOpen,
const string pathToSave = "",
const string url = "",
std::string addBookFromPathAndGetId(const std::string& pathToOpen,
const std::string& pathToSave = "",
const std::string& url = "",
const bool checkMetaData = false);
/**
@@ -188,18 +153,11 @@ class Manager
* @return True if the book has been added to the library.
*/
bool addBookFromPath(const string pathToOpen,
const string pathToSave = "",
const string url = "",
bool addBookFromPath(const std::string& pathToOpen,
const std::string& pathToSave = "",
const std::string& url = "",
const bool checkMetaData = false);
/**
* Clone and return the internal library.
*
* @return A clone of the library.
*/
Library cloneLibrary();
/**
* Get the book corresponding to an id.
*
@@ -207,24 +165,7 @@ class Manager
* @param[out] book The book corresponding to the id.
* @return True if the book has been found.
*/
bool getBookById(const string id, Book& book);
/**
* Get the current book.
*
* @param[out] The current book.
* @return True if there is a current book.
*/
bool getCurrentBook(Book& book);
/**
* Get the number of book in the library.
*
* @param localBooks If we must count local books (books with a path).
* @param remoteBooks If we must count remote books (books with an url)
* @return The number of books.
*/
unsigned int getBookCount(const bool localBooks, const bool remoteBooks);
bool getBookById(const std::string& id, Book& book);
/**
* Update the "last open date" of a book
@@ -232,7 +173,7 @@ class Manager
* @param id the id of the book.
* @return True if the book is in the library.
*/
bool updateBookLastOpenDateById(const string id);
bool updateBookLastOpenDateById(const std::string& id);
/**
* Remove (set to empty) paths of all books in the library.
@@ -261,64 +202,31 @@ class Manager
bool listBooks(const supportedListMode mode,
const supportedListSortBy sortBy,
const unsigned int maxSize,
const string language,
const string creator,
const string publisher,
const string search);
const std::string& language,
const std::string& creator,
const std::string& publisher,
const std::string& search);
/**
* Filter the library and generate a new one with the keep elements.
*
* @param search List only books with search in the title or description.
* @return A `Library`.
*/
Library filter(const string& search);
std::string writableLibraryPath;
/**
* Get all langagues of the books in the library.
*
* @return A list of languages.
*/
vector<string> getBooksLanguages();
/**
* Get all book creators of the books in the library.
*
* @return A list of book creators.
*/
vector<string> getBooksCreators();
/**
* Get all book publishers of the books in the library.
*
* @return A list of book publishers.
*/
vector<string> getBooksPublishers();
/**
* Get all book ids of the books in the library.
*
* @return A list of book ids.
*/
vector<string> getBooksIds();
string writableLibraryPath;
vector<std::string> bookIdList;
bool m_hasSearchResult = false;
uint64_t m_totalBooks = 0;
uint64_t m_startIndex = 0;
uint64_t m_itemsPerPage = 0;
protected:
kiwix::Library library;
kiwix::LibraryManipulator* manipulator;
bool mustDeleteManipulator;
bool readBookFromPath(const string path, Book* book = NULL);
bool readBookFromPath(const std::string& path, Book* book);
bool parseXmlDom(const pugi::xml_document& doc,
const bool readOnly,
const string libraryPath);
const std::string& libraryPath);
bool parseOpdsDom(const pugi::xml_document& doc,
const std::string& urlHost);
private:
void checkAndCleanBookPaths(Book& book, const string& libraryPath);
void checkAndCleanBookPaths(Book& book, const std::string& libraryPath);
};
}

View File

@@ -1,7 +1,10 @@
headers = [
'book.h',
'bookmark.h',
'common.h',
'library.h',
'manager.h',
'libxml_dumper.h',
'opds_dumper.h',
'downloader.h',
'reader.h',
@@ -9,26 +12,15 @@ headers = [
'searcher.h'
]
if xapian_dep.found()
headers += ['xapianSearcher.h']
endif
install_headers(headers, subdir:'kiwix')
install_headers(
'common/base64.h',
'common/networkTools.h',
'common/otherTools.h',
'common/pathTools.h',
'common/regexTools.h',
'common/stringTools.h',
subdir:'kiwix/common'
'tools/base64.h',
'tools/networkTools.h',
'tools/otherTools.h',
'tools/pathTools.h',
'tools/regexTools.h',
'tools/stringTools.h',
subdir:'kiwix/tools'
)
if has_ctpp2_dep
install_headers(
'ctpp2/CTPP2VMStringLoader.hpp',
subdir:'kiwix/ctpp2'
)
endif

View File

@@ -26,9 +26,9 @@
#include <pugixml.hpp>
#include "common/base64.h"
#include "common/pathTools.h"
#include "common/regexTools.h"
#include "tools/base64.h"
#include "tools/pathTools.h"
#include "tools/regexTools.h"
#include "library.h"
#include "reader.h"
@@ -45,7 +45,7 @@ class OPDSDumper
{
public:
OPDSDumper() = default;
OPDSDumper(Library library);
OPDSDumper(Library* library);
~OPDSDumper();
/**
@@ -54,7 +54,7 @@ class OPDSDumper
* @param id The id of the library.
* @return The OPDS feed.
*/
std::string dumpOPDSFeed();
std::string dumpOPDSFeed(const std::vector<std::string>& bookIds);
/**
* Set the id of the opds stream.
@@ -84,20 +84,33 @@ class OPDSDumper
*/
void setSearchDescriptionUrl(const std::string& searchDescriptionUrl) { this->searchDescriptionUrl = searchDescriptionUrl; }
/**
* Set some informations about the search results.
*
* @param totalResult the total number of results of the search.
* @param startIndex the start index of the result.
* @param count the number of result of the current set (or page).
*/
void setOpenSearchInfo(int totalResult, int startIndex, int count);
/**
* Set the library to dump.
*
* @param library The library to dump.
*/
void setLibrary(Library library) { this->library = library; }
void setLibrary(Library* library) { this->library = library; }
protected:
kiwix::Library library;
kiwix::Library* library;
std::string id;
std::string title;
std::string date;
std::string rootLocation;
std::string searchDescriptionUrl;
int m_totalResults;
int m_startIndex;
int m_count;
bool m_isSearchResult = false;
private:
pugi::xml_node handleBook(Book book, pugi::xml_node root_node);

View File

@@ -31,8 +31,8 @@
#include <string>
#include "common.h"
#include "entry.h"
#include "common/pathTools.h"
#include "common/stringTools.h"
#include "tools/pathTools.h"
#include "tools/stringTools.h"
using namespace std;

View File

@@ -29,8 +29,8 @@
#include <string>
#include <vector>
#include <vector>
#include "common/pathTools.h"
#include "common/stringTools.h"
#include "tools/pathTools.h"
#include "tools/stringTools.h"
#include "kiwix_config.h"
using namespace std;
@@ -57,24 +57,7 @@ struct SearcherInternal;
* The Searcher class is reponsible to do different kind of search using the
* fulltext index.
*
* Historically, there are two kind of fulltext index :
* - The legacy one, is the external fulltext index. A directory stored outside
* of the zim file.
* - The new one, a embedded fulltext index in the zim file.
*
* Legacy external fulltext index has to be considered as obsolet format with
* less functionnalities:
* - No multi zim search ;
* - No geo_search ;
* - No suggestions search ;
*
* To reflect this, there is two Search creation "API":
* - One for the external fulltext index, using the constructor taking a
* xapianDirectoryPath) ;
* - One for the embedded fulltext index, using a "empty" constructor and the
* `add_reader` method".
*
* On top of that, the Searcher may (if compiled with ctpp2) be used to
* Searcher may (if compiled with ctpp2) be used to
* generate a html page for the search result. This use a template that need a
* humanReaderName. This feature is only used by kiwix-serve and this should be
* move outside of Searcher (and with a better API). If you don't use the html
@@ -92,18 +75,6 @@ class Searcher
*/
Searcher(const string& humanReadableName = "");
/**
* The constructor for legacy external fulltext index.
*
* @param xapianDirectoryPath The path to the external index directory.
* @param reader The reader associated to the external index.
* It will be used retrive the article content or generate
* the snippet.
* @param humanReadableName The humanReadableName for the zim.
*/
Searcher(const string& xapianDirectoryPath,
Reader* reader,
const string& humanReadableName);
~Searcher();
/**
@@ -192,12 +163,10 @@ class Searcher
*/
bool setSearchProtocolPrefix(const std::string prefix);
#ifdef ENABLE_CTPP2
/**
* Generate the html page with the resutls of the search.
*/
string getHtml();
#endif
protected:
std::string beautifyInteger(const unsigned int number);

4
include/tools/base64.h Normal file
View File

@@ -0,0 +1,4 @@
#include <string>
std::string base64_encode(const std::string& inString);
std::string base64_decode(const std::string& s);

View File

@@ -20,30 +20,14 @@
#ifndef KIWIX_NETWORKTOOLS_H
#define KIWIX_NETWORKTOOLS_H
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
#else
#include <net/if.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#endif
#include <iostream>
#include <map>
#include <string>
#include <vector>
namespace kiwix
{
std::map<std::string, std::string> getNetworkInterfaces();
std::string getBestPublicIp();
std::string download(const std::string& url);
}
#endif

View File

@@ -26,9 +26,13 @@
#include <unistd.h>
#endif
#include <pugixml.hpp>
namespace kiwix
{
void sleep(unsigned int milliseconds);
std::string nodeToString(pugi::xml_node node);
std::string converta2toa3(const std::string& a2code);
}
#endif

View File

@@ -59,5 +59,6 @@ bool copyFile(const string& sourcePath, const string& destPath);
string getLastPathElement(const string& path);
string getExecutablePath();
string getCurrentDirectory();
string getDataDirectory();
bool writeTextFile(const string& path, const string& content);
#endif

View File

@@ -33,10 +33,8 @@
namespace kiwix
{
#ifndef __ANDROID__
std::string beautifyInteger(const unsigned int number);
std::string beautifyFileSize(const unsigned int number);
std::string beautifyInteger(uint64_t number);
std::string beautifyFileSize(uint64_t number);
void printStringInHexadecimal(const char* s);
void printStringInHexadecimal(icu::UnicodeString s);
void stringReplacement(std::string& str,
@@ -44,8 +42,6 @@ void stringReplacement(std::string& str,
const std::string& newStr);
std::string encodeDiples(const std::string& str);
#endif
std::string removeAccents(const std::string& text);
void loadICUExternalTables();
@@ -64,6 +60,20 @@ std::string lcFirst(const std::string& word);
std::string toTitle(const std::string& word);
std::string normalize(const std::string& word);
template<typename T>
std::string to_string(T value)
{
std::ostringstream oss;
oss << value;
return oss.str();
}
template<typename T>
T extractFromString(const std::string& str) {
std::istringstream iss(str);
T ret;
iss >> ret;
return ret;
}
} //namespace kiwix
#endif

View File

@@ -1,98 +0,0 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_XAPIAN_SEARCHER_H
#define KIWIX_XAPIAN_SEARCHER_H
#include <xapian.h>
#include "reader.h"
#include "searcher.h"
#include <map>
#include <string>
using namespace std;
namespace kiwix
{
class XapianSearcher;
class XapianResult : public Result
{
public:
XapianResult(XapianSearcher* searcher, Xapian::MSetIterator& iterator);
virtual ~XapianResult(){};
virtual std::string get_url();
virtual std::string get_title();
virtual int get_score();
virtual std::string get_snippet();
virtual std::string get_content();
virtual int get_wordCount();
virtual int get_size();
virtual int get_readerIndex() { return 0; };
private:
XapianSearcher* searcher;
Xapian::MSetIterator iterator;
Xapian::Document document;
};
class NoXapianIndexInZim : public exception
{
virtual const char* what() const throw()
{
return "There is no fulltext index in the zim file";
}
};
class XapianSearcher
{
friend class XapianResult;
public:
XapianSearcher(const string& xapianDirectoryPath, Reader* reader);
virtual ~XapianSearcher(){};
void searchInIndex(string& search,
const unsigned int resultStart,
const unsigned int resultEnd,
const bool verbose = false);
virtual Result* getNextResult();
void restart_search();
Xapian::MSet results;
protected:
void closeIndex();
void openIndex(const string& xapianDirectoryPath);
void setup_queryParser();
Reader* reader;
Xapian::Database readableDatabase;
std::string language;
std::string stopwords;
Xapian::QueryParser queryParser;
Xapian::Stem stemmer;
Xapian::SimpleStopper stopper;
Xapian::MSetIterator current_result;
std::map<std::string, int> valuesmap;
};
}
#endif

View File

@@ -1,10 +1,9 @@
project('kiwix-lib', 'cpp',
version : '2.0.2',
version : '4.0.0',
license : 'GPL',
default_options : ['c_std=c11', 'cpp_std=c++11', 'werror=true'])
compiler = meson.get_compiler('cpp')
find_library_in_compiler = meson.version().version_compare('>=0.31.0')
static_deps = get_option('android') or get_option('default_library') == 'static'
if get_option('android')
@@ -17,78 +16,24 @@ thread_dep = dependency('threads')
libicu_dep = dependency('icu-i18n', static:static_deps)
libzim_dep = dependency('libzim', version : '>=4.0.0', static:static_deps)
pugixml_dep = dependency('pugixml', static:static_deps)
libaria2_dep = dependency('libaria2', static:static_deps, required:false)
libcurl_dep = dependency('libcurl', static:static_deps)
ctpp2_include_path = ''
has_ctpp2_dep = false
ctpp2_prefix_install = get_option('ctpp2-install-prefix')
ctpp2_link_args = []
if ctpp2_prefix_install == ''
if compiler.has_header('ctpp2/CTPP2Logger.hpp')
if find_library_in_compiler
ctpp2_lib = compiler.find_library('ctpp2')
else
ctpp2_lib = find_library('ctpp2')
endif
ctpp2_link_args = ['-lctpp2']
if meson.is_cross_build() and host_machine.system() == 'windows'
if find_library_in_compiler
iconv_lib = compiler.find_library('iconv', required:false)
else
iconv_lib = find_library('iconv', required:false)
endif
if iconv_lib.found()
ctpp2_link_args += ['-liconv']
endif
endif
has_ctpp2_dep = true
ctpp2_dep = declare_dependency(link_args:ctpp2_link_args)
else
message('ctpp2/CTPP2Logger.hpp not found. Compiling without CTPP2 support')
endif
else
if not find_library_in_compiler
error('For custom ctpp2_prefix_install you need a meson version >=0.31.0')
endif
ctpp2_include_path = ctpp2_prefix_install + '/include'
ctpp2_include_args = ['-I'+ctpp2_include_path]
if compiler.has_header('ctpp2/CTPP2Logger.hpp', args:ctpp2_include_args)
ctpp2_include_dir = include_directories(ctpp2_include_path, is_system:true)
ctpp2_lib_path = join_paths(ctpp2_prefix_install, get_option('libdir'))
message(ctpp2_lib_path)
ctpp2_lib = compiler.find_library('ctpp2', dirs:ctpp2_lib_path, required:false)
if not ctpp2_lib.found()
ctpp2_lib_path = join_paths(ctpp2_prefix_install, 'lib')
message(ctpp2_lib_path)
ctpp2_lib = compiler.find_library('ctpp2', dirs:ctpp2_lib_path)
endif
ctpp2_link_args = ['-L'+ctpp2_lib_path, '-lctpp2']
if meson.is_cross_build() and host_machine.system() == 'windows'
iconv_lib = compiler.find_library('iconv', required:false)
if iconv_lib.found()
ctpp2_link_args += ['-liconv']
endif
endif
has_ctpp2_dep = true
ctpp2_dep = declare_dependency(include_directories:ctpp2_include_dir, link_args:ctpp2_link_args)
else
message('ctpp2/CTPP2Logger.hpp not found. Compiling without CTPP2 support')
endif
if not compiler.has_header('mustache.hpp')
error('Cannot found header mustache.hpp')
endif
xapian_dep = dependency('xapian-core', required:false, static:static_deps)
all_deps = [thread_dep, libicu_dep, libzim_dep, xapian_dep, pugixml_dep, libaria2_dep]
if has_ctpp2_dep
all_deps += [ctpp2_dep]
extra_cflags = ''
if target_machine.system() == 'windows' and static_deps
add_project_arguments('-DCURL_STATICLIB', language : 'cpp')
extra_cflags += '-DCURL_STATICLIB'
endif
all_deps = [thread_dep, libicu_dep, libzim_dep, pugixml_dep, libcurl_dep]
inc = include_directories('include')
conf = configuration_data()
conf.set('VERSION', '"@0@"'.format(meson.project_version()))
conf.set('ENABLE_CTPP2', has_ctpp2_dep)
conf.set('ENABLE_LIBARIA2', libaria2_dep.found())
if build_machine.system() == 'windows'
extra_link_args = ['-lshlwapi', '-lwinmm']
@@ -102,21 +47,7 @@ subdir('static')
subdir('src')
subdir('test')
pkg_requires = ['libzim', 'icu-i18n', 'pugixml']
if libaria2_dep.found()
pkg_requires += ['libaria2']
endif
if xapian_dep.found()
pkg_requires += ['xapian-core']
endif
extra_cflags = ''
if has_ctpp2_dep
extra_libs += ctpp2_link_args
if ctpp2_include_path != ''
extra_cflags = '-I'+ctpp2_include_path
endif
endif
pkg_requires = ['libzim', 'icu-i18n', 'pugixml', 'libcurl']
pkg_conf = configuration_data()
pkg_conf.set('prefix', get_option('prefix'))

View File

@@ -1,4 +1,2 @@
option('ctpp2-install-prefix', type : 'string', value : '',
description : 'Prefix where ctpp libs has been installed')
option('android', type : 'boolean', value : false,
description : 'Do we make a kiwix-lib for android')

View File

@@ -1,8 +0,0 @@
#!/usr/bin/env bash
ctpp2c=$1
SOURCE=$(pwd)/$2
DEST=$3
$ctpp2c $SOURCE $DEST

View File

@@ -1,5 +1,4 @@
res_compiler = find_program('kiwix-compile-resources')
intermediate_ctpp2c = find_program('ctpp2c.sh')
install_data(res_compiler.path(), install_dir:get_option('bindir'))

View File

@@ -1,16 +0,0 @@
#!/usr/bin/env bash
set -e
BUILD_PATH=$(pwd)
echo "javac -d $BUILD_PATH/src/android $@"
javac -d $BUILD_PATH/src/android/ "$@"
cd $BUILD_PATH/src/android
echo "javah -jni org.kiwix.kiwixlib"
javah -jni org.kiwix.kiwixlib.JNIKiwix
javah -jni org.kiwix.kiwixlib.JNIKiwixReader
javah -jni org.kiwix.kiwixlib.JNIKiwixSearcher
cd $BUILD_PATH

View File

@@ -24,7 +24,7 @@
#include <android/log.h>
#include "org_kiwix_kiwixlib_JNIKiwixReader.h"
#include "common/base64.h"
#include "tools/base64.h"
#include "reader.h"
#include "utils.h"
@@ -163,8 +163,7 @@ Java_org_kiwix_kiwixlib_JNIKiwixReader_getFavicon(JNIEnv* env, jobject obj)
std::string cMime;
READER->getFavicon(cContent, cMime);
favicon = c2jni(
base64_encode(reinterpret_cast<const unsigned char*>(cContent.c_str()),
cContent.length()),
base64_encode(cContent),
env);
} catch (std::exception& e) {
__android_log_print(ANDROID_LOG_ERROR, "kiwix", "Unable to get ZIM favicon");

View File

@@ -1,6 +1,4 @@
jni_generator = find_program('gen_kiwix.sh')
kiwix_jni = custom_target('jni',
input: ['org/kiwix/kiwixlib/JNIKiwix.java',
'org/kiwix/kiwixlib/JNIKiwixReader.java',
@@ -14,7 +12,7 @@ kiwix_jni = custom_target('jni',
'org_kiwix_kiwixlib_JNIKiwixReader.h',
'org_kiwix_kiwixlib_JNIKiwixSearcher.h',
'org_kiwix_kiwixlib_JNIKiwixSearcher_Result.h'],
command:[jni_generator, '@INPUT@']
command:['javac', '-d', '@OUTDIR@', '-h', '@OUTDIR@', '@INPUT@']
)
kiwix_sources += [

203
src/aria2.cpp Normal file
View File

@@ -0,0 +1,203 @@
#include "aria2.h"
#include "xmlrpc.h"
#include <sstream>
#include <thread>
#include <chrono>
#include <tools/otherTools.h>
#include <tools/pathTools.h>
#include <downloader.h> // For AriaError
#ifdef _WIN32
# define ARIA2_CMD "aria2c.exe"
#else
# define ARIA2_CMD "aria2c"
#endif
namespace kiwix {
Aria2::Aria2():
mp_aria(nullptr),
m_port(42042),
m_secret("kiwixariarpc"),
mp_curl(nullptr),
m_lock(PTHREAD_MUTEX_INITIALIZER)
{
m_downloadDir = getDataDirectory();
makeDirectory(m_downloadDir);
std::vector<const char*> callCmd;
std::string rpc_port = "--rpc-listen-port=" + to_string(m_port);
std::string download_dir = "--dir=" + getDataDirectory();
std::string session_file = appendToDirectory(getDataDirectory(), "kiwix.session");
std::string session = "--save-session=" + session_file;
std::string inputFile = "--input-file=" + session_file;
// std::string log_dir = "--log=\"" + logDir + "\"";
#ifdef _WIN32
int pid = GetCurrentProcessId();
#else
pid_t pid = getpid();
#endif
std::string stop_with_pid = "--stop-with-process=" + to_string(pid);
std::string rpc_secret = "--rpc-secret=" + m_secret;
m_secret = "token:"+m_secret;
std::string aria2cmd = appendToDirectory(
removeLastPathElement(getExecutablePath(), true, true),
ARIA2_CMD);
if (fileExists(aria2cmd)) {
// A local aria2c exe exists (packaged with kiwix-desktop), use it.
callCmd.push_back(aria2cmd.c_str());
} else {
// Try to use a potential installed aria2c.
callCmd.push_back(ARIA2_CMD);
}
callCmd.push_back("--enable-rpc");
callCmd.push_back(rpc_secret.c_str());
callCmd.push_back(rpc_port.c_str());
callCmd.push_back(download_dir.c_str());
if (fileExists(session_file)) {
callCmd.push_back(inputFile.c_str());
}
callCmd.push_back(session.c_str());
// callCmd.push_back(log_dir.c_str());
callCmd.push_back("--auto-save-interval=10");
callCmd.push_back(stop_with_pid.c_str());
callCmd.push_back("--allow-overwrite=true");
callCmd.push_back("--dht-entry-point=router.bittorrent.com:6881");
callCmd.push_back("--dht-entry-point6=router.bittorrent.com:6881");
callCmd.push_back("--quiet=true");
callCmd.push_back("--bt-enable-lpd=true");
callCmd.push_back("--always-resume=true");
callCmd.push_back("--max-concurrent-downloads=42");
callCmd.push_back("--rpc-max-request-size=6M");
callCmd.push_back("--file-allocation=none");
mp_aria = Subprocess::run(callCmd);
mp_curl = curl_easy_init();
curl_easy_setopt(mp_curl, CURLOPT_URL, "http://localhost/rpc");
curl_easy_setopt(mp_curl, CURLOPT_PORT, m_port);
curl_easy_setopt(mp_curl, CURLOPT_POST, 1L);
int watchdog = 50;
while(--watchdog) {
std::this_thread::sleep_for(std::chrono::microseconds(10000));
auto res = curl_easy_perform(mp_curl);
if (res == CURLE_OK) {
break;
}
}
if (!watchdog) {
curl_easy_cleanup(mp_curl);
throw std::runtime_error("Cannot connect to aria2c rpc");
}
}
Aria2::~Aria2()
{
curl_easy_cleanup(mp_curl);
}
void Aria2::close()
{
saveSession();
shutdown();
}
size_t write_callback_to_iss(char* ptr, size_t size, size_t nmemb, void* userdata)
{
auto str = static_cast<std::stringstream*>(userdata);
str->write(ptr, nmemb);
return nmemb;
}
std::string Aria2::doRequest(const MethodCall& methodCall)
{
pthread_mutex_lock(&m_lock);
auto requestContent = methodCall.toString();
std::stringstream stringstream;
CURLcode res;
curl_easy_setopt(mp_curl, CURLOPT_POSTFIELDSIZE, requestContent.size());
curl_easy_setopt(mp_curl, CURLOPT_POSTFIELDS, requestContent.c_str());
curl_easy_setopt(mp_curl, CURLOPT_WRITEFUNCTION, &write_callback_to_iss);
curl_easy_setopt(mp_curl, CURLOPT_WRITEDATA, &stringstream);
res = curl_easy_perform(mp_curl);
if (res == CURLE_OK) {
long response_code;
curl_easy_getinfo(mp_curl, CURLINFO_RESPONSE_CODE, &response_code);
pthread_mutex_unlock(&m_lock);
if (response_code != 200) {
throw std::runtime_error("Invalid return code from aria");
}
auto responseContent = stringstream.str();
MethodResponse response(responseContent);
if (response.isFault()) {
throw AriaError(response.getFault().getFaultString());
}
return responseContent;
}
pthread_mutex_unlock(&m_lock);
throw std::runtime_error("Cannot perform request");
}
std::string Aria2::addUri(const std::vector<std::string>& uris)
{
MethodCall methodCall("aria2.addUri", m_secret);
auto uriParams = methodCall.newParamValue().getArray();
for (auto& uri : uris) {
uriParams.addValue().set(uri);
}
auto ret = doRequest(methodCall);
MethodResponse response(ret);
return response.getParamValue(0).getAsS();
}
std::string Aria2::tellStatus(const std::string& gid, const std::vector<std::string>& statusKey)
{
MethodCall methodCall("aria2.tellStatus", m_secret);
methodCall.newParamValue().set(gid);
if (!statusKey.empty()) {
auto statusArray = methodCall.newParamValue().getArray();
for (auto& key : statusKey) {
statusArray.addValue().set(key);
}
}
return doRequest(methodCall);
}
std::vector<std::string> Aria2::tellActive()
{
MethodCall methodCall("aria2.tellActive", m_secret);
auto statusArray = methodCall.newParamValue().getArray();
statusArray.addValue().set(std::string("gid"));
statusArray.addValue().set(std::string("following"));
auto responseContent = doRequest(methodCall);
MethodResponse response(responseContent);
std::vector<std::string> activeGID;
int index = 0;
while(true) {
try {
auto structNode = response.getParamValue(0).getArray().getValue(index++).getStruct();
auto gidNode = structNode.getMember("gid");
activeGID.push_back(gidNode.getValue().getAsS());
} catch (InvalidRPCNode& e) { break; }
}
return activeGID;
}
void Aria2::saveSession()
{
MethodCall methodCall("aria2.saveSession", m_secret);
doRequest(methodCall);
std::cout << "session saved" << std::endl;
}
void Aria2::shutdown()
{
MethodCall methodCall("aria2.shutdown", m_secret);
doRequest(methodCall);
}
} // end namespace kiwix

46
src/aria2.h Normal file
View File

@@ -0,0 +1,46 @@
#ifndef KIWIXLIB_ARIA2_H_
#define KIWIXLIB_ARIA2_H_
#ifdef _WIN32
// winsock2.h need to be included before windows.h (included by curl.h)
# include <winsock2.h>
#endif
#include "subprocess.h"
#include "xmlrpc.h"
#include <memory>
#include <curl/curl.h>
#include <pthread.h>
namespace kiwix {
class Aria2
{
private:
std::unique_ptr<Subprocess> mp_aria;
int m_port;
std::string m_secret;
std::string m_downloadDir;
CURL* mp_curl;
pthread_mutex_t m_lock;
std::string doRequest(const MethodCall& methodCall);
public:
Aria2();
virtual ~Aria2();
void close();
std::string addUri(const std::vector<std::string>& uri);
std::string tellStatus(const std::string& gid, const std::vector<std::string>& statusKey);
std::vector<std::string> tellActive();
void saveSession();
void shutdown();
};
}; //end namespace kiwix
#endif // KIWIXLIB_ARIA2_H_

195
src/book.cpp Normal file
View File

@@ -0,0 +1,195 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "book.h"
#include "reader.h"
#include "tools/base64.h"
#include "tools/regexTools.h"
#include "tools/networkTools.h"
#include <pugixml.hpp>
namespace kiwix
{
/* Constructor */
Book::Book() : m_readOnly(false)
{
}
/* Destructor */
Book::~Book()
{
}
bool Book::update(const kiwix::Book& other)
{
if (m_readOnly)
return false;
m_readOnly = other.m_readOnly;
if (m_path.empty()) {
m_path = other.m_path;
}
if (m_url.empty()) {
m_url = other.m_url;
}
if (m_tags.empty()) {
m_tags = other.m_tags;
}
if (m_name.empty()) {
m_name = other.m_name;
}
if (m_faviconMimeType.empty()) {
m_favicon = other.m_favicon;
m_faviconMimeType = other.m_faviconMimeType;
}
return true;
}
void Book::update(const kiwix::Reader& reader)
{
m_path = reader.getZimFilePath();
m_id = reader.getId();
m_description = reader.getDescription();
m_language = reader.getLanguage();
m_date = reader.getDate();
m_creator = reader.getCreator();
m_publisher = reader.getPublisher();
m_title = reader.getTitle();
m_name = reader.getName();
m_tags = reader.getTags();
m_origId = reader.getOrigId();
m_articleCount = reader.getArticleCount();
m_mediaCount = reader.getMediaCount();
m_size = static_cast<uint64_t>(reader.getFileSize()) << 10;
reader.getFavicon(m_favicon, m_faviconMimeType);
}
#define ATTR(name) node.attribute(name).value()
void Book::updateFromXml(const pugi::xml_node& node, const std::string& baseDir)
{
m_id = ATTR("id");
std::string path = ATTR("path");
if (isRelativePath(path)) {
path = computeAbsolutePath(baseDir, path);
}
m_path = path;
m_title = ATTR("title");
m_name = ATTR("name");
m_tags = ATTR("tags");
m_description = ATTR("description");
m_language = ATTR("language");
m_date = ATTR("date");
m_creator = ATTR("creator");
m_publisher = ATTR("publisher");
m_url = ATTR("url");
m_origId = ATTR("origId");
m_articleCount = strtoull(ATTR("articleCount"), 0, 0);
m_mediaCount = strtoull(ATTR("mediaCount"), 0, 0);
m_size = strtoull(ATTR("size"), 0, 0) << 10;
m_favicon = base64_decode(ATTR("favicon"));
m_faviconMimeType = ATTR("faviconMimeType");
try {
m_downloadId = ATTR("downloadId");
} catch(...) {}
}
#undef ATTR
static std::string fromOpdsDate(const std::string& date)
{
//The opds date use the standard <YYYY>-<MM>-<DD>T<HH>:<mm>:<SS>Z
//and we want <YYYY>-<MM>-<DD>. That's easy, let's take the first 10 char
return date.substr(0, 10);
}
#define VALUE(name) node.child(name).child_value()
void Book::updateFromOpds(const pugi::xml_node& node, const std::string& urlHost)
{
m_id = VALUE("id");
if (!m_id.compare(0, 9, "urn:uuid:")) {
m_id.erase(0, 9);
}
m_title = VALUE("title");
m_description = VALUE("description");
m_language = VALUE("language");
m_date = fromOpdsDate(VALUE("updated"));
m_creator = node.child("author").child("name").child_value();
for(auto linkNode = node.child("link"); linkNode;
linkNode = linkNode.next_sibling("link")) {
std::string rel = linkNode.attribute("rel").value();
if (rel == "http://opds-spec.org/acquisition/open-access") {
m_url = linkNode.attribute("href").value();
m_size = strtoull(linkNode.attribute("length").value(), 0, 0);
}
if (rel == "http://opds-spec.org/image/thumbnail") {
m_faviconUrl = urlHost + linkNode.attribute("href").value();
m_faviconMimeType = linkNode.attribute("type").value();
}
}
}
#undef VALUE
std::string Book::getHumanReadableIdFromPath()
{
std::string id = m_path;
if (!id.empty()) {
kiwix::removeAccents(id);
#ifdef _WIN32
id = replaceRegex(id, "", "^.*\\\\");
#else
id = replaceRegex(id, "", "^.*/");
#endif
id = replaceRegex(id, "", "\\.zim[a-z]*$");
id = replaceRegex(id, "_", " ");
id = replaceRegex(id, "plus", "\\+");
}
return id;
}
void Book::setPath(const std::string& path)
{
m_path = isRelativePath(path)
? computeAbsolutePath(getCurrentDirectory(), path)
: path;
}
const std::string& Book::getFavicon() const {
if (m_favicon.empty() && !m_faviconUrl.empty()) {
try {
m_favicon = download(m_faviconUrl);
} catch(...) {
std::cerr << "Cannot download favicon from " << m_faviconUrl;
}
}
return m_favicon;
}
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2014 Emmanuel Engelhart <kelson@kiwix.org>
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -17,13 +17,31 @@
* MA 02110-1301, USA.
*/
#include <common/otherTools.h>
#include "bookmark.h"
void kiwix::sleep(unsigned int milliseconds)
#include <pugixml.hpp>
namespace kiwix
{
/* Constructor */
Bookmark::Bookmark()
{
#ifdef _WIN32
Sleep(milliseconds);
#else
usleep(1000 * milliseconds);
#endif
}
/* Destructor */
Bookmark::~Bookmark()
{
}
void Bookmark::updateFromXml(const pugi::xml_node& node)
{
auto bookNode = node.child("book");
m_bookId = bookNode.child("id").child_value();
m_bookTitle = bookNode.child("title").child_value();
m_language = bookNode.child("language").child_value();
m_date = bookNode.child("date").child_value();
m_title = node.child("title").child_value();
m_url = node.child("url").child_value();
}
}

View File

@@ -1,6 +1,3 @@
#mesondefine VERSION
#mesondefine ENABLE_CTPP2
#mesondefine ENABLE_LIBARIA2

View File

@@ -1,210 +0,0 @@
/*
* Copyright 2013 Renaud Gaudin <reg@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <ctpp2/CTPP2VMStringLoader.hpp>
namespace CTPP // C++ Template Engine
{
//
// Convert byte order
//
static void ConvertExecutable(VMExecutable * oCore)
{
// Code entry point
oCore -> entry_point = Swap32(oCore -> entry_point);
// Offset of code segment
oCore -> code_offset = Swap32(oCore -> code_offset);
// Code segment size
oCore -> code_size = Swap32(oCore -> code_size);
// Offset of static text segment
oCore -> syscalls_offset = Swap32(oCore -> syscalls_offset);
// Static text segment size
oCore -> syscalls_data_size = Swap32(oCore -> syscalls_data_size);
// Offset of static text index segment
oCore -> syscalls_index_offset = Swap32(oCore -> syscalls_index_offset);
// Static text index segment size
oCore -> syscalls_index_size = Swap32(oCore -> syscalls_index_size);
// Offset of static data segment
oCore -> static_data_offset = Swap32(oCore -> static_data_offset);
// Static data segment size
oCore -> static_data_data_size = Swap32(oCore -> static_data_data_size);
// Offset of static text segment
oCore -> static_text_offset = Swap32(oCore -> static_text_offset);
// Static text segment size
oCore -> static_text_data_size = Swap32(oCore -> static_text_data_size);
// Offset of static text index segment
oCore -> static_text_index_offset = Swap32(oCore -> static_text_index_offset);
// Static text index segment size
oCore -> static_text_index_size = Swap32(oCore -> static_text_index_size);
// Version 2.2+
// Offset of static data bit index
oCore -> static_data_bit_index_offset = Swap32(oCore -> static_data_bit_index_offset);
/// Offset of static data bit index
oCore -> static_data_bit_index_size = Swap32(oCore -> static_data_bit_index_size);
// Platform
oCore -> platform = Swap64(oCore -> platform);
// Ugly-jolly hack!
// ... dereferencing type-punned pointer will break strict-aliasing rules ...
UINT_64 iTMP;
memcpy(&iTMP, &(oCore -> ieee754double), sizeof(UINT_64));
iTMP = Swap64(iTMP);
memcpy(&(oCore -> ieee754double), &iTMP, sizeof(UINT_64));
// Cyclic Redundancy Check
oCore -> crc = 0;
// Convert data structures
// Convert code segment
VMInstruction * pInstructions = const_cast<VMInstruction *>(VMExecutable::GetCodeSeg(oCore));
UINT_32 iI = 0;
UINT_32 iSteps = oCore -> code_size / sizeof(VMInstruction);
for(iI = 0; iI < iSteps; ++iI)
{
pInstructions -> instruction = Swap32(pInstructions -> instruction);
pInstructions -> argument = Swap32(pInstructions -> argument);
pInstructions -> reserved = Swap64(pInstructions -> reserved);
++pInstructions;
}
// Convert syscalls index
TextDataIndex * pTextIndex = const_cast<TextDataIndex *>(VMExecutable::GetSyscallsIndexSeg(oCore));
iSteps = oCore -> syscalls_index_size / sizeof(TextDataIndex);
for(iI = 0; iI < iSteps; ++iI)
{
pTextIndex -> offset = Swap32(pTextIndex -> offset);
pTextIndex -> length = Swap32(pTextIndex -> length);
++pTextIndex;
}
// Convert static text index
pTextIndex = const_cast<TextDataIndex *>(VMExecutable::GetStaticTextIndexSeg(oCore));
iSteps = oCore -> static_text_index_size / sizeof(TextDataIndex);
for(iI = 0; iI < iSteps; ++iI)
{
pTextIndex -> offset = Swap32(pTextIndex -> offset);
pTextIndex -> length = Swap32(pTextIndex -> length);
++pTextIndex;
}
// Convert static data
StaticDataVar * pStaticDataVar = const_cast<StaticDataVar *>(VMExecutable::GetStaticDataSeg(oCore));
iSteps = oCore -> static_data_data_size / sizeof(StaticDataVar);
for(iI = 0; iI < iSteps; ++iI)
{
(*pStaticDataVar).i_data = Swap64((*pStaticDataVar).i_data);
++pStaticDataVar;
}
}
//
// Constructor
//
VMStringLoader::VMStringLoader(CCHAR_P rawContent, size_t rawContentSize)
{
oCore = (VMExecutable *)malloc(rawContentSize + 1);
memcpy(oCore, rawContent, rawContentSize);
if (oCore -> magic[0] == 'C' &&
oCore -> magic[1] == 'T' &&
oCore -> magic[2] == 'P' &&
oCore -> magic[3] == 'P')
{
// Check version
if (oCore -> version[0] >= 1)
{
// Platform-dependent data (byte order)
if (oCore -> platform == 0x4142434445464748ull)
{
#ifdef _DEBUG
fprintf(stderr, "Big/Little Endian conversion: Nothing to do\n");
#endif
// Nothing to do, only check crc
UINT_32 iCRC = oCore -> crc;
oCore -> crc = 0;
// Calculate CRC of file
// KELSON: next line used to refer to oStat.st_size
// changed it to rawContentSize
if (iCRC != crc32((UCCHAR_P)oCore, rawContentSize))
{
free(oCore);
throw CTPPLogicError("CRC checksum invalid");
}
}
// Platform-dependent data (byte order)
else if (oCore -> platform == 0x4847464544434241ull)
{
// Need to reconvert data
#ifdef _DEBUG
fprintf(stderr, "Big/Little Endian conversion: Need to reconvert core\n");
#endif
ConvertExecutable(oCore);
}
else
{
free(oCore);
throw CTPPLogicError("Conversion of middle-end architecture does not supported.");
}
// Check IEEE 754 format
if (oCore -> ieee754double != 15839800103804824402926068484019465486336.0)
{
free(oCore);
throw CTPPLogicError("IEEE 754 format is broken, cannot convert file");
}
}
pVMMemoryCore = new VMMemoryCore(oCore);
}
else
{
free(oCore);
throw CTPPLogicError("Not an CTPP bytecode file.");
}
}
//
// Get ready-to-run program
//
const VMMemoryCore * VMStringLoader::GetCore() const { return pVMMemoryCore; }
//
// A destructor
//
VMStringLoader::~VMStringLoader() throw()
{
delete pVMMemoryCore;
free(oCore);
}
} // namespace CTPP
// End.

View File

@@ -18,104 +18,137 @@
*/
#include "downloader.h"
#include "common/pathTools.h"
#include "tools/pathTools.h"
#include <algorithm>
#include <thread>
#include <chrono>
#ifndef _WIN32
# include <unistd.h>
#endif
#include <iostream>
#include "aria2.h"
#include "xmlrpc.h"
#include "tools/otherTools.h"
#include <pugixml.hpp>
namespace kiwix
{
pthread_mutex_t Downloader::globalLock = PTHREAD_MUTEX_INITIALIZER;
void Download::updateStatus(bool follow)
{
static std::vector<std::string> statusKey = {"status", "files", "totalLength",
"completedLength", "followedBy",
"downloadSpeed", "verifiedLength"};
std::string strStatus;
if(follow && !m_followedBy.empty()) {
strStatus = mp_aria->tellStatus(m_followedBy, statusKey);
} else {
strStatus = mp_aria->tellStatus(m_did, statusKey);
}
// std::cout << strStatus << std::endl;
MethodResponse response(strStatus);
if (response.isFault()) {
m_status = Download::K_UNKNOWN;
return;
}
auto structNode = response.getParams().getParam(0).getValue().getStruct();
auto _status = structNode.getMember("status").getValue().getAsS();
auto status = _status == "active" ? Download::K_ACTIVE
: _status == "waiting" ? Download::K_WAITING
: _status == "paused" ? Download::K_PAUSED
: _status == "error" ? Download::K_ERROR
: _status == "complete" ? Download::K_COMPLETE
: _status == "removed" ? Download::K_REMOVED
: Download::K_UNKNOWN;
if (status == K_COMPLETE) {
try {
auto followedByMember = structNode.getMember("followedBy");
m_followedBy = followedByMember.getValue().getArray().getValue(0).getAsS();
if (follow) {
status = K_ACTIVE;
updateStatus(true);
return;
}
} catch (InvalidRPCNode& e) { }
}
m_status = status;
m_totalLength = extractFromString<uint64_t>(structNode.getMember("totalLength").getValue().getAsS());
m_completedLength = extractFromString<uint64_t>(structNode.getMember("completedLength").getValue().getAsS());
m_downloadSpeed = extractFromString<uint64_t>(structNode.getMember("downloadSpeed").getValue().getAsS());
try {
auto verifiedLengthValue = structNode.getMember("verifiedLength").getValue();
m_verifiedLength = extractFromString<uint64_t>(verifiedLengthValue.getAsS());
} catch (InvalidRPCNode& e) { m_verifiedLength = 0; }
auto filesMember = structNode.getMember("files");
auto fileStruct = filesMember.getValue().getArray().getValue(0).getStruct();
m_path = fileStruct.getMember("path").getValue().getAsS();
auto urisArray = fileStruct.getMember("uris").getValue().getArray();
int index = 0;
m_uris.clear();
while(true) {
try {
auto uriNode = urisArray.getValue(index++).getStruct().getMember("uri");
m_uris.push_back(uriNode.getValue().getAsS());
} catch(InvalidRPCNode& e) { break; }
}
}
/* Constructor */
Downloader::Downloader()
Downloader::Downloader() :
mp_aria(new Aria2())
{
#ifdef ENABLE_LIBARIA2
aria2::SessionConfig config;
config.downloadEventCallback = Downloader::downloadEventCallback;
config.userData = this;
tmpDir = makeTmpDirectory();
aria2::KeyVals options;
options.push_back(std::pair<std::string, std::string>("dir", tmpDir));
session = aria2::sessionNew(options, config);
#endif
for (auto gid : mp_aria->tellActive()) {
m_knownDownloads[gid] = std::unique_ptr<Download>(new Download(mp_aria, gid));
m_knownDownloads[gid]->updateStatus();
}
}
/* Destructor */
Downloader::~Downloader()
{
#ifdef ENABLE_LIBARIA2
aria2::sessionFinal(session);
#endif
rmdir(tmpDir.c_str());
}
#ifdef ENABLE_LIBARIA2
int Downloader::downloadEventCallback(aria2::Session* session,
aria2::DownloadEvent event,
aria2::A2Gid gid,
void* userData)
void Downloader::close()
{
Downloader* downloader = static_cast<Downloader*>(userData);
auto fileHandle = downloader->fileHandle;
auto dh = aria2::getDownloadHandle(session, gid);
if (!dh) {
return 0;
}
switch (event) {
case aria2::EVENT_ON_DOWNLOAD_COMPLETE:
{
if (dh->getNumFiles() > 0) {
auto f = dh->getFile(1);
fileHandle->path = f.path;
fileHandle->success = true;
}
}
break;
case aria2::EVENT_ON_DOWNLOAD_ERROR:
{
fileHandle->success = false;
}
break;
default:
break;
}
aria2::deleteDownloadHandle(dh);
return 0;
mp_aria->close();
}
#endif
DownloadedFile Downloader::download(const std::string& url) {
pthread_mutex_lock(&globalLock);
DownloadedFile fileHandle;
#ifdef ENABLE_LIBARIA2
std::vector<std::string> Downloader::getDownloadIds() {
std::vector<std::string> ret;
for(auto& p:m_knownDownloads) {
ret.push_back(p.first);
}
return ret;
}
Download* Downloader::startDownload(const std::string& uri)
{
for (auto& p: m_knownDownloads) {
auto& d = p.second;
auto& uris = d->getUris();
if (std::find(uris.begin(), uris.end(), uri) != uris.end())
return d.get();
}
std::vector<std::string> uris = {uri};
auto gid = mp_aria->addUri(uris);
m_knownDownloads[gid] = std::unique_ptr<Download>(new Download(mp_aria, gid));
return m_knownDownloads[gid].get();
}
Download* Downloader::getDownload(const std::string& did)
{
try {
std::vector<std::string> uris = {url};
aria2::KeyVals options;
aria2::A2Gid gid;
int ret;
DownloadedFile fileHandle;
ret = aria2::addUri(session, &gid, uris, options);
if (ret < 0) {
std::cerr << "Failed to download" << std::endl;
} else {
this->fileHandle = &fileHandle;
aria2::run(session, aria2::RUN_DEFAULT);
return m_knownDownloads.at(did).get();
} catch(exception& e) {
for (auto gid : mp_aria->tellActive()) {
if (gid == did) {
m_knownDownloads[gid] = std::unique_ptr<Download>(new Download(mp_aria, gid));
return m_knownDownloads[gid].get();
}
}
} catch (...) {};
this->fileHandle = nullptr;
pthread_mutex_unlock(&globalLock);
#endif
return fileHandle;
throw e;
}
}
}

View File

@@ -18,137 +18,291 @@
*/
#include "library.h"
#include "book.h"
#include "libxml_dumper.h"
#include "tools/base64.h"
#include "tools/regexTools.h"
#include "tools/pathTools.h"
#include <pugixml.hpp>
#include <algorithm>
namespace kiwix
{
/* Constructor */
Book::Book() : readOnly(false)
{
}
/* Destructor */
Book::~Book()
{
}
/* Sort functions */
bool Book::sortByLastOpen(const kiwix::Book& a, const kiwix::Book& b)
{
return atoi(a.last.c_str()) > atoi(b.last.c_str());
}
bool Book::sortByTitle(const kiwix::Book& a, const kiwix::Book& b)
{
return strcmp(a.title.c_str(), b.title.c_str()) < 0;
}
bool Book::sortByDate(const kiwix::Book& a, const kiwix::Book& b)
{
return strcmp(a.date.c_str(), b.date.c_str()) > 0;
}
bool Book::sortBySize(const kiwix::Book& a, const kiwix::Book& b)
{
return atoi(a.size.c_str()) < atoi(b.size.c_str());
}
bool Book::sortByPublisher(const kiwix::Book& a, const kiwix::Book& b)
{
return strcmp(a.publisher.c_str(), b.publisher.c_str()) < 0;
}
bool Book::sortByCreator(const kiwix::Book& a, const kiwix::Book& b)
{
return strcmp(a.creator.c_str(), b.creator.c_str()) < 0;
}
bool Book::sortByLanguage(const kiwix::Book& a, const kiwix::Book& b)
{
return strcmp(a.language.c_str(), b.language.c_str()) < 0;
}
std::string Book::getHumanReadableIdFromPath()
{
std::string id = pathAbsolute;
if (!id.empty()) {
kiwix::removeAccents(id);
#ifdef _WIN32
id = replaceRegex(id, "", "^.*\\\\");
#else
id = replaceRegex(id, "", "^.*/");
#endif
id = replaceRegex(id, "", "\\.zim[a-z]*$");
id = replaceRegex(id, "_", " ");
id = replaceRegex(id, "plus", "\\+");
}
return id;
}
/* Constructor */
Library::Library() : version(KIWIX_LIBRARY_VERSION)
Library::Library()
{
}
/* Destructor */
Library::~Library()
{
}
bool Library::addBook(const Book& book)
{
/* Try to find it */
std::vector<kiwix::Book>::iterator itr;
for (itr = this->books.begin(); itr != this->books.end(); ++itr) {
if (itr->id == book.id) {
if (!itr->readOnly) {
itr->readOnly = book.readOnly;
try {
auto& oldbook = m_books.at(book.getId());
oldbook.update(book);
return false;
} catch (std::out_of_range&) {
m_books[book.getId()] = book;
return true;
}
}
if (itr->path.empty()) {
itr->path = book.path;
}
void Library::addBookmark(const Bookmark& bookmark)
{
m_bookmarks.push_back(bookmark);
}
if (itr->pathAbsolute.empty()) {
itr->pathAbsolute = book.pathAbsolute;
}
bool Library::removeBookmark(const std::string& zimId, const std::string& url)
{
for(auto it=m_bookmarks.begin(); it!=m_bookmarks.end(); it++) {
if (it->getBookId() == zimId && it->getUrl() == url) {
m_bookmarks.erase(it);
return true;
}
}
return false;
}
if (itr->url.empty()) {
itr->url = book.url;
}
if (itr->tags.empty()) {
itr->tags = book.tags;
}
if (itr->name.empty()) {
itr->name = book.name;
}
bool Library::removeBookById(const std::string& id)
{
return m_books.erase(id) == 1;
}
if (itr->indexPath.empty()) {
itr->indexPath = book.indexPath;
itr->indexType = book.indexType;
}
Book& Library::getBookById(const std::string& id)
{
return m_books.at(id);
}
if (itr->indexPathAbsolute.empty()) {
itr->indexPathAbsolute = book.indexPathAbsolute;
itr->indexType = book.indexType;
}
unsigned int Library::getBookCount(const bool localBooks,
const bool remoteBooks)
{
unsigned int result = 0;
for (auto& pair: m_books) {
auto& book = pair.second;
if ((!book.getPath().empty() && localBooks)
|| (book.getPath().empty() && remoteBooks)) {
result++;
}
}
return result;
}
if (itr->faviconMimeType.empty()) {
itr->favicon = book.favicon;
itr->faviconMimeType = book.faviconMimeType;
}
bool Library::writeToFile(const std::string& path) {
auto baseDir = removeLastPathElement(path, true, false);
LibXMLDumper dumper(this);
dumper.setBaseDir(baseDir);
return writeTextFile(path, dumper.dumpLibXMLContent(getBooksIds()));
}
bool Library::writeBookmarksToFile(const std::string& path) {
LibXMLDumper dumper(this);
return writeTextFile(path, dumper.dumpLibXMLBookmark());
}
std::vector<std::string> Library::getBooksLanguages()
{
std::vector<std::string> booksLanguages;
std::map<std::string, bool> booksLanguagesMap;
for (auto& pair: m_books) {
auto& book = pair.second;
auto& language = book.getLanguage();
if (booksLanguagesMap.find(language) == booksLanguagesMap.end()) {
if (book.getOrigId().empty()) {
booksLanguagesMap[language] = true;
booksLanguages.push_back(language);
}
return false;
}
}
/* otherwise */
this->books.push_back(book);
return true;
return booksLanguages;
}
bool Library::removeBookByIndex(const unsigned int bookIndex)
std::vector<std::string> Library::getBooksCreators()
{
books.erase(books.begin() + bookIndex);
return true;
std::vector<std::string> booksCreators;
std::map<std::string, bool> booksCreatorsMap;
for (auto& pair: m_books) {
auto& book = pair.second;
auto& creator = book.getCreator();
if (booksCreatorsMap.find(creator) == booksCreatorsMap.end()) {
if (book.getOrigId().empty()) {
booksCreatorsMap[creator] = true;
booksCreators.push_back(creator);
}
}
}
return booksCreators;
}
std::vector<std::string> Library::getBooksPublishers()
{
std::vector<std::string> booksPublishers;
std::map<std::string, bool> booksPublishersMap;
for (auto& pair:m_books) {
auto& book = pair.second;
auto& publisher = book.getPublisher();
if (booksPublishersMap.find(publisher) == booksPublishersMap.end()) {
if (book.getOrigId().empty()) {
booksPublishersMap[publisher] = true;
booksPublishers.push_back(publisher);
}
}
}
return booksPublishers;
}
std::vector<std::string> Library::getBooksIds()
{
std::vector<std::string> bookIds;
for (auto& pair: m_books) {
bookIds.push_back(pair.first);
}
return bookIds;
}
std::vector<std::string> Library::filter(const std::string& search)
{
if (search.empty()) {
return getBooksIds();
}
std::vector<std::string> bookIds;
for(auto& pair:m_books) {
auto& book = pair.second;
if (matchRegex(book.getTitle(), "\\Q" + search + "\\E")
|| matchRegex(book.getDescription(), "\\Q" + search + "\\E")) {
bookIds.push_back(pair.first);
}
}
return bookIds;
}
template<supportedListSortBy sort>
struct Comparator {
Library* lib;
Comparator(Library* lib) : lib(lib) {}
bool operator() (const std::string& id1, const std::string& id2) {
return get_keys(id1) < get_keys(id2);
}
std::string get_keys(const std::string& id);
unsigned int get_keyi(const std::string& id);
};
template<>
std::string Comparator<TITLE>::get_keys(const std::string& id)
{
return lib->getBookById(id).getTitle();
}
template<>
unsigned int Comparator<SIZE>::get_keyi(const std::string& id)
{
return lib->getBookById(id).getSize();
}
template<>
bool Comparator<SIZE>::operator() (const std::string& id1, const std::string& id2)
{
return get_keyi(id1) < get_keyi(id2);
}
template<>
std::string Comparator<DATE>::get_keys(const std::string& id)
{
return lib->getBookById(id).getDate();
}
template<>
std::string Comparator<CREATOR>::get_keys(const std::string& id)
{
return lib->getBookById(id).getCreator();
}
template<>
std::string Comparator<PUBLISHER>::get_keys(const std::string& id)
{
return lib->getBookById(id).getPublisher();
}
std::vector<std::string> Library::listBooksIds(
int mode,
supportedListSortBy sortBy,
const std::string& search,
const std::string& language,
const std::string& creator,
const std::string& publisher,
size_t maxSize) {
std::vector<std::string> bookIds;
for(auto& pair:m_books) {
auto& book = pair.second;
auto local = !book.getPath().empty();
if (mode & LOCAL && !local)
continue;
if (mode & NOLOCAL && local)
continue;
auto valid = book.isPathValid();
if (mode & VALID && !valid)
continue;
if (mode & NOVALID && valid)
continue;
auto remote = !book.getUrl().empty();
if (mode & REMOTE && !remote)
continue;
if (mode & NOREMOTE && remote)
continue;
if (maxSize != 0 && book.getSize() > maxSize)
continue;
if (!language.empty() && book.getLanguage() != language)
continue;
if (!publisher.empty() && book.getPublisher() != publisher)
continue;
if (!creator.empty() && book.getCreator() != creator)
continue;
if (!search.empty() && !(matchRegex(book.getTitle(), "\\Q" + search + "\\E")
|| matchRegex(book.getDescription(), "\\Q" + search + "\\E")))
continue;
bookIds.push_back(pair.first);
}
switch(sortBy) {
case TITLE:
std::sort(bookIds.begin(), bookIds.end(), Comparator<TITLE>(this));
break;
case SIZE:
std::sort(bookIds.begin(), bookIds.end(), Comparator<SIZE>(this));
break;
case DATE:
std::sort(bookIds.begin(), bookIds.end(), Comparator<DATE>(this));
break;
case CREATOR:
std::sort(bookIds.begin(), bookIds.end(), Comparator<CREATOR>(this));
break;
case PUBLISHER:
std::sort(bookIds.begin(), bookIds.end(), Comparator<PUBLISHER>(this));
break;
default:
break;
}
return bookIds;
}
}

139
src/libxml_dumper.cpp Normal file
View File

@@ -0,0 +1,139 @@
/*
* Copyright 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "libxml_dumper.h"
#include "book.h"
#include <tools/base64.h>
#include <tools/stringTools.h>
#include <tools/otherTools.h>
namespace kiwix
{
/* Constructor */
LibXMLDumper::LibXMLDumper(Library* library)
: library(library)
{
}
/* Destructor */
LibXMLDumper::~LibXMLDumper()
{
}
#define ADD_ATTRIBUTE(node, name, value) { (node).append_attribute((name)) = (value).c_str(); }
#define ADD_ATTR_NOT_EMPTY(node, name, value) { if (!(value).empty()) ADD_ATTRIBUTE(node, name, value); }
void LibXMLDumper::handleBook(Book book, pugi::xml_node root_node) {
if (book.readOnly())
return;
auto entry_node = root_node.append_child("book");
ADD_ATTRIBUTE(entry_node, "id", book.getId());
if (!book.getPath().empty()) {
ADD_ATTRIBUTE(entry_node, "path", computeRelativePath(baseDir, book.getPath()));
}
if (book.getOrigId().empty()) {
ADD_ATTR_NOT_EMPTY(entry_node, "title", book.getTitle());
ADD_ATTR_NOT_EMPTY(entry_node, "name", book.getName());
ADD_ATTR_NOT_EMPTY(entry_node, "tags", book.getTags());
ADD_ATTR_NOT_EMPTY(entry_node, "description", book.getDescription());
ADD_ATTR_NOT_EMPTY(entry_node, "language", book.getLanguage());
ADD_ATTR_NOT_EMPTY(entry_node, "creator", book.getCreator());
ADD_ATTR_NOT_EMPTY(entry_node, "publisher", book.getPublisher());
ADD_ATTR_NOT_EMPTY(entry_node, "faviconMimeType", book.getFaviconMimeType());
if (!book.getFavicon().empty())
ADD_ATTRIBUTE(entry_node, "favicon", base64_encode(book.getFavicon()));
} else {
ADD_ATTRIBUTE(entry_node, "origId", book.getOrigId());
}
ADD_ATTR_NOT_EMPTY(entry_node, "date", book.getDate());
ADD_ATTR_NOT_EMPTY(entry_node, "url", book.getUrl());
if (book.getArticleCount())
ADD_ATTRIBUTE(entry_node, "articleCount", to_string(book.getArticleCount()));
if (book.getMediaCount())
ADD_ATTRIBUTE(entry_node, "mediaCount", to_string(book.getMediaCount()));
if (book.getSize())
ADD_ATTRIBUTE(entry_node, "size", to_string(book.getSize()>>10));
ADD_ATTR_NOT_EMPTY(entry_node, "downloadId", book.getDownloadId());
}
#define ADD_TEXT_ENTRY(node, child, value) (node).append_child((child)).append_child(pugi::node_pcdata).set_value((value).c_str())
void LibXMLDumper::handleBookmark(Bookmark bookmark, pugi::xml_node root_node) {
auto entry_node = root_node.append_child("bookmark");
auto book_node = entry_node.append_child("book");
try {
auto book = library->getBookById(bookmark.getBookId());
ADD_TEXT_ENTRY(book_node, "id", book.getId());
ADD_TEXT_ENTRY(book_node, "title", book.getTitle());
ADD_TEXT_ENTRY(book_node, "language", book.getLanguage());
ADD_TEXT_ENTRY(book_node, "date", book.getDate());
} catch (...) {
ADD_TEXT_ENTRY(book_node, "id", bookmark.getBookId());
ADD_TEXT_ENTRY(book_node, "title", bookmark.getBookTitle());
ADD_TEXT_ENTRY(book_node, "language", bookmark.getLanguage());
ADD_TEXT_ENTRY(book_node, "date", bookmark.getDate());
}
ADD_TEXT_ENTRY(entry_node, "title", bookmark.getTitle());
ADD_TEXT_ENTRY(entry_node, "url", bookmark.getUrl());
}
std::string LibXMLDumper::dumpLibXMLContent(const std::vector<std::string>& bookIds)
{
pugi::xml_document doc;
/* Add the library node */
pugi::xml_node libraryNode = doc.append_child("library");
libraryNode.append_attribute("version") = KIWIX_LIBRARY_VERSION;
if (library) {
for (auto& bookId: bookIds) {
handleBook(library->getBookById(bookId), libraryNode);
}
}
return nodeToString(libraryNode);
}
std::string LibXMLDumper::dumpLibXMLBookmark()
{
pugi::xml_document doc;
/* Add the library node */
pugi::xml_node bookmarksNode = doc.append_child("bookmarks");
if (library) {
for (auto& bookmark: library->getBookmarks()) {
handleBookmark(bookmark, bookmarksNode);
}
}
return nodeToString(bookmarksNode);
}
}

View File

@@ -18,80 +18,65 @@
*/
#include "manager.h"
#include "downloader.h"
#include <pugixml.hpp>
namespace kiwix
{
/* Constructor */
Manager::Manager() : writableLibraryPath("")
Manager::Manager(LibraryManipulator* manipulator):
writableLibraryPath(""),
manipulator(manipulator),
mustDeleteManipulator(false)
{
}
Manager::Manager(Library* library) :
writableLibraryPath(""),
manipulator(new DefaultLibraryManipulator(library)),
mustDeleteManipulator(true)
{
}
/* Destructor */
Manager::~Manager()
{
if (mustDeleteManipulator) {
delete manipulator;
}
}
bool Manager::parseXmlDom(const pugi::xml_document& doc,
const bool readOnly,
const string libraryPath)
const std::string& libraryPath)
{
pugi::xml_node libraryNode = doc.child("library");
if (strlen(libraryNode.attribute("current").value()))
this->setCurrentBookId(libraryNode.attribute("current").value());
string libraryVersion = libraryNode.attribute("version").value();
std::string libraryVersion = libraryNode.attribute("version").value();
for (pugi::xml_node bookNode = libraryNode.child("book"); bookNode;
bookNode = bookNode.next_sibling("book")) {
bool ok = true;
kiwix::Book book;
book.readOnly = readOnly;
book.id = bookNode.attribute("id").value();
book.path = bookNode.attribute("path").value();
book.last = (std::string(bookNode.attribute("last").value()) != "undefined"
? bookNode.attribute("last").value()
: "");
book.indexPath = bookNode.attribute("indexPath").value();
book.indexType = XAPIAN;
book.title = bookNode.attribute("title").value();
book.name = bookNode.attribute("name").value();
book.tags = bookNode.attribute("tags").value();
book.description = bookNode.attribute("description").value();
book.language = bookNode.attribute("language").value();
book.date = bookNode.attribute("date").value();
book.creator = bookNode.attribute("creator").value();
book.publisher = bookNode.attribute("publisher").value();
book.url = bookNode.attribute("url").value();
book.origId = bookNode.attribute("origId").value();
book.articleCount = bookNode.attribute("articleCount").value();
book.mediaCount = bookNode.attribute("mediaCount").value();
book.size = bookNode.attribute("size").value();
book.favicon = bookNode.attribute("favicon").value();
book.faviconMimeType = bookNode.attribute("faviconMimeType").value();
/* Check absolute and relative paths */
this->checkAndCleanBookPaths(book, libraryPath);
book.setReadOnly(readOnly);
book.updateFromXml(bookNode,
removeLastPathElement(libraryPath, true, false));
/* Update the book properties with the new importer */
if (libraryVersion.empty()
|| atoi(libraryVersion.c_str()) <= atoi(KIWIX_LIBRARY_VERSION)) {
if (!book.path.empty()) {
ok = this->readBookFromPath(book.pathAbsolute);
if (!book.getPath().empty()) {
this->readBookFromPath(book.getPath(), &book);
}
}
if (ok) {
library.addBook(book);
}
manipulator->addBookToLibrary(book);
}
return true;
}
bool Manager::readXml(const string& xml,
bool Manager::readXml(const std::string& xml,
const bool readOnly,
const string libraryPath)
const std::string& libraryPath)
{
pugi::xml_document doc;
pugi::xml_parse_result result
@@ -110,40 +95,24 @@ bool Manager::parseOpdsDom(const pugi::xml_document& doc, const std::string& url
{
pugi::xml_node libraryNode = doc.child("feed");
try {
m_totalBooks = strtoull(libraryNode.child("totalResults").child_value(), 0, 0);
m_startIndex = strtoull(libraryNode.child("startIndex").child_value(), 0, 0);
m_itemsPerPage = strtoull(libraryNode.child("itemsPerPage").child_value(), 0, 0);
m_hasSearchResult = true;
} catch(...) {
m_hasSearchResult = false;
}
for (pugi::xml_node entryNode = libraryNode.child("entry"); entryNode;
entryNode = entryNode.next_sibling("entry")) {
kiwix::Book book;
book.readOnly = false;
book.id = entryNode.child("id").child_value();
book.title = entryNode.child("title").child_value();
book.description = entryNode.child("summary").child_value();
book.language = entryNode.child("language").child_value();
book.date = entryNode.child("updated").child_value();
book.creator = entryNode.child("author").child("name").child_value();
for(pugi::xml_node linkNode = entryNode.child("link"); linkNode;
linkNode = linkNode.next_sibling("link")) {
std::string rel = linkNode.attribute("rel").value();
if (rel == "http://opds-spec.org/image/thumbnail") {
auto faviconUrl = urlHost + linkNode.attribute("href").value();
auto downloader = Downloader();
auto fileHandle = downloader.download(faviconUrl);
if (fileHandle.success) {
auto content = getFileContent(fileHandle.path);
book.favicon = base64_encode((const unsigned char*)content.data(), content.size());
book.faviconMimeType = linkNode.attribute("type").value();
} else {
std::cerr << "Cannot get favicon content from " << faviconUrl << std::endl;
}
} else if (rel == "http://opds-spec.org/acquisition/open-access") {
book.url = linkNode.attribute("href").value();
}
}
book.setReadOnly(false);
book.updateFromOpds(entryNode, urlHost);
/* Update the book properties with the new importer */
library.addBook(book);
manipulator->addBookToLibrary(book);
}
return true;
@@ -151,7 +120,7 @@ bool Manager::parseOpdsDom(const pugi::xml_document& doc, const std::string& url
bool Manager::readOpds(const string& content, const std::string& urlHost)
bool Manager::readOpds(const std::string& content, const std::string& urlHost)
{
pugi::xml_document doc;
pugi::xml_parse_result result
@@ -165,13 +134,13 @@ bool Manager::readOpds(const string& content, const std::string& urlHost)
return false;
}
bool Manager::readFile(const string path, const bool readOnly)
bool Manager::readFile(const std::string& path, const bool readOnly)
{
return this->readFile(path, path, readOnly);
}
bool Manager::readFile(const string nativePath,
const string UTF8Path,
bool Manager::readFile(const std::string& nativePath,
const std::string& UTF8Path,
const bool readOnly)
{
bool retVal = true;
@@ -194,149 +163,31 @@ bool Manager::readFile(const string nativePath,
return retVal;
}
bool Manager::writeFile(const string path)
{
pugi::xml_document doc;
/* Add the library node */
pugi::xml_node libraryNode = doc.append_child("library");
if (!getCurrentBookId().empty()) {
libraryNode.append_attribute("current") = getCurrentBookId().c_str();
}
if (!library.version.empty())
libraryNode.append_attribute("version") = library.version.c_str();
/* Add each book */
std::vector<kiwix::Book>::iterator itr;
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (!itr->readOnly) {
this->checkAndCleanBookPaths(*itr, path);
pugi::xml_node bookNode = libraryNode.append_child("book");
bookNode.append_attribute("id") = itr->id.c_str();
if (!itr->path.empty()) {
bookNode.append_attribute("path") = itr->path.c_str();
}
if (!itr->last.empty() && itr->last != "undefined") {
bookNode.append_attribute("last") = itr->last.c_str();
}
if (!itr->indexPath.empty())
bookNode.append_attribute("indexPath") = itr->indexPath.c_str();
if (!itr->indexPath.empty() || !itr->indexPathAbsolute.empty()) {
if (itr->indexType == XAPIAN) {
bookNode.append_attribute("indexType") = "xapian";
}
}
if (itr->origId.empty()) {
if (!itr->title.empty())
bookNode.append_attribute("title") = itr->title.c_str();
if (!itr->name.empty())
bookNode.append_attribute("name") = itr->name.c_str();
if (!itr->tags.empty())
bookNode.append_attribute("tags") = itr->tags.c_str();
if (!itr->description.empty())
bookNode.append_attribute("description") = itr->description.c_str();
if (!itr->language.empty())
bookNode.append_attribute("language") = itr->language.c_str();
if (!itr->creator.empty())
bookNode.append_attribute("creator") = itr->creator.c_str();
if (!itr->publisher.empty())
bookNode.append_attribute("publisher") = itr->publisher.c_str();
if (!itr->favicon.empty())
bookNode.append_attribute("favicon") = itr->favicon.c_str();
if (!itr->faviconMimeType.empty())
bookNode.append_attribute("faviconMimeType")
= itr->faviconMimeType.c_str();
}
if (!itr->date.empty()) {
bookNode.append_attribute("date") = itr->date.c_str();
}
if (!itr->url.empty()) {
bookNode.append_attribute("url") = itr->url.c_str();
}
if (!itr->origId.empty())
bookNode.append_attribute("origId") = itr->origId.c_str();
if (!itr->articleCount.empty())
bookNode.append_attribute("articleCount") = itr->articleCount.c_str();
if (!itr->mediaCount.empty())
bookNode.append_attribute("mediaCount") = itr->mediaCount.c_str();
if (!itr->size.empty()) {
bookNode.append_attribute("size") = itr->size.c_str();
}
}
}
/* saving file */
doc.save_file(path.c_str());
return true;
}
bool Manager::setCurrentBookId(const string id)
{
if (library.current.empty() || library.current.top() != id) {
if (id.empty() && !library.current.empty()) {
library.current.pop();
} else {
library.current.push(id);
}
}
return true;
}
string Manager::getCurrentBookId() const
{
return library.current.empty() ? "" : library.current.top();
}
/* Add a book to the library. Return empty string if failed, book id otherwise
*/
string Manager::addBookFromPathAndGetId(const string pathToOpen,
const string pathToSave,
const string url,
const bool checkMetaData)
std::string Manager::addBookFromPathAndGetId(const std::string& pathToOpen,
const std::string& pathToSave,
const std::string& url,
const bool checkMetaData)
{
kiwix::Book book;
if (this->readBookFromPath(pathToOpen, &book)) {
if (pathToSave != pathToOpen) {
book.path = pathToSave;
book.pathAbsolute
= isRelativePath(pathToSave)
book.setPath(isRelativePath(pathToSave)
? computeAbsolutePath(
removeLastPathElement(writableLibraryPath, true, false),
pathToSave)
: pathToSave;
: pathToSave);
}
if (!checkMetaData
|| (checkMetaData && !book.title.empty() && !book.language.empty()
&& !book.date.empty())) {
book.url = url;
library.addBook(book);
return book.id;
|| (checkMetaData && !book.getTitle().empty() && !book.getLanguage().empty()
&& !book.getDate().empty())) {
book.setUrl(url);
manipulator->addBookToLibrary(book);
return book.getId();
}
}
@@ -345,9 +196,9 @@ string Manager::addBookFromPathAndGetId(const string pathToOpen,
/* Wrapper over Manager::addBookFromPath which return a bool instead of a string
*/
bool Manager::addBookFromPath(const string pathToOpen,
const string pathToSave,
const string url,
bool Manager::addBookFromPath(const std::string& pathToOpen,
const std::string& pathToSave,
const std::string& url,
const bool checkMetaData)
{
return !(
@@ -355,380 +206,42 @@ bool Manager::addBookFromPath(const string pathToOpen,
.empty());
}
bool Manager::readBookFromPath(const string path, kiwix::Book* book)
bool Manager::readBookFromPath(const std::string& path, kiwix::Book* book)
{
try {
kiwix::Reader* reader = new kiwix::Reader(path);
if (book != NULL) {
book->path = path;
book->pathAbsolute = path;
book->id = reader->getId();
book->description = reader->getDescription();
book->language = reader->getLanguage();
book->date = reader->getDate();
book->creator = reader->getCreator();
book->publisher = reader->getPublisher();
book->title = reader->getTitle();
book->name = reader->getName();
book->tags = reader->getTags();
book->origId = reader->getOrigId();
std::ostringstream articleCountStream;
articleCountStream << reader->getArticleCount();
book->articleCount = articleCountStream.str();
std::ostringstream mediaCountStream;
mediaCountStream << reader->getMediaCount();
book->mediaCount = mediaCountStream.str();
ostringstream convert;
convert << reader->getFileSize();
book->size = convert.str();
string favicon;
string faviconMimeType;
if (reader->getFavicon(favicon, faviconMimeType)) {
book->favicon = base64_encode(
reinterpret_cast<const unsigned char*>(favicon.c_str()),
favicon.length());
book->faviconMimeType = faviconMimeType;
}
}
delete reader;
kiwix::Reader reader(path);
book->update(reader);
book->setPathValid(true);
} catch (const std::exception& e) {
std::cerr << e.what() << std::endl;
std::cerr << "Invalid " << path << " : " << e.what() << std::endl;
book->setPathValid(false);
return false;
}
return true;
}
bool Manager::removeBookByIndex(const unsigned int bookIndex)
bool Manager::readBookmarkFile(const std::string& path)
{
return this->library.removeBookByIndex(bookIndex);
}
pugi::xml_document doc;
pugi::xml_parse_result result = doc.load_file(path.c_str());
bool Manager::removeBookById(const string id)
{
unsigned int bookIndex = 0;
std::vector<kiwix::Book>::iterator itr;
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (itr->id == id) {
return this->library.removeBookByIndex(bookIndex);
}
bookIndex++;
}
return false;
}
vector<string> Manager::getBooksLanguages()
{
std::vector<string> booksLanguages;
std::vector<kiwix::Book>::iterator itr;
std::map<string, bool> booksLanguagesMap;
std::sort(
library.books.begin(), library.books.end(), kiwix::Book::sortByLanguage);
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (booksLanguagesMap.find(itr->language) == booksLanguagesMap.end()) {
if (itr->origId.empty()) {
booksLanguagesMap[itr->language] = true;
booksLanguages.push_back(itr->language);
}
}
}
return booksLanguages;
}
vector<string> Manager::getBooksCreators()
{
std::vector<string> booksCreators;
std::vector<kiwix::Book>::iterator itr;
std::map<string, bool> booksCreatorsMap;
std::sort(
library.books.begin(), library.books.end(), kiwix::Book::sortByCreator);
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (booksCreatorsMap.find(itr->creator) == booksCreatorsMap.end()) {
if (itr->origId.empty()) {
booksCreatorsMap[itr->creator] = true;
booksCreators.push_back(itr->creator);
}
}
}
return booksCreators;
}
vector<string> Manager::getBooksIds()
{
std::vector<string> booksIds;
std::vector<kiwix::Book>::iterator itr;
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
booksIds.push_back(itr->id);
}
return booksIds;
}
vector<string> Manager::getBooksPublishers()
{
std::vector<string> booksPublishers;
std::vector<kiwix::Book>::iterator itr;
std::map<string, bool> booksPublishersMap;
std::sort(
library.books.begin(), library.books.end(), kiwix::Book::sortByPublisher);
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (booksPublishersMap.find(itr->publisher) == booksPublishersMap.end()) {
if (itr->origId.empty()) {
booksPublishersMap[itr->publisher] = true;
booksPublishers.push_back(itr->publisher);
}
}
}
return booksPublishers;
}
kiwix::Library Manager::cloneLibrary()
{
return this->library;
}
bool Manager::getCurrentBook(Book& book)
{
string currentBookId = getCurrentBookId();
if (currentBookId.empty()) {
if (!result) {
return false;
} else {
getBookById(currentBookId, book);
return true;
}
}
bool Manager::getBookById(const string id, Book& book)
{
std::vector<kiwix::Book>::iterator itr;
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (itr->id == id) {
book = *itr;
return true;
}
}
return false;
}
bool Manager::updateBookLastOpenDateById(const string id)
{
std::vector<kiwix::Book>::iterator itr;
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (itr->id == id) {
char unixdate[12];
sprintf(unixdate, "%d", (int)time(NULL));
itr->last = unixdate;
return true;
}
}
return false;
}
pugi::xml_node libraryNode = doc.child("bookmarks");
bool Manager::setBookIndex(const string id,
const string path,
const supportedIndexType type)
{
std::vector<kiwix::Book>::iterator itr;
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (itr->id == id) {
itr->indexPath = path;
itr->indexPathAbsolute
= isRelativePath(path)
? computeAbsolutePath(
removeLastPathElement(writableLibraryPath, true, false),
path)
: path;
itr->indexType = type;
return true;
}
}
for (pugi::xml_node node = libraryNode.child("bookmark"); node;
node = node.next_sibling("bookmark")) {
kiwix::Bookmark bookmark;
return false;
}
bookmark.updateFromXml(node);
bool Manager::setBookPath(const string id, const string path)
{
std::vector<kiwix::Book>::iterator itr;
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (itr->id == id) {
itr->path = path;
itr->pathAbsolute
= isRelativePath(path)
? computeAbsolutePath(
removeLastPathElement(writableLibraryPath, true, false),
path)
: path;
return true;
}
}
return false;
}
void Manager::removeBookPaths()
{
std::vector<kiwix::Book>::iterator itr;
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
itr->path = "";
itr->pathAbsolute = "";
}
}
unsigned int Manager::getBookCount(const bool localBooks,
const bool remoteBooks)
{
unsigned int result = 0;
std::vector<kiwix::Book>::iterator itr;
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if ((!itr->path.empty() && localBooks)
|| (itr->path.empty() && remoteBooks)) {
result++;
}
}
return result;
}
bool Manager::listBooks(const supportedListMode mode,
const supportedListSortBy sortBy,
const unsigned int maxSize,
const string language,
const string creator,
const string publisher,
const string search)
{
this->bookIdList.clear();
std::vector<kiwix::Book>::iterator itr;
/* Sort */
if (sortBy == TITLE) {
std::sort(
library.books.begin(), library.books.end(), kiwix::Book::sortByTitle);
} else if (sortBy == SIZE) {
std::sort(
library.books.begin(), library.books.end(), kiwix::Book::sortBySize);
} else if (sortBy == DATE) {
std::sort(
library.books.begin(), library.books.end(), kiwix::Book::sortByDate);
} else if (sortBy == CREATOR) {
std::sort(
library.books.begin(), library.books.end(), kiwix::Book::sortByCreator);
} else if (sortBy == PUBLISHER) {
std::sort(library.books.begin(),
library.books.end(),
kiwix::Book::sortByPublisher);
}
/* Special sort for LASTOPEN */
if (mode == LASTOPEN) {
std::sort(library.books.begin(),
library.books.end(),
kiwix::Book::sortByLastOpen);
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
if (!itr->last.empty()) {
this->bookIdList.push_back(itr->id);
}
}
} else {
/* Generate the list of book id */
for (itr = library.books.begin(); itr != library.books.end(); ++itr) {
bool ok = true;
if (mode == LOCAL && itr->path.empty()) {
ok = false;
}
if (ok == true && mode == REMOTE
&& (!itr->path.empty() || itr->url.empty())) {
ok = false;
}
if (ok == true && maxSize != 0
&& (unsigned int)atoi(itr->size.c_str()) > maxSize * 1024 * 1024) {
ok = false;
}
if (ok == true && !language.empty()
&& !matchRegex(itr->language, language)) {
ok = false;
}
if (ok == true && !creator.empty() && itr->creator != creator) {
ok = false;
}
if (ok == true && !publisher.empty() && itr->publisher != publisher) {
ok = false;
}
if ((ok == true && !search.empty())
&& !(matchRegex(itr->title, "\\Q" + search + "\\E")
|| matchRegex(itr->description, "\\Q" + search + "\\E")
|| matchRegex(itr->language, "\\Q" + search + "\\E"))) {
ok = false;
}
if (ok == true) {
this->bookIdList.push_back(itr->id);
}
}
manipulator->addBookmarkToLibrary(bookmark);
}
return true;
}
Library Manager::filter(const std::string& search) {
Library library;
if (search.empty()) {
return library;
}
for(auto book:this->library.books) {
if (matchRegex(book.title, "\\Q" + search + "\\E")
|| matchRegex(book.description, "\\Q" + search + "\\E")) {
library.addBook(book);
}
}
return library;
}
void Manager::checkAndCleanBookPaths(Book& book, const string& libraryPath)
{
if (!book.path.empty()) {
if (isRelativePath(book.path)) {
book.pathAbsolute = computeAbsolutePath(
removeLastPathElement(libraryPath, true, false), book.path);
} else {
book.pathAbsolute = book.path;
book.path = computeRelativePath(
removeLastPathElement(libraryPath, true, false), book.pathAbsolute);
}
}
if (!book.indexPath.empty()) {
if (isRelativePath(book.indexPath)) {
book.indexPathAbsolute = computeAbsolutePath(
removeLastPathElement(libraryPath, true, false), book.indexPath);
} else {
book.indexPathAbsolute = book.indexPath;
book.indexPath
= computeRelativePath(removeLastPathElement(libraryPath, true, false),
book.indexPathAbsolute);
}
}
}
}

View File

@@ -1,24 +1,29 @@
kiwix_sources = [
'book.cpp',
'bookmark.cpp',
'library.cpp',
'manager.cpp',
'libxml_dumper.cpp',
'opds_dumper.cpp',
'downloader.cpp',
'reader.cpp',
'entry.cpp',
'searcher.cpp',
'common/base64.cpp',
'common/pathTools.cpp',
'common/regexTools.cpp',
'common/stringTools.cpp',
'common/networkTools.cpp',
'common/otherTools.cpp',
'xapian/htmlparse.cc',
'xapian/myhtmlparse.cc'
'subprocess.cpp',
'aria2.cpp',
'tools/base64.cpp',
'tools/pathTools.cpp',
'tools/regexTools.cpp',
'tools/stringTools.cpp',
'tools/networkTools.cpp',
'tools/otherTools.cpp',
]
kiwix_sources += lib_resources
if xapian_dep.found()
kiwix_sources += ['xapianSearcher.cpp']
if host_machine.system() == 'windows'
kiwix_sources += 'subprocess_windows.cpp'
else
kiwix_sources += 'subprocess_unix.cpp'
endif
if get_option('android')
@@ -28,11 +33,6 @@ else
install_dir = get_option('libdir')
endif
if has_ctpp2_dep
kiwix_sources += ['ctpp2/CTPP2VMStringLoader.cpp']
endif
config_h = configure_file(output : 'kiwix_config.h',
configuration : conf,
input : 'config.h.in')

View File

@@ -18,11 +18,14 @@
*/
#include "opds_dumper.h"
#include "book.h"
#include <tools/otherTools.h>
namespace kiwix
{
/* Constructor */
OPDSDumper::OPDSDumper(Library library)
OPDSDumper::OPDSDumper(Library* library)
: library(library)
{
}
@@ -31,24 +34,6 @@ OPDSDumper::~OPDSDumper()
{
}
struct xml_string_writer: pugi::xml_writer
{
std::string result;
virtual void write(const void* data, size_t size)
{
result.append(static_cast<const char*>(data), size);
}
};
std::string node_to_string(pugi::xml_node node)
{
xml_string_writer writer;
node.print(writer, " ");
return writer.result;
}
std::string gen_date_str()
{
auto now = time(0);
@@ -65,40 +50,56 @@ std::string gen_date_str()
return is.str();
}
static std::string gen_date_from_yyyy_mm_dd(const std::string& date)
{
std::stringstream is;
is << date << "T00:00::00:Z";
return is.str();
}
void OPDSDumper::setOpenSearchInfo(int totalResults, int startIndex, int count)
{
m_totalResults = totalResults;
m_startIndex = startIndex,
m_count = count;
m_isSearchResult = true;
}
#define ADD_TEXT_ENTRY(node, child, value) (node).append_child((child)).append_child(pugi::node_pcdata).set_value((value).c_str())
pugi::xml_node OPDSDumper::handleBook(Book book, pugi::xml_node root_node) {
auto entry_node = root_node.append_child("entry");
ADD_TEXT_ENTRY(entry_node, "title", book.title);
ADD_TEXT_ENTRY(entry_node, "id", "urn:uuid:"+book.id);
ADD_TEXT_ENTRY(entry_node, "title", book.getTitle());
ADD_TEXT_ENTRY(entry_node, "id", "urn:uuid:"+book.getId());
ADD_TEXT_ENTRY(entry_node, "icon", rootLocation + "/meta?name=favicon&content=" + book.getHumanReadableIdFromPath());
ADD_TEXT_ENTRY(entry_node, "updated", date);
ADD_TEXT_ENTRY(entry_node, "summary", book.description);
ADD_TEXT_ENTRY(entry_node, "updated", gen_date_from_yyyy_mm_dd(book.getDate()));
ADD_TEXT_ENTRY(entry_node, "summary", book.getDescription());
auto content_node = entry_node.append_child("link");
content_node.append_attribute("type") = "text/html";
content_node.append_attribute("href") = (rootLocation + "/" + book.getHumanReadableIdFromPath()).c_str();
auto author_node = entry_node.append_child("author");
ADD_TEXT_ENTRY(author_node, "name", book.creator);
ADD_TEXT_ENTRY(author_node, "name", book.getCreator());
if (! book.url.empty()) {
if (! book.getUrl().empty()) {
auto acquisition_link = entry_node.append_child("link");
acquisition_link.append_attribute("rel") = "http://opds-spec.org/acquisition/open-access";
acquisition_link.append_attribute("type") = "application/x-zim";
acquisition_link.append_attribute("href") = book.url.c_str();
acquisition_link.append_attribute("href") = book.getUrl().c_str();
acquisition_link.append_attribute("length") = to_string(book.getSize()).c_str();
}
if (! book.faviconMimeType.empty() ) {
if (! book.getFaviconMimeType().empty() ) {
auto image_link = entry_node.append_child("link");
image_link.append_attribute("rel") = "http://opds-spec.org/image/thumbnail";
image_link.append_attribute("type") = book.faviconMimeType.c_str();
image_link.append_attribute("type") = book.getFaviconMimeType().c_str();
image_link.append_attribute("href") = (rootLocation + "/meta?name=favicon&content=" + book.getHumanReadableIdFromPath()).c_str();
}
return entry_node;
}
string OPDSDumper::dumpOPDSFeed()
string OPDSDumper::dumpOPDSFeed(const std::vector<std::string>& bookIds)
{
date = gen_date_str();
pugi::xml_document doc;
@@ -112,6 +113,12 @@ string OPDSDumper::dumpOPDSFeed()
ADD_TEXT_ENTRY(root_node, "title", title);
ADD_TEXT_ENTRY(root_node, "updated", date);
if (m_isSearchResult) {
ADD_TEXT_ENTRY(root_node, "totalResults", to_string(m_totalResults));
ADD_TEXT_ENTRY(root_node, "startIndex", to_string(m_startIndex));
ADD_TEXT_ENTRY(root_node, "itemsPerPage", to_string(m_count));
}
auto self_link_node = root_node.append_child("link");
self_link_node.append_attribute("rel") = "self";
self_link_node.append_attribute("href") = "";
@@ -125,11 +132,13 @@ string OPDSDumper::dumpOPDSFeed()
search_link.append_attribute("href") = searchDescriptionUrl.c_str();
}
for (auto book: library.books) {
handleBook(book, root_node);
if (library) {
for (auto& bookId: bookIds) {
handleBook(library->getBookById(bookId), root_node);
}
}
return node_to_string(root_node);
return nodeToString(root_node);
}
}

View File

@@ -762,7 +762,7 @@ bool Reader::searchSuggestionsSmart(const string& prefix,
unsigned int suggestionsCount)
{
std::vector<std::string> variants = this->getTitleVariants(prefix);
bool retVal;
bool retVal = false;
this->suggestions.clear();
this->suggestionsOffset = this->suggestions.begin();
@@ -851,7 +851,7 @@ bool Reader::isCorrupted() const
unsigned int Reader::getFileSize() const
{
zim::File* file = this->getZimFileHandler();
zim::offset_type size = 0;
zim::size_type size = 0;
if (file != NULL) {
size = file->getFilesize();

View File

@@ -22,20 +22,12 @@
#include "searcher.h"
#include "reader.h"
#include "xapianSearcher.h"
#include <zim/search.h>
#ifdef ENABLE_CTPP2
#include <ctpp2/CDT.hpp>
#include <ctpp2/CTPP2FileLogger.hpp>
#include <ctpp2/CTPP2SimpleVM.hpp>
#include "ctpp2/CTPP2VMStringLoader.hpp"
#include <mustache.hpp>
#include "kiwixlib-resources.h"
using namespace CTPP;
#endif
#define MAX_SEARCH_LEN 140
namespace kiwix
@@ -61,42 +53,18 @@ class _Result : public Result
struct SearcherInternal {
const zim::Search* _search;
XapianSearcher* _xapianSearcher;
zim::Search::iterator current_iterator;
SearcherInternal() : _search(NULL), _xapianSearcher(NULL) {}
SearcherInternal() : _search(NULL) {}
~SearcherInternal()
{
if (_search != NULL) {
delete _search;
}
if (_xapianSearcher != NULL) {
delete _xapianSearcher;
}
}
};
/* Constructor */
Searcher::Searcher(const string& xapianDirectoryPath,
Reader* reader,
const string& humanReadableName)
: internal(new SearcherInternal()),
searchPattern(""),
protocolPrefix("zim://"),
searchProtocolPrefix("search://?"),
resultCountPerPage(0),
estimatedResultCount(0),
resultStart(0),
resultEnd(0),
contentHumanReadableId(humanReadableName)
{
loadICUExternalTables();
if (!reader || !reader->hasFulltextIndex()) {
internal->_xapianSearcher = new XapianSearcher(xapianDirectoryPath, reader);
}
this->humanReaderNames.push_back(humanReadableName);
}
Searcher::Searcher(const std::string& humanReadableName)
: internal(new SearcherInternal()),
searchPattern(""),
@@ -160,26 +128,19 @@ void Searcher::search(std::string& search,
this->resultStart = resultStart;
this->resultEnd = resultEnd;
string unaccentedSearch = removeAccents(search);
if (internal->_xapianSearcher) {
internal->_xapianSearcher->searchInIndex(
unaccentedSearch, resultStart, resultEnd, verbose);
this->estimatedResultCount
= internal->_xapianSearcher->results.get_matches_estimated();
} else {
std::vector<const zim::File*> zims;
for (auto current = this->readers.begin(); current != this->readers.end();
current++) {
if ( (*current)->hasFulltextIndex() ) {
zims.push_back((*current)->getZimFileHandler());
}
std::vector<const zim::File*> zims;
for (auto current = this->readers.begin(); current != this->readers.end();
current++) {
if ( (*current)->hasFulltextIndex() ) {
zims.push_back((*current)->getZimFileHandler());
}
zim::Search* search = new zim::Search(zims);
search->set_query(unaccentedSearch);
search->set_range(resultStart, resultEnd);
internal->_search = search;
internal->current_iterator = internal->_search->begin();
this->estimatedResultCount = internal->_search->get_matches_estimated();
}
zim::Search* search = new zim::Search(zims);
search->set_query(unaccentedSearch);
search->set_range(resultStart, resultEnd);
internal->_search = search;
internal->current_iterator = internal->_search->begin();
this->estimatedResultCount = internal->_search->get_matches_estimated();
}
return;
@@ -209,10 +170,6 @@ void Searcher::geo_search(float latitude, float longitude, float distance,
return;
}
if (internal->_xapianSearcher) {
return;
}
/* Avoid big researches */
this->resultCountPerPage = resultEnd - resultStart;
if (this->resultCountPerPage > MAX_SEARCH_LEN) {
@@ -244,18 +201,14 @@ void Searcher::geo_search(float latitude, float longitude, float distance,
void Searcher::restart_search()
{
if (internal->_xapianSearcher) {
internal->_xapianSearcher->restart_search();
} else if (internal->_search) {
if (internal->_search) {
internal->current_iterator = internal->_search->begin();
}
}
Result* Searcher::getNextResult()
{
if (internal->_xapianSearcher) {
return internal->_xapianSearcher->getNextResult();
} else if (internal->_search &&
if (internal->_search &&
internal->current_iterator != internal->_search->end()) {
Result* result = new _Result(internal->current_iterator);
internal->current_iterator++;
@@ -272,37 +225,31 @@ void Searcher::reset()
return;
}
void Searcher::suggestions(std::string& search, const bool verbose)
void Searcher::suggestions(std::string& searchPattern, const bool verbose)
{
this->reset();
if (verbose == true) {
cout << "Performing suggestion query `" << search << "`" << endl;
cout << "Performing suggestion query `" << searchPattern << "`" << endl;
}
this->searchPattern = search;
this->searchPattern = searchPattern;
this->resultStart = 0;
this->resultEnd = 10;
string unaccentedSearch = removeAccents(search);
string unaccentedSearch = removeAccents(searchPattern);
if (internal->_xapianSearcher) {
/* [TODO] Suggestion on a external database ?
* We do not support that. */
this->estimatedResultCount = 0;
} else {
std::vector<const zim::File*> zims;
for (auto current = this->readers.begin(); current != this->readers.end();
current++) {
zims.push_back((*current)->getZimFileHandler());
}
zim::Search* search = new zim::Search(zims);
search->set_query(unaccentedSearch);
search->set_range(resultStart, resultEnd);
search->set_suggestion_mode(true);
internal->_search = search;
internal->current_iterator = internal->_search->begin();
this->estimatedResultCount = internal->_search->get_matches_estimated();
std::vector<const zim::File*> zims;
for (auto current = this->readers.begin(); current != this->readers.end();
current++) {
zims.push_back((*current)->getZimFileHandler());
}
zim::Search* search = new zim::Search(zims);
search->set_query(unaccentedSearch);
search->set_range(resultStart, resultEnd);
search->set_suggestion_mode(true);
internal->_search = search;
internal->current_iterator = internal->_search->begin();
this->estimatedResultCount = internal->_search->get_matches_estimated();
}
/* Return the result count estimation */
@@ -363,46 +310,30 @@ int _Result::get_readerIndex()
{
return iterator.get_fileIndex();
}
#ifdef ENABLE_CTPP2
string Searcher::getHtml()
{
SimpleVM oSimpleVM(
1024, //iIMaxFunctions (default value)
4096, //iIMaxArgStackSize (default value)
4096, //iIMaxCodeStackSize (default value)
10240 * 2 //iIMaxSteps (default*2)
);
// Fill data
CDT oData;
CDT resultsCDT(CDT::ARRAY_VAL);
kainjow::mustache::data results{kainjow::mustache::data::type::list};
this->restart_search();
Result* p_result = NULL;
while ((p_result = this->getNextResult())) {
CDT result;
result["title"] = p_result->get_title();
result["url"] = p_result->get_url();
result["snippet"] = p_result->get_snippet();
result["contentId"] = humanReaderNames[p_result->get_readerIndex()];
if (p_result->get_size() >= 0) {
result["size"] = kiwix::beautifyInteger(p_result->get_size());
}
kainjow::mustache::data result;
result.set("title", p_result->get_title());
result.set("url", p_result->get_url());
result.set("snippet", p_result->get_snippet());
result.set("resultContentId", humanReaderNames[p_result->get_readerIndex()]);
if (p_result->get_wordCount() >= 0) {
result["wordCount"] = kiwix::beautifyInteger(p_result->get_wordCount());
result.set("wordCount", kiwix::beautifyInteger(p_result->get_wordCount()));
}
resultsCDT.PushBack(result);
results.push_back(result);
delete p_result;
}
this->restart_search();
oData["results"] = resultsCDT;
// pages
CDT pagesCDT(CDT::ARRAY_VAL);
kainjow::mustache::data pages{kainjow::mustache::data::type::list};
unsigned int pageStart
= this->resultStart / this->resultCountPerPage >= 5
@@ -418,48 +349,41 @@ string Searcher::getHtml()
}
for (unsigned int i = pageStart; i < pageStart + pageCount; i++) {
CDT page;
page["label"] = i + 1;
page["start"] = i * this->resultCountPerPage;
page["end"] = (i + 1) * this->resultCountPerPage;
kainjow::mustache::data page;
page.set("label", to_string(i + 1));
page.set("start", to_string(i * this->resultCountPerPage));
page.set("end", to_string((i + 1) * this->resultCountPerPage));
if (i * this->resultCountPerPage == this->resultStart) {
page["selected"] = true;
page.set("selected", true);
}
pagesCDT.PushBack(page);
pages.push_back(page);
}
oData["pages"] = pagesCDT;
oData["count"] = kiwix::beautifyInteger(this->estimatedResultCount);
oData["searchPattern"] = kiwix::encodeDiples(this->searchPattern);
oData["searchPatternEncoded"] = urlEncode(this->searchPattern);
oData["resultStart"] = this->resultStart + 1;
oData["resultEnd"] = (this->resultEnd > this->estimatedResultCount
? this->estimatedResultCount
: this->resultEnd);
oData["resultRange"] = this->resultCountPerPage;
oData["resultLastPageStart"]
= this->estimatedResultCount > this->resultCountPerPage
? std::round(this->estimatedResultCount / this->resultCountPerPage) * this->resultCountPerPage
: 0;
oData["protocolPrefix"] = this->protocolPrefix;
oData["searchProtocolPrefix"] = this->searchProtocolPrefix;
oData["contentId"] = this->contentHumanReadableId;
std::string template_str = RESOURCE::search_result_tmpl;
kainjow::mustache::mustache tmpl(template_str);
std::string template_ct2 = RESOURCE::results_ct2;
VMStringLoader oLoader(template_ct2.c_str(), template_ct2.size());
kainjow::mustache::data allData;
allData.set("results", results);
allData.set("pages", pages);
allData.set("hasResult", this->estimatedResultCount != 0);
allData.set("count", kiwix::beautifyInteger(this->estimatedResultCount));
allData.set("searchPattern", kiwix::encodeDiples(this->searchPattern));
allData.set("searchPatternEncoded", urlEncode(this->searchPattern));
allData.set("resultStart", to_string(this->resultStart + 1));
allData.set("resultEnd", to_string(min(this->resultEnd, this->estimatedResultCount)));
allData.set("resultRange", to_string(this->resultCountPerPage));
allData.set("resultLastPageStart", to_string(this->estimatedResultCount > this->resultCountPerPage
? round(this->estimatedResultCount / this->resultCountPerPage) * this->resultCountPerPage
: 0));
allData.set("lastResult", to_string(this->estimatedResultCount));
allData.set("protocolPrefix", this->protocolPrefix);
allData.set("searchProtocolPrefix", this->searchProtocolPrefix);
allData.set("contentId", this->contentHumanReadableId);
FileLogger oLogger(stderr);
// DEBUG only (write output to stdout)
// oSimpleVM.Run(oData, oLoader, stdout, oLogger);
std::string sResult;
oSimpleVM.Run(oData, oLoader, sResult, oLogger);
return sResult;
std::stringstream ss;
tmpl.render(allData, [&ss](const std::string& str) { ss << str; });
return ss.str();
}
#endif
}

40
src/subprocess.cpp Normal file
View File

@@ -0,0 +1,40 @@
#include "subprocess.h"
#ifdef _WIN32
# include "subprocess_windows.h"
#else
# include "subprocess_unix.h"
#endif
Subprocess::Subprocess(std::unique_ptr<SubprocessImpl> impl, commandLine_t& commandLine) :
mp_impl(std::move(impl))
{
mp_impl->run(commandLine);
}
Subprocess::~Subprocess()
{
mp_impl->kill();
}
std::unique_ptr<Subprocess> Subprocess::run(commandLine_t& commandLine)
{
#ifdef _WIN32
auto impl = std::unique_ptr<SubprocessImpl>(new WinImpl);
#else
auto impl = std::unique_ptr<UnixImpl>(new UnixImpl);
#endif
return std::unique_ptr<Subprocess>(new Subprocess(std::move(impl), commandLine));
}
bool Subprocess::isRunning()
{
return mp_impl->isRunning();
}
bool Subprocess::kill()
{
return mp_impl->kill();
}

36
src/subprocess.h Normal file
View File

@@ -0,0 +1,36 @@
#ifndef KIWIX_SUBPROCESS_H_
#define KIWIX_SUBPROCESS_H_
#include <string>
#include <memory>
#include <vector>
typedef std::vector<const char *> commandLine_t;
class SubprocessImpl
{
public:
virtual void run(commandLine_t& commandLine) = 0;
virtual bool kill() = 0;
virtual bool isRunning() = 0;
virtual ~SubprocessImpl() = default;
};
class Subprocess
{
private:
// Impl depends of the system (window, unix, ...)
std::unique_ptr<SubprocessImpl> mp_impl;
Subprocess(std::unique_ptr<SubprocessImpl> impl, commandLine_t& commandLine);
public:
static std::unique_ptr<Subprocess> run(commandLine_t& commandLine);
~Subprocess();
bool isRunning();
bool kill();
};
#endif // KIWIX_SUBPROCESS_H_

93
src/subprocess_unix.cpp Normal file
View File

@@ -0,0 +1,93 @@
#include "subprocess_unix.h"
#include <sys/types.h>
#include <unistd.h>
#include <iostream>
#include <signal.h>
#include <sys/wait.h>
#include <stdlib.h>
UnixImpl::UnixImpl():
m_pid(0),
m_running(false),
m_mutex(PTHREAD_MUTEX_INITIALIZER),
m_waitingThread()
{
}
UnixImpl::~UnixImpl()
{
kill();
// Android has no pthread_cancel :(
#ifdef __ANDROID__
pthread_kill(m_waitingThread, SIGUSR1);
#else
pthread_cancel(m_waitingThread);
#endif
}
#ifdef __ANDROID__
void thread_exit_handler(int sig) {
pthread_exit(0);
}
#endif
void* UnixImpl::waitForPID(void* _self)
{
#ifdef __ANDROID__
struct sigaction actions;
memset(&actions, 0, sizeof(actions));
sigemptyset(&actions.sa_mask);
actions.sa_flags = 0;
actions.sa_handler = thread_exit_handler;
sigaction(SIGUSR1, &actions, NULL);
#endif
UnixImpl* self = static_cast<UnixImpl*>(_self);
waitpid(self->m_pid, NULL, WEXITED);
pthread_mutex_lock(&self->m_mutex);
self->m_running = false;
pthread_mutex_unlock(&self->m_mutex);
return self;
}
void UnixImpl::run(commandLine_t& commandLine)
{
const char* binary = commandLine[0];
int pid = fork();
switch(pid) {
case -1:
std::cerr << "cannot fork" << std::endl;
break;
case 0:
commandLine.push_back(NULL);
if (execvp(binary, const_cast<char* const*>(commandLine.data()))) {
perror("Cannot launch\n");
exit(-1);
}
break;
default:
m_pid = pid;
m_running = true;
pthread_create(&m_waitingThread, NULL, waitForPID, this);
break;
}
}
bool UnixImpl::kill()
{
return (::kill(m_pid, SIGKILL) == 0);
}
bool UnixImpl::isRunning()
{
pthread_mutex_lock(&m_mutex);
bool ret = m_running;
pthread_mutex_unlock(&m_mutex);
return ret;
}

28
src/subprocess_unix.h Normal file
View File

@@ -0,0 +1,28 @@
#ifndef KIWIX_SUBPROCESS_UNIX_H_
#define KIWIX_SUBPROCESS_UNIX_H_
#include "subprocess.h"
#include <pthread.h>
class UnixImpl : public SubprocessImpl
{
private:
int m_pid;
bool m_running;
pthread_mutex_t m_mutex;
pthread_t m_waitingThread;
public:
UnixImpl();
virtual ~UnixImpl();
void run(commandLine_t& commandLine);
bool kill();
bool isRunning();
static void* waitForPID(void* self);
};
#endif //KIWIX_SUBPROCESS_UNIX_H_

View File

@@ -0,0 +1,93 @@
#include "subprocess_windows.h"
#include <windows.h>
#include <winbase.h>
#include <iostream>
#include <sstream>
WinImpl::WinImpl():
m_pid(0),
m_running(false),
m_handle(INVALID_HANDLE_VALUE)
{
InitializeCriticalSection(&m_criticalSection);
}
WinImpl::~WinImpl()
{
kill();
CloseHandle(m_handle);
DeleteCriticalSection(&m_criticalSection);
}
DWORD WINAPI WinImpl::waitForPID(void* _self)
{
WinImpl* self = static_cast<WinImpl*>(_self);
WaitForSingleObject(self->m_handle, INFINITE);
EnterCriticalSection(&self->m_criticalSection);
self->m_running = false;
LeaveCriticalSection(&self->m_criticalSection);
return 0;
}
std::unique_ptr<wchar_t[]> toWideChar(const std::string& value)
{
auto size = MultiByteToWideChar(CP_UTF8, 0,
value.c_str(), -1, nullptr, 0);
auto wdata = std::unique_ptr<wchar_t[]>(new wchar_t[size]);
auto ret = MultiByteToWideChar(CP_UTF8, 0,
value.c_str(), -1, wdata.get(), size);
if (0 == ret) {
std::ostringstream oss;
oss << "Cannot convert to wchar : " << GetLastError();
throw std::runtime_error(oss.str());
}
return wdata;
}
void WinImpl::run(commandLine_t& commandLine)
{
STARTUPINFOW startInfo = {0};
PROCESS_INFORMATION procInfo;
startInfo.cb = sizeof(startInfo);
std::ostringstream oss;
for(auto& item: commandLine) {
oss << item << " ";
}
auto wCommandLine = toWideChar(oss.str());
if (CreateProcessW(
NULL,
wCommandLine.get(),
NULL,
NULL,
false,
CREATE_NO_WINDOW,
NULL,
NULL,
&startInfo,
&procInfo)) {
m_pid = procInfo.dwProcessId;
m_handle = procInfo.hProcess;
CloseHandle(procInfo.hThread);
m_running = true;
CreateThread(NULL, 0, &waitForPID, this, 0, NULL );
}
}
bool WinImpl::kill()
{
return TerminateProcess(m_handle, 0);
}
bool WinImpl::isRunning()
{
EnterCriticalSection(&m_criticalSection);
bool ret = m_running;
LeaveCriticalSection(&m_criticalSection);
return ret;
}

28
src/subprocess_windows.h Normal file
View File

@@ -0,0 +1,28 @@
#ifndef KIWIX_SUBPROCESS_WINDOWS_H_
#define KIWIX_SUBPROCESS_WINDOWS_H_
#include "subprocess.h"
#include <windows.h>
#include <synchapi.h>
class WinImpl : public SubprocessImpl
{
private:
int m_pid;
bool m_running;
HANDLE m_handle;
CRITICAL_SECTION m_criticalSection;
public:
WinImpl();
virtual ~WinImpl();
void run(commandLine_t& commandLine);
bool kill();
bool isRunning();
static DWORD WINAPI waitForPID(void* self);
};
#endif //KIWIX_SUBPROCESS_WINDOWS_H_

View File

@@ -24,7 +24,7 @@
René Nyffenegger rene.nyffenegger@adp-gmbh.ch
*/
#include <common/base64.h>
#include <tools/base64.h>
#include <iostream>
static const std::string base64_chars =
@@ -37,8 +37,10 @@ static inline bool is_base64(unsigned char c) {
return (isalnum(c) || (c == '+') || (c == '/'));
}
std::string base64_encode(unsigned char const* bytes_to_encode, unsigned int in_len) {
std::string base64_encode(const std::string& inString) {
std::string ret;
auto in_len = inString.size();
const unsigned char* bytes_to_encode = reinterpret_cast<const unsigned char*>(inString.data());
int i = 0;
int j = 0;
unsigned char char_array_3[3];

View File

@@ -17,8 +17,27 @@
* MA 02110-1301, USA.
*/
#include <common/networkTools.h>
#include <tools/networkTools.h>
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
#else
#include <net/if.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#endif
#include <curl/curl.h>
#include <sstream>
#include <iostream>
std::map<std::string, std::string> kiwix::getNetworkInterfaces()
@@ -160,3 +179,31 @@ std::string kiwix::getBestPublicIp()
return "127.0.0.1";
}
size_t write_callback_to_iss(char* ptr, size_t size, size_t nmemb, void* userdata)
{
auto str = static_cast<std::stringstream*>(userdata);
str->write(ptr, nmemb);
return nmemb;
}
std::string kiwix::download(const std::string& url) {
auto curl = curl_easy_init();
std::stringstream ss;
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_setopt(curl, CURLOPT_HTTPGET, 1L);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &write_callback_to_iss);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &ss);
auto res = curl_easy_perform(curl);
if (res != CURLE_OK) {
curl_easy_cleanup(curl);
throw std::runtime_error("Cannot perform request");
}
long response_code;
curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
curl_easy_cleanup(curl);
if (response_code != 200) {
throw std::runtime_error("Invalid return code from server");
}
return ss.str();
}

326
src/tools/otherTools.cpp Normal file
View File

@@ -0,0 +1,326 @@
/*
* Copyright 2014 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <tools/otherTools.h>
#include <map>
static std::map<std::string, std::string> codeisomapping {
//a
{ "ad", "and" },
{ "ae", "are" },
{ "af", "afg" },
{ "ag", "atg" },
{ "ai", "aia" },
{ "al", "alb" },
{ "am", "arm" },
{ "an", "ant" },
{ "ao", "ago" },
{ "aq", "ata" },
{ "ar", "arg" },
{ "as", "asm" },
{ "at", "aut" },
{ "au", "aus" },
{ "aw", "abw" },
{ "ax", "ala" },
{ "az", "aze" },
//b
{ "ba", "bih" },
{ "bb", "brb" },
{ "bd", "bgd" },
{ "be", "bel" },
{ "bf", "bfa" },
{ "bg", "bgr" },
{ "bh", "bhr" },
{ "bi", "bdi" },
{ "bj", "ben" },
{ "bl", "blm" },
{ "bn", "brn" },
{ "bm", "bmu" },
{ "bo", "bol" },
{ "br", "bra" },
{ "bs", "bhs" },
{ "bt", "btn" },
{ "bv", "bvt" },
{ "bw", "bwa" },
{ "by", "blr" },
{ "bz", "blz" },
//c
{ "ca", "can" },
{ "cc", "cck" },
{ "cd", "cod" },
{ "cf", "caf" },
{ "cg", "cog" },
{ "ch", "che" },
{ "ci", "civ" },
{ "ck", "cok" },
{ "cl", "chl" },
{ "cm", "cmr" },
{ "cn", "chn" },
{ "co", "col" },
{ "cr", "cri" },
{ "cu", "cub" },
{ "cv", "cpv" },
{ "cx", "cxr" },
{ "cy", "cyp" },
{ "cz", "cze" },
//d
{ "de", "deu" },
{ "dj", "dji" },
{ "dk", "dnk" },
{ "dm", "dma" },
{ "do", "dom" },
{ "dz", "dza" },
//e
{ "ec", "ecu" },
{ "ee", "est" },
{ "eg", "egy" },
{ "eh", "esh" },
{ "en", "eng" },
{ "er", "eri" },
{ "es", "esp" },
{ "et", "eth" },
//f
{ "fi", "fin" },
{ "fj", "fji" },
{ "fk", "flk" },
{ "fm", "fsm" },
{ "fo", "fro" },
{ "fr", "fra" },
//g
{ "ga", "gab" },
{ "gb", "gbr" },
{ "gd", "grd" },
{ "ge", "geo" },
{ "gf", "guf" },
{ "gg", "ggy" },
{ "gh", "gha" },
{ "gi", "gib" },
{ "gl", "grl" },
{ "gm", "gmb" },
{ "gn", "gin" },
{ "gp", "glp" },
{ "gq", "gnq" },
{ "gr", "grc" },
{ "gs", "sgs" },
{ "gt", "gtm" },
{ "gu", "gum" },
{ "gw", "gnb" },
{ "gy", "guy" },
//h
{ "hk", "hkg" },
{ "hm", "hmd" },
{ "hn", "hnd" },
{ "hr", "hrv" },
{ "ht", "hti" },
{ "hu", "hun" },
//i
{ "id", "idn" },
{ "ie", "irl" },
{ "il", "isr" },
{ "im", "imn" },
{ "in", "ind" },
{ "io", "iot" },
{ "iq", "irq" },
{ "ir", "irn" },
{ "is", "isl" },
{ "it", "ita" },
//j
{ "je", "jey" },
{ "jm", "jam" },
{ "jo", "jor" },
{ "jp", "jpn" },
//k
{ "ke", "ken" },
{ "kg", "kgz" },
{ "kh", "khm" },
{ "ki", "kir" },
{ "km", "com" },
{ "kn", "kna" },
{ "kp", "prk" },
{ "kr", "kor" },
{ "kw", "kwt" },
{ "ky", "cym" },
{ "kz", "kaz" },
//l
{ "la", "lao" },
{ "lb", "lbn" },
{ "lc", "lca" },
{ "li", "lie" },
{ "lk", "lka" },
{ "lr", "lbr" },
{ "ls", "lso" },
{ "lt", "ltu" },
{ "lu", "lux" },
{ "lv", "lva" },
{ "ly", "lby" },
//m
{ "ma", "mar" },
{ "mc", "mco" },
{ "md", "mda" },
{ "me", "mne" },
{ "mf", "maf" },
{ "mg", "mdg" },
{ "mh", "mhl" },
{ "mk", "mkd" },
{ "ml", "mli" },
{ "mm", "mmr" },
{ "mn", "mng" },
{ "mo", "mac" },
{ "mp", "mnp" },
{ "mq", "mtq" },
{ "mr", "mrt" },
{ "ms", "msr" },
{ "mt", "mlt" },
{ "mu", "mus" },
{ "mv", "mdv" },
{ "mw", "mwi" },
{ "mx", "mex" },
{ "my", "mys" },
{ "mz", "moz" },
//n
{ "na", "nam" },
{ "nc", "ncl" },
{ "ne", "ner" },
{ "nf", "nfk" },
{ "ng", "nga" },
{ "ni", "nic" },
{ "nl", "nld" },
{ "no", "nor" },
{ "np", "npl" },
{ "nr", "nru" },
{ "nu", "niu" },
{ "nz", "nzl" },
//o
{ "om", "omn" },
//p
{ "pa", "pan" },
{ "pe", "per" },
{ "pf", "pyf" },
{ "pg", "png" },
{ "ph", "phl" },
{ "pk", "pak" },
{ "pl", "pol" },
{ "pm", "spm" },
{ "pn", "pcn" },
{ "pr", "pri" },
{ "ps", "pse" },
{ "pt", "prt" },
{ "pw", "plw" },
{ "py", "pry" },
//q
{ "qa", "qat" },
//r
{ "re", "reu" },
{ "ro", "rou" },
{ "rs", "srb" },
{ "ru", "rus" },
{ "rw", "rwa" },
//s
{ "sa", "sau" },
{ "sb", "slb" },
{ "sc", "syc" },
{ "sd", "sdn" },
{ "se", "swe" },
{ "sg", "sgp" },
{ "sh", "shn" },
{ "si", "svn" },
{ "sj", "sjm" },
{ "sk", "svk" },
{ "sl", "sle" },
{ "sm", "smr" },
{ "sn", "sen" },
{ "so", "som" },
{ "sr", "sur" },
{ "ss", "ssd" },
{ "st", "stp" },
{ "sv", "slv" },
{ "sy", "syr" },
{ "sz", "swz" },
//t
{ "tc", "tca" },
{ "td", "tcd" },
{ "tf", "atf" },
{ "tg", "tgo" },
{ "th", "tha" },
{ "tj", "tjk" },
{ "tk", "tkl" },
{ "tl", "tls" },
{ "tm", "tkm" },
{ "tn", "tun" },
{ "to", "ton" },
{ "tr", "tur" },
{ "tt", "tto" },
{ "tv", "tuv" },
{ "tw", "twn" },
{ "tz", "tza" },
//u
{ "ua", "ukr" },
{ "ug", "uga" },
{ "um", "umi" },
{ "us", "usa" },
{ "uy", "ury" },
{ "uz", "uzb" },
//v
{ "va", "vat" },
{ "vc", "vct" },
{ "ve", "ven" },
{ "vg", "vgb" },
{ "vi", "vir" },
{ "vn", "vnm" },
{ "vu", "vut" },
//w
{ "wf", "wlf" },
{ "ws", "wsm" },
//y
{ "ye", "yem" },
{ "yt", "myt" },
// z
{ "za", "zaf" },
{ "zm", "zmb" },
{ "zw", "zwe" }
};
void kiwix::sleep(unsigned int milliseconds)
{
#ifdef _WIN32
Sleep(milliseconds);
#else
usleep(1000 * milliseconds);
#endif
}
struct XmlStringWriter: pugi::xml_writer
{
std::string result;
virtual void write(const void* data, size_t size){
result.append(static_cast<const char*>(data), size);
}
};
std::string kiwix::nodeToString(pugi::xml_node node)
{
XmlStringWriter writer;
node.print(writer, " ");
return writer.result;
}
std::string kiwix::converta2toa3(const std::string& a2code){
return codeisomapping.at(a2code);
}

View File

@@ -17,7 +17,7 @@
* MA 02110-1301, USA.
*/
#include <common/pathTools.h>
#include <tools/pathTools.h>
#ifdef __APPLE__
#include <limits.h>
@@ -35,9 +35,9 @@
#endif
#ifdef _WIN32
#define SEPARATOR "\\"
const std::string SEPARATOR("\\");
#else
#define SEPARATOR "/"
const std::string SEPARATOR("/");
#include <unistd.h>
#endif
@@ -66,9 +66,7 @@ string computeRelativePath(const string path, const string absolutePath)
while (commonCount < pathParts.size()
&& commonCount < absolutePathParts.size()
&& pathParts[commonCount] == absolutePathParts[commonCount]) {
if (!pathParts[commonCount].empty()) {
commonCount++;
}
}
string relativePath;
@@ -76,18 +74,17 @@ string computeRelativePath(const string path, const string absolutePath)
/* On Windows you have a token more because the root is represented
by a letter */
if (commonCount == 0) {
relativePath = "../";
relativePath = ".." + SEPARATOR;
}
#endif
for (unsigned int i = commonCount; i < pathParts.size(); i++) {
relativePath += "../";
relativePath += ".." + SEPARATOR;
}
for (unsigned int i = commonCount; i < absolutePathParts.size(); i++) {
relativePath += absolutePathParts[i];
relativePath += i + 1 < absolutePathParts.size() ? "/" : "";
relativePath += i + 1 < absolutePathParts.size() ? SEPARATOR : "";
}
return relativePath;
}
@@ -310,3 +307,29 @@ string getCurrentDirectory()
free(a_cwd);
return s_cwd;
}
string getDataDirectory()
{
#ifdef _WIN32
char* cDataDir = ::getenv("APPDATA");
#else
char* cDataDir = ::getenv("KIWIX_DATA_DIR");
#endif
std::string dataDir = cDataDir==nullptr ? "" : cDataDir;
if (!dataDir.empty())
return dataDir;
#ifdef _WIN32
cDataDir = ::getenv("USERPROFILE");
dataDir = cDataDir==nullptr ? getCurrentDirectory() : cDataDir;
#else
cDataDir = ::getenv("XDG_DATA_HOME");
dataDir = cDataDir==nullptr ? "" : cDataDir;
if (dataDir.empty()) {
cDataDir = ::getenv("HOME");
dataDir = cDataDir==nullptr ? getCurrentDirectory() : cDataDir;
dataDir = appendToDirectory(dataDir, ".local");
dataDir = appendToDirectory(dataDir, "share");
}
#endif
return appendToDirectory(dataDir, "kiwix");
}

View File

@@ -17,7 +17,7 @@
* MA 02110-1301, USA.
*/
#include <common/regexTools.h>
#include <tools/regexTools.h>
std::map<std::string, icu::RegexMatcher*> regexCache;

View File

@@ -17,7 +17,7 @@
* MA 02110-1301, USA.
*/
#include <common/stringTools.h>
#include <tools/stringTools.h>
#include <unicode/normlzr.h>
#include <unicode/rep.h>
@@ -57,10 +57,8 @@ std::string kiwix::removeAccents(const std::string& text)
return unaccentedText;
}
#ifndef __ANDROID__
/* Prepare integer for display */
std::string kiwix::beautifyInteger(const unsigned int number)
std::string kiwix::beautifyInteger(uint64_t number)
{
std::stringstream numberStream;
numberStream << number;
@@ -75,14 +73,19 @@ std::string kiwix::beautifyInteger(const unsigned int number)
return numberString;
}
std::string kiwix::beautifyFileSize(const unsigned int number)
std::string kiwix::beautifyFileSize(uint64_t number)
{
if (number > 1024 * 1024) {
return kiwix::beautifyInteger(number / (1024 * 1024)) + " GB";
} else {
return kiwix::beautifyInteger(number / 1024 != 0 ? number / 1024 : 1)
+ " MB";
}
std::stringstream ss;
ss << std::fixed << std::setprecision(2);
if (number>>30)
ss << (number/(1024.0*1024*1024)) << " GB";
else if (number>>20)
ss << (number/(1024.0*1024)) << " MB";
else if (number>>10)
ss << (number/1024.0) << " KB";
else
ss << number << " B";
return ss.str();
}
void kiwix::printStringInHexadecimal(icu::UnicodeString s)
@@ -133,8 +136,6 @@ std::string kiwix::encodeDiples(const std::string& str)
return result;
}
#endif
/* urlEncode() based on javascript encodeURI() &
encodeURIComponent(). Mostly code from rstudio/httpuv (GPLv3) */

View File

@@ -1,373 +0,0 @@
/* htmlparse.cc: simple HTML parser for omega indexer
*
* Copyright 1999,2000,2001 BrightStation PLC
* Copyright 2001 Ananova Ltd
* Copyright 2002,2006,2007,2008 Olly Betts
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
* USA
*/
// #include <config.h>
#include "htmlparse.h"
#include <xapian.h>
// #include "utf8convert.h"
#include <algorithm>
#include <ctype.h>
#include <cstring>
#include <stdio.h>
#include <stdlib.h>
using namespace std;
inline void
lowercase_string(string &str)
{
for (string::iterator i = str.begin(); i != str.end(); ++i) {
*i = tolower(static_cast<unsigned char>(*i));
}
}
map<string, unsigned int> HtmlParser::named_ents;
inline static bool
p_notdigit(char c)
{
return !isdigit(static_cast<unsigned char>(c));
}
inline static bool
p_notxdigit(char c)
{
return !isxdigit(static_cast<unsigned char>(c));
}
inline static bool
p_notalnum(char c)
{
return !isalnum(static_cast<unsigned char>(c));
}
inline static bool
p_notwhitespace(char c)
{
return !isspace(static_cast<unsigned char>(c));
}
inline static bool
p_nottag(char c)
{
return !isalnum(static_cast<unsigned char>(c)) &&
c != '.' && c != '-' && c != ':'; // ':' for XML namespaces.
}
inline static bool
p_whitespacegt(char c)
{
return isspace(static_cast<unsigned char>(c)) || c == '>';
}
inline static bool
p_whitespaceeqgt(char c)
{
return isspace(static_cast<unsigned char>(c)) || c == '=' || c == '>';
}
bool
HtmlParser::get_parameter(const string & param, string & value)
{
map<string, string>::const_iterator i = parameters.find(param);
if (i == parameters.end()) return false;
value = i->second;
return true;
}
HtmlParser::HtmlParser()
{
static const struct ent { const char *n; unsigned int v; } ents[] = {
#include "namedentities.h"
{ NULL, 0 }
};
if (named_ents.empty()) {
const struct ent *i = ents;
while (i->n) {
named_ents[string(i->n)] = i->v;
++i;
}
}
}
void
HtmlParser::decode_entities(string &s)
{
// We need a const_iterator version of s.end() - otherwise the
// find() and find_if() templates don't work...
string::const_iterator amp = s.begin(), s_end = s.end();
while ((amp = find(amp, s_end, '&')) != s_end) {
unsigned int val = 0;
string::const_iterator end, p = amp + 1;
if (p != s_end && *p == '#') {
p++;
if (p != s_end && (*p == 'x' || *p == 'X')) {
// hex
p++;
end = find_if(p, s_end, p_notxdigit);
sscanf(s.substr(p - s.begin(), end - p).c_str(), "%x", &val);
} else {
// number
end = find_if(p, s_end, p_notdigit);
val = atoi(s.substr(p - s.begin(), end - p).c_str());
}
} else {
end = find_if(p, s_end, p_notalnum);
string code = s.substr(p - s.begin(), end - p);
map<string, unsigned int>::const_iterator i;
i = named_ents.find(code);
if (i != named_ents.end()) val = i->second;
}
if (end < s_end && *end == ';') end++;
if (val) {
string::size_type amp_pos = amp - s.begin();
if (val < 0x80) {
s.replace(amp_pos, end - amp, 1u, char(val));
} else {
// Convert unicode value val to UTF-8.
char seq[4];
unsigned len = Xapian::Unicode::nonascii_to_utf8(val, seq);
s.replace(amp_pos, end - amp, seq, len);
}
s_end = s.end();
// We've modified the string, so the iterators are no longer
// valid...
amp = s.begin() + amp_pos + 1;
} else {
amp = end;
}
}
}
void
HtmlParser::parse_html(const string &body)
{
in_script = false;
parameters.clear();
string::const_iterator start = body.begin();
while (true) {
// Skip through until we find an HTML tag, a comment, or the end of
// document. Ignore isolated occurrences of `<' which don't start
// a tag or comment.
string::const_iterator p = start;
while (true) {
p = find(p, body.end(), '<');
if (p == body.end()) break;
unsigned char ch = *(p + 1);
// Tag, closing tag, or comment (or SGML declaration).
if ((!in_script && isalpha(ch)) || ch == '/' || ch == '!') break;
if (ch == '?') {
// PHP code or XML declaration.
// XML declaration is only valid at the start of the first line.
// FIXME: need to deal with BOMs...
if (p != body.begin() || body.size() < 20) break;
// XML declaration looks something like this:
// <?xml version="1.0" encoding="UTF-8"?>
if (p[2] != 'x' || p[3] != 'm' || p[4] != 'l') break;
if (strchr(" \t\r\n", p[5]) == NULL) break;
string::const_iterator decl_end = find(p + 6, body.end(), '?');
if (decl_end == body.end()) break;
// Default charset for XML is UTF-8.
charset = "UTF-8";
string decl(p + 6, decl_end);
size_t enc = decl.find("encoding");
if (enc == string::npos) break;
enc = decl.find_first_not_of(" \t\r\n", enc + 8);
if (enc == string::npos || enc == decl.size()) break;
if (decl[enc] != '=') break;
enc = decl.find_first_not_of(" \t\r\n", enc + 1);
if (enc == string::npos || enc == decl.size()) break;
if (decl[enc] != '"' && decl[enc] != '\'') break;
char quote = decl[enc++];
size_t enc_end = decl.find(quote, enc);
if (enc != string::npos)
charset = decl.substr(enc, enc_end - enc);
break;
}
p++;
}
// Process text up to start of tag.
if (p > start) {
string text = body.substr(start - body.begin(), p - start);
// convert_to_utf8(text, charset);
decode_entities(text);
process_text(text);
}
if (p == body.end()) break;
start = p + 1;
if (start == body.end()) break;
if (*start == '!') {
if (++start == body.end()) break;
if (++start == body.end()) break;
// comment or SGML declaration
if (*(start - 1) == '-' && *start == '-') {
++start;
string::const_iterator close = find(start, body.end(), '>');
// An unterminated comment swallows rest of document
// (like Netscape, but unlike MSIE IIRC)
if (close == body.end()) break;
p = close;
// look for -->
while (p != body.end() && (*(p - 1) != '-' || *(p - 2) != '-'))
p = find(p + 1, body.end(), '>');
if (p != body.end()) {
// Check for htdig's "ignore this bit" comments.
if (p - start == 15 && string(start, p - 2) == "htdig_noindex") {
string::size_type i;
i = body.find("<!--/htdig_noindex-->", p + 1 - body.begin());
if (i == string::npos) break;
start = body.begin() + i + 21;
continue;
}
// If we found --> skip to there.
start = p;
} else {
// Otherwise skip to the first > we found (as Netscape does).
start = close;
}
} else {
// just an SGML declaration, perhaps giving the DTD - ignore it
start = find(start - 1, body.end(), '>');
if (start == body.end()) break;
}
++start;
} else if (*start == '?') {
if (++start == body.end()) break;
// PHP - swallow until ?> or EOF
start = find(start + 1, body.end(), '>');
// look for ?>
while (start != body.end() && *(start - 1) != '?')
start = find(start + 1, body.end(), '>');
// unterminated PHP swallows rest of document (rather arbitrarily
// but it avoids polluting the database when things go wrong)
if (start != body.end()) ++start;
} else {
// opening or closing tag
int closing = 0;
if (*start == '/') {
closing = 1;
start = find_if(start + 1, body.end(), p_notwhitespace);
}
p = start;
start = find_if(start, body.end(), p_nottag);
string tag = body.substr(p - body.begin(), start - p);
// convert tagname to lowercase
lowercase_string(tag);
if (closing) {
closing_tag(tag);
if (in_script && tag == "script") in_script = false;
/* ignore any bogus parameters on closing tags */
p = find(start, body.end(), '>');
if (p == body.end()) break;
start = p + 1;
} else {
// FIXME: parse parameters lazily.
while (start < body.end() && *start != '>') {
string name, value;
p = find_if(start, body.end(), p_whitespaceeqgt);
name.assign(body, start - body.begin(), p - start);
p = find_if(p, body.end(), p_notwhitespace);
start = p;
if (start != body.end() && *start == '=') {
start = find_if(start + 1, body.end(), p_notwhitespace);
p = body.end();
int quote = *start;
if (quote == '"' || quote == '\'') {
start++;
p = find(start, body.end(), quote);
}
if (p == body.end()) {
// unquoted or no closing quote
p = find_if(start, body.end(), p_whitespacegt);
}
value.assign(body, start - body.begin(), p - start);
start = find_if(p, body.end(), p_notwhitespace);
if (!name.empty()) {
// convert parameter name to lowercase
lowercase_string(name);
// in case of multiple entries, use the first
// (as Netscape does)
parameters.insert(make_pair(name, value));
}
}
}
#if 0
cout << "<" << tag;
map<string, string>::const_iterator x;
for (x = parameters.begin(); x != parameters.end(); x++) {
cout << " " << x->first << "=\"" << x->second << "\"";
}
cout << ">\n";
#endif
opening_tag(tag);
parameters.clear();
// In <script> tags we ignore opening tags to avoid problems
// with "a<b".
if (tag == "script") in_script = true;
if (start != body.end() && *start == '>') ++start;
}
}
}
}

View File

@@ -1,49 +0,0 @@
/* htmlparse.h: simple HTML parser for omega indexer
*
* Copyright 1999,2000,2001 BrightStation PLC
* Copyright 2002,2006,2008 Olly Betts
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
* USA
*/
#ifndef OMEGA_INCLUDED_HTMLPARSE_H
#define OMEGA_INCLUDED_HTMLPARSE_H
#include <string>
#include <map>
using std::string;
using std::map;
class HtmlParser {
map<string, string> parameters;
protected:
void decode_entities(string &s);
bool in_script;
string charset;
static map<string, unsigned int> named_ents;
bool get_parameter(const string & param, string & value);
public:
virtual void process_text(const string &/*text*/) { }
virtual void opening_tag(const string &/*tag*/) { }
virtual void closing_tag(const string &/*tag*/) { }
virtual void parse_html(const string &text);
HtmlParser();
virtual ~HtmlParser() { }
};
#endif // OMEGA_INCLUDED_HTMLPARSE_H

View File

@@ -1,302 +0,0 @@
/* myhtmlparse.cc: subclass of HtmlParser for extracting text.
*
* Copyright 1999,2000,2001 BrightStation PLC
* Copyright 2002,2003,2004,2006,2007,2008 Olly Betts
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
* USA
*/
// #include <config.h>
#include "myhtmlparse.h"
// #include "utf8convert.h"
#include <ctype.h>
#include <string.h>
inline void
lowercase_string(string &str)
{
for (string::iterator i = str.begin(); i != str.end(); ++i) {
*i = tolower(static_cast<unsigned char>(*i));
}
}
void
MyHtmlParser::parse_html(const string &text, const string &charset_,
bool charset_from_meta_)
{
charset = charset_;
charset_from_meta = charset_from_meta_;
HtmlParser::parse_html(text);
}
void
MyHtmlParser::process_text(const string &text)
{
if (!text.empty() && !in_script_tag && !in_style_tag) {
string::size_type b = text.find_first_not_of(WHITESPACE);
if (b) pending_space = true;
while (b != string::npos) {
if (pending_space && !dump.empty()) dump += ' ';
string::size_type e = text.find_first_of(WHITESPACE, b);
pending_space = (e != string::npos);
if (!pending_space) {
dump.append(text.data() + b, text.size() - b);
return;
}
dump.append(text.data() + b, e - b);
b = text.find_first_not_of(WHITESPACE, e + 1);
}
}
}
void
MyHtmlParser::opening_tag(const string &tag)
{
if (tag.empty()) return;
switch (tag[0]) {
case 'a':
if (tag == "address") pending_space = true;
break;
case 'b':
if (tag == "body") {
dump.resize(0);
break;
}
if (tag == "blockquote" || tag == "br") pending_space = true;
break;
case 'c':
if (tag == "center") pending_space = true;
break;
case 'd':
if (tag == "dd" || tag == "dir" || tag == "div" || tag == "dl" ||
tag == "dt") pending_space = true;
break;
case 'e':
if (tag == "embed") pending_space = true;
break;
case 'f':
if (tag == "fieldset" || tag == "form") pending_space = true;
break;
case 'h':
// hr, and h1, ..., h6
if (tag.length() == 2 && strchr("r123456", tag[1]))
pending_space = true;
break;
case 'i':
if (tag == "iframe" || tag == "img" || tag == "isindex" ||
tag == "input") pending_space = true;
break;
case 'k':
if (tag == "keygen") pending_space = true;
break;
case 'l':
if (tag == "legend" || tag == "li" || tag == "listing")
pending_space = true;
break;
case 'm':
if (tag == "meta") {
string content;
if (get_parameter("content", content)) {
string name;
if (get_parameter("name", name)) {
lowercase_string(name);
if (name == "description") {
if (sample.empty()) {
swap(sample, content);
// convert_to_utf8(sample, charset);
decode_entities(sample);
}
} else if (name == "keywords") {
if (!keywords.empty()) keywords += ' ';
// convert_to_utf8(content, charset);
decode_entities(content);
keywords += content;
} else if (name == "robots") {
decode_entities(content);
lowercase_string(content);
if (content.find("none") != string::npos ||
content.find("noindex") != string::npos) {
indexing_allowed = false;
throw true;
}
}
break;
}
// If the current charset came from a meta tag, don't
// force reparsing again!
if (charset_from_meta) break;
string hdr;
if (get_parameter("http-equiv", hdr)) {
lowercase_string(hdr);
if (hdr == "content-type") {
lowercase_string(content);
size_t start = content.find("charset=");
if (start == string::npos) break;
start += 8;
if (start == content.size()) break;
size_t end = start;
if (content[start] != '"') {
while (end < content.size()) {
unsigned char ch = content[end];
if (ch <= 32 || ch >= 127 ||
strchr(";()<>@,:\\\"/[]?={}", ch))
break;
++end;
}
} else {
++start;
++end;
while (end < content.size()) {
unsigned char ch = content[end];
if (ch == '"') break;
if (ch == '\\') content.erase(end, 1);
++end;
}
}
string newcharset(content, start, end - start);
if (charset != newcharset) {
throw newcharset;
}
}
}
break;
}
if (charset_from_meta) break;
string newcharset;
if (get_parameter("charset", newcharset)) {
// HTML5 added: <meta charset="...">
lowercase_string(newcharset);
if (charset != newcharset) {
throw newcharset;
}
}
break;
}
if (tag == "marquee" || tag == "menu" || tag == "multicol")
pending_space = true;
break;
case 'o':
if (tag == "ol" || tag == "option") pending_space = true;
break;
case 'p':
if (tag == "p" || tag == "pre" || tag == "plaintext")
pending_space = true;
break;
case 'q':
if (tag == "q") pending_space = true;
break;
case 's':
if (tag == "style") {
in_style_tag = true;
break;
}
if (tag == "script") {
in_script_tag = true;
break;
}
if (tag == "select") pending_space = true;
break;
case 't':
if (tag == "table" || tag == "td" || tag == "textarea" ||
tag == "th") pending_space = true;
break;
case 'u':
if (tag == "ul") pending_space = true;
break;
case 'x':
if (tag == "xmp") pending_space = true;
break;
}
}
void
MyHtmlParser::closing_tag(const string &tag)
{
if (tag.empty()) return;
switch (tag[0]) {
case 'a':
if (tag == "address") pending_space = true;
break;
case 'b':
if (tag == "body") {
throw true;
}
if (tag == "blockquote" || tag == "br") pending_space = true;
break;
case 'c':
if (tag == "center") pending_space = true;
break;
case 'd':
if (tag == "dd" || tag == "dir" || tag == "div" || tag == "dl" ||
tag == "dt") pending_space = true;
break;
case 'f':
if (tag == "fieldset" || tag == "form") pending_space = true;
break;
case 'h':
// hr, and h1, ..., h6
if (tag.length() == 2 && strchr("r123456", tag[1]))
pending_space = true;
break;
case 'i':
if (tag == "iframe") pending_space = true;
break;
case 'l':
if (tag == "legend" || tag == "li" || tag == "listing")
pending_space = true;
break;
case 'm':
if (tag == "marquee" || tag == "menu") pending_space = true;
break;
case 'o':
if (tag == "ol" || tag == "option") pending_space = true;
break;
case 'p':
if (tag == "p" || tag == "pre") pending_space = true;
break;
case 'q':
if (tag == "q") pending_space = true;
break;
case 's':
if (tag == "style") {
in_style_tag = false;
break;
}
if (tag == "script") {
in_script_tag = false;
break;
}
if (tag == "select") pending_space = true;
break;
case 't':
if (tag == "title") {
if (title.empty()) swap(title, dump);
break;
}
if (tag == "table" || tag == "td" || tag == "textarea" ||
tag == "th") pending_space = true;
break;
case 'u':
if (tag == "ul") pending_space = true;
break;
case 'x':
if (tag == "xmp") pending_space = true;
break;
}
}

View File

@@ -1,66 +0,0 @@
/* myhtmlparse.h: subclass of HtmlParser for extracting text
*
* Copyright 1999,2000,2001 BrightStation PLC
* Copyright 2002,2003,2004,2006,2008 Olly Betts
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
* USA
*/
#ifndef OMEGA_INCLUDED_MYHTMLPARSE_H
#define OMEGA_INCLUDED_MYHTMLPARSE_H
#include "htmlparse.h"
// FIXME: Should we include \xa0 which is non-breaking space in iso-8859-1, but
// not in all charsets and perhaps spans of all \xa0 should become a single
// \xa0?
#define WHITESPACE " \t\n\r"
class MyHtmlParser : public HtmlParser {
public:
bool in_script_tag;
bool in_style_tag;
bool pending_space;
bool indexing_allowed;
bool charset_from_meta;
string title, sample, keywords, dump;
void process_text(const string &text);
void opening_tag(const string &tag);
void closing_tag(const string &tag);
using HtmlParser::parse_html;
void parse_html(const string &text, const string &charset_,
bool charset_from_meta_);
MyHtmlParser() :
in_script_tag(false),
in_style_tag(false),
pending_space(false),
indexing_allowed(true),
charset_from_meta(false) { }
void reset() {
in_script_tag = false;
in_style_tag = false;
pending_space = false;
indexing_allowed = true;
charset_from_meta = false;
title.resize(0);
sample.resize(0);
keywords.resize(0);
dump.resize(0);
}
};
#endif // OMEGA_INCLUDED_MYHTMLPARSE_H

View File

@@ -1,279 +0,0 @@
/* namedentities.h: named HTML entities.
*
* Copyright (C) 2006,2007 Olly Betts
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#ifndef OMEGA_INCLUDED_NAMEDENTITIES_H
#define OMEGA_INCLUDED_NAMEDENTITIES_H
// Names and values from: "Character entity references in HTML 4"
// http://www.w3.org/TR/html4/sgml/entities.html
{ "quot", 34 },
{ "amp", 38 },
{ "apos", 39 }, // Not in HTML 4 list but used in OpenOffice XML.
{ "lt", 60 },
{ "gt", 62 },
{ "nbsp", 160 },
{ "iexcl", 161 },
{ "cent", 162 },
{ "pound", 163 },
{ "curren", 164 },
{ "yen", 165 },
{ "brvbar", 166 },
{ "sect", 167 },
{ "uml", 168 },
{ "copy", 169 },
{ "ordf", 170 },
{ "laquo", 171 },
{ "not", 172 },
{ "shy", 173 },
{ "reg", 174 },
{ "macr", 175 },
{ "deg", 176 },
{ "plusmn", 177 },
{ "sup2", 178 },
{ "sup3", 179 },
{ "acute", 180 },
{ "micro", 181 },
{ "para", 182 },
{ "middot", 183 },
{ "cedil", 184 },
{ "sup1", 185 },
{ "ordm", 186 },
{ "raquo", 187 },
{ "frac14", 188 },
{ "frac12", 189 },
{ "frac34", 190 },
{ "iquest", 191 },
{ "Agrave", 192 },
{ "Aacute", 193 },
{ "Acirc", 194 },
{ "Atilde", 195 },
{ "Auml", 196 },
{ "Aring", 197 },
{ "AElig", 198 },
{ "Ccedil", 199 },
{ "Egrave", 200 },
{ "Eacute", 201 },
{ "Ecirc", 202 },
{ "Euml", 203 },
{ "Igrave", 204 },
{ "Iacute", 205 },
{ "Icirc", 206 },
{ "Iuml", 207 },
{ "ETH", 208 },
{ "Ntilde", 209 },
{ "Ograve", 210 },
{ "Oacute", 211 },
{ "Ocirc", 212 },
{ "Otilde", 213 },
{ "Ouml", 214 },
{ "times", 215 },
{ "Oslash", 216 },
{ "Ugrave", 217 },
{ "Uacute", 218 },
{ "Ucirc", 219 },
{ "Uuml", 220 },
{ "Yacute", 221 },
{ "THORN", 222 },
{ "szlig", 223 },
{ "agrave", 224 },
{ "aacute", 225 },
{ "acirc", 226 },
{ "atilde", 227 },
{ "auml", 228 },
{ "aring", 229 },
{ "aelig", 230 },
{ "ccedil", 231 },
{ "egrave", 232 },
{ "eacute", 233 },
{ "ecirc", 234 },
{ "euml", 235 },
{ "igrave", 236 },
{ "iacute", 237 },
{ "icirc", 238 },
{ "iuml", 239 },
{ "eth", 240 },
{ "ntilde", 241 },
{ "ograve", 242 },
{ "oacute", 243 },
{ "ocirc", 244 },
{ "otilde", 245 },
{ "ouml", 246 },
{ "divide", 247 },
{ "oslash", 248 },
{ "ugrave", 249 },
{ "uacute", 250 },
{ "ucirc", 251 },
{ "uuml", 252 },
{ "yacute", 253 },
{ "thorn", 254 },
{ "yuml", 255 },
{ "OElig", 338 },
{ "oelig", 339 },
{ "Scaron", 352 },
{ "scaron", 353 },
{ "Yuml", 376 },
{ "fnof", 402 },
{ "circ", 710 },
{ "tilde", 732 },
{ "Alpha", 913 },
{ "Beta", 914 },
{ "Gamma", 915 },
{ "Delta", 916 },
{ "Epsilon", 917 },
{ "Zeta", 918 },
{ "Eta", 919 },
{ "Theta", 920 },
{ "Iota", 921 },
{ "Kappa", 922 },
{ "Lambda", 923 },
{ "Mu", 924 },
{ "Nu", 925 },
{ "Xi", 926 },
{ "Omicron", 927 },
{ "Pi", 928 },
{ "Rho", 929 },
{ "Sigma", 931 },
{ "Tau", 932 },
{ "Upsilon", 933 },
{ "Phi", 934 },
{ "Chi", 935 },
{ "Psi", 936 },
{ "Omega", 937 },
{ "alpha", 945 },
{ "beta", 946 },
{ "gamma", 947 },
{ "delta", 948 },
{ "epsilon", 949 },
{ "zeta", 950 },
{ "eta", 951 },
{ "theta", 952 },
{ "iota", 953 },
{ "kappa", 954 },
{ "lambda", 955 },
{ "mu", 956 },
{ "nu", 957 },
{ "xi", 958 },
{ "omicron", 959 },
{ "pi", 960 },
{ "rho", 961 },
{ "sigmaf", 962 },
{ "sigma", 963 },
{ "tau", 964 },
{ "upsilon", 965 },
{ "phi", 966 },
{ "chi", 967 },
{ "psi", 968 },
{ "omega", 969 },
{ "thetasym", 977 },
{ "upsih", 978 },
{ "piv", 982 },
{ "ensp", 8194 },
{ "emsp", 8195 },
{ "thinsp", 8201 },
{ "zwnj", 8204 },
{ "zwj", 8205 },
{ "lrm", 8206 },
{ "rlm", 8207 },
{ "ndash", 8211 },
{ "mdash", 8212 },
{ "lsquo", 8216 },
{ "rsquo", 8217 },
{ "sbquo", 8218 },
{ "ldquo", 8220 },
{ "rdquo", 8221 },
{ "bdquo", 8222 },
{ "dagger", 8224 },
{ "Dagger", 8225 },
{ "bull", 8226 },
{ "hellip", 8230 },
{ "permil", 8240 },
{ "prime", 8242 },
{ "Prime", 8243 },
{ "lsaquo", 8249 },
{ "rsaquo", 8250 },
{ "oline", 8254 },
{ "frasl", 8260 },
{ "euro", 8364 },
{ "image", 8465 },
{ "weierp", 8472 },
{ "real", 8476 },
{ "trade", 8482 },
{ "alefsym", 8501 },
{ "larr", 8592 },
{ "uarr", 8593 },
{ "rarr", 8594 },
{ "darr", 8595 },
{ "harr", 8596 },
{ "crarr", 8629 },
{ "lArr", 8656 },
{ "uArr", 8657 },
{ "rArr", 8658 },
{ "dArr", 8659 },
{ "hArr", 8660 },
{ "forall", 8704 },
{ "part", 8706 },
{ "exist", 8707 },
{ "empty", 8709 },
{ "nabla", 8711 },
{ "isin", 8712 },
{ "notin", 8713 },
{ "ni", 8715 },
{ "prod", 8719 },
{ "sum", 8721 },
{ "minus", 8722 },
{ "lowast", 8727 },
{ "radic", 8730 },
{ "prop", 8733 },
{ "infin", 8734 },
{ "ang", 8736 },
{ "and", 8743 },
{ "or", 8744 },
{ "cap", 8745 },
{ "cup", 8746 },
{ "int", 8747 },
{ "there4", 8756 },
{ "sim", 8764 },
{ "cong", 8773 },
{ "asymp", 8776 },
{ "ne", 8800 },
{ "equiv", 8801 },
{ "le", 8804 },
{ "ge", 8805 },
{ "sub", 8834 },
{ "sup", 8835 },
{ "nsub", 8836 },
{ "sube", 8838 },
{ "supe", 8839 },
{ "oplus", 8853 },
{ "otimes", 8855 },
{ "perp", 8869 },
{ "sdot", 8901 },
{ "lceil", 8968 },
{ "rceil", 8969 },
{ "lfloor", 8970 },
{ "rfloor", 8971 },
{ "lang", 9001 },
{ "rang", 9002 },
{ "loz", 9674 },
{ "spades", 9824 },
{ "clubs", 9827 },
{ "hearts", 9829 },
{ "diams", 9830 },
#endif // OMEGA_INCLUDED_NAMEDENTITIES_H

View File

@@ -1,231 +0,0 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "xapianSearcher.h"
#include <sys/types.h>
#include <unicode/locid.h>
#ifndef _WIN32
# include <unistd.h>
#endif
#include <zim/article.h>
#include <zim/error.h>
#include <zim/file.h>
#include <zim/zim.h>
#include "xapian/myhtmlparse.h"
#include <vector>
namespace kiwix
{
std::map<std::string, int> read_valuesmap(const std::string& s)
{
std::map<std::string, int> result;
std::vector<std::string> elems = split(s, ";");
for (std::vector<std::string>::iterator elem = elems.begin();
elem != elems.end();
elem++) {
std::vector<std::string> tmp_elems = split(*elem, ":");
result.insert(
std::pair<std::string, int>(tmp_elems[0], atoi(tmp_elems[1].c_str())));
}
return result;
}
/* Constructor */
XapianSearcher::XapianSearcher(const string& xapianDirectoryPath,
Reader* reader)
: reader(reader)
{
this->openIndex(xapianDirectoryPath);
}
/* Open Xapian readable database */
void XapianSearcher::openIndex(const string& directoryPath)
{
this->readableDatabase = Xapian::Database(directoryPath);
this->valuesmap
= read_valuesmap(this->readableDatabase.get_metadata("valuesmap"));
this->language = this->readableDatabase.get_metadata("language");
this->stopwords = this->readableDatabase.get_metadata("stopwords");
setup_queryParser();
}
/* Close Xapian writable database */
void XapianSearcher::closeIndex()
{
return;
}
void XapianSearcher::setup_queryParser()
{
queryParser.set_database(readableDatabase);
if (!language.empty()) {
/* Build ICU Local object to retrieve ISO-639 language code (from
ISO-639-3) */
icu::Locale languageLocale(language.c_str());
/* Configuring language base steemming */
try {
stemmer = Xapian::Stem(languageLocale.getLanguage());
queryParser.set_stemmer(stemmer);
queryParser.set_stemming_strategy(Xapian::QueryParser::STEM_ALL);
} catch (...) {
std::cout << "No steemming for language '" << languageLocale.getLanguage()
<< "'" << std::endl;
}
}
if (!stopwords.empty()) {
std::string stopWord;
std::istringstream file(this->stopwords);
while (std::getline(file, stopWord, '\n')) {
this->stopper.add(stopWord);
}
queryParser.set_stopper(&(this->stopper));
}
}
/* Search strings in the database */
void XapianSearcher::searchInIndex(string& search,
const unsigned int resultStart,
const unsigned int resultEnd,
const bool verbose)
{
/* Create the query */
Xapian::Query query = queryParser.parse_query(search);
/* Create the enquire object */
Xapian::Enquire enquire(this->readableDatabase);
enquire.set_query(query);
/* Get the results */
this->results = enquire.get_mset(resultStart, resultEnd - resultStart);
this->current_result = this->results.begin();
}
/* Get next result */
Result* XapianSearcher::getNextResult()
{
if (this->current_result != this->results.end()) {
XapianResult* result = new XapianResult(this, this->current_result);
this->current_result++;
return result;
}
return NULL;
}
void XapianSearcher::restart_search()
{
this->current_result = this->results.begin();
}
XapianResult::XapianResult(XapianSearcher* searcher,
Xapian::MSetIterator& iterator)
: searcher(searcher), iterator(iterator), document(iterator.get_document())
{
}
std::string XapianResult::get_url()
{
return document.get_data();
}
std::string XapianResult::get_title()
{
if (searcher->valuesmap.empty()) {
/* This is the old legacy version. Guess and try */
return document.get_value(0);
} else if (searcher->valuesmap.find("title") != searcher->valuesmap.end()) {
return document.get_value(searcher->valuesmap["title"]);
}
return "";
}
int XapianResult::get_score()
{
return iterator.get_percent();
}
std::string XapianResult::get_snippet()
{
if (searcher->valuesmap.empty()) {
/* This is the old legacy version. Guess and try */
std::string stored_snippet = document.get_value(1);
if (!stored_snippet.empty()) {
return stored_snippet;
}
/* Let's continue here, and see if we can genenate one */
} else if (searcher->valuesmap.find("snippet") != searcher->valuesmap.end()) {
return document.get_value(searcher->valuesmap["snippet"]);
}
/* No reader, no snippet */
if (!searcher->reader) {
return "";
}
/* Get the content of the article to generate a snippet.
We parse it and use the html dump to avoid remove html tags in the
content and be able to nicely cut the text at random place. */
MyHtmlParser htmlParser;
std::string content = get_content();
if (content.empty()) {
return content;
}
try {
htmlParser.parse_html(content, "UTF-8", true);
} catch (...) {
}
return searcher->results.snippet(htmlParser.dump, 500);
}
std::string XapianResult::get_content()
{
if (!searcher->reader) {
return "";
}
auto entry = searcher->reader->getEntryFromEncodedPath(get_url());
return entry.getContent();
}
int XapianResult::get_size()
{
if (searcher->valuesmap.empty()) {
/* This is the old legacy version. Guess and try */
return document.get_value(2).empty() == true
? -1
: atoi(document.get_value(2).c_str());
} else if (searcher->valuesmap.find("size") != searcher->valuesmap.end()) {
return atoi(document.get_value(searcher->valuesmap["size"]).c_str());
}
/* The size is never used. Do we really want to get the content and
calculate the size ? */
return -1;
}
int XapianResult::get_wordCount()
{
if (searcher->valuesmap.empty()) {
/* This is the old legacy version. Guess and try */
return document.get_value(3).empty() == true
? -1
: atoi(document.get_value(3).c_str());
} else if (searcher->valuesmap.find("wordcount")
!= searcher->valuesmap.end()) {
return atoi(document.get_value(searcher->valuesmap["wordcount"]).c_str());
}
return -1;
}
} // Kiwix namespace

264
src/xmlrpc.h Normal file
View File

@@ -0,0 +1,264 @@
#ifndef KIWIX_XMLRPC_H_
#define KIWIX_XMLRPC_H_
#include <tools/otherTools.h>
namespace kiwix {
class InvalidRPCNode : public std::runtime_error {
public:
InvalidRPCNode(const std::string& msg) : std::runtime_error(msg) {};
};
class Struct;
class Array;
class Value {
pugi::xml_node m_value;
public:
Value(pugi::xml_node value) : m_value(value) { }
void set(int value) {
if (!m_value.child("int"))
m_value.append_child("int");
m_value.child("int").text().set(value);
};
int getAsI() const {
if (!m_value.child("int"))
throw InvalidRPCNode("Type Error");
return m_value.child("int").text().as_int();
}
void set(bool value) {
if (!m_value.child("boolean"))
m_value.append_child("boolean");
m_value.child("boolean").text().set(value);
};
int getAsB() const {
if (!m_value.child("boolean"))
throw InvalidRPCNode("Type Error");
return m_value.child("boolean").text().as_bool();
}
void set(const std::string& value) {
if (!m_value.child("string"))
m_value.append_child("string");
m_value.child("string").text().set(value.c_str());
};
std::string getAsS() const {
if (!m_value.child("string"))
throw InvalidRPCNode("Type Error");
return m_value.child("string").text().as_string();
}
void set(double value) {
if (!m_value.child("double"))
m_value.append_child("double");
m_value.child("double").text().set(value);
};
double getAsD() const {
if (!m_value.child("double"))
throw InvalidRPCNode("Type Error");
return m_value.child("double").text().as_double();
}
inline Struct getStruct();
inline Array getArray();
};
class Array {
pugi::xml_node m_array;
public:
Array(pugi::xml_node array) : m_array(array) {
if (!m_array.child("data"))
m_array.append_child("data");
}
Value addValue() {
auto value = m_array.child("data").append_child("value");
return Value(value);
}
Value getValue(int index) const {
auto value = m_array.child("data").child("value");
while(index && value) {
value = value.next_sibling();
index--;
}
if (0==index) {
return Value(value);
} else {
throw InvalidRPCNode("Index error");
}
}
};
class Member {
pugi::xml_node m_member;
public:
Member(pugi::xml_node member) : m_member(member) { }
Value getValue() const {
return Value(m_member.child("value"));
};
};
class Struct {
pugi::xml_node m_struct;
public:
Struct(pugi::xml_node _struct) : m_struct(_struct) { }
Member getMember(const std::string& name) const {
for(auto member=m_struct.first_child(); member; member=member.next_sibling()) {
std::string member_name = member.child("name").text().get();
if (member_name == name) {
return Member(member);
}
}
throw InvalidRPCNode("Key Error");
}
Member addMember(const std::string& name) {
auto member = m_struct.append_child("member");
member.append_child("name").text().set(name.c_str());
member.append_child("value");
return Member(member);
}
};
class Fault : public Struct {
public:
Fault(pugi::xml_node fault) : Struct(fault) {};
int getFaultCode() const {
return getMember("faultCode").getValue().getAsI();
}
std::string getFaultString() const {
return getMember("faultString").getValue().getAsS();
}
};
Struct Value::getStruct() {
if (!m_value.child("struct"))
m_value.append_child("struct");
return Struct(m_value.child("struct"));
}
Array Value::getArray() {
if (!m_value.child("array"))
m_value.append_child("array");
return Array(m_value.child("array"));
}
class Param {
pugi::xml_node m_param;
public:
Param(pugi::xml_node param) : m_param(param) {
if (!m_param.child("value"))
m_param.append_child("value");
};
Value getValue() const {
return Value(m_param.child("value"));
};
};
class Params {
pugi::xml_node m_params;
public:
Params(pugi::xml_node params) : m_params(params) {};
Param addParam() {
auto param = m_params.append_child("param");
return Param(param);
}
Param getParam(int index) const {
auto param = m_params.child("param");
while(index && param) {
param = param.next_sibling();
index--;
}
if (0==index) {
return Param(param);
} else {
throw InvalidRPCNode("Index Error");
}
}
};
class MethodCall {
pugi::xml_document m_doc;
public:
MethodCall(const std::string& methodName, const std::string& secret) {
auto mCall = m_doc.append_child("methodCall");
mCall.append_child("methodName").text().set(methodName.c_str());
mCall.append_child("params");
if (!secret.empty()) {
getParams().addParam().getValue().set(secret);
}
}
Params getParams() const {
return Params(m_doc.child("methodCall").child("params"));
}
Value newParamValue() {
return getParams().addParam().getValue();
}
std::string toString() const {
return nodeToString(m_doc);
}
};
class MethodResponse {
pugi::xml_document m_doc;
public:
MethodResponse(const std::string& content) {
m_doc.load_buffer(content.c_str(), content.size());
}
Params getParams() const {
auto params = m_doc.child("methodResponse").child("params");
if (!params)
throw InvalidRPCNode("No params");
return Params(params);
}
Value getParamValue(int index) const {
return getParams().getParam(index).getValue();
}
bool isFault() const {
return (!!m_doc.child("methodResponse").child("fault"));
}
Fault getFault() const {
auto fault = m_doc.child("methodResponse").child("fault");
if (!fault)
throw InvalidRPCNode("No fault");
return Fault(fault.child("value").child("struct"));
}
};
};
#endif // KIWIX_XMLRPC_H_

View File

@@ -1,15 +1,5 @@
ctpp2c = find_program('ctpp2c', required:false)
if ctpp2c.found()
search_result_template = custom_target('result_template',
input: 'results.tmpl',
output: 'results.ct2',
command: [intermediate_ctpp2c, ctpp2c, '@INPUT@', '@OUTPUT@']
)
resources_list = 'resources_list_ctpp2.txt'
lib_resources = custom_target('resources',
lib_resources = custom_target('resources',
input: 'resources_list.txt',
output: ['kiwixlib-resources.cpp', 'kiwixlib-resources.h'],
command:[res_compiler,
@@ -17,8 +7,5 @@ if ctpp2c.found()
'--hfile', '@OUTPUT1@',
'--source_dir', '@OUTDIR@',
'@INPUT@'],
depends: [search_result_template]
)
else
lib_resources = []
endif
depend_files: files('search_result.tmpl')
)

View File

@@ -1 +1 @@
results.ct2
search_result.tmpl

View File

@@ -91,68 +91,67 @@
}
</style>
<title>Search: <TMPL_var searchPattern></title>
<title>Search: {{searchPattern}}</title>
</head>
<body bgcolor="white">
<div class="header">
<TMPL_if results>
{{#hasResult}}
Results
<b>
<TMPL_var resultStart>-<TMPL_var resultEnd>
{{resultStart}}-{{resultEnd}}
</b> of <b>
<TMPL_var count>
{{count}}
</b> for <b>
<TMPL_var searchPattern>
{{searchPattern}}
</b>
<TMPL_else>
No results were found for <b><TMPL_var searchPattern></b>
</TMPL_if>
{{/hasResult}}
{{^hasResult}}
No results were found for <b>{{searchPattern}}</b>
{{/hasResult}}
</div>
<div class="results">
<ul>
<TMPL_foreach results as result>
{{#results}}
<li>
<a href="<TMPL_var protocolPrefix><TMPL_var result.contentId>/<TMPL_var result.url>">
<TMPL_var result.title>
<a href="{{protocolPrefix}}{{resultContentId}}/{{url}}">
{{title}}
</a>
<cite>
<TMPL_if result.snippet>
<TMPL_var result.snippet>...
</TMPL_if>
</cite>
<TMPL_if wordCount>
<div class="informations"><TMPL_var wordCount> words</div>
</TMPL_if>
{{#snippet}}
<cite>{{>snippet}}...</cite>
{{/snippet}}
{{#wordCount}}
<div class="informations">{{wordCount}} words</div>
{{/wordCount}}
</li>
</TMPL_foreach>
{{/results}}
</ul>
</div>
<div class="footer">
<ul>
<TMPL_if (resultLastPageStart>0)>
{{#resultLastPageStart}}
<li>
<a href="<TMPL_var searchProtocolPrefix>pattern=<TMPL_var searchPatternEncoded><TMPL_if contentId>&content=<TMPL_var contentId></TMPL_if>&start=0&end=<TMPL_var resultRange>">
<a href="{{searchProtocolPrefix}}pattern={{searchPatternEncoded}}{{#contentId}}&content={{.}}{{/contentId}}&start=0&end={{resultRange}}">
</a>
</li>
</TMPL_if>
<TMPL_foreach pages as page>
{{/resultLastPageStart}}
{{#pages}}
<li>
<a <TMPL_if page.selected>class="selected"</TMPL_if>
href="<TMPL_var searchProtocolPrefix>pattern=<TMPL_var searchPatternEncoded><TMPL_if contentId>&content=<TMPL_var contentId></TMPL_if>&start=<TMPL_var page.start>&end=<TMPL_var page.end>">
<TMPL_var page.label>
<a {{#selected}}class="selected"{{/selected}}
href="{{searchProtocolPrefix}}pattern={{searchPatternEncoded}}{{#contentId}}&content={{.}}{{/contentId}}&start={{start}}&end={{end}}">
{{label}}
</a>
</li>
</TMPL_foreach>
<TMPL_if (resultLastPageStart>0)>
{{/pages}}
{{#resultLastPageStart}}
<li>
<a href="<TMPL_var searchProtocolPrefix>pattern=<TMPL_var searchPatternEncoded><TMPL_if contentId>&content=<TMPL_var contentId></TMPL_if>&start=<TMPL_var resultLastPageStart>&end=<TMPL_var (resultLastPageStart+resultRange)>">
<a href="{{searchProtocolPrefix}}pattern={{searchPatternEncoded}}{{#contentId}}&content={{.}}{{/contentId}}&start={{resultLastPageStart}}&end={{lastResult}}">
</a>
</li>
</TMPL_if>
{{/resultLastPageStart}}
</ul>
</div>
</body>

View File

@@ -35,6 +35,7 @@ then
else
export PKG_CONFIG_PATH=${INSTALL_DIR}/lib/x86_64-linux-gnu/pkgconfig
fi
meson . build -Dctpp2-install-prefix=${INSTALL_DIR} ${MESON_OPTION}
export CPPFLAGS="-I${INSTALL_DIR}/include"
meson . build ${MESON_OPTION}
cd build
ninja

View File

@@ -9,14 +9,16 @@ ARCHIVE_NAME=deps_${TRAVIS_OS_NAME}_${PLATFORM}_${REPO_NAME}.tar.gz
cd $HOME
if [[ "$TRAVIS_OS_NAME" == "osx" ]]
then
brew update
brew upgrade python3
pip3 install meson==0.43.0
pip3 install meson
wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-mac.zip
unzip ninja-mac.zip ninja
else
pip3 install --user meson==0.43.0
wget https://bootstrap.pypa.io/get-pip.py
python3.5 get-pip.py --user
python3.5 -m pip install --user --upgrade pip
python3.5 -m pip install --user meson
wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
unzip ninja-linux.zip ninja