Compare commits

...

90 Commits
0.1 ... 1.0.2

Author SHA1 Message Date
Matthieu Gautier
00c8f55cc4 Update name of the project in meson.build.
The right name is `kiwix-lib`, not `kiwixlib`.
Also update the version to be able to do a small fix release.
2018-02-01 16:10:54 +01:00
Matthieu Gautier
f0bcb1960b Merge pull request #93 from kiwix/pkg_config_version
Fix version in pkg_config.
2017-10-23 18:14:56 +02:00
Matthieu Gautier
d4f0344d9d Fix version in pkg_config. 2017-10-23 15:20:56 +02:00
Matthieu Gautier
48078c809b Merge pull request #92 from kiwix/new_version
New release 1.0.0
2017-10-23 10:11:38 +02:00
Matthieu Gautier
3134ab6b56 New release 1.0.0 2017-10-20 15:19:59 +02:00
Matthieu Gautier
41e3707f1b Merge pull request #90 from kiwix/fix_resource_script
[resource_compiler] Make the exception public.
2017-10-10 14:17:44 +02:00
Matthieu Gautier
d801ff36f6 [resource_compiler] Make the exception public.
This is useless to raise an exception if the exception in not published
in the header.
2017-10-10 13:55:43 +02:00
Matthieu Gautier
5623fedfd0 Merge pull request #91 from kiwix/static_deps
Build with static argument when building for android.
2017-10-10 13:55:20 +02:00
Matthieu Gautier
25a05cc64a Build with static dependencies when building for android or static. 2017-10-10 10:48:48 +02:00
Matthieu Gautier
192a249d23 Merge pull request #88 from kiwix/legoktm-patch2
Rename compile_resources.py to less generic name
2017-09-26 18:02:42 +02:00
Kunal Mehta
5c118a87a1 Rename compile_resources.py to less generic name 2017-09-26 17:56:55 +02:00
Matthieu Gautier
ba35f097d9 Merge pull request #89 from kiwix/use_sudo
Use sudo to install pip3 packages.
2017-09-26 17:56:27 +02:00
Matthieu Gautier
093e8c0498 Use sudo to install pip3 packages. 2017-09-26 17:50:05 +02:00
Matthieu Gautier
8b90221866 Merge pull request #80 from kiwix/no_search_on_splitted
Claims that multi part zim has no embedded full text index.
2017-08-15 14:33:35 -04:00
Matthieu Gautier
5c2280e7c7 Claims that multi part zim has no embedded full text index.
We cannot search into an embedded fulltext index if the zim is multipart.
Instead of crashing, let's pretend we have no fulltext index.
2017-08-15 14:19:53 -04:00
Matthieu Gautier
ebd3f622ff Merge pull request #81 from kiwix/no_ctpp2
Allow kiwix-lib to compile without ctpp2c.
2017-08-14 11:19:43 -04:00
Chris Li
cf93c8719f Allow kiwix-lib to compile without ctpp2c.
ctpp2c is used to pre-compile the template resource.
However, on OSX, ctpp2c seems to be difficult to compile, as we don't need
ctpp2 at all on OSX/iOS, lets just stop to force the use of ctpp2c.
2017-08-14 10:42:16 -04:00
Matthieu Gautier
a794849993 Merge pull request #79 from kiwix/fix_get_html
Always set the humanReadableName with the readable in kiwix-search.
2017-08-10 09:45:35 -04:00
Matthieu Gautier
1ff1bf6168 Always set the humanReadableName with the readable in kiwix-search.
We always need a humanReadableName associated with a content to search in.
Do not separate the two values (human readable name and zim) in two
different functions.

This way, we avoid miss-use of the Searcher who could lead to segfault.
2017-08-10 09:20:11 -04:00
Matthieu Gautier
b6e51055a3 Merge pull request #75 from kiwix/kelson42-patch-licensing
Fix license header #73
2017-08-07 14:00:56 +01:00
Kelson
d17e94fd9c Fix license header #73 2017-08-07 11:49:48 +02:00
Kelson
44a282fa4c Merge pull request #74 from kiwix/get_content_title
getContent* methods also allow to get the title.
2017-08-02 21:02:27 +02:00
Matthieu Gautier
d3acae1fd2 getContent* methods also allow to get the title.
Add a `title` write argument to `getContent*` methods.
This argument is filled with the title of the content get.

Also update the JNI accordingly.

Related to kiwix/kiwix-android#214
2017-07-26 11:03:32 +02:00
Kelson
cbb1018a02 Merge pull request #72 from kiwix/jni_licensing_cleaning
Jni licensing cleaning
2017-07-22 21:02:12 +02:00
kelson42
1d1dfbf4da Fix JNI licensing #71 2017-07-22 09:29:28 +02:00
kelson42
b163351b2e Merge branch 'master' of https://github.com/kiwix/kiwix-lib 2017-07-19 22:04:45 +02:00
Kelson
e531c353a6 Merge pull request #69 from kiwix/new_authors
New authors
2017-07-19 22:04:22 +02:00
kelson42
c363933bf4 Create AUTHORS file #48 2017-07-19 21:56:20 +02:00
kelson42
5d46f28926 Create AUTHORS file #48 2017-07-19 21:55:05 +02:00
Kelson
9fa2cfc66b Merge pull request #68 from kiwix/workding_fix
Fix wording problem
2017-07-19 21:49:04 +02:00
kelson42
b6a58d1684 Fix wording problem 2017-07-19 21:39:38 +02:00
kelson42
e3780a2d77 Fix workding problem 2017-07-19 21:38:28 +02:00
Matthieu Gautier
473b62c9b8 Merge pull request #66 from kiwix/multisearch
Multisearch
2017-07-18 16:07:46 +02:00
Matthieu Gautier
bc5f4f5de4 Use right contentId to generate the article url in search template.
As we do multisearch, we must use the associated contentID of the result
to generate the url.
2017-07-18 10:04:40 +02:00
Matthieu Gautier
9cc329dbd2 Support multi-zims search in kiwix-lib.
All the code was already in zimlib.
It is mainly a update of the code using zimlib.

No JNI change for now to not break the API.
2017-07-18 10:04:40 +02:00
Matthieu Gautier
3991e648ed Be able to get the reader index from a search result. 2017-07-17 18:16:11 +02:00
Matthieu Gautier
8d39b0b343 Search result objects now have a get_content method.
This was not necessary when searching in only one zim file as `url` was
enough to get the article (and so the content).

If we want to search in several zim in the same time, we need a way to get
the content directly.
2017-07-17 18:16:11 +02:00
Matthieu Gautier
4a51dd9e00 Fix memory link.
If a `searcher` is already created we must delete it.
If we set the pointer to NULL before, we will never delete it.
2017-07-17 18:16:11 +02:00
Matthieu Gautier
c56e1f0446 Merge pull request #62 from kiwix/suggestion
Suggestions now use xapian database when available.
2017-07-17 17:57:36 +02:00
Matthieu Gautier
d0371cd133 Suggestions now use xapian database when available.
If a embedded fulltext database is present, suggestion will search in it :
 - insensitive case search.
 - search for terms in the middle of the title.
 - xapian will try to complete the last word of the query (as if a '*'
   were added at the end)
2017-07-17 17:17:13 +02:00
Matthieu Gautier
57720ca57b Merge pull request #65 from kiwix/generate_ctpp2_template
Do not crash if no source_dir is given.
2017-07-17 09:59:06 +02:00
Matthieu Gautier
c5b291e1ed Do not crash if no source_dir is given. 2017-07-12 18:35:38 +02:00
Matthieu Gautier
baf254f1aa Merge pull request #64 from kiwix/generate_ctpp2_template
Use ctpp2c to generate template from source instead of use generated one.
2017-07-12 15:50:24 +02:00
Matthieu Gautier
64cc69f6ae Use ctpp2c to generate template from source instead of use generated one.
Fixes #50.
2017-07-12 15:45:44 +02:00
Matthieu Gautier
6da3604df6 Merge pull request #63 from kiwix/remove_unused_tree_h
Removed unused tree.h
2017-07-12 10:19:05 +02:00
Emmanuel Engelhart
89afabc4cd Removed unused tree.h 2017-07-11 20:15:11 +02:00
Matthieu Gautier
80f6d0bf46 Merge pull request #61 from kiwix/code_format
Format all the code using clang-format.
2017-07-11 17:24:19 +02:00
Matthieu Gautier
f76e9d2dbf Format all the code using clang-format.
Add a script `format_code.sh` to easily format the code.
2017-07-05 15:22:34 +02:00
Matthieu Gautier
a205ff00c8 Merge pull request #59 from kiwix/v0.2
Dump the version to 0.2.0
2017-06-27 14:41:27 +02:00
Matthieu Gautier
96f199a327 Dump the version to 0.2.0
Time to make a release.
2017-06-27 14:26:13 +02:00
Matthieu Gautier
0be3aa9d38 Merge pull request #56 from swills/src_reader.cpp_build_fix
Fix type error in build
2017-06-16 15:30:11 +02:00
Steve Wills
4f57e765e5 Fix type error in build
Compilation fails on clang 3.4.1 (and presumably later, tho I haven't tested) with

```
src/reader.cpp:131:59: error: no viable conversion from 'iterator' (aka '__map_iterator<typename __base::iterator>') to 'std::map<std::string, unsigned int>::const_iterator' (aka '__map_const_iterator<typename __base::const_iterator>')
      std::map<std::string, unsigned int>::const_iterator it = counterMap.find("text/html");
                                                          ^    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/v1/map:713:29: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'iterator' (aka '__map_iterator<typename __base::iterator>') to 'const std::__1::__map_const_iterator<std::__1::__tree_const_iterator<std::__1::__value_type<std::__1::basic_string<char>, unsigned int>, std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char>, unsigned int>, void *> *, long> > &' for 1st argument
class _LIBCPP_TYPE_VIS_ONLY __map_const_iterator
                            ^
```

because we are not using the right type for the map iterator. As we are using
C++11, let's use `auto` and make compiler set the right type for us.
2017-06-16 08:47:59 -04:00
Matthieu Gautier
2bcd43af98 Merge pull request #55 from kiwix/fix_android_dirname
Use the cpu name instead of cpu_family as install dir name.
2017-06-14 10:26:28 +02:00
Matthieu Gautier
eb2c750431 Use the android abi instead of cpu_family as install dir name.
Android will look in specific repository to find native libs.
We need to use the `android_abi` name defined in the cross_compilation
file instead of `cpu_family`.
2017-06-14 10:20:18 +02:00
Matthieu Gautier
7132775d67 Merge pull request #54 from kiwix/searchinxapian
Re-add xapian searcher in kiwix-lib.
2017-05-24 18:28:39 +02:00
Matthieu Gautier
c44b2acb56 Re-add xapian searcher in kiwix-lib.
libzim only know how to read embedded full text index in a zim file.
This is nice as we want to embedded the full text index in zim file and
not have separated full text index.

However, we still have some zim+separated index we have to read.
So we have to support the search in separated index for a while.
2017-05-24 16:08:00 +02:00
Matthieu Gautier
0343c23f82 Merge pull request #53 from kiwix/zim_no_countermeta
Check that 'M/Counter' exists before trying to read it.
2017-05-23 17:37:07 +02:00
Matthieu Gautier
7005b65901 Check that 'M/Counter' exists before trying to read it.
Some old zim files may not have a 'Counter` metadata article.
We have to handle this correctly.
2017-05-23 15:26:01 +02:00
Kelson
d360b9143c Merge pull request #51 from kiwix/declare_ctpp2_include_path
Meson 'ctpp2_include_path' has to be declared
2017-05-19 09:10:31 +02:00
Kelson
9963c73150 Meson 'ctpp2_include_path' has to be declared 2017-05-16 16:23:50 +02:00
Matthieu Gautier
41d6f9884c Merge pull request #47 from kiwix/no_ssh_key
Get dependencies from http server, not from ssh.
2017-04-24 17:17:44 +02:00
Matthieu Gautier
8823880348 Get dependencies from http server, not from ssh.
`kiwix-build` now publish intermediate dependencies archives in a
http accessible location.

Let's use this location instead of `scp` the archives.
2017-04-24 17:10:41 +02:00
Matthieu Gautier
ac169558c4 Merge pull request #46 from kiwix/update_android
Update kiwix-lib to new kiwix-android way of building.
2017-04-24 17:01:39 +02:00
Matthieu Gautier
2e43b7e82d Update kiwix-lib to new kiwix-android way of building.
`kiwix-android` is using `kiwix-lib` as an external java application now.
So we need `kiwix-lib` build system to also install application files
(manifest, resources, ..).
2017-04-24 16:37:31 +02:00
Matthieu Gautier
4485cc8d0f Merge pull request #42 from kiwix/search_in_libzim
Search in libzim
2017-04-11 13:26:23 +02:00
Matthieu Gautier
3be4d92c53 Correctly check if we are compiling for linux or not.
In C++11 `linux` is not a reserved word, so compilators do not define it.
A correct way to check if we are compiling for linux is to check for
`__linux__`.
2017-04-10 14:28:25 +02:00
Matthieu Gautier
44a77f5846 Update android jni wrapper to new API. 2017-04-10 14:28:25 +02:00
Matthieu Gautier
9abdc6ce02 Move to c++11.
Zimlib move to c++11 and so, we need a c++11 compiler.
2017-04-10 14:28:25 +02:00
Matthieu Gautier
5ca419bee7 Use the new search API in zimlib.
We do not use xapian anymore. This is all handled by zimlib.
2017-04-10 14:28:25 +02:00
Matthieu Gautier
37f29da63e Beautify a bit the code.
No real change. Just do less code or use higher level API.
2017-04-10 14:28:25 +02:00
Matthieu Gautier
94670847ef Use const when possible in the reader.
Most read operation do not modify the content. So let's use const
as far as possible.
2017-04-10 14:28:25 +02:00
Matthieu Gautier
93b53cc6d0 Merge pull request #43 from kiwix/travisci
Add travis
2017-04-10 14:27:36 +02:00
Matthieu Gautier
cf273a06b4 Add TravisCI.
Now the project is build on every PR using TravisCI.

The project dependencies are get from the archive generated by kiwix-build.
2017-04-10 14:05:58 +02:00
Julian Harty
43e9763091 Merge pull request #41 from kiwix/less_header
Move unicode headers in cpp.
2017-04-06 16:18:02 +02:00
Matthieu Gautier
ef661a2e25 Move unicode headers in cpp.
Unicode headers ends by defining the DONE symbol in a enum.
It can clash with other includes.
(For instance the httpd.h from apache who use `#define DONE -2`).

Both project should not declare such common symbols publicly but we have
to do with them anyway.
2017-04-06 16:17:00 +02:00
Matthieu Gautier
7baa1b9e62 Merge pull request #40 from kiwix/no_indexer
Remove the indexer functionnality from kiwix-lib.
2017-04-06 15:38:43 +02:00
Matthieu Gautier
e28dbe7c7e Remove the indexer functionnality from kiwix-lib.
This is not used anymore.
2017-04-06 15:35:30 +02:00
Matthieu Gautier
2906202056 Merge pull request #39 from kiwix/fix_indexer
Do not use remove readStopWords method.
2017-04-06 13:24:53 +02:00
Matthieu Gautier
ce6c782b66 Do not use remove readStopWords method.
Commit b8d950c removes this symbol.
The indexer is not used anymore and will be soon removed.
So for now, just remove the call to readStopWords until we totally
remove the indexer code.
2017-04-06 13:20:59 +02:00
Matthieu Gautier
9771506985 Merge pull request #35 from kiwix/stem_stop
Let's use stem and stop words information (if) present in the database.
2017-04-04 17:07:48 +02:00
Matthieu Gautier
b8d950c1a0 Use the stop words stored in the database to configure the queryparser.
To properly search in the xapian database, we need to use the same
stop words that the ones used during the indexing.
2017-04-04 17:06:49 +02:00
Matthieu Gautier
998db0eb2b Use the language stored in the database to configure the queryparser.
To properly search in the xapian database, we need a stemmer using the
same language that the one used during the indexing.
2017-04-04 17:06:49 +02:00
Kelson
46fab22a73 Merge pull request #37 from kiwix/fix_android
The `Result` class is not in the `kiwix` namespace. (fix android build)
2017-03-30 07:44:44 +02:00
Matthieu Gautier
72e41082ca The Result class is not in the kiwix namespace.
The commit 83d2725 adapt the jni wrapper to the new search API but try to
use the `Result` class from the `kiwix` namespace but `Result` is not in
the namespace.

A correct fix would be to move `Result` in `kiwix` but it also change the
API for other tools (kiwix-tools). As we will move the search
functionality in `zimlib` it is better to just do this silly fix and
update the API latter when moving the search functionality.
2017-03-29 17:12:23 +02:00
Kelson
c06a041100 Merge pull request #36 from kiwix/no_cpp11
Remove C++11 syntax introduced by commit 9be2abe.
2017-03-28 19:48:30 +02:00
Matthieu Gautier
cecb65e314 Remove C++11 syntax introduced by commit 9be2abe.
The `for( auto elem: elems)` syntax is a C++11 syntax.
We are not using C++11 (even if it would be good idea).
This works on recent compiler (on Fedora 25) but fails on older one
(on Travis).
2017-03-28 17:14:25 +02:00
Matthieu Gautier
62d26c27ff Merge pull request #33 from kiwix/snippets
Snippets
2017-03-28 11:37:45 +02:00
Matthieu Gautier
074c1bcffa Try to generate the snippet if it is not present in the database.
We generate the snippet from the content of the article in the zim so
we need to have a access to the reader.
2017-03-21 16:28:03 +01:00
Matthieu Gautier
9be2abedf3 Check if a valuemaps metadata is available in the database and use it.
This way, we do not make assumption of where the values are stored.
2017-03-21 16:26:03 +01:00
Matthieu Gautier
83d27255cf Do not create all the results at once. Be a bit lazy.
We don't need to generate a vector of result when we do a search.
We better to just keep the handle to the current MSetIterator and
generate the wanted values when needed.
2017-03-21 16:20:17 +01:00
53 changed files with 3267 additions and 5714 deletions

12
.clang-format Normal file
View File

@@ -0,0 +1,12 @@
BasedOnStyle: Google
BinPackArguments: false
BinPackParameters: false
BreakBeforeBinaryOperators: All
BreakBeforeBraces: Linux
DerivePointerAlignment: false
SpacesInContainerLiterals: false
Standard: Cpp11
AllowShortFunctionsOnASingleLine: Inline
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false

13
.travis.yml Normal file
View File

@@ -0,0 +1,13 @@
language: cpp
dist: trusty
sudo: required
cache: ccache
install: travis/install_deps.sh
script: travis/compile.sh
env:
- PLATFORM="native_static"
- PLATFORM="native_dyn"
- PLATFORM="win32_static"
- PLATFORM="win32_dyn"
- PLATFORM="android_arm"
- PLATFORM="android_arm64"

17
AUTHORS Normal file
View File

@@ -0,0 +1,17 @@
Automactic <christopherliqd@gmail.com>
Ayoub DARDORY <ayoubuto@gmail.com>
Cristian Patrasciuc <cristip@google.com>
Dattaz <taz@dattaz.fr>
Elad Keyshawn <elad.keyshawn@gmail.com>
Emmanuel Engelhart <kelson@kiwix.org>
Isaac <mhutti1@gmail.com>
jleow00 <leow.yonghan.jerome@gmail.com>
Julian Harty <julianharty@gmail.com>
Kiran Mathew Koshy <kiranmathewkoshy@gmail.com>
Kunal Mehta <legoktm@member.fsf.org>
Matthieu Gautier <mgautier@kymeria.fr>
Rashiq Ahmad <rashiq.z@gmail.com>
Renaud Gaudin <reg@kiwix.org>
Shivam <ssarodia@gmail.com>
Steve Wills <steve@mouf.net>
Synhershko <synhershko@users.sourceforge.net>

31
ChangeLog Normal file
View File

@@ -0,0 +1,31 @@
kiwix-lib 1.0.0
===============
* Correctly regenerate template resource using cttp2c at compilation time.
* Suggestion use xapian database when available
* Support multi-zim search in kiwix-lib (a search can now search on several
embedded database in zims in the same time)
* Fix some wording
* Fix license issues
* Add out argument to jni getContent* method to get the title of article in
the same time we get the content
* Rename `compile_resources.py` script to `kiwix-compile-resources`
* Use static lib when building for android or in "static mode"
* Make the ResourceNotFound exception public
kiwix-lib 0.2.0
===============
* Generate the snippet from the article content if the snippet is not
directly in the database.
This provide better snippets as they now depending of the query.
* Use the stopwords and the language stored in the fulltext index database to
parse the user query.
* Remove the indexer functionnality.
* Move to C++11 standard.
* Use the fulltext search of the zimlib.
We still have the fulltext search code in kiwix-lib to be able to search in
fulltext index by side of a zim file. (To be remove in the future)
* Few API hanges
* Change a lot of `Reader` methods to const methods.
* Fix some crashes.

36
format_code.sh Executable file
View File

@@ -0,0 +1,36 @@
#!/usr/bin/bash
files=(
"include/library.h"
"include/common/stringTools.h"
"include/common/pathTools.h"
"include/common/otherTools.h"
"include/common/regexTools.h"
"include/common/networkTools.h"
"include/manager.h"
"include/reader.h"
"include/kiwix.h"
"include/xapianSearcher.h"
"include/searcher.h"
"src/library.cpp"
"src/android/kiwix.cpp"
"src/android/org/kiwix/kiwixlib/JNIKiwixBool.java"
"src/android/org/kiwix/kiwixlib/JNIKiwix.java"
"src/android/org/kiwix/kiwixlib/JNIKiwixString.java"
"src/android/org/kiwix/kiwixlib/JNIKiwixInt.java"
"src/searcher.cpp"
"src/common/pathTools.cpp"
"src/common/regexTools.cpp"
"src/common/otherTools.cpp"
"src/common/networkTools.cpp"
"src/common/stringTools.cpp"
"src/xapianSearcher.cpp"
"src/manager.cpp"
"src/reader.cpp"
)
for i in "${files[@]}"
do
echo $i
clang-format -i -style=file $i
done

View File

@@ -24,25 +24,26 @@
#include <winsock2.h>
#include <ws2tcpip.h>
#else
#include <net/if.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <net/if.h>
#include <netdb.h>
#include <unistd.h>
#endif
#include <iostream>
#include <vector>
#include <string>
#include <map>
#include <string>
#include <vector>
namespace kiwix {
std::map<std::string, std::string> getNetworkInterfaces();
std::string getBestPublicIp();
namespace kiwix
{
std::map<std::string, std::string> getNetworkInterfaces();
std::string getBestPublicIp();
}
#endif

View File

@@ -26,8 +26,9 @@
#include <unistd.h>
#endif
namespace kiwix {
void sleep(unsigned int milliseconds);
namespace kiwix
{
void sleep(unsigned int milliseconds);
}
#endif

View File

@@ -20,18 +20,18 @@
#ifndef KIWIX_PATHTOOLS_H
#define KIWIX_PATHTOOLS_H
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fstream>
#include <ios>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#include <fstream>
#include <string.h>
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <ios>
#include <limits.h>
#ifdef _WIN32
#include <direct.h>
@@ -41,20 +41,21 @@
using namespace std;
bool isRelativePath(const string &path);
bool isRelativePath(const string& path);
string computeAbsolutePath(const string path, const string relativePath);
string computeRelativePath(const string path, const string absolutePath);
string removeLastPathElement(const string path, const bool removePreSeparator = false,
const bool removePostSeparator = false);
string appendToDirectory(const string &directoryPath, const string &filename);
string removeLastPathElement(const string path,
const bool removePreSeparator = false,
const bool removePostSeparator = false);
string appendToDirectory(const string& directoryPath, const string& filename);
unsigned int getFileSize(const string &path);
string getFileSizeAsString(const string &path);
bool fileExists(const string &path);
bool makeDirectory(const string &path);
bool copyFile(const string &sourcePath, const string &destPath);
string getLastPathElement(const string &path);
unsigned int getFileSize(const string& path);
string getFileSizeAsString(const string& path);
bool fileExists(const string& path);
bool makeDirectory(const string& path);
bool copyFile(const string& sourcePath, const string& destPath);
string getLastPathElement(const string& path);
string getExecutablePath();
string getCurrentDirectory();
bool writeTextFile(const string &path, const string &content);
bool writeTextFile(const string& path, const string& content);
#endif

View File

@@ -22,11 +22,15 @@
#include <unicode/regex.h>
#include <unicode/ucnv.h>
#include <string>
#include <map>
#include <string>
bool matchRegex(const std::string &content, const std::string &regex);
std::string replaceRegex(const std::string &content, const std::string &replacement, const std::string &regex);
std::string appendToFirstOccurence(const std::string &content, const std::string regex, const std::string &replacement);
bool matchRegex(const std::string& content, const std::string& regex);
std::string replaceRegex(const std::string& content,
const std::string& replacement,
const std::string& regex);
std::string appendToFirstOccurence(const std::string& content,
const std::string regex,
const std::string& replacement);
#endif

View File

@@ -20,52 +20,48 @@
#ifndef KIWIX_STRINGTOOLS_H
#define KIWIX_STRINGTOOLS_H
#include <unicode/translit.h>
#include <unicode/normlzr.h>
#include <unicode/unistr.h>
#include <unicode/rep.h>
#include <unicode/uniset.h>
#include <unicode/ustring.h>
#include <unicode/ucnv.h>
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include "pathTools.h"
namespace kiwix {
namespace kiwix
{
#ifndef __ANDROID__
std::string beautifyInteger(const unsigned int number);
std::string beautifyFileSize(const unsigned int number);
std::string urlEncode(const std::string &c);
void printStringInHexadecimal(const char *s);
void printStringInHexadecimal(UnicodeString s);
void stringReplacement(std::string& str, const std::string& oldStr, const std::string& newStr);
std::string encodeDiples(const std::string& str);
std::string beautifyInteger(const unsigned int number);
std::string beautifyFileSize(const unsigned int number);
std::string urlEncode(const std::string& c);
void printStringInHexadecimal(const char* s);
void printStringInHexadecimal(UnicodeString s);
void stringReplacement(std::string& str,
const std::string& oldStr,
const std::string& newStr);
std::string encodeDiples(const std::string& str);
#endif
std::string removeAccents(const std::string &text);
void loadICUExternalTables();
std::string urlDecode(const std::string &c);
std::string removeAccents(const std::string& text);
void loadICUExternalTables();
std::string urlDecode(const std::string& c);
std::vector<std::string> split(const std::string&, const std::string&);
std::vector<std::string> split(const char*, const char*);
std::vector<std::string> split(const std::string&, const char*);
std::vector<std::string> split(const char*, const std::string&);
std::vector<std::string> split(const std::string&, const std::string&);
std::vector<std::string> split(const char*, const char*);
std::vector<std::string> split(const std::string&, const char*);
std::vector<std::string> split(const char*, const std::string&);
std::string ucAll(const std::string &word);
std::string lcAll(const std::string &word);
std::string ucFirst(const std::string &word);
std::string lcFirst(const std::string &word);
std::string toTitle(const std::string &word);
std::string ucAll(const std::string& word);
std::string lcAll(const std::string& word);
std::string ucFirst(const std::string& word);
std::string lcFirst(const std::string& word);
std::string toTitle(const std::string& word);
std::string normalize(const std::string &word);
std::string normalize(const std::string& word);
}
#endif

View File

File diff suppressed because it is too large Load Diff

View File

@@ -1,173 +0,0 @@
/*
* Copyright 2014 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_INDEXER_H
#define KIWIX_INDEXER_H
#include <string>
#include <vector>
#include <stack>
#include <queue>
#include <fstream>
#include <iostream>
#include <sstream>
#include <pthread.h>
#include "common/stringTools.h"
#include "common/otherTools.h"
#include <zim/file.h>
#include <zim/article.h>
#include <zim/fileiterator.h>
#include "reader.h"
using namespace std;
namespace kiwix {
struct indexerToken {
string url;
string accentedTitle;
string title;
string keywords;
string content;
string snippet;
string size;
string wordCount;
};
class Indexer {
typedef void (* ProgressCallback)(const unsigned int processedArticleCount, const unsigned int totalArticleCount);
public:
Indexer();
virtual ~Indexer();
bool start(const string zimPath, const string indexPath, ProgressCallback callback = NULL);
bool stop();
bool isRunning();
unsigned int getProgression();
void setVerboseFlag(const bool value);
protected:
virtual void indexingPrelude(const string indexPath) = 0;
virtual void index(const string &url,
const string &title,
const string &unaccentedTitle,
const string &keywords,
const string &content,
const string &snippet,
const string &size,
const string &wordCount) = 0;
virtual void flush() = 0;
virtual void indexingPostlude(const string indexPath) = 0;
/* Stop words */
std::vector<std::string> stopWords;
void readStopWords(const string languageCode);
/* Others */
unsigned int countWords(const string &text);
/* Boost factor */
unsigned int keywordsBoostFactor;
inline unsigned int getTitleBoostFactor(const unsigned int contentLength) {
return contentLength / 500 + 1;
}
/* Verbose */
pthread_mutex_t verboseMutex;
bool getVerboseFlag();
bool verboseFlag;
private:
ProgressCallback progressCallback;
pthread_mutex_t threadIdsMutex;
/* Article extraction */
pthread_t articleExtractor;
pthread_mutex_t articleExtractorRunningMutex;
static void *extractArticles(void *ptr);
bool articleExtractorRunningFlag;
bool isArticleExtractorRunning();
void articleExtractorRunning(bool value);
/* Article parsing */
pthread_t articleParser;
pthread_mutex_t articleParserRunningMutex;
static void *parseArticles(void *ptr);
bool articleParserRunningFlag;
bool isArticleParserRunning();
void articleParserRunning(bool value);
/* Index writting */
pthread_t articleIndexer;
pthread_mutex_t articleIndexerRunningMutex;
static void *indexArticles(void *ptr);
bool articleIndexerRunningFlag;
bool isArticleIndexerRunning();
void articleIndexerRunning(bool value);
/* To parse queue */
std::queue<indexerToken> toParseQueue;
pthread_mutex_t toParseQueueMutex;
void pushToParseQueue(indexerToken &token);
bool popFromToParseQueue(indexerToken &token);
bool isToParseQueueEmpty();
/* To index queue */
std::queue<indexerToken> toIndexQueue;
pthread_mutex_t toIndexQueueMutex;
void pushToIndexQueue(indexerToken &token);
bool popFromToIndexQueue(indexerToken &token);
bool isToIndexQueueEmpty();
/* Article Count & Progression */
unsigned int articleCount;
pthread_mutex_t articleCountMutex;
void setArticleCount(const unsigned int articleCount);
unsigned int getArticleCount();
/* Progression */
unsigned int progression;
pthread_mutex_t progressionMutex;
void setProgression(const unsigned int progression);
/* getProgression() is public */
/* ZIM path */
pthread_mutex_t zimPathMutex;
string zimPath;
void setZimPath(const string path);
string getZimPath();
/* Index path */
pthread_mutex_t indexPathMutex;
string indexPath;
void setIndexPath(const string path);
string getIndexPath();
/* ZIM id */
pthread_mutex_t zimIdMutex;
string zimId;
void setZimId(const string id);
string getZimId();
};
}
#endif

View File

@@ -22,5 +22,4 @@
#include "library.h"
#endif

View File

@@ -22,86 +22,85 @@
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <string.h>
#include <vector>
#include <stack>
#include <string>
#include <vector>
#include "common/stringTools.h"
#include "common/regexTools.h"
#include "common/stringTools.h"
#define KIWIX_LIBRARY_VERSION "20110515"
using namespace std;
namespace kiwix {
namespace kiwix
{
enum supportedIndexType { UNKNOWN, XAPIAN };
enum supportedIndexType { UNKNOWN, XAPIAN };
class Book
{
public:
Book();
~Book();
class Book {
static bool sortByLastOpen(const Book& a, const Book& b);
static bool sortByTitle(const Book& a, const Book& b);
static bool sortBySize(const Book& a, const Book& b);
static bool sortByDate(const Book& a, const Book& b);
static bool sortByCreator(const Book& a, const Book& b);
static bool sortByPublisher(const Book& a, const Book& b);
static bool sortByLanguage(const Book& a, const Book& b);
string getHumanReadableIdFromPath();
public:
Book();
~Book();
string id;
string path;
string pathAbsolute;
string last;
string indexPath;
string indexPathAbsolute;
supportedIndexType indexType;
string title;
string description;
string language;
string creator;
string publisher;
string date;
string url;
string name;
string tags;
string origId;
string articleCount;
string mediaCount;
bool readOnly;
string size;
string favicon;
string faviconMimeType;
};
static bool sortByLastOpen(const Book &a, const Book &b);
static bool sortByTitle(const Book &a, const Book &b);
static bool sortBySize(const Book &a, const Book &b);
static bool sortByDate(const Book &a, const Book &b);
static bool sortByCreator(const Book &a, const Book &b);
static bool sortByPublisher(const Book &a, const Book &b);
static bool sortByLanguage(const Book &a, const Book &b);
string getHumanReadableIdFromPath();
class Library
{
public:
Library();
~Library();
string id;
string path;
string pathAbsolute;
string last;
string indexPath;
string indexPathAbsolute;
supportedIndexType indexType;
string title;
string description;
string language;
string creator;
string publisher;
string date;
string url;
string name;
string tags;
string origId;
string articleCount;
string mediaCount;
bool readOnly;
string size;
string favicon;
string faviconMimeType;
};
class Library {
public:
Library();
~Library();
string version;
bool addBook(const Book &book);
bool removeBookByIndex(const unsigned int bookIndex);
vector <kiwix::Book> books;
/*
* 'current' is the variable storing the current content/book id
* in the library. This is used to be able to load per default a
* content. As Kiwix may work with many library XML files, you may
* have "current" defined many time with different values. The
* last XML file read has the priority, Although we do not have an
* library object for each file, we want to be able to fallback to
* an 'old' current book if the one which should be load
* failed. That is the reason why we need a stack here
*/
stack<string> current;
};
string version;
bool addBook(const Book& book);
bool removeBookByIndex(const unsigned int bookIndex);
vector<kiwix::Book> books;
/*
* 'current' is the variable storing the current content/book id
* in the library. This is used to be able to load per default a
* content. As Kiwix may work with many library XML files, you may
* have "current" defined many time with different values. The
* last XML file read has the priority, Although we do not have an
* library object for each file, we want to be able to fallback to
* an 'old' current book if the one which should be load
* failed. That is the reason why we need a stack here
*/
stack<string> current;
};
}
#endif

View File

@@ -20,73 +20,89 @@
#ifndef KIWIX_MANAGER_H
#define KIWIX_MANAGER_H
#include <string>
#include <sstream>
#include <time.h>
#include <sstream>
#include <string>
#include <pugixml.hpp>
#include "common/base64.h"
#include "common/regexTools.h"
#include "common/pathTools.h"
#include "common/regexTools.h"
#include "library.h"
#include "reader.h"
using namespace std;
namespace kiwix {
namespace kiwix
{
enum supportedListMode { LASTOPEN, REMOTE, LOCAL };
enum supportedListSortBy { TITLE, SIZE, DATE, CREATOR, PUBLISHER };
enum supportedListMode { LASTOPEN, REMOTE, LOCAL };
enum supportedListSortBy { TITLE, SIZE, DATE, CREATOR, PUBLISHER };
class Manager
{
public:
Manager();
~Manager();
class Manager {
bool readFile(const string path, const bool readOnly = true);
bool readFile(const string nativePath,
const string UTF8Path,
const bool readOnly = true);
bool readXml(const string xml,
const bool readOnly = true,
const string libraryPath = "");
bool writeFile(const string path);
bool removeBookByIndex(const unsigned int bookIndex);
bool removeBookById(const string id);
bool setCurrentBookId(const string id);
string getCurrentBookId();
bool setBookIndex(const string id,
const string path,
const supportedIndexType type);
bool setBookIndex(const string id, const string path);
bool setBookPath(const string id, const string path);
string addBookFromPathAndGetId(const string pathToOpen,
const string pathToSave = "",
const string url = "",
const bool checkMetaData = false);
bool addBookFromPath(const string pathToOpen,
const string pathToSave = "",
const string url = "",
const bool checkMetaData = false);
Library cloneLibrary();
bool getBookById(const string id, Book& book);
bool getCurrentBook(Book& book);
unsigned int getBookCount(const bool localBooks, const bool remoteBooks);
bool updateBookLastOpenDateById(const string id);
void removeBookPaths();
bool listBooks(const supportedListMode mode,
const supportedListSortBy sortBy,
const unsigned int maxSize,
const string language,
const string creator,
const string publisher,
const string search);
vector<string> getBooksLanguages();
vector<string> getBooksCreators();
vector<string> getBooksPublishers();
vector<string> getBooksIds();
public:
Manager();
~Manager();
string writableLibraryPath;
bool readFile(const string path, const bool readOnly = true);
bool readFile(const string nativePath, const string UTF8Path, const bool readOnly = true);
bool readXml(const string xml, const bool readOnly = true, const string libraryPath = "");
bool writeFile(const string path);
bool removeBookByIndex(const unsigned int bookIndex);
bool removeBookById(const string id);
bool setCurrentBookId(const string id);
string getCurrentBookId();
bool setBookIndex(const string id, const string path, const supportedIndexType type);
bool setBookIndex(const string id, const string path);
bool setBookPath(const string id, const string path);
string addBookFromPathAndGetId(const string pathToOpen, const string pathToSave = "", const string url = "",
const bool checkMetaData = false);
bool addBookFromPath(const string pathToOpen, const string pathToSave = "", const string url = "",
const bool checkMetaData = false);
Library cloneLibrary();
bool getBookById(const string id, Book &book);
bool getCurrentBook(Book &book);
unsigned int getBookCount(const bool localBooks, const bool remoteBooks);
bool updateBookLastOpenDateById(const string id);
void removeBookPaths();
bool listBooks(const supportedListMode mode, const supportedListSortBy sortBy, const unsigned int maxSize,
const string language, const string creator, const string publisher, const string search);
vector<string> getBooksLanguages();
vector<string> getBooksCreators();
vector<string> getBooksPublishers();
vector<string> getBooksIds();
vector<std::string> bookIdList;
string writableLibraryPath;
protected:
kiwix::Library library;
vector<std::string> bookIdList;
protected:
kiwix::Library library;
bool readBookFromPath(const string path, Book *book = NULL);
bool parseXmlDom(const pugi::xml_document &doc, const bool readOnly, const string libraryPath);
private:
void checkAndCleanBookPaths(Book &book, const string &libraryPath);
};
bool readBookFromPath(const string path, Book* book = NULL);
bool parseXmlDom(const pugi::xml_document& doc,
const bool readOnly,
const string libraryPath);
private:
void checkAndCleanBookPaths(Book& book, const string& libraryPath);
};
}
#endif

View File

@@ -5,12 +5,8 @@ headers = [
'searcher.h'
]
if not get_option('android')
headers += ['indexer.h']
endif
if xapian_dep.found()
headers += ['xapianIndexer.h', 'xapianSearcher.h']
headers += ['xapianSearcher.h']
endif
install_headers(headers, subdir:'kiwix')
@@ -22,7 +18,6 @@ install_headers(
'common/pathTools.h',
'common/regexTools.h',
'common/stringTools.h',
'common/tree.h',
subdir:'kiwix/common'
)

View File

@@ -20,85 +20,110 @@
#ifndef KIWIX_READER_H
#define KIWIX_READER_H
#include <zim/zim.h>
#include <zim/file.h>
#include <zim/article.h>
#include <zim/fileiterator.h>
#include <stdio.h>
#include <string>
#include <zim/article.h>
#include <zim/file.h>
#include <zim/fileiterator.h>
#include <zim/zim.h>
#include <exception>
#include <sstream>
#include <map>
#include <sstream>
#include <string>
#include "common/pathTools.h"
#include "common/stringTools.h"
using namespace std;
namespace kiwix {
namespace kiwix
{
class Reader
{
public:
Reader(const string zimFilePath);
~Reader();
class Reader {
void reset();
unsigned int getArticleCount() const;
unsigned int getMediaCount() const;
unsigned int getGlobalCount() const;
string getZimFilePath() const;
string getId() const;
string getRandomPageUrl() const;
string getFirstPageUrl() const;
string getMainPageUrl() const;
bool getMetatag(const string& url, string& content) const;
string getTitle() const;
string getDescription() const;
string getLanguage() const;
string getName() const;
string getTags() const;
string getDate() const;
string getCreator() const;
string getPublisher() const;
string getOrigId() const;
bool getFavicon(string& content, string& mimeType) const;
bool getPageUrlFromTitle(const string& title, string& url) const;
bool getMimeTypeByUrl(const string& url, string& mimeType) const;
bool getContentByUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType) const;
bool getContentByEncodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType,
string& baseUrl) const;
bool getContentByEncodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType) const;
bool getContentByDecodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType,
string& baseUrl) const;
bool getContentByDecodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType) const;
bool searchSuggestions(const string& prefix,
unsigned int suggestionsCount,
const bool reset = true);
bool searchSuggestionsSmart(const string& prefix,
unsigned int suggestionsCount);
bool urlExists(const string& url) const;
bool hasFulltextIndex() const;
std::vector<std::string> getTitleVariants(const std::string& title) const;
bool getNextSuggestion(string& title);
bool getNextSuggestion(string& title, string& url);
bool canCheckIntegrity() const;
bool isCorrupted() const;
bool parseUrl(const string& url, char* ns, string& title) const;
unsigned int getFileSize() const;
zim::File* getZimFileHandler() const;
bool getArticleObjectByDecodedUrl(const string& url,
zim::Article& article) const;
public:
Reader(const string zimFilePath);
~Reader();
protected:
zim::File* zimFileHandler;
zim::size_type firstArticleOffset;
zim::size_type lastArticleOffset;
zim::size_type currentArticleOffset;
zim::size_type nsACount;
zim::size_type nsICount;
std::string zimFilePath;
void reset();
unsigned int getArticleCount();
unsigned int getMediaCount();
unsigned int getGlobalCount();
string getZimFilePath();
string getId();
string getRandomPageUrl();
string getFirstPageUrl();
string getMainPageUrl();
bool getMetatag(const string &url, string &content);
string getTitle();
string getDescription();
string getLanguage();
string getName();
string getTags();
string getDate();
string getCreator();
string getPublisher();
string getOrigId();
bool getFavicon(string &content, string &mimeType);
bool getPageUrlFromTitle(const string &title, string &url);
bool getMimeTypeByUrl(const string &url, string &mimeType);
bool getContentByUrl(const string &url, string &content, unsigned int &contentLength, string &contentType);
bool getContentByEncodedUrl(const string &url, string &content, unsigned int &contentLength, string &contentType, string &baseUrl);
bool getContentByEncodedUrl(const string &url, string &content, unsigned int &contentLength, string &contentType);
bool getContentByDecodedUrl(const string &url, string &content, unsigned int &contentLength, string &contentType, string &baseUrl);
bool getContentByDecodedUrl(const string &url, string &content, unsigned int &contentLength, string &contentType);
bool searchSuggestions(const string &prefix, unsigned int suggestionsCount, const bool reset = true);
bool searchSuggestionsSmart(const string &prefix, unsigned int suggestionsCount);
bool urlExists(const string &url);
bool hasFulltextIndex();
std::vector<std::string> getTitleVariants(const std::string &title);
bool getNextSuggestion(string &title);
bool getNextSuggestion(string &title, string &url);
bool canCheckIntegrity();
bool isCorrupted();
bool parseUrl(const string &url, char *ns, string &title);
unsigned int getFileSize();
zim::File* getZimFileHandler();
bool getArticleObjectByDecodedUrl(const string &url, zim::Article &article);
protected:
zim::File* zimFileHandler;
zim::size_type firstArticleOffset;
zim::size_type lastArticleOffset;
zim::size_type currentArticleOffset;
zim::size_type nsACount;
zim::size_type nsICount;
std::string zimFilePath;
std::vector< std::vector<std::string> > suggestions;
std::vector< std::vector<std::string> >::iterator suggestionsOffset;
private:
std::map<std::string, unsigned int> parseCounterMetadata();
};
std::vector<std::vector<std::string>> suggestions;
std::vector<std::vector<std::string>>::iterator suggestionsOffset;
private:
std::map<const std::string, unsigned int> parseCounterMetadata() const;
};
}
#endif

View File

@@ -22,69 +22,83 @@
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <unicode/putil.h>
#include <algorithm>
#include <vector>
#include <locale>
#include <cctype>
#include <locale>
#include <string>
#include <vector>
#include <vector>
#include "common/pathTools.h"
#include "common/stringTools.h"
#include <unicode/putil.h>
#include "kiwix_config.h"
using namespace std;
struct Result
namespace kiwix
{
string url;
string title;
int score;
string snippet;
int wordCount;
int size;
class Reader;
class Result
{
public:
virtual ~Result(){};
virtual std::string get_url() = 0;
virtual std::string get_title() = 0;
virtual int get_score() = 0;
virtual std::string get_snippet() = 0;
virtual std::string get_content() = 0;
virtual int get_wordCount() = 0;
virtual int get_size() = 0;
virtual int get_readerIndex() = 0;
};
namespace kiwix {
struct SearcherInternal;
class Searcher
{
public:
Searcher();
Searcher(const string& xapianDirectoryPath,
Reader* reader,
const string& humanReadableName);
~Searcher();
class Searcher {
public:
Searcher();
virtual ~Searcher();
void search(std::string &search, unsigned int resultStart,
unsigned int resultEnd, const bool verbose=false);
bool getNextResult(string &url, string &title, unsigned int &score);
unsigned int getEstimatedResultCount();
bool setProtocolPrefix(const std::string prefix);
bool setSearchProtocolPrefix(const std::string prefix);
void reset();
void setContentHumanReadableId(const string &contentHumanReadableId);
void add_reader(Reader* reader, const std::string& humanReaderName);
void search(std::string& search,
unsigned int resultStart,
unsigned int resultEnd,
const bool verbose = false);
void suggestions(std::string& search, const bool verbose = false);
Result* getNextResult();
void restart_search();
unsigned int getEstimatedResultCount();
bool setProtocolPrefix(const std::string prefix);
bool setSearchProtocolPrefix(const std::string prefix);
void reset();
#ifdef ENABLE_CTPP2
string getHtml();
string getHtml();
#endif
protected:
std::string beautifyInteger(const unsigned int number);
virtual void closeIndex() = 0;
virtual void searchInIndex(string &search, const unsigned int resultStart,
const unsigned int resultEnd, const bool verbose=false) = 0;
std::vector<Result> results;
std::vector<Result>::iterator resultOffset;
std::string searchPattern;
std::string protocolPrefix;
std::string searchProtocolPrefix;
std::string template_ct2;
unsigned int resultCountPerPage;
unsigned int estimatedResultCount;
unsigned int resultStart;
unsigned int resultEnd;
std::string contentHumanReadableId;
};
protected:
std::string beautifyInteger(const unsigned int number);
void closeIndex();
void searchInIndex(string& search,
const unsigned int resultStart,
const unsigned int resultEnd,
const bool verbose = false);
std::vector<Reader*> readers;
std::vector<std::string> humanReaderNames;
SearcherInternal* internal;
std::string searchPattern;
std::string protocolPrefix;
std::string searchProtocolPrefix;
unsigned int resultCountPerPage;
unsigned int estimatedResultCount;
unsigned int resultStart;
unsigned int resultEnd;
std::string contentHumanReadableId;
};
}
#endif

View File

@@ -1,56 +0,0 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_XAPIAN_INDEXER_H
#define KIWIX_XAPIAN_INDEXER_H
#include <xapian.h>
#include "indexer.h"
using namespace std;
namespace kiwix {
class XapianIndexer : public Indexer {
public:
XapianIndexer();
protected:
void indexingPrelude(const string indexPath);
void index(const string &url,
const string &title,
const string &unaccentedTitle,
const string &keywords,
const string &content,
const string &snippet,
const string &size,
const string &wordCount);
void flush();
void indexingPostlude(const string indexPath);
Xapian::WritableDatabase writableDatabase;
Xapian::Stem stemmer;
Xapian::SimpleStopper stopper;
Xapian::TermGenerator indexer;
};
}
#endif

View File

@@ -21,34 +21,78 @@
#define KIWIX_XAPIAN_SEARCHER_H
#include <xapian.h>
#include "reader.h"
#include "searcher.h"
#include <map>
#include <string>
using namespace std;
namespace kiwix {
namespace kiwix
{
class XapianSearcher;
class NoXapianIndexInZim: public exception {
virtual const char* what() const throw() {
return "There is no fulltext index in the zim file";
}
};
class XapianResult : public Result
{
public:
XapianResult(XapianSearcher* searcher, Xapian::MSetIterator& iterator);
virtual ~XapianResult(){};
class XapianSearcher : public Searcher {
public:
XapianSearcher(const string &xapianDirectoryPath);
virtual ~XapianSearcher() {};
void searchInIndex(string &search, const unsigned int resultStart, const unsigned int resultEnd,
const bool verbose=false);
virtual std::string get_url();
virtual std::string get_title();
virtual int get_score();
virtual std::string get_snippet();
virtual std::string get_content();
virtual int get_wordCount();
virtual int get_size();
virtual int get_readerIndex() { return 0; };
protected:
void closeIndex();
void openIndex(const string &xapianDirectoryPath);
private:
XapianSearcher* searcher;
Xapian::MSetIterator iterator;
Xapian::Document document;
};
Xapian::Database readableDatabase;
Xapian::Stem stemmer;
};
class NoXapianIndexInZim : public exception
{
virtual const char* what() const throw()
{
return "There is no fulltext index in the zim file";
}
};
class XapianSearcher
{
friend class XapianResult;
public:
XapianSearcher(const string& xapianDirectoryPath, Reader* reader);
virtual ~XapianSearcher(){};
void searchInIndex(string& search,
const unsigned int resultStart,
const unsigned int resultEnd,
const bool verbose = false);
virtual Result* getNextResult();
void restart_search();
Xapian::MSet results;
protected:
void closeIndex();
void openIndex(const string& xapianDirectoryPath);
void setup_queryParser();
Reader* reader;
Xapian::Database readableDatabase;
std::string language;
std::string stopwords;
Xapian::QueryParser queryParser;
Xapian::Stem stemmer;
Xapian::SimpleStopper stopper;
Xapian::MSetIterator current_result;
std::map<std::string, int> valuesmap;
};
}
#endif

View File

@@ -4,7 +4,7 @@ includedir=${prefix}/include
Name: libkiwix
Description: A library that contains a lot of things used by used by other kiwix programs
Version: 1.0
Version: @version@
Requires: @requires@
Libs: -L${libdir} -lkiwix @extra_libs@
Cflags: -I${includedir}/ @extra_cflags@

View File

@@ -1,16 +1,19 @@
project('kiwixlib', 'cpp',
version : '0.1.0',
license : 'GPL')
project('kiwix-lib', 'cpp',
version : '1.0.2',
license : 'GPL',
default_options : ['c_std=c11', 'cpp_std=c++11'])
compiler = meson.get_compiler('cpp')
find_library_in_compiler = meson.version().version_compare('>=0.31.0')
static_deps = get_option('android') or get_option('default_library') == 'static'
thread_dep = dependency('threads')
libicu_dep = dependency('icu-i18n')
libzim_dep = dependency('libzim')
pugixml_dep = dependency('pugixml')
libicu_dep = dependency('icu-i18n', static:static_deps)
libzim_dep = dependency('libzim', version : '>=3.0.0', static:static_deps)
pugixml_dep = dependency('pugixml', static:static_deps)
ctpp2_include_path = ''
has_ctpp2_dep = false
ctpp2_prefix_install = get_option('ctpp2-install-prefix')
ctpp2_link_args = []
@@ -61,7 +64,7 @@ else
endif
endif
xapian_dep = dependency('xapian-core', required:false)
xapian_dep = dependency('xapian-core', required:false, static:static_deps)
all_deps = [thread_dep, libicu_dep, libzim_dep, xapian_dep, pugixml_dep]
if has_ctpp2_dep
@@ -98,6 +101,7 @@ pkg_conf.set('prefix', get_option('prefix'))
pkg_conf.set('requires', ' '.join(pkg_requires))
pkg_conf.set('extra_libs', ' '.join(extra_libs))
pkg_conf.set('extra_cflags', extra_cflags)
pkg_conf.set('version', meson.project_version())
configure_file(output : 'kiwix.pc',
configuration : pkg_conf,
input : 'kiwix.pc.in',

8
scripts/ctpp2c.sh Executable file
View File

@@ -0,0 +1,8 @@
#!/usr/bin/env bash
ctpp2c=$1
SOURCE=$(pwd)/$2
DEST=$3
$ctpp2c $SOURCE $DEST

View File

@@ -1,5 +1,24 @@
#!/usr/bin/env python3
'''
Copyright 2016 Matthieu Gautier <mgautier@kymeria.fr>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or any
later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA.
'''
import argparse
import os.path
import re
@@ -38,12 +57,21 @@ extern const std::string {identifier};
{namespaces_close}"""
class Resource:
def __init__(self, base_dir, filename):
def __init__(self, base_dirs, filename):
filename = filename.strip()
self.filename = filename
self.identifier = full_identifier(filename)
with open(os.path.join(base_dir, filename), 'rb') as f:
self.data = f.read()
found = False
for base_dir in base_dirs:
try:
with open(os.path.join(base_dir, filename), 'rb') as f:
self.data = f.read()
found = True
break
except FileNotFoundError:
continue
if not found:
raise Exception("Impossible to found {}".format(filename))
def dump_impl(self):
nb_row = len(self.data)//16 + (1 if len(self.data) % 16 else 0)
@@ -78,16 +106,8 @@ master_c_template = """//This file is automaically generated. Do not modify it.
#include <stdlib.h>
#include <fstream>
#include <stdexcept>
#include "{include_file}"
class ResourceNotFound : public std::runtime_error {{
public:
ResourceNotFound(const std::string& what_arg):
std::runtime_error(what_arg)
{{ }};
}};
static std::string init_resource(const char* name, const unsigned char* content, int len)
{{
char * resPath = getenv(name);
@@ -125,11 +145,19 @@ master_h_template = """//This file is automaically generated. Do not modify it.
#define KIWIX_{BASENAME}
#include <string>
#include <stdexcept>
namespace RESOURCE {{
{RESOURCES}
}};
class ResourceNotFound : public std::runtime_error {{
public:
ResourceNotFound(const std::string& what_arg):
std::runtime_error(what_arg)
{{ }};
}};
const std::string& getResource_{basename}(const std::string& name);
#define getResource(a) (getResource_{basename}(a))
@@ -151,13 +179,18 @@ if __name__ == "__main__":
help='The Cpp file name to generate')
parser.add_argument('--hfile',
help='The h file name to generate')
parser.add_argument('--source_dir',
help="Additional directory where to look for resources.",
action='append')
parser.add_argument('resource_file',
help='The list of resources to compile.')
args = parser.parse_args()
base_dir = os.path.dirname(os.path.realpath(args.resource_file))
source_dir = args.source_dir or []
with open(args.resource_file, 'r') as f:
resources = [Resource(base_dir, filename) for filename in f.readlines()]
resources = [Resource([base_dir]+source_dir, filename)
for filename in f.readlines()]
h_identifier = to_identifier(os.path.basename(args.hfile))
with open(args.hfile, 'w') as f:

View File

@@ -1,4 +1,5 @@
res_compiler = find_program('compile_resources.py')
res_compiler = find_program('kiwix-compile-resources')
intermediate_ctpp2c = find_program('ctpp2c.sh')
install_data(res_compiler.path(), install_dir:get_option('bindir'))

View File

@@ -0,0 +1,13 @@
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
package="kiwix.org.kiwixlib"
>
<application android:allowBackup="true"
android:label="@string/app_name"
android:supportsRtl="true"
>
</application>
</manifest>

View File

@@ -1,3 +1,22 @@
/*
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include <jni.h>
#include "org_kiwix_kiwixlib_JNIKiwix.h"
@@ -7,80 +26,87 @@
#include <iostream>
#include <string>
#include "unicode/putil.h"
#include "reader.h"
#include "xapianSearcher.h"
#include "common/base64.h"
#include "reader.h"
#include "searcher.h"
#include "unicode/putil.h"
#include <android/log.h>
#define LOGI(...) __android_log_print(ANDROID_LOG_INFO, "kiwix", __VA_ARGS__)
#define LOGI(...) __android_log_print(ANDROID_LOG_INFO, "kiwix", __VA_ARGS__)
#include <xapian.h>
#include <zim/zim.h>
#include <zim/file.h>
#include <zim/article.h>
#include <zim/error.h>
#include <zim/file.h>
#include <zim/zim.h>
/* global variables */
kiwix::Reader *reader = NULL;
kiwix::XapianSearcher *searcher = NULL;
kiwix::Reader* reader = NULL;
kiwix::Searcher* searcher = NULL;
static pthread_mutex_t readerLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t searcherLock = PTHREAD_MUTEX_INITIALIZER;
/* c2jni type conversion functions */
jboolean c2jni(const bool &val) {
jboolean c2jni(const bool& val)
{
return val ? JNI_TRUE : JNI_FALSE;
}
jstring c2jni(const std::string &val, JNIEnv *env) {
jstring c2jni(const std::string& val, JNIEnv* env)
{
return env->NewStringUTF(val.c_str());
}
jint c2jni(const int val) {
jint c2jni(const int val)
{
return (jint)val;
}
jint c2jni(const unsigned val) {
jint c2jni(const unsigned val)
{
return (unsigned)val;
}
/* jni2c type conversion functions */
bool jni2c(const jboolean &val) {
bool jni2c(const jboolean& val)
{
return val == JNI_TRUE;
}
std::string jni2c(const jstring &val, JNIEnv *env) {
std::string jni2c(const jstring& val, JNIEnv* env)
{
return std::string(env->GetStringUTFChars(val, 0));
}
int jni2c(const jint val) {
int jni2c(const jint val)
{
return (int)val;
}
/* Method to deal with variable passed by reference */
void setStringObjValue(const std::string &value, const jobject obj, JNIEnv *env) {
void setStringObjValue(const std::string& value, const jobject obj, JNIEnv* env)
{
jclass objClass = env->GetObjectClass(obj);
jfieldID objFid = env->GetFieldID(objClass, "value", "Ljava/lang/String;");
env->SetObjectField(obj, objFid, c2jni(value, env));
}
void setIntObjValue(const int value, const jobject obj, JNIEnv *env) {
void setIntObjValue(const int value, const jobject obj, JNIEnv* env)
{
jclass objClass = env->GetObjectClass(obj);
jfieldID objFid = env->GetFieldID(objClass, "value", "I");
env->SetIntField(obj, objFid, value);
}
void setBoolObjValue(const bool value, const jobject obj, JNIEnv *env) {
void setBoolObjValue(const bool value, const jobject obj, JNIEnv* env)
{
jclass objClass = env->GetObjectClass(obj);
jfieldID objFid = env->GetFieldID(objClass, "value", "Z");
env->SetIntField(obj, objFid, c2jni(value));
}
/* Kiwix library functions */
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getMainPage(JNIEnv *env, jobject obj) {
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwix_getMainPage(JNIEnv* env, jobject obj)
{
jstring url;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
@@ -91,13 +117,15 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getMainPage(JNIEnv *e
}
}
pthread_mutex_unlock(&readerLock);
return url;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getId(JNIEnv *env, jobject obj) {
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getId(JNIEnv* env,
jobject obj)
{
jstring id;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
@@ -108,13 +136,15 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getId(JNIEnv *env, jo
}
}
pthread_mutex_unlock(&readerLock);
return id;
}
JNIEXPORT jint JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getFileSize(JNIEnv *env, jobject obj) {
JNIEXPORT jint JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getFileSize(JNIEnv* env,
jobject obj)
{
jint size;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
@@ -125,13 +155,15 @@ JNIEXPORT jint JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getFileSize(JNIEnv *env,
}
}
pthread_mutex_unlock(&readerLock);
return size;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getCreator(JNIEnv *env, jobject obj) {
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwix_getCreator(JNIEnv* env, jobject obj)
{
jstring creator;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
@@ -142,13 +174,15 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getCreator(JNIEnv *en
}
}
pthread_mutex_unlock(&readerLock);
return creator;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getPublisher(JNIEnv *env, jobject obj) {
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwix_getPublisher(JNIEnv* env, jobject obj)
{
jstring publisher;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
@@ -159,13 +193,15 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getPublisher(JNIEnv *
}
}
pthread_mutex_unlock(&readerLock);
return publisher;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getName(JNIEnv *env, jobject obj) {
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getName(JNIEnv* env,
jobject obj)
{
jstring name;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
@@ -176,33 +212,40 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getName(JNIEnv *env,
}
}
pthread_mutex_unlock(&readerLock);
return name;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getFavicon(JNIEnv *env, jobject obj) {
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwix_getFavicon(JNIEnv* env, jobject obj)
{
jstring favicon;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
std::string cContent;
std::string cMime;
reader->getFavicon(cContent, cMime);
favicon = c2jni(base64_encode(reinterpret_cast<const unsigned char*>(cContent.c_str()), cContent.length()), env);
favicon
= c2jni(base64_encode(
reinterpret_cast<const unsigned char*>(cContent.c_str()),
cContent.length()),
env);
} catch (...) {
std::cerr << "Unable to get ZIM favicon" << std::endl;
}
}
pthread_mutex_unlock(&readerLock);
return favicon;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getDate(JNIEnv *env, jobject obj) {
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getDate(JNIEnv* env,
jobject obj)
{
jstring date;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
@@ -213,13 +256,15 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getDate(JNIEnv *env,
}
}
pthread_mutex_unlock(&readerLock);
return date;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getLanguage(JNIEnv *env, jobject obj) {
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwix_getLanguage(JNIEnv* env, jobject obj)
{
jstring language;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
@@ -230,13 +275,15 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getLanguage(JNIEnv *e
}
}
pthread_mutex_unlock(&readerLock);
return language;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getMimeType(JNIEnv *env, jobject obj, jstring url) {
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getMimeType(
JNIEnv* env, jobject obj, jstring url)
{
jstring mimeType;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
std::string cUrl = jni2c(url, env);
@@ -249,17 +296,21 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getMimeType(JNIEnv *e
}
}
pthread_mutex_unlock(&readerLock);
return mimeType;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_loadZIM(JNIEnv *env, jobject obj, jstring path) {
JNIEXPORT jboolean JNICALL
Java_org_kiwix_kiwixlib_JNIKiwix_loadZIM(JNIEnv* env, jobject obj, jstring path)
{
jboolean retVal = JNI_TRUE;
std::string cPath = jni2c(path, env);
pthread_mutex_lock(&readerLock);
try {
if (reader != NULL) delete reader;
if (reader != NULL) {
delete reader;
}
reader = new kiwix::Reader(cPath);
} catch (...) {
std::cerr << "Unable to load ZIM " << cPath << std::endl;
@@ -271,9 +322,11 @@ JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_loadZIM(JNIEnv *env,
return retVal;
}
JNIEXPORT jbyteArray JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getContent(JNIEnv *env, jobject obj, jstring url, jobject mimeTypeObj, jobject sizeObj) {
JNIEXPORT jbyteArray JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getContent(
JNIEnv* env, jobject obj, jstring url, jobject titleObj, jobject mimeTypeObj, jobject sizeObj)
{
/* Default values */
setStringObjValue("", titleObj, env);
setStringObjValue("", mimeTypeObj, env);
setIntObjValue(0, sizeObj, env);
jbyteArray data = env->NewByteArray(0);
@@ -282,15 +335,18 @@ JNIEXPORT jbyteArray JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getContent(JNIEnv
if (reader != NULL) {
std::string cUrl = jni2c(url, env);
std::string cData;
std::string cTitle;
std::string cMimeType;
unsigned int cSize = 0;
pthread_mutex_lock(&readerLock);
try {
if (reader->getContentByUrl(cUrl, cData, cSize, cMimeType)) {
if (reader->getContentByUrl(cUrl, cData, cTitle, cSize, cMimeType)) {
data = env->NewByteArray(cSize);
env->SetByteArrayRegion(data, 0, cSize, reinterpret_cast<const jbyte*>(cData.c_str()));
env->SetByteArrayRegion(
data, 0, cSize, reinterpret_cast<const jbyte*>(cData.c_str()));
setStringObjValue(cMimeType, mimeTypeObj, env);
setStringObjValue(cTitle, titleObj, env);
setIntObjValue(cSize, sizeObj, env);
}
} catch (...) {
@@ -298,12 +354,13 @@ JNIEXPORT jbyteArray JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getContent(JNIEnv
}
pthread_mutex_unlock(&readerLock);
}
return data;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_searchSuggestions
(JNIEnv *env, jobject obj, jstring prefix, jint count) {
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_searchSuggestions(
JNIEnv* env, jobject obj, jstring prefix, jint count)
{
jboolean retVal = JNI_FALSE;
std::string cPrefix = jni2c(prefix, env);
unsigned int cCount = jni2c(count);
@@ -316,15 +373,17 @@ JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_searchSuggestions
}
}
} catch (...) {
std::cerr << "Unable to search suggestions for pattern " << cPrefix << std::endl;
std::cerr << "Unable to search suggestions for pattern " << cPrefix
<< std::endl;
}
pthread_mutex_unlock(&readerLock);
return retVal;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getNextSuggestion
(JNIEnv *env, jobject obj, jobject titleObj) {
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getNextSuggestion(
JNIEnv* env, jobject obj, jobject titleObj)
{
jboolean retVal = JNI_FALSE;
std::string cTitle;
@@ -344,8 +403,9 @@ JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getNextSuggestion
return retVal;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getPageUrlFromTitle
(JNIEnv *env, jobject obj, jstring title, jobject urlObj) {
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getPageUrlFromTitle(
JNIEnv* env, jobject obj, jstring title, jobject urlObj)
{
jboolean retVal = JNI_FALSE;
std::string cTitle = jni2c(title, env);
std::string cUrl;
@@ -362,12 +422,13 @@ JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getPageUrlFromTitle
std::cerr << "Unable to get URL for title " << cTitle << std::endl;
}
pthread_mutex_unlock(&readerLock);
return retVal;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getTitle
(JNIEnv *env , jobject obj, jobject titleObj) {
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getTitle(
JNIEnv* env, jobject obj, jobject titleObj)
{
jboolean retVal = JNI_FALSE;
std::string cTitle;
@@ -384,12 +445,13 @@ JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getTitle
pthread_mutex_unlock(&readerLock);
return retVal;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getDescription(JNIEnv *env, jobject obj) {
JNIEXPORT jstring JNICALL
Java_org_kiwix_kiwixlib_JNIKiwix_getDescription(JNIEnv* env, jobject obj)
{
jstring description;
pthread_mutex_lock(&readerLock);
if (reader != NULL) {
try {
@@ -400,12 +462,13 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getDescription(JNIEnv
}
}
pthread_mutex_unlock(&readerLock);
return description;
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getRandomPage
(JNIEnv *env, jobject obj, jobject urlObj) {
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getRandomPage(
JNIEnv* env, jobject obj, jobject urlObj)
{
jboolean retVal = JNI_FALSE;
std::string cUrl;
@@ -424,11 +487,12 @@ JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_getRandomPage
return retVal;
}
JNIEXPORT void JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_setDataDirectory
(JNIEnv *env, jobject obj, jstring dirStr) {
JNIEXPORT void JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_setDataDirectory(
JNIEnv* env, jobject obj, jstring dirStr)
{
std::string cPath = jni2c(dirStr, env);
pthread_mutex_lock(&readerLock);
pthread_mutex_lock(&readerLock);
try {
u_setDataDirectory(cPath.c_str());
} catch (...) {
@@ -437,15 +501,26 @@ JNIEXPORT void JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_setDataDirectory
pthread_mutex_unlock(&readerLock);
}
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_loadFulltextIndex(JNIEnv *env, jobject obj, jstring path) {
JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_loadFulltextIndex(
JNIEnv* env, jobject obj, jstring path)
{
jboolean retVal = JNI_TRUE;
std::string cPath = jni2c(path, env);
pthread_mutex_lock(&searcherLock);
searcher = NULL;
try {
if (searcher != NULL) delete searcher;
searcher = new kiwix::XapianSearcher(cPath);
if (searcher != NULL) {
delete searcher;
}
if (!reader || !reader->hasFulltextIndex()) {
// Use old API (no embedded full text index).
searcher = new kiwix::Searcher(cPath, reader, "");
} else {
// Use the new API. We don't care about the human readable name as
// we don't use it (in android).
searcher = new kiwix::Searcher();
searcher->add_reader(reader, "");
}
} catch (...) {
searcher = NULL;
retVal = JNI_FALSE;
@@ -456,23 +531,23 @@ JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_loadFulltextIndex(JN
return retVal;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_indexedQuery
(JNIEnv *env, jclass obj, jstring query, jint count) {
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_indexedQuery(
JNIEnv* env, jclass obj, jstring query, jint count)
{
std::string cQuery = jni2c(query, env);
unsigned int cCount = jni2c(count);
std::string url;
std::string title;
kiwix::Result* p_result;
std::string result;
unsigned int score;
pthread_mutex_lock(&searcherLock);
try {
if (searcher != NULL) {
searcher->search(cQuery, 0, count);
while (searcher->getNextResult(url, title, score) &&
!title.empty() &&
!url.empty()) {
result += title + "\n";
while ((p_result = searcher->getNextResult())
&& !(p_result->get_title().empty())
&& !(p_result->get_url().empty())) {
result += p_result->get_title() + "\n";
delete p_result;
}
}
} catch (...) {
@@ -482,5 +557,3 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwix_indexedQuery
return env->NewStringUTF(result.c_str());
}

View File

@@ -11,3 +11,7 @@ kiwix_jni = custom_target('jni',
)
kiwix_sources += ['android/kiwix.cpp', kiwix_jni]
install_subdir('org', install_dir: 'kiwix-lib/java')
install_subdir('res', install_dir: 'kiwix-lib')
install_data('AndroidManifest.xml', install_dir: 'kiwix-lib')

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2013
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -23,12 +23,9 @@ import org.kiwix.kiwixlib.JNIKiwixString;
import org.kiwix.kiwixlib.JNIKiwixBool;
import org.kiwix.kiwixlib.JNIKiwixInt;
public class JNIKiwix {
static {
System.loadLibrary("kiwix");
}
public class JNIKiwix
{
static { System.loadLibrary("kiwix"); }
public native String getMainPage();
public native String getId();
@@ -39,9 +36,9 @@ public class JNIKiwix {
public native boolean loadZIM(String path);
public native boolean loadFulltextIndex(String path);
public native byte[] getContent(String url, JNIKiwixString mimeType, JNIKiwixInt size);
public native boolean loadFulltextIndex(String path);
public native byte[] getContent(String url, JNIKiwixString title, JNIKiwixString mimeType, JNIKiwixInt size);
public native boolean searchSuggestions(String prefix, int count);

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2013
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -19,7 +19,7 @@
package org.kiwix.kiwixlib;
public class JNIKiwixBool {
public class JNIKiwixBool
{
public boolean value;
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2013
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -19,8 +19,7 @@
package org.kiwix.kiwixlib;
public class JNIKiwixInt {
public class JNIKiwixInt
{
public int value;
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright 2013
* Copyright (C) 2013 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -19,7 +19,7 @@
package org.kiwix.kiwixlib;
public class JNIKiwixString {
public class JNIKiwixString
{
public String value;
}

View File

@@ -0,0 +1,3 @@
<resources>
<string name="app_name">Kiwix Lib</string>
</resources>

View File

@@ -19,44 +19,54 @@
#include <common/networkTools.h>
std::map<std::string, std::string> kiwix::getNetworkInterfaces() {
std::map<std::string, std::string> kiwix::getNetworkInterfaces()
{
std::map<std::string, std::string> interfaces;
#ifdef _WIN32
SOCKET sd = WSASocket(AF_INET, SOCK_DGRAM, 0, 0, 0, 0);
if (sd == SOCKET_ERROR) {
std::cerr << "Failed to get a socket. Error " << WSAGetLastError() <<
std::endl;
std::cerr << "Failed to get a socket. Error " << WSAGetLastError()
<< std::endl;
return interfaces;
}
INTERFACE_INFO InterfaceList[20];
unsigned long nBytesReturned;
if (WSAIoctl(sd, SIO_GET_INTERFACE_LIST, 0, 0, &InterfaceList,
sizeof(InterfaceList), &nBytesReturned, 0, 0) == SOCKET_ERROR) {
std::cerr << "Failed calling WSAIoctl: error " << WSAGetLastError() <<
std::endl;
if (WSAIoctl(sd,
SIO_GET_INTERFACE_LIST,
0,
0,
&InterfaceList,
sizeof(InterfaceList),
&nBytesReturned,
0,
0)
== SOCKET_ERROR) {
std::cerr << "Failed calling WSAIoctl: error " << WSAGetLastError()
<< std::endl;
return interfaces;
}
int nNumInterfaces = nBytesReturned / sizeof(INTERFACE_INFO);
for (int i = 0; i < nNumInterfaces; ++i) {
sockaddr_in *pAddress;
pAddress = (sockaddr_in *) & (InterfaceList[i].iiAddress);
sockaddr_in* pAddress;
pAddress = (sockaddr_in*)&(InterfaceList[i].iiAddress);
/* Add to the map */
std::string interfaceName = std::string(inet_ntoa(pAddress->sin_addr));
std::string interfaceIp = std::string(inet_ntoa(pAddress->sin_addr));
interfaces.insert(std::pair<std::string, std::string>(interfaceName, interfaceIp));
interfaces.insert(
std::pair<std::string, std::string>(interfaceName, interfaceIp));
}
#else
/* Get Network interfaces information */
char buf[16384];
struct ifconf ifconf;
int fd = socket(PF_INET, SOCK_DGRAM, 0); /* Only IPV4 */
ifconf.ifc_len=sizeof buf;
ifconf.ifc_buf=buf;
if(ioctl(fd, SIOCGIFCONF, &ifconf)!=0) {
ifconf.ifc_len = sizeof buf;
ifconf.ifc_buf = buf;
if (ioctl(fd, SIOCGIFCONF, &ifconf) != 0) {
perror("ioctl(SIOCGIFCONF)");
exit(EXIT_FAILURE);
}
@@ -64,73 +74,86 @@ std::map<std::string, std::string> kiwix::getNetworkInterfaces() {
/* Go through each interface */
int i;
size_t len;
struct ifreq *ifreq;
struct ifreq* ifreq;
ifreq = ifconf.ifc_req;
for (i = 0; i < ifconf.ifc_len; ) {
for (i = 0; i < ifconf.ifc_len;) {
if (ifreq->ifr_addr.sa_family == AF_INET) {
/* Get the network interface ip */
char host[128] = { 0 };
const int error = getnameinfo(&(ifreq->ifr_addr), sizeof ifreq->ifr_addr,
host, sizeof host,
0, 0, NI_NUMERICHOST);
char host[128] = {0};
const int error = getnameinfo(&(ifreq->ifr_addr),
sizeof ifreq->ifr_addr,
host,
sizeof host,
0,
0,
NI_NUMERICHOST);
if (!error) {
std::string interfaceName = std::string(ifreq->ifr_name);
std::string interfaceIp = std::string(host);
/* Add to the map */
interfaces.insert(std::pair<std::string, std::string>(interfaceName, interfaceIp));
interfaces.insert(
std::pair<std::string, std::string>(interfaceName, interfaceIp));
} else {
perror("getnameinfo()");
}
}
/* some systems have ifr_addr.sa_len and adjust the length that
* way, but not mine. weird */
#ifndef linux
len=IFNAMSIZ + ifreq->ifr_addr.sa_len;
/* some systems have ifr_addr.sa_len and adjust the length that
* way, but not mine. weird */
#ifndef __linux__
len = IFNAMSIZ + ifreq->ifr_addr.sa_len;
#else
len=sizeof *ifreq;
len = sizeof *ifreq;
#endif
ifreq=(struct ifreq*)((char*)ifreq+len);
i+=len;
ifreq = (struct ifreq*)((char*)ifreq + len);
i += len;
}
#endif
return interfaces;
}
std::string kiwix::getBestPublicIp() {
std::string kiwix::getBestPublicIp()
{
std::map<std::string, std::string> interfaces = kiwix::getNetworkInterfaces();
#ifndef _WIN32
const char* const prioritizedNames[] =
{ "eth0", "eth1", "wlan0", "wlan1", "en0", "en1" };
const char* const prioritizedNames[]
= {"eth0", "eth1", "wlan0", "wlan1", "en0", "en1"};
const int count = (sizeof prioritizedNames) / (sizeof prioritizedNames[0]);
for (int i = 0; i < count; ++i) {
std::map<std::string, std::string>::const_iterator it =
interfaces.find(prioritizedNames[i]);
if (it != interfaces.end())
std::map<std::string, std::string>::const_iterator it
= interfaces.find(prioritizedNames[i]);
if (it != interfaces.end()) {
return it->second;
}
}
#endif
for (std::map<std::string, std::string>::iterator iter = interfaces.begin();
iter != interfaces.end(); ++iter) {
iter != interfaces.end();
++iter) {
std::string interfaceIp = iter->second;
if (interfaceIp.length() >= 7 && interfaceIp.substr(0, 7) == "192.168")
if (interfaceIp.length() >= 7 && interfaceIp.substr(0, 7) == "192.168") {
return interfaceIp;
}
}
for (std::map<std::string, std::string>::iterator iter = interfaces.begin();
iter != interfaces.end(); ++iter) {
iter != interfaces.end();
++iter) {
std::string interfaceIp = iter->second;
if (interfaceIp.length() >= 7 && interfaceIp.substr(0, 7) == "172.16.")
if (interfaceIp.length() >= 7 && interfaceIp.substr(0, 7) == "172.16.") {
return interfaceIp;
}
}
for (std::map<std::string, std::string>::iterator iter = interfaces.begin();
iter != interfaces.end(); ++iter) {
iter != interfaces.end();
++iter) {
std::string interfaceIp = iter->second;
if (interfaceIp.length() >= 3 && interfaceIp.substr(0, 3) == "10.")
if (interfaceIp.length() >= 3 && interfaceIp.substr(0, 3) == "10.") {
return interfaceIp;
}
}
return "127.0.0.1";

View File

@@ -19,10 +19,11 @@
#include <common/otherTools.h>
void kiwix::sleep(unsigned int milliseconds) {
void kiwix::sleep(unsigned int milliseconds)
{
#ifdef _WIN32
Sleep(milliseconds);
Sleep(milliseconds);
#else
usleep(1000 * milliseconds);
usleep(1000 * milliseconds);
#endif
}

View File

@@ -20,13 +20,13 @@
#include <common/pathTools.h>
#ifdef __APPLE__
#include <mach-o/dyld.h>
#include <limits.h>
#include <mach-o/dyld.h>
#elif _WIN32
#include <direct.h>
#include <windows.h>
#include "shlwapi.h"
#include <direct.h>
#define getcwd _getcwd // stupid MSFT "deprecation" warning
#define getcwd _getcwd // stupid MSFT "deprecation" warning
#endif
#ifdef _WIN32
@@ -47,7 +47,8 @@
#define PATH_MAX 1024
#endif
bool isRelativePath(const string &path) {
bool isRelativePath(const string& path)
{
#ifdef _WIN32
return path.empty() || path.substr(1, 2) == ":\\" ? false : true;
#else
@@ -55,19 +56,21 @@ bool isRelativePath(const string &path) {
#endif
}
string computeRelativePath(const string path, const string absolutePath) {
string computeRelativePath(const string path, const string absolutePath)
{
std::vector<std::string> pathParts = kiwix::split(path, SEPARATOR);
std::vector<std::string> absolutePathParts = kiwix::split(absolutePath, SEPARATOR);
std::vector<std::string> absolutePathParts
= kiwix::split(absolutePath, SEPARATOR);
unsigned int commonCount = 0;
while (commonCount < pathParts.size() &&
commonCount < absolutePathParts.size() &&
pathParts[commonCount] == absolutePathParts[commonCount]) {
while (commonCount < pathParts.size()
&& commonCount < absolutePathParts.size()
&& pathParts[commonCount] == absolutePathParts[commonCount]) {
if (!pathParts[commonCount].empty()) {
commonCount++;
}
}
string relativePath;
#ifdef _WIN32
/* On Windows you have a token more because the root is represented
@@ -77,10 +80,10 @@ string computeRelativePath(const string path, const string absolutePath) {
}
#endif
for (unsigned int i = commonCount ; i < pathParts.size() ; i++) {
for (unsigned int i = commonCount; i < pathParts.size(); i++) {
relativePath += "../";
}
for (unsigned int i = commonCount ; i < absolutePathParts.size() ; i++) {
for (unsigned int i = commonCount; i < absolutePathParts.size(); i++) {
relativePath += absolutePathParts[i];
relativePath += i + 1 < absolutePathParts.size() ? "/" : "";
}
@@ -89,11 +92,12 @@ string computeRelativePath(const string path, const string absolutePath) {
}
/* Warning: the relative path must be with slashes */
string computeAbsolutePath(const string path, const string relativePath) {
string computeAbsolutePath(const string path, const string relativePath)
{
string absolutePath;
if (path.empty()) {
char *path=NULL;
char* path = NULL;
size_t size = 0;
#ifdef _WIN32
@@ -104,15 +108,17 @@ string computeAbsolutePath(const string path, const string relativePath) {
absolutePath = string(path) + SEPARATOR;
} else {
absolutePath = path.substr(path.length() - 1, 1) == SEPARATOR ? path : path + SEPARATOR;
absolutePath = path.substr(path.length() - 1, 1) == SEPARATOR
? path
: path + SEPARATOR;
}
#if _WIN32
char *cRelativePath = _strdup(relativePath.c_str());
char* cRelativePath = _strdup(relativePath.c_str());
#else
char *cRelativePath = strdup(relativePath.c_str());
char* cRelativePath = strdup(relativePath.c_str());
#endif
char *token = strtok(cRelativePath, "/");
char* token = strtok(cRelativePath, "/");
while (token != NULL) {
if (string(token) == "..") {
@@ -121,8 +127,9 @@ string computeAbsolutePath(const string path, const string relativePath) {
} else if (strcmp(token, ".") && strcmp(token, "")) {
absolutePath += string(token);
token = strtok(NULL, "/");
if (token != NULL)
absolutePath += SEPARATOR;
if (token != NULL) {
absolutePath += SEPARATOR;
}
} else {
token = strtok(NULL, "/");
}
@@ -131,31 +138,38 @@ string computeAbsolutePath(const string path, const string relativePath) {
return absolutePath;
}
string removeLastPathElement(const string path, const bool removePreSeparator, const bool removePostSeparator) {
string removeLastPathElement(const string path,
const bool removePreSeparator,
const bool removePostSeparator)
{
string newPath = path;
size_t offset = newPath.find_last_of(SEPARATOR);
if (removePreSeparator &&
if (removePreSeparator &&
#ifndef _WIN32
offset != newPath.find_first_of(SEPARATOR) &&
offset != newPath.find_first_of(SEPARATOR) &&
#endif
offset == newPath.length()-1) {
offset == newPath.length() - 1) {
newPath = newPath.substr(0, offset);
offset = newPath.find_last_of(SEPARATOR);
}
newPath = removePostSeparator ? newPath.substr(0, offset) : newPath.substr(0, offset+1);
newPath = removePostSeparator ? newPath.substr(0, offset)
: newPath.substr(0, offset + 1);
return newPath;
}
string appendToDirectory(const string &directoryPath, const string &filename) {
string appendToDirectory(const string& directoryPath, const string& filename)
{
string newPath = directoryPath + SEPARATOR + filename;
return newPath;
}
string getLastPathElement(const string &path) {
string getLastPathElement(const string& path)
{
return path.substr(path.find_last_of(SEPARATOR) + 1);
}
unsigned int getFileSize(const string &path) {
unsigned int getFileSize(const string& path)
{
#ifdef _WIN32
struct _stat filestatus;
_stat(path.c_str(), &filestatus);
@@ -167,12 +181,15 @@ unsigned int getFileSize(const string &path) {
return filestatus.st_size / 1024;
}
string getFileSizeAsString(const string &path) {
ostringstream convert; convert << getFileSize(path);
string getFileSizeAsString(const string& path)
{
ostringstream convert;
convert << getFileSize(path);
return convert.str();
}
bool fileExists(const string &path) {
bool fileExists(const string& path)
{
#ifdef _WIN32
return PathFileExists(path.c_str());
#else
@@ -187,7 +204,8 @@ bool fileExists(const string &path) {
#endif
}
bool makeDirectory(const string &path) {
bool makeDirectory(const string& path)
{
#ifdef _WIN32
int status = _mkdir(path.c_str());
#else
@@ -197,18 +215,19 @@ bool makeDirectory(const string &path) {
}
/* Try to create a link and if does not work then make a copy */
bool copyFile(const string &sourcePath, const string &destPath) {
bool copyFile(const string& sourcePath, const string& destPath)
{
try {
#ifndef _WIN32
if (link(sourcePath.c_str(), destPath.c_str()) != 0) {
#endif
std::ifstream infile(sourcePath.c_str(), std::ios_base::binary);
std::ofstream outfile(destPath.c_str(), std::ios_base::binary);
outfile << infile.rdbuf();
std::ifstream infile(sourcePath.c_str(), std::ios_base::binary);
std::ofstream outfile(destPath.c_str(), std::ios_base::binary);
outfile << infile.rdbuf();
#ifndef _WIN32
}
#endif
} catch (exception &e) {
} catch (exception& e) {
cerr << e.what() << endl;
return false;
}
@@ -216,18 +235,19 @@ bool copyFile(const string &sourcePath, const string &destPath) {
return true;
}
string getExecutablePath() {
string getExecutablePath()
{
char binRootPath[PATH_MAX];
#ifdef _WIN32
GetModuleFileName( NULL, binRootPath, PATH_MAX);
GetModuleFileName(NULL, binRootPath, PATH_MAX);
return std::string(binRootPath);
#elif __APPLE__
uint32_t max = (uint32_t)PATH_MAX;
_NSGetExecutablePath(binRootPath, &max);
return std::string(binRootPath);
#else
ssize_t size = readlink("/proc/self/exe", binRootPath, PATH_MAX);
ssize_t size = readlink("/proc/self/exe", binRootPath, PATH_MAX);
if (size != -1) {
return std::string(binRootPath, size);
}
@@ -236,7 +256,8 @@ string getExecutablePath() {
return "";
}
bool writeTextFile(const string &path, const string &content) {
bool writeTextFile(const string& path, const string& content)
{
std::ofstream file;
file.open(path.c_str());
file << content;
@@ -244,8 +265,9 @@ bool writeTextFile(const string &path, const string &content) {
return true;
}
string getCurrentDirectory() {
char* a_cwd = getcwd(NULL,0);
string getCurrentDirectory()
{
char* a_cwd = getcwd(NULL, 0);
string s_cwd(a_cwd);
free(a_cwd);
return s_cwd;

View File

@@ -21,10 +21,11 @@
std::map<std::string, RegexMatcher*> regexCache;
RegexMatcher *buildRegex(const std::string &regex) {
RegexMatcher *matcher;
RegexMatcher* buildRegex(const std::string& regex)
{
RegexMatcher* matcher;
std::map<std::string, RegexMatcher*>::iterator itr = regexCache.find(regex);
/* Regex is in cache */
if (itr != regexCache.end()) {
matcher = itr->second;
@@ -42,22 +43,26 @@ RegexMatcher *buildRegex(const std::string &regex) {
}
/* todo */
void freeRegexCache() {
void freeRegexCache()
{
}
bool matchRegex(const std::string &content, const std::string &regex) {
bool matchRegex(const std::string& content, const std::string& regex)
{
ucnv_setDefaultName("UTF-8");
UnicodeString ucontent = UnicodeString(content.c_str());
RegexMatcher *matcher = buildRegex(regex);
RegexMatcher* matcher = buildRegex(regex);
matcher->reset(ucontent);
return matcher->find();
}
std::string replaceRegex(const std::string &content, const std::string &replacement, const std::string &regex) {
std::string replaceRegex(const std::string& content,
const std::string& replacement,
const std::string& regex)
{
ucnv_setDefaultName("UTF-8");
UnicodeString ucontent = UnicodeString(content.c_str());
UnicodeString ureplacement = UnicodeString(replacement.c_str());
RegexMatcher *matcher = buildRegex(regex);
RegexMatcher* matcher = buildRegex(regex);
matcher->reset(ucontent);
UErrorCode status = U_ZERO_ERROR;
UnicodeString uresult = matcher->replaceAll(ureplacement, status);
@@ -66,16 +71,19 @@ std::string replaceRegex(const std::string &content, const std::string &replacem
return tmp;
}
std::string appendToFirstOccurence(const std::string &content, const std::string regex, const std::string &replacement) {
std::string appendToFirstOccurence(const std::string& content,
const std::string regex,
const std::string& replacement)
{
ucnv_setDefaultName("UTF-8");
UnicodeString ucontent = UnicodeString(content.c_str());
UnicodeString ureplacement = UnicodeString(replacement.c_str());
RegexMatcher *matcher = buildRegex(regex);
RegexMatcher* matcher = buildRegex(regex);
matcher->reset(ucontent);
if (matcher->find()) {
UErrorCode status = U_ZERO_ERROR;
ucontent.insert(matcher->end(status), ureplacement);
ucontent.insert(matcher->end(status), ureplacement);
std::string tmp;
ucontent.toUTF8String(tmp);
return tmp;
@@ -83,4 +91,3 @@ std::string appendToFirstOccurence(const std::string &content, const std::strin
return content;
}

View File

@@ -19,25 +19,36 @@
#include <common/stringTools.h>
#include <unicode/normlzr.h>
#include <unicode/rep.h>
#include <unicode/translit.h>
#include <unicode/ucnv.h>
#include <unicode/uniset.h>
#include <unicode/ustring.h>
/* tell ICU where to find its dat file (tables) */
void kiwix::loadICUExternalTables() {
void kiwix::loadICUExternalTables()
{
#ifdef __APPLE__
std::string executablePath = getExecutablePath();
std::string executableDirectory = removeLastPathElement(executablePath);
std::string datPath = computeAbsolutePath(executableDirectory, "icudt49l.dat");
try {
u_setDataDirectory(datPath.c_str());
} catch (exception &e) {
std::cerr << e.what() << std::endl;
}
std::string executablePath = getExecutablePath();
std::string executableDirectory = removeLastPathElement(executablePath);
std::string datPath
= computeAbsolutePath(executableDirectory, "icudt49l.dat");
try {
u_setDataDirectory(datPath.c_str());
} catch (exception& e) {
std::cerr << e.what() << std::endl;
}
#endif
}
std::string kiwix::removeAccents(const std::string &text) {
std::string kiwix::removeAccents(const std::string& text)
{
loadICUExternalTables();
ucnv_setDefaultName("UTF-8");
UErrorCode status = U_ZERO_ERROR;
Transliterator *removeAccentsTrans = Transliterator::createInstance("Lower; NFD; [:M:] remove; NFC", UTRANS_FORWARD, status);
Transliterator* removeAccentsTrans = Transliterator::createInstance(
"Lower; NFD; [:M:] remove; NFC", UTRANS_FORWARD, status);
UnicodeString ustring = UnicodeString(text.c_str());
removeAccentsTrans->transliterate(ustring);
delete removeAccentsTrans;
@@ -49,7 +60,8 @@ std::string kiwix::removeAccents(const std::string &text) {
#ifndef __ANDROID__
/* Prepare integer for display */
std::string kiwix::beautifyInteger(const unsigned int number) {
std::string kiwix::beautifyInteger(const unsigned int number)
{
std::stringstream numberStream;
numberStream << number;
std::string numberString = numberStream.str();
@@ -63,49 +75,58 @@ std::string kiwix::beautifyInteger(const unsigned int number) {
return numberString;
}
std::string kiwix::beautifyFileSize(const unsigned int number) {
if (number > 1024*1024) {
return kiwix::beautifyInteger(number/(1024*1024)) + " GB";
std::string kiwix::beautifyFileSize(const unsigned int number)
{
if (number > 1024 * 1024) {
return kiwix::beautifyInteger(number / (1024 * 1024)) + " GB";
} else {
return kiwix::beautifyInteger(number/1024 !=
0 ? number/1024 : 1) + " MB";
return kiwix::beautifyInteger(number / 1024 != 0 ? number / 1024 : 1)
+ " MB";
}
}
void kiwix::printStringInHexadecimal(UnicodeString s) {
void kiwix::printStringInHexadecimal(UnicodeString s)
{
std::cout << std::showbase << std::hex;
for (int i=0; i<s.length(); i++) {
for (int i = 0; i < s.length(); i++) {
char c = (char)((s.getTerminatedBuffer())[i]);
if (c & 0x80)
if (c & 0x80) {
std::cout << (c & 0xffff) << " ";
else
} else {
std::cout << c << " ";
}
}
std::cout << std::endl;
}
void kiwix::printStringInHexadecimal(const char *s) {
void kiwix::printStringInHexadecimal(const char* s)
{
std::cout << std::showbase << std::hex;
for (char const* pc = s; *pc; ++pc) {
if (*pc & 0x80)
if (*pc & 0x80) {
std::cout << (*pc & 0xffff);
else
} else {
std::cout << *pc;
}
std::cout << ' ';
}
std::cout << std::endl;
}
void kiwix::stringReplacement(std::string& str, const std::string& oldStr, const std::string& newStr) {
void kiwix::stringReplacement(std::string& str,
const std::string& oldStr,
const std::string& newStr)
{
size_t pos = 0;
while((pos = str.find(oldStr, pos)) != std::string::npos) {
while ((pos = str.find(oldStr, pos)) != std::string::npos) {
str.replace(pos, oldStr.length(), newStr);
pos += newStr.length();
}
}
/* Encode string to avoid XSS attacks */
std::string kiwix::encodeDiples(const std::string& str) {
std::string kiwix::encodeDiples(const std::string& str)
{
std::string result = str;
kiwix::stringReplacement(result, "<", "&lt;");
kiwix::stringReplacement(result, ">", "&gt;");
@@ -113,58 +134,68 @@ std::string kiwix::encodeDiples(const std::string& str) {
}
// Urlencode
//based on javascript encodeURIComponent()
// based on javascript encodeURIComponent()
std::string char2hex(char dec) {
char dig1 = (dec&0xF0)>>4;
char dig2 = (dec&0x0F);
if ( 0<= dig1 && dig1<= 9) dig1+=48; //0,48inascii
if (10<= dig1 && dig1<=15) dig1+=97-10; //a,97inascii
if ( 0<= dig2 && dig2<= 9) dig2+=48;
if (10<= dig2 && dig2<=15) dig2+=97-10;
std::string char2hex(char dec)
{
char dig1 = (dec & 0xF0) >> 4;
char dig2 = (dec & 0x0F);
if (0 <= dig1 && dig1 <= 9) {
dig1 += 48; // 0,48inascii
}
if (10 <= dig1 && dig1 <= 15) {
dig1 += 97 - 10; // a,97inascii
}
if (0 <= dig2 && dig2 <= 9) {
dig2 += 48;
}
if (10 <= dig2 && dig2 <= 15) {
dig2 += 97 - 10;
}
std::string r;
r.append( &dig1, 1);
r.append( &dig2, 1);
r.append(&dig1, 1);
r.append(&dig2, 1);
return r;
}
std::string kiwix::urlEncode(const std::string &c) {
std::string escaped="";
std::string kiwix::urlEncode(const std::string& c)
{
std::string escaped = "";
int max = c.length();
for(int i=0; i<max; i++)
{
if ( (48 <= c[i] && c[i] <= 57) ||//0-9
(65 <= c[i] && c[i] <= 90) ||//abc...xyz
(97 <= c[i] && c[i] <= 122) || //ABC...XYZ
(c[i]=='~' || c[i]=='!' || c[i]=='*' || c[i]=='(' || c[i]==')' || c[i]=='\'')
)
{
escaped.append( &c[i], 1);
}
else
{
escaped.append("%");
escaped.append( char2hex(c[i]) );//converts char 255 to string "ff"
}
for (int i = 0; i < max; i++) {
if ((48 <= c[i] && c[i] <= 57) || // 0-9
(65 <= c[i] && c[i] <= 90)
|| // abc...xyz
(97 <= c[i] && c[i] <= 122)
|| // ABC...XYZ
(c[i] == '~' || c[i] == '!' || c[i] == '*' || c[i] == '(' || c[i] == ')'
|| c[i] == '\'')) {
escaped.append(&c[i], 1);
} else {
escaped.append("%");
escaped.append(char2hex(c[i])); // converts char 255 to string "ff"
}
}
return escaped;
}
#endif
static char charFromHex(std::string a) {
static char charFromHex(std::string a)
{
std::istringstream Blat(a);
int Z;
Blat >> std::hex >> Z;
return char (Z);
return char(Z);
}
std::string kiwix::urlDecode(const std::string &originalUrl) {
std::string kiwix::urlDecode(const std::string& originalUrl)
{
std::string url = originalUrl;
std::string::size_type pos = 0;
while ((pos = url.find('%', pos)) != std::string::npos &&
pos + 2 < url.length()) {
while ((pos = url.find('%', pos)) != std::string::npos
&& pos + 2 < url.length()) {
url.replace(pos, 3, 1, charFromHex(url.substr(pos + 1, 2)));
++pos;
}
@@ -172,39 +203,43 @@ std::string kiwix::urlDecode(const std::string &originalUrl) {
}
/* Split string in a token array */
std::vector<std::string> kiwix::split(const std::string & str,
const std::string & delims=" *-")
std::vector<std::string> kiwix::split(const std::string& str,
const std::string& delims = " *-")
{
std::string::size_type lastPos = str.find_first_not_of(delims, 0);
std::string::size_type pos = str.find_first_of(delims, lastPos);
std::vector<std::string> tokens;
while (std::string::npos != pos || std::string::npos != lastPos)
{
tokens.push_back(str.substr(lastPos, pos - lastPos));
lastPos = str.find_first_not_of(delims, pos);
pos = str.find_first_of(delims, lastPos);
}
while (std::string::npos != pos || std::string::npos != lastPos) {
tokens.push_back(str.substr(lastPos, pos - lastPos));
lastPos = str.find_first_not_of(delims, pos);
pos = str.find_first_of(delims, lastPos);
}
return tokens;
}
std::vector<std::string> kiwix::split(const char* lhs, const char* rhs){
const std::string m1 (lhs), m2 (rhs);
std::vector<std::string> kiwix::split(const char* lhs, const char* rhs)
{
const std::string m1(lhs), m2(rhs);
return split(m1, m2);
}
std::vector<std::string> kiwix::split(const char* lhs, const std::string& rhs){
std::vector<std::string> kiwix::split(const char* lhs, const std::string& rhs)
{
return split(lhs, rhs.c_str());
}
std::vector<std::string> kiwix::split(const std::string& lhs, const char* rhs){
std::vector<std::string> kiwix::split(const std::string& lhs, const char* rhs)
{
return split(lhs.c_str(), rhs);
}
std::string kiwix::ucFirst (const std::string &word) {
if (word.empty())
std::string kiwix::ucFirst(const std::string& word)
{
if (word.empty()) {
return "";
}
std::string result;
@@ -216,9 +251,11 @@ std::string kiwix::ucFirst (const std::string &word) {
return result;
}
std::string kiwix::ucAll (const std::string &word) {
if (word.empty())
std::string kiwix::ucAll(const std::string& word)
{
if (word.empty()) {
return "";
}
std::string result;
@@ -228,9 +265,11 @@ std::string kiwix::ucAll (const std::string &word) {
return result;
}
std::string kiwix::lcFirst (const std::string &word) {
if (word.empty())
std::string kiwix::lcFirst(const std::string& word)
{
if (word.empty()) {
return "";
}
std::string result;
@@ -242,9 +281,11 @@ std::string kiwix::lcFirst (const std::string &word) {
return result;
}
std::string kiwix::lcAll (const std::string &word) {
if (word.empty())
std::string kiwix::lcAll(const std::string& word)
{
if (word.empty()) {
return "";
}
std::string result;
@@ -254,9 +295,11 @@ std::string kiwix::lcAll (const std::string &word) {
return result;
}
std::string kiwix::toTitle (const std::string &word) {
if (word.empty())
std::string kiwix::toTitle(const std::string& word)
{
if (word.empty()) {
return "";
}
std::string result;
@@ -267,6 +310,7 @@ std::string kiwix::toTitle (const std::string &word) {
return result;
}
std::string kiwix::normalize (const std::string &word) {
std::string kiwix::normalize(const std::string& word)
{
return kiwix::lcAll(word);
}

View File

@@ -1,528 +0,0 @@
/*
* Copyright 2011-2014 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "indexer.h"
#include "xapian/myhtmlparse.h"
#include "kiwixlib-resources.h"
namespace kiwix {
/* Count word */
unsigned int Indexer::countWords(const string &text) {
unsigned int numWords = 1;
unsigned int length = text.size();
for(unsigned int i=0; i<length;) {
while(i<length && text[i] != ' ') {
i++;
}
numWords++;
i++;
}
return numWords;
}
/* Constructor */
Indexer::Indexer() :
keywordsBoostFactor(3),
verboseFlag(false) {
/* Initialize mutex */
pthread_mutex_init(&threadIdsMutex, NULL);
pthread_mutex_init(&toParseQueueMutex, NULL);
pthread_mutex_init(&toIndexQueueMutex, NULL);
pthread_mutex_init(&articleExtractorRunningMutex, NULL);
pthread_mutex_init(&articleParserRunningMutex, NULL);
pthread_mutex_init(&articleIndexerRunningMutex, NULL);
pthread_mutex_init(&articleCountMutex, NULL);
pthread_mutex_init(&zimPathMutex, NULL);
pthread_mutex_init(&zimIdMutex, NULL);
pthread_mutex_init(&indexPathMutex, NULL);
pthread_mutex_init(&progressionMutex, NULL);
pthread_mutex_init(&verboseMutex, NULL);
}
/* Destructor */
Indexer::~Indexer() {
}
/* Read the stopwords */
void Indexer::readStopWords(const string languageCode) {
std::string stopWord;
std::istringstream file(getResource("stopwords/" + languageCode));
this->stopWords.clear();
while (getline(file, stopWord, '\n')) {
this->stopWords.push_back(stopWord);
}
if (this->verboseFlag) {
std::cout << "Read stop words, lang code:" << languageCode << ", count:" << this->stopWords.size() << std::endl;
}
}
#pragma mark - Extractor
/* Article extractor methods */
void *Indexer::extractArticles(void *ptr) {
pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
kiwix::Indexer *self = (kiwix::Indexer *)ptr;
/* Get the number of article to index and the ZIM id */
kiwix::Reader reader(self->getZimPath());
unsigned int articleCount = reader.getArticleCount();
self->setArticleCount(articleCount);
string zimId = reader.getId();
self->setZimId(zimId);
/* Progression */
unsigned int readArticleCount = 0;
unsigned int currentProgression = 0;
self->setProgression(currentProgression);
unsigned int newProgress;
/* StopWords */
self->readStopWords(reader.getLanguage());
/* Goes trough all articles */
zim::File *zimHandler = reader.getZimFileHandler();
unsigned int currentOffset = zimHandler->getNamespaceBeginOffset('A');
unsigned int lastOffset = zimHandler->getNamespaceEndOffset('A');
zim::Article currentArticle;
while (currentOffset < lastOffset) {
currentArticle = zimHandler->getArticle(currentOffset);
if (!currentArticle.isRedirect()) {
/* Add articles to the queue */
indexerToken token;
token.title = currentArticle.getTitle();
token.url = currentArticle.getLongUrl();
token.content = string(currentArticle.getData().data(), currentArticle.getData().size());
self->pushToParseQueue(token);
readArticleCount += 1;
/* Update progress */
if (self->progressCallback) {
self->progressCallback(readArticleCount, articleCount);
}
newProgress = (unsigned int)((float)readArticleCount / (float)articleCount * 100);
if (newProgress != currentProgression) {
self->setProgression(newProgress);
}
}
currentOffset += 1;
/* Test if the thread should be cancelled */
pthread_testcancel();
}
self->articleExtractorRunning(false);
pthread_exit(NULL);
return NULL;
}
void Indexer::articleExtractorRunning(bool value) {
pthread_mutex_lock(&articleExtractorRunningMutex);
this->articleExtractorRunningFlag = value;
pthread_mutex_unlock(&articleExtractorRunningMutex);
}
bool Indexer::isArticleExtractorRunning() {
pthread_mutex_lock(&articleExtractorRunningMutex);
bool retVal = this->articleExtractorRunningFlag;
pthread_mutex_unlock(&articleExtractorRunningMutex);
return retVal;
}
#pragma mark - Parser
/* Article parser methods */
void *Indexer::parseArticles(void *ptr) {
pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
kiwix::Indexer *self = (kiwix::Indexer *)ptr;
size_t found;
indexerToken token;
while (self->popFromToParseQueue(token)) {
MyHtmlParser htmlParser;
/* The parser generate a lot of exceptions which should be avoided */
try {
htmlParser.parse_html(token.content, "UTF-8", true);
} catch (...) {
}
/* If content does not have the noindex meta tag */
/* Seems that the parser generates an exception in such case */
found = htmlParser.dump.find("NOINDEX");
if (found == string::npos) {
/* Get the accented title */
token.accentedTitle = (htmlParser.title.empty() ? token.title : htmlParser.title);
/* count words */
stringstream countWordStringStream;
countWordStringStream << self->countWords(htmlParser.dump);
token.wordCount = countWordStringStream.str();
/* snippet */
std::string snippet = std::string(htmlParser.dump, 0, 300);
std::string::size_type last = snippet.find_last_of('.');
if (last == snippet.npos)
last = snippet.find_last_of(' ');
if (last != snippet.npos)
snippet = snippet.substr(0, last);
token.snippet = snippet;
/* size */
stringstream sizeStringStream;
sizeStringStream << token.content.size() / 1024;
token.size = sizeStringStream.str();
/* Remove accent */
token.title = kiwix::removeAccents(token.accentedTitle);
token.keywords = kiwix::removeAccents(htmlParser.keywords);
token.content = kiwix::removeAccents(htmlParser.dump);
self->pushToIndexQueue(token);
}
/* Test if the thread should be cancelled */
pthread_testcancel();
}
self->articleParserRunning(false);
pthread_exit(NULL);
return NULL;
}
void Indexer::articleParserRunning(bool value) {
pthread_mutex_lock(&articleParserRunningMutex);
this->articleParserRunningFlag = value;
pthread_mutex_unlock(&articleParserRunningMutex);
}
bool Indexer::isArticleParserRunning() {
pthread_mutex_lock(&articleParserRunningMutex);
bool retVal = this->articleParserRunningFlag;
pthread_mutex_unlock(&articleParserRunningMutex);
return retVal;
}
#pragma mark - Indexer
/* Article indexer methods */
void *Indexer::indexArticles(void *ptr) {
pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
kiwix::Indexer *self = (kiwix::Indexer *)ptr;
unsigned int indexedArticleCount = 0;
indexerToken token;
self->indexingPrelude(self->getIndexPath());
while (self->popFromToIndexQueue(token)) {
self->index(token.url,
token.accentedTitle,
token.title,
token.keywords,
token.content,
token.snippet,
token.size,
token.wordCount
);
indexedArticleCount += 1;
/* Make a hard-disk flush every 10.000 articles */
if (indexedArticleCount % 5000 == 0) {
self->flush();
}
/* Test if the thread should be cancelled */
pthread_testcancel();
}
self->indexingPostlude(self->getIndexPath());
/* Write content id file */
string path = appendToDirectory(self->getIndexPath(), "content.id");
writeTextFile(path, self->getZimId());
self->setProgression(100);
kiwix::sleep(100);
self->articleIndexerRunning(false);
pthread_exit(NULL);
return NULL;
}
void Indexer::articleIndexerRunning(bool value) {
pthread_mutex_lock(&articleIndexerRunningMutex);
this->articleIndexerRunningFlag = value;
pthread_mutex_unlock(&articleIndexerRunningMutex);
}
bool Indexer::isArticleIndexerRunning() {
pthread_mutex_lock(&articleIndexerRunningMutex);
bool retVal = this->articleIndexerRunningFlag;
pthread_mutex_unlock(&articleIndexerRunningMutex);
return retVal;
}
#pragma mark - Parse Queue
/* ToParseQueue methods */
bool Indexer::isToParseQueueEmpty() {
pthread_mutex_lock(&toParseQueueMutex);
bool retVal = this->toParseQueue.empty();
pthread_mutex_unlock(&toParseQueueMutex);
return retVal;
}
void Indexer::pushToParseQueue(indexerToken &token) {
pthread_mutex_lock(&toParseQueueMutex);
this->toParseQueue.push(token);
pthread_mutex_unlock(&toParseQueueMutex);
kiwix::sleep(int(this->toParseQueue.size() / 200) / 10 * 1000);
}
bool Indexer::popFromToParseQueue(indexerToken &token) {
while (this->isToParseQueueEmpty() && this->isArticleExtractorRunning()) {
kiwix::sleep(500);
if (this->getVerboseFlag()) {
std::cout << "Waiting... ToParseQueue is empty for now..." << std::endl;
}
pthread_testcancel();
}
if (!this->isToParseQueueEmpty()) {
pthread_mutex_lock(&toParseQueueMutex);
token = this->toParseQueue.front();
this->toParseQueue.pop();
pthread_mutex_unlock(&toParseQueueMutex);
} else {
return false;
}
return true;
}
#pragma mark - Index Queue
/* ToIndexQueue methods */
bool Indexer::isToIndexQueueEmpty() {
pthread_mutex_lock(&toIndexQueueMutex);
bool retVal = this->toIndexQueue.empty();
pthread_mutex_unlock(&toIndexQueueMutex);
return retVal;
}
void Indexer::pushToIndexQueue(indexerToken &token) {
pthread_mutex_lock(&toIndexQueueMutex);
this->toIndexQueue.push(token);
pthread_mutex_unlock(&toIndexQueueMutex);
kiwix::sleep(int(this->toIndexQueue.size() / 200) / 10 * 1000);
}
bool Indexer::popFromToIndexQueue(indexerToken &token) {
while (this->isToIndexQueueEmpty() && this->isArticleParserRunning()) {
kiwix::sleep(500);
if (this->getVerboseFlag()) {
std::cout << "Waiting... ToIndexQueue is empty for now..." << std::endl;
}
pthread_testcancel();
}
if (!this->isToIndexQueueEmpty()) {
pthread_mutex_lock(&toIndexQueueMutex);
token = this->toIndexQueue.front();
this->toIndexQueue.pop();
pthread_mutex_unlock(&toIndexQueueMutex);
} else {
return false;
}
return true;
}
#pragma mark - Properties Getter & Setter
/* ZIM & Index methods */
void Indexer::setZimPath(const string path) {
pthread_mutex_lock(&zimPathMutex);
this->zimPath = path;
pthread_mutex_unlock(&zimPathMutex);
}
string Indexer::getZimPath() {
pthread_mutex_lock(&zimPathMutex);
string retVal = this->zimPath;
pthread_mutex_unlock(&zimPathMutex);
return retVal;
}
void Indexer::setIndexPath(const string path) {
pthread_mutex_lock(&indexPathMutex);
this->indexPath = path;
pthread_mutex_unlock(&indexPathMutex);
}
string Indexer::getIndexPath() {
pthread_mutex_lock(&indexPathMutex);
string retVal = this->indexPath;
pthread_mutex_unlock(&indexPathMutex);
return retVal;
}
void Indexer::setArticleCount(const unsigned int articleCount) {
pthread_mutex_lock(&articleCountMutex);
this->articleCount = articleCount;
pthread_mutex_unlock(&articleCountMutex);
}
unsigned int Indexer::getArticleCount() {
pthread_mutex_lock(&articleCountMutex);
unsigned int retVal = this->articleCount;
pthread_mutex_unlock(&articleCountMutex);
return retVal;
}
void Indexer::setProgression(const unsigned int progression) {
pthread_mutex_lock(&progressionMutex);
this->progression = progression;
pthread_mutex_unlock(&progressionMutex);
}
unsigned int Indexer::getProgression() {
pthread_mutex_lock(&progressionMutex);
unsigned int retVal = this->progression;
pthread_mutex_unlock(&progressionMutex);
return retVal;
}
void Indexer::setZimId(const string id) {
pthread_mutex_lock(&zimIdMutex);
this->zimId = id;
pthread_mutex_unlock(&zimIdMutex);
}
string Indexer::getZimId() {
pthread_mutex_lock(&zimIdMutex);
string retVal = this->zimId;
pthread_mutex_unlock(&zimIdMutex);
return retVal;
}
#pragma mark - Status Management
/* Manage */
bool Indexer::start(const string zimPath, const string indexPath, ProgressCallback callback) {
if (this->getVerboseFlag()) {
std::cout << "Indexing of '" << zimPath << "' starting..." <<std::endl;
}
if (callback) {
this->progressCallback = callback;
}
this->setArticleCount(0);
this->setProgression(0);
this->setZimPath(zimPath);
this->setIndexPath(indexPath);
pthread_mutex_lock(&threadIdsMutex);
this->articleExtractorRunning(true);
pthread_create(&(this->articleExtractor), NULL, Indexer::extractArticles, (void*)this);
pthread_detach(this->articleExtractor);
while(this->isArticleExtractorRunning() && this->getArticleCount() == 0) {
kiwix::sleep(100);
}
this->articleParserRunning(true);
pthread_create(&(this->articleParser), NULL, Indexer::parseArticles, (void*)this);
pthread_detach(this->articleParser);
this->articleIndexerRunning(true);
pthread_create(&(this->articleIndexer), NULL, Indexer::indexArticles, (void*)this);
pthread_detach(this->articleIndexer);
pthread_mutex_unlock(&threadIdsMutex);
return true;
}
bool Indexer::isRunning() {
if (this->getVerboseFlag()) {
std::cout << "isArticleExtractor running: " << (this->isArticleExtractorRunning() ? "yes" : "no") << std::endl;
std::cout << "isArticleParser running: " << (this->isArticleParserRunning() ? "yes" : "no") << std::endl;
std::cout << "isArticleIndexer running: " << (this->isArticleIndexerRunning() ? "yes" : "no") << std::endl;
}
return this->isArticleExtractorRunning() || this->isArticleIndexerRunning() || this->isArticleParserRunning();
}
bool Indexer::stop() {
if (this->isRunning()) {
bool isArticleExtractorRunning = this->isArticleExtractorRunning();
bool isArticleIndexerRunning = this->isArticleIndexerRunning();
bool isArticleParserRunning = this->isArticleParserRunning();
pthread_mutex_lock(&threadIdsMutex);
if (isArticleIndexerRunning) {
pthread_cancel(this->articleIndexer);
this->articleIndexerRunning(false);
}
if (isArticleParserRunning) {
pthread_cancel(this->articleParser);
this->articleParserRunning(false);
}
if (isArticleExtractorRunning) {
pthread_cancel(this->articleExtractor);
this->articleExtractorRunning(false);
}
pthread_mutex_unlock(&threadIdsMutex);
}
return true;
}
#pragma mark - verbose
/* Manage the verboseFlag */
void Indexer::setVerboseFlag(const bool value) {
pthread_mutex_lock(&verboseMutex);
this->verboseFlag = value;
pthread_mutex_unlock(&verboseMutex);
}
bool Indexer::getVerboseFlag() {
bool value;
pthread_mutex_lock(&verboseMutex);
value = this->verboseFlag;
pthread_mutex_unlock(&verboseMutex);
return value;
}
}

View File

@@ -19,125 +19,136 @@
#include "library.h"
namespace kiwix {
namespace kiwix
{
/* Constructor */
Book::Book() : readOnly(false)
{
}
/* Destructor */
Book::~Book()
{
}
/* Sort functions */
bool Book::sortByLastOpen(const kiwix::Book& a, const kiwix::Book& b)
{
return atoi(a.last.c_str()) > atoi(b.last.c_str());
}
/* Constructor */
Book::Book():
readOnly(false) {
}
/* Destructor */
Book::~Book() {
}
bool Book::sortByTitle(const kiwix::Book& a, const kiwix::Book& b)
{
return strcmp(a.title.c_str(), b.title.c_str()) < 0;
}
/* Sort functions */
bool Book::sortByLastOpen(const kiwix::Book &a, const kiwix::Book &b) {
return atoi(a.last.c_str()) > atoi(b.last.c_str());
}
bool Book::sortByDate(const kiwix::Book& a, const kiwix::Book& b)
{
return strcmp(a.date.c_str(), b.date.c_str()) > 0;
}
bool Book::sortByTitle(const kiwix::Book &a, const kiwix::Book &b) {
return strcmp(a.title.c_str(), b.title.c_str()) < 0;
}
bool Book::sortBySize(const kiwix::Book& a, const kiwix::Book& b)
{
return atoi(a.size.c_str()) < atoi(b.size.c_str());
}
bool Book::sortByDate(const kiwix::Book &a, const kiwix::Book &b) {
return strcmp(a.date.c_str(), b.date.c_str()) > 0;
}
bool Book::sortByPublisher(const kiwix::Book& a, const kiwix::Book& b)
{
return strcmp(a.publisher.c_str(), b.publisher.c_str()) < 0;
}
bool Book::sortBySize(const kiwix::Book &a, const kiwix::Book &b) {
return atoi(a.size.c_str()) < atoi(b.size.c_str());
}
bool Book::sortByCreator(const kiwix::Book& a, const kiwix::Book& b)
{
return strcmp(a.creator.c_str(), b.creator.c_str()) < 0;
}
bool Book::sortByPublisher(const kiwix::Book &a, const kiwix::Book &b) {
return strcmp(a.publisher.c_str(), b.publisher.c_str()) < 0;
}
bool Book::sortByLanguage(const kiwix::Book& a, const kiwix::Book& b)
{
return strcmp(a.language.c_str(), b.language.c_str()) < 0;
}
bool Book::sortByCreator(const kiwix::Book &a, const kiwix::Book &b) {
return strcmp(a.creator.c_str(), b.creator.c_str()) < 0;
}
bool Book::sortByLanguage(const kiwix::Book &a, const kiwix::Book &b) {
return strcmp(a.language.c_str(), b.language.c_str()) < 0;
}
std::string Book::getHumanReadableIdFromPath() {
std::string id = pathAbsolute;
if (!id.empty()) {
kiwix::removeAccents(id);
std::string Book::getHumanReadableIdFromPath()
{
std::string id = pathAbsolute;
if (!id.empty()) {
kiwix::removeAccents(id);
#ifdef _WIN32
id = replaceRegex(id, "", "^.*\\\\");
id = replaceRegex(id, "", "^.*\\\\");
#else
id = replaceRegex(id, "", "^.*/");
id = replaceRegex(id, "", "^.*/");
#endif
id = replaceRegex(id, "", "\\.zim[a-z]*$");
id = replaceRegex(id, "_", " ");
id = replaceRegex(id, "plus", "\\+");
}
return id;
id = replaceRegex(id, "", "\\.zim[a-z]*$");
id = replaceRegex(id, "_", " ");
id = replaceRegex(id, "plus", "\\+");
}
/* Constructor */
Library::Library():
version(KIWIX_LIBRARY_VERSION) {
}
/* Destructor */
Library::~Library() {
}
bool Library::addBook(const Book &book) {
/* Try to find it */
std::vector<kiwix::Book>::iterator itr;
for ( itr = this->books.begin(); itr != this->books.end(); ++itr ) {
if (itr->id == book.id) {
if (!itr->readOnly) {
itr->readOnly = book.readOnly;
if (itr->path.empty())
itr->path = book.path;
if (itr->pathAbsolute.empty())
itr->pathAbsolute = book.pathAbsolute;
if (itr->url.empty())
itr->url = book.url;
if (itr->tags.empty())
itr->tags = book.tags;
if (itr->name.empty())
itr->name = book.name;
if (itr->indexPath.empty()) {
itr->indexPath = book.indexPath;
itr->indexType = book.indexType;
}
if (itr->indexPathAbsolute.empty()) {
itr->indexPathAbsolute = book.indexPathAbsolute;
itr->indexType = book.indexType;
}
if (itr->faviconMimeType.empty()) {
itr->favicon = book.favicon;
itr->faviconMimeType = book.faviconMimeType;
}
}
return false;
}
}
/* otherwise */
this->books.push_back(book);
return true;
}
bool Library::removeBookByIndex(const unsigned int bookIndex) {
books.erase(books.begin()+bookIndex);
return true;
}
return id;
}
/* Constructor */
Library::Library() : version(KIWIX_LIBRARY_VERSION)
{
}
/* Destructor */
Library::~Library()
{
}
bool Library::addBook(const Book& book)
{
/* Try to find it */
std::vector<kiwix::Book>::iterator itr;
for (itr = this->books.begin(); itr != this->books.end(); ++itr) {
if (itr->id == book.id) {
if (!itr->readOnly) {
itr->readOnly = book.readOnly;
if (itr->path.empty()) {
itr->path = book.path;
}
if (itr->pathAbsolute.empty()) {
itr->pathAbsolute = book.pathAbsolute;
}
if (itr->url.empty()) {
itr->url = book.url;
}
if (itr->tags.empty()) {
itr->tags = book.tags;
}
if (itr->name.empty()) {
itr->name = book.name;
}
if (itr->indexPath.empty()) {
itr->indexPath = book.indexPath;
itr->indexType = book.indexType;
}
if (itr->indexPathAbsolute.empty()) {
itr->indexPathAbsolute = book.indexPathAbsolute;
itr->indexType = book.indexType;
}
if (itr->faviconMimeType.empty()) {
itr->favicon = book.favicon;
itr->faviconMimeType = book.faviconMimeType;
}
}
return false;
}
}
/* otherwise */
this->books.push_back(book);
return true;
}
bool Library::removeBookByIndex(const unsigned int bookIndex)
{
books.erase(books.begin() + bookIndex);
return true;
}
}

View File

File diff suppressed because it is too large Load Diff

View File

@@ -16,17 +16,16 @@ kiwix_sources += lib_resources
if xapian_dep.found()
kiwix_sources += ['xapianSearcher.cpp']
if not get_option('android')
kiwix_sources += ['xapianIndexer.cpp']
endif
endif
if not get_option('android')
kiwix_sources += ['indexer.cpp']
else
if get_option('android')
subdir('android')
install_dir = 'kiwix-lib/jniLibs/' + meson.get_cross_property('android_abi')
else
install_dir = get_option('libdir')
endif
if has_ctpp2_dep
kiwix_sources += ['ctpp2/CTPP2VMStringLoader.cpp']
endif
@@ -40,5 +39,6 @@ kiwixlib = library('kiwix',
kiwix_sources,
include_directories : inc,
dependencies : all_deps,
version: '1.0.0',
install : true)
version: meson.project_version(),
install: true,
install_dir: install_dir)

View File

File diff suppressed because it is too large Load Diff

View File

@@ -19,6 +19,10 @@
#include "searcher.h"
#include "kiwixlib-resources.h"
#include "reader.h"
#include "xapianSearcher.h"
#include <zim/search.h>
#ifdef ENABLE_CTPP2
#include <ctpp2/CDT.hpp>
@@ -29,194 +33,359 @@
using namespace CTPP;
#endif
namespace kiwix
{
class _Result : public Result
{
public:
_Result(Searcher* searcher, zim::Search::iterator& iterator);
virtual ~_Result(){};
namespace kiwix {
virtual std::string get_url();
virtual std::string get_title();
virtual int get_score();
virtual std::string get_snippet();
virtual std::string get_content();
virtual int get_wordCount();
virtual int get_size();
virtual int get_readerIndex();
/* Constructor */
Searcher::Searcher() :
searchPattern(""),
protocolPrefix("zim://"),
searchProtocolPrefix("search://?"),
resultCountPerPage(0),
estimatedResultCount(0),
resultStart(0),
resultEnd(0)
private:
Searcher* searcher;
zim::Search::iterator iterator;
};
struct SearcherInternal {
const zim::Search* _search;
XapianSearcher* _xapianSearcher;
zim::Search::iterator current_iterator;
SearcherInternal() : _search(NULL), _xapianSearcher(NULL) {}
~SearcherInternal()
{
template_ct2 = RESOURCE::results_ct2;
loadICUExternalTables();
if (_search != NULL) {
delete _search;
}
if (_xapianSearcher != NULL) {
delete _xapianSearcher;
}
}
/* Destructor */
Searcher::~Searcher() {}
/* Search strings in the database */
void Searcher::search(std::string &search, unsigned int resultStart,
unsigned int resultEnd, const bool verbose) {
this->reset();
};
if (verbose == true) {
cout << "Performing query `" << search << "'" << endl;
/* Constructor */
Searcher::Searcher(const string& xapianDirectoryPath,
Reader* reader,
const string& humanReadableName)
: internal(new SearcherInternal()),
searchPattern(""),
protocolPrefix("zim://"),
searchProtocolPrefix("search://?"),
resultCountPerPage(0),
estimatedResultCount(0),
resultStart(0),
resultEnd(0)
{
loadICUExternalTables();
if (!reader || !reader->hasFulltextIndex()) {
internal->_xapianSearcher = new XapianSearcher(xapianDirectoryPath, reader);
}
this->contentHumanReadableId = humanReadableName;
this->humanReaderNames.push_back(humanReadableName);
}
Searcher::Searcher()
: internal(new SearcherInternal()),
searchPattern(""),
protocolPrefix("zim://"),
searchProtocolPrefix("search://?"),
resultCountPerPage(0),
estimatedResultCount(0),
resultStart(0),
resultEnd(0)
{
loadICUExternalTables();
}
/* Destructor */
Searcher::~Searcher()
{
delete internal;
}
void Searcher::add_reader(Reader* reader, const std::string& humanReadableName)
{
this->readers.push_back(reader);
this->humanReaderNames.push_back(humanReadableName);
}
/* Search strings in the database */
void Searcher::search(std::string& search,
unsigned int resultStart,
unsigned int resultEnd,
const bool verbose)
{
this->reset();
if (verbose == true) {
cout << "Performing query `" << search << "'" << endl;
}
/* If resultEnd & resultStart inverted */
if (resultStart > resultEnd) {
resultEnd += resultStart;
resultStart = resultEnd - resultStart;
resultEnd -= resultStart;
}
/* Try to find results */
if (resultStart != resultEnd) {
/* Avoid big researches */
this->resultCountPerPage = resultEnd - resultStart;
if (this->resultCountPerPage > 70) {
resultEnd = resultStart + 70;
this->resultCountPerPage = 70;
}
/* If resultEnd & resultStart inverted */
if (resultStart > resultEnd) {
resultEnd += resultStart;
resultStart = resultEnd - resultStart;
resultEnd -= resultStart;
}
/* Try to find results */
if (resultStart != resultEnd) {
/* Avoid big researches */
this->resultCountPerPage = resultEnd - resultStart;
if (this->resultCountPerPage > 70) {
resultEnd = resultStart + 70;
this->resultCountPerPage = 70;
/* Perform the search */
this->searchPattern = search;
this->resultStart = resultStart;
this->resultEnd = resultEnd;
string unaccentedSearch = removeAccents(search);
if (internal->_xapianSearcher) {
internal->_xapianSearcher->searchInIndex(
unaccentedSearch, resultStart, resultEnd, verbose);
this->estimatedResultCount
= internal->_xapianSearcher->results.get_matches_estimated();
} else {
std::vector<const zim::File*> zims;
for (auto current = this->readers.begin(); current != this->readers.end();
current++) {
zims.push_back((*current)->getZimFileHandler());
}
/* Perform the search */
this->searchPattern = search;
this->resultStart = resultStart;
this->resultEnd = resultEnd;
string unaccentedSearch = removeAccents(search);
searchInIndex(unaccentedSearch, resultStart, resultEnd, verbose);
this->resultOffset = this->results.begin();
zim::Search* search = new zim::Search(zims);
search->set_query(unaccentedSearch);
search->set_range(resultStart, resultEnd);
internal->_search = search;
internal->current_iterator = internal->_search->begin();
this->estimatedResultCount = internal->_search->get_matches_estimated();
}
return;
}
/* Reset the results */
void Searcher::reset() {
this->results.clear();
this->resultOffset = this->results.begin();
return;
}
void Searcher::restart_search()
{
if (internal->_xapianSearcher) {
internal->_xapianSearcher->restart_search();
} else {
internal->current_iterator = internal->_search->begin();
}
}
Result* Searcher::getNextResult()
{
if (internal->_xapianSearcher) {
return internal->_xapianSearcher->getNextResult();
} else if (internal->current_iterator != internal->_search->end()) {
Result* result = new _Result(this, internal->current_iterator);
internal->current_iterator++;
return result;
}
return NULL;
}
/* Reset the results */
void Searcher::reset()
{
this->estimatedResultCount = 0;
this->searchPattern = "";
return;
}
void Searcher::suggestions(std::string& search, const bool verbose)
{
this->reset();
if (verbose == true) {
cout << "Performing suggestion query `" << search << "`" << endl;
}
this->searchPattern = search;
this->resultStart = 0;
this->resultEnd = 10;
string unaccentedSearch = removeAccents(search);
if (internal->_xapianSearcher) {
/* [TODO] Suggestion on a external database ?
* We do not support that. */
this->estimatedResultCount = 0;
this->searchPattern = "";
return;
}
/* Return the result count estimation */
unsigned int Searcher::getEstimatedResultCount() {
return this->estimatedResultCount;
}
/* Get next result */
bool Searcher::getNextResult(string &url, string &title, unsigned int &score) {
bool retVal = false;
if (this->resultOffset != this->results.end()) {
/* url */
url = this->resultOffset->url;
/* title */
title = this->resultOffset->title;
/* score */
score = this->resultOffset->score;
/* increment the cursor for the next call */
this->resultOffset++;
retVal = true;
} else {
std::vector<const zim::File*> zims;
for (auto current = this->readers.begin(); current != this->readers.end();
current++) {
zims.push_back((*current)->getZimFileHandler());
}
return retVal;
zim::Search* search = new zim::Search(zims);
search->set_query(unaccentedSearch);
search->set_range(resultStart, resultEnd);
search->set_suggestion_mode(true);
internal->_search = search;
internal->current_iterator = internal->_search->begin();
this->estimatedResultCount = internal->_search->get_matches_estimated();
}
}
bool Searcher::setProtocolPrefix(const std::string prefix) {
this->protocolPrefix = prefix;
return true;
/* Return the result count estimation */
unsigned int Searcher::getEstimatedResultCount()
{
return this->estimatedResultCount;
}
bool Searcher::setProtocolPrefix(const std::string prefix)
{
this->protocolPrefix = prefix;
return true;
}
bool Searcher::setSearchProtocolPrefix(const std::string prefix)
{
this->searchProtocolPrefix = prefix;
return true;
}
_Result::_Result(Searcher* searcher, zim::Search::iterator& iterator)
: searcher(searcher), iterator(iterator)
{
}
std::string _Result::get_url()
{
return iterator.get_url();
}
std::string _Result::get_title()
{
return iterator.get_title();
}
int _Result::get_score()
{
return iterator.get_score();
}
std::string _Result::get_snippet()
{
return iterator.get_snippet();
}
std::string _Result::get_content()
{
if (iterator->good()) {
return iterator->getData();
}
bool Searcher::setSearchProtocolPrefix(const std::string prefix) {
this->searchProtocolPrefix = prefix;
return true;
}
void Searcher::setContentHumanReadableId(const string &contentHumanReadableId) {
this->contentHumanReadableId = contentHumanReadableId;
}
return "";
}
int _Result::get_size()
{
return iterator.get_size();
}
int _Result::get_wordCount()
{
return iterator.get_wordCount();
}
int _Result::get_readerIndex()
{
return iterator.get_fileIndex();
}
#ifdef ENABLE_CTPP2
string Searcher::getHtml() {
SimpleVM oSimpleVM;
string Searcher::getHtml()
{
SimpleVM oSimpleVM;
// Fill data
CDT oData;
CDT resultsCDT(CDT::ARRAY_VAL);
// Fill data
CDT oData;
CDT resultsCDT(CDT::ARRAY_VAL);
this->resultOffset = this->results.begin();
while (this->resultOffset != this->results.end()) {
CDT result;
result["title"] = this->resultOffset->title;
result["url"] = this->resultOffset->url;
result["snippet"] = this->resultOffset->snippet;
this->restart_search();
Result* p_result = NULL;
while ((p_result = this->getNextResult())) {
CDT result;
result["title"] = p_result->get_title();
result["url"] = p_result->get_url();
result["snippet"] = p_result->get_snippet();
result["contentId"] = humanReaderNames[p_result->get_readerIndex()];
if (this->resultOffset->size >= 0)
result["size"] = kiwix::beautifyInteger(this->resultOffset->size);
if (this->resultOffset->wordCount >= 0)
result["wordCount"] = kiwix::beautifyInteger(this->resultOffset->wordCount);
resultsCDT.PushBack(result);
this->resultOffset++;
if (p_result->get_size() >= 0) {
result["size"] = kiwix::beautifyInteger(p_result->get_size());
}
this->resultOffset = this->results.begin();
oData["results"] = resultsCDT;
// pages
CDT pagesCDT(CDT::ARRAY_VAL);
unsigned int pageStart = this->resultStart / this->resultCountPerPage >= 5 ? this->resultStart / this->resultCountPerPage - 4 : 0;
unsigned int pageCount = this->estimatedResultCount / this->resultCountPerPage + 1 - pageStart;
if (pageCount > 10)
pageCount = 10;
else if (pageCount == 1)
pageCount = 0;
for (unsigned int i=pageStart; i<pageStart+pageCount; i++) {
CDT page;
page["label"] = i + 1;
page["start"] = i * this->resultCountPerPage;
page["end"] = (i+1) * this->resultCountPerPage;
if (i * this->resultCountPerPage == this->resultStart)
page["selected"] = true;
pagesCDT.PushBack(page);
if (p_result->get_wordCount() >= 0) {
result["wordCount"] = kiwix::beautifyInteger(p_result->get_wordCount());
}
oData["pages"] = pagesCDT;
oData["count"] = kiwix::beautifyInteger(this->estimatedResultCount);
oData["searchPattern"] = kiwix::encodeDiples(this->searchPattern);
oData["searchPatternEncoded"] = urlEncode(this->searchPattern);
oData["resultStart"] = this->resultStart + 1;
oData["resultEnd"] = (this->resultEnd > this->estimatedResultCount ? this->estimatedResultCount : this->resultEnd);
oData["resultRange"] = this->resultCountPerPage;
oData["resultLastPageStart"] = this->estimatedResultCount > this->resultCountPerPage ? this->estimatedResultCount - this->resultCountPerPage : 0;
oData["protocolPrefix"] = this->protocolPrefix;
oData["searchProtocolPrefix"] = this->searchProtocolPrefix;
oData["contentId"] = this->contentHumanReadableId;
VMStringLoader oLoader(template_ct2.c_str(), template_ct2.size());
FileLogger oLogger(stderr);
// DEBUG only (write output to stdout)
// oSimpleVM.Run(oData, oLoader, stdout, oLogger);
std::string sResult;
oSimpleVM.Run(oData, oLoader, sResult, oLogger);
return sResult;
resultsCDT.PushBack(result);
delete p_result;
}
this->restart_search();
oData["results"] = resultsCDT;
// pages
CDT pagesCDT(CDT::ARRAY_VAL);
unsigned int pageStart
= this->resultStart / this->resultCountPerPage >= 5
? this->resultStart / this->resultCountPerPage - 4
: 0;
unsigned int pageCount
= this->estimatedResultCount / this->resultCountPerPage + 1 - pageStart;
if (pageCount > 10) {
pageCount = 10;
} else if (pageCount == 1) {
pageCount = 0;
}
for (unsigned int i = pageStart; i < pageStart + pageCount; i++) {
CDT page;
page["label"] = i + 1;
page["start"] = i * this->resultCountPerPage;
page["end"] = (i + 1) * this->resultCountPerPage;
if (i * this->resultCountPerPage == this->resultStart) {
page["selected"] = true;
}
pagesCDT.PushBack(page);
}
oData["pages"] = pagesCDT;
oData["count"] = kiwix::beautifyInteger(this->estimatedResultCount);
oData["searchPattern"] = kiwix::encodeDiples(this->searchPattern);
oData["searchPatternEncoded"] = urlEncode(this->searchPattern);
oData["resultStart"] = this->resultStart + 1;
oData["resultEnd"] = (this->resultEnd > this->estimatedResultCount
? this->estimatedResultCount
: this->resultEnd);
oData["resultRange"] = this->resultCountPerPage;
oData["resultLastPageStart"]
= this->estimatedResultCount > this->resultCountPerPage
? this->estimatedResultCount - this->resultCountPerPage
: 0;
oData["protocolPrefix"] = this->protocolPrefix;
oData["searchProtocolPrefix"] = this->searchProtocolPrefix;
oData["contentId"] = this->contentHumanReadableId;
std::string template_ct2 = RESOURCE::results_ct2;
VMStringLoader oLoader(template_ct2.c_str(), template_ct2.size());
FileLogger oLogger(stderr);
// DEBUG only (write output to stdout)
// oSimpleVM.Run(oData, oLoader, stdout, oLogger);
std::string sResult;
oSimpleVM.Run(oData, oLoader, sResult, oLogger);
return sResult;
}
#endif
}

View File

@@ -1,111 +0,0 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "xapianIndexer.h"
namespace kiwix {
/* Constructor */
XapianIndexer::XapianIndexer() {
/*
stemmer(Xapian::Stem("french")) {
this->indexer.set_stemmer(this->stemmer);
*/
}
void XapianIndexer::indexingPrelude(const string indexPath) {
this->writableDatabase = Xapian::WritableDatabase(indexPath+".tmp", Xapian::DB_CREATE_OR_OVERWRITE | Xapian::DB_BACKEND_GLASS);
this->writableDatabase.begin_transaction(true);
/* Insert the stopwords */
if (!this->stopWords.empty()) {
std::vector<std::string>::iterator it = this->stopWords.begin();
for( ; it != this->stopWords.end(); ++it) {
this->stopper.add(*it);
}
this->indexer.set_stopper(&(this->stopper));
}
}
void XapianIndexer::index(const string &url,
const string &title,
const string &unaccentedTitle,
const string &keywords,
const string &content,
const string &snippet,
const string &size,
const string &wordCount) {
/* Put the data in the document */
Xapian::Document currentDocument;
currentDocument.clear_values();
currentDocument.add_value(0, title);
currentDocument.add_value(1, snippet);
currentDocument.add_value(2, size);
currentDocument.add_value(3, wordCount);
currentDocument.set_data(url);
indexer.set_document(currentDocument);
/* Index the title */
if (!unaccentedTitle.empty()) {
this->indexer.index_text_without_positions(unaccentedTitle, this->getTitleBoostFactor(content.size()));
}
/* Index the keywords */
if (!keywords.empty()) {
this->indexer.index_text_without_positions(keywords, keywordsBoostFactor);
}
/* Index the content */
if (!content.empty()) {
this->indexer.index_text_without_positions(content);
}
/* add to the database */
this->writableDatabase.add_document(currentDocument);
}
void XapianIndexer::flush() {
this->writableDatabase.commit_transaction();
this->writableDatabase.begin_transaction(true);
}
void XapianIndexer::indexingPostlude(const string indexPath) {
this->flush();
this->writableDatabase.commit_transaction();
#ifdef _WIN32
this->writableDatabase.close();
#endif
/* Compacting the index */
Xapian::Compactor compactor;
try {
Xapian::Database src;
src.add_database(Xapian::Database(indexPath+".tmp"));
src.compact(indexPath, Xapian::Compactor::FULL | Xapian::DBCOMPACT_SINGLE_FILE, 0, compactor);
} catch (const Xapian::Error &error) {
cerr << indexPath << ": " << error.get_description() << endl;
exit(1);
} catch (const char * msg) {
cerr << indexPath << ": " << msg << endl;
exit(1);
}
}
}

View File

@@ -18,82 +18,217 @@
*/
#include "xapianSearcher.h"
#include <zim/zim.h>
#include <zim/file.h>
#include <sys/types.h>
#include <unicode/locid.h>
#include <unistd.h>
#include <zim/article.h>
#include <zim/error.h>
#include <sys/types.h>
#include <unistd.h>
#include <zim/file.h>
#include <zim/zim.h>
#include "xapian/myhtmlparse.h"
namespace kiwix {
#include <vector>
/* Constructor */
XapianSearcher::XapianSearcher(const string &xapianDirectoryPath)
: Searcher(),
stemmer(Xapian::Stem("english")) {
this->openIndex(xapianDirectoryPath);
namespace kiwix
{
std::map<std::string, int> read_valuesmap(const std::string& s)
{
std::map<std::string, int> result;
std::vector<std::string> elems = split(s, ";");
for (std::vector<std::string>::iterator elem = elems.begin();
elem != elems.end();
elem++) {
std::vector<std::string> tmp_elems = split(*elem, ":");
result.insert(
std::pair<std::string, int>(tmp_elems[0], atoi(tmp_elems[1].c_str())));
}
return result;
}
/* Open Xapian readable database */
void XapianSearcher::openIndex(const string &directoryPath) {
try
{
zim::File zimFile = zim::File(directoryPath);
zim::Article xapianArticle = zimFile.getArticle('Z', "/fulltextIndex/xapian");
if (!xapianArticle.good())
throw NoXapianIndexInZim();
zim::offset_type dbOffset = xapianArticle.getOffset();
int databasefd = open(directoryPath.c_str(), O_RDONLY);
lseek(databasefd, dbOffset, SEEK_SET);
this->readableDatabase = Xapian::Database(databasefd);
/* Constructor */
XapianSearcher::XapianSearcher(const string& xapianDirectoryPath,
Reader* reader)
: reader(reader)
{
this->openIndex(xapianDirectoryPath);
}
/* Open Xapian readable database */
void XapianSearcher::openIndex(const string& directoryPath)
{
this->readableDatabase = Xapian::Database(directoryPath);
this->valuesmap
= read_valuesmap(this->readableDatabase.get_metadata("valuesmap"));
this->language = this->readableDatabase.get_metadata("language");
this->stopwords = this->readableDatabase.get_metadata("stopwords");
setup_queryParser();
}
/* Close Xapian writable database */
void XapianSearcher::closeIndex()
{
return;
}
void XapianSearcher::setup_queryParser()
{
queryParser.set_database(readableDatabase);
if (!language.empty()) {
/* Build ICU Local object to retrieve ISO-639 language code (from
ISO-639-3) */
icu::Locale languageLocale(language.c_str());
/* Configuring language base steemming */
try {
stemmer = Xapian::Stem(languageLocale.getLanguage());
queryParser.set_stemmer(stemmer);
queryParser.set_stemming_strategy(Xapian::QueryParser::STEM_ALL);
} catch (...) {
this->readableDatabase = Xapian::Database(directoryPath);
std::cout << "No steemming for language '" << languageLocale.getLanguage()
<< "'" << std::endl;
}
}
/* Close Xapian writable database */
void XapianSearcher::closeIndex() {
return;
}
/* Search strings in the database */
void XapianSearcher::searchInIndex(string &search, const unsigned int resultStart,
const unsigned int resultEnd, const bool verbose) {
/* Create the query */
Xapian::QueryParser queryParser;
Xapian::Query query = queryParser.parse_query(search);
/* Create the enquire object */
Xapian::Enquire enquire(this->readableDatabase);
enquire.set_query(query);
/* Get the results */
Xapian::MSet matches = enquire.get_mset(resultStart, resultEnd - resultStart);
Xapian::MSetIterator i;
for (i = matches.begin(); i != matches.end(); ++i) {
Xapian::Document doc = i.get_document();
Result result;
result.url = doc.get_data();
result.title = doc.get_value(0);
result.snippet = doc.get_value(1);
result.size = (doc.get_value(2).empty() == true ? -1 : atoi(doc.get_value(2).c_str()));
result.wordCount = (doc.get_value(3).empty() == true ? -1 : atoi(doc.get_value(3).c_str()));
result.score = i.get_percent();
this->results.push_back(result);
if (verbose) {
std::cout << "Document ID " << *i << " \t";
std::cout << i.get_percent() << "% ";
std::cout << "\t[" << doc.get_data() << "] - " << doc.get_value(0) << std::endl;
}
if (!stopwords.empty()) {
std::string stopWord;
std::istringstream file(this->stopwords);
while (std::getline(file, stopWord, '\n')) {
this->stopper.add(stopWord);
}
/* Update the global resultCount value*/
this->estimatedResultCount = matches.get_matches_estimated();
return;
queryParser.set_stopper(&(this->stopper));
}
}
/* Search strings in the database */
void XapianSearcher::searchInIndex(string& search,
const unsigned int resultStart,
const unsigned int resultEnd,
const bool verbose)
{
/* Create the query */
Xapian::Query query = queryParser.parse_query(search);
/* Create the enquire object */
Xapian::Enquire enquire(this->readableDatabase);
enquire.set_query(query);
/* Get the results */
this->results = enquire.get_mset(resultStart, resultEnd - resultStart);
this->current_result = this->results.begin();
}
/* Get next result */
Result* XapianSearcher::getNextResult()
{
if (this->current_result != this->results.end()) {
XapianResult* result = new XapianResult(this, this->current_result);
this->current_result++;
return result;
}
return NULL;
}
void XapianSearcher::restart_search()
{
this->current_result = this->results.begin();
}
XapianResult::XapianResult(XapianSearcher* searcher,
Xapian::MSetIterator& iterator)
: searcher(searcher), iterator(iterator), document(iterator.get_document())
{
}
std::string XapianResult::get_url()
{
return document.get_data();
}
std::string XapianResult::get_title()
{
if (searcher->valuesmap.empty()) {
/* This is the old legacy version. Guess and try */
return document.get_value(0);
} else if (searcher->valuesmap.find("title") != searcher->valuesmap.end()) {
return document.get_value(searcher->valuesmap["title"]);
}
return "";
}
int XapianResult::get_score()
{
return iterator.get_percent();
}
std::string XapianResult::get_snippet()
{
if (searcher->valuesmap.empty()) {
/* This is the old legacy version. Guess and try */
std::string stored_snippet = document.get_value(1);
if (!stored_snippet.empty()) {
return stored_snippet;
}
/* Let's continue here, and see if we can genenate one */
} else if (searcher->valuesmap.find("snippet") != searcher->valuesmap.end()) {
return document.get_value(searcher->valuesmap["snippet"]);
}
/* No reader, no snippet */
if (!searcher->reader) {
return "";
}
/* Get the content of the article to generate a snippet.
We parse it and use the html dump to avoid remove html tags in the
content and be able to nicely cut the text at random place. */
MyHtmlParser htmlParser;
std::string content = get_content();
if (content.empty()) {
return content;
}
try {
htmlParser.parse_html(content, "UTF-8", true);
} catch (...) {
}
return searcher->results.snippet(htmlParser.dump, 500);
}
std::string XapianResult::get_content()
{
if (!searcher->reader) {
return "";
}
std::string content;
std::string title;
unsigned int contentLength;
std::string contentType;
searcher->reader->getContentByUrl(
get_url(), content, title, contentLength, contentType);
return content;
}
int XapianResult::get_size()
{
if (searcher->valuesmap.empty()) {
/* This is the old legacy version. Guess and try */
return document.get_value(2).empty() == true
? -1
: atoi(document.get_value(2).c_str());
} else if (searcher->valuesmap.find("size") != searcher->valuesmap.end()) {
return atoi(document.get_value(searcher->valuesmap["size"]).c_str());
}
/* The size is never used. Do we really want to get the content and
calculate the size ? */
return -1;
}
int XapianResult::get_wordCount()
{
if (searcher->valuesmap.empty()) {
/* This is the old legacy version. Guess and try */
return document.get_value(3).empty() == true
? -1
: atoi(document.get_value(3).c_str());
} else if (searcher->valuesmap.find("wordcount")
!= searcher->valuesmap.end()) {
return atoi(document.get_value(searcher->valuesmap["wordcount"]).c_str());
}
return -1;
}
} // Kiwix namespace

View File

@@ -1,7 +1,27 @@
ctpp2c = find_program('ctpp2c', required:false)
if ctpp2c.found()
search_result_template = custom_target('result_template',
input: 'results.tmpl',
output: 'results.ct2',
command: [intermediate_ctpp2c, ctpp2c, '@INPUT@', '@OUTPUT@']
)
resources_list = 'resources_list_ctpp2.txt'
resources_depends = [search_result_template]
else
resources_list = 'resources_list_noctpp2.txt'
resources_depends = []
endif
lib_resources = custom_target('resources',
input: 'resources_list.txt',
input: resources_list,
output: ['kiwixlib-resources.cpp', 'kiwixlib-resources.h'],
command:[res_compiler, '--cxxfile', '@OUTPUT0@', '--hfile', '@OUTPUT1@', '@INPUT@']
)
command:[res_compiler,
'--cxxfile', '@OUTPUT0@',
'--hfile', '@OUTPUT1@',
'--source_dir', '@OUTDIR@',
'@INPUT@'],
depends: resources_depends
)

View File

@@ -0,0 +1,3 @@
stopwords/en
stopwords/he
stopwords/fra

View File

Binary file not shown.

159
static/results.tmpl Normal file
View File

@@ -0,0 +1,159 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="content-type" />
<style type="text/css">
body{
color: #00000;
font: small/normal Arial,Helvetica,Sans-Serif;
margin-top: 0.5em;
font-size: 90%;
}
a{
color: #04c;
}
a:visited {
color: #639
}
a:hover {
text-decoration: underline
}
.header {
font-size: 120%;
}
ul {
margin:0;
padding:0
}
.results {
font-size: 110%;
}
.results li {
list-style-type:none;
margin-top: 0.5em;
}
.results a {
font-size: 110%;
text-decoration: underline
}
cite {
font-style:normal;
word-wrap:break-word;
display: block;
font-size: 100%;
}
.informations {
color: #388222;
font-size: 100%;
}
.footer {
padding: 0;
margin-top: 1em;
width: 100%;
float: left
}
.footer a, .footer span {
display: block;
padding: .3em .7em;
margin: 0 .38em 0 0;
text-align:center;
text-decoration: none;
}
.footer a:hover {
background: #ededed;
}
.footer ul, .footer li {
list-style:none;
margin: 0;
padding: 0;
}
.footer li {
float: left;
}
.selected {
background: #ededed;
}
</style>
<title>Search: <TMPL_var searchPattern></title>
</head>
<body bgcolor="white">
<div class="header">
<TMPL_if results>
Results
<b>
<TMPL_var resultStart>-<TMPL_var resultEnd>
</b> of <b>
<TMPL_var count>
</b> for <b>
<TMPL_var searchPattern>
</b>
<TMPL_else>
No results were found for <b><TMPL_var searchPattern></b>
</TMPL_if>
</div>
<div class="results">
<ul>
<TMPL_foreach results as result>
<li>
<a href="<TMPL_var protocolPrefix><TMPL_var result.contentId>/<TMPL_var result.url>">
<TMPL_var result.title>
</a>
<cite>
<TMPL_if result.snippet>
<TMPL_var result.snippet>...
</TMPL_if>
</cite>
<TMPL_if wordCount>
<div class="informations"><TMPL_var wordCount> words</div>
</TMPL_if>
</li>
</TMPL_foreach>
</ul>
</div>
<div class="footer">
<ul>
<TMPL_if (resultLastPageStart>0)>
<li>
<a href="<TMPL_var searchProtocolPrefix>pattern=<TMPL_var searchPatternEncoded><TMPL_if contentId>&content=<TMPL_var contentId></TMPL_if>&start=0&end=<TMPL_var resultRange>">
</a>
</li>
</TMPL_if>
<TMPL_foreach pages as page>
<li>
<a <TMPL_if page.selected>class="selected"</TMPL_if>
href="<TMPL_var searchProtocolPrefix>pattern=<TMPL_var searchPatternEncoded><TMPL_if contentId>&content=<TMPL_var contentId></TMPL_if>&start=<TMPL_var page.start>&end=<TMPL_var page.end>">
<TMPL_var page.label>
</a>
</li>
</TMPL_foreach>
<TMPL_if (resultLastPageStart>0)>
<li>
<a href="<TMPL_var searchProtocolPrefix>pattern=<TMPL_var searchPatternEncoded><TMPL_if contentId>&content=<TMPL_var contentId></TMPL_if>&start=<TMPL_var resultLastPageStart>&end=<TMPL_var (resultLastPageStart+resultRange)>">
</a>
</li>
</TMPL_if>
</ul>
</div>
</body>
</html>

35
travis/compile.sh Executable file
View File

@@ -0,0 +1,35 @@
#!/usr/bin/env bash
set -e
BUILD_DIR=${HOME}/BUILD_${PLATFORM}
INSTALL_DIR=${BUILD_DIR}/INSTALL
case ${PLATFORM} in
"native_static")
MESON_OPTION="--default-library=static"
;;
"native_dyn")
MESON_OPTION="--default-library=shared"
;;
"win32_static")
MESON_OPTION="--default-library=static --cross-file ${BUILD_DIR}/meson_cross_file.txt"
;;
"win32_dyn")
MESON_OPTION="--default-library=shared --cross-file ${BUILD_DIR}/meson_cross_file.txt"
;;
"android_arm")
MESON_OPTION="-Dandroid=true --default-library=shared --cross-file ${BUILD_DIR}/meson_cross_file.txt"
;;
"android_arm64")
MESON_OPTION="-Dandroid=true --default-library=shared --cross-file ${BUILD_DIR}/meson_cross_file.txt"
;;
esac
cd ${TRAVIS_BUILD_DIR}
export PKG_CONFIG_PATH=${INSTALL_DIR}/lib/x86_64-linux-gnu/pkgconfig
meson . build -Dctpp2-install-prefix=${INSTALL_DIR} ${MESON_OPTION}
cd build
ninja

47
travis/install_deps.sh Executable file
View File

@@ -0,0 +1,47 @@
#!/usr/bin/env bash
set -e
REPO_NAME=${TRAVIS_REPO_SLUG#*/}
ARCHIVE_NAME=deps_${PLATFORM}_${REPO_NAME}.tar.gz
# Packages.
case ${PLATFORM} in
"native_static")
PACKAGES="gcc cmake libbz2-dev ccache zlib1g-dev uuid-dev libctpp2-dev ctpp2-utils"
;;
"native_dyn")
PACKAGES="gcc cmake libbz2-dev ccache zlib1g-dev uuid-dev libctpp2-dev ctpp2-utils libmicrohttpd-dev"
;;
"win32_static")
PACKAGES="g++-mingw-w64-i686 gcc-mingw-w64-i686 gcc-mingw-w64-base mingw-w64-tools ccache ctpp2-utils"
;;
"win32_dyn")
PACKAGES="g++-mingw-w64-i686 gcc-mingw-w64-i686 gcc-mingw-w64-base mingw-w64-tools ccache ctpp2-utils"
;;
"android_arm")
PACKAGES="gcc cmake ccache ctpp2-utils"
;;
"android_arm64")
PACKAGES="gcc cmake ccache ctpp2-utils"
;;
esac
sudo apt-get update -qq
sudo apt-get install -qq python3-pip ${PACKAGES}
sudo pip3 install meson
# Ninja
cd $HOME
git clone git://github.com/ninja-build/ninja.git
cd ninja
git checkout release
./configure.py --bootstrap
sudo cp ninja /bin
# Dependencies comming from kiwix-build.
cd ${HOME}
wget http://tmp.kiwix.org/ci/${ARCHIVE_NAME}
mkdir -p BUILD_${PLATFORM}
cd BUILD_${PLATFORM}
tar xf ${HOME}/${ARCHIVE_NAME}