Compare commits

29 Commits
1.1.0 ... 2.0.0

Author SHA1 Message Date
Matthieu Gautier
e9ab074b5d Merge pull request #136 from kiwix/2.0.0
2.0.0
2018-04-23 20:15:45 +02:00
Matthieu Gautier
45a000edaa New version 2.0.0 2018-04-23 18:06:49 +02:00
Matthieu Gautier
e216c44034 kiwix-lib needs libzim>=3.3.0 2018-04-23 18:06:49 +02:00
Matthieu Gautier
59661626e9 Merge pull request #135 from kiwix/update_README
Add dependency `libaria2` in the README.
2018-04-23 18:06:24 +02:00
Matthieu Gautier
6b0d2788aa Add dependency libaria2 in the README. 2018-04-23 17:40:21 +02:00
Matthieu Gautier
1b49c632b3 Merge pull request #123 from kiwix/new_api
New api
2018-04-23 17:07:45 +02:00
Chris Li
68665693c5 fixed some typos in the docs string 2018-04-19 18:04:07 +02:00
Matthieu Gautier
1dd828e79c Fix pathExists and check for correct path for xapian index.
The correct path for xapian database should be "X/fulltext/xapian",
not "Z//fulltextIndex/xapian".

So let's check for the right path and fall back to the wrong one (which
is used in old zims).

The double '/' in the path is a bug of zimwriterfs and is specific
to the xapian database.
We must handle this correctly in `hasFulltextIndex` and not (buggily) in
`pathExists`.
(Fortunately, it seems that pathExists was used only by hasFulltextIndex.)
2018-04-19 18:04:07 +02:00
Matthieu Gautier
135028c16a Introduce better API to manipulate entries in a zim file.
The previous API suffered from several problems:
- It was difficult to handle articles redirecting to other articles.
- It was not possible to get a small piece of information (e.g. the title)
  without getting the whole content.

The new API introduces the new class `Entry`, which acts as a proxy to an
article in the zim file.

Methods of `Reader` now return an `Entry` and the user has to call
`Entry`'s methods to get useful information.
Redirections are not followed implicitly.
If an entry is not found, an exception is raised instead of returning
an invalid `Entry`.

The common pattern to get the content of an entry becomes:

```
std::string content;
try {
  auto entry = reader.getEntryFromPath(path);
  entry = entry.getFinalEntry();
  content = entry.getContent();
} catch (NoEntry& e) {
  ...
}
```

Older methods are kept (with the same behavior) but are marked as
deprecated.
2018-04-19 18:04:07 +02:00
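As a rough illustration of the pattern described in this commit message, the sketch below reproduces it against mock classes so that it compiles without kiwix-lib or libzim. `MockReader`, `MockEntry`, `Record`, `Store` and `readContent` are hypothetical names invented for this example; only the method names `getEntryFromPath`, `getFinalEntry`, `getContent`, `isRedirect` and the `NoEntry` exception come from the commit.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <string>

// Stand-in for kiwix::NoEntry.
class NoEntry : public std::exception {};

// One stored record: either real content or a redirect to another path.
struct Record {
  std::string content;
  std::string redirectTo;  // empty when the record is not a redirect
};
typedef std::map<std::string, Record> Store;

// Hypothetical stand-in for kiwix::Entry, backed by an in-memory map.
class MockEntry {
 public:
  MockEntry(const Store& backing, const std::string& p)
      : store(&backing), path(p) {
    if (store->find(path) == store->end()) throw NoEntry();
  }
  std::string getPath() const { return path; }
  bool isRedirect() const { return !record().redirectTo.empty(); }
  std::string getContent() const { return record().content; }
  MockEntry getFinalEntry() const {
    MockEntry current = *this;
    // The mock assumes there is no redirect loop in the store.
    while (current.isRedirect()) {
      current = MockEntry(*store, current.record().redirectTo);
    }
    return current;
  }

 private:
  const Record& record() const { return store->find(path)->second; }
  const Store* store;
  std::string path;
};

// Hypothetical stand-in for kiwix::Reader.
class MockReader {
 public:
  void add(const std::string& path, const std::string& content,
           const std::string& redirectTo = "") {
    Record r;
    r.content = content;
    r.redirectTo = redirectTo;
    store[path] = r;
  }
  MockEntry getEntryFromPath(const std::string& path) const {
    return MockEntry(store, path);  // throws NoEntry when the path is unknown
  }

 private:
  Store store;
};

// The pattern from the commit message, reporting failure as a return value.
bool readContent(const MockReader& reader, const std::string& path,
                 std::string& content) {
  try {
    MockEntry entry = reader.getEntryFromPath(path);
    entry = entry.getFinalEntry();
    content = entry.getContent();
    return true;
  } catch (const NoEntry&) {
    return false;
  }
}
```

In this mock, a caller that wants to know whether a redirect was followed can compare `getPath()` before and after `getFinalEntry()`, which is the kind of information the old out-parameter API made awkward to obtain.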
Matthieu Gautier
1f3fcd85a0 Allow us to declare method to be deprecated. 2018-04-19 18:04:07 +02:00
Matthieu Gautier
6e13d44459 Merge pull request #129 from kiwix/opds
Opds
2018-04-19 18:02:59 +02:00
Matthieu Gautier
47ce044e3e Add method to Manager to populate the library from an OPDS stream.
The library's books are created from the metadata in the OPDS stream.
As the OPDS stream is by definition a distant "library", there is no
zim to read to complete missing information.

This can lead to an incomplete `library.xml`.
2018-04-19 17:53:08 +02:00
Matthieu Gautier
1f091da3f4 Add a downloader tool to download files.
The downloader uses libaria2.

For now, only one download can run at a time.
A download will start only if (and as soon as) no other download is running.
2018-04-19 17:53:08 +02:00
Matthieu Gautier
d4fefd1a57 Add a function to create a temporary directory. 2018-04-19 17:53:05 +02:00
Matthieu Gautier
9f86b59d1d Add a function to get the content of a file. 2018-04-19 17:53:02 +02:00
Matthieu Gautier
2164faba44 Add a potential search description link in the opds stream. 2018-04-19 17:08:01 +02:00
Matthieu Gautier
b48428e443 Be able to create an OPDSDumper without a library and associate one later. 2018-04-19 17:08:01 +02:00
Matthieu Gautier
ad92af928b Be able to filter a library.
This generates a new library containing only the corresponding books.
2018-04-19 17:08:01 +02:00
Matthieu Gautier
ee51c470b4 Allow the manager to dump the opds feed of the whole library. 2018-04-19 17:08:01 +02:00
Matthieu Gautier
5398d69231 Merge pull request #134 from kiwix/macos
Build kiwix-lib on macos.
2018-04-19 15:37:22 +02:00
Matthieu Gautier
c0bc2ed111 Build kiwix-lib on macos.
Also try to speed up the build a bit by:
- installing packages using the travis apt plugin instead of sudo
- using a prebuilt ninja binary.
2018-04-19 15:29:48 +02:00
Matthieu Gautier
10893ae19f Merge pull request #125 from kiwix/no_warning
Try to compile kiwix-lib without warning.
2018-04-18 17:05:50 +02:00
Matthieu Gautier
ec097ab267 Try to compile kiwix-lib without warning. 2018-04-18 16:57:27 +02:00
Matthieu Gautier
32ad40a5b0 Merge pull request #133 from kiwix/rpath
Set the RPATH of kiwix-lib.
2018-04-17 17:09:43 +02:00
Matthieu Gautier
d686de7ec3 Set the RPATH of kiwix-lib.
As we cannot change (DY)LD_LIBRARY_PATH on macos, we have to use rpath.
2018-04-17 16:27:31 +02:00
Matthieu Gautier
8d6f1196de Merge pull request #132 from kiwix/ctpp2_lib_dir
Find ctpp2 lib in the normal lib dir and fallback to 'lib'.
2018-04-17 15:35:58 +02:00
Matthieu Gautier
a216ad5a6f Find ctpp2 lib in the normal lib dir and fallback to 'lib'.
ctpp2 libs should be in the "normal" lib dir, so search in it.
The 'lib' dir should only be used as a fallback.
2018-04-17 14:37:19 +02:00
Matthieu Gautier
3849f0ae8b Merge pull request #128 from kiwix/fix_version
New version 1.1.1
2018-03-29 17:49:10 +02:00
Matthieu Gautier
f2413f6680 New version 1.1.1 2018-03-27 17:22:38 +02:00
26 changed files with 1392 additions and 293 deletions

View File

@@ -1,9 +1,10 @@
language: cpp
dist: trusty
sudo: required
sudo: false
cache: ccache
before_install:
- eval "${MATRIX_EVAL}"
- if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then eval "${MATRIX_EVAL}"; fi
- PATH=$PATH:$HOME/bin
- ${CXX} --version
install: travis/install_deps.sh
script: travis/compile.sh
@@ -23,3 +24,20 @@ addons:
- ubuntu-toolchain-r-test
packages:
- g++-5
- cmake
- python3-pip
- libbz2-dev
- ccache
- zlib1g-dev
- uuid-dev
- libctpp2-dev
- ctpp2-utils
- libmicrohttpd-dev
- g++-mingw-w64-i686
- gcc-mingw-w64-i686
- gcc-mingw-w64-base
- mingw-w64-tools
matrix:
include:
- env: PLATFORM="native_dyn"
os: osx

View File

@@ -1,3 +1,29 @@
kiwix-lib 2.0.0
===============
* Introduce a new API to retrieve content from a reader.
* Introduce the `Entry` class.
* Reader's methods return an `Entry`.
* Content and other information can be retrieved from the `Entry`.
* Older Reader's methods are deprecated.
* Add an `OPDSDumper` class to dump a whole `Library` as an OPDS feed.
* Add a tool function to get the content of a file.
* Add a tool function to create a temporary directory.
* Add a `Downloader` class to download a file.
* Allow the manager to populate a `Library` from an OPDS feed.
* Try to locate libctpp2 in the default system libdir and then fall back to
the 'lib' directory.
* Build kiwix-lib setting RPATH.
* Build kiwix-lib without warning (werror=true)
* Build kiwix-lib on macos.
kiwix-lib 1.1.1
===============
* Correct the name of kiwix-lib (from `kiwixlib`) in meson.build to generate
dist archive with the correct name.
* Libzim version needs to be at least 3.2.0
kiwix-lib 1.1.0
===============

View File

@@ -37,6 +37,7 @@ libraries need to be available:
(package libctpp2-dev on Ubuntu)
* Xapian ......................................... https://xapian.org/
(package libxapian-dev on Ubuntu)
* libaria2 .................................. https://aria2.github.io/
These dependencies may or may not be packaged by your operating
system. They may also be packaged but only in an older version. The

24
include/common.h Normal file
View File

@@ -0,0 +1,24 @@
#ifndef _KIWIX_COMMON_H_
#define _KIWIX_COMMON_H_
#include <zim/zim.h>
#ifdef __GNUC__
#define DEPRECATED __attribute__((deprecated))
#elif defined(_MSC_VER)
#define DEPRECATED __declspec(deprecated)
#else
#pragma message("WARNING: You need to implement DEPRECATED for this compiler")
#define DEPRECATED
#endif
namespace kiwix {
typedef zim::size_type size_type;
typedef zim::offset_type offset_type;
}
#endif //_KIWIX_COMMON_H_
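A small sketch of how a macro like `DEPRECATED` is typically applied: the old method stays callable, but compilers that support the attribute warn at each call site. The `Greeter` class and its methods are hypothetical and not part of kiwix-lib; the macro block repeats the one above with the fallback branch spelled `#pragma`.

```cpp
#include <cassert>
#include <string>

// Same portability macro as in common.h.
#ifdef __GNUC__
#define DEPRECATED __attribute__((deprecated))
#elif defined(_MSC_VER)
#define DEPRECATED __declspec(deprecated)
#else
#pragma message("WARNING: You need to implement DEPRECATED for this compiler")
#define DEPRECATED
#endif

// Hypothetical class used only to show where the macro goes.
class Greeter {
 public:
  // Old out-parameter style, kept for compatibility but flagged at call sites.
  DEPRECATED bool getGreeting(std::string& out) const {
    out = greeting();
    return true;
  }

  // Replacement that returns the value directly.
  std::string greeting() const { return "hello"; }
};
```

Note that when warnings are promoted to errors (as with `werror=true` in this release's meson.build), a call to a method marked this way stops the build on GCC and Clang unless that specific warning is exempted, so callers inside the library itself presumably have to move to the new API or silence the warning locally.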

View File

@@ -51,8 +51,10 @@ string appendToDirectory(const string& directoryPath, const string& filename);
unsigned int getFileSize(const string& path);
string getFileSizeAsString(const string& path);
string getFileContent(const string& path);
bool fileExists(const string& path);
bool makeDirectory(const string& path);
string makeTmpDirectory();
bool copyFile(const string& sourcePath, const string& destPath);
string getLastPathElement(const string& path);
string getExecutablePath();

70
include/downloader.h Normal file
View File

@@ -0,0 +1,70 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_DOWNLOADER_H
#define KIWIX_DOWNLOADER_H
#include <string>
#include <aria2/aria2.h>
#include <pthread.h>
namespace kiwix
{
struct DownloadedFile {
DownloadedFile()
: success(false) {}
bool success;
std::string path;
};
/**
* A tool to download things.
*
*/
class Downloader
{
public:
Downloader();
~Downloader();
/**
* Download the content at a url.
*
* @param url the url to download
* @return a `DownloadedFile` holding the success status and the path of the downloaded file.
*/
DownloadedFile download(const std::string& url);
private:
static pthread_mutex_t globalLock;
aria2::Session* session;
DownloadedFile* fileHandle;
std::string tmpDir;
static int downloadEventCallback(aria2::Session* session,
aria2::DownloadEvent event,
aria2::A2Gid gid,
void* userData);
};
}
#endif
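The header declares a static `globalLock`, which matches the commit message's statement that only one download runs at a time. The sketch below shows that serialization idea in isolation. It swaps in `std::mutex` for `pthread_mutex_t`, performs no network access, and uses hypothetical names (`SerialDownloader`, `running`, `lastUrl`); it is not the library's implementation.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for kiwix::Downloader.
class SerialDownloader {
 public:
  // Pretend to download `url`. Returns how many downloads were in progress
  // while this one ran, which the lock keeps at exactly 1.
  int download(const std::string& url) {
    // Blocks until no other download holds the lock, then starts at once.
    std::lock_guard<std::mutex> guard(globalLock());
    ++running();
    int seen = running();
    lastUrl = url;  // placeholder for the real transfer
    --running();
    return seen;
  }

  std::string lastUrl;

 private:
  // Shared by every instance, like the static member in downloader.h.
  static std::mutex& globalLock() {
    static std::mutex m;
    return m;
  }
  static int& running() {
    static int n = 0;
    return n;
  }
};
```

Because the lock is shared across instances, two `SerialDownloader` objects used from different threads would still take turns, which is the behaviour the commit message describes.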

191
include/entry.h Normal file
View File

@@ -0,0 +1,191 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_ENTRY_H
#define KIWIX_ENTRY_H
#include <stdio.h>
#include <zim/article.h>
#include <exception>
#include <string>
#include "common.h"
using namespace std;
namespace kiwix
{
class NoEntry : public std::exception {};
/**
* An `Entry` represents an entry in a zim file.
*/
class Entry
{
public:
/**
* Default constructor.
*
* Construct an invalid entry.
*/
Entry() = default;
/**
* Construct an entry referencing a zim article.
*
* @param article a zim::Article object
*/
Entry(zim::Article article);
virtual ~Entry() = default;
/**
* Get the path of the entry.
*
* The path is the "key" of an entry.
*
* @return the path of the entry.
*/
std::string getPath() const;
/**
* Get the title of the entry.
*
* @return the title of the entry.
*/
std::string getTitle() const;
/**
* Get the content of the entry.
*
* The string is a copy of the content.
* If you don't want to make a copy, use getBlob.
*
* @return the content of the entry.
*/
std::string getContent() const;
/**
* Get the blob of the entry.
*
* A blob references the content without copying it.
*
* @param offset The starting offset of the blob.
* @return the blob of the entry.
*/
zim::Blob getBlob(offset_type offset = 0) const;
/**
* Get the blob of the entry.
*
* A blob references the content without copying it.
*
* @param offset The starting offset of the blob.
* @param size The size of the blob.
* @return the blob of the entry.
*/
zim::Blob getBlob(offset_type offset, size_type size) const;
/**
* Get the info for direct access to the content of the entry.
*
* Some entries (i.e. binary ones) have their content stored plain
* in the zim file. Knowing the offset where the content is stored,
* a user can read the content directly from the zim file, bypassing
* kiwix-lib/libzim.
*
* @return A pair specifying where to read the content.
* The string is the real file to read (it may differ from the .zim
* file if the zim is split).
* The offset is the offset to read at in the file.
* Returns <"",0> if it is not possible to read directly.
*/
std::pair<std::string, offset_type> getDirectAccessInfo() const;
/**
* Get the size of the entry.
*
* @return the size of the entry.
*/
size_type getSize() const;
/**
* Get the mime_type of the entry.
*
* @return the mime_type of the entry.
*/
std::string getMimetype() const;
/**
* Get if the entry is a redirect entry.
*
* @return True if the entry is a redirect.
*/
bool isRedirect() const;
/**
* Get if the entry is a link target entry.
*
* @return True if the entry is a link target.
*/
bool isLinkTarget() const;
/**
* Get if the entry is a deleted entry.
*
* @return True if the entry is a deleted entry.
*/
bool isDeleted() const;
/**
* Get the entry pointed to by this entry.
*
* @return the entry pointed to.
* @throw NoEntry if the entry is not a redirected entry.
*/
Entry getRedirectEntry() const;
/**
* Get the final entry pointed to by this entry.
*
* Follow the redirections until a "not redirecting" entry is found.
* If the entry is not a redirected entry, return the entry itself.
*
* @return the final entry.
*/
Entry getFinalEntry() const;
/**
* Convert the entry to a boolean value.
*
* @return True if the entry is valid.
*/
explicit operator bool() const { return good(); }
private:
zim::Article article;
mutable zim::Article final_article;
bool good() const { return article.good(); }
};
}
#endif // KIWIX_ENTRY_H
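The header distinguishes `getContent()`, which returns a copy, from `getBlob()`, which references the content without copying it. The sketch below illustrates that difference with a hypothetical non-owning `BlobView` over a `std::string`. The names `BlobView`, `makeBlob` and `copyRange` are invented for this example, and throwing on an out-of-range request is a choice made here, not documented behaviour of `zim::Blob`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical non-owning view, standing in for zim::Blob.
struct BlobView {
  const char* data;
  std::size_t size;
};

// Return a view on `size` bytes of `content` starting at `offset`.
// No bytes are copied; the view is valid only while `content` is alive.
BlobView makeBlob(const std::string& content, std::size_t offset,
                  std::size_t size) {
  if (offset > content.size() || size > content.size() - offset) {
    throw std::out_of_range("blob range exceeds content");
  }
  BlobView view;
  view.data = content.data() + offset;
  view.size = size;
  return view;
}

// Copying counterpart, comparable to getContent(): the caller owns the result.
std::string copyRange(const std::string& content, std::size_t offset,
                      std::size_t size) {
  BlobView view = makeBlob(content, offset, size);
  return std::string(view.data, view.size);
}
```

The view is cheap to pass around, for instance into a byte-array copy as the JNI wrapper does, but it must not outlive the buffer it points into.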

View File

@@ -84,10 +84,21 @@ class Manager
* @param libraryPath The library path (used to resolve relative path)
* @return True if the content has been properly parsed.
*/
bool readXml(const string xml,
bool readXml(const string& xml,
const bool readOnly = true,
const string libraryPath = "");
/**
* Load a library content stored in an OPDS stream.
*
* @param content The content of the OPDS stream.
* @param urlHost The url host of the OPDS stream.
* @return True if the content has been properly parsed.
*/
bool readOpds(const string& content, const std::string& urlHost);
/**
* Write the library to a file.
*
@@ -97,8 +108,6 @@ class Manager
bool writeFile(const string path);
string write_OPDS_feed(const string& id, const string& title);
/**
* Remove a book from the library.
*
@@ -256,6 +265,16 @@ class Manager
const string creator,
const string publisher,
const string search);
/**
* Filter the library and generate a new one with the kept elements.
*
* @param search List only books with search in the title or description.
* @return A `Library`.
*/
Library filter(const string& search);
/**
* Get all languages of the books in the library.
*
@@ -295,6 +314,8 @@ class Manager
bool parseXmlDom(const pugi::xml_document& doc,
const bool readOnly,
const string libraryPath);
bool parseOpdsDom(const pugi::xml_document& doc,
const std::string& urlHost);
private:
void checkAndCleanBookPaths(Book& book, const string& libraryPath);
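The `filter` method added above is documented as keeping only books with `search` in the title or description and returning a new `Library`. A minimal sketch of that behaviour, using reduced mock types: `MockBook`, `MockLibrary` and `filterLibrary` are hypothetical names, and the plain case-sensitive substring match is an assumption, since the header does not say how matching is done.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, reduced stand-ins for kiwix::Book and kiwix::Library.
struct MockBook {
  std::string title;
  std::string description;
};
typedef std::vector<MockBook> MockLibrary;

// Keep only the books whose title or description contains `search`.
// The input library is left untouched; a new one is returned.
MockLibrary filterLibrary(const MockLibrary& library,
                          const std::string& search) {
  MockLibrary kept;
  for (const MockBook& book : library) {
    if (book.title.find(search) != std::string::npos ||
        book.description.find(search) != std::string::npos) {
      kept.push_back(book);
    }
  }
  return kept;
}
```

Returning a fresh library rather than filtering in place means the result can be handed to something like `OPDSDumper::setLibrary` without disturbing the manager's own list.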

View File

@@ -1,7 +1,11 @@
headers = [
'common.h',
'library.h',
'manager.h',
'opds_dumper.h',
'downloader.h',
'reader.h',
'entry.h',
'searcher.h'
]

107
include/opds_dumper.h Normal file
View File

@@ -0,0 +1,107 @@
/*
* Copyright 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#ifndef KIWIX_OPDS_DUMPER_H
#define KIWIX_OPDS_DUMPER_H
#include <time.h>
#include <sstream>
#include <string>
#include <pugixml.hpp>
#include "common/base64.h"
#include "common/pathTools.h"
#include "common/regexTools.h"
#include "library.h"
#include "reader.h"
using namespace std;
namespace kiwix
{
/**
* A tool to dump a `Library` into an OPDS stream.
*
*/
class OPDSDumper
{
public:
OPDSDumper() = default;
OPDSDumper(Library library);
~OPDSDumper();
/**
* Dump the OPDS feed.
*
* @return The OPDS feed.
*/
std::string dumpOPDSFeed();
/**
* Set the id of the opds stream.
*
* @param id the id to use.
*/
void setId(const std::string& id) { this->id = id;}
/**
* Set the title of the opds stream.
*
* @param title the title to use.
*/
void setTitle(const std::string& title) { this->title = title; }
/**
* Set the root location used when generating url.
*
* @param rootLocation the root location to use.
*/
void setRootLocation(const std::string& rootLocation) { this->rootLocation = rootLocation; }
/**
* Set the search description url.
*
* @param searchDescriptionUrl the search description url to use.
*/
void setSearchDescriptionUrl(const std::string& searchDescriptionUrl) { this->searchDescriptionUrl = searchDescriptionUrl; }
/**
* Set the library to dump.
*
* @param library The library to dump.
*/
void setLibrary(Library library) { this->library = library; }
protected:
kiwix::Library library;
std::string id;
std::string title;
std::string date;
std::string rootLocation;
std::string searchDescriptionUrl;
private:
pugi::xml_node handleBook(Book book, pugi::xml_node root_node);
};
}
#endif // KIWIX_OPDS_DUMPER_H

View File

@@ -29,6 +29,8 @@
#include <map>
#include <sstream>
#include <string>
#include "common.h"
#include "entry.h"
#include "common/pathTools.h"
#include "common/stringTools.h"
@@ -38,7 +40,7 @@ namespace kiwix
{
/**
* The Reader class is the class who allow to get an article content from a zim
* The Reader class is the class who allow to get an entry content from a zim
* file.
*/
class Reader
@@ -57,11 +59,11 @@ class Reader
~Reader();
/**
* Get the number of "displayable" articles in the zim file.
* Get the number of "displayable" entries in the zim file.
*
* @return If the zim file has a /M/Counter metadata, return the number of
* articles with the 'text/html' MIMEtype specified in the metadata.
* Else return the number of articles in the 'A' namespace.
* entries with the 'text/html' MIMEtype specified in the metadata.
* Else return the number of entries in the 'A' namespace.
*/
unsigned int getArticleCount() const;
@@ -69,16 +71,16 @@ class Reader
* Get the number of media in the zim file.
*
* @return If the zim file has a /M/Counter metadata, return the number of
* articles with the 'image/jpeg', 'image/gif' and 'image/png' in
* entries with the 'image/jpeg', 'image/gif' and 'image/png' in
* the metadata.
* Else return the number of articles in the 'I' namespace.
* Else return the number of entries in the 'I' namespace.
*/
unsigned int getMediaCount() const;
/**
* Get the number of all articles in the zim file.
* Get the number of all entries in the zim file.
*
* @return Return the number of all the articles, whatever their MIMEtype or
* @return Return the number of all the entries, whatever their MIMEtype or
* their namespace.
*/
unsigned int getGlobalCount() const;
@@ -100,25 +102,54 @@ class Reader
/**
* Get the url of a random page.
*
* @return Url of a random page. The page is picked from all articles in
* Deprecated : Use `getRandomPage` instead.
*
* @return Url of a random page. The page is picked from all entries in
* the 'A' namespace.
* The main page is excluded from the potential results.
*/
string getRandomPageUrl() const;
DEPRECATED string getRandomPageUrl() const;
/**
* Get a random page.
*
* @return A random Entry. The entry is picked from all entries in
* the 'A' namespace.
* The main entry is excluded from the potential results.
*/
Entry getRandomPage() const;
/**
* Get the url of the first page.
*
* @return Url of the first article in the 'A' namespace.
* Deprecated : Use `getFirstPage` instead.
*
* @return Url of the first entry in the 'A' namespace.
*/
string getFirstPageUrl() const;
DEPRECATED string getFirstPageUrl() const;
/**
* Get the entry of the first page.
*
* @return The first entry in the 'A' namespace.
*/
Entry getFirstPage() const;
/**
* Get the url of the main page.
*
* Deprecated : Use `getMainPage` instead.
*
* @return Url of the main page as specified in the zim file.
*/
string getMainPageUrl() const;
DEPRECATED string getMainPageUrl() const;
/**
* Get the entry of the main page.
*
* @return Entry of the main page as specified in the zim file.
*/
Entry getMainPage() const;
/**
* Get the content of a metadata.
@@ -207,6 +238,35 @@ class Reader
*/
bool getFavicon(string& content, string& mimeType) const;
/**
* Get an entry associated to a path.
*
* @param path The path of the entry.
* @return The entry.
* @throw NoEntry If no entry corresponds to the path.
*/
Entry getEntryFromPath(const std::string& path) const;
/**
* Get an entry associated to an url encoded path.
*
* Equivalent to `getEntryFromPath(urlDecode(path));`
*
* @param path The url encoded path.
* @return The entry.
* @throw NoEntry If no entry corresponds to the path.
*/
Entry getEntryFromEncodedPath(const std::string& path) const;
/**
* Get an entry associated to a title.
*
* @param title The title.
* @return The entry.
* @throw NoEntry If no entry corresponds to the title.
*/
Entry getEntryFromTitle(const std::string& title) const;
/**
* Get the url of a page specified by a title.
*
@@ -214,34 +274,34 @@ class Reader
* @param[out] url the url of the page.
* @return True if the page can be found.
*/
bool getPageUrlFromTitle(const string& title, string& url) const;
DEPRECATED bool getPageUrlFromTitle(const string& title, string& url) const;
/**
* Get the mimetype of a article specified by a url.
* Get the mimetype of an entry specified by a url.
*
* @param[in] url the url of the article.
* @param[out] mimetype the mimeType of the article.
* @param[in] url the url of the entry.
* @param[out] mimeType the mimeType of the entry.
* @return True if the mimeType has been found.
*/
bool getMimeTypeByUrl(const string& url, string& mimeType) const;
DEPRECATED bool getMimeTypeByUrl(const string& url, string& mimeType) const;
/**
* Get the content of an article specifed by a url.
* Get the content of an entry specified by a url.
*
* Alias to `getContentByEncodedUrl`
*/
bool getContentByUrl(const string& url,
DEPRECATED bool getContentByUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType) const;
/**
* Get the content of an article specified by a url encoded url.
* Get the content of an entry specified by a url encoded url.
*
* Equivalent to getContentByDecodedUrl(urlDecode(url), ...).
*/
bool getContentByEncodedUrl(const string& url,
DEPRECATED bool getContentByEncodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
@@ -249,54 +309,54 @@ class Reader
string& baseUrl) const;
/**
* Get the content of an article specified by an url encoded url.
* Get the content of an entry specified by an url encoded url.
*
* Equivalent to getContentByEncodedUrl but without baseUrl.
*/
bool getContentByEncodedUrl(const string& url,
DEPRECATED bool getContentByEncodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType) const;
/**
* Get the content of an article specified by a url.
* Get the content of an entry specified by a url.
*
* @param[in] url The url of the article.
* @param[out] content The content of the article.
* @param[out] title the title of the article.
* @param[out] contentLength The size of the article (size of content).
* @param[out] contentType The mimeType of the article.
* @param[out] baseUrl Return the true url of the article.
* If the specified article is a redirection, contains
* the url of the targeted article.
* @return True if the article has been found.
* @param[in] url The url of the entry.
* @param[out] content The content of the entry.
* @param[out] title the title of the entry.
* @param[out] contentLength The size of the entry (size of content).
* @param[out] contentType The mimeType of the entry.
* @param[out] baseUrl Return the true url of the entry.
* If the specified entry is a redirection, contains
* the url of the targeted entry.
* @return True if the entry has been found.
*/
bool getContentByDecodedUrl(const string& url,
DEPRECATED bool getContentByDecodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType,
string& baseUrl) const;
/**
* Get the content of an article specified by a url.
* Get the content of an entry specified by a url.
*
* Equivalent to getContentByDecodedUrl but without the baseUrl.
*/
bool getContentByDecodedUrl(const string& url,
DEPRECATED bool getContentByDecodedUrl(const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType) const;
/**
* Search for articles with title starting with prefix (case sensitive).
* Search for entries with title starting with prefix (case sensitive).
*
* Suggestions are stored in an internal vector and can be retrieved using
* `getNextSuggestion` method.
*
* @param prefix The prefix to search.
* @param suggestionCount How many suggestions to search for.
* @param suggestionsCount How many suggestions to search for.
* @param reset If true, remove previous suggestions in the internal vector.
* If false, add suggestions to the internal vector
* (until internal vector size is suggestionCount (or no more
@@ -308,7 +368,7 @@ class Reader
const bool reset = true);
/**
* Search for articles for the given prefix.
* Search for entries for the given prefix.
*
* If the zim file has a internal fulltext index, the suggestions will be
* searched using it.
@@ -320,7 +380,7 @@ class Reader
* The internal vector will be reset.
*
* @param prefix The prefix to search for.
* @param suggestionCount How many suggestions to search for.
* @param suggestionsCount How many suggestions to search for.
*/
bool searchSuggestionsSmart(const string& prefix,
unsigned int suggestionsCount);
@@ -328,10 +388,20 @@ class Reader
/**
* Check if the url exists in the zim file.
*
* Deprecated : Use `pathExists` instead.
*
* @param url the url to check.
* @return True if the url exists in the zim file.
*/
bool urlExists(const string& url) const;
DEPRECATED bool urlExists(const string& url) const;
/**
* Check if the path exists in the zim file.
*
* @param path the path to check.
* @return True if the path exists in the zim file.
*/
bool pathExists(const string& path) const;
/**
* Check if the zim file has a embedded fulltext index.
@@ -388,7 +458,7 @@ class Reader
* @param[out] title The url (url).
* @return True
*/
bool parseUrl(const string& url, char* ns, string& title) const;
DEPRECATED bool parseUrl(const string& url, char* ns, string& title) const;
/**
* Return the total size of the zim file.
@@ -413,7 +483,7 @@ class Reader
* @param[out] article The libzim article object.
* @return True if the url is good (article.good()).
*/
bool getArticleObjectByDecodedUrl(const string& url,
DEPRECATED bool getArticleObjectByDecodedUrl(const string& url,
zim::Article& article) const;
protected:

View File

@@ -1,7 +1,7 @@
project('kiwixlib', 'cpp',
version : '1.1.0',
project('kiwix-lib', 'cpp',
version : '2.0.0',
license : 'GPL',
default_options : ['c_std=c11', 'cpp_std=c++11'])
default_options : ['c_std=c11', 'cpp_std=c++11', 'werror=true'])
compiler = meson.get_compiler('cpp')
find_library_in_compiler = meson.version().version_compare('>=0.31.0')
@@ -10,8 +10,9 @@ static_deps = get_option('android') or get_option('default_library') == 'static'
thread_dep = dependency('threads')
libicu_dep = dependency('icu-i18n', static:static_deps)
libzim_dep = dependency('libzim', version : '>=3.0.0', static:static_deps)
libzim_dep = dependency('libzim', version : '>=3.3.0', static:static_deps)
pugixml_dep = dependency('pugixml', static:static_deps)
libaria2_dep = dependency('libaria2', static:static_deps)
ctpp2_include_path = ''
has_ctpp2_dep = false
@@ -48,8 +49,14 @@ else
ctpp2_include_args = ['-I'+ctpp2_include_path]
if compiler.has_header('ctpp2/CTPP2Logger.hpp', args:ctpp2_include_args)
ctpp2_include_dir = include_directories(ctpp2_include_path, is_system:true)
ctpp2_lib_path = ctpp2_prefix_install+'/lib'
ctpp2_lib = compiler.find_library('ctpp2', dirs:ctpp2_lib_path)
ctpp2_lib_path = join_paths(ctpp2_prefix_install, get_option('libdir'))
message(ctpp2_lib_path)
ctpp2_lib = compiler.find_library('ctpp2', dirs:ctpp2_lib_path, required:false)
if not ctpp2_lib.found()
ctpp2_lib_path = join_paths(ctpp2_prefix_install, 'lib')
message(ctpp2_lib_path)
ctpp2_lib = compiler.find_library('ctpp2', dirs:ctpp2_lib_path)
endif
ctpp2_link_args = ['-L'+ctpp2_lib_path, '-lctpp2']
if meson.is_cross_build() and host_machine.system() == 'windows'
iconv_lib = compiler.find_library('iconv', required:false)
@@ -66,7 +73,7 @@ endif
xapian_dep = dependency('xapian-core', required:false, static:static_deps)
all_deps = [thread_dep, libicu_dep, libzim_dep, xapian_dep, pugixml_dep]
all_deps = [thread_dep, libicu_dep, libzim_dep, xapian_dep, pugixml_dep, libaria2_dep]
if has_ctpp2_dep
all_deps += [ctpp2_dep]
endif
@@ -82,7 +89,7 @@ subdir('scripts')
subdir('static')
subdir('src')
pkg_requires = ['libzim', 'icu-i18n', 'pugixml']
pkg_requires = ['libzim', 'icu-i18n', 'pugixml', 'libaria2']
if xapian_dep.found()
pkg_requires += ['xapian-core']
endif

View File

@@ -60,7 +60,7 @@ Java_org_kiwix_kiwixlib_JNIKiwixReader_getMainPage(JNIEnv* env, jobject obj)
jstring url;
try {
std::string cUrl = READER->getMainPageUrl();
std::string cUrl = READER->getMainPage().getPath();
url = c2jni(cUrl, env);
} catch (...) {
std::cerr << "Unable to get ZIM main page" << std::endl;
@@ -196,8 +196,8 @@ JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getMimeType(
std::string cUrl = jni2c(url, env);
try {
std::string cMimeType;
READER->getMimeTypeByUrl(cUrl, cMimeType);
auto entry = READER->getEntryFromEncodedPath(cUrl);
auto cMimeType = entry.getMimetype();
mimeType = c2jni(cMimeType, env);
} catch (...) {
std::cerr << "Unable to get mime-type for url " << cUrl << std::endl;
@@ -216,20 +216,20 @@ JNIEXPORT jbyteArray JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getContent(
/* Retrieve the content */
std::string cUrl = jni2c(url, env);
std::string cData;
std::string cTitle;
std::string cMimeType;
unsigned int cSize = 0;
try {
if (READER->getContentByUrl(cUrl, cData, cTitle, cSize, cMimeType)) {
data = env->NewByteArray(cSize);
env->SetByteArrayRegion(
data, 0, cSize, reinterpret_cast<const jbyte*>(cData.c_str()));
setStringObjValue(cMimeType, mimeTypeObj, env);
setStringObjValue(cTitle, titleObj, env);
setIntObjValue(cSize, sizeObj, env);
}
auto entry = READER->getEntryFromEncodedPath(cUrl);
entry = entry.getFinalEntry();
cSize = entry.getSize();
setIntObjValue(cSize, sizeObj, env);
data = env->NewByteArray(cSize);
env->SetByteArrayRegion(
data, 0, cSize, reinterpret_cast<const jbyte*>(entry.getBlob().data()));
setStringObjValue(entry.getMimetype(), mimeTypeObj, env);
setStringObjValue(entry.getTitle(), titleObj, env);
} catch (...) {
std::cerr << "Unable to get content for url " << cUrl << std::endl;
}
@@ -249,22 +249,13 @@ JNIEXPORT jbyteArray JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getContentPa
unsigned int cOffset = jni2c(offset);
unsigned int cLen = jni2c(len);
try {
zim::Article article;
READER->getArticleObjectByDecodedUrl(kiwix::urlDecode(cUrl), article);
if (! article.good()) {
return data;
}
int loopCounter = 0;
while (article.isRedirect() && ++loopCounter < 42) {
article = article.getRedirectArticle();
}
if (loopCounter == 42) {
return data;
}
auto entry = READER->getEntryFromEncodedPath(cUrl);
entry = entry.getFinalEntry();
if (cLen == 0) {
setIntObjValue(article.getArticleSize(), sizeObj, env);
} else if (cOffset+cLen > article.getArticleSize()) {
auto blob = article.getData(cOffset, cLen);
setIntObjValue(entry.getSize(), sizeObj, env);
} else if (cOffset+cLen <= entry.getSize()) {
auto blob = entry.getBlob(cOffset, cLen);
data = env->NewByteArray(cLen);
env->SetByteArrayRegion(
data, 0, cLen, reinterpret_cast<const jbyte*>(blob.data()));
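The bounds check in the hunk above adds `cOffset` and `cLen` as `unsigned int`, so a very large offset could in principle wrap around before the comparison. A minimal sketch of a wrap-free check follows; `rangeFits` is a hypothetical helper and not part of kiwix-lib.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not kiwix-lib API).
// Returns true when the byte range [offset, offset + len) lies entirely
// inside a blob of `size` bytes. The sum offset + len is never computed,
// so the check cannot wrap around.
bool rangeFits(std::uint64_t offset, std::uint64_t len, std::uint64_t size)
{
  return offset <= size && len <= size - offset;
}
```

A range that ends exactly at the end of the blob is accepted, which is what a caller reading the last chunk of an entry would need.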
@@ -288,20 +279,9 @@ Java_org_kiwix_kiwixlib_JNIKiwixReader_getDirectAccessInformation(
std::string cUrl = jni2c(url, env);
try {
zim::Article article;
READER->getArticleObjectByDecodedUrl(kiwix::urlDecode(cUrl), article);
if (! article.good()) {
return pair;
}
int loopCounter = 0;
while (article.isRedirect() && ++loopCounter < 42) {
article = article.getRedirectArticle();
}
if (loopCounter == 42) {
return pair;
}
auto part_info = article.getDirectAccessInformation();
auto entry = READER->getEntryFromEncodedPath(cUrl);
entry = entry.getFinalEntry();
auto part_info = entry.getDirectAccessInfo();
setPairObjValue(part_info.first, part_info.second, pair, env);
} catch (...) {
std::cerr << "Unable to locate direct access information for url " << cUrl
@@ -359,20 +339,18 @@ Java_org_kiwix_kiwixlib_JNIKiwixReader_getPageUrlFromTitle(JNIEnv* env,
jstring title,
jobject urlObj)
{
jboolean retVal = JNI_FALSE;
std::string cTitle = jni2c(title, env);
std::string cUrl;
try {
if (READER->getPageUrlFromTitle(cTitle, cUrl)) {
setStringObjValue(cUrl, urlObj, env);
retVal = JNI_TRUE;
}
auto entry = READER->getEntryFromTitle(cTitle);
entry = entry.getFinalEntry();
setStringObjValue(entry.getPath(), urlObj, env);
return JNI_TRUE;
} catch (...) {
std::cerr << "Unable to get URL for title " << cTitle << std::endl;
}
return retVal;
return JNI_FALSE;
}
JNIEXPORT jstring JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getTitle(
@@ -410,7 +388,7 @@ JNIEXPORT jboolean JNICALL Java_org_kiwix_kiwixlib_JNIKiwixReader_getRandomPage(
std::string cUrl;
try {
std::string cUrl = READER->getRandomPageUrl();
std::string cUrl = READER->getRandomPage().getPath();
setStringObjValue(cUrl, urlObj, env);
retVal = JNI_TRUE;
} catch (...) {


@@ -19,13 +19,15 @@
#include <common/networkTools.h>
std::map<std::string, std::string> kiwix::getNetworkInterfaces()
{
std::map<std::string, std::string> interfaces;
#ifdef _WIN32
SOCKET sd = WSASocket(AF_INET, SOCK_DGRAM, 0, 0, 0, 0);
if (sd == SOCKET_ERROR) {
if (sd == (SOCKET)SOCKET_ERROR) {
std::cerr << "Failed to get a socket. Error " << WSAGetLastError()
<< std::endl;
return interfaces;


@@ -188,6 +188,20 @@ string getFileSizeAsString(const string& path)
return convert.str();
}
string getFileContent(const string& path)
{
std::ifstream f(path, std::ios::in|std::ios::binary|std::ios::ate);
std::string content;
if (f.is_open()) {
auto size = f.tellg();
content.reserve(size);
f.seekg(0, std::ios::beg);
content.assign((std::istreambuf_iterator<char>(f)),
std::istreambuf_iterator<char>());
}
return content;
}
bool fileExists(const string& path)
{
#ifdef _WIN32
@@ -214,6 +228,30 @@ bool makeDirectory(const string& path)
return status == 0;
}
string makeTmpDirectory()
{
#ifdef _WIN32
char cbase[MAX_PATH+1];
int base_len = GetTempPath(MAX_PATH+1, cbase);
UUID uuid;
UuidCreate(&uuid);
char* dir_name;
UuidToString(&uuid, reinterpret_cast<unsigned char**>(&dir_name));
string dir(cbase, base_len);
dir += dir_name;
_mkdir(dir.c_str());
RpcStringFree(reinterpret_cast<unsigned char**>(&dir_name));
#else
string base = "/tmp";
auto _template = base + "/kiwix-lib_XXXXXX";
char* _template_array = new char[_template.size()+1];
memcpy(_template_array, _template.c_str(), _template.size()+1);
string dir = mkdtemp(_template_array);
delete[] _template_array;
#endif
return dir;
}
/* Try to create a link; if that does not work, make a copy */
bool copyFile(const string& sourcePath, const string& destPath)
{
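The POSIX branch of `makeTmpDirectory` hands a manually allocated buffer to `mkdtemp`, which needs a NUL-terminated template and returns a null pointer on failure. A minimal sketch of the same idea with an owning buffer and an explicit failure check is shown here; `makeTmpDirectorySketch` and its `base` parameter are hypothetical names, and the code is POSIX-only.

```cpp
#include <cassert>
#include <cstdlib>    // mkdtemp (POSIX, declared in <stdlib.h>)
#include <stdexcept>
#include <string>
#include <vector>
#include <unistd.h>   // rmdir; some platforms also declare mkdtemp here

// Hypothetical helper (an assumption for illustration, not kiwix-lib API).
std::string makeTmpDirectorySketch(const std::string& base = "/tmp")
{
  const std::string tmpl = base + "/kiwix-lib_XXXXXX";
  // Copy the template including its terminating '\0'; mkdtemp rewrites
  // the trailing XXXXXX in place and needs a writable C string.
  std::vector<char> buffer(tmpl.c_str(), tmpl.c_str() + tmpl.size() + 1);
  if (mkdtemp(buffer.data()) == nullptr) {
    throw std::runtime_error("mkdtemp failed for " + tmpl);
  }
  return std::string(buffer.data());
}
```

The vector releases its memory on every exit path, so no `delete[]` is needed, and the caller remains responsible for removing the directory.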

src/downloader.cpp (new file, 112 lines)

@@ -0,0 +1,112 @@
/*
* Copyright 2018 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "downloader.h"
#include "common/pathTools.h"
#include <unistd.h>
#include <iostream>
namespace kiwix
{
pthread_mutex_t Downloader::globalLock = PTHREAD_MUTEX_INITIALIZER;
/* Constructor */
Downloader::Downloader()
{
aria2::SessionConfig config;
config.downloadEventCallback = Downloader::downloadEventCallback;
config.userData = this;
tmpDir = makeTmpDirectory();
aria2::KeyVals options;
options.push_back(std::pair<std::string, std::string>("dir", tmpDir));
session = aria2::sessionNew(options, config);
}
/* Destructor */
Downloader::~Downloader()
{
aria2::sessionFinal(session);
rmdir(tmpDir.c_str());
}
int Downloader::downloadEventCallback(aria2::Session* session,
aria2::DownloadEvent event,
aria2::A2Gid gid,
void* userData)
{
Downloader* downloader = static_cast<Downloader*>(userData);
auto fileHandle = downloader->fileHandle;
auto dh = aria2::getDownloadHandle(session, gid);
if (!dh) {
return 0;
}
switch (event) {
case aria2::EVENT_ON_DOWNLOAD_COMPLETE:
{
if (dh->getNumFiles() > 0) {
auto f = dh->getFile(1);
fileHandle->path = f.path;
fileHandle->success = true;
}
}
break;
case aria2::EVENT_ON_DOWNLOAD_ERROR:
{
fileHandle->success = false;
}
break;
default:
break;
}
aria2::deleteDownloadHandle(dh);
return 0;
}
DownloadedFile Downloader::download(const std::string& url) {
pthread_mutex_lock(&globalLock);
DownloadedFile fileHandle;
try {
std::vector<std::string> uris = {url};
aria2::KeyVals options;
aria2::A2Gid gid;
int ret;
// No second declaration here: it would shadow the fileHandle returned below.
ret = aria2::addUri(session, &gid, uris, options);
if (ret < 0) {
std::cerr << "Failed to download" << std::endl;
} else {
this->fileHandle = &fileHandle;
aria2::run(session, aria2::RUN_DEFAULT);
}
} catch (...) {};
this->fileHandle = nullptr;
pthread_mutex_unlock(&globalLock);
return fileHandle;
}
}
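`Downloader::download` serializes calls with a global mutex and publishes a pointer to a local result so that the event callback can fill it in. A minimal sketch of that pattern with a scoped lock and a single result object follows. `SerializedWorker`, `DownloadedFileSketch` and the `fetch` callable are hypothetical stand-ins; no aria2 call is involved.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for the DownloadedFile result type.
struct DownloadedFileSketch {
  bool success = false;
  std::string path;
};

// Hypothetical class (an assumption for illustration, not kiwix-lib API).
class SerializedWorker {
 public:
  // `fetch` stands in for running the download session; it receives the
  // url and the result object it is expected to fill.
  template <typename Fetch>
  DownloadedFileSketch download(const std::string& url, Fetch fetch)
  {
    // Released automatically on every exit path, including exceptions.
    std::lock_guard<std::mutex> guard(globalLock);
    // One object only: the same instance is filled and then returned.
    DownloadedFileSketch result;
    current = &result;
    try {
      fetch(url, *current);
    } catch (...) {
      result.success = false;
    }
    current = nullptr;
    return result;
  }

 private:
  static std::mutex globalLock;
  DownloadedFileSketch* current = nullptr;
};

std::mutex SerializedWorker::globalLock;
```

Keeping a single `result` in scope avoids the situation where an inner declaration receives the data while an outer, untouched one is returned.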

src/entry.cpp (new file, 138 lines)

@@ -0,0 +1,138 @@
/*
* Copyright 2011 Emmanuel Engelhart <kelson@kiwix.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "reader.h"
#include <time.h>
#include <zim/search.h>
namespace kiwix
{
Entry::Entry(zim::Article article)
: article(article)
{
}
#define RETURN_IF_INVALID(WHAT) if(!good()) { return (WHAT); }
std::string Entry::getPath() const
{
RETURN_IF_INVALID("");
return article.getLongUrl();
}
std::string Entry::getTitle() const
{
RETURN_IF_INVALID("");
return article.getTitle();
}
std::string Entry::getContent() const
{
RETURN_IF_INVALID("");
return article.getData();
}
zim::Blob Entry::getBlob(offset_type offset) const
{
RETURN_IF_INVALID(zim::Blob());
return article.getData(offset);
}
zim::Blob Entry::getBlob(offset_type offset, size_type size) const
{
RETURN_IF_INVALID(zim::Blob());
return article.getData(offset, size);
}
std::pair<std::string, offset_type> Entry::getDirectAccessInfo() const
{
RETURN_IF_INVALID(std::make_pair("", 0));
return article.getDirectAccessInformation();
}
size_type Entry::getSize() const
{
RETURN_IF_INVALID(0);
return article.getArticleSize();
}
std::string Entry::getMimetype() const
{
RETURN_IF_INVALID("");
try {
return article.getMimeType();
} catch (exception& e) {
return "application/octet-stream";
}
}
bool Entry::isRedirect() const
{
RETURN_IF_INVALID(false);
return article.isRedirect();
}
bool Entry::isLinkTarget() const
{
RETURN_IF_INVALID(false);
return article.isLinktarget();
}
bool Entry::isDeleted() const
{
RETURN_IF_INVALID(false);
return article.isDeleted();
}
Entry Entry::getRedirectEntry() const
{
RETURN_IF_INVALID(Entry());
if ( !article.isRedirect() ) {
throw NoEntry();
}
auto targeted_article = article.getRedirectArticle();
if ( !targeted_article.good()) {
throw NoEntry();
}
return targeted_article;
}
Entry Entry::getFinalEntry() const
{
RETURN_IF_INVALID(Entry());
if (final_article.good()) {
return final_article;
}
int loopCounter = 42;
final_article = article;
while (final_article.isRedirect() && loopCounter--) {
final_article = final_article.getRedirectArticle();
if ( !final_article.good()) {
throw NoEntry();
}
}
return final_article;
}
}
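`Entry::getFinalEntry` follows redirects with a hop counter so that a redirect cycle cannot loop forever. A minimal sketch of bounded redirect resolution is given here, with a plain map standing in for the ZIM file; `resolveFinal` and `NoEntrySketch` are hypothetical names. Unlike the code above, which returns whatever article it has reached when the counter runs out, this sketch throws in that case.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for NoEntry.
struct NoEntrySketch : std::runtime_error {
  NoEntrySketch() : std::runtime_error("no entry") {}
};

// Hypothetical helper (an assumption for illustration, not kiwix-lib API).
// `redirects` maps a path to its redirect target; a path that is absent
// from the map is treated as a final entry.
std::string resolveFinal(const std::map<std::string, std::string>& redirects,
                         std::string path,
                         int maxHops = 42)
{
  int hops = 0;
  auto it = redirects.find(path);
  while (it != redirects.end()) {
    if (++hops > maxHops) {
      throw NoEntrySketch();  // too many hops: most likely a cycle
    }
    path = it->second;
    it = redirects.find(path);
  }
  return path;
}
```

Throwing on exhaustion is a design choice of the sketch: it keeps callers from treating an unresolved redirect as content.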


@@ -18,6 +18,7 @@
*/
#include "manager.h"
#include "downloader.h"
namespace kiwix
{
@@ -88,7 +89,7 @@ bool Manager::parseXmlDom(const pugi::xml_document& doc,
return true;
}
bool Manager::readXml(const string xml,
bool Manager::readXml(const string& xml,
const bool readOnly,
const string libraryPath)
{
@@ -103,6 +104,67 @@ bool Manager::readXml(const string xml,
return true;
}
bool Manager::parseOpdsDom(const pugi::xml_document& doc, const std::string& urlHost)
{
pugi::xml_node libraryNode = doc.child("feed");
for (pugi::xml_node entryNode = libraryNode.child("entry"); entryNode;
entryNode = entryNode.next_sibling("entry")) {
kiwix::Book book;
book.readOnly = false;
book.id = entryNode.child("id").child_value();
book.title = entryNode.child("title").child_value();
book.description = entryNode.child("summary").child_value();
book.language = entryNode.child("language").child_value();
book.date = entryNode.child("updated").child_value();
book.creator = entryNode.child("author").child("name").child_value();
for(pugi::xml_node linkNode = entryNode.child("link"); linkNode;
linkNode = linkNode.next_sibling("link")) {
std::string rel = linkNode.attribute("rel").value();
if (rel == "http://opds-spec.org/image/thumbnail") {
auto faviconUrl = urlHost + linkNode.attribute("href").value();
auto downloader = Downloader();
auto fileHandle = downloader.download(faviconUrl);
if (fileHandle.success) {
auto content = getFileContent(fileHandle.path);
book.favicon = base64_encode((const unsigned char*)content.data(), content.size());
book.faviconMimeType = linkNode.attribute("type").value();
} else {
std::cerr << "Cannot get favicon content from " << faviconUrl << std::endl;
}
} else if (rel == "http://opds-spec.org/acquisition/open-access") {
book.url = linkNode.attribute("href").value();
}
}
/* Update the book properties with the new importer */
library.addBook(book);
}
return true;
}
bool Manager::readOpds(const string& content, const std::string& urlHost)
{
pugi::xml_document doc;
pugi::xml_parse_result result
= doc.load_buffer(content.data(), content.size());
if (result) {
this->parseOpdsDom(doc, urlHost);
return true;
}
return false;
}
bool Manager::readFile(const string path, const bool readOnly)
{
return this->readFile(path, path, readOnly);
@@ -231,6 +293,7 @@ bool Manager::writeFile(const string path)
return true;
}
bool Manager::setCurrentBookId(const string id)
{
if (library.current.empty() || library.current.top() != id) {
@@ -625,6 +688,24 @@ bool Manager::listBooks(const supportedListMode mode,
return true;
}
Library Manager::filter(const std::string& search) {
Library library;
if (search.empty()) {
return library;
}
for(auto book:this->library.books) {
if (matchRegex(book.title, "\\Q" + search + "\\E")
|| matchRegex(book.description, "\\Q" + search + "\\E")) {
library.addBook(book);
}
}
return library;
}
void Manager::checkAndCleanBookPaths(Book& book, const string& libraryPath)
{
if (!book.path.empty()) {


@@ -1,7 +1,10 @@
kiwix_sources = [
'library.cpp',
'manager.cpp',
'opds_dumper.cpp',
'downloader.cpp',
'reader.cpp',
'entry.cpp',
'searcher.cpp',
'common/base64.cpp',
'common/pathTools.cpp',
@@ -41,4 +44,5 @@ kiwixlib = library('kiwix',
dependencies : all_deps,
version: meson.project_version(),
install: true,
install_dir: install_dir)
install_dir: install_dir,
install_rpath: '$ORIGIN')

src/opds_dumper.cpp (new file, 135 lines)

@@ -0,0 +1,135 @@
/*
* Copyright 2017 Matthieu Gautier <mgautier@kymeria.fr>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
* MA 02110-1301, USA.
*/
#include "opds_dumper.h"
namespace kiwix
{
/* Constructor */
OPDSDumper::OPDSDumper(Library library)
: library(library)
{
}
/* Destructor */
OPDSDumper::~OPDSDumper()
{
}
struct xml_string_writer: pugi::xml_writer
{
std::string result;
virtual void write(const void* data, size_t size)
{
result.append(static_cast<const char*>(data), size);
}
};
std::string node_to_string(pugi::xml_node node)
{
xml_string_writer writer;
node.print(writer, " ");
return writer.result;
}
std::string gen_date_str()
{
auto now = time(0);
auto tm = gmtime(&now);
std::stringstream is;
is << std::setw(2) << std::setfill('0')
<< 1900+tm->tm_year << "-"
<< std::setw(2) << std::setfill('0') << tm->tm_mon + 1 << "-"
<< std::setw(2) << std::setfill('0') << tm->tm_mday << "T"
<< std::setw(2) << std::setfill('0') << tm->tm_hour << ":"
<< std::setw(2) << std::setfill('0') << tm->tm_min << ":"
<< std::setw(2) << std::setfill('0') << tm->tm_sec << "Z";
return is.str();
}
#define ADD_TEXT_ENTRY(node, child, value) (node).append_child((child)).append_child(pugi::node_pcdata).set_value((value).c_str())
pugi::xml_node OPDSDumper::handleBook(Book book, pugi::xml_node root_node) {
auto entry_node = root_node.append_child("entry");
ADD_TEXT_ENTRY(entry_node, "title", book.title);
ADD_TEXT_ENTRY(entry_node, "id", "urn:uuid:"+book.id);
ADD_TEXT_ENTRY(entry_node, "icon", rootLocation + "/meta?name=favicon&content=" + book.getHumanReadableIdFromPath());
ADD_TEXT_ENTRY(entry_node, "updated", date);
ADD_TEXT_ENTRY(entry_node, "summary", book.description);
auto content_node = entry_node.append_child("link");
content_node.append_attribute("type") = "text/html";
content_node.append_attribute("href") = (rootLocation + "/" + book.getHumanReadableIdFromPath()).c_str();
auto author_node = entry_node.append_child("author");
ADD_TEXT_ENTRY(author_node, "name", book.creator);
if (! book.url.empty()) {
auto acquisition_link = entry_node.append_child("link");
acquisition_link.append_attribute("rel") = "http://opds-spec.org/acquisition/open-access";
acquisition_link.append_attribute("type") = "application/x-zim";
acquisition_link.append_attribute("href") = book.url.c_str();
}
if (! book.faviconMimeType.empty() ) {
auto image_link = entry_node.append_child("link");
image_link.append_attribute("rel") = "http://opds-spec.org/image/thumbnail";
image_link.append_attribute("type") = book.faviconMimeType.c_str();
image_link.append_attribute("href") = (rootLocation + "/meta?name=favicon&content=" + book.getHumanReadableIdFromPath()).c_str();
}
return entry_node;
}
string OPDSDumper::dumpOPDSFeed()
{
date = gen_date_str();
pugi::xml_document doc;
auto root_node = doc.append_child("feed");
root_node.append_attribute("xmlns") = "http://www.w3.org/2005/Atom";
root_node.append_attribute("xmlns:opds") = "http://opds-spec.org/2010/catalog";
ADD_TEXT_ENTRY(root_node, "id", id);
ADD_TEXT_ENTRY(root_node, "title", title);
ADD_TEXT_ENTRY(root_node, "updated", date);
auto self_link_node = root_node.append_child("link");
self_link_node.append_attribute("rel") = "self";
self_link_node.append_attribute("href") = "";
self_link_node.append_attribute("type") = "application/atom+xml";
if (!searchDescriptionUrl.empty() ) {
auto search_link = root_node.append_child("link");
search_link.append_attribute("rel") = "search";
search_link.append_attribute("type") = "application/opensearchdescription+xml";
search_link.append_attribute("href") = searchDescriptionUrl.c_str();
}
for (auto book: library.books) {
handleBook(book, root_node);
}
return node_to_string(root_node);
}
}
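`gen_date_str` builds the feed's `updated` value field by field. A minimal sketch of producing the same `YYYY-MM-DDTHH:MM:SSZ` shape with `std::gmtime` and `std::strftime` is shown here, assuming the trailing `Z` is meant to denote UTC; `formatUtcTimestamp` is a hypothetical name.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical helper (an assumption for illustration, not kiwix-lib API).
// Formats `t` as a UTC timestamp such as 1970-01-01T00:00:00Z.
// Returns an empty string if the time cannot be converted or formatted.
std::string formatUtcTimestamp(std::time_t t)
{
  // std::gmtime uses a shared static buffer, so this is not thread-safe.
  const std::tm* parts = std::gmtime(&t);
  if (parts == nullptr) {
    return "";
  }
  char buf[sizeof "YYYY-MM-DDTHH:MM:SSZ"];
  // %m already yields a 1-based, zero-padded month.
  if (std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", parts) == 0) {
    return "";
  }
  return buf;
}
```

Letting `strftime` handle padding and the month offset removes the need to adjust `tm_mon` or `tm_year` by hand.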


@@ -190,79 +190,88 @@ string Reader::getId() const
/* Return a page url from a title */
bool Reader::getPageUrlFromTitle(const string& title, string& url) const
{
/* Extract the content from the zim file */
zim::Article article = this->zimFileHandler->getArticleByTitle('A', title);
if (!article.good()) {
try {
auto entry = getEntryFromTitle(title);
entry = entry.getFinalEntry();
url = entry.getPath();
return true;
} catch (NoEntry& e) {
return false;
}
unsigned int loopCounter = 0;
while (article.isRedirect() && loopCounter++ < 42) {
article = article.getRedirectArticle();
}
url = article.getLongUrl();
return true;
}
/* Return a random page URL */
string Reader::getRandomPageUrl() const
{
return getRandomPage().getPath();
}
Entry Reader::getRandomPage() const
{
if (!this->zimFileHandler) {
throw NoEntry();
}
zim::Article article;
zim::size_type idx;
std::string mainPageUrl = this->getMainPageUrl();
std::string mainPagePath = this->getMainPage().getPath();
int watchdog = 42;
do {
idx = this->firstArticleOffset
auto idx = this->firstArticleOffset
+ (zim::size_type)((double)rand() / ((double)RAND_MAX + 1)
* this->nsACount);
article = zimFileHandler->getArticle(idx);
} while (article.getLongUrl() == mainPageUrl);
if (!watchdog--) {
throw NoEntry();
}
} while (!article.good() || article.getLongUrl() == mainPagePath);
return article.getLongUrl();
return article;
}
/* Return the welcome page URL */
string Reader::getMainPageUrl() const
{
string url = "";
return getMainPage().getPath();
}
if (this->zimFileHandler->getFileheader().hasMainPage()) {
zim::Article article = zimFileHandler->getArticle(
this->zimFileHandler->getFileheader().getMainPage());
url = article.getLongUrl();
if (url.empty()) {
url = getFirstPageUrl();
}
} else {
url = getFirstPageUrl();
Entry Reader::getMainPage() const
{
if (!this->zimFileHandler) {
throw NoEntry();
}
return url;
string url = "";
zim::Article article;
if (this->zimFileHandler->getFileheader().hasMainPage())
{
article = zimFileHandler->getArticle(
this->zimFileHandler->getFileheader().getMainPage());
}
if (!article.good())
{
return getFirstPage();
}
return article;
}
bool Reader::getFavicon(string& content, string& mimeType) const
{
unsigned int contentLength = 0;
string title;
static const char* const paths[] = {"-/favicon.png", "I/favicon.png", "I/favicon", "-/favicon"};
this->getContentByUrl("/-/favicon.png", content, title, contentLength, mimeType);
if (content.empty()) {
this->getContentByUrl("/I/favicon.png", content, title, contentLength, mimeType);
if (content.empty()) {
this->getContentByUrl("/I/favicon", content, title, contentLength, mimeType);
if (content.empty()) {
this->getContentByUrl("/-/favicon", content, title, contentLength, mimeType);
}
}
for (auto &path: paths) {
try {
auto entry = getEntryFromPath(path);
content = entry.getContent();
mimeType = entry.getMimetype();
return true;
} catch(NoEntry& e) {};
}
return content.empty() ? false : true;
return false;
}
string Reader::getZimFilePath() const
@@ -272,11 +281,13 @@ string Reader::getZimFilePath() const
/* Return a metatag value */
bool Reader::getMetatag(const string& name, string& value) const
{
unsigned int contentLength = 0;
string contentType = "";
string title;
return this->getContentByUrl("/M/" + name, value, title, contentLength, contentType);
try {
auto entry = getEntryFromPath("M/"+name);
value = entry.getContent();
return true;
} catch(NoEntry& e) {
return false;
}
}
string Reader::getTitle() const
@@ -375,12 +386,26 @@ string Reader::getOrigId() const
/* Return the first page URL */
string Reader::getFirstPageUrl() const
{
zim::size_type firstPageOffset = zimFileHandler->getNamespaceBeginOffset('A');
zim::Article article = zimFileHandler->getArticle(firstPageOffset);
return article.getLongUrl();
return getFirstPage().getPath();
}
bool Reader::parseUrl(const string& url, char* ns, string& title) const
Entry Reader::getFirstPage() const
{
if (!this->zimFileHandler) {
throw NoEntry();
}
auto firstPageOffset = zimFileHandler->getNamespaceBeginOffset('A');
auto article = zimFileHandler->getArticle(firstPageOffset);
if (! article.good()) {
throw NoEntry();
}
return article;
}
bool _parseUrl(const string& url, char* ns, string& title)
{
/* Offset to visit the url */
unsigned int urlLength = url.size();
@@ -414,6 +439,52 @@ bool Reader::parseUrl(const string& url, char* ns, string& title) const
return true;
}
bool Reader::parseUrl(const string& url, char* ns, string& title) const
{
return _parseUrl(url, ns, title);
}
Entry Reader::getEntryFromPath(const std::string& path) const
{
char ns = 0;
std::string short_url;
if (!this->zimFileHandler) {
throw NoEntry();
}
_parseUrl(path, &ns, short_url);
if (short_url.empty() && ns == 0) {
return getMainPage();
}
auto article = zimFileHandler->getArticle(ns, short_url);
if (!article.good()) {
throw NoEntry();
}
return article;
}
Entry Reader::getEntryFromEncodedPath(const std::string& path) const
{
return getEntryFromPath(urlDecode(path));
}
Entry Reader::getEntryFromTitle(const std::string& title) const
{
if (!this->zimFileHandler) {
throw NoEntry();
}
auto article = this->zimFileHandler->getArticleByTitle('A', title);
if (!article.good()) {
throw NoEntry();
}
return article;
}
/* Return article by url */
bool Reader::getArticleObjectByDecodedUrl(const string& url,
zim::Article& article) const
@@ -425,11 +496,11 @@ bool Reader::getArticleObjectByDecodedUrl(const string& url,
/* Parse the url */
char ns = 0;
string urlStr;
this->parseUrl(url, &ns, urlStr);
_parseUrl(url, &ns, urlStr);
/* Main page */
if (urlStr.empty() && ns == 0) {
this->parseUrl(this->getMainPageUrl(), &ns, urlStr);
_parseUrl(this->getMainPage().getPath(), &ns, urlStr);
}
/* Extract the content from the zim file */
@@ -440,26 +511,53 @@ bool Reader::getArticleObjectByDecodedUrl(const string& url,
/* Return the mimeType without the content */
bool Reader::getMimeTypeByUrl(const string& url, string& mimeType) const
{
if (this->zimFileHandler == NULL) {
return false;
}
zim::Article article;
if (this->getArticleObjectByDecodedUrl(url, article)) {
try {
mimeType = article.getMimeType();
} catch (exception& e) {
cerr << "Unable to get the mimetype for " << url << ":" << e.what()
<< endl;
mimeType = "application/octet-stream";
}
try {
auto entry = getEntryFromPath(url);
mimeType = entry.getMimetype();
return true;
} else {
} catch (NoEntry& e) {
mimeType = "";
return false;
}
}
bool get_content_by_decoded_url(const Reader& reader,
const string& url,
string& content,
string& title,
unsigned int& contentLength,
string& contentType,
string& baseUrl)
{
content = "";
contentType = "";
contentLength = 0;
try {
auto entry = reader.getEntryFromPath(url);
entry = entry.getFinalEntry();
baseUrl = entry.getPath();
contentType = entry.getMimetype();
content = entry.getContent();
contentLength = entry.getSize();
title = entry.getTitle();
/* Try to set a stub HTML header/footer if necessary */
if (contentType.find("text/html") != string::npos
&& content.find("<body") == std::string::npos
&& content.find("<BODY") == std::string::npos) {
content = "<html><head><title>" + title +
"</title><meta http-equiv=\"Content-Type\" content=\"text/html; "
"charset=utf-8\" /></head><body>" +
content + "</body></html>";
}
return true;
} catch (NoEntry& e) {
return false;
}
}
/* Get a content from a zim file */
bool Reader::getContentByUrl(const string& url,
string& content,
@@ -467,7 +565,14 @@ bool Reader::getContentByUrl(const string& url,
unsigned int& contentLength,
string& contentType) const
{
return this->getContentByEncodedUrl(url, content, title, contentLength, contentType);
std::string stubRedirectUrl;
return get_content_by_decoded_url(*this,
kiwix::urlDecode(url),
content,
title,
contentLength,
contentType,
stubRedirectUrl);
}
bool Reader::getContentByEncodedUrl(const string& url,
@@ -477,8 +582,13 @@ bool Reader::getContentByEncodedUrl(const string& url,
string& contentType,
string& baseUrl) const
{
return this->getContentByDecodedUrl(
kiwix::urlDecode(url), content, title, contentLength, contentType, baseUrl);
return get_content_by_decoded_url(*this,
kiwix::urlDecode(url),
content,
title,
contentLength,
contentType,
baseUrl);
}
bool Reader::getContentByEncodedUrl(const string& url,
@@ -488,12 +598,13 @@ bool Reader::getContentByEncodedUrl(const string& url,
string& contentType) const
{
std::string stubRedirectUrl;
return this->getContentByEncodedUrl(kiwix::urlDecode(url),
content,
title,
contentLength,
contentType,
stubRedirectUrl);
return get_content_by_decoded_url(*this,
kiwix::urlDecode(url),
content,
title,
contentLength,
contentType,
stubRedirectUrl);
}
bool Reader::getContentByDecodedUrl(const string& url,
@@ -503,12 +614,13 @@ bool Reader::getContentByDecodedUrl(const string& url,
string& contentType) const
{
std::string stubRedirectUrl;
return this->getContentByDecodedUrl(kiwix::urlDecode(url),
content,
title,
contentLength,
contentType,
stubRedirectUrl);
return get_content_by_decoded_url(*this,
url,
content,
title,
contentLength,
contentType,
stubRedirectUrl);
}
bool Reader::getContentByDecodedUrl(const string& url,
@@ -518,64 +630,31 @@ bool Reader::getContentByDecodedUrl(const string& url,
string& contentType,
string& baseUrl) const
{
content = "";
contentType = "";
contentLength = 0;
zim::Article article;
if (!this->getArticleObjectByDecodedUrl(url, article)) {
return false;
}
/* If redirect */
unsigned int loopCounter = 0;
while (article.isRedirect() && loopCounter++ < 42) {
article = article.getRedirectArticle();
}
if (loopCounter < 42) {
/* Compute base url (might be different from the url if redirects */
baseUrl
= "/" + std::string(1, article.getNamespace()) + "/" + article.getUrl();
/* Get the content mime-type */
try {
contentType
= string(article.getMimeType().data(), article.getMimeType().size());
} catch (exception& e) {
cerr << "Unable to get the mimetype for " << baseUrl << ":" << e.what()
<< endl;
contentType = "application/octet-stream";
}
/* Get the data */
content = string(article.getData().data(), article.getArticleSize());
title = article.getTitle();
}
/* Try to set a stub HTML header/footer if necessary */
if (contentType.find("text/html") != string::npos
&& content.find("<body") == std::string::npos
&& content.find("<BODY") == std::string::npos) {
content = "<html><head><title>" + article.getTitle() +
"</title><meta http-equiv=\"Content-Type\" content=\"text/html; "
"charset=utf-8\" /></head><body>" +
content + "</body></html>";
}
/* Get the data length */
contentLength = article.getArticleSize();
return true;
return get_content_by_decoded_url(*this,
url,
content,
title,
contentLength,
contentType,
baseUrl);
}
/* Check if an article exists */
bool Reader::urlExists(const string& url) const
{
return pathExists(url);
}
bool Reader::pathExists(const string& path) const
{
if (!zimFileHandler)
{
return false;
}
char ns = 0;
string titleStr;
this->parseUrl(url, &ns, titleStr);
titleStr = "/" + titleStr;
_parseUrl(path, &ns, titleStr);
zim::File::const_iterator findItr = zimFileHandler->find(ns, titleStr);
return findItr != zimFileHandler->end() && findItr->getUrl() == titleStr;
}
@@ -583,8 +662,13 @@ bool Reader::urlExists(const string& url) const
/* Does the ZIM file have a fulltext index */
bool Reader::hasFulltextIndex() const
{
return ( this->urlExists("/Z/fulltextIndex/xapian")
&& !zimFileHandler->is_multiPart() );
if (!zimFileHandler || zimFileHandler->is_multiPart() )
{
return false;
}
return ( pathExists("Z//fulltextIndex/xapian")
|| pathExists("X/fulltext/xapian"));
}
/* Search titles by prefix */


@@ -43,7 +43,7 @@ namespace kiwix
class _Result : public Result
{
public:
_Result(Searcher* searcher, zim::Search::iterator& iterator);
_Result(zim::Search::iterator& iterator);
virtual ~_Result(){};
virtual std::string get_url();
@@ -56,7 +56,6 @@ class _Result : public Result
virtual int get_readerIndex();
private:
Searcher* searcher;
zim::Search::iterator iterator;
};
@@ -258,7 +257,7 @@ Result* Searcher::getNextResult()
return internal->_xapianSearcher->getNextResult();
} else if (internal->_search &&
internal->current_iterator != internal->_search->end()) {
Result* result = new _Result(this, internal->current_iterator);
Result* result = new _Result(internal->current_iterator);
internal->current_iterator++;
return result;
}
@@ -324,8 +323,8 @@ bool Searcher::setSearchProtocolPrefix(const std::string prefix)
return true;
}
_Result::_Result(Searcher* searcher, zim::Search::iterator& iterator)
: searcher(searcher), iterator(iterator)
_Result::_Result(zim::Search::iterator& iterator)
: iterator(iterator)
{
}


@@ -40,6 +40,7 @@ class MyHtmlParser : public HtmlParser {
void process_text(const string &text);
void opening_tag(const string &tag);
void closing_tag(const string &tag);
using HtmlParser::parse_html;
void parse_html(const string &text, const string &charset_,
bool charset_from_meta_);
MyHtmlParser() :


@@ -193,13 +193,8 @@ std::string XapianResult::get_content()
if (!searcher->reader) {
return "";
}
std::string content;
std::string title;
unsigned int contentLength;
std::string contentType;
searcher->reader->getContentByUrl(
get_url(), content, title, contentLength, contentType);
return content;
auto entry = searcher->reader->getEntryFromEncodedPath(get_url());
return entry.getContent();
}
int XapianResult::get_size()


@@ -29,7 +29,12 @@ case ${PLATFORM} in
esac
cd ${TRAVIS_BUILD_DIR}
export PKG_CONFIG_PATH=${INSTALL_DIR}/lib/x86_64-linux-gnu/pkgconfig
if [[ "${TRAVIS_OS_NAME}" == "osx" ]]
then
export PKG_CONFIG_PATH=${INSTALL_DIR}/lib/pkgconfig
else
export PKG_CONFIG_PATH=${INSTALL_DIR}/lib/x86_64-linux-gnu/pkgconfig
fi
meson . build -Dctpp2-install-prefix=${INSTALL_DIR} ${MESON_OPTION}
cd build
ninja


@@ -3,41 +3,27 @@
set -e
REPO_NAME=${TRAVIS_REPO_SLUG#*/}
ARCHIVE_NAME=deps_${PLATFORM}_${REPO_NAME}.tar.gz
# Packages.
case ${PLATFORM} in
"native_static")
PACKAGES="gcc cmake libbz2-dev ccache zlib1g-dev uuid-dev libctpp2-dev ctpp2-utils"
;;
"native_dyn")
PACKAGES="gcc cmake libbz2-dev ccache zlib1g-dev uuid-dev libctpp2-dev ctpp2-utils libmicrohttpd-dev"
;;
"win32_static")
PACKAGES="g++-mingw-w64-i686 gcc-mingw-w64-i686 gcc-mingw-w64-base mingw-w64-tools ccache ctpp2-utils"
;;
"win32_dyn")
PACKAGES="g++-mingw-w64-i686 gcc-mingw-w64-i686 gcc-mingw-w64-base mingw-w64-tools ccache ctpp2-utils"
;;
"android_arm")
PACKAGES="gcc cmake ccache ctpp2-utils"
;;
"android_arm64")
PACKAGES="gcc cmake ccache ctpp2-utils"
;;
esac
sudo apt-get update -qq
sudo apt-get install -qq python3-pip ${PACKAGES}
sudo pip3 install meson==0.43.0
ARCHIVE_NAME=deps_${TRAVIS_OS_NAME}_${PLATFORM}_${REPO_NAME}.tar.gz
# Ninja
cd $HOME
git clone git://github.com/ninja-build/ninja.git
cd ninja
git checkout release
./configure.py --bootstrap
sudo cp ninja /bin
if [[ "$TRAVIS_OS_NAME" == "osx" ]]
then
brew update
brew upgrade python3
pip3 install meson==0.43.0
wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-mac.zip
unzip ninja-mac.zip ninja
else
pip3 install --user meson==0.43.0
wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
unzip ninja-linux.zip ninja
fi
mkdir -p $HOME/bin
cp ninja $HOME/bin
# Dependencies coming from kiwix-build.
cd ${HOME}