mirror of
https://github.com/FreshRSS/FreshRSS.git
synced 2026-01-23 04:38:01 -05:00
Add option to sort results by received date (existing, default), publication date, title, URL (link), random.

fix https://github.com/FreshRSS/FreshRSS/issues/1771
fix https://github.com/FreshRSS/FreshRSS/issues/2083
fix https://github.com/FreshRSS/FreshRSS/issues/2119
fix https://github.com/FreshRSS/FreshRSS/issues/2596
fix https://github.com/FreshRSS/FreshRSS/issues/3204
fix https://github.com/FreshRSS/FreshRSS/issues/4405
fix https://github.com/FreshRSS/FreshRSS/issues/5529
fix https://github.com/FreshRSS/FreshRSS/issues/5864
fix https://github.com/FreshRSS/Extensions/issues/161

URL parameters:

* `&sort=id` (current behaviour, sorting according to newest received articles)
* `&sort=date` (publication date, which is not indicative of how new an article is)
* `&sort=title`
* `&sort=link`
* `&sort=rand` (random order, which disables infinite scrolling, at least for now)

combined with `&order=ASC` or `&order=DESC`

## Implementation notes

The sorting criterion *received date* (id), which is the default and was the only one before this PR, is the one with the best sorting characteristics:

* *uniqueness*: no two entries have the exact same received date
* *monotonicity*: new entries always have a higher received date
* *performance*: this field is efficiently indexed in the database for fast usage, including for paging (indexing could also be done on other fields, but with lower effective performance)

In contrast, sorting criteria such as *publication date*, *title*, or *link* are neither unique nor monotonic. In particular, multiple articles may share the same *publication date*, and we may receive articles with a *publication date* far in the future, and then later some new articles with a *publication date* far in the past.

To understand why sorting by *publication date* is problematic, it helps to think about sorting by *title* or by *link*, as sorting by *title* and by *publication date* share more or less the same characteristics.

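The lack of uniqueness and monotonicity described above can be illustrated with a minimal sketch. This is not FreshRSS code: the `$entries` array and the `compareByDateThenId` helper are hypothetical, and the values are made up. It only shows that a non-unique criterion such as *publication date* needs *id* as a tie-breaker to give a deterministic order.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical comparator: order by publication date, then by id to break ties.
 * @param array{id:int,date:int} $a
 * @param array{id:int,date:int} $b
 */
function compareByDateThenId(array $a, array $b): int {
	// Comparing two same-sized lists compares them element by element, like a tuple
	return [$a['date'], $a['id']] <=> [$b['date'], $b['id']];
}

$entries = [
	['id' => 30, 'date' => 1700000000],
	['id' => 10, 'date' => 1700000000],	// same publication date as id 30 (not unique)
	['id' => 20, 'date' => 1600000000],	// received after id 10 but published earlier (not monotonic)
];
usort($entries, 'compareByDateThenId');
$sortedIds = array_column($entries, 'id');
```

Without the second criterion, the relative order of entries 10 and 30 would be unspecified, which is what makes paging unreliable.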
### Problem 1: new articles

New articles may be received in the background after what is shown on screen, and before the next user action such as *mark all as read*. Due to the lack of *monotonicity* when sorting by e.g. *publication date* or *title*, users risk marking as read a batch of articles containing some fresh articles without seeing them.

Mitigation: A parameter `idMax` tracks the maximum ID related to a batch of actions such as *mark all as read* to exclude articles received after those that are displayed.

### Problem 2: paging / pagination

When navigating articles, only a few articles are displayed, and a new "page" of articles needs to be fetched from the database when scrolling down or when clicking the button to show more articles. When sorting by e.g. *publication date* or *title*, it is not trivial to show the next page without re-showing some of the same articles, and without skipping any. Indeed, views often have additional criteria such as showing only unread articles, and users may mark some articles as read while viewing them, thereby removing some articles from the previous pages. And as with *Problem 1*, new articles may have been received in the background. Consequently, it is not possible to use `OFFSET` to implement pagination (which is, in particular, why the patches suggested by a few users were wrong).

Mitigation: `idMax` is also used (just like for *Problem 1*) and a *Keyset Pagination* approach is used, combining an unstable sorting criterion such as *publication date* or *title* with *id* to ensure stable sorting (so, 2 sorting criteria + 1 filter criterion). See e.g. https://www.alwaysdeveloping.net/dailydrop/2022/07/01-keyset-pagination/

### Problem 3: performance

Sorting by anything other than *received date* (id) is doomed to be slow(er) due to the combination of 3 criteria (see *Problem 2*). An `OFFSET` approach (which is not possible anyway, as explained) would be even slower.
Furthermore, we have no SQL index at the moment, but indexes would not necessarily help much due to the multiple sorting criteria needed, which involve some `OR` logic that is difficult for databases to optimise. The nicest syntax would use tuples and corresponding indexes, but that is poorly supported by MySQL: https://bugs.mysql.com/bug.php?id=104128

Mitigation: a compatibility SQL syntax is used to implement *Keyset Pagination*

### Problem 4: user confusion

Several users have shown that they do not fully understand the difference between *received date* and *publication date*, and particularly not the pitfalls of *publication date*.

Mitigation: the menus to mark-as-read *before 1 day* and *before 1 week* are disabled when sorting by anything other than *received date*. Likewise, the separation headers *Today*, *Yesterday*, and *Before yesterday* are only shown when sorting by *received date*. Here again, to better understand why, it helps to think about sorting by *title* or by *link*, as sorting by *title* and by *publication date* share more or less the same characteristics.

* [ ] We should write a Q&A and/or documentation about the problems associated with *sorting by publication date*: risks of not noticing new publications, of inadvertently marking them as read, of having some articles with a date in the future hanging at the top of the views (vice versa when sorting in ascending order), performance, etc.

### Problem 5: APIs

Sorting by anything other than *received date* breaks the guarantees needed for a successful synchronisation via API.

Mitigation: sorting by *received date* is ensured for all API calls.

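As a rough sketch of the *Keyset Pagination* compatibility syntax mentioned under *Problem 2* and *Problem 3*, the tuple comparison `(sort, id) < (?, ?)` can be expanded with `OR` logic. This is not the code of this PR: the `keysetClause` helper, the `entry` table name and `e` alias, and the example values are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds the WHERE and ORDER BY parts of a keyset-paginated
 * query in a portable syntax that avoids SQL tuples.
 *
 * @param string $sort primary sorting criterion: 'date', 'title' or 'link'
 * @param string $order 'ASC' or 'DESC'
 * @param int $idMax highest id that existed when the first page was displayed
 * @param int|string $lastSort value of the sort column for the last entry already shown
 * @param int $lastId id of the last entry already shown
 * @param array<int,int|string> $values bound parameters, appended in placeholder order
 */
function keysetClause(string $sort, string $order, int $idMax, int|string $lastSort, int $lastId, array &$values): string {
	// Column names cannot be bound as parameters, so only allow known ones
	if (!in_array($sort, ['date', 'title', 'link'], true)) {
		throw new InvalidArgumentException('Unsupported sort column');
	}
	$order = $order === 'ASC' ? 'ASC' : 'DESC';
	$cmp = $order === 'ASC' ? '>' : '<';
	// Filter criterion: ignore entries received after the first page was displayed
	$values[] = $idMax;
	// Two sorting criteria: strictly after the last shown value, or same value and later id
	$values[] = $lastSort;
	$values[] = $lastSort;
	$values[] = $lastId;
	return "WHERE e.id <= ? AND (e.{$sort} {$cmp} ? OR (e.{$sort} = ? AND e.id {$cmp} ?)) "
		. "ORDER BY e.{$sort} {$order}, e.id {$order}";
}

$values = [];
$sql = 'SELECT e.id FROM entry e '
	. keysetClause('date', 'DESC', 500, 1690000000, 120, $values)
	. ' LIMIT 20';
```

The resulting `$sql` and `$values` would then be passed to a prepared statement. Each new page reuses the same `idMax` and takes `lastSort` and `lastId` from the last row of the previous page, so no `OFFSET` is involved.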
103 lines
3.0 KiB
PHP
<?php
declare(strict_types=1);

class FreshRSS_EntryDAOPGSQL extends FreshRSS_EntryDAOSQLite {

	#[\Override]
	public static function hasNativeHex(): bool {
		return true;
	}

	#[\Override]
	public static function sqlHexDecode(string $x): string {
		return 'decode(' . $x . ", 'hex')";
	}

	#[\Override]
	public static function sqlHexEncode(string $x): string {
		return 'encode(' . $x . ", 'hex')";
	}

	#[\Override]
	public static function sqlIgnoreConflict(string $sql): string {
		return rtrim($sql, ' ;') . ' ON CONFLICT DO NOTHING';
	}

	#[\Override]
	public static function sqlRandom(): string {
		return 'RANDOM()';
	}

	#[\Override]
	protected static function sqlRegex(string $expression, string $regex, array &$values): string {
		$matches = static::regexToSql($regex);
		if (isset($matches['pattern'])) {
			$matchType = $matches['matchType'] ?? '';
			if (str_contains($matchType, 'm')) {
				// newline-sensitive matching
				$matches['pattern'] = '(?m)' . $matches['pattern'];
			}
			$values[] = $matches['pattern'];
			if (str_contains($matchType, 'i')) {
				// case-insensitive matching
				return "{$expression} ~* ?";
			} else {
				// case-sensitive matching
				return "{$expression} ~ ?";
			}
		}
		return '';
	}

	#[\Override]
	protected function registerSqlFunctions(string $sql): void {
		// Nothing to do for PostgreSQL
	}

	/** @param array{0:string,1:int,2:string} $errorInfo */
	#[\Override]
	protected function autoUpdateDb(array $errorInfo): bool {
		if (isset($errorInfo[0])) {
			if ($errorInfo[0] === FreshRSS_DatabaseDAO::ER_BAD_FIELD_ERROR || $errorInfo[0] === FreshRSS_DatabaseDAOPGSQL::UNDEFINED_COLUMN) {
				$errorLines = explode("\n", (string)$errorInfo[2], 2);	// The relevant column name is on the first line, other lines are noise
				foreach (['attributes'] as $column) {
					if (stripos($errorLines[0], $column) !== false) {
						return $this->addColumn($column);
					}
				}
			}
		}
		return false;
	}

	#[\Override]
	public function commitNewEntries(): bool {
		//TODO: Update to PostgreSQL 9.5+ syntax with ON CONFLICT DO NOTHING
		$sql = 'DO $$
DECLARE
maxrank bigint := (SELECT MAX(id) FROM `_entrytmp`);
rank bigint := (SELECT maxrank - COUNT(*) FROM `_entrytmp`);
BEGIN
	INSERT INTO `_entry`
		(id, guid, title, author, content, link, date, `lastSeen`, hash, is_read, is_favorite, id_feed, tags, attributes)
		(SELECT rank + row_number() OVER(ORDER BY date, id) AS id, guid, title, author, content,
			link, date, `lastSeen`, hash, is_read, is_favorite, id_feed, tags, attributes
			FROM `_entrytmp` AS etmp
			WHERE NOT EXISTS (
				SELECT 1 FROM `_entry` AS ereal
				WHERE (etmp.id = ereal.id) OR (etmp.id_feed = ereal.id_feed AND etmp.guid = ereal.guid))
			ORDER BY date, id);
	DELETE FROM `_entrytmp` WHERE id <= maxrank;
END $$;';
		$hadTransaction = $this->pdo->inTransaction();
		if (!$hadTransaction) {
			$this->pdo->beginTransaction();
		}
		$result = $this->pdo->exec($sql) !== false;
		if (!$hadTransaction) {
			$this->pdo->commit();
		}
		return $result;
	}
}