Commit Graph

319 Commits

Author SHA1 Message Date
Michael Barz
b4788ca45a fix: cleaner debounce timer test 2026-05-12 08:46:33 +02:00
Florian Schade
32dd087b59 Merge pull request #2701 from opencloud-eu/typo-in-opensearch-error-message
fix: remove typo in error message
2026-05-06 09:38:21 +02:00
Dominik Schmidt
30d74f74bf Merge pull request #2633 from opencloud-eu/fix/search-preserve-value-case
fix(search): preserve value case for non-lowercased bleve fields
2026-05-05 12:35:54 +02:00
Thomas Schweiger
c9887c17cf fix: remove typo in error message 2026-04-30 13:13:20 +02:00
Jörn Friedrich Dreyer
2d1cc3fb3a stop metrics tickers on context cancel
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
2026-04-28 16:18:45 +02:00
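The commit title above names a common Go pattern. The following is a minimal sketch of it, not the repository's code: `runTicker`, `report`, and `stopsOnCancel` are hypothetical names chosen for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runTicker (hypothetical) calls report on every tick and returns as soon as
// ctx is cancelled. The deferred Stop releases the ticker's resources.
func runTicker(ctx context.Context, interval time.Duration, report func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			report()
		}
	}
}

// stopsOnCancel (hypothetical) cancels the context before starting the loop
// and reports true once runTicker has returned.
func stopsOnCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	runTicker(ctx, time.Hour, func() {})
	return true
}

func main() {
	fmt.Println(stopsOnCancel())
}
```

Without the `ctx.Done()` case, the loop would keep running after the service shuts down.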
André Duffeck
e9e195789d Adapt to changes in reva/cs3apis 2026-04-24 14:49:11 +02:00
Florian Schade
d0e3f14539 chore: remove loop var references 2026-04-23 17:11:55 +02:00
Florian Schade
288e67cc39 chore: replace interface with any 2026-04-23 09:31:11 +02:00
Dominik Schmidt
365bd94418 refactor(search): use map[string]struct{} for lowercaseFields set
Bring the membership lookup in line with the existing repo convention
for set types (see services/thumbnails/pkg/thumbnail/mimetypes.go for
the same pattern). Storing struct{} values instead of bool makes the
set semantics explicit and rules out accidental false entries.
2026-04-22 10:01:02 +02:00
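The set semantics described in the commit above can be illustrated with a small sketch. The variable and function names are hypothetical; only the `map[string]struct{}` pattern comes from the commit message.

```go
package main

import "fmt"

// boolSet shows the pitfall: "Tags" is a key in the map, but its value is
// false, so a value-based lookup reports it as absent.
var boolSet = map[string]bool{"Name": true, "Tags": false}

// structSet stores empty structs, so key presence is the only signal.
var structSet = map[string]struct{}{"Name": {}, "Tags": {}}

// inBoolSet reads the stored value, which is false for missing keys and for
// keys that were stored with false.
func inBoolSet(key string) bool { return boolSet[key] }

// inStructSet uses the comma-ok form to test key presence.
func inStructSet(key string) bool {
	_, ok := structSet[key]
	return ok
}

func main() {
	fmt.Println(inBoolSet("Tags"), inStructSet("Tags")) // false true
}
```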
Dominik Schmidt
87b1f6f630 test(search): cover audio.artist instead of Title for case preservation
The FIXME pointed at #2632 (dotted keys in KQL property restrictions),
which is now merged. Use audio.artist — the originally intended target
field for this regression — so the test matches its name: a nested
string field that is not on the lowercase allowlist.
2026-04-22 09:51:37 +02:00
Dominik Schmidt
796e5fd373 fix(search): tighten lowercaseFields comment
Copilot review pointed out that the comment claimed pre-lowercasing
makes non-analyzed query types (wildcard, fuzzy) match for every
allowlisted field. That is true for Name/Tags/Favorites, whose
lowercaseKeyword analyzer emits a single lowercased token, but the
Content analyzer also stems terms — so the guarantee doesn't hold
there. Drop the specific claim and keep the comment to the intent:
stay consistent with the field's analyzer.
2026-04-22 09:50:19 +02:00
Dominik Schmidt
538c82787c fix(search): preserve value case for non-lowercased bleve fields
The bleve compiler lowercased every query value (except Hidden)
before handing it to the engine. This matched the index tokens
for fields whose analyzer folds case — Name, Tags, Favorites,
Content — but silently broke matching for every other field,
whose default keyword analyzer preserves case. A query like
Title:"Some Title" parsed fine, lowercased to "some title", and
missed the indexed token "Some Title".

Replace the blanket lowercasing with an allowlist of the four
fields whose index mapping actually uses a lowercasing analyzer.
Every other field now passes through unchanged, which keeps
values like "deadmau5" or "Motörhead" intact instead of
normalising them to a case the tag writer didn't choose.
2026-04-22 09:50:19 +02:00
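A minimal sketch of the allowlist approach described in the commit above. The four field names and the name `lowercaseFields` are taken from the commit messages; `normalizeValue` is a hypothetical helper, and the exact-case field-name lookup is an assumption (the real compiler may match field names differently).

```go
package main

import (
	"fmt"
	"strings"
)

// lowercaseFields lists the fields whose analyzer folds case, per the commit
// message. Keys are matched exactly here, which is an assumption.
var lowercaseFields = map[string]struct{}{
	"Name": {}, "Tags": {}, "Favorites": {}, "Content": {},
}

// normalizeValue (hypothetical) lowercases a query value only when the field
// is on the allowlist. All other values pass through unchanged.
func normalizeValue(field, value string) string {
	if _, ok := lowercaseFields[field]; ok {
		return strings.ToLower(value)
	}
	return value
}

func main() {
	fmt.Println(normalizeValue("Name", "Report.PDF"))        // report.pdf
	fmt.Println(normalizeValue("Title", "Some Title"))       // Some Title
	fmt.Println(normalizeValue("audio.artist", "Motörhead")) // Motörhead
}
```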
Dominik Schmidt
15d779cb23 refactor(search): rename forceReindexFlag to forceRescanFlag
Address review feedback: now that the flag is read under its
registered name `force-rescan`, line the local variable up with the
operator-facing vocabulary. The proto field `ForceReindex` is left
untouched so the wire format stays the same.
2026-04-21 15:29:04 +02:00
Dominik Schmidt
e7806445dc fix(search): read --force-rescan flag with its registered name
The `opencloud search index` command registers the flag as
`--force-rescan` (see pflag registration below) but reads it via
`GetBool("force-reindex")`, so the value is always false — passing
`--force-rescan` had no effect and no force rescan was ever triggered.

Read the flag under its registered name.
2026-04-21 15:29:04 +02:00
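The mismatch described in the commit above can be reproduced in a small sketch. It uses Go's standard `flag` package as a stand-in for pflag, so the lookup API differs from the real command. `newFlagSet` and `readBool` are hypothetical helpers.

```go
package main

import (
	"flag"
	"fmt"
)

// newFlagSet (hypothetical) registers the flag under its operator-facing name.
func newFlagSet() *flag.FlagSet {
	fs := flag.NewFlagSet("index", flag.ContinueOnError)
	fs.Bool("force-rescan", false, "rescan files that are already indexed")
	return fs
}

// readBool (hypothetical) looks a flag up by name. For an unregistered name it
// returns found=false, and a caller that ignores that always sees false.
func readBool(fs *flag.FlagSet, name string) (value, found bool) {
	f := fs.Lookup(name)
	if f == nil {
		return false, false
	}
	return f.Value.String() == "true", true
}

func main() {
	fs := newFlagSet()
	if err := fs.Parse([]string{"--force-rescan"}); err != nil {
		panic(err)
	}
	fmt.Println(readBool(fs, "force-reindex")) // false false
	fmt.Println(readBool(fs, "force-rescan"))  // true true
}
```

Checking the `found` result (or, with pflag, the returned error) would have surfaced the wrong name instead of silently yielding false.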
Dominik Schmidt
2fc33d6e60 refactor(search): round xmpDM:duration to the nearest millisecond
Address review feedback: a straight int64 cast truncates toward zero,
so Tika values that produce results like 1234.999... milliseconds would
land at 1234 ms instead of 1235 ms. Round before casting so durations
are as accurate as float64 allows.
2026-04-21 15:16:57 +02:00
Dominik Schmidt
3c59935012 fix(search): parse tika xmpDM:duration as a float
Tika emits xmpDM:duration as seconds in floating-point form (for
example "154.57379150390625"), so strconv.ParseInt rejected every
value and the field was silently dropped — every indexed audio item
ended up without a duration.

Parse the value with strconv.ParseFloat and convert to milliseconds
ourselves. Adjust the existing extractor test to cover the fractional
case.
2026-04-21 15:16:57 +02:00
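The float parsing described above, combined with the rounding from the follow-up commit 2fc33d6e60, can be sketched as follows. `durationMillis` is a hypothetical helper, not the extractor's actual function.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// durationMillis (hypothetical) parses a duration given as floating-point
// seconds and returns it in milliseconds, rounded to the nearest integer
// instead of truncated toward zero.
func durationMillis(raw string) (int64, error) {
	seconds, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, err
	}
	return int64(math.Round(seconds * 1000)), nil
}

func main() {
	ms, err := durationMillis("154.57379150390625")
	fmt.Println(ms, err) // 154574 <nil>

	// ParseInt rejects the same input, which is why the field was dropped.
	_, err = strconv.ParseInt("154.57379150390625", 10, 64)
	fmt.Println(err != nil) // true
}
```

A plain `int64(seconds * 1000)` cast would give 154573 for the example value, one millisecond short.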
André Duffeck
bcdfbda08d Fix test 2026-04-14 08:20:28 +02:00
André Duffeck
60bcc6b0f2 Add a flag to the reindex command to force a full reindex
That can be helpful when the search service configuration has changed,
e.g. by enabling TIKA. Previously files that had already been indexed
were not indexed again and thus were not part of the fulltext index.

Fixes #2285
Fixes #2578
2026-04-14 08:20:28 +02:00
André Duffeck
71c0a469b9 Reduce default batch size to prevent memory issues with large documents 2026-03-25 14:27:37 +01:00
André Duffeck
428f69416f Commit batches when the limit is reached while iterating over children 2026-03-25 14:27:37 +01:00
André Duffeck
dea306247b Do not remove stopwords by default
Keeping the stop words leads to slightly bigger indexes but fixes
chopped up highlights of search results and phrase accuracy during
search.
2026-03-25 09:41:23 +01:00
André Duffeck
8a83eea742 Limit the highlighter to two fragments 2026-03-25 08:41:48 +01:00
André Duffeck
4fa5198501 Improve highlight support in osu
Co-authored-by: Florian Schade <f.schade@icloud.com>
2026-03-25 08:41:48 +01:00
André Duffeck
a6dd9b9e18 Use the fast vector highlighter for highlighting search results 2026-03-25 08:41:48 +01:00
André Duffeck
9e93f29ffe Introduce opensearch index v2
The new index allows for faster highlighting using the fvh highlighter and
for searching for favorites.
2026-03-25 08:41:48 +01:00
André Duffeck
10bc14130e Do not send back the full content in the search response 2026-03-23 13:52:09 +01:00
André Duffeck
cd0831aa10 We no longer manage favorites via arbitrary metadata 2026-03-13 09:38:28 +01:00
André Duffeck
d214a7535c Make the search index pick up changes to favorites 2026-03-13 09:38:28 +01:00
André Duffeck
ce5ec1b3dc Add support for favorites to the search service 2026-03-13 09:38:28 +01:00
Christian Richter
f7caf637ce consolidate log config in search
Signed-off-by: Christian Richter <c.richter@opencloud.eu>
2026-01-08 13:18:45 +01:00
Jörn Friedrich Dreyer
d0e51010bf replace more .Value.String() occurrences
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
2025-12-15 16:40:27 +01:00
Jörn Friedrich Dreyer
6e75e41023 fix search index flags
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
2025-12-15 16:40:27 +01:00
Christian Richter
7be33b0607 refactor interim DefaultAppCobra to DefaultApp
Signed-off-by: Christian Richter <c.richter@opencloud.eu>
2025-12-15 16:40:26 +01:00
Christian Richter
e7a5788634 migrate search from urfave/cli to spf13/cobra
Signed-off-by: Christian Richter <c.richter@opencloud.eu>
2025-12-15 16:40:26 +01:00
Jörn Friedrich Dreyer
56817b7de7 introduce OC_EVENTS_TLS_INSECURE
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
2025-11-28 11:17:39 +01:00
Jörn Friedrich Dreyer
10913ca00a Merge pull request #1918 from opencloud-eu/otlp-tracing
update otlp tracing
2025-11-27 12:57:26 +01:00
Jörn Friedrich Dreyer
a3ef7f6d79 update otlp tracing
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
2025-11-27 12:28:15 +01:00
fschade
60501659c5 chore: bump %%NEXT%% 2025-11-27 10:53:59 +01:00
Jörn Friedrich Dreyer
538e8141b2 fix opensearch client certificate
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
2025-11-21 12:23:07 +01:00
Jörn Friedrich Dreyer
b49cde429d log error
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
2025-09-15 14:49:13 +02:00
Jörn Friedrich Dreyer
be402a3977 allow configuring insecure search client
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
2025-09-15 14:48:50 +02:00
Jörn Friedrich Dreyer
f54582ddc4 fix event consumers
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
2025-09-15 13:49:41 +02:00
Jörn Friedrich Dreyer
99dee5ae77 allow disabling search grpc/event servers
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
2025-09-15 12:42:56 +02:00
Roman Perekhod
c597dfb917 set default timeouts and clean up 2025-09-12 12:18:47 +02:00
Roman Perekhod
9a3fc08dd4 separate control over the http and grpc driven services 2025-09-12 12:18:47 +02:00
Juan Pablo Villafáñez
9e1b80a1be feat: use runners to startup the services 2025-09-12 12:18:47 +02:00
Jörn Friedrich Dreyer
1a8fc4d336 Merge pull request #1416 from opencloud-eu/nats-connection-names
Nats connection names
2025-09-11 10:33:43 +02:00
Ralf Haferkamp
bcc96f1371 fix: re-generate mocks for search service 2025-09-09 17:04:21 +02:00
Juan Pablo Villafáñez
c0b4a5daa0 chore: change constant name to camelcase 2025-09-08 17:32:36 +02:00
Juan Pablo Villafáñez
ca2dc823ef feat: use names for connections to the nats event bus 2025-09-08 17:32:35 +02:00