Use url fragment schema for deep link URIs, borrowing from URL/PDF
schemas. E.g file:///path/to/file.txt#line=<line_no>&#page=<page_no>
Compute line number during (recursive) org-mode entry chunking.
Thoroughly test line number in URI maps to line number of chunk in
actual org mode file.
This deeplink URI with line number is passed to llm as context to
better combine with line range based view file tool.
Grep tool already passed matching line number. This change passes
line number in URIs of org entries matched by the semantic search tool
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Update unit tests to fix with new application design
* Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request
* Use state.host and state.port for configuring the URL for the indexer
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
- Text before headings was not being indexed due to buggy orgnode
parsing logic
- Resolved indexing intro text from files with and without headings in
them
- Ensure intro text node has heading set to all title lines collected
from the file
Resolves#165
- Why
The khoj pypi packages should be installed in `khoj' directory.
Previously it was being installed into `src' directory, which is a
generic top level directory name that is discouraged from being used
- Changes
- move src/* to src/khoj/*
- update `setup.py' to `find_packages' in `src' instead of project root
- rename imports to form `from khoj.*' in complete project
- update `constants.web_directory' path to use `khoj' directory
- rename root logger to `khoj' in `main.py'
- fix image_search tests to use the newly rename `khoj' logger
- update config, docs, workflows to reference new path `src/khoj'
- Issue
- Indent regex was previously catching escape sequences like newlines
- This was resulting in entries with only escape sequences in body to
be prepended to property drawers etc during rendering
- Fix
- Update indent regex to only look for spaces in each line
- Only render body when body contains non-escape characters
- Create test to prevent this regression from silently resurfacing
- Having Tags as sets was returning them in a different order
everytime
- This resulted in spuriously identifying existing entries as new
because their tags ordering changed
- Converting tags to list fixes the issue and identifies updated new
entries for incremental update correctly
- This is still clunky but it should be commitable
- General enough that it'll work even when a users notes are not in the home directory
- While solving for the special case where:
- Notes are being processed on a different machine and used on a different machine
- But the notes directory is in the same location relative to home on both the machines