opendir() copied up to PATH_MAX wide characters into DIR::entry
and then wrote the terminator one element past the end of the
buffer, leaving a fixed out-of-bounds write on success.
We do not believe this to be a security issue.
Copy at most entry_count - 1 characters, terminate the last
valid element, keep the append length check aligned with that
bound, and free DIR when uncpath() fails.
CLAM-2487
Fix an EGG comment temp-file bug where the write length used the
number of comments instead of the length of the current comment.
Also switch raw write paths in scanners.c to cli_writen() so short
writes are handled correctly for RAR comments, kept EGG comments,
extracted EGG temp files, normalized script output, and UTF-16 HTML
temp output.
CLAM-2946
When onas_ht_insert() adds an element whose hash maps to an existing
bucket, the bucket is already part of the activated-bucket list. The old
code appended that same bucket to the list again, resetting bckt->next to
NULL and making any following buckets unreachable from ht->head.
OnAccessExcludePath walks the activated-bucket list when removing watched
paths. If a collision corrupts that list, some active directories can be
skipped and remain watched.
Track whether onas_ht_insert() allocated a new bucket, and only link the
bucket into ht->head/ht->tail when it is new. Colliding elements are still
inserted into the existing bucket without changing the bucket list.
Malformed HFS+ metadata could overflow arithmetic in several
block and resource offset calculations before the code validated
those values. That left some bounds checks bypassable and could
redirect reads or seeks to the wrong offsets.
Add inline overflow guards before offset math, validate cmpf
resource headers against the extracted resource fork size, and
reject extents or block tables whose computed ranges exceed the
available data.
CLAM-2976
The initial HFS+ cmpf fix stopped calling inflateEnd() on an
uninitialized stream, but the surrounding parser still had
three brittle spots: initialized zlib streams could bypass
cleanup on error paths, cmpf block tables were not bounded
against the resource length, and tmpname cleanup could loop
if unlink failed.
Tighten block-table validation using the cmpf resource length,
route cmpf block failures through per-block zlib cleanup, add
defensive cleanup to the inline decompression path, and make
tmpname unlink failures reportable without recursive cleanup.
CLAM-2976
A crafted HFS+ cmpf resource block can take the uncompressed
path without initializing a zlib stream, but the block
cleanup still calls inflateEnd(). On our repro attempts this
produced a parser-local Z_STREAM_ERROR rather than a process
crash.
Initialize the z_stream defensively, track whether
inflateInit2() succeeded, and only call inflateEnd() for
initialized streams.
Credit: Mizu
CLAM-2976
The HFS+ compressed-file attribute parser validated the attribute name
length as a UTF-16 character count, but later used that same field as a
byte offset by multiplying it by two. A crafted attribute record could
therefore place the inline attribute record header near the end of the
node and trigger an out-of-bounds read when ClamAV copied the record
header or payload.
Fix this by converting the attribute name length to a checked byte count
before using it in offset calculations. Validate that the inline
attribute record header fits in the node before reading it, and verify
that the claimed attribute payload also fits before copying it.
Credit: Sebastián Alba Vives
CLAM-2969
sigtool --testsigs takes two file references, while the file open code references the correct files, the error in the case of one file not being available was incorrect.
Improve the Rust build configuration used by sanitizer and fuzzing workflows.
This change adds three pieces of support:
- require a nightly Rust toolchain whenever `-Zsanitizer` is present in `RUSTFLAGS`, since Rust sanitizer support is unstable
- fail early when `-Zsanitizer=memory` is requested for an unsupported Rust target instead of surfacing a later Cargo ABI mismatch
- rebuild the Rust standard library with `-Zbuild-std` for MemorySanitizer so `core` and the rest of `std` use the same sanitizer ABI as the project crates
The function `generate_key_aes()` required for creating a key for decrypting
velvetsweatshop encrypted files is really slow.
The main problem seems to be creating new openssl contexts in a loop rather than reusing
an existing context. Specifically, cl_sha1() is expensive per call.
Each cl_sha1() goes through cl_hash_data() (crypto.c (line 790)), which
allocates/initializes OpenSSL digest state every call (and on OpenSSL 3 also
creates/fetches provider context). Doing that 50k+ times is very costly.
CLAM-2957
It is possible that invalid UTF-8 characters may trigger a Rust panic
(crash) when parsing CSS style blocks to extract images.
The issue is using `split_at()` instead of `split_at_checked()`.
I also found a few places where I could use string trim methods rather
than doing that logic manually.
Thank you to Krishnap7p for reporting this issue.
CLAM-2819
CLAM-2828
On Windows, when scanning some files with `--leave-temps` enabled and
with `--tempdir` set to something like `C:\temp`, it may fail to create
a new subdirectory to store the temp files because the absolute file
path is too long for the `mkdir()` function.
The `mkdir()` function may fail on Windows if the filepath is longer than
the legacy MAX_PATH.
Fixing this in C or C++ is rather difficult, requiring either a registry
key + application manifest change, or else converting the path to UTF16
and UNC format (i.e. `"\\?\C:\temp"`) to pass to `CreateDirectoryW()`.
The solution in this commit is to use the Rust `std` library instead. It
is able to handle the longer file paths without issue.
CLAM-2924
If pushing a new layer to the recursion stack in the scan context fails,
we need to restore the original recursion level, and undo any changes to
the convenience `ctx->this_level_...` convenience pointers.
This fixes a crash observed when scanning certain files on Windows with
`--leave-temps` enabled and also `--tempdir` set to "C:\temp".
CLAM-2924
Fix issue reading from a pointer which can cause a crash on systems that
have strict pointer alignment requirements.
Thank you to Hsuan-Ming Chen at Synology PSIRT for identify this issue
and proposing this fix.
Freshclam's feature to load-test a newly downloaded database is not using the
CVDCertsDirectory option to determine the certs directory.
The environment variable works, however.
In addition to the environment variable, you can work around the issue by
disabling load testing.
To fix this issue, we need to extend the libfreshclam API adding a
`fc_test_database_ex()` function that takes the certs directory as a
parameter.
Resolves: https://github.com/Cisco-Talos/clamav/issues/1630
CLAM-2930
There is a bug affecting ClamAV 1.5.0 and 1.5.1 where it attemps to
download the .sign file and verify the .sign or legacy RSA digital
signature for CLD files when using the PrivateMirror option.
For context, the PrivateMirror option enables you to download either
CLD or CVD files from a mirror such that you can serve a private mirror
where the signatures are pulled using Freshclam.
Note that it is better to use the CVDUpdate utility which always serves
digitally signed CVD and CDIFF files. The PrivateMirror capability
predates CVDUpdate and is still used by many today.
Resolves https://github.com/Cisco-Talos/clamav/issues/1626
CLAM-2941
The default CVD certs directory on unix systems is `/usr/local/etc/certs`, which
is not unique to ClamAV and may already exist and include other certificates,
some of which may have restricted user permissions inaccessible to the running
ClamAV application. In this scenario, ClamAV should skip over the cert files it
cannot read.
Resolves: https://github.com/Cisco-Talos/clamav/issues/1665
CLAM-2942
Add SELFCHECK command and related configuration options
Document SELFCHECK command in clamd.conf.sample.
Document EnableSelfCheckCommand in clamd.conf.
Implement robust SELFCHECK test with retries.
Added robust SELFCHECK test to handle RELOADING state.
The regex parsing code treated anything that did not return CL_SUCCESS
as CL_EMEM in several places, resulting in a return code indicative of a
malformed database returning a memory error instead. This change passes
the real return code up the chain, and also prevents trying to realloc 0
bytes when we don't have anything to realloc due to a malformed
database.
CLAM-2191
More PDF statistics were requested for feature parity.
If metadata collection is enabled, the following additional PDF
statistics will be collected:
- Number of Automatic Actions
- Number of Streams
- Number of Objects
- Number of Object Streams
- Number of Trailers
- Number of URIs
- Number of Xrefs
Additionally, some of the parsing logic was fixed during testing of
these features.
CLAM-2820
Add support for Y-type signatures in wdb files.
There are some cases where it is desired to allow a single domain to
have any displayed address and not count that as phishing. An example of
this would be the domain for outlook URL checker, or Google safe
browsing. If a wdb file contains a Y type entry only the real domain
will be matched, not the real and displayed domain.
CLAM-2426
The MyDoom heuristic has been causing too many false positives.
Since we already have MyDoom coverage through signature detection, the
hard-coded heuristic that causes many false positives is no longer
needed. This commit removes the hard-coded heuristic.
CLAM-2766
Don't concatenate pythonpath and str
Fix test program path for Windows Ninja Multi-Config
Use environment variable to pass cl_cvdunpack_test location to test
Use ENABLE_EXAMPLES instead of ENABLE_TESTS to gate ex_cl_cvdunpack
Directly use the target's path on Windows
Also change the program finding for ex_scan_callbacks_test
Fix python paths handling
Fix the order of some expected results which were incorrect and for
some reason only failed when tested on Windows.
This final fix in this commit provided by Val Snyder.
The prescan and file_inspection callbacks are deprecated and should not be used.
They are replaced by a more unified scan callback system which is demonstrated in
ex_scan_callbacks.c.
In inflate64.c on Windows, as zlib.h is included,
it sets ZEXTERN to " __declspec(dllimport)". In consequence,
our internal functions have the wrong linkage and linking fails
because of the missing dll functions.
Build on windows-2022, as windows-2019 is no longer available
Use updated vcpkg github action
Try to use vcpkg manifest mode
Provide vcpkg baseline commit hash
Correctly request dependency feature
Fix printing vcpkg triplet
Try building with CMakePresets.json
The install BINDIR, SBINDIR, LIBDIR, and INCLUDEDIR may be set in
the top level CMakeLists.txt and does not need to be set for each
target. This also enables a build to customize those so that the
bins COULD go to the "bin" directory.
Also add CMake option "ENABLE_WINDOWS_INSTALL_THIRDPARTY_DEPENDENCIES".
- When enabled (default), the install will include required system runtime
libraries and also a copy of the other third party library dependencies
(e.g. libpcre2, libxml2, libcrypto, etc.).
- When disabled, those files are not included.
One situation where this is useful is if vcpkg is being used to build
ClamAV itself (including its dependencies).
If the current layer has a file descriptor, ClamAV is passing the path
for that file to the UnRAR module, even if the RAR we want to scan is
just some small embedded bit (e.g. detected by RARSFX signature).
We need to drop the RAR portion to a new file for the UnRAR module
because it does not accept file buffers to be scanned, only file paths.
CLAM-2900
Note this commit also changes `scanners.c` to use `access()` on Windows
instead of `_access_s()`. ClamAV defines `access()` to map to a custom
`access_w32()` function on Windows. We already use it everywhere else.
Large range testing identified some files where image fuzzy hashing
produces different hashes with ClamAV 1.5 vs 1.4.
With my investigation, I found the issue is with changes in Rust library
dependencies, though it actually wasn't any change with the 'image' or
'jpeg-decoder' crates. After running a simple `cargo update` to update
all non-pinned versions.
I confirmed that this does not affect the minimum supported Rust version
(MSRV).
CLAM-2899
I am seeing missed detections since we changed to prohibit embedded
file type identification when inside an embedded file.
In particular, I'm seeing this issue with PE files that contain multiple
other MSEXE as well as a variety of false positives for PE file headers.
For example, imagine a PE with four concatenated DLL's, like so:
```
[ EXE file | DLL #1 | DLL #2 | DLL #3 | DLL #4 ]
```
And note that false positives for embedded MSEXE files are fairly common.
So there may be a few mixed in there.
Before limiting embedded file identification we might interpret the file
structure something like this:
```
MSEXE: {
embedded MSEXE #1: false positive,
embedded MSEXE #2: false positive,
embedded MSEXE #3: false positive,
embedded MSEXE #4: DLL #1: {
embedded MSEXE #1: false positive,
embedded MSEXE #2: DLL #2: {
embedded MSEXE #1: DLL #3: {
embedded MSEXE #1: false positive,
embedded MSEXE #2: false positive,
embedded MSEXE #3: false positive,
embedded MSEXE #4: false positive,
embedded MSEXE #5: DLL #4
}
embedded MSEXE #2: false positive,
embedded MSEXE #3: false positive,
embedded MSEXE #4: false positive,
embedded MSEXE #5: false positive,
embedded MSEXE #6: DLL #4
}
embedded MSEXE #3: DLL #3,
embedded MSEXE #4: false positive,
embedded MSEXE #5: false positive,
embedded MSEXE #6: false positive,
embedded MSEXE #7: false positive,
embedded MSEXE #8: DLL #4
}
}
```
This is obviously terrible, which is why why we don't allow detecting
embedded files within other embedded files.
So after we enforce that limit, the same file may be interpreted like
this instead:
```
MSEXE: {
embedded MSEXE #1: false positive,
embedded MSEXE #2: false positive,
embedded MSEXE #3: false positive,
embedded MSEXE #4: DLL #1,
embedded MSEXE #5: false positive,
embedded MSEXE #6: DLL #2,
embedded MSEXE #7: DLL #3,
embedded MSEXE #8: false positive,
embedded MSEXE #9: false positive,
embedded MSEXE #10: false positive,
embedded MSEXE #11: false positive,
embedded MSEXE #12: DLL #4
}
```
That's great! Except that we now exceed the "MAX_EMBEDDED_OBJ" limit
for embedded type matches (limit 10, but 12 found). That means we won't
see or extract the 4th DLL anymore.
My solution is to lift the limit when adding an matched MSEXE type.
We already do this for matched ZIPSFX types.
While doing this, I've significantly tidied up the limits checks to
make it more readble, and removed duplicate checks from within the
`ac_addtype()` function.
CLAM-2897