Commit Graph

226 Commits

Author SHA1 Message Date
Micah Snyder
db013a2bfd libclamav: Fix scan recursion tracking
Scan recursion is the process of identifying files embedded in other
files and then scanning them, recursively.

Internally this process is more complex than it may sound because a file
may have multiple layers of types before finding a new "file".

At present we treat the recursion count in the scanning context as an
index into both our fmap list AND our container list. These two lists
are conceptually a part of the same thing and should be unified.

But what's concerning is that the "recursion level" isn't actually
incremented or decremented at the same time that we add a layer to the
fmap or container lists but instead is more touchy-feely, increasing
when we find a new "file".

To account for this shadiness, the size of the fmap and container lists
has always been a little longer than our "max scan recursion" limit so
we don't accidentally overflow the fmap or container arrays (!).

I've implemented a single recursion-stack as an array, similar to before,
which includes a pointer to each fmap at each layer, along with the size
and type. Push and pop functions add and remove layers whenever a new
fmap is added. A boolean argument when pushing indicates if the new layer
represents a new buffer or new file (descriptor). A new buffer will reset
the "nested fmap level" (described below).

This commit also provides a solution for an issue where we detect
embedded files more than once during scan recursion.

For illustration, imagine a tarball named foo.tar.gz with this structure:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     └── baz.exe           | PE    | 2         | 1                 |

But suppose baz.exe embeds a ZIP archive and a 7Z archive, like this:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| baz.exe                   | PE    | 0         | 0                 |
| ├── sfx.zip               | ZIP   | 1         | 1                 |
| │   └── hello.txt         | ASCII | 2         | 0                 |
| └── sfx.7z                | 7Z    | 1         | 1                 |
|     └── world.txt         | ASCII | 2         | 0                 |

(A) If we scan for embedded files at any layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| ├── foo.tar               | TAR   | 1         | 0                 |
| │   ├── bar.zip           | ZIP   | 2         | 1                 |
| │   │   └── hola.txt      | ASCII | 3         | 0                 |
| │   ├── baz.exe           | PE    | 2         | 1                 |
| │   │   ├── sfx.zip       | ZIP   | 3         | 1                 |
| │   │   │   └── hello.txt | ASCII | 4         | 0                 |
| │   │   └── sfx.7z        | 7Z    | 3         | 1                 |
| │   │       └── world.txt | ASCII | 4         | 0                 |
| │   ├── sfx.zip           | ZIP   | 2         | 1                 |
| │   │   └── hello.txt     | ASCII | 3         | 0                 |
| │   └── sfx.7z            | 7Z    | 2         | 1                 |
| │       └── world.txt     | ASCII | 3         | 0                 |
| ├── sfx.zip               | ZIP   | 1         | 1                 |
| └── sfx.7z                | 7Z    | 1         | 1                 |

(A) is bad because it scans content more than once.

Note that for the GZ layer, it may detect the ZIP and 7Z if the
signature hits on the compressed data, which it might, though
extracting the ZIP and 7Z will likely fail.

The reason the above doesn't happen now is that we restrict embedded
type scans for a bunch of archive formats to include GZ and TAR.

(B) If we scan for embedded files at the foo.tar layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     ├── baz.exe           | PE    | 2         | 1                 |
|     ├── sfx.zip           | ZIP   | 2         | 1                 |
|     │   └── hello.txt     | ASCII | 3         | 0                 |
|     └── sfx.7z            | 7Z    | 2         | 1                 |
|         └── world.txt     | ASCII | 3         | 0                 |

(B) is almost right. But we can achieve it easily enough only scanning for
embedded content in the current fmap when the "nested fmap level" is 0.
The upside is that it should safely detect all embedded content, even if
it may think the sfz.zip and sfx.7z are in foo.tar instead of in baz.exe.

The biggest risk I can think of affects ZIPs. SFXZIP detection
is identical to ZIP detection, which is why we don't allow SFXZIP to be
detected if insize of a ZIP. If we only allow embedded type scanning at
fmap-layer 0 in each buffer, this will fail to detect the embedded ZIP
if the bar.exe was not compressed in foo.zip and if non-compressed files
extracted from ZIPs aren't extracted as new buffers:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.zip                   | ZIP   | 0         | 0                 |
| └── bar.exe               | PE    | 1         | 1                 |
|     └── sfx.zip           | ZIP   | 2         | 2                 |

Provided that we ensure all files extracted from zips are scanned in
new buffers, option (B) should be safe.

(C) If we scan for embedded files at the baz.exe layer, we may detect:
| description               | type  | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz                | GZ    | 0         | 0                 |
| └── foo.tar               | TAR   | 1         | 0                 |
|     ├── bar.zip           | ZIP   | 2         | 1                 |
|     │   └── hola.txt      | ASCII | 3         | 0                 |
|     └── baz.exe           | PE    | 2         | 1                 |
|         ├── sfx.zip       | ZIP   | 3         | 1                 |
|         │   └── hello.txt | ASCII | 4         | 0                 |
|         └── sfx.7z        | 7Z    | 3         | 1                 |
|             └── world.txt | ASCII | 4         | 0                 |

(C) is right. But it's harder to achieve. For this example we can get it by
restricting 7ZSFX and ZIPSFX detection only when scanning an executable.
But that may mean losing detection of archives embedded elsewhere.
And we'd have to identify allowable container types for each possible
embedded type, which would be very difficult.

So this commit aims to solve the issue the (B)-way.

Note that in all situations, we still have to scan with file typing
enabled to determine if we need to reassign the current file type, such
as re-identifying a Bzip2 archive as a DMG that happens to be Bzip2-
compressed. Detection of DMG and a handful of other types rely on
finding data partway through or near the ned of a file before
reassigning the entire file as the new type.

Other fixes and considerations in this commit:

- The utf16 HTML parser has weak error handling, particularly with respect
  to creating a nested fmap for scanning the ascii decoded file.
  This commit cleans up the error handling and wraps the nested scan with
  the recursion-stack push()/pop() for correct recursion tracking.

  Before this commit, each container layer had a flag to indicate if the
  container layer is valid.
  We need something similar so that the cli_recursion_stack_get_*()
  functions ignore normalized layers. Details...

  Imagine an LDB signature for HTML content that specifies a ZIP
  container. If the signature actually alerts on the normalized HTML and
  you don't ignore normalized layers for the container check, it will
  appear as though the alert is in an HTML container rather than a ZIP
  container.

  This commit accomplishes this with a boolean you set in the scan context
  before scanning a new layer. Then when the new fmap is created, it will
  use that flag to set similar flag for the layer. The context flag is
  reset those that anything after this doesn't have that flag.
  The flag allows the new recursion_stack_get() function to ignore
  normalized layers when iterating the stack to return a layer at a
  requested index, negative or positive.

  Scanning normalized extracted/normalized javascript and VBA should also
  use the 'layer is normalized' flag.

- This commit also fixes Heuristic.Broken.Executable alert for ELF files
  to make sure that:

  A) these only alert if cli_append_virus() returns CL_VIRUS (aka it
  respects the FP check).

  B) all broken-executable alerts for ELF only happen if the
  SCAN_HEURISTIC_BROKEN option is enabled.

- This commit also cleans up the error handling in cli_magic_scan_dir().
  This was needed so we could correctly apply the layer-is-normalized-flag
  to all VBA macros extracted to a directory when scanning the directory.

- Also fix an issue where exceeding scan maximums wouldn't cause embedded
  file detection scans to abort. Granted we don't actually want to abort
  if max filesize or max recursion depth are exceeded... only if max
  scansize, max files, and max scantime are exceeded.

  Add 'abort_scan' flag to scan context, to protect against depending on
  correct error propagation for fatal conditions. Instead, setting this
  flag in the scan context should guarantee that a fatal condition deep in
  scan recursion isn't lost which result in more stuff being scanned
  instead of aborting. This shouldn't be necessary, but some status codes
  like CL_ETIMEOUT never used to be fatal and it's easier to do this than
  to verify every parser only returns CL_ETIMEOUT and other "fatal
  status codes" in fatal conditions.

- Remove duplicate is_tar() prototype from filestypes.c and include
  is_tar.h instead.

- Presently we create the fmap hash when creating the fmap.
  This wastes a bit of CPU if the hash is never needed.
  Now that we're creating fmap's for all embedded files discovered with
  file type recognition scans, this is a much more frequent occurence and
  really slows things down.

  This commit fixes the issue by only creating fmap hashes as needed.
  This should not only resolve the perfomance impact of creating fmap's
  for all embedded files, but also should improve performance in general.

- Add allmatch check to the zip parser after the central-header meta
  match. That way we don't multiple alerts with the same match except in
  allmatch mode. Clean up error handling in the zip parser a tiny bit.

- Fixes to ensure that the scan limits such as scansize, filesize,
  recursion depth, # of embedded files, and scantime are always reported
  if AlertExceedsMax (--alert-exceeds-max) is enabled.

- Fixed an issue where non-fatal alerts for exceeding scan maximums may
  mask signature matches later on. I changed it so these alerts use the
  "possibly unwanted" alert-type and thus only alert if no other alerts
  were found or if all-match or heuristic-precedence are enabled.

- Added the "Heuristics.Limits.Exceeded.*" events to the JSON metadata
  when the --gen-json feature is enabled. These will show up once under
  "ParseErrors" the first time a limit is exceeded. In the present
  implementation, only one limits-exceeded events will be added, so as to
  prevent a malicious or malformed sample from filling the JSON buffer
  with millions of events and using a tonne of RAM.
2021-10-25 16:02:29 -07:00
Micah Snyder
4b400b9b1e Test: Verify that pdf bytecode hooks execute 2021-07-19 14:47:25 -07:00
Micah Snyder
d46832d5cf clamav.net URL update for new docs, github issues
Replace new bugzilla ticket links with links to github issues.
Replace clamav.net/documentation links with docs.clamav.net equivalents.
2021-07-17 15:28:02 -07:00
Micah Snyder (micasnyd)
b9ca6ea103 Update copyright dates for 2021
Also fixes up clang-format.
2021-03-19 15:12:26 -07:00
Micah Snyder
2552cfd0d1 CMake: Add CTest support to match Autotools checks
An ENABLE_TESTS CMake option is provided so that users can disable
testing if they don't want it. Instructions for how to use this
included in the INSTALL.cmake.md file.

If you run `ctest`, each testcase will write out a log file to the
<build>/unit_tests directory.

As with Autotools' make check, the test files are from test/.split
and unit_tests/.split files, but for CMake these are generated at
build time instead of at test time.

On Posix systems, sets the LD_LIBRARY_PATH so that ClamAV-compiled
libraries can be loaded when running tests.

On Windows systems, CTest will identify and collect all library
dependencies and assemble a temporarily install under the
build/unit_tests directory so that the libraries can be loaded when
running tests.

The same feature is used on Windows when using CMake to install to
collect all DLL dependencies so that users don't have to install them
manually afterwards.

Each of the CTest tests are run using a custom wrapper around Python's
unittest framework, which is also responsible for finding and inserting
valgrind into the valgrind tests on Posix systems.

Unlike with Autotools, the CMake CTest Valgrind-tests are enabled by
default, if Valgrind can be found. There's no need to set VG=1.
CTest's memcheck module is NOT supported, because we use Python to
orchestrate our tests.

Added a bunch of Windows compatibility changes to the unit tests.
These were primarily changing / to PATHSEP and making adjustments
to use Win32 C headers and ifdef out the POSIX ones which aren't
available on Windows. Also disabled a bunch of tests on Win32
that don't work on Windows, notably the mmap ones and FD-passing
(i.e. FILEDES) ones.

Add JSON_C_HAVE_INTTYPES_H definition to clamav-config.h to eliminate
warnings on Windows where json.h is included after inttypes.h because
json-c's inttypes replacement relies on it.
This is a it of a hack and may be removed if json-c fixes their
inttypes header stuff in the future.

Add preprocessor definitions on Windows to disable MSVC warnings about
CRT secure and nonstandard functions. While there may be a better
solution, this is needed to be able to see other more serious warnings.

Add missing file comment block and copyright statement for clamsubmit.c.
Also change json-c/json.h include filename to json.h in clamsubmit.c.
The directory name is not required.

Changed the hash table data integer type from long, which is poorly
defined, to size_t -- which is capable of storing a pointer. Fixed a
bunch of casts regarding this variable to eliminate warnings.

Fixed two bugs causing utf8 encoding unit tests to fail on Windows:
- The in_size variable should be the number of bytes, not the character
  count. This was was causing the SHIFT_JIS (japanese codepage) to UTF8
  transcoding test to only transcode half the bytes.
- It turns out that the MultiByteToWideChar() API can't transcode
  UTF16-BE to UTF16-LE. The solution is to just iterate over the buffer
  and flip the bytes on each uint16_t. This but was causing the UTF16-BE
  to UTF8 tests to fail.

I also split up the utf8 transcoding tests into separate tests so I
could see all of the failures instead of just the first one.

Added a flags parameter to the unit test function to open testfiles
because it turns out that on Windows if a file contains the \r\n it will
replace it with just \n if you opened the file as a text file instead of
as binary. However, if we open the CBC files as binary, then a bunch of
bytecode tests fail. So I've changed the tests to open the CBC files in
the bytecode tests as text files and open all other files as binary.

Ported the feature tests from shell scripts to Python using a modified
version of our QA test-framework, which is largely compatible and will
allow us to migrate some QA tests into this repo. I'd like to add GitHub
Actions pipelines in the future so that all public PR's get some testing
before anyone has to manually review them.

The clamd --log option was missing from the help string, though it
definitely works. I've added it in this commit.
It appears that clamd.c was never clang-format'd, so this commit also
reformats clamd.c.

Some of the check_clamd tests expected the path returned by clamd to
match character for character with original path sent to clamd. However,
as we now evaluate real paths before a scan, the path returned by clamd
isn't going to match the relative (and possibly symlink-ridden) path
passed to clamdscan. I fixed this test by changing the test to search
for the basename: <signature> FOUND within the response instead of
matching the exact path.

Autotools: Link check_clamd with libclamav so we can use our utility
functions in check_clamd.c.
2021-02-25 11:41:26 -08:00
Micah Snyder (micasnyd)
9e20cdf6ea Add CMake build tooling
This patch adds experimental-quality CMake build tooling.

The libmspack build required a modification to use "" instead of <> for
header #includes. This will hopefully be included in the libmspack
upstream project when adding CMake build tooling to libmspack.

Removed use of libltdl when using CMake.

Flex & Bison are now required to build.

If -DMAINTAINER_MODE, then GPERF is also required, though it currently
doesn't actually do anything.  TODO!

I found that the autotools build system was generating the lexer output
but not actually compiling it, instead using previously generated (and
manually renamed) lexer c source. As a consequence, changes to the .l
and .y files weren't making it into the build. To resolve this, I
removed generated flex/bison files and fixed the tooling to use the
freshly generated files. Flex and bison are now required build tools.
On Windows, this adds a dependency on the winflexbison package,
which can be obtained using Chocolatey or may be manually installed.

CMake tooling only has partial support for building with external LLVM
library, and no support for the internal LLVM (to be removed in the
future). I.e. The CMake build currently only supports the bytecode
interpreter.

Many files used include paths relative to the top source directory or
relative to the current project, rather than relative to each build
target. Modern CMake support requires including internal dependency
headers the same way you would external dependency headers (albeit
with "" instead of <>). This meant correcting all header includes to
be relative to the build targets and not relative to the workspace.

For example, ...

```c
include "../libclamav/clamav.h"
include "clamd/clamd_others.h"
```

... becomes:

```c
// libclamav
include "clamav.h"

// clamd
include "clamd_others.h"
```

Fixes header name conflicts by renaming a few of the files.

Converted the "shared" code into a static library, which depends on
libclamav. The ironically named "shared" static library provides
features common to the ClamAV apps which are not required in
libclamav itself and are not intended for use by downstream projects.
This change was required for correct modern CMake practices but was
also required to use the automake "subdir-objects" option.
This eliminates warnings when running autoreconf which, in the next
version of autoconf & automake are likely to break the build.

libclamav used to build in multiple stages where an earlier stage is
a static library containing utils required by the "shared" code.
Linking clamdscan and clamdtop with this libclamav utils static lib
allowed these two apps to function without libclamav. While this is
nice in theory, the practical gains are minimal and it complicates
the build system. As such, the autotools and CMake tooling was
simplified for improved maintainability and this feature was thrown
out. clamdtop and clamdscan now require libclamav to function.

Removed the nopthreads version of the autotools
libclamav_internal_utils static library and added pthread linking to
a couple apps that may have issues building on some platforms without
it, with the intention of removing needless complexity from the
source. Kept the regular version of libclamav_internal_utils.la
though it is no longer used anywhere but in libclamav.

Added an experimental doxygen build option which attempts to build
clamav.h and libfreshclam doxygen html docs.

The CMake build tooling also may build the example program(s), which
isn't a feature in the Autotools build system.

Changed C standard to C90+ due to inline linking issues with socket.h
when linking libfreshclam.so on Linux.

Generate common.rc for win32.

Fix tabs/spaces in shared Makefile.am, and remove vestigial ifndef
from misc.c.

Add CMake files to the automake dist, so users can try the new
CMake tooling w/out having to build from a git clone.

clamonacc changes:
- Renamed FANOTIFY macro to HAVE_SYS_FANOTIFY_H to better match other
  similar macros.
- Added a new clamav-clamonacc.service systemd unit file, based on
  the work of ChadDevOps & Aaron Brighton.
- Added missing clamonacc man page.

Updates to clamdscan man page, add missing options.

Remove vestigial CL_NOLIBCLAMAV definitions (all apps now use
libclamav).

Rename Windows mspack.dll to libmspack.dll so all ClamAV-built
libraries have the lib-prefix with Visual Studio as with CMake.
2020-08-13 00:25:34 -07:00
Micah Snyder
9b9999d778 Rename core scanning functions
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned whether its "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.

| current name              | new name                          |
| ------------------------- | --------------------------------- |
| magic_scandesc()          | cli_magic_scan()                  |
| cli_magic_scandesc_type() | <delete>                          |
| cli_magic_scandesc()      | cli_magic_scan_desc()             |
| cli_base_scandesc()       | cli_magic_scan_desc_type()        |
| cli_partition_scandesc()  | <delete>                          |
| cli_map_scandesc()        | magic_scan_nested_fmap_type()     |
| cli_map_scan()            | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc()        | cli_magic_scan_buff()             |
| cli_scanbuff()            | cli_scan_buff()                   |
| cli_scandesc()            | cli_scan_desc()                   |
| cli_fmap_scandesc()       | cli_scan_fmap()                   |
| cli_scanfile()            | cli_magic_scan_file()             |
| cli_scandir()             | cli_magic_scan_dir()              |
| cli_filetype2()           | cli_determine_fmap_type()         |
| cli_filetype()            | cli_compare_ftm_file()            |
| cli_partitiontype()       | cli_compare_ftm_partition()       |
| cli_scanraw()             | scanraw()                         |
2020-06-03 11:00:40 -04:00
Micah Snyder
005cbf5a37 Record names of extracted files
A way is needed to record scanned file names for two purposes:

1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.

2. File names are useful when interpretting tmp directory output when
using the --leave-temps option.

This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.

To store the names in fmaps, an optional name argument has been added to
any internal scan API's that create fmaps and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required.  The zip and gpt parsers required some modification to record
file names.  The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.

Also:

- Added recursive extraction to the tmp directory when the
  --leave-temps option is enabled.  When not enabled, the tmp directory
  structure remains flat so as to prevent the likelihood of exceeding
  MAX_PATH.  The current tmp directory is stored in the scan context.

- Made the cli_scanfile() internal API non-static and added it to
  scanners.h so it would be accessible outside of scanners.c in order to
  remove code duplication within libmspack.c.

- Added function comments to scanners.h and matcher.h

- Converted a TDB-type macros and LSIG-type macros to enums for improved
  type safey.

- Converted more return status variables from `int` to `cl_error_t` for
  improved type safety, and corrected ooxml file typing functions so
  they use `cli_file_t` exclusively rather than mixing types with
  `cl_error_t`.

- Restructured the magic_scandesc() function to use goto's for error
  handling and removed the early_ret_from_magicscan() macro and
  magic_scandesc_cleanup() function.  This makes the code easier to
  read and made it easier to add the recursive tmp directory cleanup to
  magic_scandesc().

- Corrected zip, egg, rar filename extraction issues.

- Removed use of extra sub-directory layer for zip, egg, and rar file
  extraction.  For Zip, this also involved changing the extracted
  filenames to be randomly generated rather than using the "zip.###"
  file name scheme.
2020-06-03 10:39:18 -04:00
Jonas Zaddach (jzaddach)
cd977727f0 Add LZMA & BZip2 decompression to bytecode API
Adds LZMA and BZip2 decompression routines to the bytecode API.
The ability to decompress LZMA and BZip2 streams is particularly
useful for bytecode signatures that extend clamav executable
unpacking capabilities.

Of note, the LZMA format is not well standardized. This API
expects the stream to start with the LZMA_Alone header.

Also fixed a bug in LZMA dictionary size setting.
2020-04-29 09:26:07 -07:00
Micah Snyder
50455664a7 libclamav: Fix fmap leak in bytecode runtime
Fixes an fmap leak in the bytecode switch_input() API.  The
switch_input() API provides a way to read from an extracted file instead
of reading from the current file.  The issue is that the current
implementation fails to free the fmap created to read from the extracted
file on cleanup or when switching back to the original fmap.  In
addition, it fails to use the cli_bytecode_context_setfile() function
to restore the file_size in the context for the current fmap.

Fixes a couple fmap leaks in the unit tests.
2020-04-20 11:26:43 -07:00
Jonas Zaddach (jzaddach)
a627d09689 Correct disassembling of bytecode ICMP_ULT instruction 2020-02-12 19:59:47 -08:00
Micah Snyder
206dbaefe8 Update copyright dates for 2020 2020-01-03 15:44:07 -05:00
Micah Snyder
97a0647e88 Additional variable type changes for correctness and to silence warnings. A handful of other minor changes to silence warnings. Corrected a number of function definitions so they return cl_error_t rather than int. 2019-10-02 16:08:25 -04:00
Andrew
4de072327a Rename MAX_BC to MAX_TRACKED_BC for consistency 2019-10-02 16:08:23 -04:00
Mickey Sola
1b5a59c416 bytecode - J867 - fix memory leak that occurs withn the bytecode interpreter while libjson is enabled 2019-10-02 16:08:21 -04:00
Jonas Zaddach
c84683f2f4 Mach-O bytecode unpackers 2019-10-02 16:08:21 -04:00
Jonas Zaddach
2b776e4b89 Linux bytecode unpackers 2019-10-02 16:08:21 -04:00
Andrew
df8dfda9cd Address code-review comments, fix several memleaks
Changes include:
 - Fixing several memory leaks noticed when running with ASan
 - Adds documentation for several functions and structs
 - Simplifies the interface for using cli_targetinfo_init/destroy
   and cli_exe_info_init/destroy
 - A few other minor changes
2019-10-02 16:08:20 -04:00
Andrew
7ba310e605 PE parsing code improvements, db loading bug fixes
Consolidate the PE parsing code into one function.  I tried to preserve all existing functionality from the previous, distinct implementations to a large extent (with the exceptions mentioned below).  If I noticed potential bugs/improvements, I added a TODO statement about those so that they can be fixed in a smaller commit later.  Also, there are more TODOs in places where I'm not entirely sure why certain actions are performed - more research is needed for these.

I'm submitting a pull request now so that regression testing can be done, and because merging what I have thus far now will likely have fewer conflicts than if I try to merge later

PE parsing code improvements:
- PEs without all 16 data directories are parsed more appropriately now
- Added lots more debug statements

Also:
 - Allow MAX_BC and MAX_TRACKED_PCRE to be specified via CFLAGS

    When doing performance testing with the latest CVD, MAX_BC and
    MAX_TRACKED_PCRE need to be raised to track all the events.
    Allow these to be specified via CFLAGS by not redefining them
    if they are already defined

- Fix an issue preventing wildcard sizes in .MDB/.MSB rules

    I'm not sure what the original intent of the check I removed was,
    but it prevents using wildcard sizes in .MDB/.MSB rules.  AFAICT
    these wildcard sizes should be handled appropriately by the MD5
    section hash computation code, so I don't think a check on that
    is needed.

- Fix several issues related to db loading
     - .imp files will now get loaded if they exist in a directory passed
       via clamscan's '-d' flag
     - .pwdb files will now get loaded if they exist in a directory passed
       via clamscan's '-d' flag even when compiling without yara support
     - Changes to .imp, .ign, and .ign2 files will now be reflected in calls
       to cl_statinidir and cl_statchkdir (and also .pwdb files, even when
       compiling without yara support)
     - The contents of .sfp files won't be included in some of the signature
       counts, and the contents of .cud files will be
     - Any local.gdb files will no longer be loaded twice

- For .imp files, you are no longer required to specify a minimum flevel for wildcard rules, since this isn't needed
2019-10-02 16:08:20 -04:00
Jonas Zaddach
d1f7ff12a3 Prettify printing of bytecode arguments 2019-10-02 16:08:19 -04:00
Micah Snyder
52cddcbcfd Updating and cleaning up copyright notices. 2019-10-02 16:08:18 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Micah Snyder
d39cb6581f Updating libclamunrar from legacy C implementation to modern unrar 5.6.5. API changes and supporting changes included to pass the filepath of the scanned file into libclamav through the cli_ctx structure, required by the unrar library to open archives. The filename argument may be optional for the scandesc scanning variant, but libclamav will make a best effort to identify the filename from the file descriptor if it was not provided. In addition, included the ability to prefix temp file and directory names with file basenames. 2018-12-02 23:06:59 -05:00
Micah Snyder
d7979d4ff7 Restructured scan options flags from a single bitflag field to a structure containing multiple bitflag fields. This also required adding a new function to the bytecode API to get scan options a la carte, and modifying the existing function to hand back scan options in the old/deprecated uint32_t bitflag format. Re-generated bytecode iface header files.
Updated libclamav documentation detailing new scan options structure.
Renamed references to 'algorithmic' detection to 'heuristic' detection. Renaming references to 'properties' to 'collect metadata'.
Renamed references to 'scan all' to 'scan all match'.
Renamed a couple of 'Hueristic.*' signature names as 'Heuristics.*' signatures (plural) to match majority of other heuristics.
2018-12-02 23:06:59 -05:00
Micah Snyder
964a1e7321 Converting http urls to https urls. Primary focus was on clamav.net urls. I updated a couple others and fixes a few broken links as well. There are many (non-clamav.net) urls I didn't address, especially in 3rd party or contrib code. 2018-04-02 07:58:33 -04:00
Josh Soref
7cd9337a70 Spelling Adjustments (#30)
* spelling: accessed

* spelling: alignment

* spelling: amalgamated

* spelling: answers

* spelling: another

* spelling: acquisition

* spelling: apitid

* spelling: ascii

* spelling: appending

* spelling: appropriate

* spelling: arbitrary

* spelling: architecture

* spelling: asynchronous

* spelling: attachments

* spelling: argument

* spelling: authenticode

* spelling: because

* spelling: boundary

* spelling: brackets

* spelling: bytecode

* spelling: calculation

* spelling: cannot

* spelling: changes

* spelling: check

* spelling: children

* spelling: codegen

* spelling: commands

* spelling: container

* spelling: concatenated

* spelling: conditions

* spelling: continuous

* spelling: conversions

* spelling: corresponding

* spelling: corrupted

* spelling: coverity

* spelling: crafting

* spelling: daemon

* spelling: definition

* spelling: delivered

* spelling: delivery

* spelling: delimit

* spelling: dependencies

* spelling: dependency

* spelling: detection

* spelling: determine

* spelling: disconnects

* spelling: distributed

* spelling: documentation

* spelling: downgraded

* spelling: downloading

* spelling: endianness

* spelling: entities

* spelling: especially

* spelling: empty

* spelling: expected

* spelling: explicitly

* spelling: existent

* spelling: finished

* spelling: flexibility

* spelling: flexible

* spelling: freshclam

* spelling: functions

* spelling: guarantee

* spelling: hardened

* spelling: headaches

* spelling: heighten

* spelling: improper

* spelling: increment

* spelling: indefinitely

* spelling: independent

* spelling: inaccessible

* spelling: infrastructure

Conflicts:
	docs/html/node68.html

* spelling: initializing

* spelling: inited

* spelling: instream

* spelling: installed

* spelling: initialization

* spelling: initialize

* spelling: interface

* spelling: intrinsics

* spelling: interpreter

* spelling: introduced

* spelling: invalid

* spelling: latency

* spelling: lawyers

* spelling: libclamav

* spelling: likelihood

* spelling: loop

* spelling: maximum

* spelling: million

* spelling: milliseconds

* spelling: minimum

* spelling: minzhuan

* spelling: multipart

* spelling: misled

* spelling: modifiers

* spelling: notifying

* spelling: objects

* spelling: occurred

* spelling: occurs

* spelling: occurrences

* spelling: optimization

* spelling: original

* spelling: originated

* spelling: output

* spelling: overridden

* spelling: parenthesis

* spelling: partition

* spelling: performance

* spelling: permission

* spelling: phishing

* spelling: portions

* spelling: positives

* spelling: preceded

* spelling: properties

* spelling: protocol

* spelling: protos

* spelling: quarantine

* spelling: recursive

* spelling: referring

* spelling: reorder

* spelling: reset

* spelling: resources

* spelling: resume

* spelling: retrieval

* spelling: rewrite

* spelling: sanity

* spelling: scheduled

* spelling: search

* spelling: section

* spelling: separator

* spelling: separated

* spelling: specify

* spelling: special

* spelling: statement

* spelling: streams

* spelling: succession

* spelling: suggests

* spelling: superfluous

* spelling: suspicious

* spelling: synonym

* spelling: temporarily

* spelling: testfiles

* spelling: transverse

* spelling: turkish

* spelling: typos

* spelling: unable

* spelling: unexpected

* spelling: unexpectedly

* spelling: unfinished

* spelling: unfortunately

* spelling: uninitialized

* spelling: unlocking

* spelling: unnecessary

* spelling: unpack

* spelling: unrecognized

* spelling: unsupported

* spelling: usable

* spelling: wherever

* spelling: wishlist

* spelling: white

* spelling: infrastructure

* spelling: directories

* spelling: overridden

* spelling: permission

* spelling: yesterday

* spelling: initialization

* spelling: intrinsics

* space adjustment for spelling changes

* minor modifications by klin
2018-02-27 22:00:09 -05:00
Micah Snyder
c9a070c9d3 More cleanup re: variables possibly used before initialized. 2018-02-08 16:00:24 -05:00
Micah Snyder
653b471b5b eliminating format-string related warnings that appear on ubuntu 16.04 x64. 2017-10-11 15:03:33 -04:00
Micah Snyder
7e64560ce5 eliminating warnings that cropped up in 32bit ubuntu (16.04) 2017-08-31 11:00:34 -04:00
Micah Snyder
d18d72219f Eliminating warnings, converting iterator variables to size_t when used to compare against sizeof(). added a couple of missing #includes. 2017-08-11 16:01:50 -04:00
Mickey Sola
60aad52faf bc - adding bc_idx sanity check when running bc lsigs 2017-06-21 15:53:14 -04:00
Steven Morgan
d32e039654 fix cli_bcapi_extract_new() return code path virus reporting. 2017-06-20 17:15:15 -04:00
Steven Morgan
cbf5017a7d bb11805 fix multiple results. Refactor false positive and heuristic precedence logic. 2017-04-18 12:07:06 -04:00
Steven Morgan
fd43d6103c bb11742 - fix compiler warnings. Patch contributed by Ruga. 2017-02-10 12:53:24 -05:00
Steven Morgan
22cb38ed24 pull request #53(2/4): Spelling fix by klemens(ka7). 2016-10-19 15:57:45 -04:00
Mickey Sola
46a35abe56 mass update of copyright headers 2015-09-17 13:41:26 -04:00
Steven Morgan
a80453e6e9 Merge master to features/yara. 2015-05-01 18:36:48 -04:00
Kevin Lin
fe54f710fc clambc info option updated for new hook type 2015-03-03 16:12:22 -05:00
Kevin Lin
47c2d618cd added BC_PRECLASS hook support; replaces target type 13 2015-03-03 15:00:55 -05:00
Kevin Lin
c648e6b490 Merge branch 'master' into klin/pcre_support
Conflicts:
	clamconf/clamconf.c
	clamscan/manager.c
	docs/signatures.tex
	sigtool/sigtool.c
2014-10-31 11:10:41 -04:00
Kevin Lin
90379a9e98 fixed formatting for short names in perf tracking 2014-10-22 17:56:49 -04:00
Kevin Lin
5c2c723361 added pcre execution time and match performance tracking
fixed an issue with statistics reporting with no signatures loaded
2014-09-16 15:56:56 -04:00
Kevin Lin
99e22630f4 opts: converted bytecode-statistics to generic statistics w/ strarg 2014-09-16 15:26:19 -04:00
Kevin Lin
032ec2192e fixed issue in bytecode statistics avg time reporting 2014-09-09 17:48:35 -04:00
Kevin Lin
7c9c4fab22 bytecode: various changes from code review 2014-08-26 17:02:41 -04:00
David Raynor
0b28c74878 Assign the right type in cli_bytetype_helper 2014-08-26 11:03:17 -04:00
Kevin Lin
5b5be2a65d win32: fixed additional OS specific build issues
bc2llvm: removed redundant macro causing issues in win32
2014-07-25 20:10:23 -04:00
Kevin Lin
0ff13b3138 clambc: added diagnostic tools for bytecode IR
clambc: added option to print bytecode IR
TODO: add diagnostic functions to win32 project

Conflicts:

	shared/optparser.c
2014-07-25 12:06:13 -04:00
Shawn Webb
cd94be7a52 Silence a bunch of compiler warnings in libclamav 2014-07-10 18:11:49 -04:00
Shawn Webb
60d8d2c352 Move all the crypto API to clamav.h 2014-07-01 19:38:01 -04:00