Commit Graph

112 Commits

Author SHA1 Message Date
Micah Snyder (micasnyd)
b9ca6ea103 Update copyright dates for 2021
Also fixes up clang-format.
2021-03-19 15:12:26 -07:00
Micah Snyder (micasnyd)
9e20cdf6ea Add CMake build tooling
This patch adds experimental-quality CMake build tooling.

The libmspack build required a modification to use "" instead of <> for
header #includes. This will hopefully be included in the libmspack
upstream project when adding CMake build tooling to libmspack.

Removed use of libltdl when using CMake.

Flex & Bison are now required to build.

If -DMAINTAINER_MODE, then GPERF is also required, though it currently
doesn't actually do anything.  TODO!

I found that the autotools build system was generating the lexer output
but not actually compiling it, instead using previously generated (and
manually renamed) lexer c source. As a consequence, changes to the .l
and .y files weren't making it into the build. To resolve this, I
removed generated flex/bison files and fixed the tooling to use the
freshly generated files. Flex and bison are now required build tools.
On Windows, this adds a dependency on the winflexbison package,
which can be obtained using Chocolatey or may be manually installed.

CMake tooling only has partial support for building with external LLVM
library, and no support for the internal LLVM (to be removed in the
future). I.e. The CMake build currently only supports the bytecode
interpreter.

Many files used include paths relative to the top source directory or
relative to the current project, rather than relative to each build
target. Modern CMake support requires including internal dependency
headers the same way you would external dependency headers (albeit
with "" instead of <>). This meant correcting all header includes to
be relative to the build targets and not relative to the workspace.

For example, ...

```c
include "../libclamav/clamav.h"
include "clamd/clamd_others.h"
```

... becomes:

```c
// libclamav
include "clamav.h"

// clamd
include "clamd_others.h"
```

Fixes header name conflicts by renaming a few of the files.

Converted the "shared" code into a static library, which depends on
libclamav. The ironically named "shared" static library provides
features common to the ClamAV apps which are not required in
libclamav itself and are not intended for use by downstream projects.
This change was required for correct modern CMake practices but was
also required to use the automake "subdir-objects" option.
This eliminates warnings when running autoreconf which, in the next
version of autoconf & automake are likely to break the build.

libclamav used to build in multiple stages where an earlier stage is
a static library containing utils required by the "shared" code.
Linking clamdscan and clamdtop with this libclamav utils static lib
allowed these two apps to function without libclamav. While this is
nice in theory, the practical gains are minimal and it complicates
the build system. As such, the autotools and CMake tooling was
simplified for improved maintainability and this feature was thrown
out. clamdtop and clamdscan now require libclamav to function.

Removed the nopthreads version of the autotools
libclamav_internal_utils static library and added pthread linking to
a couple apps that may have issues building on some platforms without
it, with the intention of removing needless complexity from the
source. Kept the regular version of libclamav_internal_utils.la
though it is no longer used anywhere but in libclamav.

Added an experimental doxygen build option which attempts to build
clamav.h and libfreshclam doxygen html docs.

The CMake build tooling also may build the example program(s), which
isn't a feature in the Autotools build system.

Changed C standard to C90+ due to inline linking issues with socket.h
when linking libfreshclam.so on Linux.

Generate common.rc for win32.

Fix tabs/spaces in shared Makefile.am, and remove vestigial ifndef
from misc.c.

Add CMake files to the automake dist, so users can try the new
CMake tooling w/out having to build from a git clone.

clamonacc changes:
- Renamed FANOTIFY macro to HAVE_SYS_FANOTIFY_H to better match other
  similar macros.
- Added a new clamav-clamonacc.service systemd unit file, based on
  the work of ChadDevOps & Aaron Brighton.
- Added missing clamonacc man page.

Updates to clamdscan man page, add missing options.

Remove vestigial CL_NOLIBCLAMAV definitions (all apps now use
libclamav).

Rename Windows mspack.dll to libmspack.dll so all ClamAV-built
libraries have the lib-prefix with Visual Studio as with CMake.
2020-08-13 00:25:34 -07:00
Micah Snyder
11ef77007b Improve tmp sub-directory names
At present many parsers create tmp subdirectories to store extracted
files.  For parsers like the vba parser, this is required as the
directory is later scanned.  For other parsers, these subdirectories are
probably not helpful now that we provide recursive sub-dirs when
--leave-temps is enabled.  It's not quite as simple as removing the extra
subdirectories, however.  Certain parsers, like autoit, don't create very
unique filenames and would result in file name collisions when
--leave-temps is not enabled.

The best thing to do would be to make sure each parser uses unique
filenames and doesn't rely on cli_magic_scan_dir() to scan extracted
content before removing the extra subdirectory.  In the meantime, this
commit gives the extra subdirectories meaningful names to improve
readability.

This commit also:

- Provides the 'bmp' prefix for extracted PE icons.

- Removes empty tmp subdirs when extracting rtf files, to eliminate
  clutter.

- The PDF parser sometimes creates tmp files when decompressing streams
  before it knows if there is actually any content to decompress.  This
  resulted in a large number of empty files.  While it would be best to
  avoid creating empty files in the first place, that's not quite as
  as it sounds.  This commit does the next best thing and deletes the
  tmp files if nothing was actually extracted, even if --leave-temps is
  enabled.

- Removes the "scantemp" prefix for unnamed fmaps scanned with
  cli_magic_scan().  The 5-character hashes given to tmp files with
  prefixes resulted in occasional file name collisions when extracting
  certain file types with thousands of embedded files.

- The VBA and TAR parsers mistakenly used NAME_MAX instead of PATH_MAX,
  resulting in truncated file paths and failed extraction  when
  --leave-temps is enabled and a lot of recursion is in play.  This commit
  switches them from NAME_MAX to PATH_MAX.
2020-06-03 11:00:53 -04:00
Micah Snyder
9b9999d778 Rename core scanning functions
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned whether its "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.

| current name              | new name                          |
| ------------------------- | --------------------------------- |
| magic_scandesc()          | cli_magic_scan()                  |
| cli_magic_scandesc_type() | <delete>                          |
| cli_magic_scandesc()      | cli_magic_scan_desc()             |
| cli_base_scandesc()       | cli_magic_scan_desc_type()        |
| cli_partition_scandesc()  | <delete>                          |
| cli_map_scandesc()        | magic_scan_nested_fmap_type()     |
| cli_map_scan()            | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc()        | cli_magic_scan_buff()             |
| cli_scanbuff()            | cli_scan_buff()                   |
| cli_scandesc()            | cli_scan_desc()                   |
| cli_fmap_scandesc()       | cli_scan_fmap()                   |
| cli_scanfile()            | cli_magic_scan_file()             |
| cli_scandir()             | cli_magic_scan_dir()              |
| cli_filetype2()           | cli_determine_fmap_type()         |
| cli_filetype()            | cli_compare_ftm_file()            |
| cli_partitiontype()       | cli_compare_ftm_partition()       |
| cli_scanraw()             | scanraw()                         |
2020-06-03 11:00:40 -04:00
Micah Snyder
005cbf5a37 Record names of extracted files
A way is needed to record scanned file names for two purposes:

1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.

2. File names are useful when interpretting tmp directory output when
using the --leave-temps option.

This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.

To store the names in fmaps, an optional name argument has been added to
any internal scan API's that create fmaps and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required.  The zip and gpt parsers required some modification to record
file names.  The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.

Also:

- Added recursive extraction to the tmp directory when the
  --leave-temps option is enabled.  When not enabled, the tmp directory
  structure remains flat so as to prevent the likelihood of exceeding
  MAX_PATH.  The current tmp directory is stored in the scan context.

- Made the cli_scanfile() internal API non-static and added it to
  scanners.h so it would be accessible outside of scanners.c in order to
  remove code duplication within libmspack.c.

- Added function comments to scanners.h and matcher.h

- Converted a TDB-type macros and LSIG-type macros to enums for improved
  type safey.

- Converted more return status variables from `int` to `cl_error_t` for
  improved type safety, and corrected ooxml file typing functions so
  they use `cli_file_t` exclusively rather than mixing types with
  `cl_error_t`.

- Restructured the magic_scandesc() function to use goto's for error
  handling and removed the early_ret_from_magicscan() macro and
  magic_scandesc_cleanup() function.  This makes the code easier to
  read and made it easier to add the recursive tmp directory cleanup to
  magic_scandesc().

- Corrected zip, egg, rar filename extraction issues.

- Removed use of extra sub-directory layer for zip, egg, and rar file
  extraction.  For Zip, this also involved changing the extracted
  filenames to be randomly generated rather than using the "zip.###"
  file name scheme.
2020-06-03 10:39:18 -04:00
Jonas Zaddach (jzaddach)
cd977727f0 Add LZMA & BZip2 decompression to bytecode API
Adds LZMA and BZip2 decompression routines to the bytecode API.
The ability to decompress LZMA and BZip2 streams is particularly
useful for bytecode signatures that extend clamav executable
unpacking capabilities.

Of note, the LZMA format is not well standardized. This API
expects the stream to start with the LZMA_Alone header.

Also fixed a bug in LZMA dictionary size setting.
2020-04-29 09:26:07 -07:00
Micah Snyder
50455664a7 libclamav: Fix fmap leak in bytecode runtime
Fixes an fmap leak in the bytecode switch_input() API.  The
switch_input() API provides a way to read from an extracted file instead
of reading from the current file.  The issue is that the current
implementation fails to free the fmap created to read from the extracted
file on cleanup or when switching back to the original fmap.  In
addition, it fails to use the cli_bytecode_context_setfile() function
to restore the file_size in the context for the current fmap.

Fixes a couple fmap leaks in the unit tests.
2020-04-20 11:26:43 -07:00
Micah Snyder
206dbaefe8 Update copyright dates for 2020 2020-01-03 15:44:07 -05:00
Micah Snyder
53c2cb1b02 Error handling improvements in bytecode api function to alleviate coverity complaints. 2019-10-02 16:08:25 -04:00
Micah Snyder
4524c398f3 Argument and return types for fmap_readn(), cli_writen(), cli_readn() converted to use size_t instead of int. 2019-10-02 16:08:25 -04:00
Micah Snyder
ee40795fe2 Converted mpool calls to macros when USE_MPOOL is defined to clearly differentiate between function and macro behavior. 2019-10-02 16:08:25 -04:00
Micah Snyder
479a9a235a Fixes for issues identified by coverity. 2019-10-02 16:08:19 -04:00
Micah Snyder
52cddcbcfd Updating and cleaning up copyright notices. 2019-10-02 16:08:18 -04:00
Micah Snyder
b3e82e5e61 Replacing libclamav/cltypes.h with clamav-types.h.in, which generates a header clamav-types.h that we install alongside clamav.h. 2019-10-02 16:08:17 -04:00
Micah Snyder
72fd33c8b2 clang-format'd using new .clang-format rules. 2019-10-02 16:08:16 -04:00
Micah Snyder
d39cb6581f Updating libclamunrar from legacy C implementation to modern unrar 5.6.5. API changes and supporting changes included to pass the filepath of the scanned file into libclamav through the cli_ctx structure, required by the unrar library to open archives. The filename argument may be optional for the scandesc scanning variant, but libclamav will make a best effort to identify the filename from the file descriptor if it was not provided. In addition, included the ability to prefix temp file and directory names with file basenames. 2018-12-02 23:06:59 -05:00
Micah Snyder (micasnyd)
f61e92da8f Changing numerous scan options' names, primarily those of heuristic signatature alert options. Original options (command line and clamd) will remain as deprecated & undocumented for a couple releases. Added 2 extra scan options to allow users to differentiate between alerting on encrypted archives vs encrypted documents (bb11911). 2018-12-02 23:06:59 -05:00
Micah Snyder
d7979d4ff7 Restructured scan options flags from a single bitflag field to a structure containing multiple bitflag fields. This also required adding a new function to the bytecode API to get scan options a la carte, and modifying the existing function to hand back scan options in the old/deprecated uint32_t bitflag format. Re-generated bytecode iface header files.
Updated libclamav documentation detailing new scan options structure.
Renamed references to 'algorithmic' detection to 'heuristic' detection. Renaming references to 'properties' to 'collect metadata'.
Renamed references to 'scan all' to 'scan all match'.
Renamed a couple of 'Hueristic.*' signature names as 'Heuristics.*' signatures (plural) to match majority of other heuristics.
2018-12-02 23:06:59 -05:00
Micah Snyder (micasnyd)
89d5207b31 Added new pdf object stream parsing capability. 2018-12-02 23:06:58 -05:00
Josh Soref
7cd9337a70 Spelling Adjustments (#30)
* spelling: accessed

* spelling: alignment

* spelling: amalgamated

* spelling: answers

* spelling: another

* spelling: acquisition

* spelling: apitid

* spelling: ascii

* spelling: appending

* spelling: appropriate

* spelling: arbitrary

* spelling: architecture

* spelling: asynchronous

* spelling: attachments

* spelling: argument

* spelling: authenticode

* spelling: because

* spelling: boundary

* spelling: brackets

* spelling: bytecode

* spelling: calculation

* spelling: cannot

* spelling: changes

* spelling: check

* spelling: children

* spelling: codegen

* spelling: commands

* spelling: container

* spelling: concatenated

* spelling: conditions

* spelling: continuous

* spelling: conversions

* spelling: corresponding

* spelling: corrupted

* spelling: coverity

* spelling: crafting

* spelling: daemon

* spelling: definition

* spelling: delivered

* spelling: delivery

* spelling: delimit

* spelling: dependencies

* spelling: dependency

* spelling: detection

* spelling: determine

* spelling: disconnects

* spelling: distributed

* spelling: documentation

* spelling: downgraded

* spelling: downloading

* spelling: endianness

* spelling: entities

* spelling: especially

* spelling: empty

* spelling: expected

* spelling: explicitly

* spelling: existent

* spelling: finished

* spelling: flexibility

* spelling: flexible

* spelling: freshclam

* spelling: functions

* spelling: guarantee

* spelling: hardened

* spelling: headaches

* spelling: heighten

* spelling: improper

* spelling: increment

* spelling: indefinitely

* spelling: independent

* spelling: inaccessible

* spelling: infrastructure

Conflicts:
	docs/html/node68.html

* spelling: initializing

* spelling: inited

* spelling: instream

* spelling: installed

* spelling: initialization

* spelling: initialize

* spelling: interface

* spelling: intrinsics

* spelling: interpreter

* spelling: introduced

* spelling: invalid

* spelling: latency

* spelling: lawyers

* spelling: libclamav

* spelling: likelihood

* spelling: loop

* spelling: maximum

* spelling: million

* spelling: milliseconds

* spelling: minimum

* spelling: minzhuan

* spelling: multipart

* spelling: misled

* spelling: modifiers

* spelling: notifying

* spelling: objects

* spelling: occurred

* spelling: occurs

* spelling: occurrences

* spelling: optimization

* spelling: original

* spelling: originated

* spelling: output

* spelling: overridden

* spelling: parenthesis

* spelling: partition

* spelling: performance

* spelling: permission

* spelling: phishing

* spelling: portions

* spelling: positives

* spelling: preceded

* spelling: properties

* spelling: protocol

* spelling: protos

* spelling: quarantine

* spelling: recursive

* spelling: referring

* spelling: reorder

* spelling: reset

* spelling: resources

* spelling: resume

* spelling: retrieval

* spelling: rewrite

* spelling: sanity

* spelling: scheduled

* spelling: search

* spelling: section

* spelling: separator

* spelling: separated

* spelling: specify

* spelling: special

* spelling: statement

* spelling: streams

* spelling: succession

* spelling: suggests

* spelling: superfluous

* spelling: suspicious

* spelling: synonym

* spelling: temporarily

* spelling: testfiles

* spelling: transverse

* spelling: turkish

* spelling: typos

* spelling: unable

* spelling: unexpected

* spelling: unexpectedly

* spelling: unfinished

* spelling: unfortunately

* spelling: uninitialized

* spelling: unlocking

* spelling: unnecessary

* spelling: unpack

* spelling: unrecognized

* spelling: unsupported

* spelling: usable

* spelling: wherever

* spelling: wishlist

* spelling: white

* spelling: infrastructure

* spelling: directories

* spelling: overridden

* spelling: permission

* spelling: yesterday

* spelling: initialization

* spelling: intrinsics

* space adjustment for spelling changes

* minor modifications by klin
2018-02-27 22:00:09 -05:00
Steven Morgan
48fef7b8ec 11898 - fix unit test failure with zlib 1.2.9+. Patch provided by Marc Deslauriers. 2017-08-18 16:06:26 -04:00
Steven Morgan
ea4ab2bccc bb11742 fix compile error in bytecode_api.c on Mac OS X. 2017-02-15 14:07:50 -05:00
Mickey Sola
631cb6a005 Fixes and updates to intermediate container sig rules based on code review 2017-02-01 17:33:15 -05:00
klin
031fe00a4d restructure container typing system to use array (#2) 2017-01-19 12:24:46 -05:00
Mickey Sola
46a35abe56 mass update of copyright headers 2015-09-17 13:41:26 -04:00
Shawn Webb
cd94be7a52 Silence a bunch of compiler warnings in libclamav 2014-07-10 18:11:49 -04:00
Shawn Webb
60d8d2c352 Move all the crypto API to clamav.h 2014-07-01 19:38:01 -04:00
Steven Morgan
6c048b8a30 Use json_object_object_get_ex() rather than json_object_object_get(), which is deprecated in json-c 0.10 2014-06-06 14:38:45 -04:00
Kevin Lin
9048572cec bytecode_api: fixed variable assignment issue 2014-06-03 12:43:23 -04:00
Kevin Lin
c6a3b294a9 bytecode: fixed a compiler issue and warnings 2014-06-03 11:47:57 -04:00
Kevin Lin
3107a6c24f bytecode: fixed issue with older versions of g++ 2014-06-03 11:19:01 -04:00
Steven Morgan
51f8cc3c18 More json header includes. 2014-05-23 10:11:32 -04:00
Kevin Lin
546e168bb7 api: added safety checks 2014-05-06 18:18:05 -04:00
Kevin Lin
61e3637d08 bytecode api: added support for querying int and booleans from json properties 2014-05-06 16:15:08 -04:00
Kevin Lin
fa7ae4ccbc bytecode api: updated copyright information
bytecode api: added json properties reading implementation
2014-05-06 16:13:48 -04:00
Shawn Webb
b2e7c931d0 Use OpenSSL for hashing. 2014-02-08 00:31:12 -05:00
Kevin Lin
90c0acc762 formatted a number of bytecode files, converted tabs to spaces 2014-01-16 17:57:40 -05:00
Shawn Webb
9691454612 bb6091 - check lseek() return 2013-02-28 19:32:29 -05:00
David Raynor
4a836f4310 CID #10418 2013-02-13 14:21:37 -05:00
Ryan Pentney
791868e80e I don't always test my code, but when I do... I do it in production. 2013-02-07 11:23:31 -08:00
Steve Morgan
6ad45a2931 add initial allscan/allmatch mode to libclamav, clamd, clamdscan, and clamscan with unit tests 2012-10-18 14:12:58 -07:00
Shawn webb
6a049897d9 BB#5455 2012-07-10 13:17:45 -04:00
Török Edvin
cc4d540831 bb #4324
memcpy() crashes because GCC sees 'struct cli_exe_section*'
and assumes that section is aligned to at least 4 bytes.
But it isn't, so change the parameter to just 'void*'.

(Casting doesn't help, as GCC sees through it).

Also fixes part 1 of bb #3789.
2012-02-29 17:04:16 +02:00
Török Edvin
f304dc688a fmapify: fix const-ness warnings 2012-01-05 14:16:09 +02:00
Török Edvin
3d664817f6 fix recursion level crash (bb #3706).
Thanks to Stephane Chazelas for the analysis.
2011-10-08 12:12:22 +03:00
Török Edvin
acc8bccb89 bb #2307. 2010-10-19 16:23:19 +03:00
Török Edvin
e4fedabef4 Warn about zlib version mismatches (bb #2072).
In libclamav: if zlib version at runtime is older than at compile time, warn.
If they are the same, or newer don't warn.

clamconf warns always on mismatch.

Mismatch can happen if:
 - you build zlib yourself, but as static lib and compiler picks old shared lib
 (but new headers!)
 - you have 2 zlibs installed, and the old one takes precedence

Libclamav doesn't warn about mismatches due to zlib upgrades since this is
normal.
2010-10-18 14:16:43 +03:00
Török Edvin
4116c65d1b Add bytecode API to determine whether running under JIT. 2010-10-18 12:35:39 +03:00
Török Edvin
d7531f2ad2 Fix warnings. 2010-10-18 12:24:11 +03:00
Török Edvin
ae8dc8c2bc Gather bytecode events from bytecode API. 2010-10-18 10:48:18 +03:00