The AVX2 get_checksum1_avx2_64() read mul_one before initializing it,
which is undefined behavior. Replace the cmpeq/abs trick with
_mm256_set1_epi8(1) to match the SSSE3 and SSE2 versions.
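
The initialization change can be sketched as follows. This is a minimal
illustration, not the rsync source: make_mul_one() is a hypothetical helper,
and the comment about the old idiom is my reading of "cmpeq/abs trick"
(comparing a register with itself, then taking the absolute value). The
intrinsic path is compiled only when AVX2 is enabled (e.g. -mavx2);
otherwise a plain memset stands in for it.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: fill out[0..31] with the byte value 1.
void make_mul_one(uint8_t out[32]) {
#if defined(__AVX2__)
    // Assumed shape of the old idiom: abs(cmpeq(mul_one, mul_one)).
    // Comparing a register with itself yields -1 in every byte, and abs
    // turns that into 1, but the compare reads mul_one before it has a
    // value. Broadcasting the constant reads nothing uninitialized.
    const __m256i mul_one = _mm256_set1_epi8(1);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), mul_one);
#else
    // Portable stand-in when AVX2 is not enabled at compile time.
    std::memset(out, 1, 32);
#endif
}
```

Both paths leave every byte equal to 1, which is what the multiply-add step
of the checksum needs as its constant operand.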
Add a TEST_SIMD_CHECKSUM1 test mode that verifies all SIMD paths
(SSE2, SSSE3, AVX2, and the full dispatch chain) produce identical
results to the C reference, across multiple buffer sizes with both
aligned and unaligned buffers.
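
The kind of check such a test mode performs can be sketched like this. It
is an assumption-laden illustration, not the TEST_SIMD_CHECKSUM1 code:
checksum1_ref(), checksum1_unrolled() and count_mismatches() are
hypothetical names, the unrolled scalar routine merely stands in for a SIMD
path, and the checksum formula (signed bytes, no character offset) reflects
my understanding of the C reference.

```cpp
#include <cstdint>

// Scalar reference: s1 is the running byte sum, s2 the running sum of s1.
// Bytes are treated as signed (assumption about the C reference).
uint32_t checksum1_ref(const char* buf, int32_t len) {
    const signed char* p = reinterpret_cast<const signed char*>(buf);
    uint32_t s1 = 0, s2 = 0;
    for (int32_t i = 0; i < len; i++) {
        s1 += static_cast<uint32_t>(p[i]);
        s2 += s1;
    }
    return (s1 & 0xffff) + (s2 << 16);
}

// Stand-in for an optimized path: four bytes per step, scalar tail.
uint32_t checksum1_unrolled(const char* buf, int32_t len) {
    const signed char* p = reinterpret_cast<const signed char*>(buf);
    uint32_t s1 = 0, s2 = 0;
    int32_t i = 0;
    for (; i + 4 <= len; i += 4) {
        const uint32_t b0 = static_cast<uint32_t>(p[i]);
        const uint32_t b1 = static_cast<uint32_t>(p[i + 1]);
        const uint32_t b2 = static_cast<uint32_t>(p[i + 2]);
        const uint32_t b3 = static_cast<uint32_t>(p[i + 3]);
        s2 += 4 * (s1 + b0) + 3 * b1 + 2 * b2 + b3;  // four s2 updates folded
        s1 += b0 + b1 + b2 + b3;
    }
    for (; i < len; i++) {
        s1 += static_cast<uint32_t>(p[i]);
        s2 += s1;
    }
    return (s1 & 0xffff) + (s2 << 16);
}

using checksum_fn = uint32_t (*)(const char*, int32_t);

// Compare a candidate against the reference over several sizes, each with
// an aligned start (offset 0) and an unaligned start (offset 1).
int count_mismatches(checksum_fn candidate) {
    constexpr int32_t kMaxSize = 4096;
    alignas(64) static unsigned char storage[kMaxSize + 1];
    uint32_t state = 12345u;  // deterministic filler, includes bytes >= 0x80
    for (unsigned char& byte : storage) {
        state = state * 1103515245u + 12345u;
        byte = static_cast<unsigned char>(state >> 16);
    }
    const int32_t sizes[] = {0, 1, 3, 4, 15, 16, 17, 31, 32, 33,
                             63, 64, 65, 1000, kMaxSize};
    int mismatches = 0;
    for (int32_t size : sizes) {
        for (int offset = 0; offset <= 1; offset++) {
            const char* buf = reinterpret_cast<const char*>(storage + offset);
            if (candidate(buf, size) != checksum1_ref(buf, size)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Calling count_mismatches(checksum1_unrolled) should return 0. Sizes on both
sides of the 16-, 32- and 64-byte boundaries are chosen because those are
the block widths where a vector loop hands over to its tail handling.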
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Building with clang-16 emits the following warning:
./simd-checksum-x86_64.cpp:204:25: warning: passing 1-byte aligned argument to
16-byte aligned parameter 1 of '_mm_store_si128' may result in an unaligned pointer
access [-Walign-mismatch]
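
Two common ways to satisfy the alignment requirement behind this warning
are sketched below. This is a generic illustration under assumptions, not
the change applied to simd-checksum-x86_64.cpp: sum_lanes_aligned() and
sum_lanes_unaligned() are hypothetical helpers, and a scalar fallback is
used when SSE2 is not available at compile time.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Option A: give the destination 16-byte alignment, so that the aligned
// store _mm_store_si128 is valid.
uint32_t sum_lanes_aligned(const uint32_t lanes[4]) {
#if defined(__SSE2__)
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(lanes));
    alignas(16) uint32_t out[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(out), v);
    return out[0] + out[1] + out[2] + out[3];
#else
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
}

// Option B: keep a destination with no alignment guarantee and use the
// unaligned store _mm_storeu_si128 instead.
uint32_t sum_lanes_unaligned(const uint32_t lanes[4]) {
#if defined(__SSE2__)
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(lanes));
    unsigned char raw[sizeof(__m128i) + 1];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(raw + 1), v);  // odd address
    uint32_t out[4];
    std::memcpy(out, raw + 1, sizeof out);
    return out[0] + out[1] + out[2] + out[3];
#else
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
}
```

Both helpers return the same sum; they differ only in whether the
destination or the store instruction carries the alignment guarantee.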
Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
- Disable the SIMD ASM code by default. Use configure --enable-simd-asm
  to enable it.
- Allow MD5 ASM code to be requested even when OpenSSL is handling MD4
  checksums. Use configure --enable-md5-asm to enable it.
* x86-64 SIMD build fixes
configure.ac now detects g++ >= 5 and clang++ >= 7. Additionally,
some script malfunctions on FreeBSD were corrected.
The get_checksum1() code has been modified to fix compilation with clang
and g++ 10.
This version of the code and configure.ac has been tested on:
  Ubuntu 16 - gcc 7.3.0, clang 6.0.0
  Debian 10 - gcc 5.4.0, 6.4.0, 7.2.0, 8.4.0, 9.2.1, 10.0.1,
              clang 5.0.2, 6.0.1, 7.0.1, 8.0.0, 9.0.0, 10.0.0
  ArchLinux 20200605 - gcc 10.1.0, clang 10.0.0
  FreeBSD 12.1 - gcc 9.3.0, clang 8.0.1
It is unknown if it will work on gcc 5.0-5.3, but the script currently
allows it.
Also restructure the build switches and defines from SSE2 to SIMD, to
allow reuse should patches with SIMD instructions for other processor
architectures become available.
(Some minor tweaks of Jorrit's patch to avoid requiring GNU make and to
avoid C++ comments in .c files.)