1) integrated the library; 2) split units by CPU architecture; 3)
extended the makefile and configure script to detect the CPU
architecture and use the appropriate compiler flags; 4) added runtime
CPU feature detection for x86 and ARM with dynamic code dispatch; 5)
temporarily (for testing purposes) print SIMD support info to stdout
at program startup; 6) the new SIMD routines are not yet used in the
program
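
Items 4) and 5) can be illustrated with a minimal sketch. This is not the code from this change: every name in it (simd_level, detect_simd, select_sum_u32, sum_u32_sse2, print_simd_info) is hypothetical, and it assumes GCC or Clang on x86 and Linux on ARM. It detects SSE2 or NEON at run time, picks a routine through a function pointer, and can print the detected level. No NEON kernel is included, so ARM falls back to the scalar routine here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: GCC/Clang on x86, Linux on ARM. Anything else is scalar only. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#  define SKETCH_X86 1
#  include <emmintrin.h>   /* SSE2 intrinsics */
#elif defined(__linux__) && (defined(__aarch64__) || defined(__arm__))
#  define SKETCH_ARM 1
#  include <sys/auxv.h>    /* getauxval */
#  include <asm/hwcap.h>   /* HWCAP_ASIMD / HWCAP_NEON */
#endif

/* Hypothetical names, for illustration only. */
typedef enum { SIMD_NONE, SIMD_SSE2, SIMD_NEON } simd_level;
typedef uint32_t (*sum_u32_fn)(const uint32_t *, size_t);

/* Portable fallback: wrapping 32-bit sum. */
static uint32_t sum_u32_scalar(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#ifdef SKETCH_X86
/* SSE2 variant: four lanes per step, then a scalar tail. */
__attribute__((target("sse2")))
static uint32_t sum_u32_sse2(const uint32_t *v, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    uint32_t lanes[4];
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(v + i)));
    _mm_storeu_si128((__m128i *)lanes, acc);
    uint32_t s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Runtime CPU feature detection. */
static simd_level detect_simd(void)
{
#if defined(SKETCH_X86)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? SIMD_SSE2 : SIMD_NONE;
#elif defined(SKETCH_ARM) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) ? SIMD_NEON : SIMD_NONE;
#elif defined(SKETCH_ARM) && defined(HWCAP_NEON)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) ? SIMD_NEON : SIMD_NONE;
#else
    return SIMD_NONE;
#endif
}

/* Dynamic dispatch: choose the routine once, call it via the pointer. */
static sum_u32_fn select_sum_u32(simd_level level)
{
#ifdef SKETCH_X86
    if (level == SIMD_SSE2)
        return sum_u32_sse2;
#endif
    (void)level;   /* no NEON kernel in this sketch */
    return sum_u32_scalar;
}

static const char *simd_name(simd_level level)
{
    switch (level) {
    case SIMD_SSE2: return "SSE2";
    case SIMD_NEON: return "NEON";
    default:        return "none";
    }
}

/* Startup diagnostic, in the spirit of item 5). */
static void print_simd_info(void)
{
    printf("SIMD support: %s\n", simd_name(detect_simd()));
}
```

A caller would run print_simd_info() once at startup, store the result of select_sum_u32(detect_simd()) in a function pointer, and call through that pointer afterwards. In a real build each SIMD variant would more likely live in its own unit compiled with its own flags, as in items 2) and 3), rather than rely on a target attribute.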