Compare commits

...

22 Commits

Author SHA1 Message Date
Wayne Davison
c225330aaf Preparing for release of 3.2.0 2020-06-19 14:11:01 -07:00
Wayne Davison
3c56896d21 Simplify a variable. 2020-06-19 11:07:02 -07:00
Wayne Davison
deb8353d2c Yes, we know we're discarding a return value. 2020-06-19 10:56:32 -07:00
Wayne Davison
73053f26bc Simple change to recv_token(). 2020-06-19 09:55:48 -07:00
Holger Hoffstätte
0c13e1b3f8 Prevent unnecessary xattr warning by reordering header inclusion. (#22)
xattr headers have been provided by glibc (at least on Linux/glibc)
for many years now. Reorder the inclusion of xattr headers to
attempt compatibility/legacy after the common case.
This prevents the warning without changing compatibility to
non-glibc systems.

* Add dependency on lib/sysxattrs.h header in Makefile

Co-authored-by: Wayne Davison <wayne@opencoder.net>
2020-06-19 08:22:54 -07:00
Wayne Davison
9da38f2f99 A few minor man page tweaks. 2020-06-19 00:26:43 -07:00
Wayne Davison
a93ffb1ae9 More non-breaking space/dash improvements
- In html, use css more for non-breakability.
- In nroff, mark more dashes as non-breaking in code->bold sections,
  and get rid of backslashed dashes in preformatted blocks.
2020-06-18 23:55:51 -07:00
Wayne Davison
e08f600378 Use -&#8288; instead of &#8209;
Using a non-breaking zero-width char after a dash makes the browser
avoid breaking on that dash and also makes it match a dash in a
search.  This is better than a non-breaking dash char, which does not
match a dash in a search.
2020-06-18 22:58:11 -07:00
Wayne Davison
e406845542 Comment must be indented to avoid ending the list item. 2020-06-18 21:57:34 -07:00
Wayne Davison
a93eb4cf38 Handle a missing c++ too. 2020-06-18 17:02:46 -07:00
Wayne Davison
7fd24bef0f Make SIMD enabled by default again (for x86_64) 2020-06-18 16:28:28 -07:00
Wayne Davison
1a9a184145 Check extra rounding using an int64. 2020-06-18 15:45:39 -07:00
Wayne Davison
4965ccf283 We need to use nawk or gawk on Solaris, not their weird awk. 2020-06-18 14:53:55 -07:00
Wayne Davison
c6f89cbf9c Complain if we can't enable simd on non-x86_64. 2020-06-18 14:27:00 -07:00
Wayne Davison
2921779c1f Fix clang check. 2020-06-18 13:46:01 -07:00
Wayne Davison
cbed522ef4 Get rid of useless -e with sed. 2020-06-18 13:31:50 -07:00
Chainfire
4f539ccf21 x86-64 SIMD build fixes (#20)
* x86-64 SIMD build fixes

configure.ac was modified to detect g++ >=5 and clang++ >=7. Additionally
some script malfunctions on FreeBSD were corrected.

The get_checksum1() code has been modified to fix clang and g++ 10
compilation.

This version of the code and configure.ac has been tested on:

Ubuntu 16 - gcc 7.3.0, clang 6.0.0
Debian 10 - gcc 5.4.0, 6.4.0, 7.2.0, 8.4.0, 9.2.1, 10.0.1, clang 5.0.2,
6.0.1, 7.0.1, 8.0.0, 9.0.0, 10.0.0
ArchLinux 20200605 - gcc 10.1.0, clang 10.0.0
FreeBSD 12.1 - gcc 9.3.0, clang 8.0.1

It is unknown if it will work on gcc 5.0-5.3, but the script currently
allows it.
2020-06-18 13:20:44 -07:00
Wayne Davison
b5e539fc5a Use documentation to extract 2 more .h lists
- Change default_cvsignore char[] into a define.
- Make the DEFAULT_DONT_COMPRESS and DEFAULT_CVSIGNORE defines get set
  based on their info in rsync.1.md.
- Add a few more don't-compress suffixes from Simon Matter.
2020-06-18 11:20:57 -07:00
Wayne Davison
88c18ef648 Make the g++ check more lenient. 2020-06-18 09:31:47 -07:00
Wayne Davison
7dc9431f60 A few minor man page improvements. 2020-06-17 11:25:38 -07:00
Wayne Davison
07a3e1f939 Enhance compatibility with older python3 versions. 2020-06-17 10:52:02 -07:00
Wayne Davison
93223719c9 A couple more NEWS tweaks. 2020-06-17 10:30:32 -07:00
20 changed files with 438 additions and 217 deletions


@@ -18,7 +18,7 @@ jobs:
- name: prepare-source
run: ./prepare-source
- name: configure
run: ./configure --with-included-popt --with-included-zlib --enable-simd
run: ./configure --with-included-popt --with-included-zlib
- name: make
run: make
- name: version-summary

2
.gitignore vendored

@@ -19,6 +19,8 @@ aclocal.m4
/rsync*.5
/rsync*.html
/help-rsync*.h
/default-cvsignore.h
/default-dont-compress.h
/.md2man-works
/autom4te*.cache
/confdefs.h


@@ -10,6 +10,7 @@ mandir=@mandir@
LIBS=@LIBS@
CC=@CC@
AWK=@AWK@
CFLAGS=@CFLAGS@
CPPFLAGS=@CPPFLAGS@
CXX=@CXX@
@@ -99,12 +100,18 @@ rsync$(EXEEXT): $(OBJS)
$(OBJS): $(HEADERS)
$(CHECK_OBJS): $(HEADERS)
tls.o xattrs.o: lib/sysxattrs.h
options.o: latest-year.h help-rsync.h help-rsyncd.h
exclude.o: default-cvsignore.h
loadparm.o: default-dont-compress.h
flist.o: rounding.h
default-cvsignore.h default-dont-compress.h: rsync.1.md define-from-md.awk
$(AWK) -f $(srcdir)/define-from-md.awk -v hfile=$@ $(srcdir)/rsync.1.md
help-rsync.h help-rsyncd.h: rsync.1.md help-from-md.awk
awk -f $(srcdir)/help-from-md.awk -v helpfile=$@ $(srcdir)/rsync.1.md
$(AWK) -f $(srcdir)/help-from-md.awk -v hfile=$@ $(srcdir)/rsync.1.md
rounding.h: rounding.c rsync.h proto.h
@for r in 0 1 3; do \
@@ -220,7 +227,7 @@ proto.h: proto.h-tstamp
@if test -f proto.h; then :; else cp -p $(srcdir)/proto.h .; fi
proto.h-tstamp: $(srcdir)/*.c $(srcdir)/lib/compat.c config.h
awk -f $(srcdir)/mkproto.awk $(srcdir)/*.c $(srcdir)/lib/compat.c
$(AWK) -f $(srcdir)/mkproto.awk $(srcdir)/*.c $(srcdir)/lib/compat.c
.PHONY: man
man: rsync.1 rsync-ssl.1 rsyncd.conf.5

14
NEWS.md

@@ -1,4 +1,4 @@
# NEWS for rsync 3.2.0 (UNRELEASED)
# NEWS for rsync 3.2.0 (19 Jun 2020)
Protocol: 31 (unchanged)
@@ -170,16 +170,16 @@ Protocol: 31 (unchanged)
algorithms, extra checksum algorithms, and allow use of openssl's crypto
lib for (potentially) faster MD4/MD5 checksums.
- Add _build_ dependency for g++ (on x86_64 systems) to enable the SIMD
checksum optimizations. This is auto-disabled on non-x86_64 build_cpu, or
if g++ isn't found on non-Linux systems. Run configure with
`--disable-simd` if you run into a build problem.
- Add _build_ dependency for g++ or clang++ on x86_64 systems to enable the
SIMD checksum optimizations.
- Add _build_ dependency for _either_ python3-cmarkcfm or python3-commonmark
to allow for patching of man pages or building a git release. This is not
required for a release-tar build, since it comes with pre-built man pages.
(Note that cmarkcfm is faster than commonmark, but they generate the same
data.)
Note that cmarkcfm is faster than commonmark, but they generate the same
data. The commonmark dependency is easiest to install since it's native
python, and can be installed via `pip3 install --user commonmark` if you
want to just install it for the build user (or omit `--user`).
- Remove yodl _build_ dependency (if it was even listed before).


@@ -3794,7 +3794,7 @@ Protocol: 25 (changed)
| RELEASE DATE | VER. | DATE OF COMMIT\* | PROTOCOL |
|--------------|--------|------------------|-------------|
| ?? Jun 2020 | 3.2.0 | | 31 |
| 19 Jun 2020 | 3.2.0 | | 31 |
| 28 Jan 2018 | 3.1.3 | | 31 |
| 21 Dec 2015 | 3.1.2 | | 31 |
| 22 Jun 2014 | 3.1.1 | | 31 |


@@ -394,7 +394,7 @@ void set_env_num(const char *var, long num)
/* Used for both early exec & pre-xfer exec */
static pid_t start_pre_exec(const char *cmd, int *arg_fd_ptr, int *error_fd_ptr)
{
int arg_fds[2], error_fds[2], arg_fd, error_fd;
int arg_fds[2], error_fds[2], arg_fd;
pid_t pid;
if ((error_fd_ptr && pipe(error_fds) < 0) || (arg_fd_ptr && pipe(arg_fds) < 0) || (pid = fork()) < 0)
@@ -406,8 +406,7 @@ static pid_t start_pre_exec(const char *cmd, int *arg_fd_ptr, int *error_fd_ptr)
if (error_fd_ptr) {
close(error_fds[0]);
error_fd = error_fds[1];
set_blocking(error_fd);
set_blocking(error_fds[1]);
}
if (arg_fd_ptr) {
@@ -436,8 +435,8 @@ static pid_t start_pre_exec(const char *cmd, int *arg_fd_ptr, int *error_fd_ptr)
if (error_fd_ptr) {
close(STDIN_FILENO);
dup2(error_fd, STDOUT_FILENO);
close(error_fd);
dup2(error_fds[1], STDOUT_FILENO);
close(error_fds[1]);
}
status = shell_exec(cmd);
@@ -449,8 +448,8 @@ static pid_t start_pre_exec(const char *cmd, int *arg_fd_ptr, int *error_fd_ptr)
if (error_fd_ptr) {
close(error_fds[1]);
error_fd = *error_fd_ptr = error_fds[0];
set_blocking(error_fd);
*error_fd_ptr = error_fds[0];
set_blocking(error_fds[0]);
}
if (arg_fd_ptr) {


@@ -1,6 +1,6 @@
dnl Process this file with autoconf to produce a configure script.
AC_INIT([rsync],[3.2.0pre3],[http://rsync.samba.org/bugzilla.html])
AC_INIT([rsync],[3.2.0],[http://rsync.samba.org/bugzilla.html])
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_SRCDIR([byteorder.h])
@@ -42,6 +42,7 @@ dnl Checks for programs.
AC_PROG_CC
AC_PROG_CPP
AC_PROG_CXX
AC_PROG_AWK
AC_PROG_EGREP
AC_PROG_INSTALL
AC_PROG_MKDIR_P
@@ -197,28 +198,57 @@ SIMD=
AC_MSG_CHECKING([whether to enable SIMD optimizations])
AC_ARG_ENABLE(simd,
AS_HELP_STRING([--enable-simd],[enable SIMD optimizations (requires g++)]))
AS_HELP_STRING([--disable-simd],[disable SIMD optimizations (requires c++)]))
if test x"$enable_simd" = x"yes"; then
# For x86-64 SIMD, g++ is also required
if test x"$enable_simd" != x"no"; then
# For x86-64 SIMD, g++ >=5 or clang++ >=7 is required
if test x"$build_cpu" = x"x86_64"; then
if test x"$CXX" = x"g++"; then
# AC_MSG_RESULT() called below
CXX_OK=
if test x"$CXX" != x""; then
CXX_VERSION=`$CXX --version 2>/dev/null | head -n 1`
case "$CXX_VERSION" in
g++*)
CXX_VERSION=`$CXX -dumpversion | sed 's/\..*//g'`
if test "$CXX_VERSION" -ge "5"; then
CXX_OK=yes
fi
;;
*clang*)
# $CXX -dumpversion would have been ideal, but is broken on older clang
CXX_VERSION=`echo "$CXX_VERSION" | sed 's/.*version //g' | sed 's/\..*//g'`
if test "$CXX_VERSION" -ge "7"; then
CXX_OK=yes
fi
;;
*)
CXX_VERSION='Unknown'
;;
esac
else
CXX='No c++'
CXX_VERSION='Unknown'
fi
if test x"$CXX_OK" = x"yes"; then
# AC_MSG_RESULT() is called below.
SIMD="$SIMD x86_64"
else
AC_MSG_RESULT(no)
AC_MSG_ERROR(Failed to find g++ for SIMD speedups.
Omit --enable-simd to continue without it.)
AC_MSG_RESULT(error)
AC_MSG_ERROR([Failed to find g++ >=5 or clang++ >=7 for SIMD optimizations.
Specify --disable-simd to continue without it. ($CXX, $CXX_VERSION)])
fi
elif test x"$enable_simd" = x"yes"; then
AC_MSG_RESULT(unavailable)
AC_MSG_ERROR(The SIMD optimizations are currently x86_64 only.
Omit --enable-simd to continue without it.)
fi
fi
if test x"$SIMD" != x""; then
SIMD=`echo "$SIMD" | sed -e 's/^ *//'`
SIMD=`echo "$SIMD" | sed 's/^ *//'`
AC_MSG_RESULT([yes ($SIMD)])
AC_DEFINE(HAVE_SIMD, 1, [Define to 1 to enable SIMD optimizations])
SIMD=`echo "$SIMD" | sed -e 's/[[^ ]]\+/$(SIMD_&)/g'`
# We only use g++ for its target attribute dispatching, disable unneeded bulky features
SIMD=`echo "\\\$(SIMD_$SIMD)" | sed 's/ /) $(SIMD_/g'`
# We only use c++ for its target attribute dispatching, disable unneeded bulky features
CXXFLAGS="$CXXFLAGS -fno-exceptions -fno-rtti"
else
AC_MSG_RESULT(no)
@@ -630,7 +660,7 @@ size_t iconv();
#endif
]], [[]])],[am_cv_proto_iconv_arg1=""],[am_cv_proto_iconv_arg1="const"])
am_cv_proto_iconv="extern size_t iconv (iconv_t cd, $am_cv_proto_iconv_arg1 char * *inbuf, size_t *inbytesleft, char * *outbuf, size_t *outbytesleft);"])
am_cv_proto_iconv=`echo "[$]am_cv_proto_iconv" | tr -s ' ' | sed -e 's/( /(/'`
am_cv_proto_iconv=`echo "[$]am_cv_proto_iconv" | tr -s ' ' | sed 's/( /(/'`
AC_MSG_RESULT([$]{ac_t:-
}[$]am_cv_proto_iconv)
AC_DEFINE_UNQUOTED(ICONV_CONST, $am_cv_proto_iconv_arg1,
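
The SIMD hunk in the configure script above reads the first line of `$CXX --version`, reduces it to a major version number, and accepts g++ >= 5 or clang++ >= 7. A minimal Python sketch of that decision follows. The function name `cxx_ok` and the sample version strings are hypothetical, and the regular expression only approximates the two `sed` calls.

```python
import re

def cxx_ok(version_line, dumpversion=""):
    """Return True when the compiler looks like g++ >= 5 or clang++ >= 7.

    version_line: first line of `$CXX --version`
    dumpversion:  output of `$CXX -dumpversion` (only consulted for g++)
    """
    if version_line.startswith("g++"):
        # The shell code strips everything from the first dot onward.
        major = dumpversion.split(".")[0]
        return major.isdigit() and int(major) >= 5
    if "clang" in version_line:
        # -dumpversion is unreliable on older clang, so parse "version X.Y.Z".
        m = re.search(r"version (\d+)", version_line)
        return bool(m) and int(m.group(1)) >= 7
    # Unknown or missing compiler: configure reports an error in this case.
    return False
```

Anything that is neither g++ nor clang is rejected, which matches the `*)` branch leaving `CXX_OK` empty.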

41
define-from-md.awk Executable file

@@ -0,0 +1,41 @@
#!/usr/bin/awk -f
# The caller must pass args: -v hfile=NAME rsync.1.md
BEGIN {
heading = "/* DO NOT EDIT THIS FILE! It is auto-generated from a list of values in " ARGV[1] "! */"
if (hfile ~ /compress/) {
define = "#define DEFAULT_DONT_COMPRESS"
prefix = "*."
} else {
define = "#define DEFAULT_CVSIGNORE"
prefix = ""
}
value_list = ""
}
/^ > [^ ]+$/ {
gsub(/`/, "")
if (value_list != "") value_list = value_list " "
value_list = value_list prefix $2
next
}
value_list ~ /\.gz / && hfile ~ /compress/ {
exit
}
value_list ~ /SCCS / && hfile ~ /cvsignore/ {
exit
}
value_list = ""
END {
if (value_list != "")
print heading "\n\n" define " \"" value_list "\"" > hfile
else {
print "Failed to find a value list in " ARGV[1] " for " hfile
exit 1
}
}
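
The awk script above collects consecutive `> value` lines from rsync.1.md, keeps the first run that contains a known marker (`.gz ` for the don't-compress list, `SCCS ` for the cvsignore list), and emits one `#define`. The same flow can be sketched in Python under a few assumptions: `define_from_md` is a hypothetical name, the sketch accepts any leading indentation before `>`, and it returns only the `#define` line rather than writing the heading and file.

```python
import re

def define_from_md(md_lines, hfile):
    """Return the #define line the awk script would emit, or None if no list is found."""
    if "compress" in hfile:
        define, prefix, marker = "#define DEFAULT_DONT_COMPRESS", "*.", ".gz "
    else:
        define, prefix, marker = "#define DEFAULT_CVSIGNORE", "", "SCCS "

    values = []
    for line in md_lines:
        m = re.match(r"^\s*> (\S+)$", line)
        if m:
            # Drop markdown backticks and add the per-list prefix.
            values.append(prefix + m.group(1).replace("`", ""))
            continue
        if marker in " ".join(values):
            break          # this run is the list we want; stop scanning
        values = []        # some other quoted block; discard and keep looking

    if not values:
        return None
    return define + ' "' + " ".join(values) + '"'
```

Because each marker ends with a space, a run only qualifies when the marker value is followed by at least one more entry, as in the awk patterns.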


@@ -21,6 +21,7 @@
*/
#include "rsync.h"
#include "default-cvsignore.h"
extern int am_server;
extern int am_sender;
@@ -1051,16 +1052,6 @@ static filter_rule *parse_rule_tok(const char **rulestr_ptr,
return rule;
}
static char default_cvsignore[] =
/* These default ignored items come from the CVS manual. */
"RCS SCCS CVS CVS.adm RCSLOG cvslog.* tags TAGS"
" .make.state .nse_depinfo *~ #* .#* ,* _$* *$"
" *.old *.bak *.BAK *.orig *.rej .del-*"
" *.a *.olb *.o *.obj *.so *.exe"
" *.Z *.elc *.ln core"
/* The rest we added to suit ourself. */
" .svn/ .git/ .hg/ .bzr/";
static void get_cvs_excludes(uint32 rflags)
{
static int initialized = 0;
@@ -1070,7 +1061,7 @@ static void get_cvs_excludes(uint32 rflags)
return;
initialized = 1;
parse_filter_str(&cvs_filter_list, default_cvsignore,
parse_filter_str(&cvs_filter_list, DEFAULT_CVSIGNORE,
rule_template(rflags | (protocol_version >= 30 ? FILTRULE_PERISHABLE : 0)),
0);


@@ -1,10 +1,10 @@
#!/usr/bin/awk -f
# The caller must set -v helpfile=help-NAME.h and pass arg NAME.NUM.md
# The caller must pass args: -v hfile=help-NAME.h NAME.NUM.md
BEGIN {
heading = "/* DO NOT EDIT THIS FILE! It is auto-generated from the option list in " ARGV[1] "! */"
findcomment = helpfile
findcomment = hfile
sub("\\.", "\\.", findcomment)
findcomment = "\\[comment\\].*" findcomment
backtick_cnt = 0
@@ -32,9 +32,9 @@ $0 ~ findcomment {
END {
if (foundcomment && backtick_cnt > 1)
print heading "\n" prints > helpfile
print heading "\n" prints > hfile
else {
print "Failed to find " helpfile " section in " ARGV[1]
print "Failed to find " hfile " section in " ARGV[1]
exit 1
}
}


@@ -1,9 +1,9 @@
#ifdef SUPPORT_XATTRS
#if defined HAVE_ATTR_XATTR_H
#include <attr/xattr.h>
#elif defined HAVE_SYS_XATTR_H
#if defined HAVE_SYS_XATTR_H
#include <sys/xattr.h>
#elif defined HAVE_ATTR_XATTR_H
#include <attr/xattr.h>
#elif defined HAVE_SYS_EXTATTR_H
#include <sys/extattr.h>
#endif


@@ -42,6 +42,7 @@
#include "rsync.h"
#include "itypes.h"
#include "default-dont-compress.h"
extern item_list dparam_list;
@@ -52,11 +53,6 @@ extern item_list dparam_list;
#define LOG_DAEMON 0
#endif
#define DEFAULT_DONT_COMPRESS "*.gz *.zip *.z *.rpm *.deb *.iso *.bz2" \
" *.t[gb]z *.7z *.mp[34] *.mov *.avi *.ogg *.jpg *.jpeg *.png" \
" *.lzo *.rzip *.lzma *.rar *.ace *.gpg *.xz *.txz *.lz *.tlz" \
" *.ogv *.web[mp] *.squashfs"
/* the following are used by loadparm for option lists */
typedef enum {
P_BOOL, P_BOOLREV, P_CHAR, P_INTEGER,

41
md2man

@@ -35,6 +35,7 @@ body, b, strong, u {
code {
font-family: 'Roboto Mono', monospace;
font-weight: bold;
white-space: pre;
}
pre code {
display: block;
@@ -64,7 +65,9 @@ MAN_END = """\
NORM_FONT = ('\1', r"\fP")
BOLD_FONT = ('\2', r"\fB")
ULIN_FONT = ('\3', r"\fI")
UNDR_FONT = ('\3', r"\fI")
NBR_DASH = ('\4', r"\-")
NBR_SPACE = ('\xa0', r"\ ")
md_parser = None
@@ -102,11 +105,11 @@ def main():
m = re.match(r'^(\w+)=(.+)', line)
if not m:
continue
var, val = (m[1], m[2])
var, val = (m.group(1), m.group(2))
if var == 'prefix' and env_subs[var] is not None:
continue
while re.search(r'\$\{', val):
val = re.sub(r'\$\{(\w+)\}', lambda m: env_subs[m[1]], val)
val = re.sub(r'\$\{(\w+)\}', lambda m: env_subs[m.group(1)], val)
env_subs[var] = val
if var == 'VERSION':
break
@@ -212,7 +215,7 @@ class HtmlToManPage(HTMLParser):
st.txt += BOLD_FONT[0]
elif tag == 'em' or tag == 'i':
tag = 'u' # Change it into underline to be more like the man page
st.txt += ULIN_FONT[0]
st.txt += UNDR_FONT[0]
elif tag == 'ol':
start = 1
for var, val in attrs_list:
@@ -305,15 +308,21 @@ class HtmlToManPage(HTMLParser):
st.at_first_tag_in_dd = True
def handle_data(self, data):
def handle_data(self, txt):
st = self.state
if args.debug:
self.output_debug('DATA', (data,))
if st.in_code:
data = re.sub(r'\s', '\xa0', data) # nbsp in non-pre code
data = re.sub(r'\s--\s', '\xa0-- ', data)
st.html_out.append(htmlify(data))
st.txt += data
self.output_debug('DATA', (txt,))
if st.in_pre:
html = htmlify(txt)
else:
txt = re.sub(r'\s--(\s)', NBR_SPACE[0] + r'--\1', txt).replace('--', NBR_DASH[0]*2)
txt = re.sub(r'(^|\W)-', r'\1' + NBR_DASH[0], txt)
html = htmlify(txt)
if st.in_code:
txt = re.sub(r'\s', NBR_SPACE[0], txt)
html = html.replace(NBR_DASH[0], '-').replace(NBR_SPACE[0], ' ') # <code> is non-breaking in CSS
st.html_out.append(html.replace(NBR_SPACE[0], '&nbsp;').replace(NBR_DASH[0], '-&#8288;'))
st.txt += txt
def output_debug(self, event, extra):
@@ -331,17 +340,15 @@ class HtmlToManPage(HTMLParser):
def manify(txt):
return re.sub(r"^(['.])", r'\&\1', txt.replace('\\', '\\\\')
.replace("\xa0", r'\ ') # non-breaking space
.replace('--', r'\-\-') # non-breaking double dash
.replace(NBR_SPACE[0], NBR_SPACE[1])
.replace(NBR_DASH[0], NBR_DASH[1])
.replace(NORM_FONT[0], NORM_FONT[1])
.replace(BOLD_FONT[0], BOLD_FONT[1])
.replace(ULIN_FONT[0], ULIN_FONT[1]), flags=re.M)
.replace(UNDR_FONT[0], UNDR_FONT[1]), flags=re.M)
def htmlify(txt):
return re.sub(r'(\W)-', r'\1&#8209;',
txt.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;')
.replace('--', '&#8209;&#8209;').replace("\xa0-", '&nbsp;&#8209;').replace("\xa0", '&nbsp;'))
return txt.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;')
def warn(*msg):
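
The md2man hunks above replace the old per-format dash handling with placeholder characters that are resolved separately for HTML and nroff. The sketch below isolates that idea for text outside `<pre>` blocks. The helper names `mark_dashes`, `to_html` and `to_man` are hypothetical; HTML entity escaping and the `<code>` special case are left out.

```python
import re

NBR_DASH = "\x04"    # placeholder for a non-breaking dash (md2man's '\4')
NBR_SPACE = "\xa0"   # non-breaking space

def mark_dashes(txt):
    """Tag option-style dashes with placeholders, leaving in-word hyphens alone."""
    # Keep a free-standing " -- " attached to the preceding word, then tag every "--".
    txt = re.sub(r"\s--(\s)", NBR_SPACE + r"--\1", txt).replace("--", NBR_DASH * 2)
    # A single dash at the start of the text or after a non-word char is an option dash.
    return re.sub(r"(^|\W)-", r"\1" + NBR_DASH, txt)

def to_html(txt):
    # A dash followed by U+2060 WORD JOINER does not break and still matches a search for "-".
    return txt.replace(NBR_SPACE, "&nbsp;").replace(NBR_DASH, "-&#8288;")

def to_man(txt):
    # nroff escapes for a non-breaking space and dash.
    return txt.replace(NBR_SPACE, r"\ ").replace(NBR_DASH, r"\-")
```

Feeding the same marked text to both renderers is what lets the HTML and man outputs stay consistent.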


@@ -1,9 +1,9 @@
Summary: A fast, versatile, remote (and local) file-copying tool
Name: rsync
Version: 3.2.0
%define fullversion %{version}pre3
Release: 0.1.pre3
%define srcdir src-previews
%define fullversion %{version}
Release: 1
%define srcdir src
Group: Applications/Internet
License: GPL
Source0: http://rsync.samba.org/ftp/rsync/%{srcdir}/rsync-%{fullversion}.tar.gz
@@ -79,8 +79,8 @@ rm -rf $RPM_BUILD_ROOT
%dir /etc/rsync-ssl/certs
%changelog
* Wed Jun 17 2020 Wayne Davison <wayne@opencoder.net>
Released 3.2.0pre3.
* Fri Jun 19 2020 Wayne Davison <wayne@opencoder.net>
Released 3.2.0.
* Fri Mar 21 2008 Wayne Davison <wayne@opencoder.net>
Added installation of /etc/xinetd.d/rsync file and some commented-out


@@ -24,11 +24,11 @@
struct test {
union file_extras extras[ARRAY_LEN];
struct file_struct file;
int64 test;
};
#define ACTUAL_SIZE SIZEOF(struct test)
#define EXPECTED_SIZE (SIZEOF(union file_extras) * ARRAY_LEN + SIZEOF(struct file_struct))
#define EXPECTED_SIZE (SIZEOF(union file_extras) * ARRAY_LEN + SIZEOF(int64))
int main(UNUSED(int argc), UNUSED(char *argv[]))
{


@@ -13,6 +13,13 @@ rsync-ssl [--type=SSL_TYPE] RSYNC_ARGS
The rsync-ssl script helps you to run an rsync copy to/from an rsync daemon
that requires ssl connections.
The script requires that you specify an rsync-daemon arg in the style of either
`hostname::` (with 2 colons) or `rsync://hostname/`. The default port used for
connecting is 874 (one higher than the normal 873) unless overridden in the
environment. You can specify an overriding port via `--port` or by including
it in the normal spot in the URL format, though both of those require your
rsync version to be at least 3.2.0.
# OPTIONS
If the **first** arg is a `--type=SSL_TYPE` option, the script will only use
@@ -23,7 +30,7 @@ option must specify one of `openssl` or `stunnel`. The equal sign is
required for this particular option.
All the other options are passed through to the rsync command, so consult the
**rsync** manpage for more information on how it works.
**rsync**(1) manpage for more information on how it works.
# ENVIRONMENT VARIABLES
@@ -53,9 +60,13 @@ The ssl helper scripts are affected by the following environment variables:
# EXAMPLES
> rsync-ssl -aiv example.com::src/ dest
> rsync-ssl -aiv example.com::mod/ dest
> rsync-ssl --type=openssl -aiv example.com::src/ dest
> rsync-ssl --type=openssl -aiv example.com::mod/ dest
> rsync-ssl -aiv --port 9874 example.com::mod/ dest
> rsync-ssl -aiv rsync://example.com:9874/mod/ dest
# SEE ALSO


@@ -498,7 +498,7 @@ parameter, the parameter is only listed after the long variant, even though it
must also be specified for the short. When specifying a parameter, you can
either use the form `--option=param` or replace the '=' with whitespace. The
parameter may need to be quoted in some manner for it to survive the shell's
command-line parsing. Keep in mind that a leading tilde (\~) in a filename is
command-line parsing. Keep in mind that a leading tilde (`~`) in a filename is
substituted by your shell, so `--option=~/foo` will not change the tilde into
your home directory (remove the '=' for that).
@@ -1852,6 +1852,8 @@ your home directory (remove the '=' for that).
The exclude list is initialized to exclude the following items (these
initial items are marked as perishable -- see the FILTER RULES section):
[comment]: # (This list gets used for the default-cvsignore.h file.)
> `RCS`
> `SCCS`
> `CVS`
@@ -2318,14 +2320,14 @@ your home directory (remove the '=' for that).
possible.
The **LIST** should be one or more file suffixes (without the dot) separated
by slashes (/). You may specify an empty string to indicate that no files
by slashes (`/`). You may specify an empty string to indicate that no files
should be skipped.
Simple character-class matching is supported: each must consist of a list
of letters inside the square brackets (e.g. no special classes, such as
"[:alpha:]", are supported, and '-' has no special meaning).
The characters asterisk (\*) and question-mark (?) have no special meaning.
The characters asterisk (`*`) and question-mark (`?`) have no special meaning.
Here's an example that specifies 6 suffixes to skip (since 1 of the 5 rules
matches 2 suffixes):
@@ -2335,38 +2337,68 @@ your home directory (remove the '=' for that).
The default file suffixes in the skip-compress list in this version of
rsync are:
[comment]: # (This list gets used for the default-dont-compress.h file.)
> 7z
> ace
> apk
> avi
> bz2
> deb
> flac
> gpg
> gz
> iso
> jar
> jpeg
> jpg
> lz
> lz4
> lzma
> lzo
> mkv
> mov
> mp3
> mp4
> odb
> odf
> odg
> odi
> odm
> odp
> ods
> odt
> ogg
> ogv
> opus
> otg
> oth
> otp
> ots
> ott
> oxt
> png
> rar
> rpm
> rz
> rzip
> squashfs
> sxc
> sxd
> sxg
> sxm
> sxw
> tbz
> tgz
> tlz
> txz
> tzo
> webm
> webp
> xz
> z
> zip
> zst
This list will be replaced by your `--skip-compress` list in all but one
situation: a copy from a daemon rsync will add your skipped suffixes to its
@@ -2399,7 +2431,7 @@ your home directory (remove the '=' for that).
You may specify usernames or user IDs for the **FROM** and **TO** values,
and the **FROM** value may also be a wild-card string, which will be
matched against the sender's names (wild-cards do NOT match against ID
numbers, though see below for why a '\*' matches everything). You may
numbers, though see below for why a '`*`' matches everything). You may
instead specify a range of ID numbers via an inclusive range: LOW-HIGH.
For example:
@@ -2417,7 +2449,7 @@ your home directory (remove the '=' for that).
Any IDs that do not have a name on the sending side are treated as having
an empty name for the purpose of matching. This allows them to be matched
via a "\*" or using an empty name. For instance:
via a "`*`" or using an empty name. For instance:
> --usermap=:nobody --groupmap=*:nobody
@@ -2478,8 +2510,9 @@ your home directory (remove the '=' for that).
which may make transfers faster (or slower!). Read the man page for the
`setsockopt()` system call for details on some of the options you may be
able to set. By default no special socket options are set. This only
affects direct socket connections to a remote rsync daemon. This option
also exists in the `--daemon` mode section.
affects direct socket connections to a remote rsync daemon.
This option also exists in the `--daemon` mode section.
0. `--blocking-io`
@@ -2686,7 +2719,7 @@ your home directory (remove the '=' for that).
The escape idiom that started in 2.6.7 is to output a literal backslash
(`\`) and a hash (`#`), followed by exactly 3 octal digits. For example, a
newline would output as "`\\#012`". A literal backslash that is in a
newline would output as "`\#012`". A literal backslash that is in a
filename is not escaped unless it is followed by a hash and 3 digits (0-9).
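
As an illustration of the escape idiom just described, here is a small Python sketch, not rsync's actual code. `escape_name` is a hypothetical helper, and treating only printable ASCII as safe is an assumption made for the example.

```python
import re

def escape_name(name: bytes) -> str:
    """Escape unsafe bytes as backslash, hash, and exactly 3 octal digits."""
    out = []
    for i, b in enumerate(name):
        followed_by_escape = re.match(rb"#[0-9]{3}", name[i + 1:i + 5])
        if b == 0x5C and followed_by_escape:
            # A literal backslash is escaped only when it could be read as an escape.
            out.append("\\#%03o" % b)
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))            # assumed-safe printable ASCII
        else:
            out.append("\\#%03o" % b)     # e.g. newline becomes \#012
    return "".join(out)
```

A lone backslash passes through unchanged, so ordinary names containing one stay readable.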
0. `--human-readable`, `-h`
@@ -3082,8 +3115,6 @@ your home directory (remove the '=' for that).
have no effect. The `rsync -V` output will contain "`no IPv6`" if is the
case.
See also these options in the `--daemon` mode section.
0. `--checksum-seed=NUM`
Set the checksum seed to the integer NUM. This 4 byte checksum seed is
@@ -3267,9 +3298,9 @@ include/exclude rules each specify a pattern that is matched against the names
of the files that are going to be transferred. These patterns can take several
forms:
- if the pattern starts with a / then it is anchored to a particular spot in
- if the pattern starts with a `/` then it is anchored to a particular spot in
the hierarchy of files, otherwise it is matched against the end of the
pathname. This is similar to a leading ^ in regular expressions. Thus
pathname. This is similar to a leading `^` in regular expressions. Thus
`/foo` would match a name of "foo" at either the "root of the transfer" (for
a global rule) or in the merge-file's directory (for a per-directory rule).
An unqualified `foo` would match a name of "foo" anywhere in the tree because
@@ -3279,24 +3310,24 @@ forms:
was found within a directory named "sub". See the section on ANCHORING
INCLUDE/EXCLUDE PATTERNS for a full discussion of how to specify a pattern
that matches at the root of the transfer.
- if the pattern ends with a / then it will only match a directory, not a
- if the pattern ends with a `/` then it will only match a directory, not a
regular file, symlink, or device.
- rsync chooses between doing a simple string match and wildcard matching by
checking if the pattern contains one of these three wildcard characters:
'`*`', '`?`', and '`[`' .
- a '`*`' matches any path component, but it stops at slashes.
- use '`**`' to match anything, including slashes.
- a '?' matches any character except a slash (/).
- a '[' introduces a character class, such as [a-z] or [[:alpha:]].
- a '`?`' matches any character except a slash (`/`).
- a '`[`' introduces a character class, such as `[a-z]` or `[[:alpha:]]`.
- in a wildcard pattern, a backslash can be used to escape a wildcard
character, but it is matched literally when no wildcards are present. This
means that there is an extra level of backslash removal when a pattern
contains wildcard characters compared to a pattern that has none. e.g. if
you add a wildcard to "`foo\bar`" (which matches the backslash) you would
need to use "`foo\\bar*`" to avoid the "`\b`" becoming just "b".
- if the pattern contains a / (not counting a trailing /) or a "`**`", then it
- if the pattern contains a `/` (not counting a trailing /) or a "`**`", then it
is matched against the full pathname, including any leading directories. If
the pattern doesn't contain a / or a "`**`", then it is matched only against
the pattern doesn't contain a `/` or a "`**`", then it is matched only against
the final component of the filename. (Remember that the algorithm is applied
recursively so "full filename" can actually be any portion of a path from the
starting directory on down.)
@@ -3311,20 +3342,20 @@ include/exclude patterns are applied recursively to the pathname of each node
in the filesystem's tree (those inside the transfer). The exclude patterns
short-circuit the directory traversal stage as rsync finds the files to send.
For instance, to include "/foo/bar/baz", the directories "/foo" and "/foo/bar"
For instance, to include "`/foo/bar/baz`", the directories "`/foo`" and "`/foo/bar`"
must not be excluded. Excluding one of those parent directories prevents the
examination of its content, cutting off rsync's recursion into those paths and
rendering the include for "/foo/bar/baz" ineffectual (since rsync can't match
rendering the include for "`/foo/bar/baz`" ineffectual (since rsync can't match
something it never sees in the cut-off section of the directory hierarchy).
The concept path exclusion is particularly important when using a trailing '\*'
The concept path exclusion is particularly important when using a trailing '`*`'
rule. For instance, this won't work:
> + /some/path/this-file-will-not-be-found
> + /file-is-included
> - *
This fails because the parent directory "some" is excluded by the '\*' rule, so
This fails because the parent directory "some" is excluded by the '`*`' rule, so
rsync never visits any of the files in the "some" or "some/path" directories.
One solution is to ask for all directories in the hierarchy to be included by
using a single rule: "`+ */`" (put it somewhere before the "`- *`" rule), and
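
The trailing-`*` pitfall in the hunk above can be reproduced with a toy model of first-match-wins rules applied to each directory level in turn. This is a deliberately simplified sketch, not rsync's matcher: the function names are hypothetical, patterns containing a slash are compared literally against the full path, and other patterns are matched against the final path component only.

```python
from fnmatch import fnmatchcase
import posixpath

def rule_matches(pattern, path, is_dir):
    if pattern.endswith("/"):
        if not is_dir:
            return False                  # trailing slash: directories only
        pattern = pattern[:-1]
    if "/" in pattern:
        return pattern == path            # simplified anchored, literal compare
    return fnmatchcase(posixpath.basename(path), pattern)

def is_included(path, is_dir, rules):
    for sign, pattern in rules:
        if rule_matches(pattern, path, is_dir):
            return sign == "+"            # the first matching rule decides
    return True                           # no rule matched: included

def is_transferred(path, rules):
    """A file is seen only if every parent directory survives the rules."""
    parts = path.strip("/").split("/")
    for depth in range(1, len(parts) + 1):
        node = "/" + "/".join(parts[:depth])
        if not is_included(node, depth < len(parts), rules):
            return False                  # excluded directory cuts off recursion
    return True
```

With the three rules from the example, `/some` is excluded by `- *` before the deeper include is ever consulted; placing `+ */` ahead of `- *` lets the walk reach the file.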


@@ -45,9 +45,10 @@
* the available xmm registers, this optimized version may not be faster than
* the pure C version anyway. Note that all x86-64 CPUs support at least SSE2.
*
* This file is compiled using GCC 4.8+'s C++ front end to allow the use of
* the target attribute, selecting the fastest code path based on runtime
* detection of CPU capabilities.
* This file is compiled using GCC 4.8+/clang 6+'s C++ front end to allow the
* use of the target attribute, selecting the fastest code path based on
* dispatch priority (GCC 5) or runtime detection of CPU capabilities (GCC 6+).
* GCC 4.x are not supported to ease configure.ac logic.
*/
#ifdef __x86_64__
@@ -59,73 +60,34 @@
#include <immintrin.h>
/* Compatibility functions to let our SSSE3 algorithm run on SSE2 */
/* Some clang versions don't like it when you use static with multi-versioned functions: linker errors */
#ifdef __clang__
#define MVSTATIC
#else
#define MVSTATIC static
#endif
__attribute__ ((target("sse2"))) static inline __m128i sse_interleave_odd_epi16(__m128i a, __m128i b)
{
return _mm_packs_epi32(
_mm_srai_epi32(a, 16),
_mm_srai_epi32(b, 16)
);
}
// Missing from the headers on gcc 6 and older, clang 8 and older
typedef long long __m128i_u __attribute__((__vector_size__(16), __may_alias__, __aligned__(1)));
typedef long long __m256i_u __attribute__((__vector_size__(32), __may_alias__, __aligned__(1)));
__attribute__ ((target("sse2"))) static inline __m128i sse_interleave_even_epi16(__m128i a, __m128i b)
{
return sse_interleave_odd_epi16(
_mm_slli_si128(a, 2),
_mm_slli_si128(b, 2)
);
}
/* Compatibility macros to let our SSSE3 algorithm run with only SSE2.
These used to be neat individual functions with target attributes switching between SSE2 and SSSE3 implementations
as needed, but though this works perfectly with GCC, clang fails to inline those properly leading to a near 50%
performance drop - combined with static and inline modifiers gets you linker errors and even compiler crashes...
*/
__attribute__ ((target("sse2"))) static inline __m128i sse_mulu_odd_epi8(__m128i a, __m128i b)
{
return _mm_mullo_epi16(
_mm_srli_epi16(a, 8),
_mm_srai_epi16(b, 8)
);
}
#define SSE2_INTERLEAVE_ODD_EPI16(a, b) _mm_packs_epi32(_mm_srai_epi32(a, 16), _mm_srai_epi32(b, 16))
#define SSE2_INTERLEAVE_EVEN_EPI16(a, b) SSE2_INTERLEAVE_ODD_EPI16(_mm_slli_si128(a, 2), _mm_slli_si128(b, 2))
#define SSE2_MULU_ODD_EPI8(a, b) _mm_mullo_epi16(_mm_srli_epi16(a, 8), _mm_srai_epi16(b, 8))
#define SSE2_MULU_EVEN_EPI8(a, b) _mm_mullo_epi16(_mm_and_si128(a, _mm_set1_epi16(0xFF)), _mm_srai_epi16(_mm_slli_si128(b, 1), 8))
__attribute__ ((target("sse2"))) static inline __m128i sse_mulu_even_epi8(__m128i a, __m128i b)
{
return _mm_mullo_epi16(
_mm_and_si128(a, _mm_set1_epi16(0xFF)),
_mm_srai_epi16(_mm_slli_si128(b, 1), 8)
);
}
#define SSE2_HADDS_EPI16(a, b) _mm_adds_epi16(SSE2_INTERLEAVE_EVEN_EPI16(a, b), SSE2_INTERLEAVE_ODD_EPI16(a, b))
#define SSE2_MADDUBS_EPI16(a, b) _mm_adds_epi16(SSE2_MULU_EVEN_EPI8(a, b), SSE2_MULU_ODD_EPI8(a, b))
__attribute__ ((target("sse2"))) static inline __m128i sse_hadds_epi16(__m128i a, __m128i b)
{
return _mm_adds_epi16(
sse_interleave_even_epi16(a, b),
sse_interleave_odd_epi16(a, b)
);
}
__attribute__ ((target("ssse3"))) static inline __m128i sse_hadds_epi16(__m128i a, __m128i b)
{
return _mm_hadds_epi16(a, b);
}
__attribute__ ((target("sse2"))) static inline __m128i sse_maddubs_epi16(__m128i a, __m128i b)
{
return _mm_adds_epi16(
sse_mulu_even_epi8(a, b),
sse_mulu_odd_epi8(a, b)
);
}
__attribute__ ((target("ssse3"))) static inline __m128i sse_maddubs_epi16(__m128i a, __m128i b)
{
return _mm_maddubs_epi16(a, b);
}
/* These don't actually get called, but we need to define them. */
__attribute__ ((target("default"))) static inline __m128i sse_interleave_odd_epi16(__m128i a, __m128i b) { return a; }
__attribute__ ((target("default"))) static inline __m128i sse_interleave_even_epi16(__m128i a, __m128i b) { return a; }
__attribute__ ((target("default"))) static inline __m128i sse_mulu_odd_epi8(__m128i a, __m128i b) { return a; }
__attribute__ ((target("default"))) static inline __m128i sse_mulu_even_epi8(__m128i a, __m128i b) { return a; }
__attribute__ ((target("default"))) static inline __m128i sse_hadds_epi16(__m128i a, __m128i b) { return a; }
__attribute__ ((target("default"))) static inline __m128i sse_maddubs_epi16(__m128i a, __m128i b) { return a; }
__attribute__ ((target("default"))) MVSTATIC int32 get_checksum1_avx2_64(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2) { return i; }
__attribute__ ((target("default"))) MVSTATIC int32 get_checksum1_ssse3_32(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2) { return i; }
__attribute__ ((target("default"))) MVSTATIC int32 get_checksum1_sse2_32(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2) { return i; }
/*
Original loop per 4 bytes:
@@ -146,12 +108,7 @@ __attribute__ ((target("default"))) static inline __m128i sse_maddubs_epi16(__m1
s1 += (uint32)(t1[0] + t1[1] + t1[2] + t1[3] + t1[4] + t1[5] + t1[6] + t1[7]) +
32*CHAR_OFFSET;
*/
/*
Both sse2 and ssse3 targets must be specified here or we lose (a lot) of
performance, possibly due to not unrolling+inlining the called targeted
functions.
*/
__attribute__ ((target("sse2", "ssse3"))) static int32 get_checksum1_sse2_32(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2)
__attribute__ ((target("ssse3"))) MVSTATIC int32 get_checksum1_ssse3_32(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2)
{
if (len > 32) {
int aligned = ((uintptr_t)buf & 15) == 0;
@@ -167,16 +124,11 @@ __attribute__ ((target("sse2", "ssse3"))) static int32 get_checksum1_sse2_32(sch
for (; i < (len-32); i+=32) {
// Load ... 2*[int8*16]
// SSSE3 has _mm_lddqu_si128, but this requires another
// target function for each SSE2 and SSSE3 loads. For reasons
// unknown (to me) we lose about 10% performance on some CPUs if
// we do that right here. We just use _mm_loadu_si128 as for all
// but a handful of specific old CPUs they are synonymous, and
// take the 1-5% hit on those specific CPUs where it isn't.
__m128i in8_1, in8_2;
if (!aligned) {
in8_1 = _mm_loadu_si128((__m128i_u*)&buf[i]);
in8_2 = _mm_loadu_si128((__m128i_u*)&buf[i + 16]);
// Synonymous with _mm_loadu_si128 on all but a handful of old CPUs
in8_1 = _mm_lddqu_si128((__m128i_u*)&buf[i]);
in8_2 = _mm_lddqu_si128((__m128i_u*)&buf[i + 16]);
} else {
in8_1 = _mm_load_si128((__m128i_u*)&buf[i]);
in8_2 = _mm_load_si128((__m128i_u*)&buf[i + 16]);
@@ -185,13 +137,13 @@ __attribute__ ((target("sse2", "ssse3"))) static int32 get_checksum1_sse2_32(sch
// (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8]
// Fastest, even though multiply by 1
__m128i mul_one = _mm_set1_epi8(1);
__m128i add16_1 = sse_maddubs_epi16(mul_one, in8_1);
__m128i add16_2 = sse_maddubs_epi16(mul_one, in8_2);
__m128i add16_1 = _mm_maddubs_epi16(mul_one, in8_1);
__m128i add16_2 = _mm_maddubs_epi16(mul_one, in8_2);
// (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8]
__m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << 16) + (1 << 24));
__m128i mul_add16_1 = sse_maddubs_epi16(mul_const, in8_1);
__m128i mul_add16_2 = sse_maddubs_epi16(mul_const, in8_2);
__m128i mul_add16_1 = _mm_maddubs_epi16(mul_const, in8_1);
__m128i mul_add16_2 = _mm_maddubs_epi16(mul_const, in8_2);
// s2 += 32*s1
ss2 = _mm_add_epi32(ss2, _mm_slli_epi32(ss1, 5));
@@ -224,7 +176,111 @@ __attribute__ ((target("sse2", "ssse3"))) static int32 get_checksum1_sse2_32(sch
// [t1[0] + t1[1], t1[2] + t1[3] ...] [int16*8]
// We could've combined this with generating sum_add32 above and
// save an instruction but benchmarking shows that as being slower
__m128i add16 = sse_hadds_epi16(add16_1, add16_2);
__m128i add16 = _mm_hadds_epi16(add16_1, add16_2);
// [t1[0], t1[1], ...] -> [t1[0]*28 + t1[1]*24, ...] [int32*4]
__m128i mul32 = _mm_madd_epi16(add16, mul_t1);
// [sum(mul32), X, X, X] [int32*4]; faster than multiple _mm_hadd_epi32
mul32 = _mm_add_epi32(mul32, _mm_srli_si128(mul32, 4));
mul32 = _mm_add_epi32(mul32, _mm_srli_si128(mul32, 8));
// s2 += 28*t1[0] + 24*t1[1] + 20*t1[2] + 16*t1[3] + 12*t1[4] + 8*t1[5] + 4*t1[6]
ss2 = _mm_add_epi32(ss2, mul32);
#if CHAR_OFFSET != 0
// s1 += 32*CHAR_OFFSET
__m128i char_offset_multiplier = _mm_set1_epi32(32 * CHAR_OFFSET);
ss1 = _mm_add_epi32(ss1, char_offset_multiplier);
// s2 += 528*CHAR_OFFSET
char_offset_multiplier = _mm_set1_epi32(528 * CHAR_OFFSET);
ss2 = _mm_add_epi32(ss2, char_offset_multiplier);
#endif
}
_mm_store_si128((__m128i_u*)x, ss1);
*ps1 = x[0];
_mm_store_si128((__m128i_u*)x, ss2);
*ps2 = x[0];
}
return i;
}
/*
Same as SSSE3 version, but using macros defined above to emulate SSSE3 calls that are not available with SSE2.
For GCC-only the SSE2 and SSSE3 versions could be a single function calling other functions with the right
target attributes to emulate SSSE3 calls on SSE2 if needed, but clang doesn't inline those properly leading
to a near 50% performance drop.
*/
__attribute__ ((target("sse2"))) MVSTATIC int32 get_checksum1_sse2_32(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2)
{
if (len > 32) {
int aligned = ((uintptr_t)buf & 15) == 0;
uint32 x[4] = {0};
x[0] = *ps1;
__m128i ss1 = _mm_loadu_si128((__m128i_u*)x);
x[0] = *ps2;
__m128i ss2 = _mm_loadu_si128((__m128i_u*)x);
const int16 mul_t1_buf[8] = {28, 24, 20, 16, 12, 8, 4, 0};
__m128i mul_t1 = _mm_loadu_si128((__m128i_u*)mul_t1_buf);
for (; i < (len-32); i+=32) {
// Load ... 2*[int8*16]
__m128i in8_1, in8_2;
if (!aligned) {
in8_1 = _mm_loadu_si128((__m128i_u*)&buf[i]);
in8_2 = _mm_loadu_si128((__m128i_u*)&buf[i + 16]);
} else {
in8_1 = _mm_load_si128((__m128i_u*)&buf[i]);
in8_2 = _mm_load_si128((__m128i_u*)&buf[i + 16]);
}
// (1*buf[i] + 1*buf[i+1]), (1*buf[i+2], 1*buf[i+3]), ... 2*[int16*8]
// Fastest, even though multiply by 1
__m128i mul_one = _mm_set1_epi8(1);
__m128i add16_1 = SSE2_MADDUBS_EPI16(mul_one, in8_1);
__m128i add16_2 = SSE2_MADDUBS_EPI16(mul_one, in8_2);
// (4*buf[i] + 3*buf[i+1]), (2*buf[i+2], buf[i+3]), ... 2*[int16*8]
__m128i mul_const = _mm_set1_epi32(4 + (3 << 8) + (2 << 16) + (1 << 24));
__m128i mul_add16_1 = SSE2_MADDUBS_EPI16(mul_const, in8_1);
__m128i mul_add16_2 = SSE2_MADDUBS_EPI16(mul_const, in8_2);
// s2 += 32*s1
ss2 = _mm_add_epi32(ss2, _mm_slli_epi32(ss1, 5));
// [sum(t1[0]..t1[7]), X, X, X] [int32*4]; faster than multiple _mm_hadds_epi16
// Shifting left, then shifting right again and shuffling (rather than just
// shifting right as with mul32 below) to cheaply end up with the correct sign
// extension as we go from int16 to int32.
__m128i sum_add32 = _mm_add_epi16(add16_1, add16_2);
sum_add32 = _mm_add_epi16(sum_add32, _mm_slli_si128(sum_add32, 2));
sum_add32 = _mm_add_epi16(sum_add32, _mm_slli_si128(sum_add32, 4));
sum_add32 = _mm_add_epi16(sum_add32, _mm_slli_si128(sum_add32, 8));
sum_add32 = _mm_srai_epi32(sum_add32, 16);
sum_add32 = _mm_shuffle_epi32(sum_add32, 3);
// [sum(t2[0]..t2[7]), X, X, X] [int32*4]; faster than multiple _mm_hadds_epi16
__m128i sum_mul_add32 = _mm_add_epi16(mul_add16_1, mul_add16_2);
sum_mul_add32 = _mm_add_epi16(sum_mul_add32, _mm_slli_si128(sum_mul_add32, 2));
sum_mul_add32 = _mm_add_epi16(sum_mul_add32, _mm_slli_si128(sum_mul_add32, 4));
sum_mul_add32 = _mm_add_epi16(sum_mul_add32, _mm_slli_si128(sum_mul_add32, 8));
sum_mul_add32 = _mm_srai_epi32(sum_mul_add32, 16);
sum_mul_add32 = _mm_shuffle_epi32(sum_mul_add32, 3);
// s1 += t1[0] + t1[1] + t1[2] + t1[3] + t1[4] + t1[5] + t1[6] + t1[7]
ss1 = _mm_add_epi32(ss1, sum_add32);
// s2 += t2[0] + t2[1] + t2[2] + t2[3] + t2[4] + t2[5] + t2[6] + t2[7]
ss2 = _mm_add_epi32(ss2, sum_mul_add32);
// [t1[0] + t1[1], t1[2] + t1[3] ...] [int16*8]
// We could've combined this with generating sum_add32 above and
// save an instruction but benchmarking shows that as being slower
__m128i add16 = SSE2_HADDS_EPI16(add16_1, add16_2);
// [t1[0], t1[1], ...] -> [t1[0]*28 + t1[1]*24, ...] [int32*4]
__m128i mul32 = _mm_madd_epi16(add16, mul_t1);
@@ -270,7 +326,7 @@ __attribute__ ((target("sse2", "ssse3"))) static int32 get_checksum1_sse2_32(sch
s1 += (uint32)(t1[0] + t1[1] + t1[2] + t1[3] + t1[4] + t1[5] + t1[6] + t1[7] + t1[8] + t1[9] + t1[10] + t1[11] + t1[12] + t1[13] + t1[14] + t1[15]) +
64*CHAR_OFFSET;
*/
__attribute__ ((target("avx2"))) static int32 get_checksum1_avx2_64(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2)
__attribute__ ((target("avx2"))) MVSTATIC int32 get_checksum1_avx2_64(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2)
{
if (len > 64) {
// Instructions reshuffled compared to SSE2 for slightly better performance
@@ -377,17 +433,7 @@ __attribute__ ((target("avx2"))) static int32 get_checksum1_avx2_64(schar* buf,
return i;
}
__attribute__ ((target("default"))) static int32 get_checksum1_avx2_64(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2)
{
return i;
}
__attribute__ ((target("default"))) static int32 get_checksum1_sse2_32(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2)
{
return i;
}
static inline int32 get_checksum1_default_1(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2)
static int32 get_checksum1_default_1(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2)
{
uint32 s1 = *ps1;
uint32 s2 = *ps2;
@@ -403,9 +449,10 @@ static inline int32 get_checksum1_default_1(schar* buf, int32 len, int32 i, uint
return i;
}
extern "C" {
uint32 get_checksum1(char *buf1, int32 len)
/* With GCC 10 putting this implementation inside 'extern "C"' causes an
assembler error. That worked fine on GCC 5-9 and clang 6-10...
*/
static inline uint32 get_checksum1_cpp(char *buf1, int32 len)
{
int32 i = 0;
uint32 s1 = 0;
@@ -414,7 +461,10 @@ uint32 get_checksum1(char *buf1, int32 len)
// multiples of 64 bytes using AVX2 (if available)
i = get_checksum1_avx2_64((schar*)buf1, len, i, &s1, &s2);
// multiples of 32 bytes using SSE2/SSSE3 (if available)
// multiples of 32 bytes using SSSE3 (if available)
i = get_checksum1_ssse3_32((schar*)buf1, len, i, &s1, &s2);
// multiples of 32 bytes using SSE2 (if available)
i = get_checksum1_sse2_32((schar*)buf1, len, i, &s1, &s2);
// whatever is left
@@ -423,7 +473,70 @@ uint32 get_checksum1(char *buf1, int32 len)
return (s1 & 0xffff) + (s2 << 16);
}
} // "C"
extern "C" {
uint32 get_checksum1(char *buf1, int32 len)
{
return get_checksum1_cpp(buf1, len);
}
} // extern "C"
#ifdef BENCHMARK_SIMD_CHECKSUM1
#pragma clang optimize off
#pragma GCC push_options
#pragma GCC optimize ("O0")
#define ROUNDS 1024
#define BLOCK_LEN 1024*1024
#ifndef CLOCK_MONOTONIC_RAW
#define CLOCK_MONOTONIC_RAW CLOCK_MONOTONIC
#endif
static void benchmark(const char* desc, int32 (*func)(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2), schar* buf, int32 len) {
struct timespec start, end;
uint64_t us;
uint32_t cs, s1, s2;
int i, next;
clock_gettime(CLOCK_MONOTONIC_RAW, &start);
for (i = 0; i < ROUNDS; i++) {
s1 = s2 = 0;
next = func((schar*)buf, len, 0, &s1, &s2);
get_checksum1_default_1((schar*)buf, len, next, &s1, &s2);
}
clock_gettime(CLOCK_MONOTONIC_RAW, &end);
us = next == 0 ? 0 : (end.tv_sec - start.tv_sec) * 1000000 + (end.tv_nsec - start.tv_nsec) / 1000;
cs = next == 0 ? 0 : (s1 & 0xffff) + (s2 << 16);
printf("%-5s :: %5.0f MB/s :: %08x\n", desc, us ? (float)(len / (1024 * 1024) * ROUNDS) / ((float)us / 1000000.0f) : 0, cs);
}
static int32 get_checksum1_auto(schar* buf, int32 len, int32 i, uint32* ps1, uint32* ps2) {
uint32 cs = get_checksum1((char*)buf, len);
*ps1 = cs & 0xffff;
*ps2 = cs >> 16;
return len;
}
int main() {
int i;
unsigned char* buf = (unsigned char*)malloc(BLOCK_LEN);
for (i = 0; i < BLOCK_LEN; i++) buf[i] = (i + (i % 3) + (i % 11)) % 256;
benchmark("Auto", get_checksum1_auto, (schar*)buf, BLOCK_LEN);
benchmark("Raw-C", get_checksum1_default_1, (schar*)buf, BLOCK_LEN);
benchmark("SSE2", get_checksum1_sse2_32, (schar*)buf, BLOCK_LEN);
benchmark("SSSE3", get_checksum1_ssse3_32, (schar*)buf, BLOCK_LEN);
benchmark("AVX2", get_checksum1_avx2_64, (schar*)buf, BLOCK_LEN);
free(buf);
return 0;
}
#pragma GCC pop_options
#pragma clang optimize on
#endif /* BENCHMARK_SIMD_CHECKSUM1 */
#endif /* HAVE_SIMD */
#endif /* __cplusplus */
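The comments in the SIMD routines above describe the per-block updates (`s2 += 32*s1`, `s1 += 32*CHAR_OFFSET`, `s2 += 528*CHAR_OFFSET`) without showing why they match the byte-at-a-time loop. The following is a minimal scalar sketch of that equivalence, not the code from this commit. The names `checksum1_bytewise`, `checksum1_blocked32` and `check_blocked_matches_bytewise` are hypothetical, and `CHAR_OFFSET` is assumed to be 0 here.

```c
#include <assert.h>
#include <stdint.h>

#define CHAR_OFFSET 0 /* assumption for this sketch */

/* Byte-at-a-time reference: s1 is the running byte sum,
 * s2 is the running sum of every intermediate s1. */
static uint32_t checksum1_bytewise(const signed char *buf, int32_t len)
{
    uint32_t s1 = 0, s2 = 0;
    for (int32_t i = 0; i < len; i++) {
        s1 += (uint32_t)(buf[i] + CHAR_OFFSET);
        s2 += s1;
    }
    return (s1 & 0xffff) + (s2 << 16);
}

/* Same checksum, consuming 32 bytes per step the way the SIMD loops do.
 * For a block b[0..31] entered with running sum s1:
 *   s2 += 32*s1 + sum((32-k)*b[k]) + 528*CHAR_OFFSET   (528 = 32*33/2)
 *   s1 += sum(b[k]) + 32*CHAR_OFFSET
 * All arithmetic is modulo 2^32, so the two forms agree exactly. */
static uint32_t checksum1_blocked32(const signed char *buf, int32_t len)
{
    uint32_t s1 = 0, s2 = 0;
    int32_t i = 0;
    for (; i < len - 32; i += 32) {
        uint32_t sum = 0, weighted = 0;
        for (int k = 0; k < 32; k++) {
            sum += (uint32_t)buf[i + k];
            weighted += (uint32_t)((32 - k) * buf[i + k]);
        }
        /* s2 must be updated first: it uses s1 from before this block. */
        s2 += 32 * s1 + weighted + (uint32_t)(528 * CHAR_OFFSET);
        s1 += sum + (uint32_t)(32 * CHAR_OFFSET);
    }
    /* Whatever is left is handled one byte at a time. */
    for (; i < len; i++) {
        s1 += (uint32_t)(buf[i] + CHAR_OFFSET);
        s2 += s1;
    }
    return (s1 & 0xffff) + (s2 << 16);
}

/* Hypothetical self-check: compare both forms for every length up to 1000,
 * using signed input bytes so negative values are exercised too. */
static void check_blocked_matches_bytewise(void)
{
    signed char buf[1000];
    for (int i = 0; i < 1000; i++)
        buf[i] = (signed char)((i * 7 + i % 11) % 256 - 128);
    for (int32_t len = 0; len <= 1000; len++)
        assert(checksum1_blocked32(buf, len) == checksum1_bytewise(buf, len));
}
```

Calling `check_blocked_matches_bytewise()` from a test harness exercises both the block path and the tail path. The SIMD versions compute the same `sum` and `weighted` terms with vector multiplies and horizontal adds instead of the inner `k` loop.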

token.c

@@ -1135,30 +1135,23 @@ void send_token(int f, int32 token, struct map_struct *buf, OFF_T offset,
*/
int32 recv_token(int f, char **data)
{
int tok;
switch (do_compression) {
case CPRES_NONE:
tok = simple_recv_token(f,data);
break;
return simple_recv_token(f,data);
case CPRES_ZLIB:
case CPRES_ZLIBX:
tok = recv_deflated_token(f, data);
break;
return recv_deflated_token(f, data);
#ifdef SUPPORT_ZSTD
case CPRES_ZSTD:
tok = recv_zstd_token(f, data);
break;
return recv_zstd_token(f, data);
#endif
#ifdef SUPPORT_LZ4
case CPRES_LZ4:
tok = recv_compressed_token(f, data);
break;
return recv_compressed_token(f, data);
#endif
default:
assert(0);
}
return tok;
}
/*


@@ -527,14 +527,14 @@ const char *getallgroups(uid_t uid, item_list *gid_list)
return "getpwuid failed";
gid_list->count = 0; /* We're overwriting any items in the list */
EXPAND_ITEM_LIST(gid_list, gid_t, 32);
(void)EXPAND_ITEM_LIST(gid_list, gid_t, 32);
size = gid_list->malloced;
/* Get all the process's groups, with the pw_gid group first. */
if (getgrouplist(pw->pw_name, pw->pw_gid, gid_list->items, &size) < 0) {
if (size > (int)gid_list->malloced) {
gid_list->count = gid_list->malloced;
EXPAND_ITEM_LIST(gid_list, gid_t, size);
(void)EXPAND_ITEM_LIST(gid_list, gid_t, size);
if (getgrouplist(pw->pw_name, pw->pw_gid, gid_list->items, &size) < 0)
size = -1;
} else
@@ -553,7 +553,7 @@ const char *getallgroups(uid_t uid, item_list *gid_list)
break;
}
if (j == size) { /* The default group wasn't found! */
EXPAND_ITEM_LIST(gid_list, gid_t, size+1);
(void)EXPAND_ITEM_LIST(gid_list, gid_t, size+1);
gid_array = gid_list->items;
}
gid_array[j] = gid_array[0];