<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!--Converted with LaTeX2HTML 98.1p1 release (March 2nd, 1998)
originally by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds
* revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan
* with significant contributions from:
Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
<HTML>
<HEAD>
<TITLE>Rolling checksum</TITLE>
<META NAME="description" CONTENT="Rolling checksum">
<META NAME="keywords" CONTENT="tech_report">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<LINK REL="STYLESHEET" HREF="tech_report.css">
<LINK REL="next" HREF="node4.html">
<LINK REL="previous" HREF="node2.html">
<LINK REL="up" HREF="tech_report.html">
</HEAD>
<BODY >
<!--Navigation Panel-->
<A NAME="tex2html42"
HREF="node4.html">
<IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
SRC="next.gif"></A>
<A NAME="tex2html40"
HREF="tech_report.html">
<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
SRC="up.gif"></A>
<A NAME="tex2html34"
HREF="node2.html">
<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
SRC="previous.gif"></A>
<BR>
<B> Next:</B> <A NAME="tex2html43"
HREF="node4.html">Checksum searching</A>
<B> Up:</B> <A NAME="tex2html41"
HREF="tech_report.html">The rsync algorithm</A>
<B> Previous:</B> <A NAME="tex2html35"
HREF="node2.html">The rsync algorithm</A>
<BR>
<BR>
<!--End of Navigation Panel-->
<H1><A NAME="SECTION00030000000000000000">
Rolling checksum</A>
</H1>
<P>
The weak rolling checksum used in the rsync algorithm must make it
very cheap to calculate the checksum of a
buffer
<!-- MATH: $X_2 .. X_{n+1}$ -->
<I>X</I><SUB>2</SUB> .. <I>X</I><SUB><I>n</I>+1</SUB> given the checksum of buffer
<!-- MATH: $X_1 .. X_n$ -->
<I>X</I><SUB>1</SUB> .. <I>X</I><SUB><I>n</I></SUB> and
the values of the bytes <I>X</I><SUB>1</SUB> and <I>X</I><SUB><I>n</I>+1</SUB>.
<P>
The weak checksum algorithm used in our implementation was inspired
by Mark Adler's Adler-32 checksum. Our checksum is defined by
<BR><P></P>
<DIV ALIGN="CENTER">
<!-- MATH: \begin{displaymath}
a(k,l) = (\sum_{i=k}^l X_i) \bmod M
\end{displaymath} -->
<IMG
WIDTH="176" HEIGHT="56"
SRC="img3.gif"
ALT="\begin{displaymath}a(k,l) = (\sum_{i=k}^l X_i) \bmod M \end{displaymath}">
</DIV>
<BR CLEAR="ALL">
<P></P>
<BR><P></P>
<DIV ALIGN="CENTER">
<!-- MATH: \begin{displaymath}
b(k,l) = (\sum_{i=k}^l (l-i+1)X_i) \bmod M
\end{displaymath} -->
<IMG
WIDTH="240" HEIGHT="56"
SRC="img4.gif"
ALT="\begin{displaymath}b(k,l) = (\sum_{i=k}^l (l-i+1)X_i) \bmod M \end{displaymath}">
</DIV>
<BR CLEAR="ALL">
<P></P>
<BR><P></P>
<DIV ALIGN="CENTER">
<!-- MATH: \begin{displaymath}
s(k,l) = a(k,l) + 2^{16} b(k,l)
\end{displaymath} -->
<I>s</I>(<I>k</I>,<I>l</I>) = <I>a</I>(<I>k</I>,<I>l</I>) + 2<SUP>16</SUP> <I>b</I>(<I>k</I>,<I>l</I>)
</DIV>
<BR CLEAR="ALL">
<P></P>
<P>
where <I>s</I>(<I>k</I>,<I>l</I>) is the rolling checksum of the bytes
<!-- MATH: $X_k \ldots X_l$ -->
<IMG
WIDTH="67" HEIGHT="29" ALIGN="MIDDLE" BORDER="0"
SRC="img5.gif"
ALT="$X_k \ldots X_l$">.
For simplicity and speed, we use
<!-- MATH: $M = 2^{16}$ -->
<I>M</I> = 2<SUP>16</SUP>.
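<P>
As an illustration, the definitions above can be evaluated directly. The
following is a minimal Python sketch, not the rsync implementation; the
function name <CODE>weak_checksum</CODE> and the 0-based indexing are
assumptions made for this example.
<PRE>
```python
M = 2 ** 16  # modulus used in the text


def weak_checksum(block):
    """Return (a, b, s) for a bytes-like block, straight from the definitions."""
    n = len(block)
    a = sum(block) % M
    # Byte i (0-based) carries weight n - i, which is (l - i + 1)
    # in the 1-based indexing of the text.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    s = a + M * b  # a occupies the low 16 bits, b the high 16 bits
    return a, b, s


a, b, s = weak_checksum(b"abc")
print(a, b, hex(s))  # 294 586 0x24a0126
```
</PRE>
Computing the checksum this way costs time proportional to the block
length for every offset, which is the cost the recurrence relations avoid.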
<P>
The important property of this checksum is that successive values can
be computed very efficiently using the recurrence relations
<P>
<BR><P></P>
<DIV ALIGN="CENTER">
<!-- MATH: \begin{displaymath}
a(k+1,l+1) = (a(k,l) - X_k + X_{l+1}) \bmod M
\end{displaymath} -->
<IMG
WIDTH="322" HEIGHT="28"
SRC="img6.gif"
ALT="\begin{displaymath}a(k+1,l+1) = (a(k,l) - X_k + X_{l+1}) \bmod M \end{displaymath}">
</DIV>
<BR CLEAR="ALL">
<P></P>
<BR><P></P>
<DIV ALIGN="CENTER">
<!-- MATH: \begin{displaymath}
b(k+1,l+1) = (b(k,l) - (l-k+1) X_k + a(k+1,l+1)) \bmod M
\end{displaymath} -->
<IMG
WIDTH="454" HEIGHT="28"
SRC="img7.gif"
ALT="\begin{displaymath}b(k+1,l+1) = (b(k,l) - (l-k+1) X_k + a(k+1,l+1)) \bmod M \end{displaymath}">
</DIV>
<BR CLEAR="ALL">
<P></P>
<P>
Thus the checksum can be calculated for blocks of length
<I>S</I> = <I>l</I> - <I>k</I> + 1 at all
possible offsets within a file in a "rolling" fashion, with very
little computation at each point.
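<P>
The recurrence relations can be sketched in Python as follows. This is an
illustrative sketch under stated assumptions rather than the rsync source:
the names <CODE>direct</CODE>, <CODE>roll</CODE> and
<CODE>rolling_checksums</CODE> are hypothetical, and the loop at the end
only checks that the rolled values agree with the definitions.
<PRE>
```python
M = 2 ** 16  # modulus used in the text


def direct(block):
    """Compute (a, b) from the definitions; used to seed and to check the rolling update."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, size, x_out, x_in):
    """Slide a window of `size` bytes one position to the right.

    x_out is the byte leaving the window (X_k) and x_in is the byte
    entering it (X_{l+1}); size is l - k + 1.
    """
    a_next = (a - x_out + x_in) % M
    b_next = (b - size * x_out + a_next) % M
    return a_next, b_next


def rolling_checksums(data, size):
    """Yield (offset, s) for every window of `size` bytes in data."""
    if size not in range(1, len(data) + 1):
        return
    a, b = direct(data[:size])
    yield 0, a + M * b
    for k in range(len(data) - size):
        a, b = roll(a, b, size, data[k], data[k + size])
        yield k + 1, a + M * b


data = b"the quick brown fox"
for offset, s in rolling_checksums(data, 4):
    a, b = direct(data[offset:offset + 4])
    assert s == a + M * b  # rolled value agrees with the definition
```
</PRE>
Only the first window is summed in full; each later offset costs a fixed
number of additions, one multiplication and two reductions modulo
<I>M</I>, independent of the block length.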
<P>
Despite its simplicity, this checksum has proved quite adequate as
a first-level check for a match between two file blocks. We have found in
practice that the probability of this checksum matching when the
blocks are not equal is quite low. This is important because the much
more expensive strong checksum must be calculated for every block where
the weak checksum matches.
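<P>
A small sketch of this two-level check follows. It is an assumption-laden
illustration, not the rsync code: the helper names are hypothetical,
SHA-256 stands in for the strong checksum (this section does not name
one), and both blocks are held locally to keep the example short. The two
three-byte blocks are chosen by hand so that their weak checksums collide.
<PRE>
```python
import hashlib

M = 2 ** 16  # modulus used in the text


def weak_checksum(block):
    """Return s = a + 2^16 * b for a bytes-like block."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a + M * b


def blocks_match(candidate, reference):
    """Cheap weak test first; the strong digest is computed only on a weak hit."""
    if weak_checksum(candidate) != weak_checksum(reference):
        return False  # rejected without touching the strong checksum
    # SHA-256 is a stand-in chosen for this sketch, not taken from the text.
    return hashlib.sha256(candidate).digest() == hashlib.sha256(reference).digest()


x = bytes([1, 0, 1])  # a = 2, b = 3*1 + 2*0 + 1*1 = 4
y = bytes([0, 2, 0])  # a = 2, b = 3*0 + 2*2 + 1*0 = 4
print(weak_checksum(x) == weak_checksum(y))  # True: a weak collision
print(blocks_match(x, y))                    # False: the strong check rejects it
print(blocks_match(x, bytes([1, 0, 1])))     # True
```
</PRE>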
<P>
<HR>
<ADDRESS>
<I>Andrew Tridgell</I>
<BR><I>1998-11-09</I>
</ADDRESS>
</BODY>
</HTML>