Remove obsolete design notes

rsync3.txt and rsyncsh.txt are Martin Pool's 2001 design proposals ("Notes towards a new version of rsync" and an interactive rsync shell, respectively); neither reflects the current implementation. doc/profile.txt contains stale profiling notes. None of the three files is referenced by the build, tests, or docs.
2026-06-08 14:15:46 -04:00 · 2026-06-05 10:49:44 +10:00
parent 5e88945a3c
commit a2ce82b35e
3 changed files with 0 additions and 535 deletions
--- a/doc/profile.txt
+++ b/doc/profile.txt
@@ -1,42 +0,0 @@
-Notes on rsync profiling
-
-strlcpy is hot:
-
-                0.00    0.00       1/7735635     push_dir [68]
-                0.00    0.00       1/7735635     pop_dir [71]
-                0.00    0.00       1/7735635     send_file_list [15]
-                0.01    0.00   18857/7735635     send_files [4]
-                0.04    0.00  129260/7735635     send_file_entry [18]
-                0.04    0.00  129260/7735635     make_file [20]
-                0.04    0.00  141666/7735635     send_directory <cycle 1> [36]
-                2.29    0.00 7316589/7735635     f_name [13]
-[14]    11.7    2.42    0.00 7735635         strlcpy [14]
-
-
-Here's the top few functions:
-
- 46.23      9.57     9.57 13160929     0.00     0.00  mdfour64
- 14.78     12.63     3.06 13160929     0.00     0.00  copy64
- 11.69     15.05     2.42  7735635     0.00     0.00  strlcpy
- 10.05     17.13     2.08    41438     0.05     0.38  sum_update
-  4.11     17.98     0.85 13159996     0.00     0.00  mdfour_update
-  1.50     18.29     0.31                             file_compare
-  1.45     18.59     0.30   129261     0.00     0.01  send_file_entry
-  1.23     18.84     0.26  2557585     0.00     0.00  f_name
-  1.11     19.07     0.23  1483750     0.00     0.00  u_strcmp
-  1.11     19.30     0.23   118129     0.00     0.00  writefd_unbuffered
-  0.92     19.50     0.19  1085011     0.00     0.00  writefd
-  0.43     19.59     0.09   156987     0.00     0.00  read_timeout
-  0.43     19.68     0.09   129261     0.00     0.00  clean_fname
-  0.39     19.75     0.08    32887     0.00     0.38  matched
-  0.34     19.82     0.07        1    70.00 16293.92  send_files
-  0.29     19.89     0.06   129260     0.00     0.00  make_file
-  0.29     19.95     0.06    75430     0.00     0.00  read_unbuffered
-
-
-
-mdfour could perhaps be made faster:
-
-/* NOTE: This code makes no attempt to be fast!  */
-
-There might be an optimized version somewhere that we can borrow.
--- a/rsync3.txt
+++ b/rsync3.txt
@@ -1,467 +0,0 @@
--*- indented-text -*-
-
-Notes towards a new version of rsync
-Martin Pool <mbp@samba.org>, September 2001.
-
-
-Good things about the current implementation:
-
-  - Widely known and adopted.
-
-  - Fast/efficient, especially for moderately small sets of files over
-    slow links (transoceanic or modem.)
-
-  - Fairly reliable.
-
-  - The choice of running over a plain TCP socket or tunneling over
-    ssh.
-
-  - rsync operations are idempotent: you can always run the same
-    command twice to make sure it worked properly without any fear.
-    (Are there any exceptions?)
-
-  - Small changes to files cause small deltas.
-
-  - There is a way to evolve the protocol to some extent.
-
-  - rdiff and rsync --write-batch allow generation of standalone patch
-    sets.  rsync+ is pretty cheesy, though.  xdelta seems cleaner.
-
-  - Process triangle is creative, but seems to provoke OS bugs.
-
-  - "Morning-after property": you don't need to know anything on the
-    local machine about the state of the remote machine, or about
-    transfers that have been done in the past.
-
-  - You can easily push or pull simply by switching the order of
-    files.
-
-  - The "modules" system has some neat features compared to
-    e.g. Apache's per-directory configuration.  In particular, because
-    you can set a userid and chroot directory, there is strong
-    protection between different modules.  I haven't seen any calls
-    for a more flexible system.
-
-
-Bad things about the current implementation:
-
-  - Persistent and hard-to-diagnose hang bugs remain
-
-  - Protocol is sketchily documented, tied to this implementation, and
-    hard to modify/extend
-
-  - Both the program and the protocol assume a single non-interactive
-    one-way transfer
-
-  - A list of all files is held in memory for the entire transfer,
-    which cripples scalability to large file trees
-
-  - Opening a new socket for every operation causes problems,
-    especially when running over SSH with password authentication.
-
-  - Renamed files are not handled: the old file is removed, and the
-    new file created from scratch.
-
-  - The versioning approach assumes that future versions of the
-    program know about all previous versions, and will do the right
-    thing.
-
-  - People always get confused about ':' vs '::'
-
-  - Error messages can be cryptic.
-
-  - Default behaviour is not intuitive: in too many cases rsync will
-    happily do nothing.  Perhaps -a should be the default?
-
-  - People get confused by trailing slashes, though it's hard to think
-    of another reasonable way to make this necessary distinction
-    between a directory and its contents.
-
-
-Protocol philosophy:
-
-   *The* big difference between protocols like HTTP, FTP, and NFS is
-    that their fundamental operations are "read this file", "delete
-    this file", and "make this directory", whereas rsync is "make this
-    directory like this one".
-
-
-Questionable features:
-
-  These are neat, but not necessarily clean or worth preserving.
-
-  - The remote rsync can be wrapped by some other program, such as in
-    tridge's rsync-mail scripts.  The general feature of sending and
-    retrieving mail over rsync is good, but this is perhaps not the
-    right way to implement it.
-
-
-Desirable features:
-
-  These don't really require architectural changes; they're just
-  something to keep in mind.
-
-  - Synchronize ACLs and extended attributes
-
-  - Anonymous servers should be efficient
-
-  - Code should be portable to non-UNIX systems
-
-  - Should be possible to document the protocol in RFC form
-
-  - --dry-run option
-
-  - IPv6 support.  Pretty straightforward.
-
-  - Allow the basis and destination files to be different.  For
-    example, you could use this when you have a CD-ROM and want to
-    download an updated image onto a hard drive.
-
-  - Efficiently interrupt and restart a transfer.  We can write a
-    checkpoint file that says where we're up to in the filesystem.
-    Alternatively, as long as transfers are idempotent, we can just
-    restart the whole thing.  [NFSv4]
-
-  - Scripting support.
-
-  - Propagate atimes and do not modify them.  This is very ugly on
-    Unix.  It might be better to try to add O_NOATIME to kernels, and
-    call that.
-
-  - Unicode.  Probably just use UTF-8 for everything.
-
-  - Open authentication system.  Can we use PAM?  Is SASL an adequate
-    mapping of PAM to the network, or useful in some other way?
-
-  - Resume interrupted transfers without the --partial flag.  We need
-    to leave the temporary file behind, and then know to use it.  This
-    leaves a risk of large temporary files accumulating, which is not
-    good.  Perhaps it should be off by default.
-
-  - tcpwrappers support.  Should be trivial; can already be done
-    through tcpd or inetd.
-
-  - Socks support built in.  It's not clear this is any better than
-    just linking against the socks library, though.
-
-  - When run over SSH, invoke with predictable command-line arguments,
-    so that people can restrict what commands sshd will run.  (Is this
-    really required?)
-
-  - Comparison mode: give a list of which files are new, gone, or
-    different.  Set return code depending on whether anything has
-    changed.
-
-  - Internationalized messages (gettext?)
-
-  - Optionally use real regexps rather than globs?
-
-  - Show overall progress.  Pretty hard to do, especially if we insist
-    on not scanning the directory tree up front.
-
-
-Regression testing:
-
-  - Support automatic testing.
-
-  - Have hard internal timeouts against hangs.
-
-  - Be deterministic.
-
-  - Measure performance.
-
-
-Hard links:
-
-  At the moment, we can recreate hard links, but it's a bit
-  inefficient: it depends on holding a list of all files in the tree.
-  Every time we see a file with a linkcount >1, we need to search for
-  another known name that has the same (fsid,inum) tuple.  We could do
-  that more efficiently by keeping a list of only files with
-  linkcount>1, and removing files from that list as all their names
-  become known.
-
-
-Command-line options:
-
-  We have rather a lot at the moment.  We might get more if the tool
-  becomes more flexible.  Do we need a .rc or configuration file?
-  That wouldn't really fit with its pattern of use: cp and tar don't
-  have them, though ssh does.
-
-
-Scripting issues:
-
-  - Perhaps support multiple scripting languages: candidates include
-    Perl, Python, Tcl, Scheme (guile?), sh, ...
-
-  - Simply running a subprocess and looking at its stdout/exit code
-    might be sufficient, though it could also be pretty slow if it's
-    called often.
-
-  - There are security issues about running remote code, at least if
-    it's not running in the user's own account.  So we can either
-    disallow it, or use some kind of sandbox system.
-
-  - Python is a good language, but the syntax is not so good for
-    giving small fragments on the command line.
-
-  - Tcl is broken Lisp.
-
-  - Lots of sysadmins know Perl, though Perl can give some bizarre or
-    confusing errors.  The built in stat operators and regexps might
-    be useful.
-
-  - Sadly probably not enough people know Scheme.
-
-  - sh is hard to embed.
-
-
-Scripting hooks:
-
-  - Whether to transfer a file
-
-  - What basis file to use
-
-  - Logging
-
-  - Whether to allow transfers (for public servers)
-
-  - Authentication
-
-  - Locking
-
-  - Cache
-
-  - Generating backup path/name.
-
-  - Post-processing of backups, e.g. to do compression.
-
-  - After transfer, before replacement: so that we can spit out a diff
-    of what was changed, or kick off some kind of reconciliation
-    process.
-
-
-VFS:
-
-  Rather than talking straight to the filesystem, rsyncd talks through
-  an internal API.  Samba has one.  Is it useful?
-
-  - Could be a tidy way to implement cached signatures.
-
-  - Keep files compressed on disk?
-
-
-Interactive interface:
-
-  - Something like ncFTP, or integration into GNOME-vfs.  Probably
-    hold a single socket connection open.
-
-  - Can either call us as a separate process, or as a library.
-
-  - The standalone process needs to produce output in a form easily
-    digestible by a calling program, like the --emacs feature some
-    have.  Same goes for output: rpm outputs a series of hash symbols,
-    which are easier for a GUI to handle than "\r30% complete"
-    strings.
-
-  - Yow!  emacs support.  (You could probably build that already, of
-    course.)  I'd like to be able to write a simple script on a remote
-    machine that rsyncs it to my workstation, edits it there, then
-    pushes it back up.
-
-
-Pie-in-the-sky features:
-
-  These might have a severe impact on the protocol, and are not
-  clearly in our core requirements.  It looks like in many of them
-  having scripting hooks will allow us
-
-  - Transport over UDP multicast.  The hard part is handling multiple
-    destinations which have different basis files.  We can look at
-    multicast-TFTP for inspiration.
-
-  - Conflict resolution.  Possibly general scripting support will be
-    sufficient.
-
-  - Integrate with locking.  It's hard to see a good general solution,
-    because Unix systems have several locking mechanisms, and grabbing
-    the lock from programs that don't expect it could cause deadlocks,
-    timeouts, or other problems.  Scripting support might help.
-
-  - Replicate in place, rather than to a temporary file.  This is
-    dangerous in the case of interruption, and it also means that the
-    delta can't refer to blocks that have already been overwritten.
-    On the other hand we could semi-trivially do this at first by
-    simply generating a delta with no copy instructions.
-
-  - Replicate block devices.  Most of the difficulties here are to do
-    with replication in place, though on some systems we will also
-    have to do I/O on block boundaries.
-
-  - Peer to peer features.  Flavour of the year.  Can we think about
-    ways for clients to smoothly and voluntarily become servers for
-    content they receive?
-
-  - Imagine a situation where the destination has a much faster link
-    to the cloud than the source.  In this case, Mojo Nation downloads
-    interleaved blocks from several slower servers.  The general
-    situation might be a way for a master rsync process to farm out
-    tasks to several subjobs.  In this particular case they'd need
-    different sockets.  This might be related to multicast.
-
-
-Unlikely features:
-
-  - Allow remote source and destination.  If this can be cleanly
-    designed into the protocol, perhaps with the remote machine acting
-    as a kind of echo, then it's good.  It's uncommon enough that we
-    don't want to shape the whole protocol around it, though.
-
-    In fact, in a triangle of machines there are two possibilities:
-    all traffic passes from remote1 to remote2 through local, or local
-    just sets up the transfer and then remote1 talks to remote2.  FTP
-    supports the second but it's not clearly good.  There are some
-    security problems with being able to instruct one machine to open
-    a connection to another.
-
-
-In favour of evolving the protocol:
-
-  - Keeping compatibility with existing rsync servers will help with
-    adoption and testing.
-
-  - We should at the very least be able to fall back to the new
-    protocol.
-
-  - Error handling is not so good.
-
-
-In favour of using a new protocol:
-
-  - Maintaining compatibility might soak up development time that
-    would better go into improving a new protocol.
-
-  - If we start from scratch, it can be documented as we go, and we
-    can avoid design decisions that make the protocol complex or
-    implementation-bound.
-
-
-Error handling:
-
-  - Errors should come back reliably, and be clearly associated with
-    the particular file that caused the problem.
-
-  - Some errors ought to cause the whole transfer to abort; some are
-    just warnings.  If any errors have occurred, then rsync ought to
-    return an error.
-
-
-Concurrency:
-
-  - We want to keep the CPU, filesystem, and network as full as
-    possible as much of the time as possible.
-
-  - We can do nonblocking network IO, but not so for disk.
-
-  - On the destination, it makes sense to generate signatures and
-    apply patches at the same time.
-
-  - Can structure this with nonblocking, threads, separate processes,
-    etc.
-
-
-Uses:
-
-  - Mirroring software distributions:
-
-  - Synchronizing laptop and desktop
-
-  - NFS filesystem migration/replication.  See
-    http://www.ietf.org/proceedings/00jul/00july-133.htm#P24510_1276764
-
-  - Sync with PDA
-
-  - Network backup systems
-
-  - CVS filemover
-
-
-Conflict resolution:
-
-  - Requires application-specific knowledge.  We want to provide
-    policy, rather than mechanism.
-
-  - Possibly allowing two-way migration across a single connection
-    would be useful.
-
-
-Moved files:
-
-  - There's no trivial way to detect renamed files, especially if they
-    move between directories.
-
-  - If we had a picture of the remote directory from last time on
-    either machine, then the inode numbers might give us a hint about
-    files which may have been renamed.
-
-  - Files that are renamed and not modified can be detected by
-    examining the directory listing, looking for files with the same
-    size/date as the origin.
-
-
-Filesystem migration:
-
-  NFSv4 probably wants to migrate file locks, but that's not really
-  our problem.
-
-
-Atomic updates:
-
-  The NFSv4 working group wants atomic migration.  Most of the
-  responsibility for this lies on the NFS server or OS.
-
-  If migrating a whole tree, then we could do a nearly-atomic rename
-  at the end.  This ties in to having separate basis and destination
-  files.
-
-  There's no way in Unix to replace a whole set of files atomically.
-  However, if we get them all onto the destination machine and then do
-  the updates quickly it would greatly reduce the window.
-
-
-Scalability:
-
-  We should aim to work well on machines in use in a year or two.
-  That probably means transfers of many millions of files in one
-  batch, and gigabytes or terabytes of data.
-
-  For argument's sake: at the low end, we want to sync ten files for a
-  total of 10kB across a 1kB/s link.  At the high end, we want to sync
-  1e9 files for 1TB of data across a 1GB/s link.
-
-  On the whole CPU usage is not normally a limiting factor, if only
-  because running over SSH burns a lot of cycles on encryption.
-
-  Perhaps have resource throttling without relying on rlimit.
-
-
-Streaming:
-
-  A big attraction of rsync is that there are few round-trip delays:
-  basically only one to get started, and then everything is
-  pipelined.  This is a problem with FTP, and NFS (at least up to
-  v3).  NFSv4 can pipeline operations, but building on that is
-  probably a bit complicated.
-
-
-Related work:
-
-  - mirror.pl
-
-  - ProFTPd
-
-  - Apache
-
-  - BitTorrent -- p2p mirroring
-    http://bitconjurer.org/BitTorrent/
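
Note on the "Hard links" section of the removed rsync3.txt: the bookkeeping it proposes (track only files with a link count above one, and drop each entry once all of its names are known) can be sketched roughly as below. This is an illustration only, not code from rsync; the class name HardLinkTracker, its observe() method, and the argument names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HardLinkTracker:
    """Hypothetical sketch: remembers only inodes with link count > 1."""

    # Maps (fsid, inum) -> [first name seen, number of names still unseen].
    pending: dict = field(default_factory=dict)

    def observe(self, name, fsid, inum, nlink):
        """Return an earlier name for the same inode, or None if there is none."""
        if nlink <= 1:
            # Ordinary file: it never enters the table at all.
            return None
        key = (fsid, inum)
        entry = self.pending.get(key)
        if entry is None:
            # First name for this inode; nlink - 1 names remain unknown.
            self.pending[key] = [name, nlink - 1]
            return None
        first_name = entry[0]
        entry[1] -= 1
        if entry[1] == 0:
            # Every name is now known, so the entry can be forgotten.
            del self.pending[key]
        return first_name
```

A caller would pass each file's stat fields to observe() while walking the tree and create a link whenever a name comes back. One caveat of this sketch: if some names of an inode live outside the transferred tree, its entry is never completed and stays in the table until the walk ends.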
--- a/rsyncsh.txt
+++ b/rsyncsh.txt
@@ -1,26 +0,0 @@
-rsyncsh
-Copyright (C) 2001 by Martin Pool
-
-This is a quick hack to build an interactive shell around rsync, the
-same way we have the ftp, lftp and ncftp programs for the FTP
-protocol.  The key application for this is connecting to a public
-rsync server, such as rsync.kernel.org, changing down through and
-listing directories, and finally pulling down the file you want.
-
-rsync is somewhat ill-at-ease as an interactive operation, since every
-network connection is used to carry out exactly one operation.  rsync
-kind of "forks across the network" passing the options and filenames
-to operate upon, and the connection is closed when the transfer is
-complete.  (This might be fixed in the future, either by adapting the
-current protocol to allow chained operations over a single socket, or
-by writing a new protocol that better supports interactive use.)
-
-So, rsyncsh runs a new rsync command and opens a new socket for every
-(network-based) command you type.
-
-This has two consequences.  Firstly, there is more command latency
-than is really desirable.  More seriously, if the connection cannot be
-done automatically, because for example it uses SSH with a password,
-then you will need to enter the password every time.  We might even
-fix this in the future, though, by having a way to automatically feed
-the password to SSH if it's entered once.
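
Note on the removed rsyncsh.txt: the model it describes, one fresh rsync process (and so one new connection) per interactive command, might look roughly like the sketch below. It is not taken from rsyncsh itself. The function name rsync_argv and the "ls" and "get" command names are hypothetical, and the current directory is assumed to be a "module/path" string. The sketch only builds the argument list and does not run rsync.

```python
import shlex


def rsync_argv(host, cwd, command_line):
    """Translate one shell command into the argv of a separate rsync run.

    Hypothetical helper: `cwd` is assumed to look like "module/sub/dir".
    Returns None for empty or unrecognised input.
    """
    words = shlex.split(command_line)
    if not words:
        return None
    verb, args = words[0], words[1:]
    if verb == "ls" and not args:
        # A single source argument with no destination makes rsync list it.
        return ["rsync", f"{host}::{cwd}/"]
    if verb == "get" and len(args) == 1:
        # Copy one remote file into the local current directory.
        return ["rsync", f"{host}::{cwd}/{args[0]}", "."]
    return None
```

For example, rsync_argv("rsync.kernel.org", "pub", "ls") yields ["rsync", "rsync.kernel.org::pub/"]. Because each command maps to its own process, every "ls" or "get" pays the connection setup cost again, which is the latency problem the note describes.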