diff --git a/doc/profile.txt b/doc/profile.txt
deleted file mode 100644
index b911d0ae..00000000
--- a/doc/profile.txt
+++ /dev/null
@@ -1,42 +0,0 @@
-Notes on rsync profiling
-
-strlcpy is hot:
-
-                0.00    0.00       1/7735635     push_dir [68]
-                0.00    0.00       1/7735635     pop_dir [71]
-                0.00    0.00       1/7735635     send_file_list [15]
-                0.01    0.00   18857/7735635     send_files [4]
-                0.04    0.00  129260/7735635     send_file_entry [18]
-                0.04    0.00  129260/7735635     make_file [20]
-                0.04    0.00  141666/7735635     send_directory [36]
-                2.29    0.00 7316589/7735635     f_name [13]
-[14]    11.7    2.42    0.00 7735635         strlcpy [14]
-
-
-Here are the top few functions:
-
- 46.23      9.57     9.57 13160929     0.00     0.00  mdfour64
- 14.78     12.63     3.06 13160929     0.00     0.00  copy64
- 11.69     15.05     2.42  7735635     0.00     0.00  strlcpy
- 10.05     17.13     2.08    41438     0.05     0.38  sum_update
-  4.11     17.98     0.85 13159996     0.00     0.00  mdfour_update
-  1.50     18.29     0.31                             file_compare
-  1.45     18.59     0.30   129261     0.00     0.01  send_file_entry
-  1.23     18.84     0.26  2557585     0.00     0.00  f_name
-  1.11     19.07     0.23  1483750     0.00     0.00  u_strcmp
-  1.11     19.30     0.23   118129     0.00     0.00  writefd_unbuffered
-  0.92     19.50     0.19  1085011     0.00     0.00  writefd
-  0.43     19.59     0.09   156987     0.00     0.00  read_timeout
-  0.43     19.68     0.09   129261     0.00     0.00  clean_fname
-  0.39     19.75     0.08    32887     0.00     0.38  matched
-  0.34     19.82     0.07        1    70.00 16293.92  send_files
-  0.29     19.89     0.06   129260     0.00     0.00  make_file
-  0.29     19.95     0.06    75430     0.00     0.00  read_unbuffered
-
-
-
-mdfour could perhaps be made faster:
-
-/* NOTE: This code makes no attempt to be fast! */
-
-There might be an optimized version somewhere that we can borrow.
diff --git a/rsync3.txt b/rsync3.txt
deleted file mode 100644
index e21f19f6..00000000
--- a/rsync3.txt
+++ /dev/null
@@ -1,467 +0,0 @@
--*- indented-text -*-
-
-Notes towards a new version of rsync
-Martin Pool, September 2001.
-
-
-Good things about the current implementation:
-
-  - Widely known and adopted.
-
-  - Fast/efficient, especially for moderately small sets of files over
-    slow links (transoceanic or modem).
-
-  - Fairly reliable.
-
-  - The choice of running over a plain TCP socket or tunneling over
-    ssh.
-
-  - rsync operations are idempotent: you can always run the same
-    command twice to make sure it worked properly without any fear.
-    (Are there any exceptions?)
-
-  - Small changes to files cause small deltas.
-
-  - There is a way to evolve the protocol to some extent.
-
-  - rdiff and rsync --write-batch allow generation of standalone patch
-    sets. rsync+ is pretty cheesy, though. xdelta seems cleaner.
-
-  - Process triangle is creative, but seems to provoke OS bugs.
-
-  - "Morning-after property": you don't need to know anything on the
-    local machine about the state of the remote machine, or about
-    transfers that have been done in the past.
-
-  - You can easily push or pull simply by switching the order of
-    files.
-
-  - The "modules" system has some neat features compared to
-    e.g. Apache's per-directory configuration. In particular, because
-    you can set a userid and chroot directory, there is strong
-    protection between different modules. I haven't seen any calls
-    for a more flexible system.
-
-
-Bad things about the current implementation:
-
-  - Persistent and hard-to-diagnose hang bugs remain.
-
-  - Protocol is sketchily documented, tied to this implementation, and
-    hard to modify/extend.
-
-  - Both the program and the protocol assume a single non-interactive
-    one-way transfer.
-
-  - A list of all files is held in memory for the entire transfer,
-    which cripples scalability to large file trees.
-
-  - Opening a new socket for every operation causes problems,
-    especially when running over SSH with password authentication.
-
-  - Renamed files are not handled: the old file is removed, and the
-    new file created from scratch.
-
-  - The versioning approach assumes that future versions of the
-    program know about all previous versions, and will do the right
-    thing.
-
-  - People always get confused about ':' vs '::'.
-
-  - Error messages can be cryptic.
-
-  - Default behaviour is not intuitive: in too many cases rsync will
-    happily do nothing. Perhaps -a should be the default?
-
-  - People get confused by trailing slashes, though it's hard to think
-    of another reasonable way to make this necessary distinction
-    between a directory and its contents.
-
-
-Protocol philosophy:
-
-  *The* big difference between rsync and protocols like HTTP, FTP,
-  and NFS is that their fundamental operations are "read this file",
-  "delete this file", and "make this directory", whereas rsync's is
-  "make this directory like this one".
-
-
-Questionable features:
-
-  These are neat, but not necessarily clean or worth preserving.
-
-  - The remote rsync can be wrapped by some other program, such as in
-    tridge's rsync-mail scripts. The general feature of sending and
-    retrieving mail over rsync is good, but this is perhaps not the
-    right way to implement it.
-
-
-Desirable features:
-
-  These don't really require architectural changes; they're just
-  something to keep in mind.
-
-  - Synchronize ACLs and extended attributes
-
-  - Anonymous servers should be efficient
-
-  - Code should be portable to non-UNIX systems
-
-  - Should be possible to document the protocol in RFC form
-
-  - --dry-run option
-
-  - IPv6 support. Pretty straightforward.
-
-  - Allow the basis and destination files to be different. For
-    example, you could use this when you have a CD-ROM and want to
-    download an updated image onto a hard drive.
-
-  - Efficiently interrupt and restart a transfer. We can write a
-    checkpoint file that says where we're up to in the filesystem.
-    Alternatively, as long as transfers are idempotent, we can just
-    restart the whole thing. [NFSv4]
-
-  - Scripting support.
-
-  - Propagate atimes and do not modify them. This is very ugly on
-    Unix. It might be better to try to add O_NOATIME to kernels, and
-    call that.
-
-  - Unicode. Probably just use UTF-8 for everything.
-
-  - Open authentication system. Can we use PAM? Is SASL an adequate
-    mapping of PAM to the network, or useful in some other way?
-
-  - Resume interrupted transfers without the --partial flag. We need
-    to leave the temporary file behind, and then know to use it. This
-    leaves a risk of large temporary files accumulating, which is not
-    good. Perhaps it should be off by default.
-
-  - tcpwrappers support. Should be trivial; can already be done
-    through tcpd or inetd.
-
-  - Socks support built in. It's not clear this is any better than
-    just linking against the socks library, though.
-
-  - When run over SSH, invoke with predictable command-line arguments,
-    so that people can restrict what commands sshd will run. (Is this
-    really required?)
-
-  - Comparison mode: give a list of which files are new, gone, or
-    different. Set return code depending on whether anything has
-    changed.
-
-  - Internationalized messages (gettext?)
-
-  - Optionally use real regexps rather than globs?
-
-  - Show overall progress. Pretty hard to do, especially if we insist
-    on not scanning the directory tree up front.
-
-
-Regression testing:
-
-  - Support automatic testing.
-
-  - Have hard internal timeouts against hangs.
-
-  - Be deterministic.
-
-  - Measure performance.
-
-
-Hard links:
-
-  At the moment, we can recreate hard links, but it's a bit
-  inefficient: it depends on holding a list of all files in the tree.
-  Every time we see a file with a linkcount > 1, we need to search for
-  another known name that has the same (fsid,inum) tuple. We could do
-  that more efficiently by keeping a list of only files with
-  linkcount > 1, and removing files from that list as all their names
-  become known.
-
-
-Command-line options:
-
-  We have rather a lot at the moment. We might get more if the tool
-  becomes more flexible. Do we need a .rc or configuration file?
-  That wouldn't really fit with its pattern of use: cp and tar don't
-  have them, though ssh does.
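The cheaper hard-link bookkeeping proposed under "Hard links:" above could look roughly like the following C sketch. C is used because rsync is written in it. This is an illustration under stated assumptions, not rsync's code. The names hl_table, hl_note and hl_free are hypothetical, as are the fixed bucket count and the policy of linking every later name to the first name seen. Only files with a link count above 1 are remembered, keyed by their (fsid, inum) tuple, and each entry is dropped once its last name has been seen.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not rsync's code: remember only files whose link
 * count is > 1, keyed by (fsid, inum), and forget each one as soon as
 * all of its names have been seen. */

#define HL_BUCKETS 1024

struct hl_entry {
    unsigned long long fsid, inum;
    unsigned long names_left;   /* names of this inode not yet seen */
    char *first_name;           /* later names get linked to this one */
    struct hl_entry *next;
};

struct hl_table {
    struct hl_entry *bucket[HL_BUCKETS];
    size_t live;                /* entries currently held in memory */
};

static size_t hl_hash(unsigned long long fsid, unsigned long long inum)
{
    return (size_t)((fsid * 31u + inum) % HL_BUCKETS);
}

/* Note one name of a file. Returns 0 if this name should be sent as an
 * ordinary file, 1 if it should become a hard link to the earlier name
 * copied into target, or -1 if memory ran out. */
static int hl_note(struct hl_table *t, unsigned long long fsid,
                   unsigned long long inum, unsigned long nlink,
                   const char *name, char *target, size_t target_len)
{
    struct hl_entry **pp, *e;
    size_t h, len;

    if (nlink <= 1)
        return 0;               /* no other names exist: never tracked */

    h = hl_hash(fsid, inum);
    for (pp = &t->bucket[h]; (e = *pp) != NULL; pp = &e->next) {
        if (e->fsid == fsid && e->inum == inum) {
            snprintf(target, target_len, "%s", e->first_name);
            assert(e->names_left > 0);
            if (--e->names_left == 0) {   /* last name seen: drop entry */
                *pp = e->next;
                free(e->first_name);
                free(e);
                t->live--;
            }
            return 1;
        }
    }

    /* First name of this inode: remember it until the rest show up. */
    len = strlen(name) + 1;
    e = malloc(sizeof *e);
    if (e == NULL || (e->first_name = malloc(len)) == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->first_name, name, len);
    e->fsid = fsid;
    e->inum = inum;
    e->names_left = nlink - 1;
    e->next = t->bucket[h];
    t->bucket[h] = e;
    t->live++;
    return 0;
}

/* Release entries whose remaining names were never seen (for example,
 * links that point outside the transferred tree). */
static void hl_free(struct hl_table *t)
{
    for (size_t i = 0; i < HL_BUCKETS; i++) {
        while (t->bucket[i] != NULL) {
            struct hl_entry *e = t->bucket[i];
            t->bucket[i] = e->next;
            free(e->first_name);
            free(e);
        }
    }
    t->live = 0;
}
```

A caller walking the tree would pass st_dev, st_ino and st_nlink from stat(2) for each name. Memory then grows with the number of link groups that are still incomplete, rather than with the size of the whole tree. Entries for links that lead outside the transfer linger until hl_free() is called.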
-
-
-Scripting issues:
-
-  - Perhaps support multiple scripting languages: candidates include
-    Perl, Python, Tcl, Scheme (guile?), sh, ...
-
-  - Simply running a subprocess and looking at its stdout/exit code
-    might be sufficient, though it could also be pretty slow if it's
-    called often.
-
-  - There are security issues about running remote code, at least if
-    it's not running in the user's own account. So we can either
-    disallow it, or use some kind of sandbox system.
-
-  - Python is a good language, but the syntax is not so good for
-    giving small fragments on the command line.
-
-  - Tcl is broken Lisp.
-
-  - Lots of sysadmins know Perl, though Perl can give some bizarre or
-    confusing errors. The built-in stat operators and regexps might
-    be useful.
-
-  - Sadly, probably not enough people know Scheme.
-
-  - sh is hard to embed.
-
-
-Scripting hooks:
-
-  - Whether to transfer a file
-
-  - What basis file to use
-
-  - Logging
-
-  - Whether to allow transfers (for public servers)
-
-  - Authentication
-
-  - Locking
-
-  - Cache
-
-  - Generating backup path/name.
-
-  - Post-processing of backups, e.g. to do compression.
-
-  - After transfer, before replacement: so that we can spit out a diff
-    of what was changed, or kick off some kind of reconciliation
-    process.
-
-
-VFS:
-
-  Rather than talking straight to the filesystem, rsyncd could talk
-  through an internal API. Samba has one. Is it useful?
-
-  - Could be a tidy way to implement cached signatures.
-
-  - Keep files compressed on disk?
-
-
-Interactive interface:
-
-  - Something like ncFTP, or integration into GNOME-vfs. Probably
-    hold a single socket connection open.
-
-  - Callers could either run us as a separate process or use us as a library.
-
-  - The standalone process needs to produce output in a form easily
-    digestible by a calling program, like the --emacs feature some
-    have. Same goes for progress output: rpm outputs a series of hash
-    symbols, which are easier for a GUI to handle than "\r30% complete"
-    strings.
-
-  - Yow! emacs support. (You could probably build that already, of
-    course.) I'd like to be able to write a simple script on a remote
-    machine that rsyncs a file to my workstation, lets me edit it
-    there, then pushes it back up.
-
-
-Pie-in-the-sky features:
-
-  These might have a severe impact on the protocol, and are not
-  clearly in our core requirements. It looks like in many of them
-  having scripting hooks will allow us
-
-  - Transport over UDP multicast. The hard part is handling multiple
-    destinations which have different basis files. We can look at
-    multicast-TFTP for inspiration.
-
-  - Conflict resolution. Possibly general scripting support will be
-    sufficient.
-
-  - Integrate with locking. It's hard to see a good general solution,
-    because Unix systems have several locking mechanisms, and grabbing
-    the lock from programs that don't expect it could cause deadlocks,
-    timeouts, or other problems. Scripting support might help.
-
-  - Replicate in place, rather than to a temporary file. This is
-    dangerous in the case of interruption, and it also means that the
-    delta can't refer to blocks that have already been overwritten.
-    On the other hand, we could semi-trivially do this at first by
-    simply generating a delta with no copy instructions.
-
-  - Replicate block devices. Most of the difficulties here are to do
-    with replication in place, though on some systems we will also
-    have to do I/O on block boundaries.
-
-  - Peer to peer features. Flavour of the year. Can we think about
-    ways for clients to smoothly and voluntarily become servers for
-    content they receive?
-
-  - Imagine a situation where the destination has a much faster link
-    to the cloud than the source. In this case, Mojo Nation downloads
-    interleaved blocks from several slower servers. The general
-    situation might be a way for a master rsync process to farm out
-    tasks to several subjobs. In this particular case they'd need
-    different sockets. This might be related to multicast.
-
-
-Unlikely features:
-
-  - Allow remote source and destination. If this can be cleanly
-    designed into the protocol, perhaps with the remote machine acting
-    as a kind of echo, then it's good. It's uncommon enough that we
-    don't want to shape the whole protocol around it, though.
-
-    In fact, in a triangle of machines there are two possibilities:
-    all traffic passes from remote1 to remote2 through local, or local
-    just sets up the transfer and then remote1 talks to remote2. FTP
-    supports the second, but it's not clearly a good idea. There are some
-    security problems with being able to instruct one machine to open
-    a connection to another.
-
-
-In favour of evolving the protocol:
-
-  - Keeping compatibility with existing rsync servers will help with
-    adoption and testing.
-
-  - We should at the very least be able to fall back to the new
-    protocol.
-
-  - Error handling is not so good.
-
-
-In favour of using a new protocol:
-
-  - Maintaining compatibility might soak up development time that
-    would better go into improving a new protocol.
-
-  - If we start from scratch, it can be documented as we go, and we
-    can avoid design decisions that make the protocol complex or
-    implementation-bound.
-
-
-Error handling:
-
-  - Errors should come back reliably, and be clearly associated with
-    the particular file that caused the problem.
-
-  - Some errors ought to cause the whole transfer to abort; some are
-    just warnings. If any errors have occurred, then rsync ought to
-    return an error.
-
-
-Concurrency:
-
-  - We want to keep the CPU, filesystem, and network as busy as
-    possible as much of the time as possible.
-
-  - We can do nonblocking network IO, but not so for disk.
-
-  - On the destination, it makes sense to be generating signatures
-    and applying patches at the same time.
-
-  - We can structure this with nonblocking I/O, threads, separate
-    processes, etc.
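The overlap suggested under "Concurrency:" above (generating signatures while applying patches) can be sketched using the threads option. One thread produces work items and another consumes them through a small bounded queue, so neither stage has to wait for the other to finish the whole file list. This is a hypothetical POSIX-threads example. The stage names, the queue size and the integer work items are placeholders, not rsync's design.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sketch: two destination-side stages overlapped through a
 * bounded queue. The per-item "work" in each stage is a placeholder. */
#define QCAP 4

struct queue {
    int item[QCAP];
    size_t head, count;
    int closed;                      /* producer has finished */
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
};

static void q_push(struct queue *q, int v)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QCAP)         /* back-pressure on the producer */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->item[(q->head + q->count) % QCAP] = v;
    q->count++;
    assert(q->count <= QCAP);
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns 1 with an item in *v, or 0 once the queue is closed and empty. */
static int q_pop(struct queue *q, int *v)
{
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count > 0) {
        *v = q->item[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        ok = 1;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

static void q_close(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

struct pipeline { struct queue q; int nfiles; long applied; };

/* Stage 1: stands in for generating one signature per file. */
static void *generator(void *arg)
{
    struct pipeline *p = arg;
    for (int i = 0; i < p->nfiles; i++)
        q_push(&p->q, i);
    q_close(&p->q);
    return NULL;
}

/* Stage 2: stands in for applying patches while stage 1 keeps going. */
static void *applier(void *arg)
{
    struct pipeline *p = arg;
    int v;
    while (q_pop(&p->q, &v))
        p->applied += v;             /* only this thread writes applied */
    return NULL;
}

/* Returns the applier's total over nfiles items, or -1 on thread failure. */
static long run_pipeline(int nfiles)
{
    struct pipeline p = { .nfiles = nfiles };
    pthread_t gen, app;
    long result = -1;

    pthread_mutex_init(&p.q.lock, NULL);
    pthread_cond_init(&p.q.not_full, NULL);
    pthread_cond_init(&p.q.not_empty, NULL);
    if (pthread_create(&app, NULL, applier, &p) == 0) {
        if (pthread_create(&gen, NULL, generator, &p) == 0) {
            pthread_join(gen, NULL);
            pthread_join(app, NULL);
            result = p.applied;
        } else {
            q_close(&p.q);           /* let the applier exit cleanly */
            pthread_join(app, NULL);
        }
    }
    pthread_cond_destroy(&p.q.not_empty);
    pthread_cond_destroy(&p.q.not_full);
    pthread_mutex_destroy(&p.q.lock);
    return result;
}
```

The bounded queue provides back-pressure. If the applier falls behind, the generator blocks instead of piling up signatures in memory, which matters for the file counts discussed under "Scalability:".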
-
-
-Uses:
-
-  - Mirroring software distributions
-
-  - Synchronizing laptop and desktop
-
-  - NFS filesystem migration/replication. See
-    http://www.ietf.org/proceedings/00jul/00july-133.htm#P24510_1276764
-
-  - Sync with PDA
-
-  - Network backup systems
-
-  - CVS filemover
-
-
-Conflict resolution:
-
-  - Requires application-specific knowledge. We want to provide
-    policy, rather than mechanism.
-
-  - Possibly allowing two-way migration across a single connection
-    would be useful.
-
-
-Moved files:
-
-  - There's no trivial way to detect renamed files, especially if they
-    move between directories.
-
-  - If we had a picture of the remote directory from last time on
-    either machine, then the inode numbers might give us a hint about
-    files which may have been renamed.
-
-  - Files that are renamed and not modified can be detected by
-    examining the directory listing, looking for files with the same
-    size/date as the origin.
-
-
-Filesystem migration:
-
-  NFSv4 probably wants to migrate file locks, but that's not really
-  our problem.
-
-
-Atomic updates:
-
-  The NFSv4 working group wants atomic migration. Most of the
-  responsibility for this lies with the NFS server or OS.
-
-  If migrating a whole tree, then we could do a nearly-atomic rename
-  at the end. This ties in to having separate basis and destination
-  files.
-
-  There's no way in Unix to replace a whole set of files atomically.
-  However, getting them all onto the destination machine first and
-  then doing the updates quickly would greatly reduce the window.
-
-
-Scalability:
-
-  We should aim to work well on machines in use in a year or two.
-  That probably means transfers of many millions of files in one
-  batch, and gigabytes or terabytes of data.
-
-  For argument's sake: at the low end, we want to sync ten files for a
-  total of 10kB across a 1kB/s link. At the high end, we want to sync
-  1e9 files for 1TB of data across a 1GB/s link.
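The size/date heuristic from "Moved files:" above could be prototyped along these lines. This is a hypothetical C sketch. struct fentry, guess_renames() and its greedy first-match policy are assumptions. The two input lists (files new on the source, and files about to disappear from the destination) are presumed to come from comparing the two directory listings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the size/date heuristic: pair each file that
 * appeared since last time with a vanished file of identical size and
 * mtime. A match is only a hint that it may be the same file. */

struct fentry {
    const char *name;
    unsigned long long size;
    long long mtime;                 /* seconds since the epoch */
};

/* For each added[i], sets match[i] to the index of an unused removed
 * file with the same size and mtime, or -1 if there is none. used[]
 * needs n_removed slots. Returns the number of candidate renames.
 * The scan is O(n_added * n_removed); sorting both lists by
 * (size, mtime) first would scale better. */
static size_t guess_renames(const struct fentry *added, size_t n_added,
                            const struct fentry *removed, size_t n_removed,
                            long *match, unsigned char *used)
{
    size_t i, j, found = 0;

    assert(n_removed == 0 || used != NULL);
    for (j = 0; j < n_removed; j++)
        used[j] = 0;

    for (i = 0; i < n_added; i++) {
        match[i] = -1;
        for (j = 0; j < n_removed; j++) {
            if (!used[j] && removed[j].size == added[i].size &&
                removed[j].mtime == added[i].mtime) {
                used[j] = 1;         /* each old file is claimed once */
                match[i] = (long)j;
                found++;
                break;
            }
        }
    }
    return found;
}
```

Two unrelated files can share a size and mtime, so a match is only a guess. One cautious way to use it relies on the basis and destination files being allowed to differ (one of the desirable features listed earlier). The vanished file is offered as the basis for the new one rather than renamed outright. If the guess is wrong, the delta finds few matching blocks and more literal data is sent.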
-
-  On the whole, CPU usage is not normally a limiting factor, if only
-  because running over SSH burns a lot of cycles on encryption.
-
-  Perhaps have resource throttling without relying on rlimit.
-
-
-Streaming:
-
-  A big attraction of rsync is that there are few round-trip delays:
-  basically only one to get started, and then everything is
-  pipelined. Round-trip delays are a problem with FTP and NFS (at least up to
-  v3). NFSv4 can pipeline operations, but building on that is
-  probably a bit complicated.
-
-
-Related work:
-
-  - mirror.pl
-
-  - ProFTPd
-
-  - Apache
-
-  - BitTorrent -- p2p mirroring
-    http://bitconjurer.org/BitTorrent/
diff --git a/rsyncsh.txt b/rsyncsh.txt
deleted file mode 100644
index 93932dc7..00000000
--- a/rsyncsh.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-rsyncsh
-Copyright (C) 2001 by Martin Pool
-
-This is a quick hack to build an interactive shell around rsync, the
-same way we have the ftp, lftp and ncftp programs for the FTP
-protocol. The key application for this is connecting to a public
-rsync server, such as rsync.kernel.org, changing down through and
-listing directories, and finally pulling down the file you want.
-
-rsync is somewhat ill at ease in interactive use, since every
-network connection is used to carry out exactly one operation. rsync
-kind of "forks across the network", passing the options and filenames
-to operate upon, and the connection is closed when the transfer is
-complete. (This might be fixed in the future, either by adapting the
-current protocol to allow chained operations over a single socket, or
-by writing a new protocol that better supports interactive use.)
-
-So, rsyncsh runs a new rsync command and opens a new socket for every
-(network-based) command you type.
-
-This has two consequences. Firstly, there is more command latency
-than is really desirable. More seriously, if the connection cannot be
-made automatically, because, for example, it uses SSH with a password,
-then you will need to enter the password every time. We might even
-fix this in the future, though, by having a way to automatically feed
-the password to SSH if it's entered once.