mirror of
https://github.com/RsyncProject/rsync.git
synced 2026-01-30 09:42:04 -05:00
Merge ChangeSet@1.10: Documentation about flist scalability
TODO
TODO | 23
@@ -41,8 +41,8 @@ Performance
 network access as much as we could.

 We need to be careful of duplicate names getting into the file list.
-See clean_flist. This could happen if multiple arguments include
-the same file. Bad.
+See clean_flist(). This could happen if multiple arguments include
+the same file. Bad.

 I think duplicates are only a problem if they're both flowing
 through the pipeline at the same time. For example we might have
@@ -58,6 +58,25 @@ Performance

 We could have a hash table.

+The root of the problem is that we do not want more than one file
+list entry referring to the same file. At first glance there are
+several ways this could happen: symlinks, hardlinks, and repeated
+names on the command line.
+
+If names are repeated on the command line, they may be present in
+different forms, perhaps by traversing directory paths in different
+ways, traversing paths including symlinks. Also we need to allow
+for expansion of globs by rsync.
+
+At the moment, clean_flist() requires having the entire file list in
+memory. Duplicate names are detected just by a string comparison.
+
+We don't need to worry about hard links causing duplicates because
+files are never updated in place. Similarly for symlinks.
+
+I think even if we're using a different symlink mode we don't need
+to worry.
+
 Memory accounting

 At exit, show how much memory was used for the file list, etc.
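
The added TODO text says duplicate names are currently detected just by a string comparison over a file list held entirely in memory. A minimal sketch of that idea in C follows. It is not rsync's clean_flist(): the helper name drop_duplicate_names and the sort-then-compare-neighbours approach are assumptions made for illustration only.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/*
 * Hypothetical helper, not rsync's clean_flist(): sort the names in
 * place, then keep only the first of each run of byte-identical
 * strings. Returns the number of names kept. The whole list has to
 * be in memory for the sort.
 */
size_t drop_duplicate_names(const char **names, size_t count)
{
    size_t kept = 0;
    size_t i;

    if (count == 0)
        return 0;

    qsort(names, count, sizeof names[0], cmp_names);

    for (i = 1; i < count; i++) {
        /* Only exact string matches count as duplicates. */
        if (strcmp(names[kept], names[i]) != 0)
            names[++kept] = names[i];
    }
    return kept + 1;
}
```

Given a list that contains "src/a" twice and "src/./a" once, this drops the exact repeat but keeps "src/./a", because the two spellings differ as strings even though they name the same file. That is the "different forms" case the TODO text raises.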