Efficiently synchronise copies of a large sparse file locally

I deal with a large number of large sparse files because of virtualisation and other technologies. Although the files are large, often only a small number of their blocks hold data and, of these, only a small number change and need to be backed up. Because my backup device is a log-based (snapshotting) file system on USB 2, I want to write blocks only if absolutely necessary.
So what's the solution? Some simple custom code that
- checks that both file sizes are identical;
- verifies that some metadata has changed (i.e. time stamp, permissions or owner/group);
- reads both files block-by-block;
- writes only changed blocks to the destination file; and
- updates any changed metadata.
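
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual code: the names `sync_blocks`, `metadata_differs` and `BLOCK_SIZE`, and the 64 KiB block size, are assumptions made for the example. It also assumes the destination file already exists, and it does not try to preserve or punch holes in the destination.

```python
import os
import stat

BLOCK_SIZE = 64 * 1024  # hypothetical block size; tune it to the file system


def metadata_differs(a: os.stat_result, b: os.stat_result) -> bool:
    """Compare time stamp, permissions and owner/group of two files."""
    return (a.st_mtime_ns, a.st_mode, a.st_uid, a.st_gid) != (
        b.st_mtime_ns, b.st_mode, b.st_uid, b.st_gid)


def sync_blocks(src: str, dst: str, block_size: int = BLOCK_SIZE) -> int:
    """Write only the blocks of src that differ from dst.

    Returns the number of blocks written to dst.
    """
    src_stat, dst_stat = os.stat(src), os.stat(dst)

    # Step 1: both files must have the same size.
    if src_stat.st_size != dst_stat.st_size:
        raise ValueError("file sizes differ; a full copy is needed instead")

    # Step 2: skip the file entirely if no metadata has changed.
    if not metadata_differs(src_stat, dst_stat):
        return 0

    # Steps 3 and 4: read both files block by block, write only changes.
    written = 0
    offset = 0
    with open(src, "rb") as fsrc, open(dst, "r+b") as fdst:
        while True:
            src_block = fsrc.read(block_size)
            if not src_block:
                break
            dst_block = fdst.read(len(src_block))
            if src_block != dst_block:
                fdst.seek(offset)      # go back to the start of this block
                fdst.write(src_block)  # position is now at the next block
                written += 1
            offset += len(src_block)

    # Step 5: update changed metadata (after closing, so mtime sticks).
    os.chmod(dst, stat.S_IMODE(src_stat.st_mode))
    if (src_stat.st_uid, src_stat.st_gid) != (dst_stat.st_uid, dst_stat.st_gid):
        try:
            os.chown(dst, src_stat.st_uid, src_stat.st_gid)
        except (PermissionError, AttributeError):
            pass  # changing the owner usually needs root; not on all platforms
    os.utime(dst, ns=(src_stat.st_atime_ns, src_stat.st_mtime_ns))
    return written
```

Calling `sync_blocks("vm.img", "/mnt/backup/vm.img")` (both paths hypothetical) would then touch the backup only where the two images differ. Note that the metadata check is a shortcut: if the source was modified without its time stamp changing, this sketch will not notice.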