Large file synchronization
Efficiently synchronize copies of a large sparse file locally. I deal with a large number of large sparse files because of virtualization and other technologies. Although the files are large, often only a small number of blocks contain data and, of these, only a few change and need to be backed up. Because my backup device is a log-based (snapshotting) file system on USB 2, I want to write blocks only if absolutely necessary.
So what's the solution? Some simple custom code that
- checks that both file sizes are identical;
- verifies that some metadata has changed (i.e., time stamp, permissions or owner/group);
- reads both files block-by-block;
- writes only changed blocks to the destination file; and
- updates any changed metadata.
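The steps above could be sketched roughly as follows. This is a minimal illustration in Python, not the custom code the post refers to: the function names `sync_blocks` and `metadata_differs`, the 1 MiB default block size and the return convention are all assumptions made for the example.

```python
import os
import stat

BLOCK_SIZE = 1024 * 1024  # assumed block size; tune it to the backup file system


def metadata_differs(src_stat, dst_stat):
    """True if the time stamp, permissions or owner/group differ."""
    return (src_stat.st_mtime_ns != dst_stat.st_mtime_ns
            or src_stat.st_mode != dst_stat.st_mode
            or src_stat.st_uid != dst_stat.st_uid
            or src_stat.st_gid != dst_stat.st_gid)


def sync_blocks(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy only changed blocks from src to dst.

    Returns the number of blocks rewritten, or None if the sizes differ
    (this sketch leaves that case to a full copy done elsewhere).
    """
    src_stat = os.stat(src_path)
    dst_stat = os.stat(dst_path)
    if src_stat.st_size != dst_stat.st_size:
        return None
    if not metadata_differs(src_stat, dst_stat):
        return 0  # nothing suggests the source has changed

    written = 0
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            src_block = src.read(block_size)
            if not src_block:
                break
            dst_block = dst.read(len(src_block))
            if src_block != dst_block:
                # Step back over the block just read and overwrite it.
                dst.seek(offset)
                dst.write(src_block)
                written += 1
            offset += len(src_block)

    # Update metadata last, because writing blocks changes the time stamp.
    if (src_stat.st_uid, src_stat.st_gid) != (dst_stat.st_uid, dst_stat.st_gid):
        os.chown(dst_path, src_stat.st_uid, src_stat.st_gid)  # may need privileges
    os.chmod(dst_path, stat.S_IMODE(src_stat.st_mode))
    os.utime(dst_path, ns=(src_stat.st_atime_ns, src_stat.st_mtime_ns))
    return written
```

A call such as `sync_blocks("vm.img", "/mnt/backup/vm.img")` (hypothetical paths) would then rewrite only the blocks that differ. Holes in a sparse file read back as zeros, so an unchanged hole compares equal and is never written, which should leave the destination sparse.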
Apache2 fails to start on boot (but subsequent manual starts work fine)
I migrated a virtual machine running the Apache 2 web server on Debian wheezy GNU/Linux from Bytemark's legacy virtual machine infrastructure to their new BigV infrastructure. Staging this on my internal virtual infrastructure worked without a hitch. However, on the new infrastructure, Apache failed to start on boot with the following message. It would start without issue if I then logged on and started it manually.
MapReduce introduction (Linux Journal)
Linux Journal has published an easy-to-understand introduction to MapReduce using Hadoop. Definitely worth a read if you need an introduction.
Concurrency and synchronization in POSIX Bourne Shell (sh or bash)
It seems like ages ago now that I found my customer had a process that connected to hundreds of Oracle databases to run predefined SQL for health checks. These databases were hosted all over the world and the SQL could take up to fifteen minutes to complete for a single database (with a huge number of TNS timeouts). The end result was a CSV file that was ultimately formatted into a spreadsheet to provide management information. It took about a day to obtain this final result.
I thought there was a better way.
Netbook operating system choice
The Economist has an interesting article on the decisions you need to make when choosing a netbook computer. It suggests using the bundled, tuned operating system based on the Linux kernel.