A paranoid’s guide to backing up a working folder

Oops time

Leanpub supports multiple storage engines, and a private GitHub repository is probably the safest way of backing up your working folder. I chose Dropbox because I didn’t envision anything going wrong with the automatic synchronization mechanism.

While working on my book, I accidentally managed to wipe out half of my diagrams, and all changes were instantly synchronized by Dropbox. The free-of-charge Dropbox account doesn’t offer folder-level versioning, so deleted files are simply gone. Fortunately, IntelliJ IDEA Local History saved the day and the diagrams were properly restored.

Backing up

Incidents are inevitable, so a disaster recovery plan should be a top priority from the very beginning.

One of the first options is to simply archive a copy of the working folder and store it in a different location.
As simple as it may be, this approach has some major drawbacks:

  • A lot of disk space is wasted, even if only a few files have changed
  • Detecting changes requires some external tool
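
To illustrate the second drawback: with plain copies, change detection has to be built by hand, for instance by comparing checksum manifests. The following is only a minimal sketch under stated assumptions. The `manifest` and `same_content` helpers and the temporary directories are hypothetical names introduced for the demo, and it relies on nothing beyond `find`, `sort`, `cksum` and `mktemp`.

```shell
#!/bin/sh
# Hypothetical helper: print one "checksum size path" line per file,
# sorted by path so that two manifests can be compared directly.
manifest() {
    (cd "$1" && find . -type f -exec cksum {} + | sort -k 3)
}

# Hypothetical helper: succeeds when both folders hold identical files.
same_content() {
    [ "$(manifest "$1")" = "$(manifest "$2")" ]
}

# Demo on throwaway folders (stand-ins for the working folder and its copy).
work=$(mktemp -d)
copy=$(mktemp -d)
echo "chapter one" > "$work/ch1.txt"
cp "$work/ch1.txt" "$copy/ch1.txt"

if same_content "$work" "$copy"; then before="in sync"; else before="changed"; fi

# Edit the working folder and compare again.
echo "chapter one, revised" > "$work/ch1.txt"

if same_content "$work" "$copy"; then after="in sync"; else after="changed"; fi

echo "before edit: $before, after edit: $after"
rm -rf "$work" "$copy"
```

Every file is read in full on each comparison, which is one more reason why a tool that already tracks changes is attractive.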

Disk space is not really a problem when using an external hard drive. For remote storage, a delta copying mechanism is more suitable.

Although I’m using a Windows machine, I happen to use Cygwin extensively. Even if it comes with tons of Unix utilities, some kernel-related tools can’t be easily implemented on Windows. Without inotify, the watchman utility is out of the picture.

A better alternative is to follow the approach taken by revision control tools. With this in mind, I turned my working folder into a local Git repository. Even if the repository is not mirrored on a remote machine, I can still take advantage of the version control mechanism. Git provides ways to detect pending changes, and the repository can be copied to multiple locations (addressing the single point of failure issue).

My current solution looks like this:

#!/bin/sh

git_lock=./.git/index.lock

# A crashed git process can leave a stale lock behind, blocking every later git command.
if [ -f "$git_lock" ]; then
    echo "Git lock $git_lock exists, we must remove it."
    rm -f "$git_lock"
fi

# Stage everything, then ask git whether anything is pending.
git add .
status=$(git status --untracked-files=no --porcelain)

if [ -z "$status" ]; then
    echo "No change detected!"
else
    echo "Changes detected, autosave and synchronize!"
    git commit -m "Autosave $(date)"

    echo "Copy backup"
    epoch=$(date +%s)
    # The archive is written in 7z format (-t7z), so it gets a matching extension.
    backup_file=backup-$epoch.7z
    7z a -t7z "/cygdrive/d/Vlad/Work/Books/high-performance-java-persistence/backups/$backup_file" . -r

    echo "Rsync to OneDrive"
    rsync.exe -r . /cygdrive/c/Users/Vlad/OneDrive/Documente/high-performance-java-persistence/manuscript/
fi

Step by step, the script does the following:

  1. If the git process crashes in the middle of an operation, the leftover lock prevents any further operation, so the lock needs to be removed first.
  2. All changes are staged.
  3. With the git status command, we check whether there are pending changes. If nothing has changed, it makes no sense to waste resources on synchronizing working folders.
  4. All changes are committed automatically, therefore offering point-in-time recovery.
  5. An archived copy goes to a separate external drive.
  6. Using rsync, the Dropbox Git repository is mirrored to OneDrive as well.
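
Because every autosave is an ordinary commit, point-in-time recovery comes down to standard Git commands. The following is only a sketch on a throwaway repository. The file name `diagram.txt`, the commit messages, the `commit` helper and the inline `user.name`/`user.email` values are all placeholders made up for the demo, not part of the backup script.

```shell
#!/bin/sh
# Throwaway repository standing in for the working folder.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Hypothetical helper with a placeholder identity, so the demo commits
# also work on a machine without any Git configuration.
commit() {
    git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "$1"
}

echo "version 1" > diagram.txt
git add .
commit "Autosave 1"

echo "version 2 (broken)" > diagram.txt
git add .
commit "Autosave 2"

# List the autosaves that touched the file, newest first.
git log --oneline -- diagram.txt

# Restore the file as it was one commit earlier, leaving everything else untouched.
git checkout -q HEAD~1 -- diagram.txt
restored=$(cat diagram.txt)
echo "Restored content: $restored"
```

In the real repository, the commit to restore from would be picked from the `git log` output (the autosave messages carry a timestamp) instead of `HEAD~1`.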

In the end, the working folder is backed up by Dropbox and OneDrive, and version control is handled through Git. A full archive copy is also stored on an external drive (just in case).

Process automation

The only thing left to do is to automate the backup process. While cron is the de facto task scheduler on Linux systems, running cron under Cygwin requires administrative privileges, a dedicated Windows service, and security policy adjustments. For simplicity’s sake, I chose a much simpler approach, using an infinite loop like the following:

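#!/bin/sh

# Run from the script's own directory so that ./bkp.sh resolves.
cd "$(dirname "$0")" || exit 1

while true; do
    ./bkp.sh
    # A status above 128 means bkp.sh was killed by a signal, so stop looping.
    test $? -gt 128 && break
    sleep 15
done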

The backup script is called in a loop, with a 15-second pause between runs, until it is terminated by a signal (for example, SIGINT from Ctrl+C or SIGTERM), which the shell reports as an exit status greater than 128.
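
The `-gt 128` check relies on the shell convention that a command killed by a signal reports an exit status above 128. A small sketch of that behaviour follows, in which the `sh -c` child processes merely stand in for `bkp.sh`:

```shell
#!/bin/sh
# A child that exits normally: the status is the value passed to exit.
sh -c 'exit 3'
normal=$?

# A child that terminates itself with SIGTERM: the parent shell
# reports a status greater than 128.
sh -c 'kill -TERM $$'
killed=$?

echo "normal exit: $normal, killed by signal: $killed"

if [ "$killed" -gt 128 ]; then
    echo "status above 128, the loop would stop here"
fi
```

An ordinary failure of the backup script (a non-zero status up to 128) therefore does not end the loop. Only a signal does.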

To have this script running when the system boots up, a Windows startup batch script must open Cygwin like this:

start /min C:\cygwin64\bin\mintty --hold never /bin/bash -l -e '/cygdrive/c/Users/Vlad/Dropbox/high-performance-java-persistence/manuscript/run-bkp.sh'

Conclusion

A backup strategy can save you from an irremediable loss of data. By mirroring the working folder across several servers, you can access your data even when a given external service is down. Keeping track of all changes makes recovery much easier, so a Git repository sounds very appealing.
