A paranoid’s guide to backing up a working folder
Leanpub supports multiple storage engines, and a private GitHub repository is probably the safest way of backing up your working folder. I chose Dropbox as I didn’t envision anything going wrong with the automatic synchronization mechanism.
While working on my book, I accidentally managed to wipe out half of my diagrams, and all changes were instantly synchronized by Dropbox. The free-of-charge Dropbox account doesn’t offer folder-level versioning, so deleted files are simply gone. Fortunately, IntelliJ IDEA Local History saved the day and the diagrams were properly restored.
Incidents are inevitable, so a disaster recovery plan should be a top priority from the very beginning.
One of the first options is to simply archive a copy of the working folder and store it in a different location.
As simple as it may be, this approach has some major drawbacks:
- A lot of disk space is wasted, even if only a few files have changed
- Detecting changes requires some external tool
Disk space is not really a problem when using an external hard drive. For remote storage, a delta-copying mechanism is more suitable.
Although I’m using a Windows machine, I happen to use Cygwin extensively. Even though it comes with tons of Unix utilities, some kernel-related tools can’t be easily implemented on Windows. Without inotify, the watchman utility is out of the picture.
A better alternative is to follow the approach of revision control tools. With this in mind, I turned my working folder into a local Git repository. Even if the repository is not mirrored on a remote machine, I can still take advantage of the version control mechanism. Git provides ways to detect pending changes, and the repository can be copied to multiple locations (addressing the single point of failure issue).
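The archive-a-copy approach can be sketched in a few lines of shell. This is a minimal illustration only: `tar` is an assumption here (any archiver would do), and the function name `archive_folder` and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh

# archive_folder SRC_DIR DEST_DIR
# Packs the whole content of SRC_DIR into DEST_DIR/backup-<epoch>.tar.gz
# and prints the path of the archive it created.
# DEST_DIR should live outside SRC_DIR, otherwise the archive would include itself.
archive_folder() {
    src_dir=$1
    dest_dir=$2
    archive="$dest_dir/backup-$(date +%s).tar.gz"
    tar -czf "$archive" -C "$src_dir" . && echo "$archive"
}

# Hypothetical usage:
#   archive_folder ./manuscript /path/to/external/drive/backups
```

Every run stores a full copy, which is exactly the disk space drawback listed above.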
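Copying the repository to a second location can be done with Git itself. The following is a sketch under assumptions, not the approach used in this article: the function name `mirror_repo` is hypothetical, and it relies on `git clone --mirror` for the first copy and `git push --mirror` for later updates.

```shell
#!/bin/sh

# mirror_repo SRC_REPO DEST_REPO
# Keeps DEST_REPO (a bare mirror) in sync with every branch and tag of SRC_REPO.
mirror_repo() {
    src_repo=$1
    dest_repo=$2
    if [ -d "$dest_repo" ]; then
        # The mirror already exists: push all refs to it
        git -C "$src_repo" push --quiet --mirror "$dest_repo"
    else
        # First run: create a bare mirror clone
        git clone --quiet --mirror "$src_repo" "$dest_repo"
    fi
}

# Hypothetical usage:
#   mirror_repo . /path/to/other/drive/manuscript.git
```

A bare mirror holds the full history but no checked-out files, so restoring from it means cloning it back.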
My current solution looks like this:
    #!/bin/sh

    # A crashed git process can leave a stale lock that blocks every later command
    git_lock=./.git/index.lock
    if [ -f "$git_lock" ]; then
        echo "Git lock $git_lock exists, we must remove it."
        rm -f "$git_lock"
    fi

    # Stage all changes
    git add .

    # Empty output means there is nothing to commit
    status=$(git status --untracked-files=no --porcelain)
    if [ -z "$status" ]; then
        echo "No change detected!"
    else
        echo "Changes detected, autosave and synchronize!"
        git commit -m "Autosave $(date)"

        echo "Copy backup"
        epoch=$(date +%s)
        # Note: -t7z writes a 7z-format archive despite the .zip extension
        backup_file=backup-$epoch.zip
        7z a -t7z "/cygdrive/d/Vlad/Work/Books/high-performance-java-persistence/backups/$backup_file" . -r

        echo "Rsync to OneDrive"
        rsync.exe -r . /cygdrive/c/Users/Vlad/OneDrive/Documente/high-performance-java-persistence/manuscript/
    fi
- If the git process crashes while doing some action, the lock will prevent any further operation, so the lock needs to be removed first.
- All changes are staged.
- With the Git status command, we check if there are pending changes. If nothing has changed, it makes no sense to waste resources on synchronizing working folders.
- All changes are committed automatically, therefore offering point-in-time recovery.
- An archived copy goes to a separate external drive.
- With rsync, the Dropbox Git repository is mirrored to OneDrive as well.
In the end, the working folder is backed by Dropbox and OneDrive and the version control is handled through Git. A full archive copy is also stored on an external drive (just in case).
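Because every autosave is a commit, a file deleted by accident can be brought back from the history. The following is a minimal sketch, assuming the deletion has already been committed by an autosave; the function name `restore_deleted` is hypothetical.

```shell
#!/bin/sh

# restore_deleted REPO FILE_PATH
# Restores FILE_PATH (relative to REPO) as it was just before the most recent
# commit that deleted it.
restore_deleted() {
    repo=$1
    file_path=$2
    # Most recent commit in which the file was deleted
    deleting_commit=$(git -C "$repo" log -n 1 --diff-filter=D --format=%H -- "$file_path")
    if [ -z "$deleting_commit" ]; then
        echo "No commit deleting $file_path was found" >&2
        return 1
    fi
    # Check the file out from the parent of that commit
    git -C "$repo" checkout "$deleting_commit^" -- "$file_path"
}

# Hypothetical usage:
#   restore_deleted . images/diagram.png
```

The restored file shows up as a pending change, so the next autosave commits it again.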
The only thing left to do is to automate the backup process. While cron is the de facto task scheduler on Linux systems, running it under Cygwin requires administrative privileges, a dedicated Windows service, and security policy adjustments. For simplicity’s sake, I chose a much simpler approach, using an infinite loop like the following:

    #!/bin/sh

    # Run from the directory that contains this script
    cd "$(dirname "$0")"

    while true; do
        ./bkp.sh
        # An exit status above 128 means bkp.sh was killed by a signal, so stop
        test $? -gt 128 && break
        sleep 15
    done

The backup script is called every 15 seconds, until it is terminated by a signal, which the shell reports as an exit status greater than 128 (for example, when the user presses Ctrl+C in the terminal).
To have this script running when the system boots up, a startup Windows batch script must open Cygwin like this:
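The `test $? -gt 128` check in the loop relies on a shell convention: in common shells such as bash and dash, a command killed by signal N is reported with exit status 128 + N. A small sketch to observe this, where the helper name `was_killed_by_signal` is hypothetical:

```shell
#!/bin/sh

# was_killed_by_signal STATUS
# Succeeds when STATUS is above 128, the range shells use for signal deaths.
was_killed_by_signal() {
    test "$1" -gt 128
}

# The child shell sends SIGTERM to itself, so it dies from a signal
status=0
sh -c 'kill -TERM $$' || status=$?

if was_killed_by_signal "$status"; then
    echo "child was killed by signal $((status - 128))"
else
    echo "child exited normally with status $status"
fi
```

A regular failure, such as `exit 1`, stays below that threshold, so an ordinary error in the backup script does not stop the loop.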
start /min C:\cygwin64\bin\mintty --hold never /bin/bash -l -e '/cygdrive/c/Users/Vlad/Dropbox/high-performance-java-persistence/manuscript/run-bkp.sh'
A backup strategy can save you from an irremediable loss of data. By mirroring the working folder across several servers, you can access your data even when a given external service is down. Keeping track of all changes makes recovery much easier, so a Git repository sounds very appealing.