A paranoid’s guide to backing up a working folder

Oops time

Leanpub supports multiple storage engines, and a private GitHub repository is probably the safest way to back up your working folder. I chose Dropbox instead, as I didn’t envision anything going wrong with the automatic synchronization mechanism.

While working on my book, I accidentally managed to wipe out half of my diagrams, and all changes were instantly synchronized by Dropbox. The free-of-charge Dropbox account doesn’t offer folder-level versioning, so deleted files are simply gone. Fortunately, IntelliJ IDEA Local History saved the day, and the diagrams were properly restored.

Backing up

Incidents are inevitable, so a disaster recovery plan should be a top priority from the very beginning.

One of the first options is to simply archive a copy of the working folder and store it in a different location.
As simple as it may be, this approach has some major drawbacks:

  • A lot of disk space is wasted, even when only a handful of files have changed
  • Detecting changes requires an external tool
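
For reference, taking such a full snapshot is a one-liner along the lines of the following sketch (the backup destination path is illustrative):

#!/bin/sh

# Archive the whole working folder into a timestamped snapshot
# and store it somewhere else (e.g. an external drive)
epoch=`date +%s`
7z a -t7z /path/to/backups/full-backup-$epoch.7z . -r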

Disk space is not really a problem when using an external hard drive. For remote storage, a delta-copying mechanism is more suitable.
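
rsync is the classic tool here, since it only transfers the changed portions of files. A minimal sketch, assuming an illustrative remote host and path:

# Mirror the working folder to a remote machine,
# sending only the deltas of files that have changed
rsync -az ./ user@backup-host:/backups/manuscript/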

Although I’m using a Windows machine, I happen to use Cygwin extensively. Even though it comes with tons of Unix utilities, some kernel-related tools can’t be easily implemented on Windows. Without inotify, the watchman utility is out of the picture.

A better alternative is to follow the approach taken by revision control tools. With this in mind, I turned my working folder into a local Git repository. Even if the repository is not mirrored on a remote machine, I can still take advantage of the version control mechanism. Git provides ways to detect pending changes, and the repository can be copied to multiple locations (addressing the single-point-of-failure issue).
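
The one-time setup boils down to something like this (a sketch, assuming the commands are run from the working folder):

# Turn the working folder into a local Git repository
git init

# Take the initial snapshot
git add .
git commit -m "Initial import"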

My current solution looks like this:

#!/bin/sh

git_lock=./.git/index.lock

# A stale lock left behind by a crashed git process blocks all
# further operations, so it must be removed first.
if [ -f "$git_lock" ];
then
   echo "Git lock $git_lock exists, we must remove it."
   rm -f "$git_lock"
fi

# Stage everything, then check whether anything is actually pending.
git add .
status=`git status --untracked-files=no --porcelain`

if [ -z "$status" ]; then
    echo "No change detected!"
else
    echo "Changes detected, autosave and synchronize!"

    # Commit automatically, providing point-in-time recovery
    git commit -m "Autosave `date`"

    # Archive a full copy onto a separate external drive
    echo "Copy backup"
    epoch=`date +%s`
    backup_file=backup-$epoch.zip
    7z a -t7z /cygdrive/d/Vlad/Work/Books/high-performance-java-persistence/backups/$backup_file . -r

    # Mirror the repository to OneDrive as well
    echo "Rsync to OneDrive"
    rsync.exe -r . /cygdrive/c/Users/Vlad/OneDrive/Documente/high-performance-java-persistence/manuscript/
fi

  1. If the git process crashes mid-operation, the stale lock prevents any further operation, so the lock needs to be removed first.
  2. All changes are staged.
  3. With the git status command, we check whether there are pending changes. If nothing has changed, there is no point in wasting resources on synchronizing the working folders.
  4. All changes are committed automatically, therefore offering point-in-time recovery (see the recovery sketch after this list).
  5. An archived copy goes to a separate external drive.
  6. Using rsync, the Dropbox Git repository is mirrored to OneDrive as well.
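
To illustrate the point-in-time recovery of step 4, restoring an accidentally deleted or damaged file from an earlier autosave commit goes roughly like this (a sketch; the file name and commit hash are illustrative):

# Find the autosave commits that last touched the file
git log --oneline -- diagrams/caching.png

# Restore the file as it was in a particular autosave commit
git checkout 1a2b3c4 -- diagrams/caching.png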

In the end, the working folder is backed up by both Dropbox and OneDrive, and version control is handled through Git. A full archive copy is also stored on an external drive (just in case).

Process automation

The only thing left to do is to automate the backup process. cron is the de facto task scheduler on Linux systems, but under Cygwin it requires Administrative Privileges, a dedicated Windows Service, and Security Policy adjustments. For simplicity’s sake, I chose an infinite loop like the following:

#!/bin/sh

# Run relative to the directory containing this script
cd `dirname "$0"`

while true; do
    ./bkp.sh
    # An exit status greater than 128 means the backup script was
    # killed by a signal (e.g. Ctrl-C), so stop looping.
    test $? -gt 128 && break;
    sleep 15
done

The backup script is called every 15 seconds and keeps running until it is killed by a signal (an exit status greater than 128, such as the one produced by Ctrl-C or a SIGTERM).

To have this script run when the system boots up, a Windows startup batch script must open Cygwin like this:

start /min C:\cygwin64\bin\mintty --hold never /bin/bash -l -e '/cygdrive/c/Users/Vlad/Dropbox/high-performance-java-persistence/manuscript/run-bkp.sh'

Conclusion

A backup strategy can save you from irremediable data loss. By mirroring the working folder across several storage services, you can access your data even when a given external service is down. Keeping track of all changes makes recovery much easier, which is why a Git repository is very appealing.
