Backing Up All The Things

Having a backup of your data is important, and for me it’s taken several different forms over the years — morphing as my needs have changed, as I’ve gotten better at doing backups, and as my curiosity has compelled me.

For various reasons that will become clear, I’ve iterated through yet another backup system/strategy which I think would be useful to share.

The Backup System That Was

The most recently incarnation of my backup strategy was centered around CrashPlan and looked something like this:

Atlas is my NAS and where a bulk of the data I care about is located. It backs up its data to CrashPlan Cloud.

Andrew and Rachel are the laptops we have. I also care about that data and they also backup to CrashPlan Cloud. Additionally, they also backup to Atlas using CrashPlan’s handy peer-to-peer system.

Brother and Mom are extended family member’s laptops that just backup to CrashPlan Cloud

Fremont is the web server (decommissioned recently though), it used to backup to CrashPlan as well.

This all worked great because CrashPlan offered a (frankly) unbelievably good CrashPlan+ Family Plan deal that allowed up ten computers and “unlimited” data — which CrashPlan took to mean somewhere around 20TB of total backups1 — for $150/year. In terms of pure data storage cost this was $0.000625/GB/month2, which is an order of magnitude less than Amazon Glacier’s cost of $0.004/GB/month3.

And then one year ago CrashPlan announced:

 

we have shifted our business strategy to focus on the enterprise and small business segments. This means that over the next 14 months we will be exiting the consumer market and you must choose another option for data backup before your subscription expires.


To allow you time to transition to a new backup solution, we’ve extended your subscription (at no cost to you) by 60 days. Your new subscription expiration date is 09/28/2018.

 

Important Things In A Backup System

3-2-1-Bang

First a quick refresher on how to backup. Arguably the best method is the 3-2-1-bang strategy: “three total copies of your data of which two are local but on different mediums (read: devices), and at least one copy offsite.” Bang represents inevitable scenario where you have to use your backup.

This can be as simple as backing up your computer to two external hard drives — one you keep at home and backup to weekly and one you leave at a friends house and backup to monthly.

Of course, it can also be more complex.

Considerations

Replacing CrashPlan was hard because it has so many features for its price point, especially:

  • Encryption
  • Snapshots
  • Deduplication
  • Incremental backup
  • Recentness

…these would become my core requirements, in addition to also needing to understand how the backup software works (because of this I strongly prefer open-source).

I also had additional considerations I needed to keep in mind:

  • How much data I needed to backup:
    • Atlas: While I have 12TB of usable space (of which I’m using 10TB), I only had about 7TB of data to backup.
    • My Laptop: < 1TB GB
    • Wife’s Laptop: < 0.250 TB
    • Extended family: <500 GB each
    • Fremont:  decommissioned  in 2017, but < 20 GB at the time
  • How recent I wanted the backups to be (put another way, how much time/effort was I willing to loose):
    • I was willing to lose up to one hour of data
  • What kind of disasters was I looking to mitigate:
    • Hyper localized incident (e.g. hard drive failure, stupidity, file corruption, theft, etc)
      • This could impact a single device
    • Localized incident (e.g. fire, burglary, etc)
      • This could impact all devices within a given structure ( < ~ 1000 m radius)
    • Regionalized incident (e.g. earthquake, flood, etc)
      • This could impact all devices in the region (~ 1000 km radius)
  • How much touch-time did I want to put in to maintain the system:
    •  As little as possible (< 10 hours/year)

The New Backup System

There’s no single key to the system and this is probably the way it should be. Instead, it’s a series of smaller, modular elements that work together and can be replaced as needed.

My biggest concern was cost, and the primary driver for cost was going to be where to store the backups.

Where to put the data?

I did look at off-the-shelf options and my first consideration was just staying with CrashPlan and moving to their Small Business plan, but at $120/device/year I was looking at $360/year just to backup Atlas, Andrew, and Rachel.

Carbonite, a CrashPlan competitor but also who CrashPlan has partnered with to transition their home users to, has a “Safe” plan for $72/device/year, but it was a non-starter because they don’t support Linux, have a 30 day limit on file restoration, and do silly things like not automatically backing up files over 4GB and not backing up video files.

Backblaze is The Wirecutter’s Best Pick comes in at $50/device/year for unlimited data with no weird file restrictions, but there’s some wonkiness about file permissions and time stamps, and it also only retains old file versions/deleted files for 30 days.

I decided I could live with Backblaze Backups to handle the off-site copies for the laptops, at least for now. I was back to the drawing board for Atlas though.

The most challenging part was how to create a cost-effective solution for highly-recent off-site data backup. I looked at various cloud storage options4, setting up a server at a friends house (high initial costs, hands-on maintenance would be challenging, not enough bandwidth), and using external hard drives (recentness would be too prolonged in backups).

I was dreading how much data I had as it looked like backing up to the cloud was going to be the only viable option, even if it was expensive.

In an attempt to reduce my overall amount of data hoarding, I looked at the different kinds of data I had and noticed that only a relatively small amount changed on a regular basis — 2.20% within the last year, and 4.70% within the last three years.

The majority5 was “archive data” that I still want to have immediate (read-only) access to, but was not going to change, either because they are the digital originals (e.g. DV, RAW, PDF) or other files I keep for historic reasons — by the way, if I’ve ever worked on a project for you and you want a copy because you lost yours there’s a good chance I still have it.

Since archive data wasn’t changing, recentness would not be an issue and I could easily store external hard drives offsite. The significantly smaller amount of active data I could now backup in the cloud for a reasonable cost.

Backblazes B2 has the lowest overall costs for cloud storage: $0.005/GB/month with a retrieval fee of $0.01/GB6.

Assuming I’m only backing up the active data (~300GB) and I have a 20% data change rate over a year (i.e. 20% of the data will change over the year which I will also need to backup) results in roughly $21.60/year worth of costs. Combined with two external WD 8TB hard drives for rotating through off-site storage and the back-of-the-envelope calculations were now in the ballpark of just $85/year when amortized over five years.

How to put the data?

I looked at, tested, and eventually passed on several different programs:

  • borg/attic…requires server-side software
  • duplicity…does not deduplicate
  • Arq…does not have a Linux version
  • duplicacy…doesn’t support restoring files directly to a directory outside of the repository7

To be clear: these are all very good programs and in another scenario I would likely use one of them.

Also, deduplication was probably the biggest issue for me, not so much because I thought I had a lot of files that were identical (or even parts of files) — I don’t — but because I knew I was going to be re-organizing lots of files and when you move a file to a different folder the backup program (without deduplication capability) doesn’t know that it’s the same file8.

I eventually settled on Duplicati — not to be confused with duplicity or duplicacy — because it ticks all right boxes for me:

  • open source (with a good track record and actively maintained)
  • client side (e.g. does not require a server-side software)
  • incremental
  • block-level deduplication
  • snapshots
  • deletion
  • supports B2 and local storage destinations
  • multiple retention policies
  • encryption (including ability to use asymmetric keys with GPG!)

Fortunately, OpenMediaVault (OMV) supports Duplicati through the OMVExtras plugin, so installing and managing it was very easy.

The default settings appear to be pretty good and I didn’t change anything except for:

Adding SSL encryption for the web-based interface

Duplicati uses a web-based interface9 that is only designed to be used on the local computer — it’s not designed to be run on a server and have then access the GUI remotely through a browser. Because it was only designed to be accessed from localhost, it sends passwords in the clear, which is a concern but one that has already been filed as an issue and can be mitigated with using HTTPS.

Unfortunately, the OMV Duplicati plugin doesn’t support enabling HTTPS as one of its options.

Fortunately, I’m working on a patch to fix that: https://github.com/fergbrain/openmediavault-duplicati/tree/ssl

Somewhat frustratingly, Duplicati requires using the PKCS 12 certificate format. Thus I did have to repackage Atlas’ SSL key:

openssl pkcs12 -export -out certificate.pfx -inkey private_key.key -in server_certifcate.crt -certfile CAChain.crt

Asymmetric keys

Normally Duplicati uses symmetric keys. However, when doing some testing with duplicity I was turned on to the idea of using asymmetric keys.

If you generated the GPG key on your server then you’re all set. However, if you generated them elsewhere you’ll need to move over to the server and then import them:

gpg --import private.key
gpg --edit-key {KEY} trust quit
# enter 5<RETURN>
# enter y<RETURN>

Once you have your GPG key on the server you can then configure Duplicati to use them. This is not intuitive but has been documented:

--encryption-module=gpg
--gpg-encryption-command=--encrypt
--gpg-encryption-switches=--recipient "andrew@example.com"
--gpg-decryption-command=--decrypt
--passphrase=unused

Note: the recipient can either be an email address (e.g. andrew@example.com) or it can be a GPG Key ID (e.g. 9C7F1D46).

The last piece of the puzzle was how to manage my local backups for the laptops. I’m currently using Arq and TimeMachine to make nightly backups to Atlas on a trial basis.

Final Result

The resulting setup actually ends up being very similar to what I had with CrashPlan, with the exception of adding two rotating external drives which brings me into compliance with the “3 total copies” rule — something that was lacking.

Each external hard drive will spend a year off-site (as the off-site copy) and then a year on-site where it will serve as the “second” copy of the data (first is the “live” version, second is the on-site backup, and third is the the off-site backup).

Overall, this system should be usable for at the least the next five years — at least in terms of data capacity and wear/tear. Total costs should be under $285/year. However, I’m going to work on getting that down even more over the next year by looking at alternatives to the relatively high per-device cost for Backblaze Backup which only makes sense if a device is backing up close to 1TB of data — which I’m not.

Update: Edits based on feedback


  1. “While there is no current limitation for CrashPlan Unlimited subscribers on the amount of User Data backed up to the Public Cloud, Code 42 reserves the right in the future, in its sole discretion, to set commercially reasonable data storage limits (i.e. 20 TB) on all CrashPlan+ Family accounts.” Source 

  2. my actual usage was closer to 8TB, so my actual rate was ~$0.0015/GB/month…still an amazingly good deal 

  3. which also has additional costs associated with retrieval processing that could run up to near $2000 if you actually had to restore 20TB worth of data 

  4. very expensive – on the order of $500 to $2500/year for 10 TB 

  5. 95.30% had not been modified within the last three years 

  6. however there is also a trial program where they ship you a hard drive for free…you just pay return postage. 

  7. though the more I’ve though about it the more question if this would actually be a problem 

  8. it’s basically the same operation as making a copy of a file and then deleting the original version 

  9. you can also use the CLI 

We’re Going To Need a Bigger Boat

Comcast Bandwidth

I somehow have managed to churn through over 3.5 terabytes of data on our internet connection over the last three months. While I’m glad Comcast’s “enforcement of the 250GB data consumption threshold is currently suspended”, I’m also scared to see what happens when they bring it back.

For what it’s worth, most (~90%) of that data is being pushed up to the cloud, likely to CrashPlan1.

I can’t wait for Fiber to The Home.


  1. or the NSA 

CrashPlan Keeps Crashing

I had an issue where CrashPlan kept on crashing on Mac (OSX 10.8.4). The CrashPlan launch menu bar would also fail to show, and even when I started it manually, it would only stay active for no more than about 60 seconds before crashing again.

I thought the issue was related to my version of Java, but upgrading to the latest version did not solve the issue.

I finally came across a helpdesk article from Pro Backup, which uses the CrashPlan engine:

In some cases a large file selection (>1TiB or 1 million files) can cause CrashPlan to crash. This can be noticed by continuous stopping and starting. You may also see the CrashPlan application and System Tray icon disappearing in the middle of a backup. Or the main CrashPlan program will run for about 30 seconds then close down with no error message.

The CrashPlan engine by default is limited to 512MB of main memory. The CrashPlan engine won’t use more memory than that, even if the CrashPlan engine needs more working memory and the computer has memory available. When the CrashPlan engine is running out of memory it crashes.

The issue was that as a heavy user, I backup more than 1TB of data. However, CrashPlan only allocates 512MB of memory in Java, which is insufficient for my large backup size.

  1. Stop the CrashPlan daemon:
    sudo launchctl unload /library/launchdaemons/com.crashplan.engine.plist
  2. Edit /library/launchdaemons/com.crashplan.engine.plist and change -Xmx512m to -Xmx1024m (or whatever is needed).
  3. Restart the CrashPlan daemon:
    sudo launchctl load /library/launchdaemons/com.crashplan.engine.plist

Problem solved!

Setting up OpenMediaVault

I hope everyone had a merry Thanksgiving! I spent some of my time setting up OpenMediaVault on an Acer Aspire 3610 that my Kolby gave me. It’s a pretty small machine, running an Intel Atom 330 1.6 GHz with 2 GB of RAM1, but I think it will be perfect for running my new NAS!

I’ve been dreaming of a NAS for some time, I’ve contemplated building one for at least two years, but could never justify the cost. What makes this different is that it doesn’t require any new outlay for equipment–I’m literally using what I have already!

I settled on OpenMediaVault because it was based on Debian, which I have more experience with2.

Here are some configuration tricks I need to use in order to get it to work How I Like Itâ„¢:

CrashPlan

I use CrashPlan on my laptop and it’s great3! If you don’t have a backup plan, you need to stop reading and get one now. Seriously. What would you do if your computer was stolen, or the hard drive went kaput, or you accidentally deleted something? I want to make sure that data my NAS is storing is just as safe as the data on my laptop.

There’s a guide over on the OpenMediaVault forums which basically echos the official CrashPlan Linux installation guide. Everything went okay until I tried to launch the desktop client and I couldn’t get X11 forwarding to work. I was eventually able to get a headless client to run from my laptop connected over a tunneled SSH, but I didn’t want to have to muck with the ui.properties files every time I wanted to check on things. I also wanted to be able to run both my client and the OMV client simultaneously. So I went back and did some more work on the X11 issue and here’s what I found needed to happen:

For the purposes of this, the IP address of the OpenMediaVault server is 172.16.131.130

Log in to your terminal via SSH:

ssh root@172.16.131.130

Note, if you get a ssh: connect to host 172.16.131.130 port 22: Connection refused, you need to enable SSH via the OMV online console first!

Prerequisites:

apt-get update
apt-get install xorg
echo "X11UseLocalHost no" >> /etc/ssh/sshd_config
/etc/init.d/ssh restart
apt-get install openjdk-6-jre

Install CrashPlan

cd /tmp
wget&nbsp;http://download.crashplan.com/installs/linux/install/CrashPlan/CrashPlan_3.4.1_Linux.tgz
tar -zxvf CrashPlan_3.4.1_Linux.tgz
cd CrashPlan-install
./install.sh

Answer yes to install installing Java, and answer all the other questions as required. If you just press return, the defaults will work just fine (and that’s what I used).

Log out and back in with X11 forwarding enabled, then run CrashPlan:

exit
ssh -X root@172.16.131.130
/usr/local/bin/CrashPlanDesktop

Give it a few seconds and you’ll see that familiar CrashPlan green.

Other notes:

  • It was helpful to debug with ssh -v
  • Looking through /usr/local/crashplan/log/ui_error.log was the key to understanding that the version of Java downloaded by CrashPlan was throwing errors (such as java.lang.UnsatisfiedLinkError: no swt-pi-gtk-3448 or swt-pi-gtk in swt.library.path) and needed to be updated.

HFS+

I have a couple of drives that are formated in HFS+ that I wanted to use without having to reformat them. As a side note, I think NTFS is probably the best bet for multisystem compatibility when the potential for dealing with files larger than 4GB. A comment on the OMV blog by norse laid the basic ground work, but I also had to pull some information from Raam Dev’s blog about configuring HFS for Debian.

Note: hfsprogs 332.25-9 and below has a bug where “[f]ormatting a partition as HFSPLUS does not provide the partition with a UUID.“. The work around is to boot to OS X and use the disk utility to format the partition, but this doesn’t work as well when you’re using a VM. The solution is to use the unstable 332.25-10 release of hfsprogs.

echo "deb http://ftp.debian.org/debian testing main contrib" >> /etc/apt/sources.list
apt-get update
apt-get install hfsplus hfsprogs hfsutils
sed -i '$ d' /etc/apt/sources.list
apt-get update

Then modify /var/www/openmediavault/rpcfilesystemmgmt.inc to be able to handle and mount HFS+ disks:

48c48
< 					  '"jfs","xfs","hfsplus"]},
---
> 					  '"jfs","xfs"]},
118c118
< 			  "umsdos", "vfat", "ufs", "reiserfs", "btrfs","hfsplus"))) {
---
> 			  "umsdos", "vfat", "ufs", "reiserfs", "btrfs"))) {
664,667d663
< 			break;
< 		case "hfsplus":
< 			$fsName = $fs->getUuid();
< 			$opts = "defaults,force"; //force,rw,exec,auto,users,uid=501,gid=20";

Finally, you may need to fsck your HFS+ disk if it’s being stubborn and mounting in read-only mode. With the partition unmount:

fsck.hfsplus -f /dev/sdaX

WiFi

Getting WiFi to work took me down a rabbit hole that ended up being unnecessary. First, verify which Wireless you card you have. The easiest way to do this is using lspci:

apt-get install pciutils
lspci | grep -i network

You should see a line like:
05:00.0 Network controller: RaLink RT3090 Wireless 802.11n 1T/1R PCIe

Installing the RT3090 is pretty straight forward:

echo "deb http://ftp.us.debian.org/debian squeeze main contrib non-free" >> /etc/apt/sources.list
apt-get update
aptitude install firmware-ralink wireless-tools wpasupplicant
sed -i '$ d' /etc/apt/sources.list
apt-get update

Edit /etc/network/interfaces to add the following:

auto wlan0
iface wlan0 inet dhcp
    wpa-ssid mynetworkname
    wpa-psk mysecretpassphrase

Note: the Debian guide recommends restricting the permissions of /etc/network/interfaces to prevent disclosure of the passphrase.

Then run:

ifup wlan0

That’s all I have for now, I’m working on some methods for backing up other data to the NAS (such as my web site and GMail) which I’ll write up later.


  1. we tried using it to stream the Olympics, but it wouldn’t even do that very well, but I think that was due to the nVidia chipset not playing well with Ubuntu 

  2. FreeNAS and NAS4Free both being based on FreeBSD 

  3. I previously use Mozy, but they started charging by the GB, which wasn’t going to work for me