Backing Up All The Things

Having a backup of your data is important, and for me it’s taken several different forms over the years — morphing as my needs have changed, as I’ve gotten better at doing backups, and as my curiosity has compelled me.

For various reasons that will become clear, I’ve iterated through yet another backup system/strategy which I think would be useful to share.

The Backup System That Was

The most recent incarnation of my backup strategy was centered around CrashPlan and looked something like this:

Atlas is my NAS and where the bulk of the data I care about is located. It backs up its data to CrashPlan Cloud.

Andrew and Rachel are our laptops. I care about that data too, so they also back up to CrashPlan Cloud. Additionally, they back up to Atlas using CrashPlan’s handy peer-to-peer system.

Brother and Mom are extended family members’ laptops that back up only to CrashPlan Cloud.

Fremont is the web server (recently decommissioned); it used to back up to CrashPlan as well.

This all worked great because CrashPlan offered a (frankly) unbelievably good CrashPlan+ Family Plan deal that allowed up to ten computers and “unlimited” data, which CrashPlan took to mean somewhere around 20TB of total backups ((“While there is no current limitation for CrashPlan Unlimited subscribers on the amount of User Data backed up to the Public Cloud, Code 42 reserves the right in the future, in its sole discretion, to set commercially reasonable data storage limits (i.e. 20 TB) on all CrashPlan+ Family accounts.” Source)), for $150/year. In terms of pure data storage cost this was $0.000625/GB/month ((my actual usage was closer to 8TB, so my actual rate was ~$0.0016/GB/month…still an amazingly good deal)), which is about one-sixth of Amazon Glacier’s cost of $0.004/GB/month ((which also has additional costs associated with retrieval processing that could run up to near $2000 if you actually had to restore 20TB worth of data)).
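The per-GB figures above are simple division. Here is a small Python sketch that reproduces them. It assumes decimal units (1 TB = 1000 GB) and spreads the yearly fee evenly over twelve months. The Glacier rate is the storage-only price quoted above, with no retrieval fees.

```python
def cost_per_gb_month(annual_price_usd: float, stored_gb: float) -> float:
    """Effective storage price in $/GB/month for a flat yearly fee."""
    return annual_price_usd / 12 / stored_gb


FAMILY_PLAN_USD_PER_YEAR = 150.0  # CrashPlan+ Family Plan
GLACIER_USD_PER_GB_MONTH = 0.004  # storage only; retrieval fees not included

at_cap = cost_per_gb_month(FAMILY_PLAN_USD_PER_YEAR, 20_000)  # ~20 TB soft cap
actual = cost_per_gb_month(FAMILY_PLAN_USD_PER_YEAR, 8_000)   # ~8 TB actually used

print(f"at 20 TB: ${at_cap:.6f}/GB/month")
print(f"at  8 TB: ${actual:.6f}/GB/month")
print(f"Glacier costs {GLACIER_USD_PER_GB_MONTH / at_cap:.1f}x the capped rate")
```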

And then one year ago CrashPlan announced:


we have shifted our business strategy to focus on the enterprise and small business segments. This means that over the next 14 months we will be exiting the consumer market and you must choose another option for data backup before your subscription expires.

To allow you time to transition to a new backup solution, we’ve extended your subscription (at no cost to you) by 60 days. Your new subscription expiration date is 09/28/2018.


Important Things In A Backup System


First, a quick refresher on how to back up. Arguably the best method is the 3-2-1-bang strategy: “three total copies of your data of which two are local but on different mediums (read: devices), and at least one copy offsite.” Bang represents the inevitable scenario where you have to use your backup.

This can be as simple as backing up your computer to two external hard drives: one you keep at home and back up to weekly, and one you leave at a friend’s house and back up to monthly.

Of course, it can also be more complex.


Replacing CrashPlan was hard because it has so many features for its price point, especially:

  • Encryption
  • Snapshots
  • Deduplication
  • Incremental backup
  • Recentness

…these would become my core requirements, in addition to needing to understand how the backup software works (which is why I strongly prefer open source).

I also had additional considerations I needed to keep in mind:

  • How much data I needed to back up:
    • Atlas: While I have 12TB of usable space (of which I’m using 10TB), I only had about 7TB of data to back up.
    • My Laptop: < 1TB
    • Wife’s Laptop: < 250 GB
    • Extended family: < 500 GB each
    • Fremont: decommissioned in 2017, but < 20 GB at the time
  • How recent I wanted the backups to be (put another way, how much time/effort was I willing to lose):
    • I was willing to lose up to one hour of data
  • What kinds of disasters was I looking to mitigate:
    • Hyper-localized incident (e.g. hard drive failure, stupidity, file corruption, theft, etc.)
      • This could impact a single device
    • Localized incident (e.g. fire, burglary, etc.)
      • This could impact all devices within a given structure (< ~1000 m radius)
    • Regionalized incident (e.g. earthquake, flood, etc.)
      • This could impact all devices in the region (~1000 km radius)
  • How much touch-time did I want to put in to maintain the system:
    • As little as possible (< 10 hours/year)

The New Backup System

There’s no single key to the system and this is probably the way it should be. Instead, it’s a series of smaller, modular elements that work together and can be replaced as needed.

My biggest concern was cost, and the primary driver for cost was going to be where to store the backups.

Where to put the data?

I did look at off-the-shelf options and my first consideration was just staying with CrashPlan and moving to their Small Business plan, but at $120/device/year I was looking at $360/year just to backup Atlas, Andrew, and Rachel.

Carbonite, a CrashPlan competitor that CrashPlan has also partnered with to transition its home users, has a “Safe” plan for $72/device/year. It was a non-starter, though: it doesn’t support Linux, has a 30-day limit on file restoration, and does silly things like not automatically backing up files over 4GB and not backing up video files.

Backblaze, The Wirecutter’s Best Pick, comes in at $50/device/year for unlimited data with no weird file restrictions. However, there’s some wonkiness about file permissions and timestamps, and it only retains old file versions/deleted files for 30 days.

I decided I could live with Backblaze Backup handling the off-site copies for the laptops, at least for now. I was back to the drawing board for Atlas, though.

The most challenging part was how to create a cost-effective solution for highly recent off-site data backup. I looked at various cloud storage options ((very expensive, on the order of $500 to $2500/year for 10 TB)), setting up a server at a friend’s house (high initial costs, hands-on maintenance would be challenging, not enough bandwidth), and using external hard drives (the backups would not be recent enough).

I was dreading how much data I had as it looked like backing up to the cloud was going to be the only viable option, even if it was expensive.

In an attempt to reduce my overall amount of data hoarding, I looked at the different kinds of data I had and noticed that only a relatively small amount changed on a regular basis — 2.20% within the last year, and 4.70% within the last three years.

The majority ((95.30% had not been modified within the last three years)) was “archive data” that I still want to have immediate (read-only) access to, but was not going to change, either because they are the digital originals (e.g. DV, RAW, PDF) or other files I keep for historic reasons — by the way, if I’ve ever worked on a project for you and you want a copy because you lost yours there’s a good chance I still have it.
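Percentages like these come from comparing file modification times against a cutoff. The author doesn't say which tool or metric they used. Here is a minimal Python sketch that assumes "how much changed" means bytes in files whose mtime falls inside the window. The `/srv/data` path in the usage comment is hypothetical.

```python
import os
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def modified_fraction(root: str, days: float, now: Optional[float] = None) -> float:
    """Fraction of bytes under `root` in files modified within the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    total_bytes = recent_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))  # don't follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_bytes += st.st_size
            if st.st_mtime >= cutoff:
                recent_bytes += st.st_size
    return recent_bytes / total_bytes if total_bytes else 0.0


# Hypothetical usage against a share on the NAS:
# for years in (1, 3):
#     print(years, f"{modified_fraction('/srv/data', 365 * years):.2%}")
```

Counting bytes rather than files matches the concern here, which is how much data has to go to the cloud. A file-count version would only change what gets summed.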

Since archive data wasn’t changing, recentness would not be an issue and I could easily store external hard drives offsite. I could now back up the significantly smaller amount of active data to the cloud at a reasonable cost.

Backblaze’s B2 has the lowest overall costs for cloud storage that I found: $0.005/GB/month with a retrieval fee of $0.01/GB ((however, there is also a trial program where they ship you a hard drive for free…you just pay return postage.)).

Assume I’m only backing up the active data (~300GB) and I have a 20% data change rate over a year (i.e. 20% of the data will change over the year, and I will also need to back up those changes). That comes to roughly $21.60/year in storage costs. Add two external WD 8TB hard drives for rotating through off-site storage, and the back-of-the-envelope calculations were now in the ballpark of just $85/year when amortized over five years.
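The B2 figure can be reproduced in a short Python sketch. It takes the pessimistic reading that the active data plus a full year of changes is stored for the whole year, and it ignores retrieval and transaction fees. The drive price isn't given in the text, so the sketch doesn't guess one. Instead it backs out what the ~$85/year total implies. That number is a derived value, not a quoted price.

```python
def b2_storage_per_year(active_gb: float, yearly_change_rate: float,
                        usd_per_gb_month: float = 0.005) -> float:
    """Yearly B2 storage cost, pessimistically storing active data plus a year of changes."""
    stored_gb = active_gb * (1 + yearly_change_rate)
    return stored_gb * usd_per_gb_month * 12


def amortized_per_year(purchase_usd: float, years: float) -> float:
    """Spread a one-time hardware purchase evenly over its expected service life."""
    return purchase_usd / years


b2 = b2_storage_per_year(active_gb=300, yearly_change_rate=0.20)
print(f"B2 storage: ${b2:.2f}/year")

# Back-calculated from the ~$85/year total, not a quoted drive price.
implied_drive_budget = (85 - b2) * 5
print(f"implied budget for both 8 TB drives: ${implied_drive_budget:.0f}")
print(f"check: ${b2 + amortized_per_year(implied_drive_budget, 5):.2f}/year")
```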

How to put the data?

I looked at, tested, and eventually passed on several different programs:

  • borg/attic…requires server-side software
  • duplicity…does not deduplicate
  • Arq…does not have a Linux version
  • duplicacy…doesn’t support restoring files directly to a directory outside of the repository ((though the more I’ve thought about it, the more I question whether this would actually be a problem))

To be clear: these are all very good programs and in another scenario I would likely use one of them.

Also, deduplication was probably the biggest issue for me, not so much because I thought I had a lot of files that were identical (or even parts of files) — I don’t — but because I knew I was going to be re-organizing lots of files and when you move a file to a different folder the backup program (without deduplication capability) doesn’t know that it’s the same file ((it’s basically the same operation as making a copy of a file and then deleting the original version)).
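Here is why block-level deduplication makes a moved file nearly free to back up. The store keys each chunk of data by a hash of its contents, so a file's path doesn't matter. The Python toy below uses fixed-size blocks and SHA-256. It only illustrates the technique. Duplicati's real on-disk format differs in detail (much larger blocks, compression, encryption, and packing blocks into volumes), and the 4-byte block size is chosen purely to keep the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; real tools use far larger blocks


class DedupStore:
    """Toy content-addressed block store illustrating block-level deduplication."""

    def __init__(self, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}     # block hash -> block data, stored once
        self.files: dict[str, list[str]] = {}  # path -> ordered list of block hashes

    def backup(self, path: str, data: bytes) -> int:
        """Record `data` under `path`; return how many *new* blocks had to be stored."""
        hashes, new_blocks = [], 0
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_blocks += 1
            hashes.append(digest)
        self.files[path] = hashes
        return new_blocks

    def restore(self, path: str) -> bytes:
        """Reassemble a file from its block list."""
        return b"".join(self.blocks[h] for h in self.files[path])


store = DedupStore()
print(store.backup("photos/2015/img.raw", b"AAAABBBBCCCC"))  # 3: every block is new
print(store.backup("archive/img.raw", b"AAAABBBBCCCC"))      # 0: a "moved" file reuses all blocks
print(store.backup("archive/img2.raw", b"AAAAXXXXCCCC"))     # 1: only the changed block
```

Without deduplication, the move in the second call would have re-uploaded all three blocks.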

I eventually settled on Duplicati (not to be confused with duplicity or duplicacy) because it ticks all the right boxes for me:

  • open source (with a good track record and actively maintained)
  • client side (i.e. does not require server-side software)
  • incremental
  • block-level deduplication
  • snapshots
  • deletion
  • supports B2 and local storage destinations
  • multiple retention policies
  • encryption (including ability to use asymmetric keys with GPG!)

Fortunately, OpenMediaVault (OMV) supports Duplicati through the OMVExtras plugin, so installing and managing it was very easy.

The default settings appear to be pretty good and I didn’t change anything except for:

Adding SSL encryption for the web-based interface

Duplicati uses a web-based interface ((you can also use the CLI)) that is only designed to be used on the local computer. It’s not designed to run on a server with the GUI accessed remotely through a browser. Because it was only designed to be accessed from localhost, it sends passwords in the clear. That is a concern, but one that has already been filed as an issue and can be mitigated by using HTTPS.

Unfortunately, the OMV Duplicati plugin doesn’t support enabling HTTPS as one of its options.

Fortunately, I’m working on a patch to fix that.

Somewhat frustratingly, Duplicati requires using the PKCS #12 certificate format, so I had to repackage Atlas’ SSL key and certificate:

openssl pkcs12 -export -out certificate.pfx -inkey private_key.key -in server_certificate.crt -certfile CAChain.crt

Asymmetric keys

Normally Duplicati uses symmetric keys. However, when doing some testing with duplicity I was turned on to the idea of using asymmetric keys.

If you generated the GPG key on your server, you’re all set. However, if you generated it elsewhere, you’ll need to copy it over to the server and then import it:

gpg --import private.key
gpg --edit-key {KEY} trust quit
# enter 5<RETURN>
# enter y<RETURN>

Once you have your GPG key on the server, you can configure Duplicati to use it. This is not intuitive but has been documented:

--gpg-encryption-switches=--recipient ""

Note: the recipient can be either an email address or a GPG key ID (e.g. 9C7F1D46).

The last piece of the puzzle was how to manage local backups for the laptops. I’m currently using Arq and Time Machine to make nightly backups to Atlas on a trial basis.

Final Result

The resulting setup ends up being very similar to what I had with CrashPlan, except for the two rotating external drives, which bring me into compliance with the “3 total copies” rule (something that was previously lacking).

Each external hard drive will spend a year off-site (as the off-site copy) and then a year on-site, where it will serve as the “second” copy of the data (the first is the “live” version, the second is the on-site backup, and the third is the off-site backup).

Overall, this system should last at least the next five years in terms of data capacity and wear/tear. Total costs should be under $285/year. However, I’m going to work on getting that down even more over the next year by looking at alternatives to Backblaze Backup. Its relatively high per-device cost only makes sense if a device is backing up close to 1TB of data, and mine aren’t.
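The $285 total is consistent with the ~$85/year for Atlas plus four laptops on Backblaze Backup at $50 each. The four seats (Andrew, Rachel, Brother, Mom) are an assumption based on the earlier device list. This Python sketch also estimates where a flat per-device fee breaks even with B2's per-GB storage price. It ignores B2 transaction and download fees.

```python
ATLAS_USD_PER_YEAR = 85.0        # B2 for active data + two drives amortized over 5 years
BACKBLAZE_USD_PER_DEVICE = 50.0  # Backblaze Backup, per device per year
B2_USD_PER_GB_MONTH = 0.005

laptops = ["Andrew", "Rachel", "Brother", "Mom"]  # assumed Backblaze Backup seats

total = ATLAS_USD_PER_YEAR + BACKBLAZE_USD_PER_DEVICE * len(laptops)
print(f"total: ${total:.0f}/year")

# Stored size at which $50/year flat equals paying B2 by the GB for a year
breakeven_gb = BACKBLAZE_USD_PER_DEVICE / (B2_USD_PER_GB_MONTH * 12)
print(f"break-even: about {breakeven_gb:.0f} GB per device")
```

Below roughly 830 GB per device, paying B2 by the GB would be cheaper on storage alone, which supports the "close to 1TB" remark.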

Update: Edits based on feedback

I Bought a 3D Printer

Guys! I bought a 3D printer! It hasn’t even arrived yet, but I already feel like I should have done this ages ago! I ended up going with the Wanhao i3 v2.1 Duplicator. It’s an upgraded version of the v2.0, which is effectively the same model that Monoprice rebrands and sells as the Maker Select 3D Printer v2.

All around it seems to hit the sweet spot between price and capability. For me, the big selling points are:

  • Sufficiently large build envelope: 200 mm x 200 mm x 180 mm
  • Sufficient build resolution: 0.1 mm, but can go down to 0.05 mm!
  • Multiple-material filament capabilities
  • Good community support
  • Easy to make your own improvements/repairs

I had to pay a bit of a premium since I’m in the UK, but I think it will be worth it. Printer arrives tomorrow, and I hope to have a report out soon thereafter.

You Can’t Always Get What You Want

Jeffrey Goldberg at AgileBits, who make 1Password, has a great primer on why law enforcement back doors are bad for security architecture. The entire article is worth a read and presents a solid yet easily understood technical discussion, but I think it can really be distilled down to this:


Just because something would be useful for law enforcement doesn’t mean that they should have it. There is no doubt that law enforcement would be able to catch more criminals if they weren’t bound by various rules. If they could search any place or anybody any time they wished (instead of being bound by various rules about when they can), they would clearly be able to solve and prevent more crimes. That is just one of many examples of where we deny to law enforcement tools that would obviously be useful to them.

Quite simply, non-tyrannical societies don’t give every power to law enforcement that law enforcement would find useful. Instead we make choices based on a whole complex array of factors. Obviously the value of some power is one factor that plays a role in such a decision, and so it is important to hear from law enforcement about what they would find useful. But that isn’t where the conversation ends, it is where it begins.

Whenever that conversation does takes place, it is essential that all the participants understand the nature of the technology: There are some things that we simply can’t do without deeply undermining the security of the systems that we all rely on to keep us safe.

Конструктор: Engineer of the People

This is one of those niche games that probably only appeals to enginerds ((and those who like to dabble in such realms)). But if you, like me, are one of those people, be prepared to lose yourself in this game as you deposit silicon and metal to make real-life circuits.

Конструктор is Russian for designer or constructor.

Calendar Invitation Email Gone Awry

Here are some more details on the calendar email issue I noted late Monday.

Just after 10pm on Monday, I attempted to migrate my calendar from Google Calendar to Fastmail Calendar ((there’s a larger story about why, but that’s not important at the moment)).

I did this by exporting my existing calendar from Google and then re-importing it into Fastmail using Apple’s Calendar app. I initially suspected Apple’s Calendar app, but it appears that during this import Fastmail regenerated the event requests and emailed all the participants of the events.

My wife, who was sitting next to me, was the first to let me know something was awry when she received over 400 emails from me.

After aborting Monday night’s attempt, I tried to import the data again on Tuesday morning using Fastmail’s “Subscribe to a public calendar” feature. That should not have resulted in emails being sent, but it still did.

In total, 109 people were affected by this issue and up to 2904 emails were sent (1452 from each incident).

Graph of Emails Sent

The good news (if there is such a thing) is that 45% of those affected only received a single email (well, two emails), and 78% of those affected received fewer than 10 emails (20 emails across both incidents).

Unfortunately, emails were also sent to people even when I was not the original organizer of the event. This accounted for over half the emails that were sent.

I have opened a ticket with Fastmail (Calendar import emailing participants (Ticket Id: 479473)). Fastmail has been prompt and the issue is, in theory, resolved. However, in the future I plan on scrubbing email addresses from the calendar file to prevent this issue from occurring again.
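One way to do that scrubbing is to drop the properties that carry participants before importing. This minimal Python sketch assumes that invitation emails are triggered by the ORGANIZER and ATTENDEE properties in each event. It is not a documented Fastmail behavior. The sketch first undoes RFC 5545 line folding, so an address split across two lines is still caught. Addresses inside free-text fields such as DESCRIPTION are left alone. Long lines are not re-folded, and importers may or may not tolerate that.

```python
import re

DROP = {"ATTENDEE", "ORGANIZER"}  # properties that carry participants' addresses


def unfold(ics_text: str) -> list[str]:
    """Undo RFC 5545 folding: a line starting with a space or tab continues the previous one."""
    lines: list[str] = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def scrub(ics_text: str) -> str:
    """Return the calendar text with ATTENDEE/ORGANIZER properties removed."""
    kept = []
    for line in unfold(ics_text):
        name = re.match(r"[A-Za-z0-9-]+", line)  # property name precedes ';' or ':'
        if name and name.group(0).upper() in DROP:
            continue
        kept.append(line)
    return "\r\n".join(kept) + "\r\n"  # iCalendar uses CRLF line endings
```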

For those curious, here’s how I extracted the number of those affected from the ICS file ((based on mosg’s answer)):

grep -Eioh 'mailto:[[:alnum:]_.+-]+@[[:alnum:]_.-]+\.[[:alpha:]]{2,}' basic.ics | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
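A line-oriented grep can miss addresses that the iCalendar format has folded across two physical lines. Here is a rough Python equivalent that unfolds the file first and then counts each mailto: address case-insensitively. It is a sketch that assumes a UTF-8 ICS file. The address pattern is deliberately loose and is not a full email-address grammar.

```python
import re
from collections import Counter

MAILTO = re.compile(r'mailto:([^\s;:,"<>]+@[^\s;:,"<>]+)', re.IGNORECASE)


def unfold_text(ics_text: str) -> str:
    """Undo RFC 5545 line folding: a line break followed by one space or tab is removed."""
    return re.sub(r"\r?\n[ \t]", "", ics_text)


def count_addresses(ics_text: str) -> Counter:
    """Count occurrences of each mailto: address, ignoring case."""
    return Counter(address.lower() for address in MAILTO.findall(unfold_text(ics_text)))


def report(path: str) -> None:
    """Print per-address counts, most frequent first, then the number of distinct addresses."""
    with open(path, encoding="utf-8") as f:
        counts = count_addresses(f.read())
    for address, n in counts.most_common():
        print(f"{n:6d} {address}")
    print(f"{len(counts)} distinct addresses")


# report("basic.ics")
```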

Mea Culpa.

VMware and USB 3

It took me a while to figure out why my external Seagate hard drive wasn’t working on Windows 7 and VMware Fusion 5. As it turns out, VMware Fusion 5 does not support USB 3.0 with Windows 7 ((you need Windows 8, per their features list “Microsoft Windows 8 required for USB 3 support”)).

What is not intuitive (and frankly doesn’t make sense) is that VMware Fusion 5 will not automatically fall back to USB 2.0 to support the device.

The solution to this is to run your USB 3.0 capable device through a USB 2.0 hub, such as an Apple Keyboard with Numeric Keypad.


Why You’re Doing Passwords Wrong

If you use passwords, there’s a good chance you’re doing them wrong and exposing yourself to unnecessary risk.

My intent is to provide some basic information on how you can do passwords better ((Arguably, there is no one right way to do passwords)), suitable for grandma to use (no offense, grandma), because there’s no reason you can’t do passwords better.

Why We Have Passwords

In the beginning, the internet was a benevolent place. If I said I was fergbrain, everyone knew I was fergbrain. I didn’t need to prove I was fergbrain. Of course, that didn’t last long and so passwords were created to validate that I was, in fact, fergbrain.

Passwords are one of three ways in which someone can authenticate who they are:

  1. Password: something you know
  2. Token: something you have that can’t be duplicated (such as an RSA token or YubiKey)
  3. Biometric: something you are (such as a fingerprint or other biometric marker unique to you)

Back In The Day™, passwords were the de facto method of authentication because they were the easiest to implement and in many ways still are.

At the time, token-based methods were just on the verge of development, with many of the technologies (such as public-key encryption) not even possible until the mid-1970s. And once suitable encryption was more completely developed ((it’s one thing to prove the mathematics of something, it’s a whole other thing to release a suitable product)), it could not be legally deployed outside of the United States until 1996, when President Clinton signed Executive Order 13026.

Finally, biometric authentication was an expensive pipe dream ((and still sort of is)).

The point being: passwords were the method of choice, and as we know, it is quite difficult to change the path of something once it gets moving.

Having just one password is easy enough, especially if you use it often enough. But how many places do you need to use a password? Email, social media, work, banking, games, utilities…the list goes on.

It would be pretty hard to remember all those different passwords. So we do the only thing we believe is reasonable: we use the same password. Or maybe a couple of different passwords: one for bank stuff, another for social media, maybe a third one for email.

Why Passwords Can Be a Problem

Bad guys know that most people use the same username, email address, and password for multiple services. This creates a massive incentive for bad guys to try and get that information. If the bad guys can extract your information from one web site, it’s likely they can use your hacked data to get into your account at other web sites.

For bad guys, the most bang for the buck comes from attacking systems that store lots of usernames and passwords. And this is how things have gone. Over just the last two years Kickstarter, Adobe, LinkedIn, eHarmony, LivingSocial, and Yahoo have all been hacked and had passwords compromised. And those are just the big companies.

In my opinion, most people know they have bad passwords, but don’t know what to do about it. It’s likely your IT person at work ((or your son/grandson/nephew/cousin)) keeps telling you to make “more complex” passwords, but what does that mean? Does it even help? What are we to do about this? Can we do anything to keep ourselves safer?

How to do Passwords Better

There is no single best way to do passwords. The best way for any particular person is a compromise between security, cost, and ease of use.

There are several parts to doing passwords better:

Have Unique Passwords

If one web site is hacked, that should not compromise your data at another web site. Web sites generally identify you by your username (or email address) and password. You could have a different username for every single web site you use, but that would probably be more confusing (and could possibly lead to a personality disorder). Besides, having to explain to your friends why you go by TrogdorTheMagnificent on one site but TrogdorTheBold on another site would get tiring pretty quickly.

For reasons which I hope are obvious, making your passwords unique is better than making your usernames unique. Unless you don’t want people to find you, then make both your username and password unique.

General Rule of Thumb

Passwords should be unique for each web site or service.

Why: If a unique password is compromised (e.g. someone hacked the site), the compromised password cannot be used to gain access to additional resources (i.e. other web sites).

If you’re asking yourself, “But how do I remember all those passwords?!” just hold your horses.

Choose better passwords

People suck…at picking good passwords.

If you choose your own passwords, here’s a little test:

  1. For the 1st character in your password, give yourself 4 points.
  2. For the 2nd through 8th characters in your password, give yourself 2 points for each character.
  3. For the 9th through 20th characters in your password, give yourself 1.5 points for each character.
  4. If your password has uppercase, lowercase, and numbers (or special characters), give yourself an additional 6 points.
  5. If your password does not contain any words from the dictionary, give yourself an additional 6 points.
  • If you score 44 points or more, you have a good password!
  • If you score more than 20 but less than 44 points, your password sucks.
  • If you score 20 points or less, your password really sucks.

If my password was, for example, Ferguson86Gmail, I would only have 34.5 points:

  • F: 4 points
  • erguson: 2 points each, 14 points
  • 86Gmail: 1.5 points each, 10.5 points
  • I have uppercase, lowercase, and a number: 6 points
  • “Ferguson” and “gmail” are both considered dictionary words, so I get no extra points
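The point test above is mechanical enough to write down. Here is a Python sketch of it that detects the composition bonus automatically. Checking for dictionary words would need a word list, so the caller passes that in as a flag. The 1 point per character beyond the 20th follows the NIST SP 800-63 Rev 1 heuristic the test is based on. The quiz itself stops at 20 characters.

```python
def password_points(password: str, contains_dictionary_word: bool) -> float:
    """Score a user-chosen password with the point test (NIST SP 800-63 Rev 1 heuristic)."""
    points = 0.0
    for position in range(1, len(password) + 1):
        if position == 1:
            points += 4
        elif position <= 8:
            points += 2
        elif position <= 20:
            points += 1.5
        else:
            points += 1  # NIST's rate for characters 21 and up (not part of the quiz)
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    has_other = any(not c.isalpha() for c in password)  # digits or special characters
    if has_upper and has_lower and has_other:
        points += 6
    if not contains_dictionary_word:
        points += 6
    return points


def verdict(points: float) -> str:
    if points >= 44:
        return "good"
    if points > 20:
        return "sucks"
    return "really sucks"


score = password_points("Ferguson86Gmail", contains_dictionary_word=True)
print(score, verdict(score))  # 34.5 sucks
```

By the quiz's own rules, Dywpac27Najunst scores only 40.5, treating it as containing no dictionary words. The quiz is a heuristic for passwords people pick themselves. A truly random password is better measured with the entropy formula.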

Instead of choosing Ferguson86Gmail as my password, what if my password were Dywpac27Najunst? The password is still 15 characters long, it still has two capital letters, and it still has two numbers. However, since it’s randomly generated from the 62 uppercase, lowercase, and numeric characters, it has log_{2}62^{15}=89.3 bits of entropy, over twice as many as the password I chose.

What’s going on here?

When you make up your own password, such as Ferguson86Gmail, you’re not choosing it at random and thus your password will not have a uniform random distribution of information ((this is, in part, how predictive typing technologies such as SWYPE work)).

Passwords chosen by users probably roughly reflect the patterns and character frequency distributions of ordinary English text, and are chosen by users so that they can remember them. Experience teaches us that many users, left to choose their own passwords will choose passwords that are easily guessed and even fairly short dictionaries of a few thousand commonly chosen passwords, when they are compared to actual user chosen passwords, succeed in “cracking” a large share of those passwords. ((NIST Special Publication 800-63 Rev 1))

The “goodness” of a password is measured by its randomness, which is usually expressed as bits of entropy (which I cleverly disguised as “points” in the above test). The reality of the situation is that humans suck at picking their own passwords.

More Entropy!

If more entropy leads to better passwords, let’s look at what leads to more bits of entropy in a password. The number of bits of entropy, H, in a randomly generated password (versus a password you picked) of length L is:

H = log_{2}N^{L} = L log_{2}N

where N is the number of possible characters. If you use only lowercase letters, N is 26. If you use lowercase and uppercase, N is 52. Adding numbers increases N to 62.

For example:

  • mougiasw is an eight-character all lowercase password that has log_{2}26^{8}=37.6 bits of entropy.
  • gLAviAco is an eight-character lowercase and uppercase password that has log_{2}52^{8}=45.6 bits of entropy
  • Pr96Regu is an eight-character lowercase, uppercase, and numeric password that has log_{2}62^{8}=47.6 bits of entropy.

Adding uppercase gets us 8 additional bits, but adding numbers only nets us 2 additional bits of entropy. However, look what happens when we just add additional characters instead:

  • vubachukus is a ten-character all lowercase password that has log_{2}26^{10}=47.0 bits of entropy.
  • neprajubrawa is a twelve-character all lowercase password that has log_{2}26^{12}=56.4 bits of entropy.

For every additional character, you add log_{2}N bits of entropy. And unlike expanding the character set (e.g. using uppercase letters and/or numbers and/or special characters), you get more bits of entropy for every additional character you extend your password by…not just the first one.
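All of the example figures above follow from H = L log_{2}N. This Python sketch reproduces them. Entropy is a property of how a password was generated, so the formula assumes every character was drawn uniformly and independently from the alphabet. The example strings are used only for their lengths.

```python
import math


def random_password_entropy(length: int, alphabet_size: int) -> float:
    """H = L * log2(N): entropy of a password whose L characters are each drawn
    uniformly and independently from an alphabet of N symbols."""
    return length * math.log2(alphabet_size)


# (example password, alphabet size N) pairs from the text above
examples = [
    ("mougiasw", 26),      # lowercase
    ("gLAviAco", 52),      # lowercase + uppercase
    ("Pr96Regu", 62),      # lowercase + uppercase + digits
    ("vubachukus", 26),    # longer, lowercase only
    ("neprajubrawa", 26),
]
for password, n in examples:
    bits = random_password_entropy(len(password), n)
    print(f"{password:>12}  L={len(password):2d}  N={n}  H={bits:.1f} bits")

# Each extra character adds log2(N) bits, i.e. multiplies the guessing work by N.
print(f"{math.log2(26):.2f} bits per extra lowercase character")
```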

The good news is that for randomly generated passwords, increasing the length by one character multiplies the difficulty of guessing it by N (26 for lowercase only, 62 for mixed-case alphanumeric). The bad news is that for user-selected passwords, every additional character at best roughly quadruples the difficulty. According to the estimates in NIST Special Publication 800-63 Rev 1, each of characters 2 through 8 adds about 2 bits of entropy, and each of characters 9 through 20 adds only 1.5 bits.

More bits of entropy is better; I usually like to have at least 44 bits of entropy in my passwords.

Having to break out a calculator to determine the entropy of your passwords is not easy, and passwords should be easy. So let’s make it easy:

General Rule of Thumb

Longer passwords (at least ten characters long) are better than more complex passwords.

Why: Adding complexity only provides a minimal, one-time benefit. Adding length provides a benefit for each character added and is likely to be easier to remember.
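Length only buys entropy if the characters are actually random. Here is a minimal generator sketch that assumes Python's standard `secrets` module rather than any particular password manager. A 12-character lowercase password drawn this way has log_{2}26^{12}≈56.4 bits, the same as neprajubrawa in the examples above.

```python
import secrets
import string


def generate_password(length: int = 12, alphabet: str = string.ascii_lowercase) -> str:
    """Pick each character independently and uniformly using the OS's secure RNG.

    `secrets` is intended for this; the `random` module is not suitable for passwords.
    """
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(generate_password())                                      # 12 random lowercase letters
print(generate_password(16, string.ascii_letters + string.digits))
```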

To anyone who understands information theory and security and is in an infuriating argument with someone who does not (possibly involving mixed case), I sincerely apologize.

Track Your Passwords

The inevitable reality of doing passwords better is that you need a way to keep track of them. There simply is no way a person can keep track of all the different passwords for all the different sites.

This leaves us with two other options:

Write Down Your Passwords

Yes. Writing your passwords down in a safe place is an acceptable method of keeping track of your passwords:

Simply, people can no longer remember passwords good enough to reliably defend against dictionary attacks, and are much more secure if they choose a password too complicated to remember and then write it down. We’re all good at securing small pieces of paper. I recommend that people write their passwords down on a small piece of paper, and keep it with their other valuable small pieces of paper: in their wallet.

Bruce Schneier, 2005

Writing down passwords can be appropriate because the most common attack vector is online (i.e. someone you’ve never even heard of trying to hack into your account from half a world away). There is one caveat: you have to use that freedom to make your passwords more unique and more entropic.

By writing down passwords, you can increase their entropy (i.e. making them harder to guess) since you don’t have to memorize them. And since you don’t have to memorize them, you are more likely to create a better password. Additionally, if you write your passwords down, you don’t have to remember which password goes with which account so you can have a different password for each account: this also increases password uniqueness.

Encrypt Your Passwords

It would be reasonable to obfuscate your password list — instead of just writing them down in plaintext — so that if someone were to riffle through your wallet, they wouldn’t immediately recognize it as a password list or know exactly which passwords go with which accounts.

Instead of keeping them on a piece of paper, you could use a program to encrypt your passwords for you. There are a variety of ways to safely encrypt and store your passwords on your computer. I have been using 1Password for several years now and have been very impressed with their products ((as well as their technical discussions on topics such as threats to confidentiality versus threats to availability)).

KeePass is another password manager I’ve used; however, it does not have good support for OS X. There are other systems one could use, including Password Safe and YubiKey.

I tend to be leery of web-based systems, such as LastPass and Passpack, for two reasons:

  1. Having lots of sensitive data stored in a known location on the internet is ripe for an attack.
  2. The defense against such an attack is predicated on the notion that the company has implemented their encryption solution correctly!

General Rule of Thumb

You don’t have to remember your passwords.

Why: It’s better to have unique and more entropic passwords than it is to never write down your password.

That’s it! Hopefully you found this helpful, now go make your passwords better and report back!

19 February 2014: Added additional clarification about entropy of user-generated versus randomly-generated passwords.