With the recent news that Crashplan is doing away with its inexpensive “Home” offering, I had reason to reconsider my choice of online backup provider. Since a description of how I achieve satisfactory safety margins, plus some discussion of the options available, seems to be of general interest, here's a version of my notes from the process. A more complete version (geared less toward discussion and more toward exposition) is also posted on my web site.


The canonical version of all of my data is stored on my home server, currently about 15 terabytes of raw disk (five individual disks) on btrfs RAID. This is backed up in two different places and has several other redundancies.

  • RAID gives robustness against the failure of any single disk. No data is lost from the live copy unless more than one disk fails at the same time.
  • Using regular btrfs snapshots via Snapper, read-only copies of everything are available for the recent past. This allows easy recovery from accidental deletion or modification of files without requiring much additional storage.
  • Remote (internet-based) backups are continuously made to Crashplan's cloud service.
  • Offline backups are made to a USB hard disk (that I don't keep at home with the server) with btrfs send, allowing me to make incremental backups that don't require bidirectional communication between the backup source and destination. The offline disk is kept unplugged when not in use. (A sketch of the send/receive cycle follows this list.)
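A minimal sketch of that send/receive cycle, assuming snapshots live under /srv/.snapshots and the offline disk mounts at /mnt/offline (all paths and dates here are illustrative, not my actual layout):

Code:
# One-time: full send of an initial read-only snapshot to the offline disk.
btrfs subvolume snapshot -r /srv /srv/.snapshots/2017-09-01
btrfs send /srv/.snapshots/2017-09-01 | btrfs receive /mnt/offline/backups

# Each later run: snapshot again, then send only the delta against the
# parent snapshot that already exists on the offline disk.
btrfs subvolume snapshot -r /srv /srv/.snapshots/2017-09-08
btrfs send -p /srv/.snapshots/2017-09-01 /srv/.snapshots/2017-09-08 \
    | btrfs receive /mnt/offline/backups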

Between these, I'm protected from most plausible failure modes that could damage my data. For instance:
  • Lightning strike destroys the server and disks: offline and online backups remain available.
  • Ransomware encrypts everything on the server: offline backup cannot be affected, online backup keeps old versions of files.
  • Linux bug causes silent filesystem corruption and data loss: it may be possible to recover from the read-only snapshots on the offline disk, and Crashplan should be unaffected.
  • Probably others, more and less mundane.


There is no shortage of services for storing data in "the cloud," though not all are suitable for my intended application of storing incremental backups. Though I've been using Crashplan for years now and have been generally happy with the service, I did consider some alternatives when Crashplan announced that their "Home" service would no longer be available:

  • AWS Glacier offers extremely low cost per gigabyte stored, but retrieving data from it is slow and expensive. It's good if you have large blobs that you want to store and retrieve as a unit (and do so very rarely), but a poor choice for incremental backups.
  • Google Cloud Storage (particularly in the nearline and coldline flavors) has pretty low cost-per-gigabyte, competitive with Glacier in the coldline flavor but with much lower costs to access data stored in it.

These two are priced such that it's difficult to tell what you'd actually end up paying for incremental backups, because they charge separately for storage, operations (reads and writes), and data transfer out. While you could make them work for this kind of application, they're not really designed for it.

  • C14 is similar to the two above in pricing, but it's not a commonly used service. Data transfer and operations are free in the "intensive" flavor, which is convenient for estimation.
  • B2 is designed for storing backups, and happens to be usable for other applications as well. Pricing ends up being reasonable, with flat charges per gigabyte stored and downloaded.
Either of these could be a good choice. B2 in particular is supported by a wide variety of applications designed for making backups.
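For rough scale: at B2's advertised $0.005 per gigabyte-month for storage, a 1 TB archive runs about $5/month and 4 TB about $20/month before any download charges, so the flat-fee "unlimited" services below only win once your archive grows well past that.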

  • Tarsnap is a service targeted at savvy users. It doesn't support Windows, though, and is significantly more expensive than most of the other options here.
  • Backblaze is basically "B2 but using only their client." Flat subscription fee for "unlimited" storage, but their client is Windows/Mac only.
  • Crashplan used to be the same price as Backblaze for the same "unlimited" storage, but their "Small Business" offering, which will soon be the only available option, is twice the price. The client runs on all three major operating systems, though.


Services like Google Drive or Dropbox may be useful to some users, but they're not designed for this kind of use case so I did not seriously consider any of them.

Despite the cost increase, I've been happy enough sticking with Crashplan; it's still cheaper than any of the per-gigabyte services given the size of my backups.


I do have computers other than my home server that I want to back up, but that's relatively easy. For my desktop machine at home, I mostly just mount the server as a networked disk and operate directly on things stored on it. While I've previously used Crashplan's peer-to-peer functionality for laptops that aren't always on my home network, I've started using Duplicati to back up to my server via SSH instead (which works fine in conjunction with dynamic DNS so my server is reachable externally). My other servers back up to the home one with Borg, in basically the same way the other machines use Duplicati.
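For the curious, a representative Borg setup looks something like this (the host and paths are invented for illustration):

Code:
# One-time repository setup on the remote end, with encryption:
borg init --encryption=repokey ssh://backup@home.example.com/srv/borg/laptop

# Each run then creates an incremental, deduplicated archive:
borg create --stats --compression lz4 \
    ssh://backup@home.example.com/srv/borg/laptop::{hostname}-{now} \
    /etc /home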


So, backups: do you have them? If not, why not? Discuss and/or ask your questions.
I'm going to copy-paste this from my blog. Don't read my blog. Most things in my blog are rants, this being one of the somewhat longer ones.

Where's the good backup software?

For *nix users, the answer is easy: rsync. For Macintosh users, the answer is even simpler: Time Machine ("time 'sheen"). For Windows, the answer is a convoluted mess of choices. And the problem is that none of those choices give everything you want.
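To be concrete, the rsync answer amounts to something like this (the flags and destination are one reasonable choice, not the only one):

Code:
# Mirror home directories, preserving permissions, ownership, timestamps,
# hard links, ACLs, and extended attributes; prune files deleted at the source.
rsync -aAXH --delete /home/ /mnt/backup/home/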

Why can't you have everything? Here are all of the things a backup program needs:
  • Permissions. If you can't preserve your metadata, forget about making faithful backups. POSIX and Windows permissions are very different, but they still deserve the same love.
  • Resilience. The restore part of a program should never produce a fatal error, unless a backup has been corrupted beyond repair. If a part has been corrupted, ignore that part, notify the user that a corrupted portion was ignored (noting, of course, what the corrupted portion actually is), and continue with the restore process.
  • Compression. Many would argue that compression only makes the backup more difficult to restore, yields a minimal return in efficiency, etc. However, this can make a large difference when uploading from a personal home network to a storage service, where storage costs are billed by the gigabyte. I don't know about you, but $1 a month was more than my tax return this year.
  • Encryption. Everyone's got their tinfoil hats on; how about you?
  • Incremental backups. People are not going to do full backups every week. This is a waste of time, storage space, and energy, since most files would be redundantly stored.
  • Block-level. If you modified a 20 GB VHD file, are you going to copy that whole thing on every weekly incremental backup? No, you're going to copy the differences in blocks/parts of that file.
  • Archivable. It appears most people choose either image-based or file-based backups. I personally prefer the file level, but this should not mean "copy millions of files and spew them into the target directory." The backup should be neatly organized in, say, 50 MB parts that can be easily uploaded to a cloud service as part of a future backup plan. Or it can just be made as a monolithic 800 GB file. The former is workable with most consumer file services, while the latter is most convenient for more enterprise-oriented services like Amazon Glacier.
  • Resumable. Most backup programs hate it when you shut down your computer for the night. Yet none of them seem to understand that this is exactly what shadow copies are for: even after shutting down the computer, shadow copies do not magically change. Instead, the software restarts your entire backup and creates yet another useless shadow copy, for the mere sake of not touching files in use and making the most up-to-date backup possible.
  • Snapshots. Let's say I don't want to restore my whole computer; I just want to see an old file and its version changes over time. Most backup programs will not let you do that, citing that it is "too complex." No, it's not. Track the files the software backed up using a tiny database like SQLite. There you can store checksums, file sizes, previous versions, and so on and so forth. The suffering ends there. The end user can view a snapshot of the computer at a certain point in time, or view the history of a specific file, perhaps with diffs (binary diffs if the backup software is user-friendly enough). A sketch of such an index appears after this list.
  • Low profile. What is CloudBerry Backup using 2.7 GB of memory for? Just flapping around? No! Decent backup software should use 100 MB of memory, tops. Leave the heavy RAM consumption to browsers, games, and servers.
  • Integration. This backup software should be robust enough to make anything either a source or a destination for backups, notwithstanding the limitations of each backup medium.
    • Least liquid: Offline local storage; Amazon Glacier; Google Coldline
    • Somewhat liquid: FTP (due to its slow transfer speed with many files and inability to perform multipart transfers); most consumer storage services
    • Most liquid: iSCSI SANs; high-availability storage services

  • Drive path-agnostic. Backup software should never, ever depend on drive letters to figure out backup sources and targets.
  • Predict drive failure. This goes somewhat beyond the scope of backup software, but there should be at least some kind of periodic SMART monitor to inform and warn a user of a drive that is showing signs of failure. Yes, put a big popup on the notification bar with a scary message like "Your drive might fail soon" or just outright "Your drive is failing." Show it to them for the first three days, make it go away, and then show it again the next week. Of course, the notification can be dismissed for a specific drive, but doing so should require them to read a message about possibly losing data on the failing drive and wait 5 seconds to close the dialog; after that, they never have to see the dialog for that drive again.
  • Recognize cache folders. Here's what you need to do: just stick that CCleaner scanning stuff into your product. Make the default backup plan ignore whatever CCleaner would usually clean up. Caches can add up to gigabytes in size, and many users do not even care about including them in their backups, because all they want are their programs and documents. However, there is that one company that might say, "no, you can't ignore cache folders, because we need a perfect file-level backup of the system tree." (My argument would be to use CloneZilla and do it at the image level, but fine.)
  • Import from other services. No, I don't care much about Acronis, Veeam, or other proprietary solutions. What I do care about, however, are the crappy Windows 7 Backup and Restore backups, dd "backups," and other image-level backup formats. Don't just import the backups: import file history, recompress them, preserve timestamps. Give them the full treatment, and put them neatly in the new backup format as if it really were an old backup.
  • Responsive (and responsible) backend. Big enterprise backup software uses a UI frontend that merely communicates with a service backend. This is generally a good design. However, when the backend decides to quit, the UI frontend goes into limbo and stops responding to commands instead of offering a reasonable explanation of what is happening, while the backend makes no attempt to halt whatever blocking operation is taking too long. The gears just grind to a halt, and nothing can get done on either side.
  • Don't delete anything without asking. No, I don't even want auto-purge functionality, and if you must have it, for the love of God, make it a manual operation. There is no reason to keep purging things constantly unless you have a disk quota to work under; in that case, the software should determine what is best to purge (start with the big stuff, at the earliest backup) to meet the size requirement.
  • Only one backup mode. That backup mode had better be good, and it should use a hybrid format.
  • Open-source format. The software itself may not be open-source, but an open format essentially ensures that someone out there can make restore software that remains compatible with the latest and greatest operating systems.
  • Bootable. Where are you going to make your restores from? A flash drive running Linux with an ncurses interface for your backup software, obviously. You could, of course, allow backups from that same bootable drive, in the case of an infected drive or as part of a standard computer emergency response procedure - but eh, that's really pushing it. Just restores will do fine.
  • Self-testable. Make sure the backups can actually restore to something.
  • Exportable. One day, your backup software will not be relevant anymore, so why bother locking in users to your format? Make it so that they can export full archives of their backups, with a CSV sheet explaining all of the contents of each archive.
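To gesture at how simple the snapshot index mentioned above could be (this schema is entirely invented for illustration; a real one would need more columns):

Code:
# A hypothetical per-file catalog in SQLite, as suggested under "Snapshots":
sqlite3 catalog.db <<'SQL'
CREATE TABLE snapshots (
    id       INTEGER PRIMARY KEY,
    taken_at TEXT NOT NULL              -- timestamp of the backup run
);
CREATE TABLE files (
    snapshot_id INTEGER NOT NULL REFERENCES snapshots(id),
    path        TEXT NOT NULL,          -- path as it was backed up
    size        INTEGER NOT NULL,       -- bytes
    sha256      TEXT NOT NULL           -- checksum for verification/dedup
);
-- The history of one file across snapshots is then a single query:
-- SELECT s.taken_at, f.size, f.sha256
--   FROM files f JOIN snapshots s ON s.id = f.snapshot_id
--  WHERE f.path = '/home/user/report.doc'
--  ORDER BY s.taken_at;
SQL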

At the end of the day, users just want their files safe and sound, so keep the software as close to the fundamentals as possible, and allow others to make tools around the backup software if additional functionality is needed.
Nice post, Tari!

This has definitely made me look at my own backup solution since this went live on your blog. I took some of your musings and settled on B2. Presently, my NAS is a bit dated and doesn't support any viable outside backup solutions except for S3. It does support DropBox, Box, Google, and OneDrive, but the plans for unlimited storage are either nonexistent or not economical.

My current backup solution consists of a 5TB HDD plugged into the NAS via USB and manual backups to Amazon Cloud Drive Unlimited. They are removing the Unlimited plan when my subscription expires in January, but I'm not too concerned, as almost everything I have backed up is photos. I left the Family Prime Plan for my own Prime subscription this summer so I could qualify for unlimited photo storage, but now I'm 150GB over the limit for files that aren't photos.

I can back up to BackBlaze B2 for something like $16 a month, I think. I'll be upgrading my NAS this year, and it will support remote backups to Amazon and B2. I'll also retire the USB HDD as the local backup. Instead, I'll probably buy five new 2TB HDDs to replace the five that are presently in my current NAS and use it as the backup NAS. I'll move the local backup off the circuit/room that my existing hardware is in, so that if something were to happen, the backup isn't directly connected to the source, let alone on the same wiring. Both NAS & HDD are behind a UPS, and I'll likely buy a new UPS as well to protect the backup, even though it'll probably just power on once a week.

Costs probably won't go up too much each month. I don't have any huge foreseeable size increase; I may take on a friend's photo library as a semi-local backup, but it's not huge. When I upgrade my NAS I'll definitely upgrade the storage as well, and may use it as a backup point for my laptop.

The only thing I wish B2 offered was HDD restoration. I can pay $189 to restore my personal backups from an HDD that BackBlaze ships me, but I see no such option for B2. Which blows, because I have less than 4TB of photos and it would have worked perfectly; it's less than the data overages I'd pay Comcast for downloading the backup, and it's more than likely faster than downloading the backup from the web. When I start the process I'll certainly have to throttle the upload until it's done.
Alex wrote:
The only thing I wish B2 offered was the HDD restoration. ...
The site alludes to something like that, but doesn't offer details:
Quote:
Data by Mail: Use our Fireball to mail us data; get it back with Snapshots by Mail.
They have an actual page for Fireball, but nothing about Snapshots by Mail (and Google doesn't find anything either). Might be worth contacting them about it.
I did a search and found this blog post, which introduced the restore-from-a-shipped-hard-drive service a couple of years ago. Searching that page for B2 turned up this comment:



Ladies and gentlemen, where did we fall on this? What has been your experience with the backup solutions you've picked, Tari? I want to perform more rigorous off-site backups of my (Windows and Ubuntu) laptop and critical files.
I upgraded my NAS not too long ago and will be purchasing a B2 plan soon.

On a semi-related note, is anyone aware of any backup services that connect to cloud services like DropBox, GDrive, Creative Cloud, and more? It'd be awesome to properly back those up to a service like BackBlaze.
I'm still pretty happy with Crashplan. It's hard to beat the value with the size of my backup archive, and performance remains satisfactory. Borg continues to work fine for pushing images between my own machines.

Quote:
On a semi-related note, is anyone aware of any backup services that connect to cloud services like DropBox, GDrive, Creative Cloud, and more? It'd be awesome to properly back those up to a service like BackBlaze.
rclone supports a bunch of storage services and can synchronize between any two of them or your local machine, which may be sufficient for your needs. You won't get proper incrementals out of it, but you can mirror the latest image from remote storage to a local directory or another remote and do whatever you like with it at that point.
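For example (remote names come from rclone config, so "dropbox:" and the bucket name here are placeholders):

Code:
# Mirror a cloud drive down to local storage...
rclone sync dropbox: /volume1/mirrors/dropbox

# ...or straight across to another remote, such as a B2 bucket:
rclone sync dropbox: b2:my-backup-bucket/dropbox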
Arq5 to B2 is clearly the best option. You noobs.

Arq5 is a storage-agnostic backup tool: you pay for a single license and can then choose whatever storage provider you want. B2, Dropbox, local folders, whatever. It's pretty slick.
Tari wrote:
rclone supports a bunch of storage services and can synchronize between any two of them or your local machine, which may be sufficient for your needs. You won't get proper incrementals out of it, but you can mirror the latest image from remote storage to a local directory or another remote and do whatever you like with it at that point.


Incremental would be nice, but I can make do. It'd be more of an "Oh, I deleted that file" than an "Oh, I want an older version of this file." I save a lot of stuff to the cloud so I can access it between machines, usually making use of my NAS.

On that note, I remembered I can sync my NAS with DropBox and GDrive (as I already do on the old one I'm replacing), so I can just include those shares in my B2 backup. I'll get this sync set up on the new NAS soon.

Thanks for jogging my memory there. Good idea!
  