Over the past couple of years, several cloud storage services have emerged and, as one might suspect, Amazon's S3 is a popular choice due to its feature set and price. While I have had many backup schemes over the years, from tape drives to removable hard drives, they have all been only as "disaster-survivable" as the last time I not only performed the backup but also physically secured the backup media at an alternate location. With the accessibility of cloud storage, I wanted to push my backups out to the cloud so that they would immediately be at an alternate site on redundant, highly available storage.
Given that we're talking about Linux, tried-and-true rsync was certainly capable of meeting my requirements … provided I could present a file system or ssh destination to Linux for rsync to replicate to. This is where s3fs (http://code.google.com/p/s3fs/) comes in: it is a "Linux FUSE-based file system backed by Amazon S3". I have used s3fs for a couple of years, and while it has gone through cycles of development and support, it recently "thawed" and underwent some nice improvements, to the extent that it now supports all the usual suspects: user/group ownership, permissions, last-modified timestamps, and representing data (notably folders/directories) in a manner that is compatible with other S3 clients.
Before we begin, if you do not yet have an Amazon Web Services account and an S3 Bucket, head on over to https://aws.amazon.com and sign up. You may be wondering about costs … as of this writing (July 2013), I am backing up about 30 GB of data to S3 (standard tier, multiple regions, not reduced redundancy) and it runs less than $5.00/month for storage and moderate data transfer in/out, plus a couple of cents (yes, literally $0.03 or so) each time I run rsync to compare several thousand files. AWS also has a detailed cost estimation tool at http://calculator.s3.amazonaws.com/calc5.html. Finally, if you are paranoid about an "insanely high" bill hitting your credit card, you can set up a billing alert to notify you when a certain spending threshold is crossed.
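If you prefer scripting that alert rather than clicking through the console, a sketch along these lines would do it, assuming the AWS CLI is installed, billing metrics are enabled in your account preferences, and you already have an SNS topic to notify (the topic ARN and $10 threshold below are placeholders, not my actual values):

# Alarm when month-to-date estimated charges exceed $10 (placeholder values)
# Billing metrics are only published to the us-east-1 region.
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name "billing-over-10-usd" \
  --namespace "AWS/Billing" \
  --metric-name "EstimatedCharges" \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 21600 \
  --evaluation-periods 1 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions "arn:aws:sns:us-east-1:123456789012:billing-alerts"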
The first step is obtaining and installing s3fs. You can review the details on the project web site; however, it is the traditional Linux routine: download the tarball, extract it, then run configure, make, and make install. I've tested it on various flavors of Ubuntu, most recently on Ubuntu 12.04 LTS with s3fs 1.71.
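For reference, a typical build on Ubuntu looks something like the following (the package list and download URL are illustrative; check the project page for the current release and its prerequisites):

# Build prerequisites on Ubuntu (package names may vary slightly by release)
sudo apt-get install build-essential pkg-config libfuse-dev libcurl4-openssl-dev libxml2-dev

# Download, build, and install s3fs (URL/version shown for illustration only)
wget http://s3fs.googlecode.com/files/s3fs-1.71.tar.gz
tar xzf s3fs-1.71.tar.gz
cd s3fs-1.71
./configure --prefix=/usr/local
make
sudo make install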
The second step is configuring s3fs. Because my backup job will run as root, I have opted for the system-wide configuration file /etc/passwd-s3fs (ownership root:root, permission 600). If you choose to run as non-root, you can instead reference the user-specific configuration file ~/.passwd-s3fs (secure its ownership and permissions accordingly). The contents of this configuration file are really straightforward and the s3fs site has detailed examples (understandably, I am not going to provide mine as an example here).
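In short, the file holds one colon-separated credentials line, either accessKeyId:secretAccessKey for your default keys or bucket:accessKeyId:secretAccessKey per bucket. A placeholder example (these are Amazon's documented sample credentials, not real ones) would look like:

# /etc/passwd-s3fs -- placeholder credentials only, substitute your own
my_backup:AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Lock the file down afterwards with chown root:root /etc/passwd-s3fs and chmod 600 /etc/passwd-s3fs.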
As far as mounting the S3 file system goes, you can check whether it is presented and mount it if need be:
# Configuration
s3fsCmd='/usr/local/bin/s3fs'
s3Bucket='my_backup'
# no trailing slash on local mount path
localMount="/mnt/s3-${s3Bucket}"

#######################################################################
# Checks the S3 file system, mounting if need be
#######################################################################
function checkMount() {
  df -kh | grep "${localMount}" >/dev/null 2>/dev/null
  if [ ! $? -eq 0 ] ; then
    echo "Mounting ${localMount}"
    ${s3fsCmd} "${s3Bucket}" "${localMount}"
    df -kh | grep "${localMount}" >/dev/null 2>/dev/null
    if [ ! $? -eq 0 ] ; then
      echo "ERROR: Unable to mount S3 Bucket '${s3Bucket}' to file system '${localMount}' using '${s3fsCmd}'"
      exit 1
    fi
  fi
}
Once the bucket is mounted, rsync can be leveraged as one may anticipate (with some optimizations for S3, like using --inplace):
checkMount
rsync -ahvO --progress --inplace --delete "/data" "${localMount}/"
You can also consider running the rsync for a maximum number of attempts: if a non-zero return code is encountered, unmount the volume (umount -fl "${localMount}"), sleep for perhaps a minute or two (to allow time for pending writes/syncs to complete), and run checkMount again before re-attempting the rsync.
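Sketched out, that retry wrapper could look something like this (the attempt count and sleep interval are arbitrary placeholder values, not taken from my production script):

# Retry rsync a few times, re-mounting the bucket between failed attempts
maxAttempts=3
attempt=1
while [ ${attempt} -le ${maxAttempts} ] ; do
  checkMount
  rsync -ahvO --progress --inplace --delete "/data" "${localMount}/"
  rc=$?
  if [ ${rc} -eq 0 ] ; then
    break
  fi
  echo "rsync attempt ${attempt} failed (rc=${rc}); unmounting and retrying"
  umount -fl "${localMount}"
  # give pending writes/syncs a minute or two to settle before re-mounting
  sleep 120
  attempt=$((attempt + 1))
done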