Linux Backups To Amazon S3

Over the past couple of years, several cloud storage services have emerged and, as one may suspect, Amazon's S3 is a popular choice due to its feature set and price. While I have had many backup schemes over the years, from tape drives to removable hard drives, they have all been only as "disaster-survivable" as the last time I not only performed the backup but also physically secured the backup media at an alternate location. With the accessibility of cloud storage, I wanted to push my backups out to the cloud so that they would immediately be at an alternate site on redundant and highly available storage.
 

Given that we're talking about Linux, tried-and-true rsync was certainly capable of meeting my requirements … provided I could present a file system or ssh destination to Linux for rsync to replicate to. This is where s3fs (http://code.google.com/p/s3fs/) comes in: it is a "Linux FUSE-based file system backed by Amazon S3". I have used s3fs for a couple of years, and while it has gone through cycles of development and support, it recently "thawed" and underwent some nice improvements, to the extent that it now supports all the usual suspects: user/group ownership, permissions, last-modified date/time stamps, and representing data (notably folders/directories) in a manner that is compatible with other S3 clients.
 

Before we begin, if you do not yet have an Amazon Web Services account and an S3 bucket, head on over to https://aws.amazon.com and sign up. You may be wondering about costs … as of this writing (July 2013), I am backing up about 30 GB of data to S3 (standard tier, multiple regions, not reduced redundancy) and it runs less than $5.00/month for storage and moderate data transfer in/out, plus a couple of cents (yes, literally $0.03 or so) each time I run rsync to compare several thousand files. AWS also has a detailed cost estimation tool at http://calculator.s3.amazonaws.com/calc5.html. Finally, if you are paranoid about an "insanely high" bill hitting your credit card, you can set up a billing alert to notify you when a certain threshold is crossed.
 

The first step is obtaining and installing s3fs. You can review the details on the project web site; however, it is traditional Linux fare: download the tarball, extract it, then run configure, make, and make install. I've tested it on various flavors of Ubuntu, most recently Ubuntu 12.04 LTS with s3fs 1.71.
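
For reference, a minimal sketch of that flow (the tarball name matches the 1.71 release mentioned above; adjust for the version you download, and note that the build typically requires the usual compiler toolchain plus the FUSE, libcurl, and libxml2 development headers):

# extract, configure, build, and install s3fs from the release tarball
tar xzf s3fs-1.71.tar.gz
cd s3fs-1.71
./configure
make
sudo make install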
 

The second step is configuring s3fs. Because my backup job will run as root, I have opted for the system-wide configuration file /etc/passwd-s3fs (ownership root:root, permission 600). If you choose to run as non-root, you can instead reference the user-specific configuration file ~/.passwd-s3fs (secure ownership and permission accordingly). The contents of this configuration file are really straightforward and the s3fs site has detailed examples (understandably, I am not going to provide mine as an example here).
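
For illustration only, the general shape per the s3fs documentation is one credential line, either accessKeyId:secretAccessKey or, for per-bucket keys, bucket:accessKeyId:secretAccessKey. The values below are AWS's well-known documentation placeholders, not real credentials:

# /etc/passwd-s3fs (placeholder values, not real credentials)
my_backup:AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# lock the file down to root only
chown root:root /etc/passwd-s3fs
chmod 600 /etc/passwd-s3fs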
 

As far as mounting the S3 file system goes, you can check whether it is presented and mount it if need be:

# Configuration
s3fsCmd='/usr/local/bin/s3fs'
s3Bucket='my_backup'
# no trailing slash on local mount path
localMount="/mnt/s3-${s3Bucket}"

#######################################################################
# Checks the S3 file system, mounting if need be
#######################################################################
function checkMount() {
  if ! df -kh | grep -q "${localMount}" ; then
    echo "Mounting ${localMount}"
    ${s3fsCmd} "${s3Bucket}" "${localMount}"
    if ! df -kh | grep -q "${localMount}" ; then
      echo "ERROR: Unable to mount S3 Bucket '${s3Bucket}' to file system '${localMount}' using '${s3fsCmd}'"
      exit 1
    fi
  fi
}

 

Once presented, rsync can be leveraged much as one may anticipate, with some optimizations for S3: --inplace avoids rsync's default write-to-a-temporary-file-then-rename behavior, which is expensive on S3 where a rename amounts to a copy plus a delete, and the O in -ahvO skips updating directory timestamps. Note also that the source "/data" has no trailing slash, so the data directory itself is replicated into the mount:

checkMount
rsync -ahvO --progress --inplace --delete "/data" "${localMount}/"

 

You can also consider retrying the rsync up to a maximum number of attempts: if a non-zero return code is encountered, unmount the volume (umount -fl "${localMount}"), sleep for perhaps a minute or two (to allow time for pending writes/syncs to occur), and run checkMount prior to attempting rsync again.
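
A rough sketch of that retry loop (the three attempts and the two-minute sleep are arbitrary choices):

maxAttempts=3
attempt=1
while [ ${attempt} -le ${maxAttempts} ] ; do
  checkMount
  # stop retrying as soon as rsync exits cleanly
  rsync -ahvO --progress --inplace --delete "/data" "${localMount}/" && break
  echo "rsync attempt ${attempt} failed; remounting before retrying"
  umount -fl "${localMount}"
  sleep 120   # allow time for pending writes/syncs to occur
  attempt=$((attempt + 1))
done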

Including Perl Dependencies Within Your Application

Perl can be great at many things, but when it comes to doing something "real" (parsing XML, compression, logging with log4perl, handling Unix signals), you ultimately want to leverage CPAN modules … and generally, these are not "already installed, out of the box" within Linux distributions like Red Hat.

You may consider including some installation How-To notes on downloading modules from CPAN and installing them … but then you would have to worry about version conflicts, on top of the hassle of supporting people who simply do not know how to do that. You are probably already delivering your application in a distribution package like RPM to simplify installation … so why complicate it again?

There is one way to navigate this, which boils down to including your dependencies in your package, within some application-specific library path for example. This is actually quite straightforward:

  1. Download the CPAN modules you need and install them into something like /my/app/lib/
  2. Expand the include path within your code (see the note after this list) with:
    push(@INC, '/my/app/lib');
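
One wrinkle worth noting: @INC must be extended before Perl compiles any use statement that pulls in the bundled modules, so wrap the push in a BEGIN block (or use the lib pragma, which has the same compile-time effect). A minimal sketch, where XML::Simple simply stands in for whichever module you bundled:

BEGIN {
  # extend @INC before any 'use' of the bundled modules is compiled
  push(@INC, '/my/app/lib');
}
use XML::Simple;   # illustrative module living under /my/app/lib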

Now, what happens when you want to include binary library objects like *.so files? These, having been compiled for a system, are architecture-specific and usually Perl-version-specific. We can extend our basic solution to support this as well by including architecture- and version-specific library paths. For example, we may have:

  • /my/app/lib/perl58-linux-i386
  • /my/app/lib/perl58-linux-x86_64
  • /my/app/lib/perl510-linux-i386
  • /my/app/lib/perl510-linux-x86_64

The required step of downloading and installing the modules from CPAN is still a good start; it now needs to be performed for each variation of Perl version and architecture we intend to support. So, you may have to do this on several OS and architecture combinations if you plan on supporting multiples (and save off the resulting artifacts within source control so you can incorporate them into your build).
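
One hedged way to stage a module build into such a path (the XML::Simple tarball here is purely illustrative; also note that INSTALL_BASE places files under lib/perl5/ beneath the given base, so set your @INC paths accordingly):

tar xzf XML-Simple-2.20.tar.gz
cd XML-Simple-2.20
perl Makefile.PL INSTALL_BASE=/my/app/lib/perl58-linux-x86_64
make && make test && make install

Then, the code needs to be enhanced as well: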

# Determine runtime version
# $] is a raw version string where examples are:
#     5.008008 for 5.8.8
#     5.010001 for 5.10.1
# So, major version is before the "."
#   then minor is the next 3
#   then patch the last 3
my $perlVersion = "$]";
if ($perlVersion =~ /^([\d]+)\.([\d]{3})([\d]{3})$/) {
  # quick hack of adding zero to the string to "convert" something
  #   like 008 to 8 and 010 to 10
  my $majorVersion = $1 + 0;
  my $minorVersion = $2 + 0;
  my $patchVersion = $3 + 0;
  # for our version and architecture specific lib path, we will
  #   use major and minor (example: 58 or 510)
  $perlVersion = "$majorVersion" . "$minorVersion";
} else {
  # Unexpected version format
  die "FATAL ERROR: Unknown Perl version found ('$perlVersion')\n";
}

# Determine if 32-bit or 64-bit
my $is64 = 'false';
foreach my $incPath (@INC) {
  if ($incPath =~ /64/) {
    $is64 = 'true';
    last;
  }
}

# Runtime-specific path
my $runtimeLibPath = "/my/app/lib/perl" . $perlVersion . "-linux-";
if ($is64 eq 'true') {
  $runtimeLibPath .= "x86_64";
} else {
  $runtimeLibPath .= "i386";
}

# Add it to our include path (again, before any bundled modules are loaded)
push(@INC, $runtimeLibPath);

Now that you see the pattern, you can expand it to platforms beyond Linux as well if you wish.
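
As an aside, if scanning @INC for the string "64" feels fragile, the core Config module exposes the interpreter's archname, which can drive the same selection; a minimal sketch under the same path layout as above:

use strict;
use warnings;
use Config;

# Derive the "58" / "510" style tag from the running interpreter
my ($major, $minor) = ($] =~ /^(\d+)\.(\d{3})/);
my $versionTag = ($major + 0) . ($minor + 0);

# archname looks like "x86_64-linux" or "i386-linux-thread-multi"
my $arch = ($Config{archname} =~ /x86_64/) ? 'x86_64' : 'i386';

my $runtimeLibPath = "/my/app/lib/perl${versionTag}-linux-${arch}";
print "$runtimeLibPath\n";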