当前位置:首页 >> 农林牧渔 >>

2009-FAST-Cumulus


Cumulus: Filesystem Backup to the Cloud
Michael Vrable, Stefan Savage, and Geoffrey M. Voelker Department of Computer Science and Engineering University of California, San Diego Abstract


In this paper we describe Cumulus, a system for ef?ciently implementing ?lesystem backups over the Internet. Cumulus is speci?cally designed under a thin cloud assumption—that the remote datacenter storing the backups does not provide any special backup services, but only provides a least-common-denominator storage interface (i.e., get and put of complete ?les). Cumulus aggregates data from small ?les for remote storage, and uses LFS-inspired segment cleaning to maintain storage ef?ciency. Cumulus also ef?ciently represents incremental changes, including edits to large ?les. While Cumulus can use virtually any storage service, we show that its ef?ciency is comparable to integrated approaches. uitous broadband now provides suf?cient bandwidth resources to of?oad the application. For small to mid-sized businesses, backup is rarely part of critical business processes and yet is suf?ciently complex to “get right” that it can consume signi?cant IT resources. Finally, larger enterprises bene?t from backing up to the cloud to provide a business continuity hedge against site disasters. However, to price cloud-based backup services attractively requires minimizing the capital costs of data center storage and the operational bandwidth costs of shipping the data there and back. To this end, most existing cloud-based backup services (e.g., Mozy, Carbonite, Symantec’s Protection Network) implement integrated solutions that include backup-speci?c software hosted on both the client and at the data center (usually using servers owned by the provider). In principle, this approach allows greater storage and bandwidth ef?ciency (server-side compression, cleaning, etc.) but also reduces portability—locking customers into a particular provider. In this paper we explore the other end of the design space—the thin cloud. We describe a cloud-based backup system, called Cumulus, designed around a minimal interface (put, get, delete, list) that is trivially portable to virtually any on-line storage service. Thus, we assume that any application logic is implemented solely by the client. In designing and evaluating this system we make several contributions. First, we show through simulation that, through careful design, it is possible to build ef?cient network backup on top of a generic storage service—competitive with integrated backup solutions, in spite of having no speci?c backup support in the underlying storage service. Second, we build a working prototype of this system using Amazon’s Simple Storage Service (S3) and demonstrate its effectiveness on real end-user traces. Finally, we describe how such systems can be tuned for cost instead of for bandwidth or storage, both using the Amazon pricing model as well as for a range of storage to network cost ratios. In the remainder of this paper, we ?rst describe prior work in backup and network-based backup, followed by a design overview of Cumulus and an in-depth description of its implementation. We then provide both simulation and experimental results of Cumulus performance, overhead, and cost in trace-driven scenarios. We conclude with a discussion of the implications of our work

1

Introduction

It has become increasingly popular to talk of “cloud computing” as the next infrastructure for hosting data and deploying software and services. Not surprisingly, there are a wide range of different architectures that fall under the umbrella of this vague-sounding term, ranging from highly integrated and focused (e.g., Software As A Service offerings such as Salesforce.com) to decomposed and abstract (e.g., utility computing such as Amazon’s EC2/S3). Towards the former end of the spectrum, complex logic is bundled together with abstract resources at a datacenter to provide a highly speci?c service— potentially offering greater performance and ef?ciency through integration, but also reducing ?exibility and increasing the cost to switch providers. At the other end of the spectrum, datacenter-based infrastructure providers offer minimal interfaces to very abstract resources (e.g., “store ?le”), making portability and provider switching easy, but potentially incurring additional overheads from the lack of server-side application integration. In this paper, we explore this thin-cloud vs. thickcloud trade-off in the context of a very simple application: ?lesystem backup. Backup is a particularly attractive application for outsourcing to the cloud because it is relatively simple, the growth of disk capacity relative to tape capacity has created an ef?ciency and cost in?ection point, and the cloud offers easy off-site storage, always a key concern for backup. For end users there are few backup solutions that are both trivial and reliable (especially against disasters such as ?re or ?ood), and ubiq-

USENIX Association

7th USENIX Conference on File and Storage Technologies

225

and how this research agenda might be further explored.

2

Related Work

Many traditional backup tools are designed to work well for tape backups. The dump, cpio, and tar [16] utilities are common on Unix systems and will write a full ?lesystem backup as a single stream of data to tape. These utilities may create a full backup of a ?lesystem, but also support incremental backups, which only contain ?les which have changed since a previous backup (either full or another incremental). Incremental backups are smaller and faster to create, but mostly useless without the backups on which they are based. Organizations may establish backup policies specifying at what granularity backups are made, and how long they are kept. These policies might then be implemented in various ways. For tape backups, long-term backups may be full backups so they stand alone; short-term daily backups may be incrementals for space ef?ciency. Tools such as AMANDA [2] build on dump or tar, automating the process of scheduling full and incremental backups as well as collecting backups from a network of computers to write to tape as a group. Cumulus supports ?exible policies for backup retention: an administrator does not have to select at the start how long to keep backups, but rather can delete any snapshot at any point. The falling cost of disk relative to tape makes backup to disk more attractive, especially since the random access permitted by disks enables new backup approaches. Many recent backup tools, including Cumulus, take advantage of this trend. Two approaches for comparing these systems are by the storage representation on disk, and by the interface between the client and the storage— while the disk could be directly attached to the client, often (especially with a desire to store backups remotely) communication will be over a network. Rsync [22] ef?ciently mirrors a ?lesystem across a network using a specialized network protocol to identify and transfer only those parts of ?les that have changed. Both the client and storage server must have rsync installed. Users typically want backups at multiple points in time, so rsnapshot [19] and other wrappers around rsync exist that will store multiple snapshots, each as a separate directory on the backup disk. Unmodi?ed ?les are hard-linked between the different snapshots, so storage is space-ef?cient and snapshots are easy to delete. The rdiff-backup [7] tool is similar to rsnapshot, but it changes the storage representation. The most recent snapshot is a mirror of the ?les, but the rsync algorithm creates compact deltas for reconstructing older versions—these reverse incrementals are more space ef?cient than full copies of ?les as in rsnapshot. Another modi?cation to the storage format at the server is to store snapshots in a content-addressable stor-

age system. Venti [17] uses hashes of block contents to address data blocks, rather than a block number on disk. Identical data between snapshots (or even within a snapshot) is automatically coalesced into a single copy on disk—giving the space bene?ts of incremental backups automatically. Data Domain [26] offers a similar but more recent and ef?cient product; in addition to performance improvements, it uses content-de?ned chunk boundaries so de-duplication can be performed even if data is offset by less than the block size. A limitation of these tools is that backup data must be stored unencrypted at the server, so the server must be trusted. Box Backup [21] modi?es the protocol and storage representation to allow the client to encrypt data before sending, while still supporting rsync-style ef?cient network transfers. Most of the previous tools use a specialized protocol to communicate between the client and the storage server. An alternate approach is to target a more generic interface, such as a network ?le system or an FTP-like protocol. Amazon S3 [3] offers an HTTP-like interface to storage. The operations supported are similar enough between these different protocols—get/put/delete on ?les and list on directories—that a client can easily support multiple protocols. Cumulus tries to be network-friendly like rsync-based tools, while using only a generic storage interface. Jungle Disk [13] can perform backups to Amazon S3. However, the design is quite different from that of Cumulus. Jungle Disk is ?rst a network ?lesystem with Amazon S3 as the backing store. Jungle Disk can also be used for backups, keeping copies of old versions of ?les instead of deleting them. But since it is optimized for random access it is less ef?cient than Cumulus for pure backup—features like aggregation in Cumulus can improve compression, but are at odds with ef?cient random access. Duplicity [8] aggregates ?les together before storage for better compression and to reduce per-?le storage costs at the server. Incremental backups use spaceef?cient rsync-style deltas to represent changes. However, because each incremental backup depends on the previous, space cannot be reclaimed from old snapshots without another full backup, with its associated large upload cost. Cumulus was inspired by duplicity, but avoids this problem of long dependency chains of snapshots. Brackup [9] has a design very similar to that of Cumulus. Both systems separate ?le data from metadata: each snapshot contains a separate copy of ?le metadata as of that snapshot, but ?le data is shared where possible. The split data/metadata design allows old snapshots to be easily deleted. Cumulus differs from Brackup primarily in that it places a greater emphasis on aggregating small ?les together for storage purposes, and adds a seg-

226

7th USENIX Conference on File and Storage Technologies

USENIX Association

Simple server

Sub-?le delta storage

Incremental forever

Encryption

3

Design

Multiple snapshots

In this section we present the design of our approach for making backups to a thin cloud remote storage service.

rsync rsnapshot rdiff-backup Box Backup Jungle Disk duplicity Brackup Cumulus

? ? ? ? ? ? ?

? ? ? ?

? ? ? ? ? ? ?

N/A

3.1

Storage Server Interface

? ? ? ?

? ? ? ? ?

We assume only a very narrow interface between a client generating a backup and a server responsible for storing the backup. The interface consists of four operations: Get: Given a pathname, retrieve the contents of a ?le from the server. Put: Store a complete ?le on the server with the given pathname. List: Get the names of ?les stored on the server. Delete: Remove the given ?le from the server, reclaiming its space. Note that all of these operations operate on entire ?les; we do not depend upon the ability to read or write arbitrary byte ranges within a ?le. Cumulus neither requires nor uses support for reading and setting ?le attributes such as permissions and timestamps. The interface is simple enough that it can be implemented on top of any number of protocols: FTP, SFTP, WebDAV, S3, or nearly any network ?le system. Since the only way to modify a ?le in this narrow interface is to upload it again in full, we adopt a writeonce storage model, in which a ?le is never modi?ed after it is ?rst stored, except to delete it to recover space. The write-once model provides convenient failure guarantees: since ?les are never modi?ed in place, a failed backup run cannot corrupt old snapshots. At worst, it will leave a partially-written snapshot which can garbage-collected. Because Cumulus does not modify ?les in place, we can keep snapshots at multiple points in time simply by not deleting the ?les that make up old snapshots.

Multiple snapshots: Can store multiple versions of ?les at different points in time; Simple server: Can back up almost anywhere; does not require special software at the server; Incremental forever: Only initial backup must be a full backup; Sub-?le delta storage: Ef?ciently represents small differences between ?les on storage; only relevant if storing multiple snapshots; Encryption: Data may be encrypted for privacy before sending to storage server.

Table 1: Comparison of features among selected tools that back up to networked storage.

ment cleaning mechanism to manage the inef?ciency introduced. Additionally, Cumulus tries to ef?ciently represent small changes to all types of large ?les and can share metadata where unchanged; both changes reduce the cost of incremental backups. Peer-to-peer systems may be used for storing backups. Pastiche [5] is one such system, and focuses on the problem of identifying and sharing data between different users. Pastiche uses content-based addressing for deduplication. But if sharing is not needed, Brackup and Cumulus could use peer-to-peer systems as well, simply treating it as another storage interface offering get and put operations. While other interfaces to storage may be available— Antiquity [24] for example provides a log append operation—a get/put interface likely still works best since it is simpler and a single put is cheaper than multiple appends to write the same data. Table 1 summarizes differences between some of the tools discussed above for backup to networked storage. In relation to existing systems, Cumulus is most similar to duplicity (without the need to occasionally re-upload a new full backup), and Brackup (with an improved scheme for incremental backups including rsync-style deltas, and improved reclamation of storage space).

3.2

Storage Segments

When storing a snapshot, Cumulus will often group data from many smaller ?les together into larger units called segments. Segments become the unit of storage on the server, with each segment stored as a single ?le. Filesystems typically contain many small ?les (both our traces described later and others, such as [1], support this observation). Aggregation of data produces larger ?les for storage at the server, which can be bene?cial to: Avoid inef?ciencies associated with many small ?les: Storage servers may dislike storing many small ?les for various reasons—higher metadata costs, wasted space from rounding up to block boundaries, and more seeks when reading. This preference may be expressed in the

USENIX Association

7th USENIX Conference on File and Storage Technologies

227

Snapshot?Descriptors
Date:?2008-01-01?12:00:00 Root:?A/0 Segments:?A?B Date:?2008-01-02?12:00:00 Root:?C/0 Segments:?B?C?

Segment?Store Segment?A Segment?C

?le data itself. Both metadata and data are broken apart into blocks, or objects, and these objects are then packed together into segments, compressed as a unit and optionally encrypted, and stored on the server. Each segment has a unique name—we use a randomly generated 128bit UUID so that segment names can be assigned without central coordination. Objects are numbered sequentially within a segment. Segments are internally structured as a TAR ?le, with each ?le in the archive corresponding to an object in the segment. Compression and encryption are provided by ?ltering the raw segment data through gzip, bzip2, gpg, or other similar external tools. A snapshot can be decoded by traversing the tree (or, in the case of sharing, DAG) of objects. The root object in the tree is the start of the metadata log. The metadata log need not be stored as a ?at object; it may contain pointers to objects containing other pieces of the metadata log. For example, if many ?les have not changed, then a single pointer to a portion of the metadata for an old snapshot may be written. The metadata objects eventually contain entries for individual ?les, with pointers to the ?le data as the leaves of the tree. The metadata log entry for each individual ?le speci?es properties such as modi?cation time, ownership, and ?le permissions, and can be extended to include additional information if needed. It includes a cryptographic hash so that ?le integrity can be veri?ed after a restore. Finally, it includes a list of pointers to objects containing the ?le data. Metadata is stored in a text, not binary, format to make it more transparent. Compression applied to the segments containing the metadata, however, makes the format space-ef?cient. The one piece of data in each snapshot not stored in a segment is a snapshot descriptor, which includes a timestamp and a pointer to the root object. Starting with the root object stored in the snapshot descriptor and traversing all pointers found, a list of all segments required by the snapshot can be constructed. Since segments may be shared between multiple snapshots, a garbage collection process deletes unreferenced segments when snapshots are removed. To simplify garbage-collection, each snapshot descriptor includes (though it is redundant) a summary of segments on which it depends. Pointers within the metadata log include cryptographic hashes so that the integrity of all data can be validated starting from the snapshot descriptor, which can be digitally signed. Additionally, Cumulus writes a summary ?le with checksums for all segments so that it can quickly check snapshots for errors without a full restore.

Segment?B

name:?file1 owner:?root data:?B/0 name:?file2 owner:?root data:?B/1?B/2

name:?file1 owner:?root data:?C/1 name:?file2 owner:?root data:?B/1?B/2

Figure 1: Simpli?ed schematic of the basic format for storing snapshots on a storage server. Two snapshots are shown, taken on successive days. Each snapshot contains two ?les. file1 changes between the two snapshots, but the data for file2 is shared between the snapshots. For simplicity in this ?gure, segments are given letters as names instead of the 128-bit UUIDs used in practice. cost model of the provider. Amazon S3, for example, has both a per-request and a per-byte cost when storing a ?le that encourages using ?les greater than 100 KB in size. Avoid costs in network protocols: Small ?les result in relatively larger protocol overhead, and may be slower over higher-latency connections. Pipelining (if supported) or parallel connections may help, but larger segments make these less necessary. We study one instance of this effect in more detail in Section 5.4.5. Take advantage of inter-?le redundancy with segment compression: Compression can be more effective when small ?les are grouped together. We examine this effect in Section 5.4.2. Provide additional privacy when encryption is used: Aggregation helps hide the size as well as contents of individual ?les. Finally, as discussed in Sections 3.4 and 4.3, changes to small parts of larger ?les can be ef?ciently represented by effectively breaking those ?les into smaller pieces during backup. For the reasons listed above, re-aggregating this data becomes even more important when sub-?le incremental backups are supported.

3.3

Snapshot Format

Figure 1 illustrates the basic format for backup snapshots. Cumulus snapshots logically consist of two parts: a metadata log which lists all the ?les backed up, and the

228

7th USENIX Conference on File and Storage Technologies

USENIX Association

3.4

Sub-File Incrementals

If only a small portion of a large ?le changes between snapshots, only the changed portion of the ?le should be stored. The design of the Cumulus format supports this. The contents of each ?le is speci?ed as a list of objects, so new snapshots can continue to point to old objects when data is unchanged. Additionally, pointers to objects can include byte ranges to allow portions of old objects to be reused even if some data has changed. We discuss how our implementation identi?es data that is unchanged in Section 4.3.

3.5

Segment Cleaning

When old snapshots are no longer needed, space is reclaimed by deleting the root snapshot descriptors for those snapshots, then garbage collecting unreachable segments. It may be, however, that some segments only contain a small fraction of useful data—the remainder of these segments, data from deleted snapshots, is now wasted space. This problem is similar to the problem of reclaiming space in the Log-Structured File System (LFS) [18]. There are two approaches that can be taken to segment cleaning given that multiple backup snapshots are involved. The ?rst, in-place cleaning, is most like the cleaning in LFS. It identi?es segments with wasted space and rewrites the segments to keep just the needed data. This mode of operation has several disadvantages, however. It violates the write-once storage model, in that the data on which a snapshot depends is changed after the snapshot is written. It requires detailed bookkeeping to determine precisely which data must be retained. Finally, it requires downloading and decrypting old segments—normal backups only require an encryption key, but cleaning needs the decryption key as well. The alternative to in-place cleaning is to never modify segments in old snapshots. Instead, Cumulus avoids referring to data in inef?cient old segments when creating a new snapshot, and writes new copies of that data if needed. This approach avoids the disadvantages listed earlier, but is less space-ef?cient. Dead space is not reclaimed until snapshots depending on the old segments are deleted. Additionally, until then data is stored redundantly since old and new snapshots refer to different copies of the same data. We analyzed both approaches to cleaning in simulation. We found that the cost bene?ts of in-place cleaning were not large enough to outweigh its disadvantages, and so our Cumulus prototype does not clean in place. The simplest policy for selecting segments to clean is to set a minimum segment utilization threshold, α, that triggers cleaning of a segment. We de?ne utilization as the fraction of bytes within the segment which are ref-

erenced by a current snapshot. For example, α = 0.8 will ensure that at least 80% of the bytes in segments are useful. Setting α = 0 disables segment cleaning altogether. Cleaning thresholds closer to 1 will decrease storage overhead for a single snapshot, but this more aggressive cleaning requires transferring more data. More complex policies are possible as well, such as a cost-bene?t evaluation that favors repacking long-lived segments. Cleaning may be informed by snapshot retention policies: cleaning is more bene?cial immediately before creating a long-term snapshot, and cleaning can also consider which other snapshots currently reference a segment. Finally, segment cleaning may reorganize data, such as by age, when segments are repacked. Though not currently implemented, Cumulus could use heuristics to group data by expected lifetime when a backup is ?rst written in an attempt to optimize segment data for later cleaning (as in systems such as WOLF [23]).

3.6

Restoring from Backup

Restoring data from previous backups may take several forms. A complete restore extracts all ?les as they were on a given date. A partial restore recovers one or a small number of ?les, as in recovering from an accidental deletion. As an enhancement to a partial restore, all available versions of a ?le or set of ?les can be listed. Cumulus is primarily optimized for the ?rst form of restore—recovering all ?les, such as in the event of the total loss of the original data. In this case, the restore process will look up the root snapshot descriptor at the date to restore, then download all segments referenced by that snapshot. Since segment cleaning seeks to avoid leaving much wasted space in the segments, the total amount of data downloaded should be only slightly larger than the size of the data to restore. For partial restores, Cumulus downloads those segments that contain metadata for the snapshot to locate the ?les requested, then locates each of the segments containing ?le data. This approach might require fetching many segments—for example, if restoring a directory whose ?les were added incrementally over many days— but will usually be quick. Cumulus is not optimized for tracking the history of individual ?les. The only way to determine the list of changes to a ?le or set of ?les is to download and process the metadata logs for all snapshots. However, a client could keep a database of this information to allow more ef?cient queries.

3.7

Limitations

Cumulus is not designed to replace all existing backup systems. As a result, there are situations in which other systems will do a better job.

USENIX Association

7th USENIX Conference on File and Storage Technologies

229

The approach embodied by Cumulus is for the client making a backup to do most of the work, and leave the backup itself almost entirely opaque to the server. This approach makes Cumulus portable to nearly any type of storage server. However, a specialized backup server could provide features such as automatically repacking backup data when deleting old snapshots, eliminating the overhead of client-side segment cleaning. Cumulus, as designed, does not offer coordination between multiple backup clients, and so does not offer features such as de-duplication between backups from different clients. While Cumulus could use convergent encryption [6] to allow de-duplication even when data is ?rst encrypted at the client, several issues prevent us from doing so. Convergent encryption would not work well with the aggregation in Cumulus. Additionally, server-side de-duplication is vulnerable to dictionary attacks to determine what data clients are storing, and storage accounting for billing purposes is more dif?cult. Finally, the design of Cumulus is predicated on the fact that backing up each ?le on the client to a separate ?le on the server may introduce too much overhead, and so Cumulus groups data together into segments. If it is known that the storage server and network protocol can ef?ciently deal with small ?les, however, then grouping data into segments adds unnecessary complexity and overhead. Other disk-to-disk backup programs may be a better match in this case.

on modi?cation time. Cumulus also uses it to deltaencode the metadata log for new snapshots. An SQLite database keeps a record of recent snapshots and all segments and objects stored in them. The table of objects includes an index by content hash to support data de-duplication. Enabling de-duplication leaves Cumulus vulnerable to corruption from a hash collision [11, 12], but, as with other systems, we judge the risk to be small. The hash algorithm (currently SHA-1) can be upgraded as weaknesses are found. In the event that client data must be recovered from backup, the content indices can be rebuilt from segment data as it is downloaded during the restore. Note that the Cumulus backup format does not specify the format of this information stored locally. It is entirely possible to create a new and very different implementation which nonetheless produces backups conforming to the structure described in Section 3.3 and readable by our Cumulus prototype.

4.2

Segment Cleaning

4

Implementation

We discuss details of the implementation of the Cumulus prototype in this section. Our implementation is relatively compact: only slightly over 3200 lines of C++ source code (as measured by SLOCCount [25]) implementing the core backup functionality, along with another roughly 1000 lines of Python for tasks such as restores, segment cleaning, and statistics gathering.

4.1

Local Client State

Each client stores on its local disk information about recent backups, primarily so that it can detect which ?les have changed and properly reuse data from previous snapshots. This information could be kept on the storage server. However, storing it locally reduces network bandwidth and improves access times. We do not need this information to recover data from a backup so its loss is not catastrophic, but this local state does enable various performance optimizations during backups. The client’s local state is divided into two parts: a local copy of the metadata log and an SQLite database [20] containing all other needed information. Cumulus uses the local copy of the previous metadata log to quickly detect and skip over unchanged ?les based

The Cumulus backup program, written in C++, does not directly implement segment cleaning heuristics. Instead, a separate Cumulus utility program, implemented in Python, controls cleaning. When writing a snapshot, Cumulus records in the local database a summary of all segments used by that snapshot and the fraction of the data in each segment that is actually referenced. The Cumulus utility program uses these summaries to identify segments which are poorlyutilized and marks the selected segments as “expired” in the local database. It also considers which snapshots refer to the segments, and how long those snapshots are likely to be kept, during cleaning. On subsequent backups, the Cumulus backup program re-uploads any data that is needed from expired segments. Since the database contains information about the age of all data blocks, segment data can be grouped by age when it is cleaned. If local client state is lost, this age information will be lost. When the local client state is rebuilt all data will appear to have the same age, so cleaning may not be optimal, but can still be done.

4.3

Sub-File Incrementals

As discussed in Section 3.4, the Cumulus backup format supports ef?ciently encoding differences between ?le versions. Our implementation detects changes by dividing ?les into small chunks in a content-sensitive manner (using Rabin ?ngerprints) and identifying chunks that are common, as in the Low-Bandwidth File System [15]. When a ?le is ?rst backed up, Cumulus divides it into blocks of about a megabyte in size which are stored individually in objects. In contrast, the chunks used for sub?le incrementals are quite a bit smaller: the target size is

230

7th USENIX Conference on File and Storage Technologies

USENIX Association

4 KB (though variable, with a 2 KB minimum and 64 KB maximum). Before storing each megabyte block, Cumulus computes a set of chunk signatures: it divides the data block into non-overlapping chunks and computes a (20-byte SHA-1 signature, 2-byte length) tuple for each chunk. The list of chunk signatures for each object is stored in the local database. These signatures consume 22 bytes for every roughly 4 KB of original data, so the signatures are about 0.5% of the size of the data to back up. Unlike LBFS, we do not create a global index of chunk hashes—to limit overhead, we do not attempt to ?nd common data between different ?les. When a ?le changes, we limit the search for unmodi?ed data to the chunks in the previous version of the ?le. Cumulus computes chunk signatures for the new ?le data, and matches with old chunks are written as a reference to the old data. New chunks are written out to a new object. However, Cumulus could be extended to perform global data deduplication while maintaining backup format compatibility.

segment is downloaded data from that segment is restored. For partial restores, only the necessary segments are downloaded. Currently, in the restore tool it is possible that a segment may be downloaded multiple times if blocks for some ?les are spread across many segments. However, this situation is merely an implementation issue and can be ?xed by restoring data for these ?les non-sequentially as it is downloaded. Finally, Cumulus includes a FUSE [10] interface that allows a collection of backup snapshots to be mounted as a virtual ?lesystem on Linux, thereby providing random access with standard ?lesystem tools. This interface relies on the fact that ?le metadata is stored in sorted order by ?lename, so a binary search can quickly locate any speci?ed ?le within the metadata log.

5

Evaluation

4.4

Segment Filtering and Storage

We use both trace-based simulation and a prototype implementation to evaluate the use of thin cloud services for remote backup. Our goal is to answer three high-level sets of questions: ? What is the penalty of using a thin cloud service with a very simple storage interface compared to a more sophisticated service? ? What are the monetary costs for using remote backup for two typical usage scenarios? How should remote backup strategies adapt to minimize monetary costs as the ratio of network and storage prices varies? ? How does our prototype implementation compare with other backup systems? What are the additional bene?ts (e.g., compression, sub-?le incrementals) and overheads (e.g., metadata) of an implementation not captured in simulation? What is the performance of using an online service like Amazon S3 for backup? The following evaluation sections answer these questions, beginning with a description of the trace workloads we use as inputs to the experiments.

The core Cumulus backup implementation is only capable of writing segments as uncompressed TAR ?les to local disk. Additional functionality is implemented by calling out to external scripts. When performing a backup, all segment data may be ?ltered through a speci?ed command before writing it. Specifying a program such as gzip can provide compression, or gpg can provide encryption. Similarly, network protocols are implemented by calling out to external scripts. Cumulus ?rst writes segments to a temporary directory, then calls an upload script to transfer them in the background while the main backup process continues. Slow uploads will eventually throttle the backup process so that the required temporary storage space is bounded. Upload scripts may be quite simple; a script for uploading to Amazon S3 is merely 12 lines long in Python using the boto [4] library.

4.5

Snapshot Restores

The Cumulus utility tool implements complete restore functionality. This tool can automatically decompress and extract objects from segments, and can ef?ciently extract just a subset of ?les from a snapshot. To reduce disk space requirements, the restore tool downloads segments as needed instead of all at once at the start, and can delete downloaded segments as it goes along. The restore tool downloads the snapshot descriptor ?rst, followed by the metadata. The backup tool segregates data and metadata into separate segments, so this phase does not download any ?le data. Then, ?le contents are restored—based on the metadata, as each

5.1

Trace Workloads

We use two traces as workloads to drive our evaluations. A ?leserver trace tracks all ?les stored on our research group ?leserver, and models the use of a cloud service for remote backup in an enterprise setting. A user trace is taken from the Cumulus backups of the home directory of one of the author’s personal computers, and models the use of remote backup in a home setting. The traces contain a daily record of the metadata of all ?les in each setting, including a hash of the ?le contents. The user

USENIX Association

7th USENIX Conference on File and Storage Technologies

231

Duration (days) Entries Files File Sizes Median Average Maximum Total Update Rates New data/day Changed data/day Total data/day

Fileserver 157 26673083 24344167 0.996 KB 153 KB 54.1 GB 3.47 TB 9.50 GB 805 MB 10.3 GB

User 223 122007 116426 4.4 KB 21.4 KB 169 MB 2.37 GB 10.3 MB 29.9 MB 40.2 MB

Table 2: Key statistics of the two traces used in the evaluations. File counts and sizes are for the last day in the trace. “Entries” is ?les plus directories, symlinks, etc. trace further includes complete backups of all ?le data, and enables evaluation of the effects of compression and sub-?le incrementals. Table 2 summarizes the key statistics of each trace.

Since remote backup services have per-?le charges, the simulator also tracks segment overhead as the number of segments created during the process of making backups. The simulator also models two snapshot scenarios. In the single snapshot scenario, the simulator maintains only one snapshot remotely and it deletes all previous snapshots. In the multiple snapshot scenario, the simulator retains snapshots according to a pre-determined backup schedule. In our experiments, we keep the most recent seven daily snapshots, with additional weekly snapshots retained going back farther in time so that a total of 12 snapshots are kept. This schedule emulates the backup policy an enterprise might employ. The simulator makes some simplifying assumptions that we explore later when evaluating our implementation. The simulator detects changes to ?les in the traces using a per-?le hash. Thus, the simulator cannot detect changes to only a portion of a ?le, and assumes that the entire ?le is changed. The simulator also does not model compression or metadata. We account for sub?le changes, compression, and metadata overhead when evaluating the prototype in Section 5.4. 5.2.2 Optimal Baseline

5.2

Remote Backup to a Thin Cloud

First we explore the overhead of using remote backup to a thin cloud service that has only a simple storage interface. We compare this thin service model to an “optimal” model representing more sophisticated backup systems. We use simulation for these experiments, and start by describing our simulator. We then de?ne our optimal baseline model and evaluate the overhead of using a simple interface relative to a more sophisticated system. 5.2.1 Cumulus Simulator The Cumulus simulator models the process of backing up collections of ?les to a remote backup service. It uses traces of daily records of ?le metadata to perform backups by determining which ?les have changed, aggregating changed ?le data into segments for storage on a remote service, and cleaning expired data as described in Section 3. We use a simulator, rather than our prototype, because a parameter sweep of the space of cleaning parameters on datasets as large as our traces is not feasible in a reasonable amount of time. The simulator tracks three overheads associated with performing backups. It tracks storage overhead, or the total number of bytes to store a set of snapshots computed as the sum of the size of each segment needed. Storage overhead includes both actual ?le data as well as wasted space within segments. It tracks network overhead, the total data that must be transferred over the network to accomplish a backup. On graphs, we show this overhead as a cumulative value: the total data transferred from the beginning of the simulation until the given day.

A simple storage interface for remote backup can incur an overhead penalty relative to more sophisticated approaches. To quantify the overhead of this approach, we use an idealized optimal backup as a basis of comparison. For our simulations, the optimal backup is one in which no more data is stored or transferred over the network than is needed. Since simulation is done at a ?le granularity, the optimal backup will transfer the entire contents of a ?le if any part changes. Optimal backup will, however, perform data de-duplication at a ?le level, storing only one copy if multiple ?les have the same hash value. In the optimal backup, no space is lost to fragmentation when deleting old snapshots. Cumulus could achieve this optimal performance in this simulation by storing each ?le in a separate segment—that is, to never bundle ?les together into larger segments. As discussed in Section 3.2 and as our simulation results show, though, there are good reasons to use segments with sizes larger than the average ?le. As an example of these costs and how we measure them, Figure 2(a) shows the optimal storage and upload overheads for daily backups of the 223 days of the user trace. In this simulation, only a single snapshot is retained each day. Storage grows slowly in proportion to the amount of data in a snapshot, and the cumulative network transfer grows linearly over time. Figure 2(b) shows the results of two simulations of Cumulus backing up the same data. The graph shows the overheads relative to optimal backup; a backup as good as optimal would have 0% relative overhead. These re-

232

7th USENIX Conference on File and Storage Technologies

USENIX Association

11000 10000 9000 8000 7000 6000 5000 4000 3000 2000 1000

Overhead vs. Optimal (%)

Cumulative Transfers Snapshot Size

90 80 70 60 50 40 30 20 10 0 0

Storage (No Cleaning) Storage (Cleaning) Upload (Cleaning)

Size (MB)

0

50

100 Days

150

200

250

50

100 Days

150

200

250

Figure 2: (a) Storage and network overhead for an optimal backup of the ?les from the user trace. (b) Overheads with and without cleaning; segments are cleaned at 60% utilization. Only storage overheads are shown for the no-cleaning case since there is no network transfer overhead without cleaning. sults clearly demonstrate the need for cleaning when using a simple storage interface for backup. When segments are not cleaned (only deleting segments that by chance happen to be entirely no longer needed), wasted storage space grows quickly with time—by the end of the simulation at day 223, the size of a snapshot is nearly double the required size. In contrast, when segments are marked for cleaning at the 60% utilization threshold, storage overhead quickly stabilizes below 10%. The overhead in extra network transfers is similarly modest. 5.2.3 Cleaning Policies Storage and upload overheads improve with decreasing segment size, but at small segment sizes (< 1 MB) backups require very large numbers of segments and limit the bene?ts of aggregating ?le data (Section 3.2). As expected, increasing the cleaning threshold increases the network upload overhead. Storage overhead with multiple snapshots, however, has an optimum cleaning threshold value. Increasing the threshold initially decreases storage overhead, but high thresholds increase it again; we explore this behavior further below. Both the ?leserver and user workloads exhibit similar sensitivities to cleaning thresholds and segment sizes. The user workload has higher overheads relative to optimal due to smaller average ?les and more churn in the ?le data, but overall the overhead penalties remain low. Figures 3(a) and 4(a) show that there is a cleaning threshold that minimizes storage overheads. Increasing the cleaning threshold intuitively reduces storage overhead relative to optimal since the more aggressive cleaning at higher thresholds will delete wasted space in segments and thereby reduce storage requirements. Figure 5 explains why storage overhead increases again at higher cleaning thresholds. It shows three curves, the 16 MB segment size curve from Figure 3(a) and two curves that decompose the storage overhead into individual components (Section 3.5). One is overhead due to duplicate copies of data stored over time in the cleaning process; cleaning at lower thresholds reduces this component. The other is due to wasted space in segments which have not been cleaned; cleaning at higher thresholds reduces this component. A cleaning threshold near the middle, however, minimizes the sum of both of these overheads.

Cleaning is clearly necessary for ef?cient backup, but it is also parameterized by two metrics: the size of the segments used for aggregation, transfer, and storage (Section 3.2), and the threshold at which segments will be cleaned (Section 3.5). In our next set of experiments, we explore the parameter space to quantify the impact of these two metrics on backup performance. Figures 3 and 4 show the simulated overheads of backup with Cumulus using the ?leserver and user traces, respectively. The ?gures show both relative overheads to optimal backup (left y -axis) as well as the absolute overheads (right y -axis). We use the backup policy of multiple daily and weekly snapshots as described in Section 5.2.1. The ?gures show cleaning overhead for a range of cleaning thresholds and segment sizes. Each ?gure has three graphs corresponding to the three overheads of remote backup to an online service. Average daily storage shows the average storage requirements per day over the duration of the simulation; this value is the total storage needed for tracking multiple backup snapshots, not just the size of a single snapshot. Similarly, average daily upload is the average of the data transferred each day of the simulation, excluding the ?rst; we exclude the ?rst day since any backup approach must transfer the entire initial ?lesystem. Finally, average segments per day tracks the number of new segments uploaded each day to account for per-?le upload and storage costs.

5.3

Paying for Remote Backup

The evaluation in the previous section measured the overhead of Cumulus in terms of storage, network, and segment resource usage. Remote backup as a service, however, comes at a price. In this section, we calculate

USENIX Association

7th USENIX Conference on File and Storage Technologies

233

2
Overhead vs. Optimal (%)

Raw Size (TB)

1.5 1 0.5 0

3.82 3.81 3.8 3.79 3.78 3.77

6 5 4 3 2 1 0 0

10.4 10.3 10.2 10.1 10 9.9

New Segments Daily

Raw Size (GB/day)

3.83

Overhead vs. Optimal (%)

64 MB Segments 16 MB Segments 4 MB Segments 1 MB Segments

3.84

8 7

64 MB Segments 16 MB Segments 4 MB Segments 1 MB Segments

10.6 10.5

6000 5000 4000 3000 2000 1000 0 0 10 20 30 40 50 60 70

0

0.2

0.4

0.6

0.8

1

0.2

0.4

0.6

0.8

1

Cleaning Threshold

Cleaning Threshold

Target Segment Size (MB)

(a) Average daily storage

(b) Average daily upload

(c) Average segments per day

Figure 3: Overheads for backups in the ?leserver trace.
25
Overhead vs. Optimal (%) Overhead vs. Optimal (%)

New Segments Daily

Raw Size (GB)

30 25 20 15 10 5 0 0

15 10 5 0

3.1 3 2.9 2.8 2.7

48 46 44 42 40 38

Raw Size (MB/day)

20

16 MB Segments 4 MB Segments 1 MB Segments 512 KB Segments 128 KB Segments

3.3 3.2

40 35

16 MB Segments 4 MB Segments 1 MB Segments 512 KB Segments 128 KB Segments

52 50

180 160 140 120 100 80 60 40 20 0 0 2 4 6 8 10 12 14 16

0

0.2

0.4

0.6

0.8

1

0.2

0.4

0.6

0.8

1

Cleaning Threshold

Cleaning Threshold

Target Segment Size (MB)

(a) Average daily storage

(b) Average daily upload

(c) Average segments per day

Figure 4: Overheads for backups in the user trace.
0.9
Overhead vs. Optimal (%)

0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0

Total Overhead Duplicate Data Overhead Wasted Segment Space

Fileserver Initial upload Upload Storage User Initial upload Upload Storage

Amount 3563 GB 303 GB/month 3858 GB Amount 1.82 GB 1.11 GB/month 2.68 GB

Cost $356.30 $30.30/month $578.70/month Cost $0.27 $0.11/month $0.40/month

0.2

0.4 0.6 0.8 Cleaning Threshold

1

Figure 5: Detailed breakdown of storage overhead when using a 16 MB segment size for the ?leserver workload.

Table 3: Costs for backups in US dollars, if performed optimally, for the ?leserver and user traces using current prices for Amazon S3. Storage: $0.15 per GB · month Upload: $0.10 per GB Segment: $0.01 per 1000 ?les uploaded With this pricing model, the segment cost for uploading an empty ?le is equivalent to the upload cost for uploading approximately 100 KB of data, i.e., when uploading 100 KB ?les, half of the cost is for the bandwidth and half for the upload request itself. As the ?le size increases, the per-request component becomes an increasingly smaller part of the total cost. Neglecting for the moment the segment upload costs, Table 3 shows the monthly storage and upload costs for each of the two traces. Storage costs dominate ongoing costs. They account for about 95% and 78% of the

monetary costs for our two workload models, evaluate cleaning threshold and segment size in terms of costs instead of resource usage, and explore how cleaning should adapt to minimize costs as the ratio of network and storage prices varies. While similar, there are differences between this problem and the typical evaluation of cleaning policies for a typical log-structured ?le system: instead of a ?xed disk size and a goal to minimize I/O, we have no ?xed limits but want to minimize monetary cost. We use the prices for Amazon S3 as an initial point in the pricing space. As of January 2009, these prices are (in US dollars):

234

7th USENIX Conference on File and Storage Technologies

USENIX Association

Cost Increase vs. Optimal (%)

2 1.5 1 0.5 0

Estimated Optimal Cleaning Threshold

64 MB Segments 16 MB Segments 4 MB Segments 1 MB Segments

620 618
Cost ($)

1 0.8 0.6 0.4 0.2 0

Fileserver User

616 614 612 610

0

0.2 0.4 0.6 0.8 Cleaning Threshold

1

0.1

1 Storage / Network Cost Ratio

10

Figure 6: Costs in US dollars for backups in the ?leserver assuming Amazon S3 prices. Costs for the user trace differ in absolute values but are qualitatively similar. monthly costs for the ?leserver and user traces, respectively. Thus, changes to the storage ef?ciency will have a more substantial effect on total cost than changes in bandwidth ef?ciency. We also note that the absolute costs for the home backup scenario are very low, indicating that Amazon’s pricing model is potentially quite reasonable for consumers: even for home users with an order of magnitude more data to backup than our user workload, yearly ongoing costs are roughly US$50. Whereas Figure 3 explored the parameter space of cleaning thresholds and segment sizes in terms of resource overhead, Figure 6 shows results in terms of overall cost for backing up the ?leserver trace. These results show that using a simple storage interface for remote backup also incurs very low additional monetary cost than optimal backup, from 0.5–2% for the ?leserver trace depending on the parameters, and as low as about 5% in the user trace. When evaluated in terms of monetary costs, though, the choices of cleaning parameters change compared to the parameters in terms of resource usage. The cleaning threshold providing the minimum cost is smaller and less aggressive (threshold = 0.4) than in terms of resource usage (threshold = 0.6). However, since overhead is not overly sensitive to the cleaning threshold, Cumulus still provides good performance even if the cleaning threshold is not tuned optimally. Furthermore, in contrast to resource usage, decreasing segment size does not always decrease overall cost. At some point—in this case between 1–4 MB—decreasing segment size increases overall cost due to the per-?le pricing. We do not evaluate segment sizes less than 1 MB for the ?leserver trace since, by 1 MB, smaller segments are already a loss. The results for the user workload, although not shown, are qualitatively similar, with a segment size of 0.5 MB to 1 MB best. The pricing model of Amazon S3 is only one point in the pricing space. As a ?nal cost experiment, we ex-

Figure 7: How the optimal threshold for cleaning changes as the relative cost of storage vs. network varies. plore how cleaning should adapt to changes in the relative price of storage versus network. Figure 7 shows the optimal cleaning threshold for the ?leserver and user workloads as a function of the ratio of storage to network cost. The storage to network ratio measures the relative cost of storing a gigabyte of data for a month and uploading a gigabyte of data. Amazon S3 has a ratio of 1.5. In general, as the cost of storage increases, it becomes advantageous to clean more aggressively (the optimal cleaning threshold increases). The ideal threshold stabilizes around 0.5–0.6 when storage is at least ten times more expensive than network upload, since cleaning too aggressively will tend to increase storage costs.

5.4

Prototype Evaluation

In our ?nal set of experiments, we compare the overhead of the Cumulus prototype implementation with other backup systems. We also evaluate the sensitivity of compression on segment size, the overhead of metadata in the implementation, the performance of sub-?le incrementals and restores, and the time it takes to upload data to a remote service like Amazon S3. 5.4.1 System Comparisons First, we provide some results from running our Cumulus prototype and compare with two existing backup tools that also target Amazon S3: Jungle Disk and Brackup. We use the complete ?le contents included in the user trace to accurately measure the behavior of our full Cumulus prototype and other real backup systems. For each day in the ?rst three months of the user trace, we extract a full snapshot of all ?les, then back up these ?les with each of the backup tools. We compute the average cost, per month, broken down into storage, upload bandwidth, and operation count (?les created or modi?ed). We con?gured Cumulus to clean segments with less then 60% utilization on a weekly basis. We evaluate Brackup with two different settings. The ?rst uses the merge_files_under=1kB option to only aggregate ?les if those ?les are under 1 KB in size

USENIX Association

7th USENIX Conference on File and Storage Technologies

235

System Jungle Disk Brackup (default) Brackup (aggregated) Cumulus

Storage ≈ 2 GB $0.30 1.340 GB $0.201 1.353 GB $0.203 1.264 GB $0.190

Upload 1.26 GB $0.126 0.760 GB $0.076 0.713 GB $0.071 0.465 GB $0.047

Operations 30000 $0.30 9027 $0.090 1403 $0.014 419 $0.004

200 MB/month of the upload cost. The cost in Cumulus is lower since Cumulus can re-use metadata. Comparing storage requirements of Cumulus with the average size of a full backup with the venerable tar utility, both are within 1%: storage overhead in Cumulus is roughly balanced out by gains achieved from deduplication. Using duplicity as a proxy for near-optimal incremental backups, in a test with two months from the user trace Cumulus uploads only about 8% more data than is needed. Without sub-?le incrementals in Cumulus, the ?gure is closer to 33%. The Cumulus prototype thus shows that a service with a simple storage interface can achieve low overhead, and that Cumulus can achieve a lower total cost than other existing backup tools targeting S3. While perhaps none of the systems are yet optimized for speed, initial full backups in Brackup and Jungle Disk were both notably slow. In the tests, the initial Jungle Disk backup took over six hours, Brackup (to local disk, not S3) took slightly over two hours, and Cumulus (to S3) approximately 15 minutes. For comparison, simply archiving all ?les with tar to local disk took approximately 10 minutes. For incremental backups, elapsed times for the tools were much more comparable. Jungle Disk averaged 248 seconds per run archiving to S3. Brackup averaged 115 seconds per run and Cumulus 167 seconds, but in these tests each were storing snapshots to local disk rather than to Amazon S3. 5.4.2 Segment Compression

Table 4: Cost comparison for backups based on replaying actual ?le changes in the user trace over a three month period. Costs for Cumulus are lower than those shown in Table 3 since that evaluation ignored the possible bene?ts of compression and sub-?le incrementals, which are captured here. Values are listed on a per-month basis. (this setting is recommended). Since this setting still results in many small ?les (many of the small ?les are still larger than 1 KB), a “high aggregation” run sets merge_files_under=16kB to capture most of the small ?les and further reduce the operation count. Brackup includes the digest database in the ?les backed up, which serves a role similar to the database Cumulus stores locally. For fairness in the comparison, we subtract the size of the digest database from the sizes reported for Brackup. Both Brackup and Cumulus use gpg to encrypt data in the test; gpg compresses the data with gzip prior to encryption. Encryption is enabled in Jungle Disk, but no compression is available. In principle, we would expect backups with Jungle Disk to be near optimal in terms of storage and upload since no space is wasted due to aggregation. But, as a tradeoff, Jungle Disk will have a much higher operation count. In practice, Jungle Disk will also suffer from a lack of de-duplication, sub-?le incrementals, and compression. Table 4 compares the estimated backup costs for Cumulus with Jungle Disk and Brackup. Several key points stand out in the comparison: ? Storage and upload requirements for Jungle Disk are larger, owing primarily to the lack of compression. ? Except in the high aggregation case, both Brackup and Jungle Disk incur a large cost due to the many small ?les stored to S3. The per-?le cost for uploads is larger than the per-byte cost, and for Jungle Disk signi?cantly so. ? Brackup stores a complete copy of all ?le metadata with each snapshot, which in total accounts for 150–

Next we isolate the effectiveness of compression at reducing the size of the data to back up, particularly as a function of segment size and related settings. We used as a sample the full data contained in the ?rst day of the user trace: the uncompressed size is 1916 MB, the compressed tar size is 1152 MB (factor of 1.66), and ?les individually compressed total 1219 MB (1.57×), 5.8% larger than whole-snapshot compression. When aggregating data together into segments, we found that larger input segment sizes yielded better compression, up to about 300 KB when using gzip and 1– 2 MB for bzip2 where compression ratios leveled off. 5.4.3 Metadata

The Cumulus prototype stores metadata for each ?le in a backup snapshot in a text format, but after compression the format is still quite ef?cient. In the full tests on the user trace, the metadata for a full backup takes roughly 46 bytes per item backed up. Since most items include a 20-byte hash value which is unlikely to be compressible, the non-checksum components of the metadata average under 30 bytes per ?le.

236

7th USENIX Conference on File and Storage Technologies

USENIX Association

File size Compressed size Cumulus size Size overhead rdiff delta Cumulus delta Delta overhead

File A 4.860 MB 1.547 MB 5.190 MB 235% 1.421 MB 1.527 MB 7%

File B 5.890 MB 2.396 MB 3.081 MB 29% 122 KB 181 KB 48%

mentals is still much smaller than a full copy of the ?le. When large parts of the ?le change daily (File A), the ef?ciency of Cumulus approaches that of rdiff. 5.4.5 Upload Time As a ?nal experiment, we consider the time to upload to a remote storage service. Our Cumulus prototype is capable of uploading snapshot data directly to Amazon S3. To simplify matters, we evaluate upload time in isolation, rather than as part of a full backup, to provide a more controlled environment. Cumulus uses the boto [4] Python library to interface with S3. As our measurements are from one experiment from a single computer (on a university campus network), they should not be taken as a good measure of the overall performance of S3. For large ?les—a megabyte or larger— uploads proceed at a maximum rate of about 800 KB/s. According to our results there is an overhead equivalent to a latency of roughly 100 ms per upload, and for small ?les this dominates the actual time for data transfer. It is thus advantageous to upload data in larger segments, as Cumulus does. More recent tests indicate that speeds may have improved. The S3 protocol, based on HTTP, does not support pipelining multiple upload requests. Multiple uploads in parallel could reduce overhead somewhat. Still, it remains bene?cial to perform uploads in larger units. For perspective, assuming the maximum transfer rates above, ongoing backups for the ?leserver and user workloads will take on average 3.75 hours and under a minute, respectively. Overheads from cleaning will increase these times, but since network overheads from cleaning are generally small, these upload times will not change by much. For these two workloads, backup times are very reasonable for daily snapshots. 5.4.6 Restore Time To completely restore all data from one of the user snapshots takes approximately 11 minutes, comparable to but slighly faster than the time required for an initial full backup. When restoring individual ?les from the user dataset, almost all time is spent extracting and parsing metadata—there is a ?xed cost of approximately 24 seconds to parse the metadata to locate requested ?les. Extracting requested ?les is relatively quick, under a second for small ?les. Both restore tests were done from local disk; restoring from S3 will be slower by the time needed to download the data.

Table 5: Comparison of Cumulus sub-?le incrementals with an idealized system based on rdiff, evaluated on two sample ?les from the user trace. Metadata logs can be stored incrementally: new snapshots can reference the portions of old metadata logs that are not modi?ed. In the full user trace replay, a full metadata log was written to a snapshot weekly. On days where only differences were written out, though, the average metadata log delta was under 2% of the size of a full metadata log. Overall, across all the snapshots taken, the data written out for ?le metadata was approximately 5% of the total size of the ?le data itself. 5.4.4 Sub-File Incrementals To evaluate the support for sub-?le incrementals in Cumulus, we make use of ?les extracted from the user trace that are frequently modi?ed in place. We extract ?les from a 30-day period at the start of the trace. File A is a frequently-updated Bayesian spam ?ltering database, about 90% of which changes daily. File B records the state for a ?le-synchronization tool (unison), of which an average of 5% changes each day—however, unchanged content may still shift to different byte offsets within the ?le. While these samples do not capture all behavior, they do represent two distinct and notable classes of sub?le updates. To provide a point of comparison, we use rdiff [14] to generate an rsync-style delta between consecutive ?le versions. Table 5 summarizes the results. The size overhead measures the storage cost of sub?le incrementals in Cumulus. To reconstruct the latest version of a ?le, Cumulus might need to read data from many past versions, though cleaning will try to keep this bounded. This overhead compares the average size of a daily snapshot (“Cumulus size”) against the average compressed size of the ?le backed up. As ?le churn increases overhead tends to increase. The delta overhead compares the data that must be uploaded daily by Cumulus (“Cumulus delta”) against the average size of patches generated by rdiff (“rdiff delta”). When only a small portion of the ?le changes each day (File B), rdiff is more ef?cient than Cumulus in representing the changes. However, sub-?le incrementals are still a large win for Cumulus, as the size of the incre-

6

Conclusions

It is fairly clear that the market for Internet-hosted backup service is growing. However, it remains unclear

USENIX Association

7th USENIX Conference on File and Storage Technologies

237

what form of this service will dominate. On one hand, it is in the natural interest of service providers to package backup as an integrated service since that will both create a “stickier” relationship with the customer and allow higher fees to be charged as a result. On the other hand, given our results, the customer’s interest may be maximized via an open market for commodity storage services (such as S3), increasing competition due to the low barrier to switching providers, and thus driving down prices. Indeed, even today integrated backup providers charge between $5 and $10 per month per user while the S3 charges for backing up our test user using the Cumulus system was only $0.24 per month. (For example, Symantec’s Protection Network charges $9.99 per month for 10GB of storage and EMC’s MozyPro service costs $3.95 + $0.50/GB per month per desktop.) Moreover, a thin-cloud approach to backup allows one to easily hedge against provider failures by backing up to multiple providers. This strategy may be particularly critical for guarding against business risk—a lesson that has been learned the hard way by customers whose hosting companies have gone out of business. Providing the same hedge using the integrated approach would require running multiple backup systems in parallel on each desktop or server, incurring redundant overheads (e.g., scanning, compression, etc.) that will only increase as disk capacities grow. Finally, while this paper has focused on an admittedly simple application, we believe it identi?es a key research agenda in?uencing the future of “cloud computing”: can one build a competitive product economy around a cloud of abstract commodity resources, or do underlying technical reasons ultimately favor an integrated service-oriented infrastructure?

[4] boto: Python interface to Amazon Web Services. http:// code.google.com/p/boto/. [5] C OX , L. P., M URRAY, C. D., AND N OBLE , B. D. Pastiche: Making backup cheap and easy. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI) (2002), USENIX, pp. 285–298. [6] D OUCEUR , J. R., A DYA , A., B OLOSKY, W. J., S IMON , D., AND T HEIMER , M. Reclaiming space from duplicate ?les in a serverless distributed ?le system. Technical Report MSR-TR2002-30. [7] E SCOTO , B. rdiff-backup. rdiff-backup/. http://www.nongnu.org/ Duplicity. http://

[8] E SCOTO , B., AND L OAFMAN , K. duplicity.nongnu.org/.

[9] F ITZPATRICK , B. Brackup. http://code.google.com/ p/brackup/, http://brad.livejournal.com/tag/ brackup. [10] FUSE: Filesystem in sourceforge.net/. userspace. http://fuse.

[11] H ENSON , V. An analysis of compare-by-hash. Proceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems (2003). [12] H ENSON , V. The code monkey’s guide to cryptographic hashes for content-based addressing, Nov. 2007. http://www. linuxworld.com/news/2007/111207-hash.html. [13] Jungle disk. http://www.jungledisk.com/. [14] librsync. http://librsync.sourcefrog.net/. ` RES , D. A [15] M UTHITACHAROEN , A., C HEN , B., AND M AZI E low-bandwidth network ?le system. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP) (2001), ACM, pp. 174–187. [16] P RESTON , W. C. Backup & Recovery. O’Reilly, 2006. [17] Q UINLAN , S., AND D ORWARD , S. Venti: a new approach to archival storage. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST) (2002), USENIX Association. [18] ROSENBLUM , M., AND O USTERHOUT, J. K. The design and implementation of a log-structured ?le system. ACM Trans. Comput. Syst. 10, 1 (1992), 26–52. [19] rsnapshot. http://www.rsnapshot.org/. [20] Sqlite. http://www.sqlite.org/. [21] S UMMERS , B., AND W ILSON , C. Box backup. http://www. boxbackup.org/. [22] T RIDGELL , A. Ef?cient Algorithms for Sorting and Synchronization. PhD thesis, Australian National University, Feb. 1999. [23] WANG , J., AND H U , Y. WOLF–A novel reordering write buffer to boost the performance of log-structured ?le systems. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST) (2002), USENIX Association. [24] W EATHERSPOON , H., E ATON , P., C HUN , B.-G., AND K UBI ATOWICZ , J. Antiquity: Exploiting a secure log for wide-area distributed storage. In EuroSys ’07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007 (New York, NY, USA, 2007), ACM, pp. 371–384. [25] W HEELER , D. A. SLOCCount. http://www.dwheeler. com/sloccount/. [26] Z HU , B., L I , K., AND PATTERSON , H. Avoiding the disk bottleneck in the data domain deduplication ?le system. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST) (2008), USENIX Association, pp. 269–282.

7

Acknowledgments

The authors would like to thank Chris X. Edwards and Brian Kantor for assistance in collecting ?leserver traces and other computing support. We would also like to thank our shepherd Niraj Tolia and the anonymous reviewers for their time and insightful comments regarding Cumulus and this paper. This work was supported in part by the National Science Foundation grant CNS0433668 and the UCSD Center for Networked Systems. Vrable was further supported in part by a National Science Foundation Graduate Research Fellowship.

References
[1] AGRAWAL , N., B OLOSKY, W. J., D OUCEUR , J. R., AND L ORCH , J. R. A ?ve-year study of ?le-system metadata. ACM Trans. Storage 3, 3 (2007), 9. [2] The Advanced Maryland Automatic Network Disk Archiver. http://www.amanda.org/. [3] A MAZON W EB S ERVICES. Amazon Simple Storage Service. http://aws.amazon.com/s3/.

238

7th USENIX Conference on File and Storage Technologies

USENIX Association


相关文章:
2009-2013英语
暂无评价 11页 1下载券 河南2009-2013中考英语范... 3页 2下载券...57.How did Ma Shwe manage to save her calf from the fast-flowing water...
2009年重庆中考英语试题及答案
fast food C. restaurants in the world D. Chinese food B Have you seen ...newspaper 〔机密〕2009 年 6 月 15 日前 重庆市 2009 年初中毕业生学业暨...
2009年重庆市中考英语试题及答案
重庆市 2009 年初中毕业暨高中招生考试 英语试卷 (全卷共八个大题 满分:150...fast food C. restaurants in the world D. Chinese food B Have you seen ...
fast物流系统UML系统建模文档
fast物流系统UML系统建模文档_计算机软件及应用_IT/计算机_专业资料。1 引言 .....2009 年 2 月 13 日 基于二维条码技术的物流管理解决方案.doc 2009 年 2 ...
2009年重庆市中考真题—英语
〔机密〕2009 年 6 月 15 日前 重庆市 2009 年初中毕业暨高中招生考试 ...The English people are the world’s biggest fans of fast food, while the...
【免费】2009年重庆市中考英语试题及答案
【免费】2009年重庆市中考英语试题及答案_调查/报告_表格/模板_实用文档。2009 ...fast food D. Chinese food C. Have you seen the movie Changjiang 7 (CJ...
2009-2014重庆中考英语阅读理解
2009 年 IV. 阅读理解。 (每小题 2 分,共 30 分) A A large number of people in the world eat fast food. Whenever you go into a fast food ...
2009年重庆市中考英语试卷(含答案)
重庆市 2009 年初中毕业暨高中招生考试 英语试卷 满分: 考试时间: 分钟) (全...The English people are the world’s biggest fans of fast food, while the...
2009年6月英语四六级考试考前必看范文
3 2009 年 6 月英语四级考前预测范文 Directions: For this part, you are allowed 30 minutes to write a composition on the topic Western Fast Food. You...
2009-2013五年中考英语试题合集含答案
重庆市 2009 年初中毕业暨高中招生考试 英语试卷 (全卷共八个大题 满分:150 ... proud of their delicious and high-class cuisine, don’t like fast food....
更多相关标签:
cumulus | cumulus 18 | cumulus nimbus区别 | asics cumulus | gel cumulus 17 | cumulus 17 | asics cumulus 17 | asics cumulus 18 |