How I Moved 5% of All Objects in S3 with Jets3t

This is a true story about a lot of data. The cast of characters is as follows:

The Protagonist: Me.

The Hero: Jets3t, a Java library for using Amazon S3.

The Villain: Decisions made long ago, for forgotten reasons.

Innocent Bystanders: My client.

Once Upon a Time…

Amazon S3 is a great place to store media files and allows these files to be served directly from S3, instead of from your web server, thereby saving your server’s network and CPU for more important tasks. There are some gotchas with serving files directly from S3, and it is these gotchas that had my client locked in to paying for bandwidth and CPU to serve media files directly from his web server.

You see, a few years ago when my client first created their S3 bucket, they named it media.example.com. Public objects in that bucket could be accessed via the URL http://s3.amazonaws.com/media.example.com/objectKey or via the Virtual Host style URL http://media.example.com.s3.amazonaws.com/objectKey. If you’re just serving images via HTTP then this can work for you. But you might have a good reason to convince the browser that all the media is being served from your domain media.example.com (for example, when using Flash, which requires an appropriately configured crossdomain.xml). Using a URL that lives at s3.amazonaws.com or a subdomain of that host will not suffice for these situations.

Luckily, S3 lets you set up your DNS in a special manner, convincing the world that the same object lives at the URL http://media.example.com/objectKey. All you need to do is to set up a DNS CNAME alias pointing media.example.com to media.example.com.s3.amazonaws.com. The request will be routed to S3, which will look at the HTTP Host header and discover the bucket name media.example.com.

So what’s the problem? That’s all great for a bucket whose name works in DNS. But it won’t work for a bucket named Bucket.example.com, because DNS hostnames are case-insensitive while S3 bucket names are not – there are limitations on a bucket’s name if you want to use the DNS alias. This is where we reveal a secret: the bucket was not really named media.example.com. For some reason nobody remembers, the bucket was named Media.example.com – with a capital letter that a DNS hostname can never convey. This makes all the difference in the world, because S3 cannot serve this bucket via the Virtual Host method – you get a NoSuchBucket error when you try to access http://Media.example.com.s3.amazonaws.com/objectKey (equivalent to http://Media.example.com/objectKey with the appropriate DNS CNAME in place).

As a workaround my client developed an application that dynamically loaded the media onto the server and served it directly from there. This server serviced media.example.com, and it would essentially do the following for each requested file:

  1. Do we already have this objectKey on our local filesystem? If yes, go to step 3.
  2. Fetch the object from S3 via http://s3.amazonaws.com/Media.example.com/objectKey and save it to the local filesystem.
  3. Serve the file from the local filesystem.

This workaround allowed the client to release URLs that looked correct, but it required a separate server for the job. It cost extra time (whenever there was a cache miss) and money (to operate the server).

The challenge? To remove the need for this caching server and allow the URLs to be served directly from S3 via media.example.com.

Just Move the Objects, Right?

It might seem obvious: Why not simply move the objects to a correctly-named bucket? Turns out that’s not quite so simple to do in this case.

Obviously, if I was dealing with a few hundred, thousand, or even tens of thousands of objects, I could use a GUI tool such as CloudBerry Explorer or the S3Fox Organizer Firefox Extension. But this client is a popular web site, and has been storing media in the bucket for a few years already. They had 5 billion objects in the bucket (which is 5% of the total number of objects in S3). These tools crashed upon viewing the bucket. So, no GUI for you.

S3 is a hosted object store system. Why not just use its MOVE command (via the API) to move the objects from the wrong bucket to the correctly-named bucket? Well, it turns out that S3 has no MOVE command.

Thankfully, S3 has a COPY command which allows you to copy an object on the server side, without downloading the object’s contents and uploading them again to the new location. Using some creative programming you can put together a COPY and a DELETE (performed only if the COPY succeeded!) to simulate a MOVE. I tried using the boto Python library but it choked on manipulating any object in the bucket named Media.example.com – even though it’s a legal name, it’s just not recommended – so I couldn’t use this tool. The Java-based Jets3t library was able to handle this unfortunate bucket name just fine, and it also provides a convenience method to move objects via COPY and DELETE. The objects in this bucket are immutable, so we don’t need to worry about consistency.
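If you’re curious what that simulated MOVE looks like at the API level, here’s a minimal sketch using Jets3t’s lower-level calls. The moveObject convenience method used in the full listing below wraps the same copy-then-delete sequence; method names are per Jets3t 0.7.x, so check your version’s javadoc.

import org.jets3t.service.S3Service;
import org.jets3t.service.acl.AccessControlList;
import org.jets3t.service.model.S3Object;

public class SimulatedMove {
    // Sketch only: a MOVE simulated as a server-side COPY plus a DELETE.
    static void move(S3Service s3, String srcBucket, String destBucket, String key)
            throws Exception {
        S3Object dest = new S3Object(key);
        dest.setAcl(AccessControlList.REST_CANNED_PUBLIC_READ);
        // Server-side COPY: S3 copies the bytes itself, nothing is downloaded.
        s3.copyObject(srcBucket, key, destBucket, dest, false);
        // DELETE the source only if the copy did not throw.
        s3.deleteObject(srcBucket, key);
    }
}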

So I’m all set with Jets3t.

Or so I thought.

First Attempt: Make a List

My first attempt was to:

  1. List all the objects in the bucket and put them in a database.
  2. Run many client programs that requested the “next” object key from the database and deleted the entry from the database when it was successfully moved to the correctly-named bucket.

This approach would provide a way to make sure all the objects were moved successfully.

Unfortunately, listing so many objects took too long – I allowed a process to list the bucket’s contents for a full 24 hours before killing it. I don’t know how far it got, but I didn’t feel like waiting around for it to finish dumping its output to a file, then waiting some more to import the list into a database.

Second Attempt: Make a Smaller List

I thought about the metadata I had: The objects in the bucket all had object keys with a particular structure:

/binNumber/oneObjectKey

binNumber was a number from 0 to 4.5 million, and each binNumber served as the prefix for approximately 1200 objects (which works out to 5.4 billion objects total in the bucket). The names of these objects were essentially random letters and numbers after the binNumber/ component. S3 has a “list objects with this prefix” method: it returns the object keys that begin with a specific prefix – which is perfect for my needs, since each such listing is of very manageable size.

So I coded up something quick in Java using Jets3t. Here’s the initial code snippet:

import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

import org.jets3t.service.S3Service;
import org.jets3t.service.S3ServiceException;
import org.jets3t.service.acl.AccessControlList;
import org.jets3t.service.impl.rest.httpclient.RestS3Service;
import org.jets3t.service.model.S3Bucket;
import org.jets3t.service.model.S3Object;
import org.jets3t.service.security.AWSCredentials;

public class MoveObjects {
    private static final String AWS_ACCESS_KEY_ID = ....;
    private static final String AWS_SECRET_ACCESS_KEY = ....;
    private static final String SOURCE_BUCKET_NAME = "Media.example.com";
    private static final String DEST_BUCKET_NAME = "media.example.com";

    public static void main(String[] args) throws Exception {
        AWSCredentials awsCredentials = new AWSCredentials(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY);
        S3Service restService = new RestS3Service(awsCredentials);
        S3Bucket sourceBucket = restService.getBucket(SOURCE_BUCKET_NAME);
        final String delimiter = "/";

        String[] prefixes = new String[...];
        for (int i = 0; i < prefixes.length; ++i) {
            // fill the list of binNumbers from the command-line args (not shown)
            prefixes[i] = String.valueOf(...);
        }

        ExecutorService tPool = Executors.newFixedThreadPool(32);
        long delay = 50;
        for (String prefix : prefixes) {
            // list all the objects sharing this binNumber prefix
            S3Object[] sourceObjects = restService.listObjects(sourceBucket, prefix + delimiter, delimiter);
            if (sourceObjects != null && sourceObjects.length > 0) {
                System.out.println(" At key " + sourceObjects[0].getKey() + ", this many: " + sourceObjects.length);
                for (int i = 0; i < sourceObjects.length; ++i) {
                    final S3Object sourceObject = sourceObjects[i];
                    final String sourceObjectKey = sourceObject.getKey();
                    sourceObject.setAcl(AccessControlList.REST_CANNED_PUBLIC_READ);
                    Mover mover = new Mover(restService, sourceObject, sourceObjectKey);
                    while (true) {
                        try {
                            tPool.execute(mover);
                            delay = 50;
                            break;
                        } catch (RejectedExecutionException r) {
                            // the executor can't accept the task right now: back off and retry
                            System.out.println("Queue full: waiting " + delay + " ms");
                            Thread.sleep(delay);
                            delay += 50;
                        }
                    }
                }
            }
        }

        tPool.shutdown();
        tPool.awaitTermination(360000, TimeUnit.SECONDS);
        System.out.println(" Completed!");
    }

    private static class Mover implements Runnable {
        final S3Service restService;
        final S3Object sourceObject;
        final String sourceObjectKey;

        Mover(final S3Service restService, final S3Object sourceObject, final String sourceObjectKey) {
            this.restService = restService;
            this.sourceObject = sourceObject;
            this.sourceObjectKey = sourceObjectKey;
        }

        public void run() {
            Map moveResult = null;
            try {
                // server-side COPY to the destination bucket, then DELETE from the source
                moveResult = restService.moveObject(SOURCE_BUCKET_NAME, sourceObjectKey,
                        DEST_BUCKET_NAME, sourceObject, false);
                if (moveResult.containsKey("DeleteException")) {
                    System.out.println("Error: " + sourceObjectKey);
                }
            } catch (S3ServiceException e) {
                System.out.println("Error: " + sourceObjectKey + " EXCEPTION: " + e.getMessage());
            }
        }
    }
}

The code uses an Executor to control a pool of threads, each of which is given a single object to move, encapsulated in a Mover task. All objects with a given prefix (binNumber) are listed and then added to the Executor’s pool to be moved. The actual credential values and the command-line parsing that fills in the binNumber prefixes are not shown.

We need to be concerned that the thread pool will fill up faster than we can handle the operations we’re enqueueing, so we have backoff-and-retry logic in that code. But, notice we don’t care if a particular object’s move operation fails. This is because we will run the same program again a second time, after covering all the binNumber prefixes, to catch any objects that have been left behind (and a third time, too – until no more objects are left in the source bucket).

I ran this code on an EC2 m1.xlarge instance in two simultaneous processes, each of which was given half of the binNumber prefixes to work with. I settled on 32 threads in the thread pool after a few experiments showed this number ran the fastest. I made sure to set the proper number of underlying HTTP connections for Jets3t to use, with these arguments: -Ds3service.max-thread-count=32 -Dhttpclient.max-connections=60 . Things were going well for a few hours.

Third Attempt: Make it More Robust

After a few hours I noticed that the rate of progress was slowing. I didn’t have exact numbers, but I saw that things were just taking longer in minute 350 than they had taken in minute 10. I could have taken on the challenge of debugging long-running, multithreaded code. Or I could hack in a workaround.

The workaround I chose was to force the program to terminate every hour and have it restarted. I added the following code to the main method:

    // exit every hour
    Timer t = new Timer(true);
    TimerTask tt = new TimerTask() {
    	public void run() {
    		System.out.println("Killing myself!");
    		System.exit(42);
    	}
    };
    final long dieMillis = 3600 * 1000;
    t.schedule(tt, dieMillis);

And I wrapped the program in a “forever” wrapper script:

#! /bin/bash

while true; do
	DATE=`date`
	echo $DATE: $0: launching $*
	$* 2>&1
done

This script is invoked as follows:

ARGS=... ./forever.sh nohup java -Ds3service.max-thread-count=32 -Dhttpclient.max-connections=60 -classpath bin/:lib/jets3t-0.7.2.jar:lib/commons-logging-1.1.1.jar:lib/commons-httpclient-3.1.jar:lib/commons-codec-1.3.jar com.orchestratus.s3.MoveObjects $ARGS >> nohup.out 2>&1 &

Whenever the Java program terminates, the forever wrapper script re-launches it with the same arguments. This works properly because the only objects that will be left in the bucket will be those that haven’t been deleted yet. Eventually, this ran to completion and the program would start, check all its binNumber prefixes, find nothing, exit, restart, find nothing, exit, restart, etc.

The whole process took 5 days to completely move all objects to the new bucket. Then I gave my client the privilege of deleting the Media.example.com bucket.

Lessons Learned

Here are some important lessons I learned and reinforced through this project.

Use the metadata to your benefit

Sometimes the only thing you know about a problem is its shape, not its actual contents. In this case I knew the general structure of the object keys, and this was enough to go on even if I couldn’t discover every object key a priori. This is a key principle when working with large amounts of data: the metadata is your friend.

Robustness is a feature

It took a few iterations until I got to a point where things were running consistently fast. And it took some advanced planning to design a scheme that would gracefully tolerate failure to move some objects. But without these features I would have had to manually intervene when problems arose. Don’t let intermittent failure delay a long-running process.

Sometimes it doesn’t pay to debug

I used an ugly hack – forcing the process to restart every hour – instead of debugging the actual underlying problem causing it to gradually slow down. For this code, which was one-off code that I wrote for my specific circumstances, I decided this was a more effective approach than getting bogged down making it correct. It ran fast enough when brute-forced, so it didn’t need to be truly corrected.

Repeatability

I’ve been thinking about how someone would repeat my experiments and discover improvements to the techniques I employed. We could probably get by without actually copying and deleting the objects, rather we could perform two successive calls – perhaps to get different metadata headers. We’d need some public S3 bucket with many millions of objects in it to make a comparable test case. And we’d need an S3 account willing to let users play in it.

Any takers?


Elastic Load Balancing with Sticky Sessions

At long last, the most oft-requested feature for EC2’s Elastic Load Balancer is here: session affinity, also known as “sticky sessions”. What is session affinity? Why is this feature in such high demand? How can it be used with existing applications? Let’s take a look at these questions. But first, let’s explore what a session is – then we’ll cover why we want it to be sticky, and what ELB’s sticky session limitations are. [To skip directly to an explanation of how to use ELB sticky sessions, go toward the bottom of the article.]

What is a Session?

A session is a way to get your application involved in a long-lasting conversation with a particular client. Without a session, a conversation between your application and a client would look like something straight out of the movie Memento. It would go like this:

Life Without Sessions

Client: Hi, I’d like to see /products/awesomeDoohickey.html

Application: I don’t know who you are. Please go here to login first: /login

Client: OK, I’d like to see /login

Application: Here it is: “…”

Client: Thanks. Here’s the filled in login form.

Application: Thanks for logging in. Where do you want to go?

Client: I’d like to see /products/awesomeDoohickey.html

Application: I don’t know who you are. Please go here to login first: /login

Client: >Sigh< OK, I’d like to see /login

Application: Happily! Here it is: “…”

Client: Here’s the filled in login form.

Application: Thanks for logging in. Where do you want to go?

Client: Show me /products/awesomeDoohickey.html already!

Application: I don’t know who you are. Please go here to login first: /login

Client: *$#%& this!

The application can’t remember who the client is – it has no context to process each request as part of a conversation. The client gets so frustrated he starts thinking he’s living in an Adam Sandler movie.

On a technical level: Each HTTP request-response pair between the client and application happens (most often) on a different TCP connection. This is especially true when a load balancer sits between the client and the application. So the application can’t use the TCP connection as a way to remember the conversational context. And, HTTP itself is stateless: any request can be sent at any time, in any sequence, regardless of the preceding requests. Sure, the application may demand a particular pattern of interaction – like logging in before accessing certain resources – but that application-level state is enforced by the application, not by HTTP. So HTTP cannot be relied on to maintain conversational context between the client and the application.

There are two ways to solve this problem of forgetting the context. The first is for the client to remind the application of the context every time he requests something: “My name is Henry Whatsisface, I have these items in my shopping cart (…), I got here via this affiliate (…), yada yada yada… and I’d like to see /products/awesomeDoohickey.html”. No sane client would ever agree to interact with an application that needed to be sent the entire context at every stage of the conversation. It’s burdensome for the client, it’s difficult to maintain for the application, and it’s expensive (in bandwidth) for both of them. Besides, the application usually maintains the conversational state, not the client. So it’s wrong to require the client to send the entire conversation context along with each request.

The accepted solution is to have the application remember the context by creating an associated memento. This memento is given to the client and returned to the application on subsequent requests. Upon receiving the memento the application looks for the associated context, and – voila – discovers it. Thus, the conversation is preserved.

One way of providing a memento is by putting it into the URL. It looks really ugly when you do this: http://www.example.com/products/awesomeDoohickey.html?sessionID=0123456789ABCDEFGH

More commonly, mementos are provided via cookies, which all browsers these days support. Cookies are placed within the HTTP request so they can be discovered by the application even if a load balancer intervenes.

Here’s what that conversation looks like with cookies:

Life With Sessions, Take 1

Client: Hi, I’d like to see /products/awesomeDoohickey.html

Application: I don’t know who you are. Please go here to login first: /login

Client: OK, I’d like to see /login

Application: Here it is: “…”

Client: Thanks. Here’s the filled in login form.

Application: Thanks for logging in. Here’s a cookie. Where do you want to go?

Client: I’d like to see /products/awesomeDoohickey.html and here’s my cookie.

Application: I know you – I’d recognize that cookie anywhere! Great, here’s that page: “…”

Client: I’d like to buy 5000 units. Here’s my cookie.

Much improved, yes?

A side point: most modern applications will provide a cookie earlier in the conversation. This allows the following more optimal conversation:

Life With Sessions, Take 2

Client: Hi, I’d like to see /products/awesomeDoohickey.html

Application: I don’t know who you are. Here’s a cookie. Take this login page and fill it out: “…”

Client: OK. Here’s the filled in login form. And here’s my cookie.

Application: I know you – I’d recognize that cookie anywhere! Thanks for logging in. I recall you wanted to see /products/awesomeDoohickey.html. Here it is: “…”

Client: I’d like to buy 5000 units. Here’s my cookie.

That’s about as optimized a conversation as you can have. Cookies make it possible.

What is Session Affinity (Sticky Sessions)? Why is it in High Demand?

When you only have one application server talking to your clients life is easy: all the session contexts can be stored in that application server’s memory for fast retrieval. But in the world of highly available and scalable applications there’s likely to be more than one application server fulfilling requests, behind a load balancer. The load balancer routes the first request to an application server, which stores the session context in its own memory and gives the client back a cookie. The next request from the same client will contain the cookie – and, if the same application server gets the request again, the application will rediscover the session context. But what happens if that client’s next request instead gets routed to a different application server? That application server will not have the session context in its memory – even though the request contains the cookie, the application can’t discover the context.

If you’re willing to modify your application you can overcome this problem. You can store the session context in a shared location, visible to all application servers: the database or memcached, for example. All application servers will then be able to look up the cookie in the central, shared location and discover the context. Until now, this was the approach you needed to take in order to retain the session context behind an Elastic Load Balancer.

But not all applications can be modified in this way. And not all developers want to modify existing applications. Instead of modifying the application, you need the load balancer to route the same client to the same application server. Once the client’s request has been routed to the correct application server, that application server can look up the session cookie in its own memory and recover the conversational context.

That’s what sticky sessions are: the load balancer routing the same client to the same application server. And that’s why they’re so important: If the load balancer supports sticky sessions then you don’t need to modify your application to remember client session context.

How to Use ELB with Sticky Sessions with Existing Applications

The key to managing ELB sticky sessions is the duration of the stickiness: how long the client should consistently be routed to the same back-end instance. Too short, and the session context will be lost, forcing the client to log in again. Too long, and the load balancer will not be able to distribute requests equally across the application servers.

Controlling the ELB Stickiness Duration

ELB supports two ways of managing the stickiness duration: either by specifying the duration explicitly, or by indicating that the stickiness expiration should follow the expiration of the application server’s own session cookie.

If your application server has an existing session cookie already, the simplest way to get stickiness is to configure your ELB to use the existing application cookie for determining the stickiness duration. PHP applications usually have a session cookie called PHPSESSID. Java applications usually have a session cookie called JSESSIONID. The expiration of these cookies is controlled by your application, and the stickiness expiration can be set to match as follows. Assuming your load balancer is called myLoadBalancer and it has an HTTP listener on port 80:

elb-create-app-cookie-stickiness-policy myLoadBalancer --cookie-name PHPSESSID --policy-name followPHPPolicy
elb-set-lb-policies-of-listener myLoadBalancer --lb-port 80 --policy-names followPHPPolicy

The above commands create a stickiness policy that says “make the session stickiness last as long as the cookie PHPSESSID does” and sets the load balancer to use that stickiness policy. Behind the scenes, the ELB’s session cookie will have the same lifetime as the PHPSESSID cookie provided by your application.

If your application does not have its own session cookie already, set your own stickiness duration for the load balancer, as follows:

elb-create-lb-cookie-stickiness-policy myLoadBalancer --policy-name fifteenMinutesPolicy --expiration-period 900
elb-set-lb-policies-of-listener myLoadBalancer --lb-port 80 --policy-names fifteenMinutesPolicy

These commands create a stickiness policy that says “make the session stickiness last for fifteen minutes” and sets the load balancer to use that stickiness policy. Behind the scenes, the ELB’s session cookie will have a lifetime of fifteen minutes.

What Can’t ELB Sticky Session Do?

Life is not all roses with ELB’s sticky session support. Here are some things it can’t do.

Update October 2010: ELB now supports SSL termination, and it can provide sticky sessions over HTTPS as well.

HTTPS

Remember how sticky sessions are typically provided via cookies? The cookie is inserted into the HTTP request by the client’s browser, and any server or load balancer that can read the request can recover the cookie. This works great for plain old HTTP-based communications.

With HTTPS connections the entire communications stream is encrypted. Only servers that have the proper decryption credentials can decipher the stream and discover the cookies. If the load balancer has the server’s SSL certificate then it can decrypt the stream. Because it does not have your application’s SSL certificate (and there’s no way to give it your certificate), ELB does not support HTTPS communications. If you need to support sticky sessions and HTTPS in EC2 then you can’t use ELB today. You need to use HAProxy or aiCache or another product that provides load balancing with session affinity and SSL termination.

Scaling-down-proof stickiness

What happens when you add or remove an application server to/from the load balancer? Depending on the stickiness implementation the load balancer may or may not be able to route requests to the same application servers as it did before the scaling event (caused, for example, by an AutoScaling trigger).

When scaling up (adding more application servers) ELB maintains stickiness of existing sessions. Only new connections will be forwarded to the newly-added application servers.

When scaling down (removing application servers), you should expect some of your clients to lose their sessions and require logins again. This is because some of the stored sessions were on the application server that is no longer servicing requests.

If you really want your sessions to persist even through scaling-down events, you need to go back to basics: your application will need to store the sessions independently, as it did before sticky sessions were supported. In this case, sticky session support can provide an added optimization, allowing you to cache the session locally on each application server and only retrieve it from the central session store (the DB?) if it’s not in the local cache. Such cache misses would happen when application servers are removed from the load balancing pool, but otherwise would not impact performance. Note that this technique can be used equally well with ELB and with other load balancers.
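To illustrate that read-through caching idea, here’s a minimal sketch – not tied to any particular framework, and SessionStore is a hypothetical interface standing in for your database or memcached:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CachingSessionLookup {

    /** Hypothetical interface over the central session store (database, memcached, etc.). */
    public interface SessionStore {
        SessionData load(String sessionId);
        void save(String sessionId, SessionData data);
    }

    /** Whatever your application keeps per session. */
    public static class SessionData { }

    private final SessionStore centralStore;
    private final ConcurrentMap<String, SessionData> localCache =
            new ConcurrentHashMap<String, SessionData>();

    public CachingSessionLookup(SessionStore centralStore) {
        this.centralStore = centralStore;
    }

    /**
     * With sticky sessions the same client usually lands on this server, so the
     * local cache hits almost every time. A cache miss (for example, after a
     * scaling-down event re-routes the client) falls back to the central store.
     */
    public SessionData getSession(String sessionId) {
        SessionData data = localCache.get(sessionId);
        if (data == null) {
            data = centralStore.load(sessionId); // slower, but authoritative
            if (data != null) {
                localCache.put(sessionId, data);
            }
        }
        return data;
    }
}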

With the introduction of sticky sessions for ELB, you – the application developer – can avoid modifying your application in order to retain session context behind a load balancer. The technical term for this is “a good thing”. Sticky sessions are, despite their limitations, a very welcome addition to ELB’s features.

Thanks to Jeff Barr of Amazon Web Services for providing feedback on drafts of this article.


EC2 Reserved Instance Availability Zone Problem? No Problem.

You may know that Amazon Web Services Reserved Instances have some gotchas. One of these gotchas is that the availability zone in which the reservation is purchased cannot be changed. So if you need to use an instance in a different availability zone than your reservation (e.g. if you hit InsufficientCapacity errors*), you’re out of luck – and you end up paying the on-demand price.

Today at CloudConnect I chatted with Jeremy Edberg of Reddit and Joe Arnold of Cloudscaling, and we had an insight: there’s a workaround. Read on for more details.

* AWS says this shouldn’t happen, but I’ve seen it happen.

The Availability Zone Cha-Cha

EC2 availability zones are different for each customer: my us-east-1a may be your us-east-1c. This makes it confusing when you discuss specific availability zones with other users. Eric Hammond published a technique to discover the correspondence between availability zones across accounts, but there has been limited practical use for this technique.

What if you could share reserved instances across accounts? What if you could do so without regard to the availability zone? Wouldn’t this help circumvent insufficient capacity errors?

Guess what: you can. EC2 Consolidated Billing allows you to set up a master account that consolidates the billing of many sub-accounts. If you use consolidated billing then your reservations from one sub-account’s us-east-1a are usable for another “sister” account’s us-east-1a. Here’s the quote:

Bob receives the cost benefit from Susan’s Reserved Instances only if he launches his instances in the Availability Zone where Susan purchased her Reserved Instances. For example, if Susan specified us-east-1a when she purchased her Reserved Instances, Bob must specify us-east-1a when he launches his instances in order to get the cost benefit on his Consolidated Bill. However, the actual locations of Availability Zones are independent from one account to another. For example, the us-east-1a Availability Zone for Bob’s account might be in a different location than for Susan’s account.

So, here’s the idea:

  1. Set up two accounts in AWS. You might want to use the same credit card for both of them to make your life easier.
  2. Use Eric Hammond’s technique to determine whether or not us-east-1a on the two accounts match up. If they do: make a new account. You’re looking for two accounts whose us-east-1a do not match.
  3. Repeat step 2, creating new accounts and making sure that us-east-1a does not match any of the other accounts. Do this until you have four accounts, all with different physical availability zones behind us-east-1a.
  4. Set up Consolidated Billing for those accounts.
  5. Launch an on-demand instance in one account’s us-east-1a. It doesn’t matter which, because the reserved instance pricing will apply.
  6. If that original instance has a problem and you need to launch another one in a different availability zone, just choose one of the other accounts. Launch the new instance in that account’s us-east-1a availability zone. Reserved instance pricing will apply to the new instance as soon as you terminate the original.

One caveat: I haven’t actually tried this. Please let me know if this helps you.


Top Ten Reasons Why You, a Developer, Should Come to CloudConnect

You’re a developer – or you want to be – writing software that uses cloud computing. Why should you care about the CloudConnect Conference? Here are the top ten reasons why it’s worth your while. Full disclosure: Nothing to disclose. I have no commercial connection to CloudConnect or its producers.

#10: Snacks

Enjoy short, sweet, thought-provoking keynotes about applications, metrics, SaaS, and more. I happen to know of three keynote presenters who will be particularly interesting: William Louth, William Vambenepe, and Guy Rosen. Come hear them challenge the way you think about cloud applications, environments, and infrastructure.

#9: Get Into It

You can attend crash courses on building cloud applications using popular cloud platforms, including Amazon Web Services, Google App Engine, Force.com, and more. I’ll be assisting Jinesh Varia, AWS Technology Evangelist, to deliver the Develop Your First Amazon Web Services Application workshop. If you want to develop for the cloud but don’t know where or how to start, the Developer Workshops are the place.

#8: Big Data

You can discover techniques for dealing with big data. When you have problems with big data, you have big data problems: how to store it, how to process it, and how to get the analytics you need from it. Brad Cross, co-founder and head of research at everyone’s favorite frustration-saving service for travelers, FlightCaster, is leading the Dealing with Big Data track, which will address these challenges.

#7: Enterprise

You can hear how to develop a strategy for migrating enterprise applications into the cloud. Randy Bias, founder of Cloudscaling, will help you make sense of what to move into the cloud, what to leave behind, and how to do it in a way that makes sense for your organization and budget. If you’re interested in the enterprise perspective on cloud computing adoption, the Migration Strategies track is for you.

#6: Standards

You can get up to speed on the many standardization efforts underway in the cloud ecosystem. Bob Marcus, who has been working to ensure all these efforts act in concert, is leading the Standards, Government, and Industry track. This is where you can hear about the current status and future plans of the standards, and you can let the representatives of various efforts know what’s important to you. Be heard, make a difference.

#5: APIs

You can learn about the myriad of cloud APIs out there: How can you write code that works on all these APIs? Why are there so many different ones? What were the API designers thinking when they architected them? What directions are planned for the future of these APIs? Come hear architects, implementers, and power users of the AWS, Rackspace, GoGrid, and other cloud APIs. Ask the architects tough questions. Hear how users overcome some of the issues. Let them all know what you think in the Writing Code for Many Clouds session.

#4: Design Patterns

You can learn how to think about cloud application architecture. The author of the seminal whitepaper Architecting for the Cloud: Best Practices (the same Jinesh Varia who’s giving the AWS workshop) will present Design Patterns for Cloud Computing Applications. If you want to learn the major architectural patterns that software designers use for cloud applications, this is the session for you.

#3: Deployment

You can hear all about some of the leading platform providers’ solutions (RightScale, EngineYard, GigaSpaces) and how they can help you deploy your applications in the cloud more easily. Plus, hear what actual customers of these platforms are doing, and what they wish these platforms could do. If you ever wondered how you’d get your application to live a healthy life in the cloud, the Deploying to the Cloud: The Care and Feeding of a Cloud Application session is the place to be.

#2: Orchestration

You can discover what many other cloud developers are realizing: that you need to know about orchestration. It’s the practice (art?) of keeping a composed set of machines and services working together dynamically to achieve certain goals (an SLA, perhaps). Hear the history of orchestration, learn the different layers into which orchestration must be integrated, and see examples of how you can do this in your applications – presented by the leading practitioners in the field. All this, plus ask your own questions at the Orchestration: The Next Frontier for Cloud Applications session.

#1: Me

You can meet me. I’m leading the Developing for the Cloud track, which includes the four sessions mentioned above. I’ll be happy to autograph your AWS Multifactor Authentication fob or your Secret Access Key.

An additional reason, for those of you who have read this far: You now have a discount code CNJRCC05 good for a free Expo pass or 40% off the conference sessions. Enjoy, and come introduce yourself.


Creating Consistent Snapshots of a Live Instance with XFS on a Boot-from-EBS AMI

Eric Hammond has taught us how to create consistent snapshots of EBS volumes. Amazon has allowed us to use EBS snapshots as AMIs, providing a persistent root filesystem. Wouldn’t it be great if you could use both of these techniques together, to take a consistent snapshot of the root filesystem without stopping the instance? Read on for my instructions how to create an XFS-formatted boot-from-EBS AMI, allowing consistent live snapshots of the root filesystem to be created.

The technique presented below owes its success to the Canonical Ubuntu team, who created a kernel image that already contains XFS support. That’s why these instructions use the official Canonical Ubuntu 9.10 Karmic Koala AMI – because it has XFS support built in. There may be other AKIs out there with XFS support built in – if so, the technique should work with them, too.

How to Do It

The general steps are as follows:

  1. Run an instance and set it up the way you like.
  2. Create an XFS-formatted EBS volume.
  3. Copy the contents of the instance’s root filesystem to the EBS volume.
  4. Unmount the EBS volume, snapshot it, and register it as an AMI.
  5. Launch an instance of the new AMI.

More details on each of these steps follows.

1. Run an instance and set it up the way you like.

As mentioned above, I use the official Canonical Ubuntu 9.10 Karmic Koala AMI (currently ami-1515f67c for 32-bit architecture – see the table on Alestic.com for the most current Ubuntu AMI IDs).

ami=ami-1515f67c
security_groups=default
keypair=my-keypair
instance_type=m1.small
ec2-run-instances $ami -t $instance_type -g $security_groups -k $keypair

Wait until the ec2-describe-instances command shows the instance is running and then ssh into it:

ssh -i my-keypair ubuntu@ec2-1-2-3-4.amazonaws.com

Now that you’re in, set up the instance’s root filesystem the way you want. Don’t forget that you probably want to run

sudo apt-get update

to allow you to pull in the latest packages.

In our case we’ll want to install ec2-consistent-snapshot, as per Eric Hammond’s article:

codename=$(lsb_release -cs)
echo "deb http://ppa.launchpad.net/alestic/ppa/ubuntu $codename main" | sudo tee /etc/apt/sources.list.d/alestic-ppa.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys BE09C571
sudo apt-get update
sudo apt-get install -y ec2-consistent-snapshot
sudo PERL_MM_USE_DEFAULT=1 cpan Net::Amazon::EC2

2. Create an XFS-formatted EBS volume.

First, install the XFS tools:

sudo apt-get install -y xfsprogs

These utilities allow you to format filesystems using XFS and to freeze and unfreeze the XFS filesystem. They are not necessary in order to read from XFS filesystems, but we want these programs installed on the AMI we create because they are used in the process of creating a consistent snapshot.

Next, create an EBS volume in the availability zone your instance is running in. I use a 10GB volume, but you can use any size and grow it later using this technique. This command is run on your local machine:

ec2-create-volume --size 10 -z $zone

Wait until the ec2-describe-volumes command shows the volume is available and then attach it to the instance:

ec2-attach-volume $volume --instance $instance --device /dev/sdh

Back on the instance, format the volume with XFS:

sudo mkfs.xfs /dev/sdh
sudo mkdir -m 000 /vol
sudo mount -t xfs /dev/sdh /vol

Now you should have an XFS-formatted EBS volume, ready for you to copy the contents of the instance’s root filesystem.

3. Copy the contents of the instance’s root filesystem to the EBS volume.

Here’s the command to copy over the entire root filesystem, preserving soft-links, onto the mounted EBS volume – but ignoring the volume itself:

sudo rsync -avx --exclude /vol / /vol

My command reports that it copied about 444 MB to the EBS volume.

4. Unmount the EBS volume, snapshot it, and register it as an AMI.

You’re ready to create the AMI. On the instance do this:

sudo umount /vol

Now, back on your local machine, create the snapshot:

ec2-create-snapshot $volume

Once ec2-describe-snapshots shows the snapshot is 100% complete, you can register it as an AMI. The AKI and ARI values used here should match the AKI and ARI that the instance is running – in this case, they are the default Canonical AKI and ARI for this AMI. Note that I give a descriptive “name” and “description” for the new AMI – this will make your life easier as the number of AMIs you create grows. Another note: some AMIs (such as the Ubuntu 10.04 Lucid AMIs) do not have a ramdisk, so skip the --ramdisk $ramdisk arguments if you’ve used such an AMI.

kernel=aki-5f15f636
ramdisk=ari-0915f660
description="Ubuntu 9.10 Karmic formatted with XFS"
ami_name=ubuntu-9.10-32-bit-ami-1515f67c-xfs
ec2-register --snapshot $snapshot --kernel $kernel --ramdisk $ramdisk --description "$description" --name=$ami_name --architecture i386 --root-device-name /dev/sda1 --block-device-mapping /dev/sda2=ephemeral0

This displays the newly registered AMI ID – let’s say it’s ami-00000000.

5. Launch an instance of the new AMI.

Here comes the moment of truth. Launch an instance of the newly registered AMI:

ami=ami-00000000
security_groups=default
keypair=my-keypair
instance_type=m1.small
ec2-run-instances $ami -t $instance_type -g $security_groups -k $keypair

Again, wait until ec2-describe-instances shows it is running and ssh into it:

ssh -i my-keypair ubuntu@ec2-5-6-7-8.amazonaws.com

Now, on the instance, you should be able to see that the root filesystem is XFS with the mount command. The output should contain:

/dev/sda1 on / type xfs (rw)
...

We did it! Let’s create a consistent snapshot of the root filesystem. Look back into the output of ec2-describe-instances to determine the volume ID of the root volume for the instance.

sudo ec2-consistent-snapshot --aws-access-key-id $aws_access_key_id --aws-secret-access-key $aws_secret_access_key --xfs-filesystem / $volumeID

The command should display the snapshot ID of the snapshot that was created.

Using ec2-consistent-snapshot and an XFS-formatted EBS AMI, you can create snapshots of the running instance without stopping it. Please comment below if you find this helpful, or with any other feedback.


Use ELB to Serve Multiple SSL Domains on One EC2 Instance

This is one of the coolest uses of Amazon’s ELB I’ve seen yet. Check out James Elwood’s article.

You may know that you can’t serve more than one SSL-enabled domain on a single EC2 instance. Okay, you can, but only via a wildcard certificate (limited) or a multi-domain certificate (hard to maintain). So you really can’t do it properly. Serving multiple SSL domains is one of the main use cases behind the popular request to support multiple IP addresses per instance.

Why can’t you do it “normally”?

The reason why it doesn’t work is this: The HTTPS protocol encrypts the HTTP request, including the Host: header within. This header identifies what actual domain is being requested – and therefore what SSL certificate to use to authenticate the request. But without knowing what domain is being requested, there’s no way to choose the correct SSL certificate! So web servers can only use one SSL certificate.

If you have multiple IP addresses then you can serve different SSL domains from different IP addresses. The VirtualHost directive in Apache (or similar mechanisms in other web servers) can look at the target IP address in the TCP packets – not in the HTTP Host: header – to figure out which IP address is being requested, and therefore which domain’s SSL certificate to use.

But without multiple IP addresses on an EC2 instance, you’re stuck serving only a single SSL-enabled domain from each EC2 instance.

How can you?

Really, read James’ article. He explains it very nicely.

How much does it cost?

Much less than two EC2 instances, that’s for sure. According to the EC2 pricing charts, ELB costs:

  • $0.025 per Elastic Load Balancer-hour (or partial hour) ($0.028 in us-west-1 and eu-west-1)
  • $0.008 per GB of data processed by an Elastic Load Balancer

The smallest per-hour cost you can get in EC2 is for the m1.small instance, at $0.085 ($0.095 in us-west-1 and eu-west-1).

Using the ELB-for-multiple-SSL-sites trick saves you roughly 70% of the per-hour cost of running a separate instance for each additional SSL domain.
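Working from the per-hour prices above, and ignoring the per-GB data processing charge, the saving versus an extra m1.small works out to:

\[
\frac{\$0.085 - \$0.025}{\$0.085} \approx 0.71 \approx 70\%
\]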

Thanks, James!


Read-After-Write Consistency in Amazon S3

S3 has an “eventual consistency” model, which presents certain limitations on how S3 can be used. Today, Amazon released an improvement called “read-after-write consistency” in the EU and US-west regions (it’s there, hidden at the bottom of the blog post). Here’s an explanation of what this is, and why it’s cool.

What is Eventual Consistency?

Consistency is a key concept in data storage: it describes when changes committed to a system are visible to all participants. Classic transactional databases employ various levels of consistency, but the gold standard is that after a transaction commits the changes are guaranteed to be visible to all participants. A change committed at millisecond 1 is guaranteed to be available to all views of the system – all queries – immediately thereafter.

Eventual consistency relaxes the rules a bit, allowing a time lag between the point the data is committed to storage and the point where it is visible to all others. A change committed at millisecond 1 might be visible to all immediately. It might not be visible to all until millisecond 500. It might not even be visible to all until millisecond 1000. But, eventually it will be visible to all clients. Eventual consistency is a key engineering tradeoff employed in building distributed systems.

One issue with eventual consistency is that there’s no theoretical limit to how long you need to wait until all clients see the committed data. A delay must be employed (either explicitly or implicitly) to ensure the changes will be visible to all clients.

Practically speaking, I’ve observed that changes committed to S3 become visible to all within less than 2 seconds. If your distributed system reads data shortly after it was written to eventually consistent storage (such as S3) you’ll experience higher latency as a result of the compensating delays.
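To make that compensating delay concrete, here’s a minimal sketch of the wait-until-visible loop that an eventually consistent store forces on you. objectIsVisible() is a hypothetical stand-in for whatever existence check (HEAD or GET) your S3 client library provides.

public class EventualConsistencyWait {

    /** Hypothetical stand-in for an existence check via your S3 client library. */
    static boolean objectIsVisible(String bucket, String key) {
        return true; // placeholder
    }

    /**
     * Poll until a freshly written object becomes visible, up to a timeout.
     * With read-after-write consistency this loop - and its latency - disappears.
     */
    static boolean waitForObject(String bucket, String key, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long delay = 100; // start with a short delay and back off
        while (System.currentTimeMillis() < deadline) {
            if (objectIsVisible(bucket, key)) {
                return true;
            }
            Thread.sleep(delay);
            delay = Math.min(delay * 2, 1000);
        }
        return false;
    }
}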

What is Read-After-Write Consistency?

Read-after-write consistency tightens things up a bit, guaranteeing immediate visibility of new data to all clients. With read-after-write consistency, a newly created object or file or table row will immediately be visible, without any delays.

Note that read-after-write is not complete consistency: there’s also read-after-update and read-after-delete. Read-after-update consistency would allow edits to an existing file or changes to an already-existing object or updates of an existing table row to be immediately visible to all clients. That’s not the same thing as read-after-write, which is only for new data. Read-after-delete would guarantee that reading a deleted object or file or table row will fail for all clients, immediately. That, too, is different from read-after-write, which only relates to the creation of data.

Why is Read-After-Write Consistency Useful?

Read-after-write consistency allows you to build distributed systems with less latency. As touched on above, without read-after-write consistency you’ll need to incorporate some kind of delay to ensure that the data you just wrote will be visible to the other parts of your system.

But no longer. If you use S3 in the US-west or EU regions (or other regions supporting read-after-write consistency), your systems need not wait for the data to become available.

Update March 2011: As more S3 regions come online they seem to be getting the same features as US-West. So far the AP-Singapore and AP-Tokyo regions also support Read-After-Write consistency. US Standard does not.

Update June 2012: As pointed out in the comments below, more S3 regions now support read-after-write consistency: US-West Oregon, SA-Sao Paolo, and AP-Tokyo. It’s not easy keeping up with the pace of AWS’s updates!

Why Only in the AWS US-West and EU Regions, and Not in the US Standard Region?

Read-after-write consistency for AWS S3 was initially available only in the US-west and EU regions, not the US-Standard region. I asked Jeff Barr of AWS blogging fame why, and his answer makes a lot of sense:

This is a feature for EU and US-West. US Standard is bi-coastal and doesn’t have read-after-write consistency.

Aha! I had forgotten about the way Amazon defines its S3 regions. US-Standard has servers on both the east and west coasts (remember, this is S3 not EC2) in the same logical “region”. The engineering challenges in providing read-after-write consistency in a smaller geographical area are greatly magnified when that area is expanded. The fundamental physical limitation is the speed of light, which takes at least 16 milliseconds to cross the US coast-to-coast (that’s in a vacuum – it takes at least four times as long over the internet due to the latency introduced by routers and switches along the way).
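A back-of-the-envelope check of that figure, assuming a coast-to-coast path of roughly 4,700 km:

\[
t = \frac{d}{c} \approx \frac{4{,}700\ \mathrm{km}}{299{,}792\ \mathrm{km/s}} \approx 15.7\ \mathrm{ms}
\]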

If you use S3 and want to take advantage of the read-after-write consistency, make sure you understand the cost implications: some other regions have higher storage and bandwidth costs than the US-Standard region.

Next Up: SQS Improvements?

Some vague theorizing:

It’s been suggested that AWS Simple Queue Service leverages S3 under the hood. The improved S3 consistency model can be used to provide better consistency for SQS as well. Is this in the works? Jeff Barr, any comment? 🙂


The Open Cloud Computing Interface at IGT2009

Today I participated in the Cloud Standards & Interoperability panel at the IGT2009 conference, together with Shahar Evron of Zend Technologies, and moderated by Reuven Cohen. Reuven gave an overview of his involvement with various governments on the efforts to define and standardize “cloud”, and Shahar presented an overview of the Zend Simple Cloud API (for PHP). I presented an overview of the Open Grid Forum’s Open Cloud Computing Interface (OCCI).

The slides include a 20,000-foot view of the specification, a 5,000-foot view of the specification, and an eye-level view in which I illustrated the metadata travelling over the wire using the HTTP Header rendering.

Here’s my presentation.


How to Work with Contractors on AWS EC2 Projects

Recently I answered a question on the EC2 forums about how to give third parties access to EC2 instances. I noticed there’s not a lot of info out there about how to work with contractors, consultants, or even internal groups to whom you want to grant access to your AWS account. Here’s how.


First, a Caveat

Please be very selective when you choose a contractor. You want to make sure you choose a candidate who can actually do the work you need – and unfortunately, not everyone who advertises as such can really deliver the goods. Reuven Cohen’s post about choosing a contractor/consultant for cloud projects examines six key factors to consider:

  1. Experience: experience solving real world problems is probably more important than anything else.
  2. Code: someone who can produce running code is often more useful than someone who just makes recommendations for others to follow.
  3. Community Engagement: discussion boards are a great way to gauge experience, and provide insight into the capabilities of the candidate.
  4. Blogs & Whitepaper: another good way to determine a candidate’s insight and capabilities.
  5. Interview: ask the candidate questions to gauge their qualifications.
  6. References: do your homework and make sure the candidate really did what s/he claims to have done.

Reuven’s post goes into more detail. It’s highly recommended for anyone considering using a third-party for cloud projects.


What’s Your Skill Level?

The best way to allow a contractor access to your resources depends on your level of familiarity with the EC2 environment and with systems administration in general.

If you know your way around the EC2 toolset and you’re comfortable managing SSH keypairs, then you probably already know how to allow third-party access safely. This article is not meant for you. (Sorry!)

If you don’t know your way around the EC2 toolset, specifically the command-line API tools, and the AWS Management Console or the ElasticFox Firefox Extension, then you will be better off allowing the contractor to launch and configure the EC2 resources for you. The next section is for you.


Giving EC2 Access to a Third Party

[An aside: It sounds strange, doesn’t it? “Third party”. Did I miss two parties already? Was there beer? Really, though, it makes sense. A third party is someone who is not you (you’re the first party) and not Amazon (they’re the counterparty, or the second party). An outside contractor is a third party.]

Let’s say you want a contractor to launch some EC2 instances for you and to set them up with specific software running on them. You also want them to set up automated EBS snapshots and other processes that will use the EC2 API.


What you should give the contractor

Give the contractor your Access Key ID and your Secret Access Key, which you should get from the Security Credentials page.

The Access Key ID is not a secret – but the Secret Access Key is, so make sure you transfer it securely. Don’t send it over email! Use a private DropBox or other secure method.

Don’t give out the email address and password that allows you to log into the AWS Management Console. You don’t want anyone but you to be able to change the billing information or to sign you up for new services. Or to order merchandise from Amazon.com using your account (!).


What the contractor will do

Using ElasticFox and your Access Key ID and Secret Access Key the contractor will be able to launch EC2 instances and make all the necessary configuration changes on your account. Plus they’ll be able to put these credentials in place for automated scripts to make EC2 API calls on your behalf – like to take an EBS snapshot. [There are some rare exceptions which will require your X.509 Certificates and the use of the command-line API tools.]

For example, here’s what the contractor will do to set up a Linux instance:

  1. Install ElasticFox and put in your access credentials, allowing him access to your account.
  2. Set up a security group allowing him to access the instance.
  3. Create a keypair, saving the private key to his machine (and to give to you later).
  4. Choose an appropriate AMI from among the many available. (I recommend the Alestic Ubuntu AMIs).
  5. Launch an instance of the chosen AMI, in the security group, using the keypair.
  6. Once the instance is launched he’ll SSH into the instance and set it up. He’ll use the instance’s public IP address and the private key half of the keypair (from step 3), and the user name (most likely “root”) to do this.

The contractor can also set up some code to take EBS snapshots – and the code will require your credentials.


What deliverables to expect from the contractor

When he’s done, the contractor will give you a few things. These should include:

  • the instance ids of the instances, their IP addresses, and a description of their roles.
  • the names of any load balancers, auto scaling groups, etc. created.
  • the private key he created in step 3 and the login name (usually “root”). Make sure you get this via a secure communications method – it allows privileged access to the instances.

Make sure you also get a thorough explanation of how to change the credentials used by any code requiring them. In fact, you should insist that this must be easy for you to do.

Plus, ask your contractor to set up the Security Groups so you will have the authorization you need to access your EC2 deployment from your location.

And, of course, before you release the contractor you should verify that everything works as expected.


What to do when the contractor’s engagement is over

When your contractor no longer needs access to your EC2 account you should create new access key credentials (see the “Create a new Access Key” link on the Security Credentials page mentioned above).

But don’t disable the old credentials just yet. First, update any code the contractor installed to use the new credentials and test it.

Once you’re sure the new credentials are working, disable the credentials given to the contractor (the “Make Inactive” link).

The above guidelines also apply to working with internal groups within your organization. You might not need to revoke their credentials, depending on their role – but you should follow the suggestions above so you can if you need to.


What Language Does the Cloud Speak, Now and In the Future?

You’re a developer writing applications that use the cloud. Your code manipulates cloud resources, creating and destroying VMs, defining storage and networking, and gluing these resources together to create the infrastructure upon which your application runs. You use an API to perform these cloud operations – and this API is specific to the programming language and to the cloud provider you’re using: for example, for Java EC2 applications you’d use typica, for Python EC2 applications you’d use boto, etc. But what’s happening under the hood, when you call these APIs? How do these libraries communicate with the cloud? What language does the cloud speak?

I’ll explore this question for today’s cloud, and touch upon what the future holds for cloud APIs.

Java? Python? Perl? PHP? Ruby? .NET?

It’s tempting to say that the cloud speaks the same programming language whose API you’re using. Don’t be fooled: it doesn’t.

“Wait,” you say. “All these languages have Remote Procedure Call (RPC) mechanisms. Doesn’t the cloud use them?”

No. The reason why RPCs are not provided for every language is simple: would you want to support a product that needed to understand the RPC mechanism of many languages? Would you want to add support for another RPC mechanism as a new language becomes popular?

No? Neither do cloud providers.

So they use HTTP.

HTTP: It’s a Protocol

The cloud speaks HTTP. HTTP is a protocol: it prescribes a specific on-the-wire representation for the traffic. Commands are sent to the cloud and results returned using the internet’s most ubiquitous protocol, spoken by every browser and web server, routable by all routers, bridgeable by all bridges, and securable by any number of different methods (HTTP + SSL/TLS being the most popular, a.k.a. HTTPS). RPC mechanisms cannot provide all these benefits.

Cloud APIs all use HTTP under the hood. EC2 actually has two different ways of using HTTP: the SOAP API and the Query API. SOAP uses XML wrappers in the body of the HTTP request and response. The Query API places all the parameters into the URL itself and returns the raw XML in the response.

So, the lingua franca of the cloud is HTTP.

But EC2’s use of HTTP to transport the SOAP API and the Query API is not the only way to use HTTP.

HTTP: It’s an API

HTTP itself can be used as a rudimentary API. HTTP has methods (GET, PUT, POST, DELETE) and return codes and conventions for passing arguments to the invoked method. While SOAP wraps method calls in XML, and Query APIs wrap method calls in the URL (e.g. http://ec2.amazonaws.com/?Action=DescribeRegions), HTTP itself can be used to encode those same operations. For example:

GET /regions HTTP/1.1
Host: cloud.example.com
Accept: */*

That’s a (theoretical) way to use raw HTTP to request the regions available from a cloud located at cloud.example.com. It’s about as simple as you can get for an on-the-wire representation of the API call.

Using raw HTTP methods we can model a simple API as follows:

  • HTTP GET is used as a “getter” method.
  • HTTP PUT and POST are used as “setter” or “constructor” methods.
  • HTTP DELETE is used to delete resources.

All CRUD operations can be modeled in this manner. This technique of using HTTP to model a higher-level API is called Representational State Transfer, or REST. RESTful APIs are mapped to the HTTP verbs and are very lightweight. They can be used directly by any language (OK, any language that supports HTTP – which is every useful language) and also by browsers directly.
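For example, here’s a minimal sketch of driving such a RESTful API from plain Java, with no cloud-specific library at all – cloud.example.com and its /regions resource are the same hypothetical endpoint used above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestfulCloudClient {
    public static void main(String[] args) throws Exception {
        // GET the (hypothetical) collection of regions -- the "getter" verb.
        URL url = new URL("http://cloud.example.com/regions");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "*/*");

        System.out.println("HTTP status: " + conn.getResponseCode());
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // the representation of the /regions resource
        }
        in.close();

        // PUT or POST would create or update a resource, and DELETE would remove
        // one -- the same four verbs cover the whole CRUD surface.
        conn.disconnect();
    }
}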

RESTful APIs are “close to the metal” – they do not require a higher-level object model in order to be usable by servers or clients, because bare HTTP constructs are used.

Unfortunately, EC2’s APIs are not RESTful. Amazon was the undisputed leader in bringing cloud to the masses, and its cloud API was built before RESTful principles were popular and well understood.

Why Should the Cloud Speak RESTful HTTP?

Many benefits can be gained by having the cloud speak RESTful HTTP. For example:

  • The cloud can be operated directly from the command-line, using curl, without any language libraries needed.
  • Operations require less parsing and higher-level modeling because they are represented close to the “native” HTTP layer.
  • Cache control, hashing and conditional retrieval, alternate representations of the same resource, etc., can be easily provided via the usual HTTP headers. No special coding is required.
  • Anything that can run a web server can be a cloud. Your embedded device can easily advertise itself as a cloud and make its processing power available for use via a lightweight HTTP server.

All these benefits are important enough to be provided by any cloud API standard.

Where are Cloud API Standards Headed?

There are many cloud API standardization efforts. Some groups are creating open standards, involving all industry stakeholders and allowing you (the developer) to use them or implement them without fear of infringing on any IP. Some of them are not open, where those guarantees cannot be made. Some are language-specific APIs, and others are HTTP-based APIs (RESTful or not).

The following are some popular cloud APIs:

jClouds
libcloud
Cloud::Infrastructure
Zend Simple Cloud API
Dasein Cloud API
Open Cloud Computing Interface (OCCI)
Microsoft Azure
Amazon EC2
VMware vCloud
deltacloud

Here’s how the above products (APIs) compare, based on these criteria:

Open: The specification is available for anyone to implement without licensing IP, and the API was designed in a process open to the public.
Proprietary: The specification is either IP encumbered or the specification was developed without the free involvement of all ecosystem participants (providers, ISVs, SIs, developers, end-users).
API: The standard defines an API requiring a programming language to operate.
Protocol: The standard defines a protocol – HTTP.

Comparing the above products on these criteria shows the following:

  • There are many language-specific APIs, most open-source.
  • Proprietary standards are the dominant players in the marketplace today.
  • OCCI is the only completely open standard defining a protocol.
  • Deltacloud was begun by RedHat and is currently open, but its initial development was closed and did not involve players from across the ecosystem (hence its location on the border between Open and Proprietary).

What Does This Mean for the Cloud Developer?

The future of the cloud will have a single protocol that can be used to operate multiple providers. Libraries will still exist for every language, and they will be able to control any standards-compliant cloud. In this world, a RESTful API based on HTTP is a highly attractive option.

I highly recommend taking a look at the work being done in OCCI, an open standard that reflects the needs of the entire ecosystem. It’ll be in your future.

Update 27 October 2009:
Further Reading
No mention of cloud APIs would be complete without reference to William Vambenepe’s articles on the subject: