Categories
Cloud Developer Tips

Amazon S3 Gotcha: Using Virtual Host URLs with HTTPS

Amazon S3 is a great place to store static content for your web site. If the content is sensitive you’ll want to prevent the content from being visible while in transit from the S3 servers to the client. The standard way to secure the content during transfer is by https – simply request the content via an https URL. However, this approach has a problem: it does not work for content in S3 buckets that are accessed via a virtual host URL. Here is an examination of the problem and a workaround.

Accessing S3 Buckets via Virtual Host URLs

S3 provides two ways to access your content. One way uses s3.amazonaws.com host name URLs, such as this:

http://s3.amazonaws.com/mybucket.mydomain.com/myObjectKey

The other way to access your S3 content uses a virtual host name in the URL:

http://mybucket.mydomain.com.s3.amazonaws.com/myObjectKey

Both of these URLs map to the same object in S3.

You can make the virtual host name URL shorter by setting up a DNS CNAME that maps mybucket.mydomain.com to mybucket.mydomain.com.s3.amazonaws.com. With this DNS CNAME alias in place, the above URL can also be written as follows:

http://mybucket.mydomain.com/myObjectKey

This shorter virtual host name URL works only if you setup the DNS CNAME alias for the bucket.

Virtual host names in S3 is a convenient feature because it allows you to hide the actual location of the content from the end-user: you can provide the URL http://mybucket.mydomain.com/myObjectKey and then freely change the DNS entry for mybucket.mydomain.com (to point to an actual server, perhaps) without changing the application. With the CNAME alias pointing to mybucket.mydomain.com.s3.amazonaws.com, end-users do not know that the content is actually being served from S3. Without the DNS CNAME alias you’ll need to explicitly use one of the URLs that contain s3.amazonaws.com in the host name.

The Problem with Accessing S3 via https URLs

https encrypts the transferred data and prevents it from being recovered by anyone other than the client and the server. Thus, it is the natural choice for applications where protecting the content in transit is important. However, https relies on internet host names for verifying the identity certificate of the server, and so it is very sensitive to the host name specified in the URL.

To illustrate this more clearly, consider the servers at s3.amazonaws.com. They all have a certificate issued to *.s3.amazonaws.com. [“Huh?” you say. Yes, the SSL certificate for a site specifies the host name that the certificate represents. Part of the handshaking that sets up the secure connection ensures that the host name of the certificate matches the host name in the request. The * indicates a wildcard certificate, and means that the certificate is valid for any subdomain also.] If you request the https URL https://s3.amazonaws.com/mybucket.mydomain.com/myObjectKey, then the certificate’s host name matches the requested URL’s host name component, and the secure connection can be established.

If you request an object in a bucket without any periods in its name via a virtual host https URL, things also work fine. The requested URL can be https://aSimpleBucketName.s3.amazonaws.com/myObjectKey. This request will arrive at an S3 server (whose certificate was issued to *.s3.amazonaws.com), which will notice that the URL’s host name is indeed a subdomain of s3.amazonaws.com, and the secure connection will succeed.

However, if you request the virtual host URL https://mybucket.mydomain.com.s3.amazonaws.com/myObjectKey, what happens? The host name component of the URL is mybucket.mydomain.com.s3.amazonaws.com, but the actual server that gets the request is an S3 server whose certificate was issued to *.s3.amazonaws.com. Is mybucket.mydomain.com.s3.amazonaws.com a subdomain of s3.amazonaws.com? It depends who you ask, but most up-to-date browsers and SSL implementations will say “no.” A multi-level subdomain – that is, a subdomain that has more than one period in it – is not considered to be a proper subdomain by recent Firefox, Internet Explorer, Java, and wget clients. So the client will report that the server’s SSL certificate, issued to *.s3.amazonaws.com, does not match the host name of the request, mybucket.mydomain.com.s3.amazonaws.com, and refuse to establish a secure connection.

The same problem occurs when you request the virtual host https URL https://mybucket.mydomain.com/myObjectKey. The request arrives – after the client discovers that mybucket.mydomain.com is a DNS CNAME alias for mybucket.mydomain.com.s3.amazonaws.com – at an S3 server with an SSL certificate issued to *.s3.amazonaws.com. In this case the host name mybucket.mydomain.com clearly does not match the host name on the certificate, so the secure connection again fails.

Here is what a failed certificate check looks like in Firefox 3.5, when requesting https://images.mydrifts.com.s3.amazonaws.com/someContent.txt:

Here is what happens in Java:

javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative DNS name matching media.mydrifts.com.s3.amazonaws.com found.
at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1591)
at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:187)
at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:181)
at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:975)
at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:123)
at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:516)
at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:454)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:884)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1096)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1123)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1107)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:405)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:166)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:373)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching media.mydrifts.com.s3.amazonaws.com found.
at sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:193)
at sun.security.util.HostnameChecker.match(HostnameChecker.java:77)
at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkIden
tity(X509TrustManagerImpl.java:264)
at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:250)
at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:954)

And here is what happens in wget:

$ wget -nv https://media.mydrifts.com.s3.amazonaws.com/someContent.txt
ERROR: Certificate verification error for media.mydrifts.com.s3.amazonaws.com: unable to get local issuer certificate
ERROR: certificate common name `*.s3.amazonaws.com' doesn't match requested host name `media.mydrifts.com.s3.amazonaws.com'.
To connect to media.mydrifts.com.s3.amazonaws.com insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

Requesting the https URL using the DNS CNAME images.mydrifts.com results in the same errors, with the messages saying that the certificate *.s3.amazonaws.com does not match the requested host name images.mydrifts.com.

Notice that the browsers and wget clients offer a way to circumvent the mis-matched SSL certificate. You could, theoretically, ask your users to add an exception to the browser’s security settings. However, most web users are scared off by a “This Connection is Untrusted” message, and will turn away when confronted with that screen.

How to Access S3 via https URLs

As pointed out above, there are two forms of S3 URLs that work with https:

https://s3.amazonaws.com/mybucket.mydomain.com/myObjectKey

and this:

https://simplebucketname.s3.amazonaws.com/myObjectKey

So, in order to get https to work seamlessly with your S3 buckets, you need to either:

  • choose a bucket whose name contains no periods and use the virtual host URL, such as https://simplebucketname.s3.amazonaws.com/myObjectKey or
  • use the URL form that specifies the bucket name separately, after the host name, like this: https://s3.amazonaws.com/mybucket.mydomain.com/myObjectKey.

Update 25 Aug 2009: For buckets created via the CreateBucketConfiguration API call, the only option is to use the virtual host URL. This is documented in the S3 docs here.