Shopping for a CDN

Saturday, June 5. 2010

In this blog post I'll compare different CDNs with each other. On the list are:

  • Akamai (through MySpace)
  • CacheFly
  • CloudFront
  • EdgeCast (twice, through Speedyrails)
  • LimeLight Networks (through mydeo)
  • … and Amazon S3 — the pseudo CDN

Thanks to SpeedyRails, EasyBib (CacheFly, Cloudfront, S3) and mydeo for helping with these tests.

What's a CDN?

A CDN (Content Delivery Network) is a service usually offered by Tier 1 providers, or at least by companies that have a so-called global network footprint.

A CDN lets you distribute your assets/content on an array of servers and the nifty technology behind it makes sure that a customer is always transparently routed to a server closer to them, thus making it faster for the client to fetch the assets.

Content, or assets, can be anything such as images, CSS, JavaScript or media (audio, video). My numbers focus primarily on small assets; I haven't run any tests with larger media files.

As an example of CDN usage, let's say I go to myspace.com: all the required assets are distributed using a CDN run by Akamai. When I browse MySpace, the JavaScript files are pulled from a server located in Frankfurt, whereas when I browse MySpace from the U.K., the files are pulled from a server in the U.K.

All of this is — as I said — transparent, which means that I don't really notice a difference when I go to the website. It should be faster though.

Performance

I'll skip over why it makes sense to use a CDN from a pure performance point of view. A much better blog article is available at the Yahoo! developer blog.

When is a CDN necessary?

I wouldn't recommend getting a CDN for a blog, unless you're TechCrunch and live off of it. In my opinion this is a gray area. If you make money and your traffic is not just local (to the location of your server), consider a CDN; it's more affordable than you think.

On monitoring

Pingdom is a nifty distributed monitoring service.

What Pingdom does is the following: it allows you to set up checks (literally within minutes) and then runs the monitoring from different locations worldwide.

The advantage of multiple locations is that you know whether, for example, your website is unavailable for everyone or whether it's a local issue with a backbone provider. Beyond general availability, Pingdom also gathers data on response times (average, fastest and slowest) and lets you filter on all of the above.
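If you want to crunch your own measurements the same way, here is a minimal sketch of the three figures mentioned above. The function name and the sample values are made up for illustration, and this is not a claim about how Pingdom computes its numbers internally.

```javascript
// Summarize a list of response-time samples (in milliseconds).
// summarize() and the sample values below are hypothetical.
function summarize(samplesMs) {
  if (samplesMs.length === 0) {
    throw new Error("need at least one sample");
  }
  const total = samplesMs.reduce((sum, ms) => sum + ms, 0);
  return {
    average: Math.round(total / samplesMs.length),
    fastest: Math.min(...samplesMs),
    slowest: Math.max(...samplesMs),
  };
}

const stats = summarize([42, 58, 310, 61, 49]);
console.log(stats);
```

A single slow outlier (310 ms here) pulls the average up noticeably, which is why it helps to look at the fastest and slowest values next to it.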

The current locations from which your website is monitored include Amsterdam (Netherlands), Atlanta, GA (U.S.), Chicago, IL (U.S.), Copenhagen (Denmark), Dallas, TX (U.S.), Frankfurt (Germany), Herndon, VA (U.S.), Houston, TX (U.S.), London (U.K.), Los Angeles, CA (U.S.), Montreal (Canada), New York, NY (U.S.), Stockholm (Sweden) and Tampa, FL (U.S.). In some locations, Pingdom employs multiple monitors.

The only downside I can see is that Pingdom has no footprint in Asia, South America or Africa. So in case your target demographic is in any of those places, I'd advise you to gather your own numbers.

Well, gathering your own research data might be a good idea regardless.

Numbers

I used a minified jQuery library to compare the results of the various CDN vendors.

Amazon S3

Why do I consider S3 to be a pseudo CDN? Well, for starters, Amazon S3 is not distributed.

By nature, it shouldn't be used as a CDN. The problem, though, is that many people still do. Take a look at Twitter and think about why a page takes so long to load (and why the avatars are always last). There's your answer.

To be fair, Twitter also sometimes switches to Cloudfront (216.137.61.222) (or Akamai (213.248.124.139)?). I haven't really figured out why they don't stick to a real CDN, period.

Besides, I think using Cloudfront is still not the best choice. Thinking about it, they should of course use Joe Stump's project tweetimag.es (which uses EdgeCast).

Stats porn

Spoiler: 100% uptime on all of them! ;-)

But on to the stats!

Akamai

[Graph: Akamai response times, 7 days]

  • provider: Akamai
  • 7 day period
  • average response time: 65 ms
  • slowest average response time: 289 ms
  • fastest average response time: 19 ms

Akamai is probably the most well-known CDN. The clear advantage of Akamai over others — they are everywhere. And they charge an arm and a leg for it too. ;-) (No offense meant!)

CacheFly

[Graph: CacheFly response times, 7 days]

  • provider: CacheFly
  • 7 day period
  • average response time: 132 ms
  • slowest average response time: 1,506 ms
  • fastest average response time: 69 ms

CacheFly is another of the older CDN providers (~11 years). Support is pretty nice, and lots of custom options are available when you email them. On their todo list is a transparent proxy (WANT).

CacheFly has never failed me in over four years.

Cloudfront

[Graph: Cloudfront response times, 7 days]

  • provider: Amazon Cloudfront
  • 7 day period
  • average response time: 276 ms
  • slowest average response time: 1,983 ms
  • fastest average response time: 171 ms

Cloudfront is Amazon's idea of a CDN. It integrates well with Amazon S3. There's no transparent proxy option and it's not as distributed. And remember, it's all eventually consistent.

EdgeCast

EdgeCast offers two options: small and large files. Small files are a little more expensive, but it's generally suggested that they work just as well as large files. The small files option distributes your assets on SSD (Solid State Disk!).

The suggested use case is that large is for video and audio assets.

Regardless of the options, check the graphs and the numbers for some serious head scratching.

Large

[Graph: EdgeCast (large files) response times, 7 days]

  • provider: EdgeCast (big files)
  • 7 day period
  • average response time: 77 ms
  • slowest average response time: 987 ms
  • fastest average response time: 22 ms

Small

[Graph: EdgeCast (small files) response times, 7 days]

  • provider: EdgeCast (small files)
  • 7 day period
  • average response time: 91 ms
  • slowest average response time: 1,627 ms
  • fastest average response time: 28 ms

Limelight

[Graph: Limelight response times, 7 days]

  • provider: Limelight through MyDeo
  • 7 day period
  • average response time: 216 ms
  • slowest average response time: 1,668 ms
  • fastest average response time: 28 ms

And why is Limelight so slow? I don't think I can blame it entirely on Limelight. In contrast to other resellers, such as Speedyrails (which resells EdgeCast), MyDeo gives you a URL on mydeo.com. This domain uses GoDaddy's rather crappy DNS service, so I'm guessing that part of the poor performance is due to them.

Amazon S3

[Graph: Amazon S3 response times, 7 days]

ROFLMAO LOL!!!111one

  • provider: Amazon S3
  • 7 day period
  • average response time: 534 ms
  • slowest average response time: 2,323 ms
  • fastest average response time: 331 ms

Quo vadis CDN?

My first advice to all resellers would be to get Pingdom and constantly run monitoring to make sure the system behaves as expected. Or as the product description suggests. :-)

On Pingdom itself — of course there may be issues as well (not that I noticed). But I don't think these are a factor here. I've been running these tests for almost two months now and a different 7 day time frame didn't look too different. No one performed much better or far worse.

Here are the numbers again, side by side:

Provider           Average (ms)   Slowest average (ms)   Fastest average (ms)
Akamai             65             289                    19
CacheFly           132            1,506                  69
Cloudfront         276            1,983                  171
EdgeCast (large)   77             987                    22
EdgeCast (small)   91             1,627                  28
Limelight          216            1,668                  28
Amazon S3          534            2,323                  331

Comment

Akamai is almost in a league of its own. Of all contenders they offer the best CDN hands down. If anyone reselling Akamai at a reasonable price reads this, feel free to leave a comment or email me. Of course I'd be interested.

Still, it's a little surprising that Akamai is not further ahead of EdgeCast.
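To put a number on "not further ahead", here is a quick back-of-the-envelope calculation using the average response times from the stats above. The helper name is made up; only the three averages come from the measurements.

```javascript
// Average response times (ms) for Akamai and both EdgeCast options.
const averages = { akamai: 65, edgecastLarge: 77, edgecastSmall: 91 };

// How many times slower than the baseline is a given value?
// relativeTo() is a hypothetical helper, rounded to two decimals.
function relativeTo(baseline, value) {
  return Number((value / baseline).toFixed(2));
}

const largeVsAkamai = relativeTo(averages.akamai, averages.edgecastLarge);
const smallVsAkamai = relativeTo(averages.akamai, averages.edgecastSmall);
console.log(largeVsAkamai, smallVsAkamai); // 1.18 1.4
```

So on average the EdgeCast large files option is roughly 18% slower than Akamai, and the small files option about 40% slower.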

Cloudfront versus others — from personal testing and also doing the math on S3 (storage, PUT, GET) with the addition of Cloudfront on top of it, I have to say that this is a pretty expensive service and probably only useful in terms of unified billing (one provider to rule them all). If this is not an issue, I suggest you find another.

CacheFly has great support, but lacks features, and it's also pretty expensive compared to others.

EdgeCast vs. EdgeCast: I have to contact Speedyrails to find out whether they gave me the wrong URLs or why the more expensive option did worse in these tests. That'll be interesting to figure out. Regardless of this bit, the performance is pretty stellar and the closest to Akamai.

I'll revisit Limelight and mydeo again later.

Fin

It's pretty obvious to us that we are switching from CacheFly to another CDN over the summer.

And not just because of the general performance, but also because, for example, EdgeCast (through SpeedyRails) seems to be a lot more cost-effective while offering more features and, of course, much better performance at the same time.

In case there are questions, I can extract more numbers.

EC2 security group owner ID

Sunday, May 9. 2010

I recently had the pleasure of setting up an RDS instance, and it took me a while to figure out what the --ec2-security-group-owner-id parameter needs to be populated with when you want to allow access to your RDS instance from instances in a certain security group.

To cut to the chase, you need to log into AWS and then click the following link — done.

A toolchain for CouchDB Lounge

Friday, February 26. 2010

One of our biggest issues with CouchDB is currently the lack of compaction of our database, and by lack of, I don't mean that CouchDB doesn't support it, I mean that we are unable to actually run it.

Compaction in a nutshell

Compaction in a nutshell is pretty cool.

As you know, CouchDB is not very space-efficient. For one, CouchDB saves revisions of all documents, which means that whenever you update a document, a new revision is saved. You can roll back at any time, or expose it as a nifty feature in your application. Regardless, those revisions are kept around until your database is compacted.

Think about it in terms of IMAP: emails are not deleted until you hit that magic "compact" button, and 99% of all people who use IMAP don't know what it's for anyway.

Another thing is that whenever new documents are written to CouchDB and bulk mode is not used, it'll save them in a way which is not very efficient either, in terms of actual storage and indexing (so rumour has it).

Compaction woes

Since everything is simple with CouchDB, compaction is a simple process in CouchDB too. Yay!

When compaction is started, CouchDB will create a new database file where it stores the data in a very optimized way (I will not go into detail on this; go read a science book or google it if you are really interested!). When the compaction process has finished, CouchDB will exchange your old database file with the new database file.
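For reference, compaction is kicked off with a plain HTTP POST to the database's _compact resource. Here is a minimal sketch of that request; the helper name, the base URL and the database name are placeholders, and the fetch call is left as a comment so the snippet runs without a CouchDB around.

```javascript
// Build the HTTP request that asks CouchDB to compact one database.
// compactionRequest(), "http://localhost:5984" and "mydb" are
// placeholders for illustration.
function compactionRequest(baseUrl, dbName) {
  return {
    url: `${baseUrl}/${encodeURIComponent(dbName)}/_compact`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
    },
  };
}

const compactReq = compactionRequest("http://localhost:5984", "mydb");
console.log(compactReq.options.method, compactReq.url);

// To actually send it (Node 18+ or a browser):
//   fetch(compactReq.url, compactReq.options)
//     .then((res) => res.json())
//     .then(console.log);
```

The request returns right away; the compaction itself runs in the background on the server.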

The woes start with the fact that when you have, e.g., 700 GB of uncompacted data, you will probably need another 400 GB for compaction to finish because it will create a second database file.

The second issue is that when you have constant writes to your database, the compaction process will actually never finish. It kind of sucks, and for those people who aim to provide close to 100% availability, this is extremely painful to learn.


Continue reading "A toolchain for CouchDB Lounge"

Small notes on CouchDB's views

Wednesday, October 21. 2009

I've been wrestling with a couple of views in CouchDB lately. This blog post serves as a mental note to myself, and hopefully to others. As I write this, I'm using 0.9.1 and 0.10.0 in a production setup.

Here's the environment:

  • Amazon AWS L Instance (ami-eef61587)
  • Ubuntu 9.04 (Jaunty)
  • CouchDB 0.9.1 and 0.10.0
  • database size: 199.8 GB
  • documents: 157408793

On to the tips

These are some small pointers which I gathered by reading different sources (wiki, mailing list, IRC, blog posts, Jan ...). All those revolve around views and what not with a relatively large data set.

Do you want the speed?

Building a view on a database of this magnitude will take a while.

In the beginning I estimated about a week and a half. And it really took that long.

Things to consider, always:

  • upgrade to trunk ;-) (or e.g. to 0.10.x)
  • view building is CPU-bound which leads us to MOAR hardware — a larger instance

The bottom line is, "Patience (really) is a virtue!". :-)

A side note on upgrading: double-check that upgrading doesn't require you to rebuild the views. That is, unless you've got time.

View basics

When we initially tested whether CouchDB was made for us, we started off with a bunch of emit(doc.foo, doc)-like map functions in (sometimes) temporary views. On the production data, there are a few gotchas.

First off — the obvious: temporary views are slow.

Back to JavaScript

Emitting the complete document will force CouchDB to duplicate data in the index, which in turn needs more space and also makes view building a little slower. Instead it's suggested to always emit(doc.foo, null) and then POST with multiple keys in the body to retrieve the documents.
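Here is a rough sketch of how that two-step approach could look, assuming the keys in the POST body are document IDs sent to _all_docs. The helper name, base URL, database name and the sample rows are made up; emit() is provided by CouchDB when it builds the view.

```javascript
// Map function as it would be stored in a design document.
// emit() comes from CouchDB's view server at index time.
function map(doc) {
  emit(doc.foo, null);
}

// Build a bulk-fetch request: POST the IDs as {"keys": [...]} to _all_docs.
// bulkFetchRequest(), the base URL and the database name are placeholders.
function bulkFetchRequest(baseUrl, dbName, ids) {
  return {
    url: `${baseUrl}/${encodeURIComponent(dbName)}/_all_docs?include_docs=true`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ keys: ids }),
    },
  };
}

// Hypothetical view rows: each row carries the document ID even though
// the emitted value is null.
const viewRows = [
  { id: "doc-1", key: "bar", value: null },
  { id: "doc-2", key: "bar", value: null },
];
const bulk = bulkFetchRequest(
  "http://localhost:5984",
  "mydb",
  viewRows.map((row) => row.id)
);
console.log(bulk.options.body);
```

The index stays small because it only holds keys and IDs, and the documents are fetched in one round trip once you know which ones you need.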

Reads are cheap, and if not, get a cache.

doc._id

In case you wonder why I don't do emit(doc.foo, doc._id): that's because CouchDB is already kind enough to return the document's ID with every row anyway. (Sweet, no?)

include_docs

Sort of related, CouchDB has a ?include_docs=true parameter.

This is really convenient — especially when you develop the application.

I gathered from various sources that using it bears a performance penalty. The reason is that include_docs issues another b-tree lookup for every row returned in the initial result. Especially with larger sets, this may turn into a bottleneck, while it can be considered OK with smaller result sets.
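For illustration, this is roughly what a view query with the parameter looks like. The helper name, the design document name "app", the view name "by_foo" and the database are all made up; the sketch only builds the URL and doesn't talk to a server.

```javascript
// Build a view query URL. View parameters are JSON-encoded, which is
// why the string key ends up wrapped in (URL-encoded) double quotes.
// viewUrl() and all names passed to it are hypothetical.
function viewUrl(baseUrl, dbName, designDoc, viewName, params) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  return `${baseUrl}/${dbName}/_design/${designDoc}/_view/${viewName}?${query}`;
}

const withDocs = viewUrl("http://localhost:5984", "mydb", "app", "by_foo", {
  key: "bar",
  include_docs: true,
});
console.log(withDocs);
```

Each row in the response then carries a doc field next to id, key and value, which is convenient for small result sets.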

As always — don't forget that HTTP itself is relatively cheap and a single POST request with multiple keys (e.g. document IDs) in the body is likely not the bottleneck of your operation — compared to everything else.

And if you really need to optimize that part, there's always caching. :-)

Need a little more?

Especially when documents of different types are stored into the same database (Oh, the beauty of document oriented storage!), one should consider the following map-example:

// Only index documents that actually have a foo attribute.
function (doc) {
    if (doc.foo) {
        emit(doc.foo, null);
    }
}

.foo is obviously an attribute in the document.

JavaScript vs. Erlang

sum(): I haven't found too many of these, but with version 0.10+, the CouchDB folks implemented a couple of JavaScript functions in Erlang, which are easy replacements and add a little speed on top. :-) So in this case, use _sum.
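In a design document, the swap is a one-liner: the reduce function's source is replaced by the string "_sum". A minimal sketch follows; the design document and view names are made up for this example.

```javascript
// A design document whose view uses the built-in _sum reduce (0.10+).
// The names "stats" and "count_by_foo" are hypothetical.
const designDoc = {
  _id: "_design/stats",
  views: {
    count_by_foo: {
      map: "function (doc) { if (doc.foo) { emit(doc.foo, 1); } }",
      // JavaScript equivalent:
      //   "function (keys, values) { return sum(values); }"
      reduce: "_sum",
    },
  },
};

console.log(JSON.stringify(designDoc, null, 2));
```

Since the map emits 1 per document, querying this view with group=true gives you a count per distinct foo value.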

Compact

Compact, I learned, knows how to resume. So even if you kill the process, it'll manage to resume where it left off before.

When you populate a database through bulk writes, the gain from a compact is relatively small and probably negligible, especially because compacting a database takes a long while. Keep in mind that compaction is disk-bound, which is often one of the final and inevitable bottlenecks in many environments. Unless hardware is designed from the ground up, this will most likely suck.

Compaction can have a larger impact when documents are written one by one to a database, or a lot of updates have been committed on the set.

I remember that when I built another set with 20 million documents one by one, I ended up with a database size of 60 GB. After I compacted the database, the size dropped to 20 GB. I don't have the numbers on read speed and what not, but it also felt more speedy. ;-)

Fin

That'd be it. More next time!

Thoughts on RightScale

Tuesday, October 20. 2009

RightScale provides all kinds of things: from a pre-configured MySQL master-slave setup (with automatic EBS/S3 backups), to a full LAMP stack, Rails app servers and virtually all kinds of other pre-configured server templates, to a nifty auto-scaling feature.

We decided to leverage RightScale when we planned our move to AWS a couple months ago in order to not have to build everything ourselves. I've been writing this blog entry for the past five weeks and here are some observations, thoughts and tips.

RightScale

First off, whatever you think, and do, or have done so far, let me assure you, there's always a RightScale way of doing things. For (maybe) general sanity (and definitely your own), I suggest you don't do it their way — always.

One example of the RightScale way is that all the so-called RightScripts will attempt to start services on reboot for you, instead of registering them with the init system (e.g., on Ubuntu, update-rc.d foo defaults) when they are set up.

You may argue that RightScale's attempt will provide you with a maybe more detailed protocol of what happened during the boot sequence, but at the same time it provides more potential for errors and introduces another layer around what the operating system provides, and what generally works pretty well already.

PHP and RightScale

RightScale's sales team knows how to charm people, and when I say charm, I do not mean scam (just for clarity)! :-)

The demos are very impressive and the client showcases no less so. Where they really need to excel, though, is PHP-related demos, because not everyone in the world runs Ruby on Rails yet. No, really: there's still us PHP people and also folks who run Python, Java and so on.

Coming from the sales pitch, I felt a little disappointed because a standard PHP setup on RightScale is as standard as you would have thought three years ago: mod_php, Apache2 and so on. The configuration itself is a downer as well, with a lot of unnecessary settings and generally not so speedy choices. Then remember that neither CentOS nor Ubuntu is exactly up to date on packages, and add another constraint to the mix: Ubuntu is on 8.04, which is one and a half years in the past as I write this entry.

And even though I can relate to RightScale's position (supporting customers with all kinds of different software is a burden and messy, to say the least), I am also not a fan.

Scaling up

The largest advantage when you select a service provider such as RightScale is that they turn raw EC2 instances into usable servers out of the box. So, for example, setting up a LAMP stack yourself requires time, even though it's a trivial task for many. With RightScale, it's a matter of a couple of clicks: select image, start, provide input variables and done.

Aside from enhanced AMIs, RightScale's advantage is auto-scaling. Auto-scaling has been done a couple of times before. There is more than one service provider that leverages EC2 and provides scalability on top. Then take a look at Scalr, which is open source, and then recently Amazon themselves added their own Elastic Load Balancer.

In general, I think auto-scaling is something everyone gets, and wants, but of course it's not that dead simple to implement. And especially when you move to a new platform, it's a perfect trade-off to sacrifice some flexibility and money for a warm and fuzzy "works out of the box" feeling.


Continue reading "Thoughts on RightScale"