April 11th, 2010
If you haven’t done so already, read my earlier post, The Challenge of Scaling an Adserver, first.
I received a lot of feedback on that post, both in comments and private emails, some good and some bad. Two things were pointed out to me in relation to innovation and established players that I thought were worth a follow-up post.
When real businesses depend on you
Most startups I know release as often as they can. On the balance of testing versus finding bugs in production, most err towards production. Hell, it can take days of testing to ensure a clean release, whereas only 30 minutes in production will point out pretty much every bug in the code.
Here’s the problem: that attitude doesn’t really work once large public companies depend on your platform for 100% of their business, which is often the case when you are the true adserver of record. I know, crazy idea, there are people that pay us money to run these adservers! Once a certain amount of money starts flowing through a serving system, the engineering and operations teams are suddenly held to a much higher standard than before. 30 minutes of “bad data” in production can mean hundreds of thousands of dollars of lost revenue. Ouch.
Generally this pressure doesn’t come until after a bad release costs somebody a lot of money. At that point, the CEO mandates that all production releases must go through rigorous testing to ensure that such a catastrophic event will absolutely never happen again. Job postings immediately go out for more QA engineers, and soon every release has to go through several steps of vetting and testing before it is deemed OK to ship. Before long, the culture changes to focus less on features and more on stability. As we all know… stability is the evil enemy of change. Change, sadly, is a requirement for innovation.
The second thing I forgot to mention in my last post is the concept of “invisible features”. When you run a simple adserver with little volume, a very large number of “off the shelf” components simply work at low scale.
Take for example server-side cookies. Imagine I’m the small startup discussed in our last post. I’m seeing about 100k unique users a day with my adserver, for a total of about 1M/month. I store about 1KB worth of data per user, which means I have about a gigabyte of total data to store. I’m only doing about 3000 ads per second, so I buy myself a nice new shiny Dell server, install MySQL and use that as my server-side data store. Every few hours or so I use mysqldump to make a full copy of my 1GB of data and copy that to a central storage server. I’ve got gigabit ethernet, so it only takes about 12.5 seconds to make a full backup of my data (that works out to about 80MB/s of effective throughput).
Compare that to an enterprise adserver… that’s got a few billion ads a day and 150M unique users/month, with 3KB of data each. Not only that, but enterprise customers require redundant datacenters to meet SLAs, which means all data has to be available on both the east and west coasts at the same time. The enterprise adserver therefore has to store 450GB of data in two datacenters, and be able to read and write this data hundreds of thousands of times per second. Guess what MySQL will do in this situation?
I won’t dive into all the details, but suffice it to say that just backing up 450GB of data will take over an hour and a half on a GigE connection. Compare that to the 12.5 seconds it takes at small scale and you probably get an idea of how hard a problem it is to keep 450GB of data in sync across the country. There is no off-the-shelf solution that addresses this problem. Instead, the engineering teams now need to put significant energy and resources into building a massively distributed key-value store.
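For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. The 80MB/s effective throughput is an assumption on my part: it is the rate the 12.5-second figure implies (the theoretical gigabit line rate is 125MB/s), and I'm using decimal units throughout. The function and variable names are purely illustrative.

```python
# Back-of-the-envelope backup times for the two server-side cookie stores.
# ASSUMPTION: 80 MB/s effective throughput on gigabit Ethernet, which is the
# rate implied by "1GB in 12.5 seconds" (the raw line rate is 125 MB/s).
# ASSUMPTION: decimal units (1 KB = 1,000 bytes, 1 GB = 1,000,000,000 bytes).
EFFECTIVE_BYTES_PER_SEC = 80_000_000


def store_size_bytes(users: int, kb_per_user: int) -> int:
    """Total size of the store if every user carries kb_per_user of data."""
    return users * kb_per_user * 1_000


def backup_seconds(size_bytes: int, rate: int = EFFECTIVE_BYTES_PER_SEC) -> float:
    """Time to copy the whole store once over the network at the given rate."""
    return size_bytes / rate


startup = store_size_bytes(users=1_000_000, kb_per_user=1)
enterprise = store_size_bytes(users=150_000_000, kb_per_user=3)

print(f"startup:    {startup / 1e9:.0f} GB, {backup_seconds(startup):.1f} seconds")
print(f"enterprise: {enterprise / 1e9:.0f} GB, {backup_seconds(enterprise) / 60:.1f} minutes")
```

Under those assumptions the startup's full backup takes 12.5 seconds and the enterprise one takes a bit under 94 minutes, a 450x difference, and that is for a single one-way copy with nothing else competing for the link.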
Functionally, the end user of the adserver sees little to no difference between the two systems. Both store a certain amount of data on a user… the difference is that one system scales and requires a significant amount of effort to build, while the other doesn’t. This is what I would call an “invisible feature”. It’s one of the things that one has to do to be a viable scalable adserving company but one that goes utterly unnoticed by customers (assuming everything works).
Server-side cookies aren’t the only system that can be found “off the shelf” at low volumes but breaks when we start talking billions. The same goes for data pipelines, data warehouses, ETL systems, and operational systems that can manage hundreds of servers at a time, not to mention all the supporting hardware and infrastructure (core switches, load balancers, CDNs and ISPs). Each of these starts to fail at billions and requires upgrades and custom integrations to function properly at scale.
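To make the “same interface, very different machinery” point concrete, here is a toy sketch in Python. Everything in it is hypothetical: the class names, the shard count and the "east"/"west" labels are made up for illustration, plain dicts stand in for real database servers, and replication happens in-process. A real system has to cope with network failures, conflicting writes and rebalancing, all of which this ignores.

```python
import hashlib


class SingleBoxStore:
    """Small-scale version: one dict standing in for a single database server."""

    def __init__(self):
        self._rows = {}

    def put(self, user_id, data):
        self._rows[user_id] = data

    def get(self, user_id):
        return self._rows.get(user_id)


class ShardedReplicatedStore:
    """Scaled version: keys are hashed across shards and every write is
    copied to each datacenter. Dicts stand in for remote servers."""

    def __init__(self, datacenters=("east", "west"), shards_per_dc=4):
        self._shards_per_dc = shards_per_dc
        self._dcs = {dc: [{} for _ in range(shards_per_dc)] for dc in datacenters}

    def _shard(self, user_id):
        # A stable hash, so the same user always lands on the same shard.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self._shards_per_dc

    def put(self, user_id, data):
        idx = self._shard(user_id)
        for shards in self._dcs.values():  # write to every datacenter
            shards[idx][user_id] = data

    def get(self, user_id, datacenter="east"):
        return self._dcs[datacenter][self._shard(user_id)].get(user_id)


# The caller cannot tell the two apart: same calls, same answers.
for store in (SingleBoxStore(), ShardedReplicatedStore()):
    store.put("user-123", {"segment": "sports"})
    print(type(store).__name__, store.get("user-123"))
```

From the caller's side the two stores behave identically, which is exactly why the extra work behind the second one stays invisible to customers.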
How do you know?
One of the things that’s so challenging in this market is that few of the new technology companies have had to deal much with the scale problem. So as a buyer of an adserving system, how do you know that it’ll work? If both company A and company B have server-side cookies, how do you pick the one that’ll work at scale? That’s the problem… if neither is doing serious volume, who knows! More on that in another blog post.
PS: There’s a great paper on Amazon’s Dynamo store, which tackles a technical problem similar to the one most adservers face when building server-side cookie stores. It’s a great read if you’re curious as to the problems one needs to solve at scale.