This past Friday, we migrated DoneDone – top-to-bottom – over to Amazon Web Services.
We spent many weeks re-architecting parts of the application so they’d be better suited for a cloud hosting environment like AWS. One major change is how DoneDone now delivers email.
In our old environment, the host web server sent DoneDone emails whenever you performed an action that called for one. Adding an issue, submitting a comment, or creating a release build likely triggered a set of emails that were sent directly from your button click.
Offloading work to the Amazon queues
With AWS, we decided to take advantage of both SQS and SES, Amazon’s queue and email services. They were quite simple to integrate, and AWS comes with a relatively slick SDK for .NET to boot. With the queues, we’re now punting the email workload off of our web servers. When you perform any email-generating action, DoneDone now serializes messaging objects to JSON, pushes them up to SQS, and goes about its merry way.
If you use HipChat, the same thing happens. Rather than have HipChat messages sent out on button click, DoneDone serializes and pushes those messages up to a queue as well. A separate Jobs server pulls these messages off the queue, within seconds, and processes them accordingly. All emails are also now being sent through SES.
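The hand-off described above can be sketched in a few lines. This is only an illustration of the pattern, not our production code (which uses the AWS SDK for .NET): an in-memory `queue.Queue` stands in for SQS, and the `EmailMessage` shape, `enqueue_email`, and `drain` are hypothetical names made up for the example.

```python
import json
import queue
from dataclasses import asdict, dataclass


@dataclass
class EmailMessage:
    # Hypothetical message shape; the real DoneDone objects are not shown here.
    to: str
    subject: str
    body: str


def enqueue_email(q: queue.Queue, message: EmailMessage) -> None:
    """Web-server side: serialize to JSON, push to the queue, return at once."""
    q.put(json.dumps(asdict(message)))


def drain(q: queue.Queue, send) -> int:
    """Jobs-server side: pull every waiting message and hand it to `send`."""
    processed = 0
    while True:
        try:
            raw = q.get_nowait()
        except queue.Empty:
            return processed
        send(EmailMessage(**json.loads(raw)))
        processed += 1


if __name__ == "__main__":
    q = queue.Queue()  # in-memory stand-in for SQS (an assumption for the demo)
    enqueue_email(q, EmailMessage("ann@example.com", "New issue", "An issue was added."))
    sent = []
    drain(q, sent.append)  # `sent.append` stands in for a call out to SES
    print(sent)
```

The point of the split is that the web request only pays for a JSON serialization and a queue push; the slow part (actually sending) happens on another machine.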
However, one of the limitations AWS places on SES is the number of emails you can send per day. You can request an increase to that limit by filling out a form on the AWS console. But, rather than saying “I’d like to send ONE BILLION EMAILS per day!” I thought it prudent to find out how many DoneDone actually might send. This led me down interesting paths.
In search of some real numbers
On our old environment, this metric really wasn’t that important to us. So, there wasn’t a readily available log for me to easily pull these numbers from. Instead, I took a snapshot of our live database and wrote a few queries against it to get an educated guess at the number of emails DoneDone sends daily. My goal here wasn’t to get an exact number, but at least something within an order of magnitude of the real number. And, when you migrate your app to a service like Amazon, where metrics are king, you just get into that scientific mood sometimes.
I wrote a SQL query to see, roughly, how many potential emails DoneDone might send per day. The query:
- Joins issues to their various histories (comments, status updates, and the like).
- Joins each of those records to the members on each issue.
- Counts the entire recordset, grouped by day.
- Sorts those counts from the highest number on a given day to the lowest, for all days in 2013.
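For the curious, the steps above can be sketched roughly like this. The table and column names (`issues`, `issue_histories`, `issue_members`, `created_on`) are assumptions made up for the example, not our real schema, and SQLite stands in for our actual database so the snippet runs on its own.

```python
import sqlite3

# Hypothetical schema: one row per history entry, one row per issue member.
QUERY = """
SELECT date(h.created_on) AS day, COUNT(*) AS potential_emails
FROM issues AS i
JOIN issue_histories AS h ON h.issue_id = i.id
JOIN issue_members AS m ON m.issue_id = i.id
WHERE h.created_on >= '2013-01-01' AND h.created_on < '2014-01-01'
GROUP BY day
ORDER BY potential_emails DESC, day
"""


def potential_emails_per_day(conn):
    """Return (day, count) rows for 2013, busiest day first."""
    return conn.execute(QUERY).fetchall()


def build_demo_db():
    """Tiny made-up dataset, only there to exercise the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE issues (id INTEGER PRIMARY KEY);
        CREATE TABLE issue_histories (
            id INTEGER PRIMARY KEY, issue_id INTEGER, created_on TEXT);
        CREATE TABLE issue_members (issue_id INTEGER, user_id INTEGER);
    """)
    conn.executemany("INSERT INTO issues (id) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO issue_members VALUES (?, ?)",
        [(1, 10), (1, 11), (1, 12), (2, 10)],
    )
    conn.executemany(
        "INSERT INTO issue_histories (issue_id, created_on) VALUES (?, ?)",
        [
            (1, "2013-03-04 09:15:00"),
            (1, "2013-03-04 14:30:00"),
            (2, "2013-03-04 16:00:00"),
            (2, "2013-03-09 11:00:00"),
            (1, "2012-12-31 10:00:00"),  # outside 2013, so it is filtered out
        ],
    )
    return conn


if __name__ == "__main__":
    for day, count in potential_emails_per_day(build_demo_db()):
        print(day, count)
```

Each history row is multiplied by the number of members on its issue, which is exactly why the count is an upper bound rather than a true tally of sent emails.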
Now, the resulting numbers aren’t necessarily exactly how many emails are sent out per day. In fact, they are very likely an optimistically large estimate because emails aren’t sent to the person who created an issue or comment, and users can opt out of emails. But it’s at least in the ballpark. And, that’s all I care about.
Now, with any query like this (one that groups very common actions into equal periods of time and sorts them from most to least), I’ve typically observed long tail curves like this one:
There are usually a handful of really high peak anomaly periods (denoted in green) followed by a long tail of periods that have roughly the same number of actions. However, when I ran this particular query, here’s what I got:
Not one, but two long tails! DOUBLE RAINBOW!!! OHHHH…HO HO….OHHHHhhh….
My initial thought was that I surely had messed up my query. After all, I ran this query while I was on an airplane connected to Wi-Fi whilst watching ESPN on Satellite TV. Surely, some wires got crossed somewhere between my 35,000 foot position over Kansas and planet Earth.
It then dawned on me. I usually run metrics like these against week or month ranges, rather than days. Like most work-based web apps, our traffic oscillates predictably throughout the work week and then flatlines on the weekends.
So, that second long tail must have something to do with weekend traffic. As I thought some more, I realized that there actually should be two long tails: one that represents Monday-Friday traffic, and another that represents Saturday-Sunday traffic.
In other words, there should be an obvious drop in sent emails between any given weekday and any given weekend-day. But, the collective days during the week and the collective days during the weekends, ordered by descending volume of emails sent, should create their own separate long tail curves.
To test my theory, I measured out the graph to a point 5/7 of the way along the x-axis. If my theory is right, the gigantic dip into the second long tail should occur right at that 5/7 mark, since five out of every seven days fall during the work week. Let’s see how my hypothesis worked out:
Success! As you can see, the 5/7 marker falls right where the graph tumbles to the second long tail curve.
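The same check can be done without a ruler. Here is a minimal sketch, assuming you have one count per day: sort the counts from highest to lowest, find the largest drop between neighbors, and see how far along the x-axis it falls. The function names and the synthetic counts are made up for illustration; they are not our real data.

```python
from datetime import date, timedelta


def largest_drop_position(counts):
    """Fraction of the sorted x-axis at which the biggest single drop occurs."""
    ordered = sorted(counts, reverse=True)
    drops = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    days_before_drop = drops.index(max(drops)) + 1
    return days_before_drop / len(ordered)


def synthetic_counts(year):
    """Made-up daily counts: busy weekdays, quiet weekends, slight variation."""
    counts = []
    day = date(year, 1, 1)
    while day.year == year:
        wiggle = (day.timetuple().tm_yday % 50) * 10
        base = 25000 if day.weekday() < 5 else 4000  # Monday is 0, Sunday is 6
        counts.append(base + wiggle)
        day += timedelta(days=1)
    return counts


if __name__ == "__main__":
    position = largest_drop_position(synthetic_counts(2013))
    print(f"largest drop at {position:.3f} of the x-axis; 5/7 is {5 / 7:.3f}")
```

One caveat: on real data, a single freak peak day could produce the largest drop, so it may be worth trimming the top few days before running this.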
As it turns out, on our peak day in 2013, approximately 40,000 emails were sent out from DoneDone. On an average weekday, it’s more like 25,000. And, on the weekends, emails fall well below 5,000.
Again, given the corner-cutting nature of my query, I can tell you that while it’s not a very precise number (in fact, likely an overshoot), it gave me some evidence as to what kind of sending limit I can safely request for Amazon’s Simple Email Service.
Perhaps more importantly, it was a double rainbow….all the way.