Nine months into selling our first web-based product, DoneDone, we’ve learned a lot of lessons. Bugs, even the most hidden and seemingly dormant, are eventually found because people are using your product all over the world at all different times of day.
One of the big lessons we’ve learned is how we program against third-party services. Like most small businesses, we can only build so much software. Using third-party tools is crucial – particularly on the onerous things that aren’t unique to your software. In our case, we are currently using the Rackspace Cloud to host uploaded files and Zuora to handle billing. They are both crucial components of our software that we simply don’t have the time to build ourselves. And, having third-party services strictly dedicated to specific tasks (like file hosting or recurring billing) means we’re almost always in good hands.
The downside to third-party services is that you’re somewhat at their mercy. If our billing system goes down or uploads aren’t working, it’s a phone call or a ticket submission and a waiting game. There’s no opening up an IDE, finding the leak, compiling, and pushing a quick update.
But there is something you can control: how your application handles the occasional broken external service. While it may seem like Programming 101, we initially didn’t handle exceptions from our third-party services very well at all.
At one point, if our billing service was down, you couldn’t log in to DoneDone at all. Whenever you log in to DoneDone, it checks the account status by hitting an internal service we have called WAMBAM (We Are Mammoth Business Account Manager). Only a small subset of the methods in WAMBAM require talking to Zuora, but we were logging into Zuora each time we made a call to WAMBAM. So, if Zuora’s servers were down, so were we. An unfortunate, yet entirely easy mistake to make.
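One way to avoid this kind of coupling is to defer the third-party login until a method actually needs it. The following is only a minimal sketch in Python, not WAMBAM's real code: `BillingClient`, `AccountManager`, and their methods are hypothetical names, and it assumes the account status can be answered from locally stored data.

```python
class BillingClient:
    """Hypothetical stand-in for a third-party billing API client."""

    def __init__(self):
        self.login_count = 0

    def login(self):
        # A real client would make a network call here, and could fail.
        self.login_count += 1

    def fetch_invoices(self, account_id):
        return [f"invoice-for-{account_id}"]


class AccountManager:
    """Hypothetical internal service that only sometimes needs billing."""

    def __init__(self, billing_client, local_statuses):
        self._billing = billing_client
        self._logged_in = False
        self._statuses = local_statuses  # assumption: status is stored locally

    def _billing_session(self):
        # Log in lazily, the first time a billing method is called.
        if not self._logged_in:
            self._billing.login()
            self._logged_in = True
        return self._billing

    def account_status(self, account_id):
        # Used at login time; never touches the billing service.
        return self._statuses.get(account_id, "unknown")

    def invoices(self, account_id):
        # Only this method pays the cost (and risk) of the remote login.
        return self._billing_session().fetch_invoices(account_id)
```

With this shape, checking an account's status at login keeps working while the billing provider is unreachable; only the calls that truly depend on billing are affected.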
We recently had a similar issue with the Rackspace Cloud. Their API was down for approximately an hour last week, and it inadvertently caused issues with submitting a bug, even if you weren’t uploading anything. Again, we were accessing the third-party service when we didn’t need to.
We could pin the blame on third-party services. From our perspective, they should never go down! But they do. All things break down now and then. We needed to do a better job of handling our end when they have problems. It’s too easy to make the mistake of not catching exceptions properly from third-party apps. It’s also crucial to call third-party services only when you absolutely need to.
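Both ideas can be shown in a few lines. This is a hedged sketch in Python rather than DoneDone's actual code: `submit_bug`, `save_bug`, and `upload_file` are hypothetical names, with the two callables standing in for a local database write and a file-hosting API call.

```python
import logging

logger = logging.getLogger("third_party")


def submit_bug(title, attachments, save_bug, upload_file):
    """Save a bug, then upload attachments without letting uploads block it.

    save_bug and upload_file are injected callables (hypothetical stand-ins
    for the local save and the third-party upload). Returns (bug_id, failed),
    where failed lists the attachment names that could not be uploaded.
    """
    bug_id = save_bug(title)  # core work first; no third party involved

    failed = []
    # With no attachments the loop body never runs, so the file host
    # is not contacted at all.
    for name, data in attachments:
        try:
            upload_file(bug_id, name, data)
        except Exception as exc:
            # Broad on purpose: any upload failure is treated as non-fatal.
            logger.warning("Upload of %s failed: %s", name, exc)
            failed.append(name)
    return bug_id, failed
```

The caller can use the `failed` list to tell the user which files need to be uploaded again, instead of rejecting the whole bug report.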
When you consider that your third-party services are probably, in turn, using other third-party services (and so on), it’s everyone’s duty to make sure their apps run as smoothly as possible when things out of their control go down for the count.