Since 2010, Optoro has used Amazon Web Services (AWS) as its cloud-computing provider. We relied on them to supply the horsepower needed to drive our IT resources and applications. However, after some hard analysis, we decided to move away from AWS and onto our own infrastructure. At a time when so many SaaS, IaaS, and PaaS offerings exist, why would we decide to run a data center's worth of gear? At scale, AWS had become a large drain on our budget, and we wanted a more cost-efficient solution.

In the summer of 2015, our monthly bill was high and we were running a large number of EC2 instances. See our historical AWS usage below:

[Figure: Historic AWS Usage]

During summer 2015, we had already begun planning for the holiday season. As a company that manages returns and direct-to-consumer (D2C) sales, we knew our inventory would increase. We could scale and spend our way through the holidays, but Optoro also aimed to sustain that higher volume after the season ended. We had already started hitting the limits of what our AWS magnetic volumes could handle on our principal MySQL database server, as well as on our MongoDB cluster. Once the holidays arrived, our expenditures would climb further to cover newer GP2 (SSD-backed) volumes, which cost more on AWS. Given the high traffic levels, our annual spend would have increased even more than we had anticipated.

To determine the best way to tackle this issue, we talked with our finance team, reviewed quotes from vendor conversations, and created a financial model. We knew that Optoro prioritized spreading out payments and moving away from three-year RIs in order to sustain long-term flexibility, so when building the model we compared the value of staying on AWS against the value of running on our own hardware. Our cloud cost estimate assumed a 2% increase every month (our historical average) and the purchase of one-year all-upfront RIs (for which Amazon provides us with a 40% overall discount on our EC2 instances). We did not factor performance into this model; AWS instance performance is variable across the board, so we treated AWS as being as performant as bare metal. Our analysis concluded that moving onto our own servers would reduce our costs by a significant amount (see below).
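
For the curious, here is a minimal sketch of the kind of projection we built. Only the 2% monthly growth rate and the ~40% one-year all-upfront RI discount come from our actual model; the starting spend, steady-state charges, and hybrid hardware figures below (aws_ec2_monthly, aws_steady_monthly, hybrid_upfront, hybrid_monthly) are placeholders, not our real numbers.

```python
# Sketch of a cumulative-spend projection: AWS with 1-year all-upfront RIs
# versus a hybrid model with an upfront hardware buy and flat monthly opex.
# All dollar amounts are illustrative placeholders.

MONTHS = 36
GROWTH = 0.02        # historical ~2% month-over-month usage growth
RI_DISCOUNT = 0.40   # overall EC2 discount for 1-year all-upfront RIs

aws_ec2_monthly = 100_000    # placeholder: on-demand-equivalent EC2 spend per month
aws_steady_monthly = 30_000  # placeholder: EBS, network, non-RI instances
hybrid_upfront = 400_000     # placeholder: servers, switches, racks
hybrid_monthly = 25_000      # placeholder: colo space, power, support

aws_total = 0.0
hybrid_total = float(hybrid_upfront)
for month in range(MONTHS):
    usage = (1 + GROWTH) ** month
    # RIs are re-purchased all upfront once every 12 months (the "red spikes").
    ri_payment = aws_ec2_monthly * usage * 12 * (1 - RI_DISCOUNT) if month % 12 == 0 else 0.0
    aws_total += ri_payment + aws_steady_monthly * usage
    hybrid_total += hybrid_monthly
    print(f"month {month + 1:2d}: AWS ${aws_total:>12,.0f}   hybrid ${hybrid_total:>12,.0f}")
```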

[Figure: Hybrid Infrastructure vs. AWS – Projected Monthly Spend with RIs]

The red spikes are the upfront payments when AWS RIs are re-purchased, while the troughs account for the steady-state charges (EBS volumes, network charges, etc.) as well as EC2 instances that hadn't yet been converted to RIs.

[Figure: Hybrid Infrastructure vs. AWS – Total Cash Out]

By the last month of the projection, total cash out under the hybrid model was lower than under the AWS one-year RI model.

Upon implementation, our self-hosted solution halved our costs, even including the power costs required to expand our data center capabilities. Additionally, with all SSDs and 10 Gb networking, we ended up with better performance and twice the capacity we needed to move off of AWS. According to our current numbers, running our own stack has resulted in 66% savings.

We have received a lot of questions from the tech community about our decision to switch, and we realize that it might have been possible to double down on AWS and better leverage its services. However, that wasn't a tenable long-term solution for us; our application simply wasn't designed for it. We still plan to use the cloud for its specialties: burst capacity, testing, and object storage. Like many others, we are moving toward a more service-oriented architecture (SOA), but we aren't quite there yet, and even with an SOA we may not be better served by on-demand infrastructure. Ultimately, it made a lot of financial sense to build our own stack so we could focus on stabilizing the ground we have covered since Optoro's inception and keep the road clear for scaling and building in the future.

Some Takeaways:

  • Better Visibility: With cloud services, we did not have the breakdown of charges and usage that we needed for proper analytics. Hosting in-house simplified reporting and usage monitoring for us. Tools like Netflix's Ice or Teevity can help for those still on the cloud, but they are fairly complex alternatives to DIY.
  • Simplified Pricing Scheme: Relying on internal hardware simplifies monthly cost management. On AWS, we spent 20–25% of our bill on EBS, and RIs cannot cover that at scale. Spreading payments out was more viable for us, since RIs require a large upfront cost.
  • Cost-efficiency: To put things in perspective, running a d2.8xlarge in the cloud is 1.6x the price of buying an equivalent Dell R730 to handle the same number of virtual machines (VMs). It's interesting to note that AWS's ROI calculator produces similar results if you input that you run your stack on forty 256 GB, 16-core machines. (A rough sketch of this kind of comparison follows this list.)
  • Containers Improved Our Performance: Running containers lets us more readily use the full performance of a bare-metal machine. With a container framework we can utilize large machines more effectively than we could on the cloud, because we are no longer trying to pack containers into the VMs we were running there.
  • Bare Metal Performance is Fantastic: Once we went to bare metal, we saw significant performance increases across the board. In some places (our database servers in particular), we saw a performance improvement of nearly 6x.
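
As a rough illustration of the cost-efficiency point above, here is a hedged sketch of how one might compare three years of on-demand instance spend against buying a server outright. The hourly rate, server price, and colocation figures are placeholders, not the quotes behind our 1.6x number; plug in your own quotes before drawing conclusions.

```python
# Back-of-the-envelope TCO comparison: renting an instance vs. buying a server.
# Every dollar figure below is a placeholder to be replaced with real quotes.

HOURS_PER_YEAR = 24 * 365
YEARS = 3

instance_hourly_rate = 5.00       # placeholder: on-demand $/hour for the instance
cloud_cost = instance_hourly_rate * HOURS_PER_YEAR * YEARS

server_purchase = 20_000          # placeholder: one-time cost of the server
colo_power_monthly = 500          # placeholder: rack space, power, bandwidth per month
metal_cost = server_purchase + colo_power_monthly * 12 * YEARS

print(f"cloud, {YEARS} years:  ${cloud_cost:,.0f}")
print(f"metal, {YEARS} years:  ${metal_cost:,.0f}")
print(f"cloud / metal ratio: {cloud_cost / metal_cost:.1f}x")
```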