Over the past year at Avocet we've been busy building a platform from the ground up to handle the high-scale, low-latency demands of programmatic advertising.
Each month the number of transactions we process increases several-fold: the first month alone brought over 6.2 billion requests, averaging around 200 million requests per day. All this whilst maintaining an average latency of ~10ms and providing real-time insight into campaign spend and performance.
At Avocet we decided on a service-oriented architecture that would allow us to keep each service well-defined, highly available and, importantly, decoupled. Given our latency requirements for responding to ad requests, unnecessary network hops have to be avoided, so it's been important to find the balance between monolith and micro-service.
Choosing the right tools for the job
There are a number of technical investments that we made early on that have certainly paid off.
We chose to use Go for all of our backend web services and even some of our batch ETL processes. Go provides great performance and accessible code, and with its statically linked binaries, it's a doddle to deploy.
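To give a flavour of that accessibility, here is a minimal sketch of the kind of JSON health endpoint one of our services might expose (the handler and payload names are illustrative, not our production code), exercised in-process with the standard library's `httptest` helpers:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthStatus is the payload a hypothetical service would report
// from its health endpoint.
func healthStatus() map[string]string {
	return map[string]string{"status": "ok"}
}

// healthHandler serialises the status as JSON over HTTP.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(healthStatus())
}

func main() {
	// Exercise the handler without binding a real port.
	rec := httptest.NewRecorder()
	healthHandler(rec, httptest.NewRequest("GET", "/health", nil))
	fmt.Print(rec.Body.String()) // prints {"status":"ok"}
}
```

The whole thing compiles to a single static binary with no runtime dependencies, which is what makes deployment so straightforward.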
We use Consul as a central discovery service for each environment. This has allowed us to simplify our configuration management and ensure every service has a consistent view of the overall platform topology. Consul's locking allows us to quickly build highly available services distributed across multiple regions, and its health checks provide an excellent backbone to our monitoring and alerting systems.
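As a concrete illustration, a service can register itself with the local Consul agent via a service definition like the following (service name, port and endpoint are hypothetical examples, not our actual topology):

```json
{
  "service": {
    "name": "bidder",
    "port": 8080,
    "check": {
      "http": "http://localhost:8080/health",
      "interval": "10s",
      "timeout": "1s"
    }
  }
}
```

The agent polls the HTTP check on the given interval, and any service discovering "bidder" through Consul only sees instances whose checks are passing.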
Stream vs. Batch Processing
We use NSQ as a messaging platform between all our services. This helps us to process system-critical tasks such as budget updates, conversion attribution and real-time analytics with as little latency as possible. Where batch processing is unavoidable, everything is scheduled by Oozie workflows and coordinators, allowing us to pause, resume and re-run tasks with ease.
Business Intelligence and Reporting
Our business runs on data, so the ability to execute ad-hoc queries on huge datasets is important in identifying what decisions to make for both our clients and our tech teams. We use Presto for analysing data stored within our data warehouse (which is backed by HDFS and S3), and InfluxDB stores our service-level metrics, which can be queried and visualised using Grafana. In addition to this, we have a proprietary system that provides real-time statistics on campaign spend and performance, displayed within our application. This allows both us and our clients to instantly understand the impact of campaign targeting changes.
The Road Ahead
There are still many challenges for us to face. As the number of transactions we process rises, we are increasingly looking toward probabilistic data structures to provide additional real-time analytics and business intelligence, so that we can quickly react to changing patterns.
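To show what we mean by trading a little accuracy for a lot of memory, here is a minimal count-min sketch in Go: a fixed grid of counters that estimates per-item frequencies in constant space, and may over-count (due to hash collisions) but never under-count. The structure is the textbook technique; the campaign names are made up for the example:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountMin is a minimal count-min sketch. Each Add increments one
// counter per row, chosen by a row-specific hash of the item.
type CountMin struct {
	rows, cols uint32
	table      [][]uint64
}

func NewCountMin(rows, cols uint32) *CountMin {
	t := make([][]uint64, rows)
	for i := range t {
		t[i] = make([]uint64, cols)
	}
	return &CountMin{rows: rows, cols: cols, table: t}
}

// index derives a per-row column by seeding the hash with the row.
func (c *CountMin) index(row uint32, item string) uint32 {
	h := fnv.New32a()
	h.Write([]byte{byte(row)})
	h.Write([]byte(item))
	return h.Sum32() % c.cols
}

func (c *CountMin) Add(item string) {
	for r := uint32(0); r < c.rows; r++ {
		c.table[r][c.index(r, item)]++
	}
}

// Estimate is the minimum counter across rows: never below the true
// count, and rarely far above it when the grid is sized sensibly.
func (c *CountMin) Estimate(item string) uint64 {
	min := ^uint64(0)
	for r := uint32(0); r < c.rows; r++ {
		if v := c.table[r][c.index(r, item)]; v < min {
			min = v
		}
	}
	return min
}

func main() {
	cm := NewCountMin(4, 1024)
	for i := 0; i < 100; i++ {
		cm.Add("campaign-42")
	}
	cm.Add("campaign-7")
	// Estimate is at least the true count of 100.
	fmt.Println(cm.Estimate("campaign-42"))
}
```

A sketch like this tracks approximate event frequencies across millions of keys in a few kilobytes, which is what makes real-time analytics at this scale tractable.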