The BBC has announced it now relies on Google Cloud's serverless architecture to process up to 26 billion log lines per day.
The BBC relies on Traffic Manager and CDN access logs to identify issues and make sure its online properties are running efficiently. According to Neil Craig, part of the BBC’s Digital Distribution team, the outlet sees anywhere from 3 billion to 26 billion log lines per day.
In a blog post for Google Cloud, Craig highlights the challenges of dealing with that much data:
As initially designed, we stored log data in a Cloud Storage bucket. But every time we needed to access that data, we had to download terabytes of logs down to a virtual machine (VM) with a large amount of attached storage, and use the ‘grep’ tool to search and analyze them. From beginning to end, this took us several hours. On heavy news days, the time lag made it difficult for the engineering team to do their jobs.
Craig goes on to describe the changes that moving to Google Cloud's serverless architecture brought:
In this new system, we still leverage Cloud Storage buckets, but on arrival, each log generates an event using EventArc. That event triggers Cloud Run to validate, transform and enrich various pieces of information about the log file such as filename, prefix, and type, then processes it and outputs the processed data as a stream into BigQuery. This event-driven design allows us to process files quickly and frequently — processing a single log file typically takes less than a second. Most of the files that we feed into the system are small, fewer than 100 Megabytes, but for larger files, we automatically split those into multiple files and Cloud Run automatically creates additional parallel instances very quickly, helping the system scale almost instantly.
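To make the shape of that pipeline more concrete, here is a minimal Python sketch of the per-file step Craig describes: deriving the filename, prefix, and type from a storage event, turning log lines into rows, and splitting large inputs into smaller pieces. It is illustrative only and is not the BBC's code. The function names, the row fields, and the convention that the first path segment names the log type are all assumptions, and the calls that would receive the event and stream rows into BigQuery are deliberately left out.

```python
import posixpath
from typing import Iterable, Iterator, List


def describe_object(event: dict) -> dict:
    """Derive filename, prefix, and type from a storage event payload.

    Assumes the payload carries "bucket" and "name" keys, as Cloud Storage
    object events do. The log-type rule below is a hypothetical convention.
    """
    name = event["name"]
    prefix, filename = posixpath.split(name)
    # Assumption: the first path segment identifies the kind of log.
    log_type = name.split("/", 1)[0] if "/" in name else "unknown"
    return {
        "bucket": event["bucket"],
        "filename": filename,
        "prefix": prefix,
        "type": log_type,
    }


def to_rows(lines: Iterable[str], meta: dict) -> Iterator[dict]:
    """Validate and enrich raw log lines, yielding one row per non-empty line."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than emit empty rows
        yield {
            "source_file": meta["filename"],
            "log_type": meta["type"],
            "line_number": number,
            "raw": line,
        }


def split_lines(lines: List[str], max_lines: int) -> List[List[str]]:
    """Split a large file's lines into chunks that can be processed in parallel."""
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]
```

In a real deployment, each chunk produced by `split_lines` would be written back as its own object so that it triggers its own event, which is what would let the platform fan the work out across parallel instances.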
In addition to improved speed and scaling, Craig says cost was a major benefit of the transition:
Our initial concern about choosing serverless was cost. It turns out that using Cloud Run is significantly more cost-effective than running the number of VMs we would need for a system that could survive reasonable traffic spikes with a similar level of confidence.
Switching to Cloud Run also allows us to use our time more efficiently, as we no longer need to spend time managing and monitoring VM scaling or resource usage. We picked Cloud Run intentionally because we wanted a system that could scale well without manual intervention. As the digital distribution team, our job is not to do ops work on the underlying components of this system — we leave that to the specialist ops teams at Google.
The BBC’s experience is a ringing endorsement of Google Cloud’s serverless architecture and should serve as a reference point for companies that face similar log-processing volumes.