Amazon has migrated the vast majority of Alexa's machine learning inference workloads to its custom silicon chips.
Last year, Amazon was reported to be working on the next generation of its Arm-based custom silicon as it works to improve cost, performance, and efficiency. The company's latest chip, AWS Inferentia, contains four NeuronCores, which are designed to accelerate deep learning inference operations, making the chip a natural fit for powering Alexa.
“Today, we are announcing that the Amazon Alexa team has migrated the vast majority of their GPU-based machine learning inference workloads to Amazon Elastic Compute Cloud (EC2) Inf1 instances, powered by AWS Inferentia,” writes Sébastien Stormacq. “This resulted in 25% lower end-to-end latency, and 30% lower cost compared to GPU-based instances for Alexa’s text-to-speech workloads. The lower latency allows Alexa engineers to innovate with more complex algorithms and to improve the overall Alexa experience for our customers.
“AWS built AWS Inferentia chips from the ground up to provide the lowest-cost machine learning (ML) inference in the cloud. They power the Inf1 instances that we launched at AWS re:Invent 2019. Inf1 instances provide up to 30% higher throughput and up to 45% lower cost per inference compared to GPU-based G4 instances, which were, before Inf1, the lowest-cost instances in the cloud for ML inference.”
This has been a big week for custom silicon, between Apple unveiling its first Macs running on its M1 chip on Tuesday and now AWS's announcement.