T-Mobile experienced a major outage Monday, impacting thousands of customers around the world as voice and text services were down.
The issues began around 12:00 PM Monday, with users having trouble making and receiving calls and texts. President of Technology Neville Ray took to Twitter to reassure users the company’s engineers were working on the issue.
Despite their efforts, however, the problems continued for roughly 12 hours. They were so widespread that users of other networks said they were having issues as well, as we reported. FCC Chairman Ajit Pai even got in on the action, tweeting that T-Mobile’s “network outage is unacceptable,” and vowing the FCC would demand answers.
It appears T-Mobile is providing those answers, with Ray explaining what caused the issue, as well as what the company is doing to make sure it doesn’t happen in the future.
“The trigger event is known to be a leased fiber circuit failure from a third party provider in the Southeast,” writes Ray. “This is something that happens on every mobile network, so we’ve worked with our vendors to build redundancy and resiliency to make sure that these types of circuit failures don’t affect customers. This redundancy failed us and resulted in an overload situation that was then compounded by other factors. This overload resulted in an IP traffic storm that spread from the Southeast to create significant capacity issues across the IMS (IP Multimedia Subsystem) core network that supports VoLTE calls.
“We have worked with our IMS (IP Multimedia Subsystem) and IP vendors to add permanent additional safeguards to prevent this from happening again and we’re continuing to work on determining the cause of the initial overload failure.”
Ray personally apologized for the problem, acknowledging that T-Mobile and his team did not meet their own bar for excellence.