Around 9:30 AM EST, we started experiencing choppy and dropped phone calls through our own internal SIP-based phone system, and shortly after began receiving customer reports of slow and limited access to internet resources. After 20-30 minutes of troubleshooting, we determined that the access problems were isolated to one of our three major internet peers, Level3 Communications. We immediately disconnected from Level3 to minimize the impact to us and our customers, and this resolved most connectivity problems over the next few minutes as other networks began to make routing changes. Fortunately, our other upstream paths had ample capacity and carried our full traffic load without issue.
Even with that change, some users are still affected. Level3 is one of the largest and most utilized carriers in the world, so many external networks still rely on it as a transit or intermediary path to reach one of our remaining peers (Cogent, TWTelecom), which results in poor performance for those users.
At this time, Level3 is still showing signs of loss in the Midwest and we are still de-peered. We have had a ticket open with Level3 since early this morning, and we are pressing our contacts for updates. We will be monitoring throughout the evening and will re-peer once we have confirmation that things are stable.
So while it’s common to have multiple paths to reach the rest of the internet, does your IT partner really have the ability to respond quickly and efficiently to a critical failure? If you’re not confident in the answer to that question, give us a call and you will be.