On the morning of September 15, 2015, connectivity to our Long Island datacenter was interrupted by a distributed denial of service (DDoS) attack against a neighboring subnet in the datacenter. Analysis shows that this was a RIPv1 reflector attack, in which old equipment still on the Internet is used to create malicious packet streams designed specifically to cause problems with routing and switching equipment. While the attack stream was not overly large, its specific profile was such that the existing DDoS edge protection on the network was not sufficient to mitigate it. This caused problems in the routing core of the network, effectively breaking connectivity for customers with equipment located in the datacenter.
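To illustrate what traffic from a RIPv1 reflector looks like on the wire, here is a minimal sketch of a payload check. It relies only on the RIPv1 packet layout from RFC 1058 (UDP port 520, a 4-byte header, 20-byte route entries). The function name `looks_like_ripv1_response` and the sample packet are hypothetical, and this is not the filter that was actually deployed on our network.

```python
import struct

RIP_PORT = 520          # UDP port used by RIP (RFC 1058)
RIP_CMD_RESPONSE = 2    # command value for a RIP response
RIP_VERSION_1 = 1       # version field value for RIPv1
RIP_HEADER_LEN = 4      # command (1) + version (1) + must-be-zero (2)
RIP_ENTRY_LEN = 20      # each route entry occupies 20 bytes


def looks_like_ripv1_response(src_port: int, payload: bytes) -> bool:
    """Return True if a UDP datagram resembles a RIPv1 response.

    Hypothetical helper: reflected traffic arrives as responses sent
    from UDP source port 520 on the abused device.
    """
    if src_port != RIP_PORT or len(payload) < RIP_HEADER_LEN:
        return False
    command, version, _zero = struct.unpack("!BBH", payload[:RIP_HEADER_LEN])
    if command != RIP_CMD_RESPONSE or version != RIP_VERSION_1:
        return False
    # A response carries one or more whole 20-byte route entries.
    body_len = len(payload) - RIP_HEADER_LEN
    return body_len > 0 and body_len % RIP_ENTRY_LEN == 0


# Sample packet (assumed values): a response with one route entry for
# address family 2 (IP), address 192.0.2.0, metric 1.
sample = struct.pack("!BBH", RIP_CMD_RESPONSE, RIP_VERSION_1, 0) + struct.pack(
    "!HHIIII", 2, 0, 0xC0000200, 0, 0, 1
)
```

A check like this only identifies candidate packets. Whether to drop, rate-limit, or log them is a separate policy decision.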
Connectivity was restored by isolating the attack target using blackhole routing, by completely removing the target from the network, by establishing dedicated and isolated ingress and egress paths for the target, and by adding traffic filters both on the local network and on the networks of several upstream providers. The initial impairment lasted approximately 3 hours on the morning of the 15th. Additional partial outages occurred on the afternoon of the 15th and again in the early morning hours of September 16, when further filtering was added that permanently mitigated the attack.
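The idea behind blackhole routing can be sketched with a toy longest-prefix-match lookup: a more specific route for the attack target points at a discard next hop, so traffic to that one address is dropped while the rest of the subnet keeps its normal path. The prefixes (taken from the documentation address ranges), the next-hop labels, and the `lookup` function are all assumptions made for illustration. They do not reflect our actual routing configuration.

```python
import ipaddress

BLACKHOLE = "discard"  # hypothetical label for a null/discard next hop


def lookup(routes: dict, destination: str):
    """Return the next hop of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        # Longest-prefix match: keep the matching route with the longest mask.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


# Assumed example table: a default route, the subnet, and a /32 blackhole
# covering only the attack target.
routes = {
    "0.0.0.0/0": "upstream",
    "198.51.100.0/24": "core-router",
    "198.51.100.77/32": BLACKHOLE,
}

target_hop = lookup(routes, "198.51.100.77")    # the attack target
neighbor_hop = lookup(routes, "198.51.100.10")  # another host in the subnet
```

Because the /32 route is more specific than the /24, only the target's traffic is discarded. The cost is that the target itself becomes unreachable until the route is withdrawn.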
A root cause analysis of this issue shows that the edge DDoS protection on the network was simply not sufficient to quickly and automatically handle this type of attack. Instead, manual intervention and engineering were required, which are naturally slower to implement. To prevent this from happening again, edge DDoS protection has been upgraded and augmented with on-demand BGP-based mitigation from Arbor Networks, one of the industry’s largest protection providers. This solution was implemented late last week and has been tested thoroughly with no impact on service. In the event of a similar or larger attack, all traffic is diverted in real time to the Arbor network, where it is “scrubbed” before being passed through to us. In most cases the only noticeable impact could be a few extra milliseconds of transit time, with little to no impact on real-world performance from an end-user perspective.
While we cannot stop future attack attempts, we feel confident that the upgraded protection now in place will prevent any future attack from causing the disruption experienced last week.