On Tuesday & Wednesday, Grasshopper faced a major outage that was unprecedented in our 8-year history. In summary, we experienced a massive hardware failure, and unfortunately, our disaster recovery systems failed to function as intended. For more details, please read the full explanation below.
While outages are a reality for technology providers, an outage of this scale is simply unacceptable, and we deeply apologize for the inconvenience caused. We understand the critical role your phone plays in your business, and we are committed to ensuring such incidents do not repeat themselves.
Now that all services have been restored, our team is working diligently to prevent future occurrences. Here’s what we are focusing on:
- Increasing investment in our disaster recovery systems to avoid similar failures
- Enhancing our network operations procedures
- Implementing a notification system to promptly communicate any service-affecting issues
- Introducing a fail-over feature to ensure continuity for your customers during downtime
For those who expressed frustration, we acknowledge your concerns. To those who supported us during this challenging time, we extend our heartfelt gratitude. We are grateful for our loyal customers and the trust you place in us.
If you have any questions, feel free to reach out to us via email, phone, or Twitter. Our support team is available 24/7 at 800-279-1455 or support@grasshopper.com.
Siamak & David
About the Outage: Details
On Tuesday morning, our primary production NetApp Storage Area Network (SAN) experienced a two-disk failure that disrupted essential services. This rare event triggered RAID-DP, NetApp's double-parity RAID protection, to protect data. After extensive efforts and collaboration with our SAN vendor, systems were gradually restored, although not without challenges.
The restoration process continued into Wednesday, when a core networking issue at our disaster recovery site caused further complications. To expedite recovery, we deployed a new storage array, which eventually allowed us to restore all systems.
As we move forward, our priorities include swiftly replacing necessary systems, resolving core networking issues, and conducting a comprehensive disaster recovery evaluation to prevent future incidents.