
How IT Departments Rushed to Deal with CrowdStrike Chaos

    At 1 a.m. local time on Friday, a systems administrator at a West Coast funeral and cremation company woke up to find his computer screen glowing. When he checked his company phone, it was blowing up with messages about what his coworkers were calling a network problem. Their entire infrastructure was down, threatening to disrupt funerals and cremations.

    It quickly became clear that the massive disruption was caused by the CrowdStrike outage. The security firm inadvertently caused chaos around the world on Friday and into the weekend when it pushed a defective update to its Falcon monitoring platform, disrupting airlines, hospitals, and other businesses both large and small.

    The administrator, who asked not to be identified because he is not authorized to speak publicly about the outage, sprang into action. He ended up working nearly 20 hours a day, driving from morgue to morgue and personally resetting dozens of computers to fix the problem. The situation was urgent, the administrator explained, because the computers had to be back online so there would be no disruptions to the scheduling of funeral services and the morgue’s communication with hospitals.

    “With a problem as big as the CrowdStrike outage, it made sense to make sure our business was ready to go so that we could bring these families in, so that they could go through the services and be with their family members,” the sysadmin said. “People are grieving.”

    The flawed CrowdStrike update bricked some 8.5 million Windows computers worldwide, sending them into the dreaded Blue Screen of Death (BSOD) spiral. “The trust we’ve built up over the years in drops has evaporated in buckets in a matter of hours, and it was a slap in the face,” CrowdStrike Chief Security Officer Shawn Henry wrote on LinkedIn early Monday morning. “But this is nothing compared to the pain we’ve caused our customers and partners. We’ve let down the very people we were sworn to protect.”

    Cloud platform failures and other software problems, including malicious cyberattacks, have caused major IT outages and global disruption in the past. But last week’s incident was particularly notable for two reasons. First, it stemmed from a flaw in software that was meant to support and defend networks, not harm them. And second, fixing the problem required hands-on access to each affected machine; someone had to manually boot each computer into Windows Safe Mode and apply the fix.
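    As a rough illustration of what that manual fix involved, the widely reported workaround was to boot the machine into Safe Mode and remove the faulty Falcon “channel file.” The short Python sketch below is hypothetical and not CrowdStrike’s official remediation; the directory path and the C-00000291*.sys filename pattern are assumptions drawn from public advisories.

        # Hypothetical sketch of the manual workaround, intended to be run after
        # booting the affected machine into Windows Safe Mode.
        # The path and filename pattern are assumptions based on public reporting.
        import glob
        import os

        CHANNEL_DIR = r"C:\Windows\System32\drivers\CrowdStrike"  # assumed default Falcon sensor directory
        PATTERN = "C-00000291*.sys"  # defective channel file pattern cited in public advisories

        for path in glob.glob(os.path.join(CHANNEL_DIR, PATTERN)):
            print(f"Removing {path}")
            os.remove(path)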

    IT is often an unglamorous and thankless job, but the CrowdStrike debacle was a next-level test. Some IT professionals had to guide remote workers, or staff at multiple locations across borders, through manually resetting their devices. A junior systems administrator for a fashion brand in Indonesia had to figure out how to overcome language barriers to do so. “It was daunting,” he said.

    “We don't get noticed unless something goes wrong,” a systems administrator at a Maryland healthcare facility told WIRED.

    That person woke up shortly before 1 a.m. EDT to find the screens at the organization’s physical locations blue and unresponsive. Their team spent several hours that morning getting the servers back online, then had to manually fix more than 5,000 other devices across the company. The outage blocked phone calls to the hospital and threw the medication-dispensing system into disarray, forcing staff to write orders out by hand and walk them to the pharmacy.