A lack of coherent rules
According to CrowdStrike’s statement, the company issued a software update on July 19 containing a flaw that went undetected in validation checks. The error immediately crashed certain Windows systems connected to the web, causing them to display the crash message known as the blue screen of death.
CrowdStrike says it’s responding by reworking how it prepares its software updates, adding more stringent testing and staggering deployments to prevent a global systems collapse in the future.
A bigger problem?
Beyond the need for a regimented approach to IT failures, the CrowdStrike outage also points to a broader problem within the backbone of the world’s tech infrastructure: A small number of companies have an outsized impact on how the web operates.
“We definitely know that these are very fragile systems, and the fact that they work as well as they do is, frankly, a miracle, given all of the different players, the lack of heterogeneity of the stack,” Gregory Falco, assistant professor of mechanical and aerospace engineering and systems engineering at Cornell University’s Sibley School, told Yahoo Finance.
But expanding the number of companies that plug directly into our internet infrastructure isn’t exactly an easy fix either. That’s because the more companies there are, the more opportunities there are for failures.
Ultimately, the solution to these kinds of world-scale problems might just come down to forcing companies to be better prepared for catastrophe and, if software does fail, to understand how to contain the fallout.