Cybersecurity giant CrowdStrike has traced the crash of 8.5 million Windows machines around the world to a bug in the software it uses to test its content updates.
The outage disrupted companies worldwide, including airlines, broadcasters, and many other organizations.
The problem forced Windows machines into a boot loop, and technicians needed local access to each affected machine to recover it (Mac and Linux machines were not affected). Many companies, such as Delta Air Lines, are still recovering.
“Due to a bug in the Content Validator, one of the two [updates] passed validation despite containing problematic data,” the cybersecurity company said in its post-incident review. It also promised a series of new measures to avoid a repeat of the problem.
To protect machines against attacks, CrowdStrike uses a tool called the Falcon Sensor. It ships with content that operates at the kernel level (called Sensor Content) and uses “Template Types” to define how it defends against threats. When a new threat emerges, CrowdStrike ships “Rapid Response Content” in the form of “Template Instances.”
A Template Type for a new sensor capability was released on March 5, 2024, and performed as expected. However, on July 19, CrowdStrike released two new Template Instances, and one of them (just 40KB in size) passed validation despite containing “problematic data,” the company said.
“When received by the sensor and loaded into the Content Interpreter, [this] resulted in an out-of-bounds memory read triggering an exception. This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash (BSOD),” the statement said.
In response, CrowdStrike listed several measures to prevent a repeat of the incident. First, it will test Rapid Response Content more thoroughly, including local developer testing, content update and rollback testing, stress testing, and stability testing, among others. It is also adding validation checks and improving error handling.
The company will also adopt a staggered deployment strategy for Rapid Response Content, so that a faulty update cannot reach every machine at once. In addition, it will give customers greater control over when such content is delivered and will publish release notes for updates.