CrowdStrike—How Microsoft Will Protect 8.5 Million Windows Machines

July 29, 2024

Just over a week has passed since a faulty CrowdStrike update crashed an estimated 8.5 million Windows machines. Microsoft has now published a detailed analysis of the incident, explaining why security products such as CrowdStrike’s rely on kernel-level access and outlining how it plans to better protect Windows systems.

In a recent blog post titled “Windows Security Best Practices for Integrating and Managing Security Tools,” Microsoft examines the CrowdStrike outage and offers its perspective on the underlying causes.

Microsoft Confirms CrowdStrike’s Analysis

Last week, CrowdStrike shared its preliminary Post Incident Review, acknowledging that the disruption stemmed from a bug in the software used to test regular content updates. While this error was rare and challenging to detect, CrowdStrike has taken responsibility and committed to enhancing its quality assurance and testing processes.

Microsoft corroborates CrowdStrike’s findings, identifying the issue as a read-out-of-bounds memory safety error in the CSagent.sys driver developed by CrowdStrike. “Based on Microsoft’s analysis of the Windows Error Reporting kernel crash dumps related to the incident, we observe global crash patterns that reflect this,” the company stated.
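
For readers unfamiliar with the term, an out-of-bounds read happens when code reads past the end of the data it was handed. The short Python sketch below illustrates the general idea with a made-up record format (a one-byte entry count followed by 32-bit entries); the format and the function name parse_entries are assumptions for illustration and have nothing to do with CrowdStrike’s actual driver code. Python would raise an error on a bad read by itself, whereas kernel-mode code typically has no such safety net, which is why the length check has to be explicit.

```python
import struct

ENTRY_SIZE = 4  # assumed format: each entry is one little-endian 32-bit integer


def parse_entries(buf: bytes) -> list:
    """Parse a hypothetical record: a 1-byte entry count, then that many entries."""
    if len(buf) < 1:
        raise ValueError("buffer too short for the header")

    count = buf[0]
    needed = 1 + count * ENTRY_SIZE

    # The bounds check: refuse to read if the header promises more data
    # than the buffer actually contains.
    if needed > len(buf):
        raise ValueError(
            f"header declares {count} entries ({needed} bytes), "
            f"but the buffer holds only {len(buf)} bytes"
        )

    return [
        struct.unpack_from("<I", buf, 1 + i * ENTRY_SIZE)[0]
        for i in range(count)
    ]
```

A well-formed record parses normally, while a record whose header claims more entries than it carries is rejected before any read takes place.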

Microsoft’s Take On Kernel Drivers

On kernel drivers, Microsoft explains that many security vendors, itself included, use a kernel driver architecture. “Kernel drivers allow for system-wide visibility and the capability to load during early boot, enabling detection of threats like boot kits and root kits that may load before user-mode applications,” the company elaborates.

Kernel drivers can also improve performance, particularly for tasks such as analyzing high-throughput network activity. Microsoft adds that they provide tamper resistance: security vendors want their software to keep running in the face of malware, targeted attacks, or malicious insiders, even when those attackers hold administrative privileges. “They also want to ensure that their drivers load as early as possible to observe system events at the earliest opportunity,” Microsoft notes.

The Trade-Off

However, the use of kernel drivers is not without its complexities. Microsoft acknowledges the inherent trade-offs that security vendors face when operating at this level. “Since kernel drivers run at the most trusted level of Windows, where containment and recovery capabilities are inherently limited, security vendors must carefully balance needs like visibility and tamper resistance with the risks associated with kernel mode operations.”

“All code operating at kernel level requires extensive validation because it cannot fail and restart like a normal user application. This is a universal requirement across all operating systems,” Microsoft adds. The company has made strides in transitioning complex Windows core services from kernel to user mode, suggesting that it is now feasible for security tools to strike a balance between security and reliability.

Kernel Drivers And CrowdStrike

In the aftermath of the CrowdStrike incident, some experts hastily directed blame towards Microsoft. Cybersecurity consultant Daniel Card likened this reaction to “blaming the road for someone driving at 100mph into someone’s house by the roadside.”

Kernel drivers are not exclusive to security products; they are also utilized by various other solutions, including anti-cheat engines. Card points out that the widespread adoption of kernel drivers among security vendors, including Microsoft, underscores the necessity of balancing visibility, stability, speed, and availability. “Apple doesn’t expose this capability, which has resulted in some negative impacts regarding visibility,” he observes.

Sean Wright, head of application security at Featurespace, emphasizes the importance of evaluating vendors carefully, given that kernel-level access is crucial for the functionality of most security tools. “This concern is not unique to Windows; Linux offers similar access for drivers. While macOS has a different ecosystem, it is essential to recognize these distinctions,” he states. Wright adds that the absence of significant issues prior to the CrowdStrike event suggests a largely effective system.

What’s Next For Windows Security

In its blog post, Microsoft outlines four key steps to bolster security following the CrowdStrike incident:

Providing safe rollout guidance, best practices, and technologies to enhance the safety of updates to security products.
Reducing the necessity for kernel drivers to access critical security data.
Enhancing isolation and anti-tampering capabilities through technologies such as its recently announced VBS enclaves.
Implementing zero trust approaches, including high integrity attestation, which assesses the security state of machines based on the health of Windows native security features.

While Microsoft appears to be curtailing the access granted to kernel drivers, Card advises against hasty reactions in the wake of the CrowdStrike incident. He believes the focus should be on addressing fundamental questions: “Why did CrowdStrike not test sufficiently? Why didn’t they stagger the deployment? Why didn’t they allow customers to self-delay this content type?”
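
Card’s questions about staggering deployments and letting customers delay updates describe what is often called a staged, or ring-based, rollout. The Python sketch below shows one possible shape of that idea. It is not CrowdStrike’s or Microsoft’s mechanism: the ring sizes, the crash-rate threshold, and the function names are all assumptions chosen for illustration.

```python
import hashlib

# Assumed values: cumulative share of the fleet reached at each stage.
RINGS = [0.01, 0.10, 0.50, 1.00]
# Assumed threshold: hold the rollout if more than 0.1% of installs crash.
MAX_CRASH_RATE = 0.001


def bucket(machine_id: str) -> float:
    """Map a machine ID to a stable value between 0.0 and 0.9999."""
    digest = hashlib.sha256(machine_id.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") % 10_000) / 10_000


def eligible(machine_id: str, stage: int, delay_opt_in: bool = False) -> bool:
    """Return True if this machine should receive the update at this stage."""
    if delay_opt_in and stage < len(RINGS) - 1:
        # The customer asked to wait until the final stage.
        return False
    return bucket(machine_id) < RINGS[stage]


def next_stage(stage: int, crashes: int, installs: int) -> int:
    """Advance one stage only if the observed crash rate is acceptable."""
    if installs == 0 or crashes / installs > MAX_CRASH_RATE:
        return stage  # hold: no data yet, or too many crashes
    return min(stage + 1, len(RINGS) - 1)
```

Because each machine hashes to the same bucket every time, a machine that receives the update at an early stage stays included at every later one, and a bad update that crashes its first ring never advances to the rest of the fleet.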

This incident highlights a broader issue that extends beyond the level of access granted to security products. Card suggests that any changes to the Windows operating framework should be approached with caution, emphasizing that the resilience question transcends the kernel versus user mode debate and encompasses broader technological ecosystems and national considerations.

Winsage