Let me be clear: this isn’t a blog about apportioning blame to anyone. These sorts of events have happened multiple times in the past, perhaps not on the same scale, but this certainly isn’t the first, nor will it be the last, time an update from a supplier causes production issues.
As we are all now aware, CrowdStrike had a difficult day recently. But the event highlights a problem everyone in the IT industry faces, and it helps explain why the “Patch Gap” exists in the first place.
To recap, the patch gap is the difference between the time it takes organisations to patch software and the time it takes attackers to exploit it. Currently this sits at around 80 days for critical CVEs (reminder: time to patch minus time to exploit). This video also helps explain it.
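To put that definition in concrete terms, here is a minimal sketch using entirely hypothetical dates (the real figure will vary by CVE and organisation):

```python
from datetime import date

# Hypothetical timeline for a single critical CVE (illustrative dates only)
patch_released   = date(2024, 1, 10)   # vendor ships the fix
exploit_observed = date(2024, 1, 15)   # exploitation first seen in the wild
patch_applied    = date(2024, 4, 4)    # organisation finally deploys the fix

time_to_patch   = (patch_applied - patch_released).days     # 85 days
time_to_exploit = (exploit_observed - patch_released).days  # 5 days

# The patch gap: how long the system sat exposed after attackers could hit it
patch_gap = time_to_patch - time_to_exploit
print(f"Patch gap: {patch_gap} days")  # Patch gap: 80 days
```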
But this patch gap doesn’t exist due to collective incompetence. It exists because organisations need to test the impact a patch will have on their production environment before they push it out. This testing needs to cover a lot of different variables, some of which are time dependent. For example, there may be a process that only runs once a month and is broken by the patch. If you don’t allow testing to run long enough, and push the patch live after brief testing, you will get a nasty surprise when that process eventually runs. And an important note here: rolling back a patch is typically a LOT more difficult than applying it.
Avoiding this patch gap and staying in front of attackers is the key reason this particular event had such an impact: it was the key driver for allowing these updates directly into production environments in the first place. But if we can re-learn anything from this event, it is that sometimes the cure can be worse than the disease. Perhaps an expedited testing framework should be used for particular tools if you are going to allow them direct access to update production environments? And if these updates are happening multiple times a day, then perhaps an N-1 approach doesn’t increase your risk as much as the risk of a bad patch does?
If we look at the standards set by the various security frameworks (NIST, CIS, etc.), they all recommend or set best practice for patching a critical CVE within 24 to 48 hours. Not many organisations meet this (I would wager the vast majority do not), and yet the same organisations will insist that their EDR is as up to date as it can possibly be, in real time. This seems an odd contradiction. An N-1 approach would introduce a protection gap for your system, but if these updates are happening multiple times a day, the gap would only be hours. Compared to the 80+ days organisations are allowing for patching critical CVEs, this seems an acceptable risk. It is interesting to note that CrowdStrike are considering this approach going forward.
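To put rough numbers on that comparison, here is a back-of-the-envelope sketch. The update cadence is an assumption for illustration, not a vendor figure:

```python
# Illustrative comparison of exposure windows (hypothetical figures)
updates_per_day = 4                           # assume the EDR vendor ships ~4 content updates a day
hours_between_updates = 24 / updates_per_day

# Running N-1 means you are at most one update behind the latest protections
n_minus_1_gap_hours = hours_between_updates   # ~6 hours
cve_patch_gap_hours = 80 * 24                 # the ~80-day patch gap, in hours

print(f"N-1 protection gap: ~{n_minus_1_gap_hours:.0f} hours")
print(f"Critical CVE patch gap: ~{cve_patch_gap_hours} hours ({cve_patch_gap_hours // 24} days)")
```

On those assumptions, the exposure introduced by staying one content update behind is a few hours, a rounding error next to the exposure most organisations already accept on critical CVEs.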