Vulnerability scoring was supposed to make security decisions easier. Assign a number, set a threshold, patch above it, defer below it. It is a clean system built for a messy world. But during Operation Lunar Peek in November 2024, that logic collapsed in a very public way, when two Palo Alto Networks vulnerabilities that looked manageable on paper were chained together to hand attackers unauthenticated root access to more than 13,000 exposed firewall management interfaces worldwide.
The two vulnerabilities at the center of the incident were CVE-2024-0012 and CVE-2024-9474. Palo Alto Networks scored them at 9.3 and 6.9 respectively under CVSS v4.0. The National Vulnerability Database scored the same pair at 9.8 and 7.2 under CVSS v3.1. Two authoritative systems, two different answers, for the exact same flaws. The 6.9 score on CVE-2024-9474 was particularly consequential. At that level, many organizations' internal patch policies would have placed it in a lower-priority queue. It appeared to require existing admin access to exploit, which made it seem contained. What the score did not capture was what happens when you pair it with the authentication bypass sitting right next to it.
CVSS, the Common Vulnerability Scoring System, was designed to evaluate vulnerabilities in isolation. Each flaw receives a score based on its own attack vector, complexity, privileges required, and impact. That is a reasonable framework when vulnerabilities exist in isolation. They rarely do. Attackers do not browse a CVE list and pick the highest number. They look for sequences, for the gap between what one flaw opens and what another flaw can then reach.
In this case, CVE-2024-0012 provided the entry point: an authentication bypass in PAN-OS that allowed unauthenticated access to the management interface. CVE-2024-9474 then provided the escalation path, a privilege escalation flaw that let an attacker run commands as root. Neither vulnerability alone was the catastrophe. Together, they were. The scoring system had no mechanism to reflect that relationship, and organizations relying on scores as a proxy for risk had no obvious signal that these two items in their patch queue were, in practice, a single critical threat.
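To make the gap concrete, here is a minimal sketch of score-only triage, using the vendor scores quoted above. The 7.0 cutoff, the `Finding` type, and the `triage` function are illustrative assumptions, not any particular organization's policy or any product's logic.

```python
from dataclasses import dataclass

# Hypothetical policy cutoff: patch immediately at or above this score.
URGENT_THRESHOLD = 7.0


@dataclass(frozen=True)
class Finding:
    cve_id: str
    score: float  # vendor-assigned CVSS v4.0 score, as quoted in the article


def triage(findings, threshold=URGENT_THRESHOLD):
    """Split findings into urgent and deferred queues by score alone."""
    urgent = [f.cve_id for f in findings if f.score >= threshold]
    deferred = [f.cve_id for f in findings if f.score < threshold]
    return urgent, deferred


findings = [
    Finding("CVE-2024-0012", 9.3),
    Finding("CVE-2024-9474", 6.9),
]
urgent, deferred = triage(findings)
print("urgent:", urgent)
print("deferred:", deferred)
```

Under this assumed cutoff, the two halves of the chain land in different queues, and nothing in the data model records that they belong together.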

This is not a new criticism of CVSS. Security researchers have raised the chaining problem for years. A 2019 analysis published by researchers at FIRST (the Forum of Incident Response and Security Teams), which maintains CVSS, acknowledged that the system was never intended to capture exploitability in combination with other vulnerabilities. The problem is that the gap between what CVSS was designed to do and how it is actually used in enterprise patch management has grown enormous. Automated patch prioritization tools, compliance frameworks, and vendor SLAs are all built around CVSS thresholds. When a 6.9 sits below the line, it frequently waits.
The scale of Operation Lunar Peek reflects something beyond a scoring methodology debate. More than 13,000 management interfaces were exposed to the public internet at the time of the campaign, which itself represents a significant configuration failure. Palo Alto Networks has long recommended that management interfaces not be exposed externally, guidance that a substantial portion of its customer base appears not to have followed or enforced.
But the exposure problem and the scoring problem compounded each other. Organizations that had internet-facing management interfaces and that deprioritized CVE-2024-9474 based on its score were, in effect, doubly reliant on assumptions that did not hold. The assumption that admin access would be required to exploit the lower-scored flaw was technically accurate in isolation. It was operationally irrelevant once the authentication bypass was in play.
The second-order consequence worth watching here is how this incident reshapes enterprise trust in CVSS-driven automation. A growing number of security teams use platforms that ingest CVSS scores and automatically generate patch schedules or risk ratings. If those platforms flagged CVE-2024-9474 as lower priority, and if the incident becomes a reference case in post-mortems and audits, there will be pressure either to supplement CVSS with chaining-aware scoring or to build in manual review gates for any vulnerability that shares a vendor, product, or attack surface with a high-severity flaw patched in the same cycle. That kind of contextual reasoning is difficult to automate, which means it tends not to happen at scale.
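The crudest form of such a review gate can be sketched in a few lines. The version below is only an illustration under stated assumptions: the `Advisory` fields, the `cycle` label, the 7.0 cutoff, and the placeholder entry `EXAMPLE-0001` are all hypothetical, and matching on product and patch cycle is a rough stand-in for real attack-surface analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    cve_id: str
    product: str
    score: float
    cycle: str  # hypothetical label for the patch cycle or advisory batch


def needs_manual_review(advisories, high_cutoff=7.0):
    """Return IDs scored below the cutoff that share a product and patch
    cycle with at least one advisory at or above the cutoff."""
    hot = {(a.product, a.cycle) for a in advisories if a.score >= high_cutoff}
    return [
        a.cve_id
        for a in advisories
        if a.score < high_cutoff and (a.product, a.cycle) in hot
    ]


advisories = [
    Advisory("CVE-2024-0012", "PAN-OS", 9.3, "2024-11"),
    Advisory("CVE-2024-9474", "PAN-OS", 6.9, "2024-11"),
    # Placeholder entry for an unrelated product, invented for illustration.
    Advisory("EXAMPLE-0001", "other-product", 5.0, "2024-11"),
]
flagged = needs_manual_review(advisories)
print(flagged)
```

A gate like this only surfaces candidates for a human to look at. It cannot tell whether two co-located flaws actually chain, which is the part that resists automation.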
The deeper tension is that the security industry has spent years trying to reduce cognitive load on overwhelmed patch teams by turning risk into numbers. Operation Lunar Peek is a reminder that numbers abstract away exactly the kind of relational thinking that attackers exploit. The question is not whether CVSS is broken. It is whether the systems built on top of it have drifted so far from its original intent that the abstraction has become a liability. As vulnerability chaining becomes a more common offensive technique, the organizations that survive will likely be those that treat scoring as a starting point rather than a verdict.