A 27-year-old vulnerability sat inside OpenBSD's TCP stack while some of the most rigorous human auditors in the security world walked past it, year after year. Fuzzers ran. Code reviews happened. OpenBSD maintained its hard-won reputation as arguably the most security-conscious general-purpose operating system ever built. None of it mattered. Two packets, sent in the right sequence, could crash any server running the affected code. The flaw waited patiently for someone, or something, to notice.
That something was Mythos, a preview model from Anthropic built on Claude. The total cost of the discovery campaign was approximately $20,000. The specific model run that surfaced the vulnerability cost under $50. No human guided the process after the initial setup. Mythos found the bug autonomously.
The number that should stop security professionals cold is not $20,000. It is $50. That is the marginal cost of a discovery that eluded decades of expert human review on a codebase specifically designed to be reviewed. OpenBSD's development culture is famously adversarial toward its own code. The project's long-standing slogan is "Only two remote holes in the default install, in a heck of a long time!" The community treats security not as a feature but as a discipline. And still, a flaw from 1997 survived inside the TCP stack until an AI model with a modest compute budget decided to look.
The persistence of this bug is not an indictment of OpenBSD's developers. It is a structural observation about how human cognition engages with complex, layered systems. Security audits are expensive, time-consuming, and cognitively taxing. Reviewers bring assumptions to code, and those assumptions tend to cluster around the same blind spots across a community. A bug that survives one expert review has often already learned, in a sense, how to survive the next one. It fits into the mental model of "safe enough" that experienced engineers carry.
AI systems like Mythos do not share those assumptions. They do not get tired. They do not carry the accumulated intuitions that make certain code paths feel familiar and therefore safe. They can hold the entire state of a complex protocol interaction in working context and probe it in ways that would take a human team weeks to replicate. The OpenBSD TCP bug was not hiding in an obscure corner of the codebase. It was hiding in plain sight, inside the kind of foundational networking code that reviewers tend to treat as settled ground.
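To see why exhaustive probing of a protocol state machine is tractable for a machine in a way it is not for a human team, a back-of-the-envelope count helps. The numbers below are illustrative assumptions, not the actual Mythos methodology: suppose a TCP implementation's behavior is driven by some tens of distinguishable packet "shapes" (flag combinations crossed with a handful of interesting field values) in each of the eleven connection states RFC 793 defines.

```python
# Illustrative combinatorics, not the actual discovery method:
# how many two-packet sequences would an exhaustive prober face?

packet_shapes = 64   # assumed count of distinguishable probe packets
tcp_states = 11      # LISTEN, SYN-SENT, ..., TIME-WAIT, CLOSED (RFC 793)

# Ordered two-packet sequences, tried from each connection state.
two_packet_sequences = tcp_states * packet_shapes ** 2
print(two_packet_sequences)  # 45056
```

Roughly 45,000 cases is a weekend of tedium for a human team and a rounding error for an automated system, which is the scale at which "two packets, sent in the right sequence" stops being a needle in a haystack.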
This is the second-order consequence that deserves the most attention: if AI systems can now find vulnerabilities in hardened, well-reviewed codebases at a cost of under $50 per run, the economics of offensive security research have shifted in a way that defense has not yet absorbed. Nation-state actors and well-funded criminal organizations already have access to these tools. The asymmetry between finding vulnerabilities and patching them, which has always favored attackers, just got significantly wider.
Security teams have spent years building detection and response frameworks around a model of adversarial behavior that assumes human-speed discovery and human-scale economics. Penetration testing cycles are quarterly or annual. Vulnerability disclosure programs assume that finding a serious flaw takes weeks of expert effort. Bug bounty payouts are calibrated to the cost of skilled human labor. All of those assumptions are now under pressure.
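The calibration problem is easy to make concrete. The campaign and per-run figures below come from the article; the bounty figure is a hypothetical round number for a serious remotely triggerable flaw, chosen only to show the ratio.

```python
# Illustrative economics. Campaign and run costs are from the article;
# the bounty payout is a hypothetical round number, not a quoted program.

campaign_cost = 20_000   # total autonomous discovery campaign (article)
run_cost = 50            # marginal cost of the run that found the bug (article)
typical_bounty = 50_000  # hypothetical payout for a serious remote flaw

# One payout, priced against skilled human labor, funds this many runs.
runs_per_bounty = typical_bounty // run_cost
print(runs_per_bounty)  # 1000
```

When a single payout calibrated to weeks of expert effort can fund a thousand discovery attempts, the payout is no longer measuring the cost of finding the bug.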
The more immediate operational problem is that the same capability Anthropic used to find this bug is available, in various forms, to anyone with an API key and a few hundred dollars. The OpenBSD maintainers can patch this specific flaw, and they will. But the broader implication is that every long-lived codebase, every piece of infrastructure software that has accumulated years of incremental changes and survived on the assumption that no one had looked hard enough, is now a candidate for rapid autonomous review. The attack surface did not change. The cost of exploring it did.
What comes next is not simply a faster version of the existing security research cycle. It is a qualitative shift in who can find what, and how quickly. Defenders who treat this as a tooling upgrade rather than a structural change in threat economics will find themselves perpetually behind. The more productive framing is to ask what it would mean to run Mythos-style autonomous discovery campaigns against your own infrastructure before someone else does, and then to build the organizational capacity to absorb and act on those findings at a pace that matches the speed of discovery.
The 27-year-old bug is fixed. The conditions that allowed it to survive for 27 years have not changed at all.