Google has begun rolling out Deep Think, its most advanced reasoning capability, inside the Gemini app for AI Ultra subscribers. At the same time, the company is granting a select group of mathematicians direct access to the full version of the Gemini 2.5 Deep Think model, the same one Google entered into the International Mathematical Olympiad. The move is deliberate, and the audience it courts tells you almost everything about where the real competition in AI is heading.
For most of the past two years, the public benchmark wars between AI labs have been fought on standardised tests: bar exams, medical licensing boards, coding challenges. These are legible, reproducible, and easy to headline. But the IMO is something different. It is widely regarded as the most demanding pre-university mathematics competition in the world, drawing students who have spent years training on problems that require not just calculation but genuine creative reasoning, the kind that cannot be gamed by pattern-matching on a large training corpus. Entering a model into that arena is a statement about capability that goes well beyond leaderboard positioning.
What makes Google's approach here particularly interesting is the two-track rollout. Giving Ultra subscribers access to Deep Think is a commercial play, a way to justify the premium tier and begin monetising the company's most expensive inference workloads. But seeding the full model to working mathematicians is something closer to a scientific credentialing exercise. These are people who can tell the difference between a model that produces plausible-looking proof steps and one that actually reasons correctly. Their informal verdicts, shared across departments and conferences, carry a kind of reputational weight that no benchmark score can replicate.
This dual strategy reflects a broader tension inside the frontier AI industry right now. Labs need revenue to fund the compute required to stay competitive, which pushes them toward consumer and enterprise products. But they also need the trust of the scientific and technical communities whose endorsement signals genuine progress rather than marketing. Google is trying to serve both masters simultaneously, and the mathematicians are essentially functioning as an independent quality-assurance layer, one that Google does not fully control.
There is a feedback loop embedded in this arrangement worth watching. If the mathematicians who receive early access find Deep Think genuinely useful for research-level work, they are likely to publish findings, mention the tool in seminars, and recommend it to colleagues. That organic diffusion into academic mathematics would be far more valuable to Google than any advertising campaign, because it would establish the model as a legitimate research instrument rather than a consumer novelty. Conversely, if the model stumbles on problems that working mathematicians consider tractable, that information will also circulate, and in a community that values precision above almost everything else, a single well-documented failure can be stickier than a dozen successes.
The IMO framing also raises a second-order question that the announcement does not address but that the mathematics community is already beginning to discuss: what happens to competition mathematics itself if AI systems can reliably solve Olympiad-level problems? The IMO has functioned for decades as a global talent pipeline, identifying young mathematicians who go on to careers in research, cryptography, theoretical computer science, and quantitative finance. If the problems that once served as a filter become solvable by a sufficiently capable model, the competition's role as a signal of human mathematical talent becomes more complicated to interpret.
This is not an immediate crisis, but it is the kind of slow-moving structural shift that tends to arrive before institutions have had time to adapt. The people best positioned to notice it first are precisely the mathematicians Google is now inviting to evaluate Deep Think, which gives this particular rollout a quietly recursive quality. Google is asking the community most likely to be affected by advanced mathematical AI to help validate the very system that may eventually reshape their field.
Whether Deep Think represents a genuine leap in machine reasoning or a very impressive approximation of one is a question that will take months of serious use to answer. But the fact that Google is routing that question through elite mathematicians rather than relying solely on benchmark scores suggests the company understands that, at this level of capability, the only credible judges are the humans who have spent their lives thinking about what rigorous reasoning actually requires.