A Quiet Earthquake in Beijing: Moonshot’s Kimi K2 Thinking Just Rewrote the Rules

Written by Denis Williams
Originally published: November 18, 2025
Updated: November 18, 2025

BEIJING - It’s past midnight in the Haidian District again, and the engineers at Moonshot AI are still staring at leaderboards that no one expected to flip this fast. On Nov. 6, the Alibaba-backed startup dropped Kimi K2 Thinking, a one-trillion-parameter, open-source reasoning model trained for roughly $4.6 million. 



That’s pocket change next to the billions burned by American labs. Yet the numbers are brutal: 44.9 percent on Humanity’s Last Exam (with tools), 60.2 percent on BrowseComp, 71.3 percent on SWE-Bench Verified - scores that leave GPT-5, Claude Sonnet 4.5 Thinking, and Grok-4 looking over their shoulders.


Beat GPT-5 with roughly 1/1,000 of the compute. Moonshot openly shared that training cost only $4.6 million, yet the model outperformed GPT-5 from OpenAI, a company valued in the hundreds of billions of dollars.

Beat Grok-4 on HLE (Humanity's Last Exam). Elon Musk has called the benchmark tough and important, and Grok-4 led it on the back of a $5 billion, 200,000-GPU buildout. Kimi topped that score in four months for roughly $1 million, about 1/5,000 of Grok-4's scale.


The model isn’t just big. It thinks. Hard. It can chain 200 to 300 tool calls autonomously, chewing through multi-step research or debugging marathons without once asking for a human nudge. “We built it to reason like a tired graduate student at 4 a.m.,” one Moonshot researcher told me off the record, “but one who never makes arithmetic mistakes.” Early testers say the output feels eerily deliberate - long, visible thought chains, then crisp final answers. No fluff.
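For the technically curious, that agentic loop is conceptually simple. Here is a minimal sketch in Python, assuming an OpenAI-compatible chat completions endpoint; the base URL, the kimi-k2-thinking model id, and the web_search tool are illustrative assumptions, not confirmed details of Moonshot's API:

# A minimal sketch of the autonomous tool-call loop described above,
# assuming an OpenAI-compatible chat completions endpoint. The base_url,
# the model id, and the web_search tool are assumptions for illustration.
import json
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",                 # assumption: key from Moonshot's platform
    base_url="https://api.moonshot.ai/v1",  # assumption: OpenAI-compatible endpoint
)

def web_search(query: str) -> str:
    """Hypothetical tool stub; a real agent would call a search backend here."""
    return f"(search results for: {query})"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "Research the topic and summarize your findings."}]

# Feed tool results back until the model stops asking for tools -- the same
# pattern that lets a model chain hundreds of calls without human nudges.
for _ in range(300):  # cap mirrors the ~200-300 call budget cited above
    resp = client.chat.completions.create(
        model="kimi-k2-thinking",  # assumption: illustrative model id
        messages=messages,
        tools=TOOLS,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)  # final answer, no more tool use requested
        break
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = web_search(**args)  # dispatch; only one tool in this sketch
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })

The whole trick is the loop: each tool result goes back in as a message, and the model alone decides when it has enough to answer.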


Where does it shine brightest? Coding and agentic tasks, hands down. On LiveCodeBench and SWE-Bench, it trades blows with the best proprietary coders on earth. Text generation is fluent and surprisingly natural, especially in Chinese-English bilingual work. But don’t ask it to paint you a picture or edit a TikTok - Kimi K2 Thinking remains text-only for now, no native image or video generation. Earlier Moonshot models flirted with multimodality; this one doubled down on pure reasoning muscle.


I remember similar late-night drops back in 2012, when a tiny Toronto team quietly open-sourced AlexNet and accidentally kicked off the deep-learning gold rush. History doesn’t repeat, but it rhymes.


The man behind the curtain is Yang Zhilin, 33, a Tsinghua undergrad turned Carnegie Mellon Ph.D. and co-author of Transformer-XL and XLNet - influential papers that extended the very Transformer architecture OpenAI later rode to fame.


Fun fact: Yang’s English nickname is Kimi, the chatbot’s namesake.

Another: the company's Chinese name literally means "Dark Side of the Moon," a nod to his favorite Pink Floyd album (he launched the firm on the record's 50th anniversary). Yang, a stubborn AGI purist with stints at Google Brain and Meta AI, now runs a startup valued north of $3 billion and racing American giants on a fraction of the compute.


Thomas Wolf, Hugging Face’s science lead, summed it up on X: “Another DeepSeek moment? Or are we getting one every couple months now?”

The real question hanging in the humid Beijing air - the one keeping CEOs awake from San Francisco to Seattle - isn’t whether Kimi K2 Thinking is good. It’s how long the trillion-dollar labs can keep charging premium prices when a scrappy Chinese startup just matched them for less than the cost of a Hollywood blockbuster. And if the next leap comes in four months again, who’s ready for that pace?