The A.I. Beat

Dispatches from the frontier of machine intelligence
Tools & Releases · May 13, 2026 · 7 min read

GitHub Copilot gets a Max plan, llm 0.32 surfaces reasoning tokens, and a 26M model that can actually call tools

Three tool updates worth knowing about: Copilot's June 1 plan overhaul, a tiny distilled model for function calling, and Simon Willison's CLI gaining visibility into GPT-5-class reasoning chains.

GitHub is reorganizing its Copilot individual plans starting June 1, and the headline addition is a new Max tier. Pro and Pro+ are getting “flex allotments,” which sounds like a way to handle usage that spills past your monthly included quota without cutting you off entirely. The Max plan sits above Pro+, presumably for developers running agents or heavy autocomplete sessions all day. GitHub hasn’t been shy about positioning Copilot as an agentic tool lately, so a higher ceiling makes sense. Check the official plan page for current pricing before June 1 if you’re deciding whether to stay on your current tier or move.

Who should care: anyone paying for Copilot individually, especially if you’ve been hitting limits. Who can ignore it: enterprise customers on org-managed plans, where this doesn’t apply.

llm 0.32a2: You can now see what the model is thinking

Simon Willison’s llm CLI tool released version 0.32a2 with one change that matters more than the rest: reasoning-capable OpenAI models now route through /v1/responses instead of /v1/chat/completions. That’s the endpoint that enables interleaved reasoning across tool calls, which is how GPT-5-class models actually work under the hood.

The practical upside is that you can now see the summarized reasoning tokens in your terminal output, printed in a different color from standard output so you can distinguish the model’s “thinking” from its actual response. If you don’t want that, -R or --hide-reasoning will suppress it.
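
To make the color-versus-hide behavior concrete, here is a minimal Python sketch of the idea. It is not llm’s implementation: the Chunk type, the render function, and the ANSI color choice are all assumptions made up for illustration, and the sample stream is canned rather than real model output.

```python
from dataclasses import dataclass
from typing import Iterable, List

# ANSI escape codes; the specific color is an arbitrary choice for this sketch.
DIM_CYAN = "\033[2;36m"
RESET = "\033[0m"


@dataclass
class Chunk:
    """One piece of streamed model output (hypothetical shape, not llm's own type)."""
    kind: str  # "reasoning" or "text"
    content: str


def render(chunks: Iterable[Chunk], hide_reasoning: bool = False) -> str:
    """Return terminal-ready text, coloring reasoning and optionally dropping it."""
    parts: List[str] = []
    for chunk in chunks:
        if chunk.kind == "reasoning":
            if hide_reasoning:
                continue  # similar in spirit to passing -R / --hide-reasoning
            parts.append(f"{DIM_CYAN}{chunk.content}{RESET}")
        else:
            parts.append(chunk.content)
    return "".join(parts)


# Canned example stream (assumed data, not captured from a real model).
stream = [
    Chunk("reasoning", "The user wants a total, so I should call the sum tool.\n"),
    Chunk("text", "The total is 42.\n"),
]

print(render(stream), end="")                       # reasoning shown in color
print(render(stream, hide_reasoning=True), end="")  # reasoning suppressed
```

The design point is that reasoning and answer stay separate chunks until the last moment, so hiding one is a filter rather than a string-parsing job.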

This is more than an aesthetic change. Interleaved reasoning across tool calls means the model can think between each tool invocation, not just once at the start. For anyone using llm with tool plugins, that’s a meaningful capability difference: you’re no longer flying blind on what the model is doing when it chains calls together.

The alpha label means there may be rough edges. But if you’re already using llm in scripts or as part of a research workflow, it’s worth testing.

Needle: 26 million parameters, one job

Needle is a 26M parameter model that Cactus Compute distilled specifically from Gemini’s tool-calling behavior. Twenty-six million parameters is extremely small. For context, most useful LLMs today are measured in the billions. What you give up in general reasoning, you potentially get back in speed, cost, and the ability to run the thing locally or embed it in something resource-constrained.

The premise is that tool calling, specifically the structured output problem of deciding which function to call and with what arguments, doesn’t require a massive general-purpose model. If you can distill that capability out of a frontier model and pack it into something tiny, you can use it as a router or dispatcher while letting a larger model handle the actual reasoning.
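
The router-plus-big-model split described above can be sketched in a few lines of Python. Everything here is hypothetical: small_model_route and large_model_answer are stubs standing in for real models (this is not Needle’s API), the two tools are invented, and the keyword matching with canned arguments exists only so the example runs without any model weights.

```python
import json
from typing import Callable, Dict


# Hypothetical tools the dispatcher may call.
def get_weather(city: str) -> str:
    return f"Weather lookup for {city}"


def set_timer(minutes: int) -> str:
    return f"Timer set for {minutes} minutes"


TOOLS: Dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
    "set_timer": set_timer,
}


def small_model_route(user_request: str) -> str:
    """Stub for a tiny tool-calling model; returns a JSON tool call.

    A real deployment would run the distilled model here. The keyword
    matching and canned arguments are assumptions for illustration only.
    """
    text = user_request.lower()
    if "timer" in text:
        return json.dumps({"tool": "set_timer", "arguments": {"minutes": 10}})
    if "weather" in text:
        return json.dumps({"tool": "get_weather", "arguments": {"city": "Oslo"}})
    return json.dumps({"tool": None, "arguments": {}})


def large_model_answer(user_request: str) -> str:
    """Stub for the larger general-purpose model."""
    return f"[large model handles: {user_request}]"


def dispatch(user_request: str) -> str:
    """Let the small model pick a tool; fall back to the large model otherwise."""
    try:
        call = json.loads(small_model_route(user_request))
    except json.JSONDecodeError:
        return large_model_answer(user_request)
    if not isinstance(call, dict):
        return large_model_answer(user_request)
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        return large_model_answer(user_request)
    try:
        return tool(**call.get("arguments", {}))
    except TypeError:
        # Arguments did not match the tool's signature; defer to the large model.
        return large_model_answer(user_request)


print(dispatch("Set a timer please"))
print(dispatch("Explain monads"))
```

The fallbacks are the part worth noticing: malformed JSON, an unknown tool name, or mismatched arguments all hand the request to the larger model, so the small model’s mistakes never reach a tool.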

Whether Needle delivers on that premise depends on your use case. If you’re building an agent that needs to route between tools reliably and cheaply, a 26M model that handles function dispatch while a bigger model handles everything else is a reasonable architecture. If you need the model to reason about which tool to call in ambiguous situations, 26M parameters probably isn’t enough.

The benchmarks on the repo are worth reading carefully before getting excited. Distilled models often look impressive on the tasks they were specifically trained to match and less impressive on anything adjacent.

Shopify’s transparency-by-default agent design

Worth noting on the organizational side: Tobias Lütke described how Shopify’s internal coding agent, River, is built to operate entirely in public Slack channels. It won’t respond to DMs. If you try, it declines and tells you to create a public channel instead.

Lütke himself works with River in a public channel that over 100 people follow. Every session is searchable. Anyone can jump in.

This is a deliberate design choice with real implications. Public AI sessions mean institutional knowledge accumulates in a searchable place instead of disappearing into DMs. Junior engineers can see how senior engineers prompt, debug, and iterate. The agent becomes a teaching tool by default, without anyone having to set up a session for that purpose.

Most companies are deploying AI coding tools in exactly the opposite configuration: private, individual, invisible. Shopify’s approach is an interesting counterexample. Whether it works because of the transparency or in spite of the friction, it’s worth watching.

Small note: LLM scripts just got weirder

If you want to go down a rabbit hole today, Simon Willison also documented a trick where you use llm in a Unix shebang line. You can write a plain English file, start it with #!/usr/bin/env -S llm -f, make it executable, and run it like a script. The file’s English content is passed to llm as the prompt, and the model’s response is printed as the script’s output.

#!/usr/bin/env -S llm -f
Generate an SVG of a pelican riding a bicycle

It works. It’s strange. You can add -T name_of_tool to include tool calls. It’s not something you’d ship to production, but it’s a genuinely clever use of how llm handles fragments, and it points at how the line between “code” and “instructions” keeps getting blurrier.
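
If you would rather generate one of these prompt scripts than create it by hand, here is a small Python sketch. The helper name write_prompt_script and the temp-directory location are assumptions for illustration; the snippet only writes the file and sets the executable bit on a POSIX system, and does not invoke llm.

```python
import stat
import tempfile
from pathlib import Path

# The shebang line quoted in the article.
SHEBANG = "#!/usr/bin/env -S llm -f"


def write_prompt_script(path: Path, prompt: str) -> Path:
    """Write a shebang-prefixed prompt file and mark it executable for its owner."""
    path.write_text(f"{SHEBANG}\n{prompt}\n", encoding="utf-8")
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR)  # same effect as `chmod u+x`
    return path


# Hypothetical file name and location, chosen only for this example.
script = write_prompt_script(
    Path(tempfile.mkdtemp()) / "pelican",
    "Generate an SVG of a pelican riding a bicycle",
)
print(script.read_text(encoding="utf-8"), end="")
```

Actually running the resulting file requires llm to be installed and configured, plus an env that supports the -S flag, which older versions may not.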
