The A.I. Beat

Dispatches from the frontier of machine intelligence


May 12, 2026 · 8 min read

Tools & Releases

Copilot gets a Max plan, Needle packs Gemini tool-calling into 26M parameters, and a sobering reality check on AI productivity math

GitHub restructures Copilot pricing ahead of June 1, a tiny distilled model challenges the assumption that tool-calling requires a giant LLM, and one developer makes an uncomfortable argument about what "2x faster" actually costs you.

By The AI Beat · Tools Desk


GitHub Copilot is getting a new pricing structure on June 1, and if you’re on an individual plan, now is the time to pay attention.

Copilot’s New Lineup: Flex Allotments and a Max Tier

Starting June 1, GitHub is reshuffling its individual Copilot plans. The headline changes: Pro and Pro+ are getting “flex allotments,” which replace the old fixed-bucket model with something more usage-sensitive. On top of that, there’s a new Max plan sitting above Pro+.

The flex allotment shift is the more interesting structural change. Fixed monthly limits on completions or chat messages have always been a blunt instrument. Flex suggests GitHub is moving toward something more elastic, where your baseline is covered but you can draw on additional capacity when you need it. The details of how that translates to dollars are on GitHub’s pricing page.

The new Max plan is aimed at developers who want broader model access and priority throughput. If you’re already on Pro+ and finding yourself hitting walls, Max is the obvious upgrade path. If you’re on the free tier or on Pro and not hitting limits, Max doesn’t change much for you.

Worth noting: GitHub says this restructuring came from user feedback. The previous plan tiers had gotten confusing, and the fixed allotment model wasn’t working for how people actually use the tool. Whether the new structure is actually simpler remains to be seen, but at least it acknowledges the problem.

Needle: 26M Parameters, Gemini-Level Tool Calling

The most technically interesting thing to land today is Needle, from a team called Cactus Compute. It’s a 26-million-parameter model distilled specifically for function/tool calling, trained to replicate Gemini’s tool-calling behavior in a fraction of the footprint.

To put that in perspective: Google doesn’t publish Gemini’s parameter count, but it is a frontier-scale model. Needle is 26M parameters. For tool calling specifically, that’s a remarkable compression ratio, and it matters if you’re building agents that need to run on-device, in a serverless function with tight memory limits, or anywhere you can’t afford to call a full frontier model for every tool dispatch.

The distillation approach is the clever part. Rather than training a small model from scratch on raw tool-calling datasets, Cactus Compute used Gemini as the teacher, essentially baking Gemini’s tool-selection behavior into a model small enough to run on a laptop. If the project’s benchmarks hold up under real workloads, this is genuinely useful: a 26M model that runs locally and reliably dispatches to the right function is a very different proposition from a cloud API call for every agent step.
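On the host side, “dispatching to the right function” usually means parsing the model’s structured output and calling a local function by name. Here is a minimal Python sketch of that step, assuming the model emits a single JSON object with name and arguments fields; that is a common convention, not Needle’s documented output format. fake_tool_model, get_weather, and add are hypothetical stand-ins, and a real setup would replace fake_tool_model with an actual local model call.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical tools: plain Python functions, not part of Needle.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


def add(a: float, b: float) -> float:
    return a + b


# Registry mapping tool names (as the model would emit them) to functions.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather, "add": add}


def fake_tool_model(prompt: str) -> str:
    """Stand-in for a small local tool-calling model (hypothetical).

    Assumption: the model returns one JSON object shaped like
    {"name": ..., "arguments": {...}}. A real model's format may differ.
    """
    if "weather" in prompt.lower():
        return json.dumps({"name": "get_weather", "arguments": {"city": "Oslo"}})
    return json.dumps({"name": "add", "arguments": {"a": 2, "b": 3}})


def dispatch(raw: str) -> Any:
    """Parse a tool call and invoke the matching local function."""
    call = json.loads(raw)
    name = call.get("name")
    if name not in TOOLS:
        # Never trust the model blindly: reject tools we don't have.
        raise ValueError(f"unknown tool: {name!r}")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return TOOLS[name](**args)


if __name__ == "__main__":
    print(dispatch(fake_tool_model("What's the weather in Oslo?")))
    print(dispatch(fake_tool_model("Add two numbers")))
```

Validating names and argument shapes in the dispatcher, rather than assuming the model got them right, is a sensible default with any tool-calling model, and especially with a very small one.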

This is a Show HN project, so take “production ready” claims with appropriate skepticism. But the concept is sound, and it’s the kind of work that, if it performs, significantly lowers the cost of building capable tool-using agents.

Shopify’s River: Radical Transparency by Design

Shopify’s CEO Tobias Lütke shared details about River, the company’s internal AI coding agent, and the design decision that stands out is architectural: River won’t respond to direct messages. If you try to DM it, it redirects you to create a public channel.

Every River conversation happens in a searchable, company-visible Slack channel. Lütke himself works with River in #tobi_river, and over 100 Shopify employees have followed along. Anyone can jump in.
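The DM rule is simple enough to express as a routing policy. A minimal sketch, assuming a hypothetical Message type whose channel_type is "im" for direct messages; it does not use Slack’s SDK and is not Shopify’s code, and the #user_river naming is borrowed from Lütke’s #tobi_river channel purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message shape; "im" marks a direct message (an assumption).
    channel_type: str
    channel_name: str
    user: str
    text: str


# Hypothetical redirect text; the #<user>_river naming is illustrative only.
REDIRECT = (
    "I only work in public channels so everyone can learn from the conversation. "
    "Please create #{user}_river and mention me there."
)


def route(msg: Message) -> str:
    """Refuse direct messages and point the sender to a public channel."""
    if msg.channel_type == "im":
        return REDIRECT.format(user=msg.user)
    # Placeholder for handing the request to the agent inside a public channel.
    return f"[#{msg.channel_name}] working on: {msg.text}"


if __name__ == "__main__":
    print(route(Message("im", "", "alice", "why is checkout slow?")))
    print(route(Message("channel", "tobi_river", "tobi", "why is checkout slow?")))
```

Putting the check at the routing layer means privacy isn’t a setting anyone can opt into; the only path to the agent runs through a channel others can read.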

This is the opposite of how most people use AI coding tools, which tend to be private by default. The bet Shopify is making is that ambient institutional knowledge compounds over time. When senior engineers’ River sessions are visible to junior engineers, you get implicit mentorship. When bugs get solved in public, the solution doesn’t die in someone’s chat history.

River isn’t a tool you can use, but its design principle is worth stealing: if your team is using AI agents in private, you’re probably leaving learning on the table.

The Uncomfortable Math Behind “3x Productivity”

Simon Willison flagged a post from developer James Shore that deserves more attention than it’ll probably get. Shore’s argument is simple and brutal:

Your AI coding agent, the one you use to write code, needs to reduce your maintenance costs. Not by a little bit, either. You write code twice as quick now? Better hope you’ve halved your maintenance costs. Three times as productive? One third the maintenance costs. Otherwise, you’re screwed. You’re trading a temporary speed boost for permanent indenture. The math only works if the LLM decreases your maintenance costs, and by exactly the inverse of the rate it adds code.

This is the version of the AI productivity argument that nobody wants to have. Velocity is visible. Maintenance debt is deferred. If you’re shipping 3x the code with AI assistance but that code carries the same per-line maintenance burden as human-written code, you’re not 3x more productive. You’re building a larger system that future-you has to maintain.
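To make the arithmetic concrete, here is a toy model of our own, not Shore’s: a developer has one unit of time per month, each unit of code already in the system costs maint_per_unit time per month to maintain, and whatever time is left goes into new code at speedup units per unit of time. Both parameters, and the 0.02 figure in the usage example, are illustrative assumptions.

```python
from typing import List


def maintenance_shares(speedup: float, maint_per_unit: float, months: int = 24) -> List[float]:
    """Fraction of each month's time spent on maintenance (toy model, an assumption).

    Capacity is 1.0 unit of time per month. Maintenance costs maint_per_unit
    per unit of code already in the system; leftover time produces new code
    at `speedup` units per unit of time.
    """
    code = 0.0
    shares: List[float] = []
    for _ in range(months):
        maintenance = min(1.0, maint_per_unit * code)  # can't exceed 100% of time
        shares.append(maintenance)
        code += speedup * (1.0 - maintenance)  # leftover time becomes new code
    return shares


if __name__ == "__main__":
    scenarios = {
        "no AI": maintenance_shares(1.0, 0.02),
        "3x speed, same upkeep": maintenance_shares(3.0, 0.02),
        "3x speed, 1/3 upkeep": maintenance_shares(3.0, 0.02 / 3),
    }
    for label, shares in scenarios.items():
        # shares[6] is month 7, i.e. after six months of accumulated code
        print(f"{label}: {shares[6]:.0%} of time on maintenance after 6 months")
```

In this model, a 3x speedup with unchanged per-unit upkeep just approaches the same ceiling faster: maintenance consumes all available time once the system holds 1/maint_per_unit units of code. Cutting upkeep to one third makes the maintenance share match the no-AI baseline month for month while the codebase is three times larger, which is Shore’s “exactly the inverse” condition.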

The crux is whether AI-generated code is actually lower maintenance. In some cases it probably is: boilerplate is boilerplate, tests are tests, and a well-prompted agent generating standard CRUD endpoints likely produces code that’s no harder to maintain than what a developer would write. But in complex domains, generated code can be subtly wrong in ways that take longer to diagnose than bugs in code you wrote yourself, where you at least understand the intent.

There’s no clean answer here, but Shore’s framing is a useful forcing function: before you celebrate your velocity numbers, ask what the maintenance curve looks like at 6 months.

Also Worth Knowing

Simon Willison documented a fun trick: you can put his LLM command-line tool in the shebang line of a file to make an executable natural-language file. The simplest form:

#!/usr/bin/env -S llm -f
Generate an SVG of a pelican riding a bicycle

Run that file, get an SVG. You can also wire in tool calls with the -T flag. It’s a toy, mostly, but it’s a surprisingly clean demonstration of what happens when a CLI tool is a proper Unix citizen.

Google’s Android pre-I/O dump today included vibe-coded home screen widgets (“Create My Widget”) and Gemini-powered Gboard dictation rolling out first to Samsung Galaxy and Pixel phones. These are consumer features, not developer tools. Worth knowing they exist; nothing actionable for most of you today.
