New AI Startup Makes Voice Assistants Respond Three Times Faster

A new company called Thinking Machines Lab, founded by Mira Murati, OpenAI's former chief technology officer, has announced AI models that respond in less than half a second. That is roughly three times faster than the voice assistants made by OpenAI and Google today.
The company says its smallest model responds in 0.40 seconds — compared to 1.18 seconds for OpenAI's system and 0.94 seconds for Google's. That may sound like a small difference, but it changes how natural a conversation feels.
How It Works
The new model uses a clever trick: it listens and responds at the same time (what engineers call "full duplex"), much like two people having a normal conversation rather than taking strict turns.
Think of it this way. When you talk to most voice assistants today, you have to wait for them to finish their answer before you can speak again. The assistant finishes its sentence, then listens for what you say next. With the new system, the AI is listening while it talks, so you can interrupt it or ask a follow-up question without waiting — just like you would with a person.
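The idea above can be sketched in a few lines of code. This is a hypothetical simulation of the pattern, not Thinking Machines Lab's actual system: one thread keeps "listening" while the main loop "speaks", and the moment the user says anything, an interrupt flag cuts the reply short instead of forcing the user to wait.

```python
import threading
import time

def full_duplex_session(incoming_words, reply_words):
    """Simulate an assistant that keeps listening while it speaks.

    Hypothetical sketch of the full-duplex pattern described in the
    article; names and timings are made up for illustration.
    """
    interrupted = threading.Event()
    heard = []

    def listen():
        # Runs concurrently with the speaking loop below.
        for word in incoming_words:
            time.sleep(0.01)      # user words arrive over time
            heard.append(word)
            interrupted.set()     # the user spoke: stop the current reply

    listener = threading.Thread(target=listen)
    listener.start()

    spoken = []
    for word in reply_words:
        if interrupted.is_set():  # checked between every spoken word
            break                 # yield the floor immediately
        spoken.append(word)
        time.sleep(0.005)         # time it takes to "say" one word

    listener.join()
    return spoken, heard

# With no interruption, the assistant finishes its full reply:
spoken, heard = full_duplex_session([], ["the", "answer", "is", "42"])
print(spoken)   # ['the', 'answer', 'is', '42']
```

A turn-taking assistant, by contrast, would run the listen step only after the speaking loop had finished, which is exactly the awkward pause the article describes.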
The company achieves this speed by using a smaller, focused model that only does conversation. When the AI needs to do something complex, like search the web or think through a hard problem, it sends that work to a second model running in the background. The main model stays fast while the helper model handles the heavy lifting.
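The delegation pattern described above looks roughly like this in code. Again, this is an illustrative sketch under assumed names (`fast_reply`, `heavy_lookup`), not the company's real implementation: the small model answers instantly while a background worker, standing in for the larger helper model, produces the thorough answer.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Stand-in for the larger helper model running in the background.
background = ThreadPoolExecutor(max_workers=1)

def heavy_lookup(question):
    """Placeholder for slow work: a web search or a long chain of reasoning."""
    time.sleep(0.05)
    return f"detailed answer to {question!r}"

def fast_reply(question):
    """The small conversational model acknowledges instantly and hands
    the heavy lifting to the background model (hypothetical sketch)."""
    future = background.submit(heavy_lookup, question)  # offload the slow part
    quick = "Let me check that while we keep talking."  # returned with no delay
    return quick, future

quick, future = fast_reply("why is the sky blue")
# The conversation continues immediately; the thorough answer arrives later:
detail = future.result()
```

The design choice is the same one the article describes: the caller never blocks on the slow path, so conversational latency stays bounded by the small model alone.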
What This Enables
The speed advantage could open up new ways to use AI voice assistants. Right now, the pauses in conversation make them feel awkward for customer service, meetings, or tutoring — places where you need back-and-forth dialogue. If the pauses vanish, those uses become more practical.
This follows a pattern we have seen before in technology. When smartphones got faster cameras or apps launched quicker, those small improvements in speed changed how people actually used their phones. Response latency in conversation works the same way. The difference between waiting 1.2 seconds and 0.4 seconds shifts the experience from "I am using a tool" to something that feels more like talking to a person.
The Tradeoff
Building a fast system means making compromises. The new model is smaller and narrower in what it can do compared to the very large language models that tech companies have been building. It sacrifices some reasoning power to stay fast.
The company plans to release a limited test version in the coming months so outside users can try it. It has not announced when the system will be available to the public. It also said that larger, more powerful versions of the model are currently too slow, but it expects to close that gap later in 2026.
What Comes Next
The real question is whether speed alone is enough. Thinking Machines Lab needs to show that people actually prefer these fast-but-focused conversations to slower systems that give more thorough answers. The test release should begin to answer that question.
For now, the technical achievement is solid. Whether it changes how we use AI voice assistants depends on what people do with it.


