Technology

Thinking Machines Lab Announces New AI Architecture, Secures $2B Funding and NVIDIA Partnership

Martin Holloway · Published 2d ago · 6 min read · Based on 8 sources

On May 11, Mira Murati's startup Thinking Machines Lab announced a new type of AI system called "interaction models," designed to handle voice, video, and text at the same time without the delays that usually slow down these tasks. The announcement included news of a partnership with NVIDIA to provide computing power and a $2 billion funding round that values the company at $12 billion.

The interaction models take a different approach from how AI systems work today. Instead of adding voice and video capabilities on top of existing text-based AI, Thinking Machines trained these systems from the ground up to process audio, video, and text simultaneously while responding in real time. The company claims its models are faster and smarter than existing alternatives, though it hasn't released detailed performance comparisons yet.

How the Technology Works

Thinking Machines' models process continuous streams of input across different media types without the slowdowns typically built into current systems. Today's multimodal AI (systems that handle multiple types of input) often requires extra processing steps: converting speech to text, breaking video into individual frames, or feeding different types of information through the system one after another. These steps add delays between what a user inputs and what the system outputs.
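To make the latency argument concrete, here is a toy Python sketch. The stage names and timings are invented for illustration; they are not measurements of any real system, Thinking Machines' included. The structural point is that in a staged pipeline the per-stage delays add up, while a unified model pays for a single pass.

```python
import time

# Hypothetical stage latencies in seconds. These numbers are invented for
# illustration and are not measurements of any real system.
ASR_LATENCY = 0.30   # speech-to-text transcription
LLM_LATENCY = 0.50   # text model generates a reply
TTS_LATENCY = 0.25   # text-to-speech synthesis

def pipeline_response(audio_chunk: bytes) -> bytes:
    """Staged pipeline: each stage must finish before the next starts,
    so per-stage delays accumulate."""
    time.sleep(ASR_LATENCY)  # stand-in for transcription work
    time.sleep(LLM_LATENCY)  # stand-in for text generation
    time.sleep(TTS_LATENCY)  # stand-in for speech synthesis
    return b"synthesized-audio"

def unified_response(audio_chunk: bytes) -> bytes:
    """Unified model: one network consumes raw audio and emits audio,
    so the user waits for a single forward pass."""
    time.sleep(0.35)  # assumed single-pass latency, also invented
    return b"synthesized-audio"

for fn in (pipeline_response, unified_response):
    start = time.perf_counter()
    fn(b"\x00" * 320)
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

Running the sketch prints roughly 1.05 seconds for the staged pipeline against 0.35 seconds for the unified pass, which is the gap users perceive as lag in a live conversation.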

Think of it like this: current multimodal AI is like a translator who has to listen to speech, write it down, translate it, write that down, and then speak it out. Thinking Machines' approach is more like having someone who understands multiple languages simultaneously and can respond right away without all those intermediate steps.

This matters in real-world applications. When you call a customer service line powered by AI, noticeable pauses between your words and the system's responses can make the conversation feel awkward or broken. For businesses using AI for design, analysis, or customer interaction, natural back-and-forth communication is still hard to achieve with existing systems.

Thinking Machines released a research version called Tinker for developers who want to customize the models for specific tasks. The company also indicated its first commercial product will include open source components, a hybrid approach similar to strategies used by other AI companies like Hugging Face and Anthropic.

The Money and the Partnerships

NVIDIA's involvement goes beyond being a customer. The company is investing directly and committing at least one gigawatt of computing capacity (roughly the electricity a small city consumes) to run Thinking Machines' systems for training and real-world use. This capacity will come from NVIDIA's next-generation Vera Rubin chips, which follow the H100 and H200 generations currently powering much of the AI world.
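For a rough sense of scale, here is a back-of-envelope calculation. The per-chip power draw and overhead factor are assumptions, since the announcement did not include Vera Rubin power specifications.

```python
# Back-of-envelope sizing of a one-gigawatt compute commitment.
# Per-accelerator figures below are assumptions for illustration only.
total_watts = 1_000_000_000      # 1 GW commitment
watts_per_accelerator = 1_000    # assumed draw of a high-end AI accelerator
overhead = 1.3                   # assumed cooling/power-delivery overhead

accelerators = total_watts / (watts_per_accelerator * overhead)
print(f"~{accelerators:,.0f} accelerators")  # roughly 770,000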

The $2 billion funding round puts Thinking Machines among the best-funded AI startups ever launched. The $12 billion valuation reflects investor confidence in Murati's track record at her previous company, OpenAI, and in the credentials of her founding team, many of whom are researchers who left OpenAI in late 2023.

The broader context here matters. During the cloud computing boom of the 2000s and 2010s, companies like Amazon Web Services and Google Cloud spent billions building computing infrastructure, but they did this over many years. AI startups now need access to massive amounts of computing power immediately just to stay competitive. The gigawatt-scale commitment suggests Thinking Machines expects to run extraordinarily large training and deployment operations that most individual businesses could never afford to build themselves.

Where Thinking Machines Fits

The announcement comes as the AI industry works through real limitations in current systems. OpenAI's GPT-4V, Google's Gemini, and Anthropic's Claude 3 all handle multiple types of input, but their underlying designs still put text first. Voice interfaces typically need a separate step that converts speech to text before processing, and video analysis usually samples individual frames rather than treating video as one continuous stream.

Thinking Machines' architecture—training systems from scratch specifically to handle real-time multimodal input—marks a meaningful shift in how these systems might be built. If the performance claims hold, major AI labs may rethink how they build their next generation of models. The next 18 to 24 months will likely show which approach wins: traditional architectures optimized over time, or systems built from the ground up for real-time interaction.

The company's commitment to open source components sets it apart from competitors that keep everything proprietary. Open source could accelerate adoption among developers and create network effects around the technology. But reconciling public contributions with the need to profit from expensive computing infrastructure will be a difficult balancing act.

Research and Industry Impact

Thinking Machines launched a research blog called "Connectionism" and plans to regularly publish research, code, and technical documentation. The first post addressed a practical production problem: how to build AI models that give the same answer every time you ask them the same question—something that matters more to businesses than pure creativity.
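The post's exact techniques aren't reproduced here, but the standard starting points for reproducible model output are greedy decoding and fixed random seeds, as in this illustrative sketch. Note that in production, batching and GPU kernel scheduling can still perturb results even with these controls, which is the harder part of the problem.

```python
import random

def greedy_decode(logits_per_step):
    """Always pick the highest-scoring token, so the same prompt yields
    the same output, assuming the scores themselves are deterministic."""
    return [max(range(len(step)), key=step.__getitem__) for step in logits_per_step]

def sampled_decode(logits_per_step, seed):
    """Sampling with a fixed seed: reproducible run to run, but only if
    nothing upstream (batching, kernels) changes the scores."""
    rng = random.Random(seed)
    return [rng.choices(range(len(step)), weights=step)[0] for step in logits_per_step]

# Toy per-step token distributions, invented for the example.
fake_logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
assert greedy_decode(fake_logits) == greedy_decode(fake_logits)
assert sampled_decode(fake_logits, seed=42) == sampled_decode(fake_logits, seed=42)
```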

This research-first approach echoes lessons from OpenAI's history. OpenAI started as an open research organization but shifted toward commercial products. Thinking Machines appears to be trying a middle ground: publishing active research to attract top talent and contribute to the broader research community while still building commercial products.

The announcement caught attention quickly. Murati's social media post about the news received over 250,000 views within hours, which suggests strong interest among researchers and developers in alternatives to how multimodal AI systems are currently built.

The convergence of large funding, infrastructure partnerships, and a focus on new architectural approaches positions Thinking Machines as a serious player in the next wave of AI development. If the interaction models work as promised, they could change how conversational AI systems are built and deployed across industries.

The real test arrives when research prototypes become production systems. Real-time multimodal systems at large scale involve more than just model design. Edge deployment (running AI on user devices), bandwidth constraints, and integration challenges often prove harder than laboratory demonstrations suggest. That's where the gap between promising research and working products often emerges.
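On the bandwidth point specifically, a back-of-envelope budget shows why streaming raw media at scale is nontrivial. The codec rates below are common ballparks, and the session count is hypothetical, not a figure from Thinking Machines.

```python
# Rough bandwidth budget for real-time multimodal sessions.
audio_kbps = 16_000 * 16 / 1000   # 16 kHz, 16-bit mono PCM = 256 kbps
video_kbps = 1_500                # assumed 720p stream at a typical H.264 rate
sessions = 10_000                 # hypothetical concurrent users

total_gbps = (audio_kbps + video_kbps) * sessions / 1_000_000
print(f"~{total_gbps:.1f} Gbps sustained ingress")  # about 17.6 Gbps
```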