Technology

xAI Gives Anthropic Access to Colossus Supercomputer in Major AI Partnership

xAI has granted Anthropic access to its Colossus supercomputer, a system with 150,000 GPUs that was built in just four months. The partnership lets Anthropic train its Claude AI models on dedicated hardware.

Martin Holloway · Published 6h ago · 6 min read · Based on 2 sources

xAI, the AI company backed by Elon Musk, has announced a partnership to let Anthropic use its Colossus supercomputer. Anthropic, which builds the Claude AI assistant and focuses on safe AI development, will now have access to this massive computing resource to train and improve its models.

What Is Colossus and What Can It Do

The Colossus supercomputer is built around more than 150,000 GPUs, the specialized computer chips designed for AI work. xAI reports that the system stays online and working 99% of the time, and that it was assembled in four months, far faster than the 18 to 24 months systems this large typically take to build.

To put this in perspective, think of a GPU as a specialized worker trained to do one job very well and very fast. Colossus has more than 150,000 of these workers operating together, making it one of the largest dedicated AI training systems in the world right now. xAI plans to expand it to 1 million GPUs, nearly seven times its current size.

The system uses high-speed connections between all those GPUs so they can talk to each other quickly during training. The 99% uptime is notable because keeping this many machines running smoothly at once is genuinely difficult—any single failure can ripple across the whole system.
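
To see why, consider that large-scale training is usually synchronous: every GPU must finish its step before the next one begins, so a single slow or crashed machine stalls all the rest. The toy Python sketch below makes the point; all of its numbers are made up for illustration and are not xAI's figures.

```python
import random

# Toy model of one synchronous training step. Every GPU must finish
# before the step completes, so the step runs at the pace of the
# slowest worker. All numbers here are made up for illustration.
NUM_GPUS = 150_000
NORMAL_STEP_SECONDS = 1.0

def step_time(num_gpus, straggler_prob=1e-5, slowdown=30.0):
    """Wall-clock time of one synchronous step across num_gpus workers."""
    worst = NORMAL_STEP_SECONDS
    for _ in range(num_gpus):
        t = NORMAL_STEP_SECONDS
        if random.random() < straggler_prob:  # a rare hiccup on one GPU
            t *= slowdown
        worst = max(worst, t)  # everyone waits for the slowest GPU
    return worst

print(f"One step took {step_time(NUM_GPUS):.1f}s instead of 1.0s")
```

Even with a one-in-100,000 chance of any single GPU hiccuping, 150,000 GPUs means a straggler shows up on most steps, which is why 99% uptime at this scale is an engineering achievement rather than a routine statistic.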

Why This Partnership Matters

Anthropic gets a direct pipeline to enormous computing power without going through a general cloud provider like Amazon or Microsoft. That means the company has more control over how its training runs are scheduled and which resources it gets. For a company developing large AI models, this is valuable because training these systems is expensive and time-consuming, and having dedicated access smooths out the process.

For xAI, this is a way to earn revenue from Colossus while it builds toward that 1 million GPU goal. Right now, xAI mostly uses the system for its own AI work. Renting capacity to Anthropic helps justify the enormous investment in building and running the infrastructure.

The bigger picture here is one we have seen before in computing history. When Amazon launched AWS in the mid-2000s, the company had built massive data centers for its own retail business. Eventually, Amazon realized it could sell access to those systems to other companies, and an entirely new business, cloud computing, was born. Google and Microsoft followed similar paths. xAI appears to be applying the same playbook: build what you need, then sell excess capacity to partners.

The Engineering Challenge

Running 150,000 GPUs together is not straightforward. The system needs enormous amounts of electricity—we are talking megawatts—supplied reliably and in multiple independent streams so a single power failure does not take everything down. Cooling is equally critical; all those chips generate intense heat, and letting them overheat stops training instantly.
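
A back-of-envelope estimate shows why the numbers land in the megawatts. The per-GPU power draw and the overhead factor below are assumptions for illustration, not published xAI figures:

```python
# Back-of-envelope power estimate. The per-GPU draw and the overhead
# factor are assumptions for illustration, not xAI's actual numbers.
num_gpus = 150_000
watts_per_gpu = 700      # assume roughly an H100-class accelerator
overhead = 1.5           # assume cooling, networking, and power losses

total_megawatts = num_gpus * watts_per_gpu * overhead / 1_000_000
print(f"~{total_megawatts:.0f} MW")  # on the order of 150 MW
```

That is comparable to the electricity demand of a small city, all of it flowing into a single facility.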

The network connecting all these GPUs is also crucial. Imagine trying to coordinate 150,000 workers all communicating with each other simultaneously. That is roughly what happens during AI training, where each GPU sends and receives huge amounts of data. Colossus uses specialized switches and multiple redundant pathways to prevent any single link from becoming a bottleneck.
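
The core operation behind all that traffic is gradient averaging, often called all-reduce: each GPU computes gradients on its own slice of the data, then every GPU needs the average of everyone's results before the next step can begin. Here is a minimal Python sketch of the concept; real clusters perform this with collective-communication libraries such as NCCL over the fast interconnect, not plain loops:

```python
# Minimal sketch of data-parallel gradient averaging ("all-reduce").
# Real clusters do this with collective-communication libraries such
# as NCCL over the fast interconnect; this shows only the concept.

def all_reduce_mean(per_gpu_grads: list[list[float]]) -> list[float]:
    """Average the gradient vectors computed by each GPU."""
    n_gpus = len(per_gpu_grads)
    dim = len(per_gpu_grads[0])
    return [sum(g[i] for g in per_gpu_grads) / n_gpus for i in range(dim)]

# Three toy "GPUs", each holding a gradient from its own data shard.
grads = [[0.1, 0.4], [0.3, 0.0], [0.2, 0.2]]
print(all_reduce_mean(grads))  # roughly [0.2, 0.2]; every GPU then
# applies the same averaged update before the next step
```

Because every GPU must exchange data with the rest of the cluster on every step, the interconnect is as important to training speed as the chips themselves.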

The fact that xAI built this in four months, rather than the typical 18 to 24 months, suggests the company had access to existing data centers and established relationships with hardware suppliers that let it move faster.

What Comes Next

xAI wants to grow Colossus to 1 million GPUs. At today's GPU costs and power bills, that would be an extraordinary expense—likely billions of dollars. Whether it makes economic sense depends on whether xAI can sign up enough customers to actually use all that capacity reliably.
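
A rough estimate shows why the price tag reaches into the billions. The per-GPU price below is an assumption for illustration, not a reported figure:

```python
# Rough hardware cost for a 1-million-GPU build-out. The per-GPU
# price is an assumption for illustration, not a reported figure.
num_gpus = 1_000_000
price_per_gpu = 30_000   # assume an H100-class accelerator, in USD

hardware_billions = num_gpus * price_per_gpu / 1e9
print(f"GPUs alone: ~${hardware_billions:.0f}B")  # ~$30B, before data
# centers, networking, power, and staff are counted
```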

The partnership with Anthropic is a first step. For Colossus to justify its full expansion, xAI would need additional deals with other AI companies, research labs, or large enterprises. Each additional customer helps spread the fixed costs.

From a technical standpoint, managing that scale brings new problems. Failures become more likely simply because there are more things that can break. Communication between components becomes harder to coordinate. The math of distributed computing—the field that handles problems split across many machines—gets genuinely harder at these scales.
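
The failure arithmetic is simple but unforgiving: even if each individual GPU is extremely reliable, the odds that all 150,000 get through a single day without incident are poor, which is why checkpointing and fault tolerance matter so much at this scale. A sketch, assuming a per-GPU daily failure rate that is illustrative rather than measured:

```python
# Chance that at least one of N GPUs fails during a day of training.
# The per-GPU daily failure rate is an assumption for illustration.
n_gpus = 150_000
p_fail_per_day = 1e-5    # roughly one failure per 274 GPU-years

p_all_survive = (1 - p_fail_per_day) ** n_gpus
print(f"P(at least one failure today) = {1 - p_all_survive:.0%}")  # ~78%
```

At 1 million GPUs, the same per-GPU reliability would make multiple failures per day a near certainty, so the software has to treat hardware failure as routine rather than exceptional.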

The broader context here is straightforward: as AI models get bigger and more capable, they require more computing power, and that power gets more expensive and harder to access. Companies like xAI that can build and operate large-scale infrastructure are becoming critical players in the AI industry. Partnerships like this one between xAI and Anthropic may become how AI development happens going forward—not as isolated efforts by individual companies, but as collaborations between infrastructure builders and model developers, each doing what they do best.