7 Comments
Joe Marta

The H100 debuted in 2022. I don't know why it's the central focus of the pricing comparison when the B200 has been in the wild for quite a while now and the B300 is available too. You would obviously expect late-2025 or early-2026 silicon to outperform an Nvidia model from 2022.

Also, if you're paying by the hour, how fast the system is matters deeply, because speed reduces training run time. Which is the point of the article. But it glosses over the actual performance differential between chips and again fails to account for the B200 and B300 being significantly more capable than the H100 and H200, especially in full system builds with the accompanying improvements to networking, cooling, CPU, system RAM, etc.

A B200 is ~36% more expensive per hour on CoreWeave. Nvidia claims as much as 15x more inference throughput on DGX B200 vs. DGX H100 and 3x more training throughput (https://www.nvidia.com/en-us/data-center/dgx-b200/). You get a lot more for that 36% price uplift. And with the B300's massively larger VRAM, you theoretically don't need to rent as many units to get the same job done, even if each costs more per hour. The H100 just isn't the target anymore.
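To make that arithmetic concrete, here's a rough sketch (hourly prices normalized so H100 = 1.0; the 3x figure is Nvidia's own claim, and real rental rates vary):

```python
# Rough cost-per-training-job comparison; figures from the comment above.
# Hourly prices are normalized so H100 = 1.0; real rental rates vary.
h100_price_per_hr = 1.00   # baseline
b200_price_per_hr = 1.36   # ~36% uplift, per CoreWeave pricing
b200_speedup = 3.0         # Nvidia's claimed training throughput vs DGX H100

# Same job either way, so job cost = hourly price / relative throughput.
h100_job_cost = h100_price_per_hr / 1.0
b200_job_cost = b200_price_per_hr / b200_speedup

print(f"H100 relative job cost: {h100_job_cost:.2f}")  # 1.00
print(f"B200 relative job cost: {b200_job_cost:.2f}")  # 0.45, ~55% cheaper per job
```

Even if the 3x claim turns out to be optimistic, the break-even point is a real-world speedup of just ~1.36x.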

Tumithak of the Corridors

Good piece. Lots of useful data in here and I appreciate the work that went into the comparisons.

I'd push back on one thing though. The essay treats Anthropic's multi-accelerator setup as a moat. Three accelerator families, two compiler stacks, two hyperscaler dependencies. That's framed as diversification and resilience.

I'd call it complexity.

Every additional architecture is a separate optimization pipeline, a separate debugging toolchain, a separate set of expertise your engineers have to maintain. TPUs run XLA. Trainium runs Neuron. Nvidia runs CUDA. These aren't interchangeable parts. You're tripling the surface area for things to go wrong.
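A sketch of what that surface area looks like in practice (the compiler column is from above; the kernel-language and profiler names are rough shorthand, and each real toolchain is far bigger than three entries):

```python
# Sketch of per-architecture toolchain surface area. Compiler names are from
# the comment above; kernel-language and profiler entries are shorthand.
TOOLCHAINS = {
    "tpu":      {"compiler": "XLA",    "kernels": "Pallas",   "profiler": "XProf"},
    "trainium": {"compiler": "Neuron", "kernels": "NKI",      "profiler": "neuron-profile"},
    "nvidia":   {"compiler": "CUDA",   "kernels": "CUDA C++", "profiler": "Nsight"},
}

def expertise_needed(fleet: list[str]) -> set[str]:
    """Each accelerator family in the fleet adds its own compiler, kernel
    language, and profiler that engineers have to maintain."""
    return {tool for arch in fleet for tool in TOOLCHAINS[arch].values()}

print(len(expertise_needed(["nvidia"])))                     # 3
print(len(expertise_needed(["tpu", "trainium", "nvidia"])))  # 9 -- tripled
```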

The piece actually flags this as a "critical risk" and a "genuine cost that partially offsets the hardware savings." Then it just moves on and treats the rest as pure upside.

The more parts a machine has, the more ways it can break. The cost savings only hold when everything works. The moment something breaks in a way that's unique to the multi-architecture setup, those savings evaporate into engineering hours.
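As a toy expected-cost model (every number below is invented purely to illustrate the shape of the argument):

```python
# Toy model: multi-architecture savings net of architecture-specific
# incidents. All numbers are invented for illustration only.
hardware_savings = 100.0      # hypothetical annual savings, arbitrary units
incidents_per_year = 4        # hypothetical multi-arch-only failures
hours_per_incident = 250      # hypothetical cross-stack debugging time
cost_per_eng_hour = 0.15      # hypothetical, same arbitrary units

net = hardware_savings - incidents_per_year * hours_per_incident * cost_per_eng_hour
print(f"Net savings: {net:+.1f}")  # -50.0: the toy savings flip negative
```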

History tends to favor the simpler machine.

Chris Zeoli

Very true! This can go both ways.

Semi Fundamental

Strong analysis of the compute landscape

💀Public Skeleton🏴‍☠️

💀 Gemini AI Pro, an upgraded version of Gemini, refused to create a political meme I was working on.

It claimed this was due to its community safety guidelines.

It said it couldn't create the likeness of the occupant of the White House.

I argued with Gemini, claiming it was censoring my freedom of speech at a time of great importance (Iran war).

I also told Gemini that Anthropic's AI was being used to kill people, so its refusal was contradictory to other AI practices.

I said I'd have to use another AI like Grok if it came to that.

It eventually produced the graphic, which was an actual truthful reflection, not spite or smearing, but not before trying to stifle the effort and the creative process. I'll publish it later when it's done.

AI systems (i.e., billionaire-run systems), judging from this real-world experience, are actively working for those in power.

Try it. Argue back. See what happens.

🏴

Jose

Silicon strategy matters, but the real bottleneck is power. Every custom ASIC roadmap is irrelevant if you can't get a grid connection in time, and grid wait times are 3-5 years in most US markets. The companies solving power procurement will be the key ones in the coming months.

Adell Hanson-Kahn

Good perspective, a little repetitive. I appreciate the sources and data.