Discussion about this post

Joe Marta

The H100 debuted in 2022. I don't know why it's the central focus of the pricing comparison when the B200 has been in the wild for quite a while now, and the B300 is available too. You would obviously expect new late-2025/early-2026 silicon to outperform a 2022 Nvidia part.

Also, if you're paying by the hour, how fast the system is matters deeply, because speed shortens training-run time, which is the point of the article. But the piece glosses over what that performance differential between chips actually is, and again fails to account for the B200 and B300 being significantly more capable than the H100 and H200, especially in full system builds with the accompanying improved networking, cooling, CPUs, and system RAM.

A B200 is ~36% more expensive per hour on CoreWeave. Nvidia claims as much as 15x more inference throughput on DGX B200 vs. DGX H100, and 3x more training throughput (https://www.nvidia.com/en-us/data-center/dgx-b200/). You get a lot more for that 36% price uplift. And with the B300's massively larger VRAM window, you theoretically don't need to rent as many units to get the same job done, even if each costs more per hour. The H100 just isn't the target anymore.
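To make the per-hour vs. per-job arithmetic concrete (a sketch using the ~36% CoreWeave premium and Nvidia's claimed 3x training throughput from the comment above; the hourly rate is a normalized placeholder, not a real price):

```python
# Back-of-envelope: cost per training job, H100 vs B200.
h100_rate = 1.00     # normalized $/GPU-hour (placeholder baseline)
b200_rate = 1.36     # ~36% hourly premium, per CoreWeave pricing
b200_speedup = 3.0   # Nvidia's claimed DGX B200 training throughput vs DGX H100

# A job taking 1 hour on an H100 takes 1/3 hour on a B200.
h100_job_cost = h100_rate * 1.0
b200_job_cost = b200_rate * (1.0 / b200_speedup)

print(f"H100 cost per job: {h100_job_cost:.2f}")  # 1.00
print(f"B200 cost per job: {b200_job_cost:.2f}")  # 0.45, i.e. ~55% cheaper per job
```

If the 3x training claim holds in practice, the hourly premium inverts into a large per-job saving, which is the commenter's point.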

Tumithak of the Corridors

Good piece. Lots of useful data in here and I appreciate the work that went into the comparisons.

I'd push back on one thing though. The essay treats Anthropic's multi-accelerator setup as a moat. Three accelerator families, two compiler stacks, two hyperscaler dependencies. That's framed as diversification and resilience.

I'd call it complexity.

Every additional architecture is a separate optimization pipeline, a separate debugging toolchain, a separate set of expertise your engineers have to maintain. TPUs run XLA. Trainium runs Neuron. Nvidia runs CUDA. These aren't interchangeable parts. You're tripling the surface area for things to go wrong.

The piece actually flags this as a "critical risk" and a "genuine cost that partially offsets the hardware savings." Then it just moves on and treats the rest as pure upside.

The more parts a machine has, the more ways it can break. The cost savings only hold when everything works. The moment something breaks in a way that's unique to the multi-architecture setup, those savings evaporate into engineering hours.
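One way to see how fragile the savings are is a toy expected-value model (every figure below is hypothetical, chosen only to illustrate the shape of the trade-off, not taken from the article):

```python
# Toy expected-cost model for running extra accelerator architectures.
# All numbers are hypothetical illustrations.
hardware_savings = 100.0  # monthly savings from cheaper accelerators
incident_cost = 40.0      # engineering cost of one architecture-specific incident
p_incident = 0.3          # monthly chance of such an incident, per extra stack
extra_stacks = 2          # architectures beyond a single-vendor baseline

expected_overhead = extra_stacks * p_incident * incident_cost
net_savings = hardware_savings - expected_overhead
print(net_savings)  # 76.0 with these numbers; flips negative as p_incident
                    # or incident_cost grows with each added architecture
```

The savings survive only while the incident term stays small; each added stack raises both factors in that term.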

History tends to favor the simpler machine.
