
NVIDIA · workstation

Verdict · buy-if

RTX PRO 6000 Blackwell: 96 GB on a single workstation card

Triples the 5090's frame buffer to 96 GB of ECC memory on the same Blackwell silicon. Bandwidth is the same 1,792 GB/s, so per-token throughput tracks the 5090: you pay the price multiple for capacity, ECC, and pro drivers.

Product: NVIDIA RTX PRO 6000 Blackwell Workstation Edition
Published: 2026-05-01
Score: 8 / 10

Pros

  • 96 GB of GDDR7 ECC fits a 70B-class model at Q8 with full context on one card
  • Dual-slot at 600 W, it still drops into a workstation chassis without exotic cooling
  • ECC plus the NVIDIA RTX Enterprise driver stack: long support cycle, signed builds

Cons

  • Memory bandwidth is 1,792 GB/s, identical to a 5090, so you are buying capacity, not more tokens per second
  • Partner pricing runs a multiple of a 5090; no published vendor MSRP to anchor against

Verified numbers

verified 2026-05-01

  • VRAM: 96 GB
  • Memory bandwidth: 1,792 GB/s
  • TDP: 600 W
  • Comparator: GeForce RTX 5090
  • Comparator VRAM (RTX 5090): 32 GB
  • DisplayPort outputs: 4

What we tested

A single PRO 6000 in a Threadripper PRO chassis on PCIe Gen 5. Driver: NVIDIA RTX Enterprise. Workloads: a 70B-class dense model at Q8 with full context; a 100B+ MoE at IQ4; a multi-LoRA Flux stack for image generation; a short SDXL video pipeline with two LoRAs hot.

The headline is the one number that matters here: 96 GB. The 70B at Q8 fits with KV-cache headroom; you do not have to drop a quant tier to squeeze under the limit. The MoE at IQ4 fits where a 5090 forces you down to IQ3 or off-card entirely. The multi-LoRA stacks fit because LoRAs cost capacity, not bandwidth, and capacity is what this card sells.
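To check the capacity claims against your own model, a rough VRAM budget is quantized weights plus KV cache plus an allowance for everything else. This is a minimal sketch, not the method used in testing. The model shape (a Llama-3-70B-style config: 80 layers, 8 KV heads, head dimension 128) and 8.5 bits per weight for Q8 (llama.cpp Q8_0 stores one fp16 scale per 32 weights) are assumptions, and `overhead_gib` is a hypothetical catch-all for LoRAs, activations, and memory the driver reserves.

```python
from dataclasses import dataclass

# DRAM capacities are binary, so treat the card's "96 GB" as 96 GiB
# (usable memory is a bit lower once the driver reserves its share).
GIB = 1024 ** 3


@dataclass
class DenseModel:
    """Shape of a dense decoder-only transformer; values come from the model config."""
    params_billion: float  # total parameter count, in billions
    n_layers: int
    n_kv_heads: int        # key/value heads (fewer than query heads under GQA)
    head_dim: int


def weight_bytes(model: DenseModel, bits_per_weight: float) -> float:
    """Quantized weight footprint; bits_per_weight should already include block scales."""
    return model.params_billion * 1e9 * bits_per_weight / 8


def kv_cache_bytes(model: DenseModel, context_tokens: int, kv_bytes_per_elem: int = 2) -> int:
    """K and V for every layer, KV head, and token (2 bytes = fp16/bf16 cache)."""
    per_token = 2 * model.n_layers * model.n_kv_heads * model.head_dim * kv_bytes_per_elem
    return per_token * context_tokens


def vram_needed_gib(model: DenseModel, bits_per_weight: float, context_tokens: int,
                    kv_bytes_per_elem: int = 2, overhead_gib: float = 0.0) -> float:
    """Weights + KV cache, in GiB, plus a caller-chosen overhead allowance."""
    total = weight_bytes(model, bits_per_weight) + kv_cache_bytes(
        model, context_tokens, kv_bytes_per_elem)
    return total / GIB + overhead_gib


if __name__ == "__main__":
    # Hypothetical Llama-3-70B-style shape; substitute your own model's values.
    llama70_like = DenseModel(params_billion=70.0, n_layers=80, n_kv_heads=8, head_dim=128)
    for card, capacity_gib in (("RTX 5090", 32), ("RTX PRO 6000", 96)):
        need = vram_needed_gib(llama70_like, bits_per_weight=8.5, context_tokens=32_768)
        print(f"{card}: needs {need:.1f} GiB of {capacity_gib}, fits={need <= capacity_gib}")
```

KV-cache size grows linearly with context, so whether a model's full native context fits depends heavily on cache precision; `kv_bytes_per_elem=1` approximates an 8-bit cache.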

What you'll feel

If you came from a 5090, the difference is what stops happening. No KV-cache truncation at long context. No swapping LoRAs in and out between generations. No "this batch won't fit, drop to fp8 attention." Per-token throughput on the dense 70B is in the same neighborhood as a 5090 running a smaller quant of the same model — the bandwidth is the same 1,792 GB/s, the SM count is higher, and the math works out close.

What you will not feel is a speed jump on workloads that already fit on a 5090. Same bandwidth, same memory subsystem class. If your model fits in 32 GB at the quant you want, this card runs it at roughly 5090 pace, not faster.
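The "roughly 5090 pace" claim follows from a bandwidth-bound view of decoding: at batch size 1, each generated token has to stream roughly every active weight byte out of VRAM, so the ceiling is bandwidth divided by bytes read per token. The sketch below assumes that simplified model. It ignores KV-cache reads, compute limits, and kernel efficiency, so measured numbers land below the ceiling, and the 32B-at-4.5-bits example model is hypothetical. For a MoE, pass the active parameter count, not the total.

```python
# Peak memory bandwidth in GB/s (decimal), from the verified numbers in this review.
BANDWIDTH_GB_S = {"RTX 5090": 1792.0, "RTX PRO 6000 Blackwell": 1792.0}


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight bytes streamed per generated token for a dense model, in decimal GB."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Bandwidth-bound upper limit for single-stream decode: every generated token
    reads (at least) all active weights from VRAM once."""
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # Hypothetical model that fits on both cards: 32B dense at ~4.5 bits per weight.
    per_token = weights_gb(32, 4.5)  # 18.0 GB read per token
    for card, bw in BANDWIDTH_GB_S.items():
        print(f"{card}: at most {decode_ceiling_tok_s(bw, per_token):.0f} tok/s")
```

Because both cards share 1,792 GB/s, any model that fits on both gets the same ceiling; the higher SM count does not enter this bound. A larger quant means more bytes per token and a lower ceiling on either card.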

Setup notes

Dual-slot, 5.4 in H x 12.0 in L: it physically drops in. A 600 W maximum means a real PSU (1000 W class minimum, with headroom) and a 12V-2x6 cable rated for it. Four DisplayPort 2.1 outputs, no HDMI. The card is PCIe Gen 5, so pair it with a Gen 5 platform or slot bandwidth becomes the bottleneck for model loads and host transfers. ECC is on by default in the enterprise driver; leave it on.
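Rather than assuming the driver default, ECC state can be checked with `nvidia-smi`. This sketch shells out to its `--query-gpu` interface using real field names (`ecc.mode.current`, `ecc.mode.pending`) and parses the CSV output; the helper names are hypothetical and the script was not part of the tested setup.

```python
import subprocess

# nvidia-smi --query-gpu field names; `nvidia-smi --help-query-gpu` lists them all.
FIELDS = "name,memory.total,ecc.mode.current,ecc.mode.pending"


def parse_gpu_csv(csv_text: str) -> list[dict]:
    """Parse `--format=csv,noheader,nounits` output, one GPU per line."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, mem_mib, ecc_now, ecc_pending = (f.strip() for f in line.rsplit(",", 3))
        gpus.append({
            "name": name,
            "memory_mib": int(mem_mib),  # nounits: plain MiB integer
            "ecc_current": ecc_now,      # "Enabled", "Disabled", or "[N/A]" if unsupported
            "ecc_pending": ecc_pending,  # mode that takes effect after the next reboot
        })
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi and return one dict per installed GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    for gpu in query_gpus():
        if gpu["ecc_current"] != "Enabled":
            # `nvidia-smi -e 1` (run as admin) requests ECC on; it applies after a reboot.
            print(f"warning: ECC is {gpu['ecc_current']} on {gpu['name']}")
        else:
            print(f"{gpu['name']}: ECC enabled, {gpu['memory_mib']} MiB")
```

Watching `ecc_pending` as well as `ecc_current` catches the case where a mode change has been requested but the machine has not been rebooted yet.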

Who should buy

  • Engineers running 70B-class dense models or 100B+ MoE on a single workstation, where capacity is the constraint.
  • Anyone who needs ECC for a research workflow that has to be reproducible.
  • Studios running multi-LoRA stacks where the LoRA count is the bottleneck.

Who should skip

  • You are running 32B-class or smaller and a 5090 fits your quant tier.
  • You need raw tokens-per-second on a model that already fits: buy a second 5090 instead; the bandwidth math favors it.

Bottom line

96 GB on one card with ECC and the enterprise driver. Same Blackwell bandwidth as the consumer 5090, so the win is capacity, not speed. Partner pricing varies and runs a multiple of a 5090; the buy decision is whether your model size makes that multiple worth it. If 32 GB is the wall you keep hitting, this is the answer.