Local LLM rig under $4k (2026)

The minimum-viable workstation for serious local inference: single 5090, 64 GB system RAM, fast NVMe, and a case that holds up under sustained load.

The job

You want to run sizeable local models at home for development, research, or writing — at usable speed, with a context window that doesn't force you to chunk everything. You're allergic to the monthly cloud bill. You have ~$4k to spend and you want the rig to still feel fast in eighteen months.

This guide is not for:

Fine-tuning from scratch (you need more VRAM or a multi-GPU rig).
Pure image/video generation (different tradeoffs, covered in a separate guide).
Production inference serving (this is a workstation, not a datacenter node).

The build

Part	Pick	Why
GPU	NVIDIA RTX 5090 (32 GB)	32B-class at Q8, with some spill to system RAM at full context; 70B at IQ3.
CPU	AMD Ryzen 9 9950X or similar 16-core	The CPU-side limits you'll hit are single-thread speed and PCIe lane count.
RAM	64 GB DDR5-6000 (2×32)	Leaves room for KV-cache spill + tooling.
Storage	2 TB PCIe 4.0 NVMe	Model weights + datasets + Docker images.
PSU	1000 W 80+ Gold, single rail	The 5090 alone can draw up to 575 W; leave headroom rather than cutting it close.
Case	Airflow-first mid-tower; 3× intake / 2× exhaust	Sustained loads run for hours.
OS	Windows 11 Pro or Ubuntu 24.04	Your call. Both work; drivers are mature.
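
The GPU and RAM rows come down to memory arithmetic, which can be roughly sketched as below. This is an estimate under stated assumptions, not a measurement: the layer counts, KV-head counts and head size are illustrative shapes for 32B- and 70B-class models, 8.5 bits per weight follows from the Q8_0 block layout, 3.66 bits per weight is the approximate figure for IQ3_M, the card is treated as 32 GiB, and the 1 GiB runtime overhead is a guess. The function names are hypothetical.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def fits_in_vram(total_bytes: float, vram_gib: float = 32.0,
                 overhead_gib: float = 1.0) -> bool:
    """True if weights + cache + a guessed runtime overhead fit on the card."""
    return total_bytes / GIB + overhead_gib <= vram_gib


# Assumed model shapes (illustrative, not taken from a specific model card).
w_32b_q8 = weight_bytes(32e9, 8.5)               # Q8_0: 8.5 bits per weight
kv_32b_32k = kv_cache_bytes(64, 8, 128, 32_768)  # fp16 cache, 32k tokens

w_70b_iq3 = weight_bytes(70e9, 3.66)             # IQ3_M: roughly 3.66 bits per weight
kv_70b_8k = kv_cache_bytes(80, 8, 128, 8_192)    # fp16 cache, 8k tokens

for name, w, kv in [("32B Q8 @ 32k", w_32b_q8, kv_32b_32k),
                    ("70B IQ3_M @ 8k", w_70b_iq3, kv_70b_8k)]:
    total = w + kv
    print(f"{name}: weights {w / GIB:.1f} GiB + KV {kv / GIB:.1f} GiB "
          f"= {total / GIB:.1f} GiB, fits: {fits_in_vram(total)}")
```

Under these assumptions both headline configurations land somewhat above the card's capacity, which is why the RAM row budgets for KV-cache spill. A quantized KV cache or a shorter context shifts the numbers; swap in the real shapes of the model you plan to run.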

Numbers

Approximate inference throughput on this build with llama.cpp, short prompt:

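32B-class at Q8: ~28–34 tok/s; the full 32k context is usable, though Q8 weights plus a 32k KV cache can exceed 32 GB, so expect some spill to system RAM.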
70B-class at IQ3_M — ~9–14 tok/s, ~8k context before KV pressure.
Cold start is dominated by model load from NVMe (~4 seconds for a 32B Q8).

Your mileage will vary with prompt shape and sampler choice. The 32B-class sweet spot is where this rig shines; 70B-class is doable but tight.
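
For a sanity check on figures like these, a bandwidth-bound ceiling is easy to compute: at batch size 1, each generated token has to read roughly all of the weights once. The sketch below assumes 1792 GB/s for the 5090's memory, 96 GB/s theoretical peak for dual-channel DDR5-6000, and about 7 GB/s sequential read for a PCIe 4.0 NVMe drive (an assumed typical value). It ignores compute, PCIe transfers and KV-cache reads, so real throughput will sit below the ceiling. The function names are hypothetical.

```python
def decode_ceiling_tok_s(vram_bytes: float, spilled_bytes: float = 0.0,
                         vram_bw: float = 1792e9, ram_bw: float = 96e9) -> float:
    """Upper bound on tokens/s when decoding is limited by memory reads.

    vram_bytes is the part of the model read from GPU memory per token,
    spilled_bytes the part read from system RAM.
    """
    seconds_per_token = vram_bytes / vram_bw + spilled_bytes / ram_bw
    return 1.0 / seconds_per_token


def load_seconds(model_bytes: float, read_bw: float = 7e9) -> float:
    """Best-case time to stream the model file from disk."""
    return model_bytes / read_bw


model_bytes = 34e9  # roughly a 32B model at 8.5 bits per weight

all_vram = decode_ceiling_tok_s(model_bytes)     # everything read from VRAM
with_spill = decode_ceiling_tok_s(30e9, 4e9)     # 4 GB read from system RAM
cold_start = load_seconds(model_bytes)

print(f"ceiling, all in VRAM: {all_vram:.1f} tok/s")
print(f"ceiling, 4 GB spilled: {with_spill:.1f} tok/s")
print(f"best-case load time: {cold_start:.1f} s")
```

The point of the second call is how quickly the ceiling drops once weights are read from system RAM, which is the main reason the 70B-class numbers are so much lower than the 32B-class ones.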

Tradeoffs

Dual 4090 instead of a single 5090. Higher aggregate VRAM (48 GB), but you lose the clean single-card setup, and much local-inference tooling needs manual configuration to split a model across two cards.
Threadripper instead of Ryzen 9. More PCIe lanes, more cores, more money. If you'll add a second GPU in year two, worth it. If not, skip.
Cloud on-demand. Breaks even with this rig at roughly 18 months of heavy use, depending on your cloud tier.
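
The cloud break-even point is simple enough to recompute for your own usage. The sketch below is a minimal version with hypothetical inputs: the monthly cloud spend, average draw under load, hours of use and electricity price are all placeholders, not figures from this guide, and resale value and hardware failure are ignored.

```python
def break_even_months(rig_cost: float, cloud_monthly: float,
                      load_watts: float, hours_per_month: float,
                      price_per_kwh: float) -> float:
    """Months until the rig's up-front cost is paid back by avoided cloud spend."""
    electricity = load_watts / 1000 * hours_per_month * price_per_kwh
    monthly_saving = cloud_monthly - electricity
    if monthly_saving <= 0:
        return float("inf")  # the rig never pays for itself at this usage
    return rig_cost / monthly_saving


# Hypothetical heavy-use scenario; replace every number with your own.
months = break_even_months(
    rig_cost=4000,        # the budget from this guide
    cloud_monthly=235,    # assumed cloud bill for equivalent usage
    load_watts=600,       # assumed average draw while generating
    hours_per_month=120,  # assumed hours under load
    price_per_kwh=0.15,   # assumed electricity price
)
print(f"break-even after about {months:.1f} months")
```

With lighter use the cloud bill shrinks faster than the electricity cost does, so the break-even point moves out quickly; that is the main sensitivity to check before buying.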

What this doesn't get you

Multi-GPU training. You need NVLink, more lanes, more PSU headroom.
Proper datacenter-style serving (batching, multi-user concurrency).
A good excuse. Buy the rig.