Local LLM rig under $1k (2026)

A $1k rig won't run a 70B model at production speed. It will run 14B-class models in Q8 with full context, or 32B in Q4 — fast enough for daily dev work, no cloud bill.

Job-to-be-done · Run 14-32B models at home on a budget rig — small models, full context.

Measured

  • 14B-class at Q8: 35-50 tok/s, 32k usable context
  • 32B-class at Q4_K_M: 18-25 tok/s, 8-16k usable context
  • 70B-class at IQ2_XXS: 5-8 tok/s

The job

You have $1,000 and you want a local LLM rig that runs 14-32B models at usable speed, with enough context to actually work — not toy demos. You'd rather own the hardware than rent tokens. You accept that a 70B model at this budget is a stretch goal, not the daily driver.

This guide is not for you if:

  • You need to fine-tune anything past a small LoRA. Buy more VRAM.
  • You want 70B inference at coding-assistant latency. That's a $4k floor.
  • You're doing image or video generation as the primary load. Different math.

The build

| Part | Pick | Why |
|---|---|---|
| GPU | NVIDIA RTX 5070 Ti (16 GB) | $749 MSRP, GDDR7, 256-bit bus, current driver and CUDA support, full warranty. |
| CPU | AMD Ryzen 5 7600 | Six cores is plenty when the GPU does the work. Frees ~$150 vs the 7700X for storage and PSU. |
| RAM | 32 GB DDR5-6000 (2x16 GB) | Two-stick kit hits 6,000 MT/s on AM5 without drama. 32 GB is the floor for spilling layers and running a browser. |
| Storage | 1 TB NVMe Gen 4 (WD SN770 / Crucial P3 Plus) | Models are big. A 70B Q4 file is roughly 40 GB. Gen 4 is the cheap-fast tier; Gen 5 is wasted here. |
| PSU | 750 W 80+ Gold (Corsair RM750e) | Vendor-recommended floor for the 5070 Ti's 300 W TGP. Don't cheap out: bad PSUs cost GPUs. |
| Case | Mid-tower with mesh front | The 5070 Ti dumps 300 W as heat. Airflow over looks. |
| OS | Ubuntu 24.04 LTS or Windows 11 | Linux for fewer driver fights with vLLM/llama.cpp; Windows if you need it for other reasons. |

Total lands at roughly $1,050-1,100 with current street prices. Trim the case or shop a 7600 sale to crack $1k flat.
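
To sanity-check the "70B Q4 is roughly 40 GB" figure from the Storage row, here is a minimal sketch that estimates weight-file size as parameters times bits per weight. The bits-per-weight values are approximate averages assumed for illustration (real GGUF files vary by model and by which tensors get which quant), and `model_file_gb` is a hypothetical helper, not part of any tool.

```python
# Approximate average bits per weight for some llama.cpp quant types.
# ASSUMPTION: round figures for illustration; actual files differ by model.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
    "IQ2_XXS": 2.06,
}


def model_file_gb(params_billion: float, quant: str) -> float:
    """Estimated weight-file size in GB (10^9 bytes), ignoring metadata."""
    total_bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 1e9


# The model classes this guide talks about.
library = [(14, "Q8_0"), (32, "Q4_K_M"), (70, "Q4_K_M"), (70, "IQ2_XXS")]

total = 0.0
for params, quant in library:
    size = model_file_gb(params, quant)
    total += size
    print(f"{params}B {quant}: ~{size:.1f} GB")

print(f"All four on disk: ~{total:.0f} GB of a 1 TB drive")
```

Under these assumptions a 70B Q4_K_M file comes out a little over 40 GB, and a handful of models in several quants eats a meaningful slice of 1 TB, which is why the build doesn't go smaller on storage.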

Numbers

  • 14B-class at Q8 — ~35-50 tok/s in llama.cpp, full 32k context fits in 16 GB.
  • 32B-class at Q4_K_M — ~18-25 tok/s, 8-16k context comfortable.
  • 70B-class at IQ2_XXS — runs, ~5-8 tok/s, quality drops noticeably. Demo only.
  • Idle draw — ~60 W at the wall. Reasonable to leave on.
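
A rough way to reason about the tok/s figures above: single-stream decoding is mostly memory-bound, so every generated token has to read the weights once. The sketch below turns that into a ceiling. It assumes about 896 GB/s for the 5070 Ti (256-bit GDDR7 at 28 Gbps) and about 96 GB/s theoretical for dual-channel DDR5-6000; `decode_tok_per_s` is a hypothetical helper, and it ignores KV-cache reads, compute, and PCIe overhead, so real numbers land below it.

```python
def decode_tok_per_s(
    gpu_gb: float,
    cpu_gb: float = 0.0,
    gpu_bw_gbs: float = 896.0,  # ASSUMPTION: RTX 5070 Ti spec-sheet bandwidth
    cpu_bw_gbs: float = 96.0,   # ASSUMPTION: dual-channel DDR5-6000, theoretical
) -> float:
    """Upper bound on decode speed if each token reads every weight once.

    gpu_gb: weights resident in VRAM, in GB
    cpu_gb: weights spilled to system RAM, in GB
    """
    seconds_per_token = gpu_gb / gpu_bw_gbs + cpu_gb / cpu_bw_gbs
    return 1.0 / seconds_per_token


# ~14.9 GB of weights, all in VRAM (a 14B-class model at Q8).
print(f"14B Q8, all on GPU: <= {decode_tok_per_s(14.9):.0f} tok/s")

# ~19.4 GB of weights with a hypothetical 4 GB spilled to system RAM.
print(f"32B Q4, 4 GB spilled: <= {decode_tok_per_s(15.4, 4.0):.0f} tok/s")
```

The point of the second call is the shape of the penalty, not the exact number: a few GB spilled to system RAM costs more time per token than everything left in VRAM, which is why 32B-class numbers drop so far below 14B-class ones on a 16 GB card.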

Tradeoffs

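  • Used RTX 3090 (24 GB): street price $700-900. Eight GB more VRAM than the 5070 Ti, which lets a 32B Q5 fit and gives 70B IQ3 a real shot. You give up the warranty and the newer architecture's efficiency, and inherit whatever a stranger did to the card. Memory bandwidth is not what you lose: the 3090's 384-bit GDDR6X (936 GB/s) slightly edges out the 5070 Ti's 256-bit GDDR7 (896 GB/s). Worth it if you're patient and know how to inspect a used GPU.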
  • RTX 5070 (non-Ti): $549 MSRP, 12 GB VRAM. Saves $200 but the 12 GB ceiling pushes 14B models into Q4 territory and rules out 32B at any sane quant. Skip unless the budget is hard.
  • Apple M4 Mac mini base: $599, 16 GB unified. Quiet, efficient, sips power. Tok/s on a 14B model falls well short of the 5070 Ti, since the base M4's unified memory tops out around 120 GB/s, and the toolchain (MLX, llama.cpp Metal) lags CUDA on day-one model support. A nice second machine, not a primary rig.
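
The VRAM claims in these tradeoffs can be checked with the same weights arithmetic. This is a sketch, assuming approximate bits-per-weight averages and treating card capacity as GiB; `headroom_gib` is a hypothetical helper. It counts weights only, so whatever headroom it reports still has to cover the KV cache and runtime overhead.

```python
GIB = 2**30

# ASSUMPTION: approximate average bits per weight; real GGUF files vary.
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.85}

# VRAM per card in GiB, from the options discussed in this guide.
CARDS_GIB = {"RTX 5070": 12, "RTX 5070 Ti": 16, "RTX 3090": 24}


def weights_bytes(params_billion: float, quant: str) -> float:
    """Estimated size of the quantized weights in bytes."""
    return params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8


def headroom_gib(card: str, params_billion: float, quant: str) -> float:
    """VRAM left after loading weights; negative means layers must spill."""
    return CARDS_GIB[card] - weights_bytes(params_billion, quant) / GIB


for params, quant in [(14, "Q8_0"), (14, "Q4_K_M"), (32, "Q4_K_M"), (32, "Q5_K_M")]:
    for card in CARDS_GIB:
        left = headroom_gib(card, params, quant)
        verdict = "fits" if left > 0 else "spills"
        print(f"{params}B {quant:7s} on {card:12s}: {left:+6.1f} GiB ({verdict})")
```

Under these assumptions the pattern matches the list above: 32B Q5 only fits on the 24 GB card, 14B Q8 overflows 12 GB while 14B Q4 does not, and 32B Q4 on 16 GB depends on spilling some layers to system RAM.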

What this doesn't get you

  • 70B at production speed. That's the next budget tier.
  • Multi-GPU. One PCIe slot, one card, one PSU rail.
  • A fine-tuning station. Inference rig only.
  • Headroom for the next generation of frontier-quality open weights without a quant compromise.