The job
You have $1,000 and you want a local LLM rig that runs 14-32B models at usable speed, with enough context for real work, not toy demos. You'd rather own the hardware than rent tokens. You accept that a 70B model at this budget is a stretch goal, not the daily driver.
This guide is not for you if:
- You need to fine-tune anything past a small LoRA. Buy more VRAM.
- You want 70B inference at coding-assistant latency. That's a $4k floor.
- You're doing image or video generation as the primary load. Different math.
The build
| Part | Pick | Why |
|---|---|---|
| GPU | NVIDIA RTX 5070 Ti (16 GB) | $749 MSRP, GDDR7, 256-bit bus, current driver and CUDA support, full warranty. |
| CPU | AMD Ryzen 5 7600 | Six cores is plenty when the GPU does the work. Frees ~$150 vs the 7700X for storage and PSU. |
| RAM | 32 GB DDR5-6000 (2x16 GB) | Two-stick kit hits 6,000 MT/s on AM5 without drama. 32 GB is the floor for spilling layers and running a browser. |
| Storage | 1 TB NVMe Gen 4 (WD SN770 / Crucial P3 Plus) | Models are big. A 70B Q4 file is roughly 40 GB. Gen 4 is the cheap-fast tier; Gen 5 is wasted here. |
| PSU | 750 W 80+ Gold (Corsair RM750e) | Vendor-recommended floor for the 5070 Ti's 300 W TGP. Don't cheap out: a bad PSU can take the GPU with it. |
| Case | Mid-tower with mesh front | The 5070 Ti dumps 300 W as heat. Airflow over looks. |
| OS | Ubuntu 24.04 LTS or Windows 11 | Linux for fewer driver fights with vLLM/llama.cpp; Windows if you need it for other reasons. |
Total lands at roughly $1,050-1,100 with current street prices. Trim the case or shop a 7600 sale to crack $1k flat.
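A quick sanity check on the PSU pick, as a minimal sketch. Only the 300 W GPU figure and the 750 W rating come from the table; the CPU and rest-of-system wattages are assumptions for illustration, and `psu_load_fraction` is a hypothetical helper name.

```python
# Sustained power budget versus PSU rating.
# Only the 300 W GPU figure and the 750 W PSU come from the build table;
# the CPU and "everything else" numbers are assumptions for illustration.


def psu_load_fraction(psu_watts: float, loads_watts: dict[str, float]) -> float:
    """Fraction of the PSU rating used if every component peaks at once."""
    if psu_watts <= 0:
        raise ValueError("psu_watts must be positive")
    return sum(loads_watts.values()) / psu_watts


if __name__ == "__main__":
    loads = {
        "gpu_tgp": 300.0,             # from the build table
        "cpu_peak": 90.0,             # assumed package power for a 65 W-class CPU
        "board_ram_ssd_fans": 60.0,   # assumed
    }
    frac = psu_load_fraction(750.0, loads)
    print(f"sustained load: {sum(loads.values()):.0f} W ({frac:.0%} of rating)")
```

The gap between the summed load and the rating is deliberate: it absorbs short GPU power spikes and keeps the PSU out of its least efficient range. Swap in your own component numbers before trusting the result.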
Numbers
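- 14B-class at Q8: ~35-50 tok/s in llama.cpp. Q8 weights alone come to roughly 15 GB, so a full 32k context in 16 GB needs a smaller quant or a quantized KV cache.
- 32B-class at Q4_K_M: ~18-25 tok/s, 8-16k context. The weights are roughly 19-20 GB, so some layers spill to system RAM on a 16 GB card; treat the speed as a best case.
- 70B-class at IQ2_XXS: runs, ~5-8 tok/s, quality drops noticeably. Demo only.
- Idle draw: ~60 W at the wall. Reasonable to leave on.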
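Whether a model and its context fit in VRAM is mostly arithmetic: quantized weights plus the KV cache plus some overhead. Here is a rough sketch of that estimate. The model shape (48 layers, 8 KV heads, head dimension 128), the bits-per-weight values, and the 1 GB overhead allowance are all assumptions, so check them against the model card and quant you actually use.

```python
# Rough VRAM estimate: quantized weights plus an fp16 KV cache.
# Model shapes and bits-per-weight values below are assumptions for illustration.

GB = 1e9  # decimal gigabytes, matching how model file sizes are usually quoted


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GB


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size in GB: one K and one V vector per layer per token."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens / GB


def fits(total_gb: float, vram_gb: float, overhead_gb: float = 1.0) -> bool:
    """True if weights + cache + a fixed overhead allowance fit in VRAM."""
    return total_gb + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Hypothetical 14B-class model at a ~4.85-bit quant (approximate figure
    # for Q4_K_M) with an 8k context on a 16 GB card.
    w = weights_gb(14.0, 4.85)
    kv = kv_cache_gb(layers=48, kv_heads=8, head_dim=128, context_tokens=8192)
    print(f"weights {w:.1f} GB + KV cache {kv:.1f} GB, fits: {fits(w + kv, 16.0)}")
```

Rerun it with 8.5 bits per weight (Q8_0) or a longer context to see how quickly 16 GB runs out. Real usage also depends on the runtime's compute buffers, so leave margin.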
Tradeoffs
- Used RTX 3090 (24 GB): street price $700-900. Eight GB more VRAM than the 5070 Ti, which lets a 32B Q5 fit and gives 70B IQ3 a real shot. Memory bandwidth is no downgrade either: the 3090's GDDR6X on a 384-bit bus moves about 936 GB/s against the 5070 Ti's 896 GB/s. You give up the warranty and the newer Blackwell architecture, and inherit whatever a stranger did to the card. Worth it if you're patient and know how to inspect a used GPU.
- RTX 5070 (non-Ti): $549 MSRP, 12 GB VRAM. Saves $200 but the 12 GB ceiling pushes 14B models into Q4 territory and rules out 32B at any sane quant. Skip unless the budget is hard.
- Apple M4 Mac mini base: $599, 16 GB unified. Quiet and power-efficient. Tok/s on a 14B model falls well short of the 5070 Ti, since the base M4's 120 GB/s memory bandwidth is a fraction of the card's 896 GB/s, and the toolchain (MLX, llama.cpp Metal) lags CUDA on day-one model support. A nice second machine, not a primary rig.
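One way to compare these options on speed: single-stream generation is usually memory-bandwidth bound, because each new token reads roughly all the weights once. Dividing bandwidth by weight size gives a ceiling, not a benchmark. The bandwidth figures below are spec-sheet values assumed for illustration, the 8.5 GB weight size is a hypothetical 14B-class model at a ~4.85-bit quant, and `decode_ceiling_tok_s` is a made-up helper name.

```python
# Upper-bound decode speed if every generated token streams all weights once.
# Bandwidth figures are spec-sheet values assumed for illustration, not measurements.

BANDWIDTH_GB_S = {
    "RTX 5070 Ti": 896.0,
    "RTX 3090": 936.0,
    "RTX 5070": 672.0,
    "M4 Mac mini (base)": 120.0,
}


def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Theoretical tokens/second ceiling for bandwidth-bound decoding."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    weights_gb = 8.5  # assumed: a 14B-class model at a ~4.85-bit quant
    for name, bandwidth in BANDWIDTH_GB_S.items():
        ceiling = decode_ceiling_tok_s(bandwidth, weights_gb)
        print(f"{name:20s} <= {ceiling:6.1f} tok/s")
```

Measured throughput lands below this ceiling and shifts with kernels, quant format, and context length. The model only holds when the weights fit entirely in the device's memory; once layers spill to system RAM, the slower link dominates.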
What this doesn't get you
- 70B at production speed. That's the next budget tier.
- Multi-GPU. One PCIe slot, one card, one PSU rail.
- A fine-tuning station. Inference rig only.
- Headroom for the next generation of frontier-quality open weights without a quant compromise.