Image and video generation rig under $4k (2026)

Single RTX 5090, 64 GB DDR5, 4 TB NVMe. Built for sustained image and short-video runs — Flux, SDXL, Wan 2.2 — without the cloud bill or the queue.

Job-to-be-done · Generate images and short videos locally — Flux, SDXL, Wan 2.2 — at production iteration speeds.

Measured

SDXL 1024x1024 · 4-7 seconds per image
Flux 1.dev 1024x1024 · 12-20 seconds per image

The job

You generate images and short videos locally. Flux for stills you actually ship. SDXL when speed matters more than fidelity. Wan 2.2 when the brief calls for motion. You iterate in dozens to hundreds of generations per session, not ones, and you're tired of waiting on a shared cloud queue at 9 pm. You have ~$4k and you'd like to spend it once.

The shape of this workload is different from an LLM rig. VRAM ceilings are softer — Flux fits in 24 GB, SDXL fits in 12 GB, Wan 2.2 scales with what you give it. What hurts is everything around the model: the checkpoint stack, the LoRA library, the VAE intermediates, the sustained 100% GPU draw across a multi-hour session.
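
The VRAM figures above can be turned into a quick headroom check. This is a minimal sketch that uses only the comfortable-fit numbers quoted in this guide (12 GB for SDXL, 24 GB for Flux); the function name and the card list are illustrative, and real usage shifts with LoRAs, ControlNets, precision, and resolution.

```python
# Comfortable-fit VRAM figures quoted in this guide, in GB.
# Wan 2.2 is left out because the guide only says it scales with what you give it.
COMFORTABLE_VRAM_GB = {"SDXL": 12, "Flux": 24}


def headroom_gb(card_vram_gb: int) -> dict[str, int]:
    """Spare VRAM per model on a card; a negative value means expect offloading."""
    return {model: card_vram_gb - need for model, need in COMFORTABLE_VRAM_GB.items()}


# Card capacities as stated elsewhere in this guide.
for card, vram in {"RTX 5080": 16, "RTX 4090": 24, "RTX 5090": 32}.items():
    print(card, headroom_gb(vram))
```

A positive number is room left over for the LoRA and ControlNet stack; it says nothing about speed.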

This guide is not for you if:

  • LLM inference is the primary load. Different math, different rig.
  • You need real-time video. Wan 2.2 is minutes per clip.
  • You're training a foundation model. This is an inference + small-LoRA box.

The build

Part | Pick | Why
GPU | NVIDIA RTX 5090 (32 GB) | 32 GB GDDR7, 1,792 GB/s memory bandwidth. Flux + LoRA + ControlNet stack fits with room to spare.
CPU | AMD Ryzen 9 9950X | 16 cores soak VAE decode, image preprocessing, and ffmpeg encode without choking the GPU pipeline.
RAM | 64 GB DDR5-6000 (2x32) | VAE tiles, model swaps, and Wan 2.2 intermediates spill into system RAM. 32 GB runs out the moment you queue a batch.
Storage | 4 TB Samsung 990 Pro NVMe (Gen 4) | A working checkpoint + LoRA library is 1-2 TB before you notice. Cold-loading models from a slow disk wastes session time.
PSU | 1000 W 80+ Gold | RTX 5090 draws 575 W TGP. Headroom for transient spikes and a sustained-load duty cycle.
Case | Fractal Define 7 / Lian Li O11D EVO | Three intake fans minimum. Sustained compute is the workload; burst-tuned cases thermal-throttle by hour two.
OS | Windows 11 + WSL2, or Ubuntu 24.04 | ComfyUI, Wan2GP, Forge all run on either. Pick what your toolchain already targets.

Approximate total: $3,800. GPU is $1,999 of that.
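
The PSU choice comes down to simple arithmetic. The sketch below uses the 575 W GPU figure and the 1000 W PSU from the build; the CPU number (170 W, the 9950X's rated TDP) and the 75 W allowance for board, RAM, storage, and fans are assumptions for illustration, not measurements.

```python
PSU_WATTS = 1000       # from the build
GPU_TGP_WATTS = 575    # from the build
CPU_WATTS = 170        # assumption: Ryzen 9 9950X rated TDP
REST_WATTS = 75        # assumption: motherboard, RAM, NVMe, fans


def sustained_load_watts(n_gpus: int = 1) -> int:
    """Rough steady-state draw with every component at its rated figure."""
    return n_gpus * GPU_TGP_WATTS + CPU_WATTS + REST_WATTS


def psu_headroom_watts(n_gpus: int = 1) -> int:
    """Watts left on the PSU; negative means the PSU is undersized."""
    return PSU_WATTS - sustained_load_watts(n_gpus)


print(sustained_load_watts(1), psu_headroom_watts(1))  # 820 180
print(sustained_load_watts(2), psu_headroom_watts(2))  # 1395 -395
```

Under these assumptions a single card leaves roughly 180 W for transient spikes, while a second card would put the same PSU well over budget.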

Numbers

  • SDXL 1024x1024 — 4-7 sec per image.
  • Flux 1.dev 1024x1024 — 12-20 sec per image.
  • Wan 2.2 short clip — minutes per clip; varies wildly with length, resolution, and steps.
  • SDXL character LoRA training — under an hour on a small dataset.
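
To translate the per-image times above into session planning, a small sketch follows. It uses only the SDXL and Flux ranges listed; the function names and the 200-image batch are illustrative.

```python
# Seconds per image (low, high) as listed above.
SECONDS_PER_IMAGE = {
    "SDXL 1024x1024": (4, 7),
    "Flux 1.dev 1024x1024": (12, 20),
}


def images_per_hour(sec_low: int, sec_high: int) -> tuple[int, int]:
    """Return (slow case, fast case) whole images per hour."""
    return 3600 // sec_high, 3600 // sec_low


def batch_minutes(n_images: int, sec_low: int, sec_high: int) -> tuple[float, float]:
    """Return (fast case, slow case) minutes to finish a batch."""
    return round(n_images * sec_low / 60, 1), round(n_images * sec_high / 60, 1)


for name, (low, high) in SECONDS_PER_IMAGE.items():
    print(name, images_per_hour(low, high), batch_minutes(200, low, high))
```

With these ranges, a 200-image session is roughly 13 to 23 minutes on SDXL and 40 to 67 minutes on Flux.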

Tradeoffs

  • Drop to a 4090 (24 GB), spend the savings on storage. You lose the 32 GB Flux-plus-everything-loaded headroom and the GDDR7 bandwidth, but you keep most of the throughput. Reasonable if you found a deal.
  • Drop the GPU to a 5080 (16 GB). Don't. SDXL is fine; Flux gets tight; Wan 2.2 starts forcing offloads. Rigs you have to fight aren't fun rigs.
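  • Add a second 5090 later. ComfyUI parallelizes batch jobs cleanly when you run one instance per GPU, each working its own queue. Two cards draw 1,150 W TGP between them, so the 1000 W PSU above will not cover it: buy a larger unit and keep a PCIe slot free now if this is the plan.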
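
One common way to spread a batch over two cards is to run one worker per GPU and split the job list between them. The sketch below shows only the splitting step; it assumes each worker is a separate process pinned to its own device (for example via the CUDA_VISIBLE_DEVICES environment variable), and the function name and job labels are hypothetical.

```python
def shard_round_robin(jobs: list[str], n_gpus: int) -> list[list[str]]:
    """Deal jobs out to GPUs in turn so each worker gets a near-equal share."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    # Slice i takes jobs i, i + n_gpus, i + 2 * n_gpus, ...
    return [jobs[i::n_gpus] for i in range(n_gpus)]


# Hypothetical job labels; in practice these would be prompts or workflow files.
queue = ["job-a", "job-b", "job-c", "job-d", "job-e"]
for gpu_index, shard in enumerate(shard_round_robin(queue, 2)):
    print(gpu_index, shard)
```

Round-robin keeps the shards balanced when jobs take similar time; for mixed workloads, a shared queue that each worker pulls from would balance better.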

What this doesn't get you

  • Real-time video generation. Wan 2.2 is minutes per clip, not frames per second.
  • Training a foundation model. This is an inference + small-LoRA rig, not an H100 substitute.
  • A quiet room. 575 W of sustained GPU draw is going to be audible.