
---- 8 GB ----
Autocomplete for coding (like Cursor Tab)
- https://huggingface.co/NexVeridian/zeta-2-4bit
- https://huggingface.co/bartowski/zed-industries_zeta-2-GGUF
Tool calling, assistant style
- https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF
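If you haven't run a GGUF locally before, here's a minimal sketch with llama-cpp-python. The repo id comes from the list above, but the filename glob is an assumption; check the repo's file list and pick a quant that fits in ~8 GB (a Q4_K_M is usually a safe default at this size).

```python
# Minimal sketch: run one of the small GGUF builds above with llama-cpp-python.
# The filename glob is an assumption -- check the repo and pick a quant that fits in ~8 GB.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="bartowski/zed-industries_zeta-2-GGUF",  # repo from the list above
    filename="*Q4_K_M*",                             # assumed quant naming
    n_ctx=8192,                                      # context window
    n_gpu_layers=-1,                                 # offload every layer that fits
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Complete this: def fib(n):"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```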
---- 16 GB ----
Here things get better:
Multimodal
- https://huggingface.co/Qwen/Qwen3.5-9B
- https://huggingface.co/Tesslate/OmniCoder-9B
- https://huggingface.co/unsloth/Qwen3.5-27B-GGUF
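GGUF repos ship several quant levels, so you can pull just the file that fits instead of the whole repo. A minimal sketch with huggingface_hub; the quant name pattern is an assumption, so browse the repo and pick a file whose size leaves room for context on a 16 GB machine.

```python
# Minimal sketch: download only one quant from a GGUF repo instead of every file.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="unsloth/Qwen3.5-27B-GGUF",   # repo from the list above
    allow_patterns=["*Q3_K_M*"],          # assumed quant naming; adjust to what the repo actually has
)
print(local_dir)  # point llama.cpp / llama-server at the .gguf inside this folder
```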
---- 24 GB ----
- The best model you can get (thanks Qwen) https://huggingface.co/Qwen/Qwen3.5-27B
- Great model (strong agents) https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B
- Mine hehe https://huggingface.co/0xSero/Qwen-3.5-28B-A3B-REAP
---- 64 GB ----
- Qwen3-coder-next-80B-4bit (coding, Claude Code, general agent)
- Qwen3.5-122B-REAP (browser use, multimodal, tool calling, general agent)
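For the agent and tool-calling use cases at this size, the usual pattern is to serve the model behind an OpenAI-compatible endpoint (llama-server, vLLM, LM Studio) and point your agent at it. A minimal sketch assuming llama-server is already running on its default port 8080; most local servers ignore or simply echo the `model` field.

```python
# Minimal sketch: talk to a locally served model through the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")  # key is unused locally

resp = client.chat.completions.create(
    model="local",  # placeholder; local servers typically ignore this
    messages=[{"role": "user", "content": "Plan the steps to refactor a Python module."}],
)
print(resp.choices[0].message.content)
```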
---- 96 GB ----
- GLM-4.6V (multimodal and tool calls)
- Hermes-70B (Jailbroken)
- Nemotron-120B-Super (openclaw)
- Mistral-4-Small (general agent)
---- 192 GB ----
All of these are excellent top-tier LLMs and approach Sonnet in capability:
- Step-3.5-Flash
- Qwen3.5-397B-REAP
- MiniMax-M2.5 (soon M2.7)
- GLM-4.7-Reap
---- 256 GB ----
#1 MiniMax-M2.5 (M2.7) - 6-bit MLX
#2 Qwen3.5-262B-REAP (4-6 bits)
#3 Nemotron-122B (8-9 bits)
#4 GLM-5-358B (4-bit)
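On a Mac, the MLX quants load straight from the Hub with mlx-lm. A minimal sketch assuming a recent mlx-lm; the repo id below is a placeholder, so search mlx-community for a 6-bit conversion of whichever model above you want.

```python
# Minimal sketch: load a 6-bit MLX quant on Apple silicon with mlx-lm.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Some-Model-6bit")  # hypothetical repo id
print(generate(model, tokenizer, prompt="Say hello in five words.", max_tokens=200))
```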
---- 512 GB ----
#1 MiniMax-M2.* - FP16
#2 Qwen3.5-397B - 8-bit
#3 Kimi-k2.5-530B-PRISM - 4-bit
#4 GLM-5 - 4-bit
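Quick sanity check on why these bit-widths pair with this much memory: weights alone take roughly params * bits / 8 bytes, and KV cache plus activations need headroom on top. The parameter counts below are read off the model names in this list.

```python
# Rough weights-only memory for the entries above (KV cache and activations are extra,
# so leave headroom under the 512 GB budget).
def weight_gb(params_billions: float, bits: float) -> float:
    # billions of params * (bits / 8) bytes per weight ~= GB of weights
    return params_billions * bits / 8

for name, params_b, bits in [
    ("Qwen3.5-397B @ 8-bit", 397, 8),
    ("Kimi-k2.5-530B-PRISM @ 4-bit", 530, 4),
    ("GLM-5-358B @ 4-bit", 358, 4),   # size taken from the 256 GB entry above
]:
    print(f"{name}: ~{weight_gb(params_b, bits):.0f} GB of weights")
```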