
---- 8 GB ----
Autocomplete for coding (like Cursor Tab)
- https://huggingface.co/NexVeridian/zeta-2-4bit
- https://huggingface.co/bartowski/zed-industries_zeta-2-GGUF
Tool calling, assistant style
- https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF
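If you haven't run a GGUF locally before, here's a minimal sketch with llama-cpp-python. The repo id comes from the list above, but the filename glob is an assumption; check the repo's file list and pick a quant that fits in ~8 GB (a Q4_K_M is usually a safe default at this size).

```python
# Minimal sketch: run one of the small GGUF builds above with llama-cpp-python.
# The filename glob is an assumption -- check the repo and pick a quant that fits in ~8 GB.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="bartowski/zed-industries_zeta-2-GGUF",  # repo from the list above
    filename="*Q4_K_M*",                             # assumed quant naming
    n_ctx=8192,                                      # context window
    n_gpu_layers=-1,                                 # offload every layer that fits
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Complete this: def fib(n):"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```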
---- 16 GB ----
Here things get better:
Multimodal
- https://huggingface.co/Qwen/Qwen3.5-9B
- https://huggingface.co/Tesslate/OmniCoder-9B
- https://huggingface.co/unsloth/Qwen3.5-27B-GGUF
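GGUF repos ship several quant levels, so you can pull just the file that fits instead of the whole repo. A minimal sketch with huggingface_hub; the quant name pattern is an assumption, so browse the repo and pick a file whose size leaves room for context on a 16 GB machine.

```python
# Minimal sketch: download only one quant from a GGUF repo instead of every file.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="unsloth/Qwen3.5-27B-GGUF",   # repo from the list above
    allow_patterns=["*Q3_K_M*"],          # assumed quant naming; adjust to what the repo actually has
)
print(local_dir)  # point llama.cpp / llama-server at the .gguf inside this folder
```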
---- 24 GB ----
- The best model you can get (thanks Qwen) https://huggingface.co/Qwen/Qwen3.5-27B
- Great model (strong agents) https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B
- Mine hehe https://huggingface.co/0xSero/Qwen-3.5-28B-A3B-REAP
---- 64 GB ----
- Qwen3-coder-next-80B-4bit (coding, Claude Code, general agent)
- Qwen3.5-122B-REAP (browser use, multimodal, tool calling, general agent)
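For the agent and tool-calling use cases at this size, the usual pattern is to serve the model behind an OpenAI-compatible endpoint (llama-server, vLLM, LM Studio) and point your agent at it. A minimal sketch assuming llama-server is already running on its default port 8080; most local servers ignore or simply echo the `model` field.

```python
# Minimal sketch: talk to a locally served model through the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")  # key is unused locally

resp = client.chat.completions.create(
    model="local",  # placeholder; local servers typically ignore this
    messages=[{"role": "user", "content": "Plan the steps to refactor a Python module."}],
)
print(resp.choices[0].message.content)
```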
---- 96 GB ----
- GLM-4.6V (multimodal and tool calls)
- Hermes-70B (Jailbroken)
- Nemotron-120B-Super (openclaw)
- Mistral-4-Small (general agent)
---- 192 GB ----
All of these are excellent top-tier LLMs and approach Sonnet in capability:
- Step-3.5-Flash
- Qwen3.5-397B-REAP
- MiniMax-M2.5 (soon M2.7)
- GLM-4.7-Reap
---- 256 GB ----
#1 MiniMax-M2.5 (M2.7) - 6-bit MLX
#2 Qwen3.5-262B-REAP (4-6 bits)
#3 Nemotron-122B (8-9 bits)
#4 GLM-5-358B (4-bit)
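On a Mac, the MLX quants load straight from the Hub with mlx-lm. A minimal sketch assuming a recent mlx-lm; the repo id below is a placeholder, so search mlx-community for a 6-bit conversion of whichever model above you want.

```python
# Minimal sketch: load a 6-bit MLX quant on Apple silicon with mlx-lm.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Some-Model-6bit")  # hypothetical repo id
print(generate(model, tokenizer, prompt="Say hello in five words.", max_tokens=200))
```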
---- 512 GB ----
#1 MiniMax-M2.* - FP16
#2 Qwen3.5-397B - 8-bit
#3 Kimi-k2.5-530B-PRISM - 4-bit
#4 GLM-5 - 4-bit
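Quick sanity check on why these bit-widths pair with this much memory: weights alone take roughly params * bits / 8 bytes, and KV cache plus activations need headroom on top. The parameter counts below are read off the model names in this list.

```python
# Rough weights-only memory for the entries above (KV cache and activations are extra,
# so leave headroom under the 512 GB budget).
def weight_gb(params_billions: float, bits: float) -> float:
    # billions of params * (bits / 8) bytes per weight ~= GB of weights
    return params_billions * bits / 8

for name, params_b, bits in [
    ("Qwen3.5-397B @ 8-bit", 397, 8),
    ("Kimi-k2.5-530B-PRISM @ 4-bit", 530, 4),
    ("GLM-5-358B @ 4-bit", 358, 4),   # size taken from the 256 GB entry above
]:
    print(f"{name}: ~{weight_gb(params_b, bits):.0f} GB of weights")
```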