~~ 8 to 16 GB ~~
Granite models are amazing: [NEW]
- https://huggingface.co/ibm-granite/granite-4.1-8b
Gemma-E4B is a good general QA model
- https://huggingface.co/google/gemma-4-E4B-it
Qwen3.5-9B is the best at this level imo
- https://huggingface.co/Qwen/Qwen3.5-9B
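What fits in a tier mostly comes down to weight footprint: parameter count times bytes per parameter at your chosen quantization, plus some runtime slack. A minimal sketch of that arithmetic (the ~20% overhead factor is my own assumption, not a measured number):

```python
def weight_footprint_gib(params_billions: float, bits: int, overhead: float = 1.2) -> float:
    """Rough GiB needed for model weights: params * (bits / 8) bytes,
    inflated by an assumed ~20% for runtime buffers."""
    bytes_total = params_billions * 1e9 * (bits / 8) * overhead
    return bytes_total / 2**30

# An 8B model at 4-bit quantization lands around 4.5 GiB, comfortably
# inside this tier; the same model unquantized at 16-bit (~17.9 GiB)
# would already spill over it.
print(round(weight_footprint_gib(8, 4), 1))   # → 4.5
print(round(weight_footprint_gib(8, 16), 1))  # → 17.9
```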
~~ 16 to 64 GB ~~
A larger Granite: a general chat model that is dense with world knowledge. [NEW]
- https://huggingface.co/ibm-granite/granite-4.1-30b
Undisputed kings:
The Qwens at various precisions: (Higher ceiling)
- https://huggingface.co/Qwen (check latest)
The Gemmas at various precisions: (More efficient)
- https://huggingface.co/google/gemma-4-E4B-it
- https://huggingface.co/google (Gemma family)
~~ 64 to 128 GB ~~
Ling is a new ~100B contender and a decent agent [NEW]
- https://huggingface.co/inclusionAI/Ling-flash-2.0
Mistral Medium: in my experience their models have been the most consistent! [NEW]
- https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
~~ 128 to 256 GB ~~
Undisputed king: DeepSeek-V4-Flash [NEW]
- https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
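Whichever tier you land in, leave headroom beyond the weights: the KV cache grows linearly with context length and can eat several GiB on its own. A rough sketch of the per-token cost, using an assumed config (32 layers, 8 KV heads, head dim 128, fp16 cache) since the models above don't all share the same shapes:

```python
def kv_cache_gib(seq_len: int, layers: int = 32, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim elements
    per token, times bytes per element, times sequence length."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len / 2**30

# With this assumed config, a 32k-token context costs 4 GiB on top
# of the model weights.
print(kv_cache_gib(32768))  # → 4.0
```

Models with fewer KV heads (grouped-query attention) or quantized caches shrink this number, which is part of why some of the larger models above remain usable near the bottom of their tier.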