~~ 8 to 16 GB ~~
Granite models are amazing: [NEW]
- https://huggingface.co/ibm-granite/granite-4.1-8b
Gemma-E4B is a good general QA model
- https://huggingface.co/google/gemma-4-E4B-it
Qwen3.5-9B is the best at this level imo
- https://huggingface.co/Qwen/Qwen3.5-9B
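What fits in a tier mostly comes down to weight footprint: parameter count times bytes per parameter at your chosen quantization, plus some runtime slack. A minimal sketch of that arithmetic (the ~20% overhead factor is my own assumption, not a measured number):

```python
def weight_footprint_gib(params_billions: float, bits: int, overhead: float = 1.2) -> float:
    """Rough GiB needed for model weights: params * (bits / 8) bytes,
    inflated by an assumed ~20% for runtime buffers."""
    bytes_total = params_billions * 1e9 * (bits / 8) * overhead
    return bytes_total / 2**30

# An 8B model at 4-bit quantization lands around 4.5 GiB, comfortably
# inside this tier; the same model unquantized at 16-bit (~17.9 GiB)
# would already spill over it.
print(round(weight_footprint_gib(8, 4), 1))   # → 4.5
print(round(weight_footprint_gib(8, 16), 1))  # → 17.9
```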
~~ 16 to 64 GB ~~
A larger Granite: a general chat model that is dense with world knowledge. [NEW]
- https://huggingface.co/ibm-granite/granite-4.1-30b
Undisputed kings:
The Qwens at various precisions: (Higher ceiling)
- https://huggingface.co/Qwen (check latest)
The Gemmas at various precisions: (More efficient)
- https://huggingface.co/google/gemma-4-E4B-it
- https://huggingface.co/google (Gemma family)
~~ 64 to 128 GB ~~
Ling is a new ~100B contender and a decent agent [NEW]
- https://huggingface.co/inclusionAI/Ling-flash-2.0
Mistral Medium: in my experience their models have been the most consistent! [NEW]
- https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
~~ 128 to 256 GB ~~
Undisputed king: DeepSeek-V4-Flash [NEW]
- https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
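Whichever tier you land in, leave headroom beyond the weights: the KV cache grows linearly with context length and can eat several GiB on its own. A rough sketch of the per-token cost, using an assumed config (32 layers, 8 KV heads, head dim 128, fp16 cache) since the models above don't all share the same shapes:

```python
def kv_cache_gib(seq_len: int, layers: int = 32, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim elements
    per token, times bytes per element, times sequence length."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len / 2**30

# With this assumed config, a 32k-token context costs 4 GiB on top
# of the model weights.
print(kv_cache_gib(32768))  # → 4.0
```

Models with fewer KV heads (grouped-query attention) or quantized caches shrink this number, which is part of why some of the larger models above remain usable near the bottom of their tier.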