bick_nyers on Feb 1, 2025 | on: How to Run DeepSeek R1 671B Locally on a $2000 EPY...

Inference will be slower for a 70B model, since DeepSeek is an MoE that only activates 37B parameters per token. That's what makes CPU inference remotely feasible here.
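To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes token generation is limited by memory bandwidth, that every active weight is read once per token, and that the 70B model is dense (all 70B parameters are active). The bandwidth and bytes-per-parameter values are hypothetical placeholders, not measurements from the linked build, and `decode_tokens_per_second` is just an illustrative helper name.

```python
def decode_tokens_per_second(active_params_billion, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on decode speed for a bandwidth-bound model.

    Assumes each active weight is read from memory exactly once per token
    and ignores KV cache traffic, compute time, and NUMA effects.
    """
    gb_read_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical values chosen only for illustration.
BANDWIDTH_GB_S = 200.0   # assumed usable memory bandwidth
BYTES_PER_PARAM = 1.0    # assumed 8-bit quantized weights

dense_70b = decode_tokens_per_second(70, BYTES_PER_PARAM, BANDWIDTH_GB_S)
moe_37b_active = decode_tokens_per_second(37, BYTES_PER_PARAM, BANDWIDTH_GB_S)

print(f"dense 70B:          {dense_70b:.1f} tok/s (upper bound)")
print(f"MoE, 37B active:    {moe_37b_active:.1f} tok/s (upper bound)")
print(f"MoE speedup factor: {moe_37b_active / dense_70b:.2f}x")
```

Under these assumptions the ratio depends only on the active parameter counts (70 / 37, roughly 1.9x), whatever bandwidth you plug in. The MoE still needs all 671B parameters resident in RAM. Only the per-token read volume shrinks.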