Most HF model cards include a code snippet you can use to run inference on the model. The transformers library takes care of downloading the model weights when you run the code. A Python 3.10-3.11 environment is typically sufficient. Example: https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct#t...
If you have an MBP (Apple Silicon), you need to change the device name in the examples from "cuda" to "mps".
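A minimal sketch of what those model-card snippets look like, with the device picked automatically so the same code works on NVIDIA GPUs ("cuda"), Apple Silicon ("mps"), or CPU. The exact snippet on the SmolLM2 card may differ slightly; this is an illustration, not a copy of it.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick the best available device: NVIDIA GPU, Apple Silicon GPU, or CPU fallback.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"

# First run downloads the weights into the local HF cache (~/.cache/huggingface).
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note the single `.to(device)` calls: because the device string is computed once at the top, nothing else in the snippet needs to change between a CUDA box and an MBP.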