How to use inclusionAI/LLaDA2.0-flash with Transformers:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("inclusionAI/LLaDA2.0-flash", dtype="auto")
```
Add support for greedy decoding (#3), opened by adityastomar
The current sampling implementation always draws tokens with `torch.multinomial`, which samples stochastically instead of taking the most likely token, so it does not support greedy decoding when temperature is 0.0 / top-k is 0 / top-p is 1.0. This PR adds support for greedy decoding.
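A minimal sketch of the idea, assuming per-step logits of shape `[batch, vocab]`: when temperature is 0.0 the sampler short-circuits to `torch.argmax` instead of calling `torch.multinomial`. The function name `sample_next_token`, its parameters, and the convention that `top_k=0` and `top_p=1.0` disable filtering are illustrative assumptions, not the code in this PR or in the LLaDA2.0 repository.

```python
from typing import Optional

import torch


def sample_next_token(
    logits: torch.Tensor,          # hypothetical input: [batch, vocab] logits
    temperature: float = 1.0,
    top_k: int = 0,                # assumption: 0 disables top-k filtering
    top_p: float = 1.0,            # assumption: 1.0 disables top-p filtering
    generator: Optional[torch.Generator] = None,
) -> torch.Tensor:
    """Return one token id per row (shape [batch])."""
    # Greedy path: pick the highest-scoring token deterministically.
    # torch.multinomial always samples, and logits / 0.0 is not meaningful.
    if temperature == 0.0:
        return torch.argmax(logits, dim=-1)

    logits = logits / temperature

    # Top-k: mask every logit smaller than the k-th largest one.
    if top_k > 0:
        k = min(top_k, logits.size(-1))
        kth = torch.topk(logits, k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))

    # Top-p (nucleus): keep the smallest prefix of sorted tokens whose
    # probability mass reaches top_p.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        sorted_probs = torch.softmax(sorted_logits, dim=-1)
        mass_before = sorted_probs.cumsum(dim=-1) - sorted_probs
        remove = mass_before >= top_p
        remove[..., 0] = False  # always keep the single most likely token
        sorted_logits = sorted_logits.masked_fill(remove, float("-inf"))
        # Undo the sort so each logit returns to its vocabulary position.
        logits = logits.scatter(-1, sorted_idx, sorted_logits)

    # Stochastic path: sample from the (possibly filtered) distribution.
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1, generator=generator).squeeze(-1)


# Usage: temperature 0.0 returns the argmax of each row.
example = torch.tensor([[0.1, 2.0, -1.0, 0.5], [3.0, 0.0, 0.2, 2.9]])
print(sample_next_token(example, temperature=0.0).tolist())  # [1, 0]
```

Branching before the temperature division keeps the greedy path deterministic and avoids dividing by zero; any other temperature still goes through `torch.multinomial`.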