Instructions for using Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF with libraries and local apps.

Libraries

How to use Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF", dtype="auto")

Local Apps

vLLM

How to use Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
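
The same request can also be sent from Python. The sketch below uses only the standard library and assumes a server is already running; the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM. Changing `base_url` to `http://localhost:30000` should work the same way against any other server that exposes the OpenAI-compatible `/v1/chat/completions` route.

```python
import json
import urllib.request

MODEL_ID = "Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF"


def build_chat_request(prompt, image_url=None, model=MODEL_ID):
    """Build the JSON body for POST /v1/chat/completions (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        # Optional image part, same shape as in the curl example.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def chat(prompt, image_url=None, base_url="http://localhost:8000"):
    """Send the request to a running server and return the reply text."""
    body = json.dumps(build_chat_request(prompt, image_url)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # OpenAI-compatible servers return the text under choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Describe this image in one sentence.", image_url="..."))` prints the model's answer.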

Use Docker

docker model run hf.co/Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF

SGLang

How to use Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF to start chatting

Load model with FastModel

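# Install first (shell command, not Python): pip install unsloth
from unsloth import FastModel

# Load the model and tokenizer with a 2048-token maximum sequence length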
model, tokenizer = FastModel.from_pretrained(
    model_name="Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF",
    max_seq_length=2048,
)

Docker Model Runner
How to use Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF with Docker Model Runner:
```
docker model run hf.co/Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF
```

Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF

⚡ What is MTP (Multi-Token Prediction)?

MTP (Multi-Token Prediction) is a technique used in the Qwen3.6 architecture that lets the model predict several future tokens in a single step. With its dedicated MTP heads, this model supports speculative decoding: a draft model proposes multiple tokens at once and the target model verifies them in parallel. Only tokens the target model agrees with are kept, so inference gets faster without sacrificing output quality.
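
The draft-then-verify loop can be illustrated with a toy sketch. This is a minimal greedy version under stated assumptions, not how llama.cpp or any other engine implements it: `draft_next` and `target_next` are hypothetical stand-ins that map a token list to the next token, and the verification loop only emulates what a real engine computes in one batched forward pass.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """One greedy speculative-decoding step; returns the tokens it appends."""
    # 1. Draft k tokens cheaply, one after another.
    drafted = []
    for _ in range(k):
        drafted.append(draft_next(prefix + drafted))

    # 2. Verify each drafted position against the target model.
    #    (A real engine scores all k positions in a single pass.)
    accepted = []
    for tok in drafted:
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # fix the first mismatch, drop the rest
            return accepted
        accepted.append(tok)

    # 3. Every draft matched: the target pass also yields one bonus token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(prefix, draft_next, target_next, n_tokens, k=4):
    """Generate n_tokens; also report how many verification steps were used."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < n_tokens:
        out += speculative_step(out, draft_next, target_next, k)
        steps += 1
    return out[: len(prefix) + n_tokens], steps


# Toy models: the target counts upward mod 10; the draft errs after a 2.
target = lambda toks: (toks[-1] + 1) % 10
draft = lambda toks: 9 if toks[-1] == 2 else (toks[-1] + 1) % 10

tokens, steps = generate([0], draft, target, n_tokens=12)
```

The output is identical to decoding with the target alone, because every kept token is one the target would have chosen itself. The gain is that several tokens can be accepted per verification step whenever the drafts are mostly right.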

This GGUF release preserves the MTP heads from unsloth/Qwen3.6-35B-A3B, making it compatible with mainstream inference frameworks that support MTP-based speculative decoding (such as llama.cpp and its derivatives). For optimal throughput, pair this MTP-enabled GGUF with a corresponding draft model.

Source model: Jackrong/Qwopus3.6-35B-A3B-v1
MTP source: unsloth/Qwen3.6-35B-A3B

Uploaded GGUF variants:

Qwopus3.6-35B-A3B-v1-MTP-Q2_K.gguf
Qwopus3.6-35B-A3B-v1-MTP-Q3_K_S.gguf
Qwopus3.6-35B-A3B-v1-MTP-Q3_K_M.gguf
Qwopus3.6-35B-A3B-v1-MTP-Q3_K_L.gguf
Qwopus3.6-35B-A3B-v1-MTP-IQ4_XS.gguf
Qwopus3.6-35B-A3B-v1-MTP-Q4_K_S.gguf
Qwopus3.6-35B-A3B-v1-MTP-Q4_K_M.gguf
Qwopus3.6-35B-A3B-v1-MTP-Q5_K_S.gguf
Qwopus3.6-35B-A3B-v1-MTP-Q5_K_M.gguf
Qwopus3.6-35B-A3B-v1-MTP-Q6_K.gguf
Qwopus3.6-35B-A3B-v1-MTP-Q8_0.gguf
Qwopus3.6-35B-A3B-v1-MTP-BF16.gguf
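
To fetch a single variant rather than the whole repository, something like the following sketch may help. It assumes the files sit at the repository root under exactly the names listed above, and that `huggingface_hub` is installed (`pip install huggingface_hub`); `gguf_filename` and `download_variant` are illustrative helper names.

```python
REPO_ID = "Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF"

# Quantization suffixes taken from the file list above.
QUANTS = [
    "Q2_K", "Q3_K_S", "Q3_K_M", "Q3_K_L", "IQ4_XS", "Q4_K_S",
    "Q4_K_M", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0", "BF16",
]


def gguf_filename(quant):
    """Map a quantization name to its GGUF file name (hypothetical helper)."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quant {quant!r}; expected one of {QUANTS}")
    return f"Qwopus3.6-35B-A3B-v1-MTP-{quant}.gguf"


def download_variant(quant="Q4_K_M"):
    """Download one variant and return its local path in the HF cache."""
    # Imported lazily so the name helper works without the package installed.
    from huggingface_hub import hf_hub_download

    # Assumption: the file lives at the repo root, not in a subfolder.
    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
```

Calling `download_variant("Q4_K_M")` returns a local file path that can be passed to a GGUF-capable runtime.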

This release was prepared by validating or injecting Qwen MTP/nextn tensors before GGUF conversion.


Model tree for Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF

Base model: Qwen/Qwen3.6-35B-A3B
Finetuned: unsloth/Qwen3.6-35B-A3B
Adapter (7): this model

Collection including Jackrong/Qwopus3.6-35B-A3B-v1-MTP-GGUF

🚀 Qwen-MTP: MTP (Multi Token Prediction) speculative decoding enables models like Qwen3.6 to have ~1.4-2.2x faster generation with no change in accuracy.