Instructions to use richardyoung/bfs-prover-v2-32b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use richardyoung/bfs-prover-v2-32b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="richardyoung/bfs-prover-v2-32b",
	filename="BFS-Prover-V2-32B-Q4_K_M.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use richardyoung/bfs-prover-v2-32b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf richardyoung/bfs-prover-v2-32b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf richardyoung/bfs-prover-v2-32b:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf richardyoung/bfs-prover-v2-32b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf richardyoung/bfs-prover-v2-32b:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf richardyoung/bfs-prover-v2-32b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf richardyoung/bfs-prover-v2-32b:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf richardyoung/bfs-prover-v2-32b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf richardyoung/bfs-prover-v2-32b:Q4_K_M

Use Docker

docker model run hf.co/richardyoung/bfs-prover-v2-32b:Q4_K_M

LM Studio
Jan

vLLM

How to use richardyoung/bfs-prover-v2-32b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "richardyoung/bfs-prover-v2-32b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "richardyoung/bfs-prover-v2-32b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/richardyoung/bfs-prover-v2-32b:Q4_K_M

Ollama
How to use richardyoung/bfs-prover-v2-32b with Ollama:
```
ollama run hf.co/richardyoung/bfs-prover-v2-32b:Q4_K_M
```

Unsloth Studio

How to use richardyoung/bfs-prover-v2-32b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for richardyoung/bfs-prover-v2-32b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for richardyoung/bfs-prover-v2-32b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for richardyoung/bfs-prover-v2-32b to start chatting

Docker Model Runner
How to use richardyoung/bfs-prover-v2-32b with Docker Model Runner:
```
docker model run hf.co/richardyoung/bfs-prover-v2-32b:Q4_K_M
```

Lemonade

How to use richardyoung/bfs-prover-v2-32b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull richardyoung/bfs-prover-v2-32b:Q4_K_M

Run and chat with the model

lemonade run user.bfs-prover-v2-32b-Q4_K_M

List all available models

lemonade list

🔬 BFS-Prover-V2 32B - GGUF

Formal Mathematical Proof Generation for Lean4

Original Model | Ollama Registry | llama.cpp

📖 What is This?

This is BFS-Prover-V2 32B, a specialized 32-billion parameter model fine-tuned for formal mathematical proof generation and theorem proving in Lean4. Quantized to GGUF format for efficient local inference with Ollama or llama.cpp!

✨ Why You'll Love It

🔬 Formal Verification - Generates valid Lean4 proofs
🧮 Mathematical Reasoning - Deep understanding of mathematical concepts
📚 131K Context - Handle complex proof sequences and large mathematical contexts
⚡ Local Inference - Run entirely on your machine, no API calls
🔒 Privacy First - Your proofs never leave your computer
🎯 Multiple Quantizations - Choose your speed/quality trade-off
🚀 Ollama Ready - One command to start proving theorems
🔧 llama.cpp Compatible - Works with your favorite tools

🎯 Quick Start

Option 1: Ollama (Easiest!)

Pull and run directly from the Ollama registry:

# Recommended: Q5_K_M (best balance)
ollama run richardyoung/bfs-prover-v2-32b:Q5_K_M

# Other variants
ollama run richardyoung/bfs-prover-v2-32b:Q6_K    # Highest quality
ollama run richardyoung/bfs-prover-v2-32b:Q4_K_M  # Faster, smaller

That's it! Start proving theorems! 🎉

Option 2: Build from Modelfile

Download this repo and build locally:

# Clone or download the modelfiles
ollama create bfs-prover-v2-q5 -f modelfiles/bfs-prover-v2-32b--Q5_K_M.Modelfile
ollama run bfs-prover-v2-q5

Option 3: llama.cpp

Use with llama.cpp directly:

# Download the GGUF file (replace variant as needed)
huggingface-cli download richardyoung/bfs-prover-v2-32b BFS-Prover-V2-32B-Q5_K_M.gguf --local-dir ./

# Run with llama.cpp
./llama-cli -m BFS-Prover-V2-32B-Q5_K_M.gguf -p "theorem add_comm (a b : Nat) : a + b = b + a :::"

💻 System Requirements

Component	Minimum	Recommended
RAM	24 GB	32 GB+
Storage	30 GB free	40+ GB free
CPU	Modern 8-core	16+ cores
GPU	Optional (CPU-only works!)	Metal/CUDA for acceleration
OS	macOS, Linux, Windows	Latest versions

💡 Tip: Larger quantizations (Q6_K) need more RAM but produce better proofs. Smaller ones (Q4_K_M) are faster but less precise.

🎨 Available Quantizations

Choose the right balance for your needs:

Quantization	Size	Quality	Speed	RAM Usage	Best For
Q6_K	27 GB	⭐⭐⭐⭐⭐	⭐⭐⭐	~32 GB	Production proofs, research
Q5_K_M (recommended)	23 GB	⭐⭐⭐⭐	⭐⭐⭐⭐	~28 GB	Daily use, best balance
Q4_K_M	19 GB	⭐⭐⭐	⭐⭐⭐⭐⭐	~24 GB	Quick iteration, constrained systems

Variant Details

Variant	Size	Context	Best For
`Q6_K`	27 GB	131K	Highest quality, complex proofs
`Q5_K_M`	23 GB	131K	Recommended for most users
`Q4_K_M`	19 GB	131K	Faster inference, lower memory

📚 Usage Examples

Theorem Proving

ollama run richardyoung/bfs-prover-v2-32b:Q5_K_M "theorem add_comm (a b : Nat) : a + b = b + a :::"

Proof State Completion

ollama run richardyoung/bfs-prover-v2-32b:Q5_K_M "∀ n : Nat, n + 0 = n :::"

Interactive Proving

ollama run richardyoung/bfs-prover-v2-32b:Q5_K_M
>>> theorem zero_add (n : Nat) : 0 + n = n :::
>>> lemma succ_eq_add_one (n : Nat) : Nat.succ n = n + 1 :::

Using with Python

import requests
import json

def prove_lean4(theorem_statement):
    response = requests.post('http://localhost:11434/api/generate',
        json={
            'model': 'richardyoung/bfs-prover-v2-32b:Q5_K_M',
            'prompt': f'{theorem_statement} :::',
            'stream': False
        })
    return response.json()['response']

# Example
proof = prove_lean4("theorem add_zero (n : Nat) : n + 0 = n")
print(proof)

🏗️ Model Details

Click to expand technical details

Architecture

Base Model: BFS-Prover-V2-32B by ByteDance-Seed
Architecture: Qwen2.5-32B
Parameters: 32 Billion
Context Length: 131,072 tokens (131K)
Quantization: GGUF format (Q4_K_M to Q6_K)
Optimization: Formal mathematical proof generation
Training: Specialized for Lean4 theorem proving

Specialization

The model excels at:

Lean4 Theorem Proving - Generate formal proofs
Proof Completion - Complete partial proof states
Mathematical Formalization - Translate informal math to formal statements
Tactic Generation - Suggest proof tactics for given goals
Proof Search - Explore proof spaces using BFS (Breadth-First Search)

Prompt Format

The model uses a special format where proof requests end with ::::

theorem add_comm (a b : Nat) : a + b = b + a :::

This signals the model to generate a formal proof for the given theorem statement.

⚡ Performance Tips

Getting the best results

Choose the right quantization - Q5_K_M is recommended for daily use
Use proper Lean4 syntax - Follow standard Lean4 theorem statement format
Include the ::: suffix - This is critical for the model to understand you want a proof
Set low temperature - Use 0.1-0.3 for precise, valid proofs
Provide context - Include relevant definitions or prior lemmas
GPU acceleration - Use Metal (Mac) or CUDA (NVIDIA) for faster inference
Large context - Take advantage of the 131K context for complex proofs

Example Ollama Configuration

# Create with custom parameters
ollama create my-bfs-prover -f modelfiles/bfs-prover-v2-32b--Q5_K_M.Modelfile

# Edit the Modelfile to add:
PARAMETER temperature 0.2
PARAMETER top_p 0.95
PARAMETER num_ctx 131072

🔧 Building Custom Variants

You can modify the included Modelfiles to customize behavior:

FROM ./BFS-Prover-V2-32B-Q5_K_M.gguf

# Lean4-specific prompt template (appends ::: to user messages)
TEMPLATE """{{- range $i, $m := .Messages }}{{- if eq $m.Role "user" }}{{ $m.Content }}:::{{ end }}{{- if eq $m.Role "assistant" }}{{ $m.Content }}{{ end }}{{- end }}"""

# System prompt
SYSTEM You are a formal mathematics expert specializing in Lean4 theorem proving. Generate valid, well-structured proofs.

# Parameters
PARAMETER temperature 0.2
PARAMETER num_ctx 131072

Then build:

ollama create my-custom-prover -f custom.Modelfile

🎓 Use Cases

📐 Mathematical Research - Formalize and verify mathematical theorems
🏫 Education - Learn formal mathematics and proof techniques
🔍 Proof Exploration - Discover new proof strategies
✅ Verification - Verify correctness of mathematical statements
📚 Library Building - Contribute to Lean4 mathematical libraries
🤖 Automated Reasoning - Build automated theorem proving systems

⚠️ Known Limitations

💾 Large Size - Even the smallest variant needs 19+ GB of storage
🐏 RAM Intensive - Requires significant system memory (24+ GB)
⏱️ Inference Speed - Slower than smaller models (trade-off for quality)
🌐 English-Focused - Best performance with English mathematical language
📝 Lean4-Specialized - Optimized specifically for Lean4, not general math
🎯 Proof Format - Requires proper Lean4 syntax and ::: suffix

📄 License

Apache 2.0 - Same as the original model. Free for commercial use!

🙏 Acknowledgments

Original Model: ByteDance-Seed for creating BFS-Prover-V2
GGUF Format: Georgi Gerganov for llama.cpp
Ollama: Ollama team for the amazing runtime
Lean Community: All the mathematicians and developers working on formal verification

🔗 Useful Links

📦 Original Model: ByteDance-Seed/BFS-Prover-V2-32B
🚀 Ollama Registry: richardyoung/bfs-prover-v2-32b
🛠️ llama.cpp: GitHub
📖 Lean4 Documentation: Lean Community
💬 Discussions: Ask questions here!

📚 Citation

If you use this model in your research, please cite:

@misc{bfs-prover-v2-2024,
  title={BFS-Prover-V2: Formal Mathematical Proof Generation},
  author={ByteDance-Seed Team},
  year={2024},
  url={https://huggingface.co/ByteDance-Seed/BFS-Prover-V2-32B}
}

🎮 Pro Tips

Advanced usage patterns

1. Integration with Lean4 IDE

Use with Lean4 VSCode extension for interactive proving:

-- Use the model to suggest proof tactics
theorem my_theorem : ∀ n : Nat, n + 0 = n := by
  -- Query model for tactics
  sorry

2. API Server Mode

Run as an OpenAI-compatible API:

ollama serve
# Then use the API at http://localhost:11434

3. Batch Proof Generation

Process multiple theorems:

for theorem in theorems.txt; do
  ollama run richardyoung/bfs-prover-v2-32b:Q5_K_M "$theorem :::" >> proofs.txt
done

4. Chain-of-Thought Proving

Break complex proofs into steps:

# Step 1: Generate high-level strategy
ollama run richardyoung/bfs-prover-v2-32b:Q5_K_M "Outline proof strategy for: theorem_statement :::"

# Step 2: Generate detailed proof
ollama run richardyoung/bfs-prover-v2-32b:Q5_K_M "Complete proof: theorem_statement :::"

Quantized with ❤️ by richardyoung

If you find this useful, please ⭐ star the repo and share with other mathematicians!

Format: GGUF | Runtime: Ollama / llama.cpp | Created: October 2025

Hardware Requirements

BFS-Prover-V2 is a 32B model specialized for formal theorem proving in Lean4:

Quantization	Model Size	VRAM Required	Quality
Q3_K_M	~15 GB	18 GB	Good
Q4_K_M	~19 GB	24 GB	Very Good - recommended
Q5_K_M	~23 GB	28 GB	Excellent
Q6_K	~27 GB	32 GB	Near original
Q8_0	~35 GB	40 GB	Original quality

Recommended Setups

Hardware	Recommended Quantization
RTX 4090 (24GB)	Q4_K_M
RTX 3090 (24GB)	Q4_K_M
A100 (40GB)	Q8_0
Mac M2 Ultra (192GB)	Q8_0

Use Case

This model excels at:

Lean4 theorem proving
Formal mathematical proofs
Proof search and verification

Downloads last month: 25

GGUF

Model size

33B params

Architecture

qwen2

Hardware compatibility

4-bit

5-bit

6-bit

Model tree for richardyoung/bfs-prover-v2-32b

Base model

Qwen/Qwen2.5-32B

Finetuned

ByteDance-Seed/BFS-Prover-V2-32B

Quantized

(4)

this model

Collection including richardyoung/bfs-prover-v2-32b

GGUF Models for Ollama

Collection

Ready-to-use GGUF quantizations for Ollama, llama.cpp, and local inference. • 8 items • Updated Dec 1, 2025