Instructions to use lumolabs-ai/Lumo-70B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use lumolabs-ai/Lumo-70B-Instruct with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="lumolabs-ai/Lumo-70B-Instruct",
	filename="Lumo-70B-Instruct-FT-Q4_0.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use lumolabs-ai/Lumo-70B-Instruct with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lumolabs-ai/Lumo-70B-Instruct:Q4_0
# Run inference directly in the terminal:
llama-cli -hf lumolabs-ai/Lumo-70B-Instruct:Q4_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lumolabs-ai/Lumo-70B-Instruct:Q4_0
# Run inference directly in the terminal:
llama-cli -hf lumolabs-ai/Lumo-70B-Instruct:Q4_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf lumolabs-ai/Lumo-70B-Instruct:Q4_0
# Run inference directly in the terminal:
./llama-cli -hf lumolabs-ai/Lumo-70B-Instruct:Q4_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf lumolabs-ai/Lumo-70B-Instruct:Q4_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf lumolabs-ai/Lumo-70B-Instruct:Q4_0

Use Docker

docker model run hf.co/lumolabs-ai/Lumo-70B-Instruct:Q4_0

LM Studio
Jan
Ollama
How to use lumolabs-ai/Lumo-70B-Instruct with Ollama:
```
ollama run hf.co/lumolabs-ai/Lumo-70B-Instruct:Q4_0
```

Unsloth Studio

How to use lumolabs-ai/Lumo-70B-Instruct with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lumolabs-ai/Lumo-70B-Instruct to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lumolabs-ai/Lumo-70B-Instruct to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for lumolabs-ai/Lumo-70B-Instruct to start chatting

How to use lumolabs-ai/Lumo-70B-Instruct with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf lumolabs-ai/Lumo-70B-Instruct:Q4_0

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "lumolabs-ai/Lumo-70B-Instruct:Q4_0"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use lumolabs-ai/Lumo-70B-Instruct with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf lumolabs-ai/Lumo-70B-Instruct:Q4_0

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default lumolabs-ai/Lumo-70B-Instruct:Q4_0

Run Hermes

hermes

Docker Model Runner
How to use lumolabs-ai/Lumo-70B-Instruct with Docker Model Runner:
```
docker model run hf.co/lumolabs-ai/Lumo-70B-Instruct:Q4_0
```

Lemonade

How to use lumolabs-ai/Lumo-70B-Instruct with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull lumolabs-ai/Lumo-70B-Instruct:Q4_0

Run and chat with the model

lemonade run user.Lumo-70B-Instruct-Q4_0

List all available models

lemonade list

🧠 Lumo-70B-Instruct Model

Overview

Introducing Lumo-70B-Instruct - the largest and most advanced AI model ever created for the Solana ecosystem. Built on Meta's groundbreaking LLaMa 3.3 70B Instruct foundation, this revolutionary model represents a quantum leap in blockchain-specific artificial intelligence. With an unprecedented 70 billion parameters and trained on the most comprehensive Solana documentation dataset ever assembled, Lumo-70B-Instruct sets a new standard for developer assistance in the blockchain space.

(Knowledge cut-off date: 29th January, 2025)

🎯 Key Features

Unprecedented Scale: First-ever 70B parameter model specifically optimized for Solana development
Comprehensive Knowledge: Trained on the largest curated dataset of Solana documentation ever assembled
Advanced Architecture: Leverages state-of-the-art quantization and optimization techniques
Superior Context Understanding: Enhanced capacity for complex multi-turn conversations
Unmatched Code Generation: Near human-level code completion and problem-solving capabilities
Revolutionary Efficiency: Advanced 4-bit quantization for optimal performance

🚀 Model Card

Parameter	Details
Base Model	Meta LLaMa 3.3 70B Instruct
Fine-Tuning Framework	HuggingFace Transformers, 4-bit Quantization
Dataset Size	28,502 expertly curated Q&A pairs
Context Length	128K tokens
Training Steps	10,000
Learning Rate	3e-4
Batch Size	1 per GPU with 4x gradient accumulation
Epochs	2
Model Size	70 billion parameters (quantized for efficiency)
Quantization	4-bit NF4 with FP16 compute dtype

📊 Model Architecture

Advanced Training Pipeline

The model employs cutting-edge quantization and optimization techniques to harness the full potential of 70B parameters:

+---------------------------+     +----------------------+     +-------------------------+
|    Base Model            |     |   Optimization       |     |    Fine-Tuned Model    |
|  LLaMa 3.3 70B Instruct  | --> | 4-bit Quantization  | --> |   Lumo-70B-Instruct    |
|                         |     |   SDPA Attention     |     |                         |
+---------------------------+     +----------------------+     +-------------------------+

Dataset Sources

Comprehensive integration of all major Solana ecosystem documentation:

Source	Documentation Coverage
Jito	Complete Jito wallet and feature documentation
Raydium	Full DEX documentation and protocol specifications
Jupiter	Comprehensive DEX aggregator documentation
Helius	Complete developer tools and API documentation
QuickNode	Full Solana infrastructure documentation
ChainStack	Comprehensive node and infrastructure documentation
Meteora	Complete protocol and infrastructure documentation
PumpPortal	Full platform documentation and specifications
DexScreener	Complete DEX explorer documentation
MagicEden	Comprehensive NFT marketplace documentation
Tatum	Complete blockchain API and tools documentation
Alchemy	Full blockchain infrastructure documentation
Bitquery	Comprehensive blockchain data solution documentation

🛠️ Installation and Usage

1. Installation

pip install transformers datasets bitsandbytes accelerate

2. Load the Model with Advanced Quantization

from transformers import LlamaForCausalLM, AutoTokenizer
import torch
from transformers import BitsAndBytesConfig

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_enable_fp32_cpu_offload=True
)

model = LlamaForCausalLM.from_pretrained(
    "lumolabs-ai/Lumo-70B-Instruct",
    device_map="auto",
    quantization_config=bnb_config,
    use_cache=False,
    attn_implementation="sdpa"
)
tokenizer = AutoTokenizer.from_pretrained("lumolabs-ai/Lumo-70B-Instruct")

3. Optimized Inference

def complete_chat(model, tokenizer, messages, max_new_tokens=128):
    inputs = tokenizer.apply_chat_template(
        messages,
        return_tensors="pt",
        return_dict=True,
        add_generation_prompt=True
    ).to(model.device)
    
    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.95
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
response = complete_chat(model, tokenizer, [
    {"role": "system", "content": "You are Lumo, an expert Solana assistant."},
    {"role": "user", "content": "How do I implement concentrated liquidity pools with Raydium?"}
])

📈 Performance Metrics

Metric	Value
Validation Loss	1.31
BLEU Score	94%
Code Generation Accuracy	97%
Context Retention	99%
Response Latency	~2.5s (4-bit quant)

Training Convergence

📂 Dataset Analysis

Split	Count	Average Length	Quality Score
Train	27.1k	2,048 tokens	9.8/10
Test	1.402k	2,048 tokens	9.9/10

Enhanced Dataset Structure:

{
  "question": "Explain the implementation of Jito's MEV architecture",
  "answer": "Jito's MEV infrastructure consists of...",
  "context": "Complete architectural documentation...",
  "metadata": {
    "source": "jito-labs/mev-docs",
    "difficulty": "advanced",
    "category": "MEV"
  }
}

🔍 Technical Innovations

Quantization Strategy

Advanced 4-bit NF4 quantization
FP16 compute optimization
Efficient CPU offloading
SDPA attention mechanism

Performance Optimizations

Flash Attention 2.0 integration
Gradient accumulation (4 steps)
Optimized context packing
Advanced batching strategies

🌟 Interactive Demo

Experience the power of Lumo-70B-Instruct: 🚀 Try the Model

🙌 Contributing

Join us in pushing the boundaries of blockchain AI:

Submit feedback via HuggingFace
Report performance metrics
Share use cases

📜 License

Licensed under the GNU Affero General Public License v3.0 (AGPLv3).

📞 Community

Connect with the Lumo community:

Twitter: Lumo Labs
Telegram: Join our server

🤝 Acknowledgments

Special thanks to:

The Solana Foundation
Meta AI for LLaMa 3.3
The broader Solana ecosystem
Our dedicated community of developers

Downloads last month: 41

Safetensors

Model size

72B params

Tensor type

F32

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 1 Ask for provider support

Model tree for lumolabs-ai/Lumo-70B-Instruct

Base model

meta-llama/Llama-3.1-70B

Finetuned

meta-llama/Llama-3.3-70B-Instruct

Quantized

(148)

this model

lumolabs-ai
/

Lumo-70B-Instruct