Instructions for using Naphula/G4-Runic-Oarfish-26B-A4B-v1 with libraries and local apps.

Libraries

How to use Naphula/G4-Runic-Oarfish-26B-A4B-v1 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Naphula/G4-Runic-Oarfish-26B-A4B-v1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Naphula/G4-Runic-Oarfish-26B-A4B-v1")
model = AutoModelForMultimodalLM.from_pretrained("Naphula/G4-Runic-Oarfish-26B-A4B-v1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Apply the chat template and tokenize text and image inputs in one step.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Slice off the prompt tokens so only the newly generated text is decoded.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use Naphula/G4-Runic-Oarfish-26B-A4B-v1 with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Naphula/G4-Runic-Oarfish-26B-A4B-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Naphula/G4-Runic-Oarfish-26B-A4B-v1",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
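The curl request above can also be sent from Python for scripted use. The sketch below relies only on the standard library (`urllib.request`, `json`). It assumes a server started as shown, listening on `http://localhost:8000` for vLLM or `http://localhost:30000` for SGLang. The helper names `build_chat_request` and `send_chat_request` are hypothetical. Because this card notes that vision may be disabled in the merge, a text-only `content` list is a reasonable fallback if the image request fails.

```python
import json
import urllib.request


def build_chat_request(model: str, text: str, image_url: str) -> dict:
    """Build the same OpenAI-style chat payload that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(base_url: str, payload: dict) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, call `send_chat_request("http://localhost:8000", build_chat_request(...))`. The generated text is then in `reply["choices"][0]["message"]["content"]`, which is the standard OpenAI chat-completions response shape.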

Use Docker

```shell
docker model run hf.co/Naphula/G4-Runic-Oarfish-26B-A4B-v1
```

SGLang

How to use Naphula/G4-Runic-Oarfish-26B-A4B-v1 with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Naphula/G4-Runic-Oarfish-26B-A4B-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Naphula/G4-Runic-Oarfish-26B-A4B-v1",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Naphula/G4-Runic-Oarfish-26B-A4B-v1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Naphula/G4-Runic-Oarfish-26B-A4B-v1",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Quantizations of this model can be used in llama.cpp, Ollama, LM Studio, or any other compatible app.

A newer version of this model is available: Naphula/G4-Runic-Oarfish-26B-A4B-v1.2

🐟 G4 Runic Oarfish 26B A4B v1

This is a creative roleplay (RP) merge that combines Musica with the full LoRA of MeroMero.

It uses moe_karcher, a custom merge method that adapts the standard karcher method to mixture-of-experts (MoE) models. A few changes were made to the script to support the new Gemma 4 architecture. Note that there were some issues setting up the merge, so vision support might be disabled.
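Some background on the karcher method may help here. The Karcher (Fréchet) mean of several weight tensors is the direction on the unit hypersphere that minimizes the weighted sum of squared geodesic distances to the input directions. The sketch below is a minimal pure-Python illustration, not the actual moe_karcher.py code. It assumes each tensor is flattened to a list of floats and that the merged direction is rescaled to the weighted mean of the input norms. `max_iter` and `tol` mirror the parameters of the same name in the merge config.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _norm(a: Sequence[float]) -> float:
    return math.sqrt(_dot(a, a))


def karcher_mean(
    vectors: List[List[float]],
    weights: List[float],
    max_iter: int = 1000,
    tol: float = 1e-9,
) -> List[float]:
    """Weighted Karcher mean of the input directions on the unit hypersphere,
    rescaled by the weighted mean of the input norms.

    Assumes no zero vectors and no exactly opposite (antipodal) inputs.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    norms = [_norm(v) for v in vectors]
    units = [[x / n for x in v] for v, n in zip(vectors, norms)]
    dim = len(units[0])

    # Start from the normalized weighted Euclidean average of the directions.
    mu = [sum(wi * u[k] for wi, u in zip(w, units)) for k in range(dim)]
    mu_len = _norm(mu)
    mu = [x / mu_len for x in mu]

    for _ in range(max_iter):
        # Log map: express each direction as a tangent vector at mu
        # whose length is its geodesic distance (angle) from mu.
        tangent = [0.0] * dim
        for wi, u in zip(w, units):
            cos_t = max(-1.0, min(1.0, _dot(mu, u)))
            theta = math.acos(cos_t)
            if theta < 1e-12:
                continue  # u coincides with mu: zero tangent vector
            scale = theta / math.sin(theta)
            for k in range(dim):
                tangent[k] += wi * scale * (u[k] - cos_t * mu[k])
        step = _norm(tangent)
        if step < tol:
            break  # converged: the weighted tangent vectors cancel out
        # Exp map: walk along the geodesic from mu in the tangent direction.
        mu = [math.cos(step) * m + math.sin(step) * t / step
              for m, t in zip(mu, tangent)]
        mu_len = _norm(mu)
        mu = [x / mu_len for x in mu]  # guard against floating-point drift

    avg_norm = sum(wi * n for wi, n in zip(w, norms))
    return [avg_norm * m for m in mu]
```

For example, `karcher_mean([[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0])` returns approximately `[1.414, 1.414]`. The iteration starts close to the answer when the inputs are similar, so a small `max_iter` is often enough. That may be why the `max_iter: 5` preview showed no big differences.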

Why SFT-6 was chosen over MeroMero: a linear merge of base_model with SFT-6 seemed likely to capture less nuance than the raw SFT finetune, which the karcher algorithm then "nudges" toward the Musica finetune. If the merge turns out poorly, the next step would be to test adding base_model as a donor as well.

A preview GGUF was released with max_iter: 5 instead of 1000. No big differences were noticed.

Runic Oarfish has some refusals but can be jailbroken or ablated as needed.

Updates to Python libraries were required to merge this on Windows. The following mergekit files also had to be patched:

- `mergekit\_data\architectures\gemma4.json`
- `mergekit\io\tasks.py`
- `mergekit\architecture\auto.py`
- `mergekit\architecture\base.py`
- `mergekit\common.py`
- `mergekit\merge_methods\moe_karcher.py`

```bat
timeout /t 3 /nobreak && mergekit-yaml C:\mergekit-main\moe_karcher.yaml C:\mergekit-main\moe_karcher --copy-tokenizer --allow-crimes --out-shard-size 5B --trust-remote-code --lazy-unpickle --random-seed 420 --cuda
```

Merge configuration (`moe_karcher.yaml`):

```yaml
architecture: Gemma4ForConditionalGeneration
merge_method: moe_karcher
# base_model: B:\26B\google--gemma-4-26B-A4B-it
models:
  - model: B:\26B\AuriAetherwiing--G4-26B-A4B-Musica-v1
  - model: B:\26B\ApocalypseParty--G4-26B-SFT-6 # zerofata/G4-MeroMero-26B-A4B
parameters:
  max_iter: 1000
  tol: 1.0e-9
  router_strategy: karcher  # Options: karcher, average, first, random_init
  blend_experts: true  # Blend corresponding experts (expert[0] + expert[0], etc.)
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
# chat_template: auto
trust_remote_code: true
name: 🐟 G4-Runic-Oarfish-26B-A4B-v1
```
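The `blend_experts: true` comment describes pairing experts by index, so that expert[0] of one model is merged with expert[0] of the other. The sketch below illustrates that pairing. It assumes each model's experts can be indexed as a list of flattened weights. `blend_experts` and `mean_merge` are hypothetical names. `mean_merge` is a plain element-wise average that stands in for the Karcher merge; it is not the actual moe_karcher.py logic.

```python
from typing import Callable, List

Expert = List[float]  # one expert's flattened weights (hypothetical layout)


def blend_experts(
    model_experts: List[List[Expert]],
    merge_fn: Callable[[List[Expert]], Expert],
) -> List[Expert]:
    """Merge expert i of every model with expert i of the others."""
    num_experts = len(model_experts[0])
    if any(len(experts) != num_experts for experts in model_experts):
        raise ValueError("all models must have the same number of experts")
    # Pair experts strictly by index: expert[0] with expert[0], and so on.
    return [merge_fn([experts[i] for experts in model_experts])
            for i in range(num_experts)]


def mean_merge(tensors: List[Expert]) -> Expert:
    """Stand-in merge: element-wise arithmetic mean (not the Karcher mean)."""
    return [sum(vals) / len(vals) for vals in zip(*tensors)]


# Two toy "models" with two experts each.
musica = [[1.0, 2.0], [3.0, 4.0]]
sft = [[3.0, 4.0], [5.0, 6.0]]
print(blend_experts([musica, sft], mean_merge))  # [[2.0, 3.0], [4.0, 5.0]]
```

Index-wise pairing presumably works here because both finetunes descend from the same base model, so expert i in each one started from the same weights. Experts from unrelated bases would not line up this way.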

Architecture definition (`mergekit\_data\architectures\gemma4.json`):

```json
{
  "model_type": "gemma4",
  "architectures": [
    "Gemma4ForConditionalGeneration"
  ],
  "num_layers_config_key": "text_config.num_hidden_layers",
  "vocab_size_config_key": "text_config.vocab_size",
  "pre_weights": [
    { "name": "model.language_model.embed_tokens.weight", "is_embed": true },
    { "name": "model.embed_vision.embedding_projection.weight", "optional": true },
    { "name": "model.vision_tower.std_bias", "optional": true },
    { "name": "model.vision_tower.std_scale", "optional": true },
    { "name": "model.vision_tower.patch_embedder.input_proj.weight", "optional": true },
    { "name": "model.vision_tower.patch_embedder.position_embedding_table", "optional": true }
  ],
  "layer_templates": {
    "weights": [
      { "name": "model.language_model.layers.${layer_index}.self_attn.q_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.self_attn.k_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.self_attn.v_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.self_attn.o_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.self_attn.q_norm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.self_attn.k_norm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.mlp.gate_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.mlp.up_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.mlp.down_proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.input_layernorm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.post_attention_layernorm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.pre_feedforward_layernorm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.post_feedforward_layernorm.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.post_feedforward_layernorm_1.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.post_feedforward_layernorm_2.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.pre_feedforward_layernorm_2.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.router.per_expert_scale", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.router.proj.weight", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.router.scale", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.experts.gate_up_proj", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.experts.down_proj", "optional": true },
      { "name": "model.language_model.layers.${layer_index}.layer_scalar", "optional": true }
    ]
  },
  "post_weights": [
    { "name": "model.language_model.norm.weight" }
  ]
}
```
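The `${layer_index}` placeholders use the same syntax as Python's `string.Template`. The sketch below expands the templates for each layer and shows why `"optional": true` matters for alternating layers: a missing optional tensor is skipped, while a missing required one stops the merge. The function name and the tiny `available` set are illustrative assumptions (which layers lack `v_proj` is invented here), and mergekit's own loader is more involved.

```python
from string import Template
from typing import Dict, List, Set


def expand_layer_weights(
    layer_templates: List[Dict[str, object]],
    num_layers: int,
    available: Set[str],
) -> List[str]:
    """Expand ${layer_index} templates and keep only tensors present in the checkpoint."""
    resolved: List[str] = []
    for layer_index in range(num_layers):
        for entry in layer_templates:
            name = Template(str(entry["name"])).substitute(layer_index=layer_index)
            if name in available:
                resolved.append(name)
            elif not entry.get("optional", False):
                # A required tensor is absent: fail instead of writing a broken model.
                raise KeyError(f"required tensor missing: {name}")
            # Optional and absent (e.g. v_proj on some layers): skip it.
    return resolved


templates = [
    {"name": "model.language_model.layers.${layer_index}.self_attn.q_proj.weight", "optional": True},
    {"name": "model.language_model.layers.${layer_index}.self_attn.v_proj.weight", "optional": True},
]
# Hypothetical checkpoint in which layer 0 has no v_proj.
available = {
    "model.language_model.layers.0.self_attn.q_proj.weight",
    "model.language_model.layers.1.self_attn.q_proj.weight",
    "model.language_model.layers.1.self_attn.v_proj.weight",
}
for name in expand_layer_weights(templates, 2, available):
    print(name)
```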

AI Notes

To successfully merge the Gemma 4 architecture on Windows 10, several critical updates and patches were required to handle its unique nested configuration and heterogeneous layer structure.

Here is a summary of the changes that worked:

1. Library & Environment Updates

- Transformers: Updated to the development version (`pip install git+https://github.com/huggingface/transformers.git`) to add native support for the `gemma4` model type and its `Gemma4Config`.
- Pydantic: Ensured Pydantic v2 compatibility for the complex nested configurations.

2. File Patches & Logic Fixes

`mergekit\_data\architectures\gemma4.json`

- Explicit Path Mapping: Defined the absolute tensor paths (e.g., `model.language_model.layers...`) to bypass auto-inference failures.
- Heterogeneous Support: Marked all layer weights as `optional: true`. This lets the merge proceed when Gemma 4 alternates between `sliding_attention` and `full_attention` layers, where certain tensors such as `v_proj` are missing.
- Config Keys: Pointed `num_layers_config_key` and `vocab_size_config_key` to the nested `text_config` block.

`mergekit\common.py`

- Nested Config Access: Patched `get_config_value` to automatically "reach inside" the `text_config` sub-block if a requested key (like `vocab_size`) is missing from the root configuration.
- Function Restoration: Restored `set_config_value` to ensure the output model's configuration can be correctly written to disk.
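The sketch below shows the nested lookup described above. It assumes the configuration is a plain dict; mergekit itself works with Transformers config objects, so the real patch differs. A key that is missing at the root is retried inside `text_config`, and dotted keys such as `text_config.num_hidden_layers` walk into sub-blocks. Both function bodies are hypothetical illustrations.

```python
from typing import Any, Dict

_MISSING = object()


def get_config_value(config: Dict[str, Any], key: str) -> Any:
    """Look up a dotted key; if it is absent at the root, retry inside text_config."""
    for root in (config, config.get("text_config", {})):
        value: Any = root
        for part in key.split("."):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                value = _MISSING
                break
        if value is not _MISSING:
            return value
    raise KeyError(key)


def set_config_value(config: Dict[str, Any], key: str, value: Any) -> None:
    """Write a dotted key, creating intermediate dicts as needed."""
    parts = key.split(".")
    node = config
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


example = {"model_type": "gemma4", "text_config": {"vocab_size": 1000}}  # made-up values
print(get_config_value(example, "vocab_size"))  # 1000, found inside text_config
```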

`mergekit\architecture\base.py`

- Pydantic Rebuild: Added explicit calls to `ConfiguredModuleArchitecture.model_rebuild()` and `ConfiguredModelArchitecture.model_rebuild()` at the bottom of the file. This forces Pydantic to resolve the type hints for `torch.Tensor` and `PretrainedConfig`, which otherwise cause validation errors on Windows.

`mergekit\architecture\auto.py`

- Alias Support: Updated the substitution logic to handle aliases given as lists or tuples, so the architecture can try several possible path depths for the same weight.
- Optional Enforcement (optional/experimental): Forced the optional flag on inferred weights to prevent a `RuntimeError` during the planning phase for heterogeneous models.
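Alias handling of this kind can be pictured as trying each candidate path in order and taking the first one present in the checkpoint. This sketch assumes aliases arrive as a string, list, or tuple, and that `optional` mirrors the flag in gemma4.json. `resolve_weight_name` is a hypothetical helper, not mergekit's API.

```python
from typing import Optional, Sequence, Set, Union


def resolve_weight_name(
    names: Union[str, Sequence[str]],
    available: Set[str],
    optional: bool = False,
) -> Optional[str]:
    """Return the first alias found in the checkpoint, or None if optional and absent."""
    candidates = [names] if isinstance(names, str) else list(names)
    for candidate in candidates:
        if candidate in available:
            return candidate
    if optional:
        return None  # heterogeneous layer: this weight simply does not exist here
    raise RuntimeError(f"none of {candidates} found in checkpoint")


available = {"model.language_model.norm.weight"}
# Try the shallow path first, then the nested Gemma 4 path.
print(resolve_weight_name(("model.norm.weight", "model.language_model.norm.weight"), available))
```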

`mergekit\io\tasks.py`

- Load Guard: Modified `LoadTensor.execute` to return `None` instead of raising a `RuntimeError` when a tensor marked as optional is missing.
- Save Guard: Updated `SaveTensor.execute` to skip the write safely when it receives a `None` tensor, so the merge no longer crashes on alternating layers.
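Together, the two guards treat `None` as "tensor not present" from loading through saving. In the sketch below, plain functions and dicts stand in for mergekit's `LoadTensor` and `SaveTensor` tasks, and a list of floats stands in for `torch.Tensor`. The names `load_tensor` and `save_tensor` are hypothetical.

```python
from typing import Dict, List, Optional

Tensor = List[float]  # stand-in for torch.Tensor


def load_tensor(checkpoint: Dict[str, Tensor], name: str, optional: bool) -> Optional[Tensor]:
    """Load guard: a missing optional tensor yields None instead of an error."""
    if name not in checkpoint:
        if optional:
            return None
        raise RuntimeError(f"tensor {name} is required but not present")
    return checkpoint[name]


def save_tensor(output: Dict[str, Tensor], name: str, tensor: Optional[Tensor]) -> bool:
    """Save guard: skip the write when there is nothing to save; report whether it wrote."""
    if tensor is None:
        return False
    output[name] = tensor
    return True


checkpoint = {"layers.1.self_attn.v_proj.weight": [0.5, -0.5]}
output: Dict[str, Tensor] = {}
for layer in (0, 1):
    name = f"layers.{layer}.self_attn.v_proj.weight"
    save_tensor(output, name, load_tensor(checkpoint, name, optional=True))
print(sorted(output))  # ['layers.1.self_attn.v_proj.weight']
```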

`mergekit\merge_methods\moe_karcher.py`

- Expert Regex: Updated `_is_expert_weight` to recognize the naming convention of Gemma 4 experts (`.experts.`).
- Execution Guard: Added a check at the start of the `execute` method that returns `None` if any input tensor is missing, so the Karcher mean is never computed on empty data.
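The sketch below illustrates the two moe_karcher.py changes. It assumes, as described, that expert tensors are identified by a `.experts.` path segment. The element-wise mean stands in for the Karcher mean, and both function names are hypothetical. Matching the dot-delimited segment rather than the bare word avoids also catching router tensors such as `router.per_expert_scale`.

```python
import re
from typing import List, Optional

_EXPERT_PATTERN = re.compile(r"\.experts\.")


def is_expert_weight(name: str) -> bool:
    """True for tensors stored under a `.experts.` path segment."""
    return _EXPERT_PATTERN.search(name) is not None


def guarded_merge(tensors: List[Optional[List[float]]]) -> Optional[List[float]]:
    """Return None if any input is missing; otherwise merge (mean as a stand-in)."""
    if any(t is None for t in tensors):
        return None  # skip the math entirely for layers that lack this tensor
    return [sum(vals) / len(vals) for vals in zip(*tensors)]


print(is_expert_weight("model.language_model.layers.0.experts.gate_up_proj"))   # True
print(is_expert_weight("model.language_model.layers.0.router.per_expert_scale"))  # False
```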

Result

These changes allow mergekit to handle the 26B A4B (4B active parameters) architecture of Gemma 4, which has 128 experts and alternating attention types. The result was a successful ~48GB FP16/BF16 merge on Windows 10.
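The ~48GB figure can be sanity-checked with rough arithmetic. The check below assumes about 26 billion parameters, which is the nominal count from the model name rather than an exact figure, each stored in 2 bytes as BF16.

```python
params = 26e9          # assumed nominal parameter count ("26B")
bytes_per_param = 2    # bfloat16 / float16 use 2 bytes per value

total_bytes = params * bytes_per_param
print(f"{total_bytes / 1e9:.1f} GB (decimal) = {total_bytes / 2**30:.1f} GiB")
# prints: 52.0 GB (decimal) = 48.4 GiB
```

On that assumption, the ~48GB size appears to be expressed in binary gigabytes (GiB).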