🐟 G4 Runic Oarfish 26B A4B v1
This is a creative RP merge that combines Musica with the full LoRA of MeroMero.
It uses a custom merge method, moe_karcher, which adapts the standard karcher method to support mixture-of-experts models. A few changes were made to the script to support the new Gemma 4 architecture. Note that there were some issues setting up the merge, so vision mode might be disabled.
Why SFT-6 was chosen over MeroMero: a linear merge of base_model with SFT-6 seemed likely to capture less nuance than the raw SFT finetune, which the karcher algorithm then "nudges" in the direction of the Musica finetune. If the merge turns out to perform poorly, the next step would be to test adding base_model as a donor as well.
A preview GGUF was released with max_iter: 5 instead of 1000. No big differences were noticed.
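For intuition about what max_iter and tol control, here is a minimal sketch of a generic Karcher (Riemannian) mean on the unit sphere in pure Python. It is not the moe_karcher script itself: the function name karcher_mean_sphere, the choice of unit-sphere geometry, and the list-based vectors are all assumptions made for illustration.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(a):
    n = math.sqrt(_dot(a, a))
    return [x / n for x in a]


def karcher_mean_sphere(points, max_iter=1000, tol=1e-9):
    """Iteratively average unit vectors along the sphere (illustrative only).

    Assumes the points are not antipodal, so the starting estimate and the
    log map below are well defined.
    """
    pts = [_normalize(p) for p in points]
    # Start from the normalized arithmetic mean.
    mu = _normalize([sum(c) / len(pts) for c in zip(*pts)])
    for _ in range(max_iter):
        # Average of the log maps of all points at the current estimate.
        tangent = [0.0] * len(mu)
        for p in pts:
            cos_t = max(-1.0, min(1.0, _dot(mu, p)))
            theta = math.acos(cos_t)
            if theta < 1e-12:
                continue  # point coincides with the estimate
            scale = theta / math.sin(theta)
            for i in range(len(mu)):
                tangent[i] += scale * (p[i] - cos_t * mu[i]) / len(pts)
        step = math.sqrt(_dot(tangent, tangent))
        if step < tol:
            break  # converged: the mean tangent vector has vanished
        # Exponential map: move along the sphere in the tangent direction.
        mu = _normalize(
            [math.cos(step) * m + math.sin(step) * t / step
             for m, t in zip(mu, tangent)]
        )
    return mu


mean = karcher_mean_sphere([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(mean)
```

In this toy case the loop stops after a couple of iterations, well before any iteration cap. If a real merge behaves similarly, that would be consistent with the preview (max_iter: 5) and the full run (max_iter: 1000) showing no big differences, though that is only a guess.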
Runic Oarfish has some refusals but can be jailbroken or ablated as needed.
Updates to python libraries were required in order to merge this on Windows, in addition to patching the following mergekit files:
- mergekit\_data\architectures\gemma4.json
- mergekit\io\tasks.py
- mergekit\architecture\auto.py
- mergekit\architecture\base.py
- mergekit\common.py
- mergekit\merge_methods\moe_karcher.py
timeout /t 3 /nobreak && mergekit-yaml C:\mergekit-main\moe_karcher.yaml C:\mergekit-main\moe_karcher --copy-tokenizer --allow-crimes --out-shard-size 5B --trust-remote-code --lazy-unpickle --random-seed 420 --cuda
architecture: Gemma4ForConditionalGeneration
merge_method: moe_karcher
# base_model: B:\26B\google--gemma-4-26B-A4B-it
models:
- model: B:\26B\AuriAetherwiing--G4-26B-A4B-Musica-v1
- model: B:\26B\ApocalypseParty--G4-26B-SFT-6 # zerofata/G4-MeroMero-26B-A4B
parameters:
  max_iter: 1000
  tol: 1.0e-9
  router_strategy: karcher # Options: karcher, average, first, random_init
  blend_experts: true # Blend corresponding experts (expert[0] + expert[0], etc.)
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
# chat_template: auto
trust_remote_code: true
name: 🐟 G4-Runic-Oarfish-26B-A4B-v1
{
"model_type": "gemma4",
"architectures": [
"Gemma4ForConditionalGeneration"
],
"num_layers_config_key": "text_config.num_hidden_layers",
"vocab_size_config_key": "text_config.vocab_size",
"pre_weights": [
{ "name": "model.language_model.embed_tokens.weight", "is_embed": true },
{ "name": "model.embed_vision.embedding_projection.weight", "optional": true },
{ "name": "model.vision_tower.std_bias", "optional": true },
{ "name": "model.vision_tower.std_scale", "optional": true },
{ "name": "model.vision_tower.patch_embedder.input_proj.weight", "optional": true },
{ "name": "model.vision_tower.patch_embedder.position_embedding_table", "optional": true }
],
"layer_templates": {
"weights": [
{ "name": "model.language_model.layers.${layer_index}.self_attn.q_proj.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.self_attn.k_proj.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.self_attn.v_proj.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.self_attn.o_proj.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.self_attn.q_norm.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.self_attn.k_norm.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.mlp.gate_proj.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.mlp.up_proj.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.mlp.down_proj.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.input_layernorm.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.post_attention_layernorm.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.pre_feedforward_layernorm.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.post_feedforward_layernorm.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.post_feedforward_layernorm_1.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.post_feedforward_layernorm_2.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.pre_feedforward_layernorm_2.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.router.per_expert_scale", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.router.proj.weight", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.router.scale", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.experts.gate_up_proj", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.experts.down_proj", "optional": true },
{ "name": "model.language_model.layers.${layer_index}.layer_scalar", "optional": true }
]
},
"post_weights": [
{ "name": "model.language_model.norm.weight" }
]
}
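To illustrate how the ${layer_index} placeholders in layer_templates turn into concrete tensor names, here is a small sketch using Python's string.Template. The helper expand_layer_templates and the two-layer count are hypothetical, and mergekit's own substitution code may work differently.

```python
from string import Template

# A few of the templates from gemma4.json above (not the full list).
LAYER_TEMPLATES = [
    "model.language_model.layers.${layer_index}.self_attn.q_proj.weight",
    "model.language_model.layers.${layer_index}.router.proj.weight",
    "model.language_model.layers.${layer_index}.experts.gate_up_proj",
]


def expand_layer_templates(templates, num_layers):
    """Return one concrete tensor name per (layer, template) pair."""
    names = []
    for layer_index in range(num_layers):
        for template in templates:
            names.append(Template(template).substitute(layer_index=layer_index))
    return names


# num_layers=2 is an arbitrary value for the demo; the real count comes
# from text_config.num_hidden_layers.
names = expand_layer_templates(LAYER_TEMPLATES, num_layers=2)
for name in names:
    print(name)
```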
AI Notes
To successfully merge the Gemma 4 architecture on Windows 10, several critical updates and patches were required to handle its unique nested configuration and heterogeneous layer structure.
Here is the summary of the working changes:
1. Library & Environment Updates
- Transformers: Updated to the development version (pip install git+https://github.com/huggingface/transformers.git) to add native support for the gemma4 model type and its Gemma4Config.
- Pydantic: Ensured Pydantic v2 compatibility for the complex nested configurations.
2. File Patches & Logic Fixes
mergekit\_data\architectures\gemma4.json
- Explicit Path Mapping: Defined the absolute tensor paths (e.g., model.language_model.layers...) to bypass auto-inference failures.
- Heterogeneous Support: Marked all layer weights as optional: true. This allows the merge to proceed when Gemma 4 alternates between sliding_attention and full_attention (where certain tensors like v_proj are missing in specific layers).
- Config Keys: Pointed num_layers_config_key and vocab_size_config_key to the nested text_config block.
mergekit\common.py
- Nested Config Access: Patched get_config_value to automatically "reach inside" the text_config sub-block if a requested key (like vocab_size) is missing from the root configuration.
- Function Restoration: Restored set_config_value to ensure the output model's configuration can be correctly written to disk.
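The nested lookup described above can be sketched roughly as follows. This is a stand-in operating on plain dicts, not the patched mergekit function: the name get_config_value_sketch, the fallback_blocks parameter, and the example values are all hypothetical.

```python
_MISSING = object()


def _lookup_dotted(config, key):
    """Follow a dotted key such as 'text_config.vocab_size' through dicts."""
    node = config
    for part in key.split("."):
        if isinstance(node, dict) and part in node:
            node = node[part]
        else:
            return _MISSING
    return node


def get_config_value_sketch(config, key, fallback_blocks=("text_config",)):
    """Look up key at the root first, then inside each fallback sub-block."""
    value = _lookup_dotted(config, key)
    if value is not _MISSING:
        return value
    for block in fallback_blocks:
        sub = config.get(block)
        if isinstance(sub, dict):
            value = _lookup_dotted(sub, key)
            if value is not _MISSING:
                return value
    raise KeyError(key)


# Illustrative config only; the numbers are made up for the demo.
config = {
    "model_type": "gemma4",
    "text_config": {"vocab_size": 1000, "num_hidden_layers": 4},
}
print(get_config_value_sketch(config, "vocab_size"))
print(get_config_value_sketch(config, "text_config.num_hidden_layers"))
```

Both the bare key (vocab_size) and the dotted key used in gemma4.json (text_config.num_hidden_layers) resolve to the nested value in this sketch.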
mergekit\architecture\base.py
- Pydantic Rebuild: Added explicit calls to ConfiguredModuleArchitecture.model_rebuild() and ConfiguredModelArchitecture.model_rebuild() at the bottom of the file. This forces Pydantic to resolve type hints for torch.Tensor and PretrainedConfig that otherwise cause validation errors on Windows.
mergekit\architecture\auto.py
- Alias Support: Updated the substitution logic to handle aliases as lists/tuples, allowing the architecture to try multiple possible path depths for the same weight.
- Optional Enforcement: (Optional/Experimental) Forced the optional flag on inferred weights to prevent RuntimeError during the planning phase of heterogeneous models.
mergekit\io\tasks.py
- Load Guard: Modified LoadTensor.execute to return None instead of raising a RuntimeError if a tensor is missing but marked as optional.
- Save Guard: Updated SaveTensor.execute to safely skip the write process if it receives a None tensor, preventing the merge from crashing on alternating layers.
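The load and save guards can be pictured with a simplified sketch in which dicts stand in for checkpoints. The functions load_tensor and save_tensor are hypothetical stand-ins for LoadTensor.execute and SaveTensor.execute, and the tensor values are placeholder strings rather than real tensors.

```python
def load_tensor(checkpoint, name, optional=False):
    """Return the tensor, or None when it is absent but marked optional."""
    if name in checkpoint:
        return checkpoint[name]
    if optional:
        return None
    raise RuntimeError(f"missing required tensor: {name}")


def save_tensor(output, name, tensor):
    """Write the tensor unless it is None; report whether a write happened."""
    if tensor is None:
        return False
    output[name] = tensor
    return True


# Toy checkpoint: layer 1 has no v_proj, mimicking an alternating layer.
checkpoint = {
    "layers.0.self_attn.v_proj.weight": "tensor-0",
}
output = {}
for layer in range(2):
    name = f"layers.{layer}.self_attn.v_proj.weight"
    tensor = load_tensor(checkpoint, name, optional=True)
    written = save_tensor(output, name, tensor)
    print(name, "written" if written else "skipped")
```

Returning None rather than raising keeps the decision about a missing tensor with the caller, which is why the save side also needs its own check.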
mergekit\merge_methods\moe_karcher.py
- Expert Regex: Updated _is_expert_weight to recognize the specific naming convention of Gemma 4 experts (.experts.).
- Execution Guard: Added a check at the start of the execute method to return None if any input tensors are missing, ensuring the Karcher Mean math doesn't run on empty data.
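As a rough sketch of these two checks, the snippet below matches the .experts. naming convention and skips the merge when any input is missing. The exact regex in the patched script is not shown in this card, so the pattern and the names is_expert_weight and merge_or_skip are assumptions.

```python
import re

# Assumed pattern: any tensor name containing a ".experts." path segment.
_EXPERT_PATTERN = re.compile(r"\.experts\.")


def is_expert_weight(name):
    """True for fused expert tensors such as '...experts.gate_up_proj'."""
    return bool(_EXPERT_PATTERN.search(name))


def merge_or_skip(tensors, merge_fn):
    """Run merge_fn only when every input tensor is present."""
    if not tensors or any(t is None for t in tensors):
        return None
    return merge_fn(tensors)


def average(values):
    # Placeholder for the real merge math, used here on plain floats.
    return sum(values) / len(values)


print(is_expert_weight("model.language_model.layers.0.experts.down_proj"))
print(is_expert_weight("model.language_model.layers.0.mlp.down_proj.weight"))
print(merge_or_skip([1.0, 3.0], average))
print(merge_or_skip([1.0, None], average))
```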
Result
These changes allow mergekit to handle the 26B A4B (Active 4B) architecture of Gemma 4, which features 128 experts and alternating attention types, resulting in a successful ~48GB FP16/BF16 merge on Windows 10.
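As a quick sanity check on the ~48GB figure, the arithmetic below assumes a nominal 26 billion parameters stored at 2 bytes each (FP16 or BF16). The exact parameter count is not stated in this card, so this is only a ballpark estimate.

```python
def checkpoint_size_gib(num_params, bytes_per_param):
    """Approximate checkpoint size in GiB (2**30 bytes), ignoring metadata."""
    return num_params * bytes_per_param / 2**30


# 26e9 is the nominal "26B" from the model name, not a measured count.
size_gib = checkpoint_size_gib(26e9, 2)
print(f"{size_gib:.1f} GiB")  # prints 48.4 GiB
```

Measured in decimal gigabytes the same estimate is 52 GB, so the quoted ~48GB most likely reflects binary units.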