Howto ramalama
install
install on fedora:
sudo dnf install python3-ramalama
install via pypi:
pip install ramalama
install script linux/mac:
curl -fsSL https://ramalama.ai/install.sh | bash
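verify the install (optional sanity check, assuming the ramalama binary is now on PATH):
ramalama version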
usage
set environment variables:
RAMALAMA_CONTAINER_ENGINE=docker CUDA_VISIBLE_DEVICES="0"
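a minimal shell sketch, exporting the variables before invoking ramalama (docker and GPU index 0 are illustrative values):
export RAMALAMA_CONTAINER_ENGINE=docker
export CUDA_VISIBLE_DEVICES="0"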
run model ibm granite:
ramalama run granite
pull model openai gpt-oss:
ramalama pull gpt-oss:latest
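list locally available models to confirm the pull:
ramalama list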
serve model:
ramalama serve gpt-oss
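serve exposes an OpenAI-compatible REST API; a minimal query sketch, assuming the default port 8080 and the /v1/chat/completions path:
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "hello"}]}'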
serve model with vulkan backend:
ramalama serve --image=quay.io/ramalama/ramalama:latest gemma3:4b
serve model with intel-gpu backend:
ramalama serve --image=quay.io/ramalama/intel-gpu:latest gemma3:4b
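other accelerator backends follow the same --image pattern, assuming the matching images are published under quay.io/ramalama (cuda and rocm shown as examples):
ramalama serve --image=quay.io/ramalama/cuda:latest gemma3:4b
ramalama serve --image=quay.io/ramalama/rocm:latest gemma3:4b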
pull model deepseek-r1:
ramalama pull deepseek
serve model as daemon with llama-stack and other options:
ramalama serve --port 8080 --api llama-stack --name deepseek-service -d deepseek
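list running model service containers to confirm the daemon started:
ramalama ps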
chat webui for ramalama:
podman run -it --rm --name ramalamastack-ui -p 8501:8501 -e LLAMA_STACK_ENDPOINT=http://host.containers.internal:8080 quay.io/redhat-et/streamlit_client:latest
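the web ui is then reachable in a browser at http://localhost:8501 (the port published with -p above).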
show the container runtime command without executing it:
ramalama --dryrun run deepseek
stop model service:
ramalama stop deepseek-service
convert specified model to an oci formatted ai model:
ramalama convert ollama://tinyllama:latest oci://quay.io/rhatdan/tiny:latest
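the converted model can then be pushed to its registry or run directly; a sketch, assuming push access to the target repository:
ramalama push oci://quay.io/rhatdan/tiny:latest
ramalama run oci://quay.io/rhatdan/tiny:latest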