Howto ramalama
install
install on fedora:
sudo dnf install python3-ramalama
install via pypi:
pip install ramalama
install script linux/mac:
curl -fsSL https://ramalama.ai/install.sh | bash
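verify the install (optional sanity check, assuming the ramalama binary is now on PATH):
ramalama version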
usage
set environment variables:
RAMALAMA_CONTAINER_ENGINE=docker CUDA_VISIBLE_DEVICES="0"
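a minimal shell sketch, exporting the variables before invoking ramalama (docker and GPU index 0 are illustrative values):
export RAMALAMA_CONTAINER_ENGINE=docker
export CUDA_VISIBLE_DEVICES="0"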
run model ibm granite:
ramalama run granite
pull model openai gpt-oss:
ramalama pull gpt-oss:latest
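list locally available models to confirm the pull:
ramalama list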
serve model:
ramalama serve gpt-oss
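serve exposes an OpenAI-compatible REST API; a minimal query sketch, assuming the default port 8080 and the /v1/chat/completions path:
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "hello"}]}'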
serve model with vulkan backend:
ramalama serve --image=quay.io/ramalama/ramalama:latest gemma3:4b
serve model with intel-gpu backend:
ramalama serve --image=quay.io/ramalama/intel-gpu:latest gemma3:4b
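other accelerator backends follow the same --image pattern, assuming the matching images are published under quay.io/ramalama (cuda and rocm shown as examples):
ramalama serve --image=quay.io/ramalama/cuda:latest gemma3:4b
ramalama serve --image=quay.io/ramalama/rocm:latest gemma3:4b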
pull model deepseek-r1:
ramalama pull deepseek
serve model as daemon with llama-stack and other options:
ramalama serve --port 8080 --api llama-stack --name deepseek-service -d deepseek
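list running model service containers to confirm the daemon started:
ramalama ps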
chat webui for ramalama:
podman run -it --rm --name ramalamastack-ui -p 8501:8501 -e LLAMA_STACK_ENDPOINT=http://host.containers.internal:8080 quay.io/redhat-et/streamlit_client:latest
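the web ui is then reachable in a browser at http://localhost:8501 (the port published with -p above).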
show the container runtime command without executing it:
ramalama --dryrun run deepseek
stop model service:
ramalama stop deepseek-service
convert specified model to an oci formatted ai model:
ramalama convert ollama://tinyllama:latest oci://quay.io/rhatdan/tiny:latest
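the converted model can then be pushed to its registry or run directly; a sketch, assuming push access to the target repository:
ramalama push oci://quay.io/rhatdan/tiny:latest
ramalama run oci://quay.io/rhatdan/tiny:latest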