Howto ramalama
= install =
== fedora ==
install podman:
sudo dnf -y install podman podman-compose
install ramalama:
sudo dnf -y install python3-ramalama
install via pypi:
pip install ramalama
== debian/ubuntu ==
install podman:
sudo apt install podman podman-compose -y
install ramalama:
curl -fsSL https://ramalama.ai/install.sh | bash
== archlinux ==
install podman:
sudo pacman -Sy podman podman-compose --noconfirm
install yay using chaotic repo:
https://wiki.vidalinux.org/index.php?title=Howto_NVK#enable_chaotic_repo
install ramalama using yay:
yay -S ramalama
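whichever install method you used, verify ramalama works by printing its version:
ramalama version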
= usage =
pull model openai gpt-oss:
ramalama pull gpt-oss:latest
run model ibm granite:
ramalama run granite
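run also accepts a one-shot prompt as trailing arguments instead of opening an interactive chat (the prompt text here is just an example):
ramalama run granite "what is podman?"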
serve model:
ramalama serve gpt-oss
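serve exposes an openai-compatible llama.cpp endpoint, by default on port 8080; a quick smoke test from another terminal (assuming the default port):
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "gpt-oss", "messages": [{"role": "user", "content": "hello"}]}'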
serve model with vulkan backend:
ramalama serve --image=quay.io/ramalama/ramalama:latest gemma3:4b
serve model with intel-gpu backend:
ramalama serve --image=quay.io/ramalama/intel-gpu:latest gemma3:4b
pull model deepseek-r1:
ramalama pull deepseek
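list models in the local store to confirm the pulls:
ramalama list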
serve model as daemon with llama-stack and other options:
ramalama serve --port 8080 --api llama-stack --name llamaserver -d deepseek
chat webui for ramalama:
podman run -it --rm --name ramalamastack-ui -p 8501:8501 -e LLAMA_STACK_ENDPOINT=http://host.containers.internal:8080 quay.io/redhat-et/streamlit_client:latest
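the streamlit ui published with -p above is then reachable in a browser at http://localhost:8501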
show the container runtime command without executing it:
ramalama --dryrun run deepseek
stop model service:
ramalama stop llamaserver
convert specified model to an oci formatted ai model:
ramalama convert ollama://tinyllama:latest oci://quay.io/rhatdan/tiny:latest
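the converted artifact behaves like any other model source, so it can be run straight from the oci transport (assuming the image is in the local store or a reachable registry):
ramalama run oci://quay.io/rhatdan/tiny:latest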
create kubernetes yaml from the deployed container:
podman kube generate llamaserver -f llamaserver.yaml
remove all running containers:
podman rm -af
deploy containers using the created yaml:
podman kube play llamaserver.yaml
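confirm the pod and its container are up:
podman ps --pod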
to run the container as a systemd service, create this directory:
mkdir ~/.config/containers/systemd/
create the quadlet systemd file:
cat > ~/.config/containers/systemd/llamaserver.kube << EOF
[Unit]
Description = Run Kubernetes YAML with podman kube play
[Kube]
Yaml=llamaserver.yaml
EOF
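quadlet generates a user service named after the .kube file, here llamaserver.service; note that a relative Yaml= path is resolved against the unit's directory, so llamaserver.yaml must also live in ~/.config/containers/systemd/. reload systemd and start the service:
systemctl --user daemon-reload
systemctl --user start llamaserver.service
= references =
* https://crfm.stanford.edu/2023/03/13/alpaca.html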