Howto ramalama
= install =
== fedora ==
install podman:
sudo dnf -y install podman podman-compose
install ramalama:
sudo dnf -y install python3-ramalama
install via pypi:
pip install ramalama
== debian/ubuntu ==
install podman:
sudo apt install podman podman-compose -y
install ramalama:
curl -fsSL https://ramalama.ai/install.sh | bash
== archlinux ==
install podman:
sudo pacman -Sy podman podman-compose --noconfirm
install yay using chaotic repo:
https://wiki.vidalinux.org/index.php?title=Howto_NVK#enable_chaotic_repo
install ramalama using yay:
yay -S ramalama
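whichever install method you used, verify ramalama works by printing its version:
ramalama version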
= usage =
pull model openai gpt-oss:
ramalama pull gpt-oss:latest
run model ibm granite:
ramalama run granite
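run also accepts a one-shot prompt as trailing arguments instead of opening an interactive chat (the prompt text here is just an example):
ramalama run granite "what is podman?"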
serve model:
ramalama serve gpt-oss
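serve exposes an openai-compatible llama.cpp endpoint, by default on port 8080; a quick smoke test from another terminal (assuming the default port):
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "gpt-oss", "messages": [{"role": "user", "content": "hello"}]}'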
serve model with vulkan backend:
ramalama serve --image=quay.io/ramalama/ramalama:latest gemma3:4b
serve model with intel-gpu backend:
ramalama serve --image=quay.io/ramalama/intel-gpu:latest gemma3:4b
pull model deepseek-r1:
ramalama pull deepseek
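list models in the local store to confirm the pulls:
ramalama list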
serve model as daemon with llama-stack and other options:
ramalama serve --port 8080 --api llama-stack --name llamaserver -d deepseek
chat webui for ramalama:
podman run -it --rm --name ramalamastack-ui -p 8501:8501 -e LLAMA_STACK_ENDPOINT=http://host.containers.internal:8080 quay.io/redhat-et/streamlit_client:latest
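the streamlit ui published with -p above is then reachable in a browser at http://localhost:8501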
show the container runtime command without executing it:
ramalama --dryrun run deepseek
stop model service:
ramalama stop llamaserver
convert specified model to an oci formatted ai model:
ramalama convert ollama://tinyllama:latest oci://quay.io/rhatdan/tiny:latest
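the converted artifact behaves like any other model source, so it can be run straight from the oci transport (assuming the image is in the local store or a reachable registry):
ramalama run oci://quay.io/rhatdan/tiny:latest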
create kubernetes yaml from the deployed container:
podman kube generate llamaserver -f llamaserver.yaml
remove all running containers:
podman rm -af
deploy containers using the created yaml:
podman kube play llamaserver.yaml
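confirm the pod and its container are up:
podman ps --pod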
to run the container as a systemd service, create this directory:
mkdir ~/.config/containers/systemd/
create the quadlet systemd file:
cat > ~/.config/containers/systemd/llamaserver.kube << EOF
[Unit]
Description = Run Kubernetes YAML with podman kube play
[Kube]
Yaml=llamaserver.yaml
EOF
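quadlet generates a user service named after the .kube file, here llamaserver.service; note that a relative Yaml= path is resolved against the unit's directory, so llamaserver.yaml must also live in ~/.config/containers/systemd/. reload systemd and start the service:
systemctl --user daemon-reload
systemctl --user start llamaserver.service
= references =
* https://crfm.stanford.edu/2023/03/13/alpaca.html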