Rootless podman on Oracle Enterprise Linux with local and kubernetes cluster DNS

September 29, 2022 | 18 minute read


Introduction

This post is a walkthrough of how to run rootless podman in a VirtualBox VM running Oracle Enterprise Linux, using port 80 and allowing both external and internal DNS utilizing the Container Network Interface (CNI) on a local Kubernetes cluster. Running containers without the root user owning any process aims to provide greater flexibility and security.

 

We will be setting up multiple simple http servers that respond based on what hostname you connect to in your browser (or curl), and that have the ability to talk to each other inside the kubernetes network.

 

Depending on the exact versions of the technologies used, the steps to accomplish this setup may vary. There are various quirks and bugs in even minor dot releases of some applications.


There are two main areas to pay attention to while implementing this setup:

1. Bind your cluster to an actual IP address. Not the loopback (127.0.0.1 or ::1), and not to 0.0.0.0.

2. Make sure control groups v2 (cgroup v2) is configured properly.

 

This setup does not require disabling or making changes to SELinux or firewalld.

 

A note on disk space - running in rootless mode saves data (containers, overlays, configs, etc.) to your home directory.

With Docker and rootful setups, most data goes to the root (/) mount.

Make sure you have sufficient free space in your /home/<user> area.
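
If you want to check before you start, something like the following will show your free space and how much the default rootless storage location is currently using (the ~/.local/share/containers path is podman's default; adjust if you have changed it):

# free space in your home area
df -h ~
# current size of the rootless container storage (may not exist yet on a fresh install)
du -sh ~/.local/share/containers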

 

Technologies used


* Oracle Linux Server release 8.6
* kind v0.11.1
* kubectl Client Version: v1.25.2
* podman version 4.1.1

 

Background


Podman is a container engine similar to Docker. A main difference between them is the overall architecture of how they run. Docker uses a client-server architecture with daemons running on each host. Podman uses a daemonless single-process architecture.

 

When running rootless, the network stack functions differently. With a rootful setup, it is possible to utilize the host network. For example, containers can see the /etc/hosts file and utilize DNS resolution on the host operating system. This rootless setup does not access the host network.

 

Podman with CNI networking utilizes plugins, typically located in /usr/libexec/cni.
A component of special interest for this setup is rootlessport, which provides the re-exec for the RootlessKit-based port forwarder and is the process that binds to a port on the host OS, exposing an entry point into the local cluster.
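
To see what plugins are present on your system, you can simply list that directory (the exact set will depend on which packages you have installed):

ls /usr/libexec/cni
# typically includes bridge, portmap, firewall, host-local, loopback, and others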

 

Rootless podman also requires Linux cgroup v2, a kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes.

 

Setup steps

 

OS Setup


These steps assume a fresh install of Oracle Enterprise Linux 8.6 in a VirtualBox VM.
When you install Linux, give the machine a hostname, not just localhost.localdomain. For my setup, I set my host to podman.testdomain.
It is assumed you have created a user other than root, and that this user has sudo privileges.

 

[devuser@podman /]$ cat /etc/hostname
podman.testdomain

 

I also created the user account devuser. This user has ID 1000, which will be relevant later.
One way to check the ID of your user is:

[devuser@podman /]$ id -u
1000

 

Configure /etc/hosts


Now we associate our hostname to the primary network interface of our VM.
There are many ways to get your nic details. Two common commands are "ifconfig -a" and "ip addr".
You can also try this command to get just the IP of your primary nic:

 

[devuser@podman /]$ ip route get 1.1.1.1 | grep -oP 'src \K\S+'
10.0.2.15

 

Edit your /etc/hosts file to associate the hostname you created, both the FQDN podman.testdomain, and hostname podman, to the IP of the primary nic, 10.0.2.15. We are also associating localhost to this IP.


You may see localhost already associated with 127.0.0.1 and ::1 in your hosts file. If that is there, we are going to remove it. You can leave localhost.localdomain; you just want the single localhost association gone. The reason for this is that part of the podman plugin process tries to bind specifically to localhost.

Leaving the default entry in the hosts file would cause a partial bind to 127.0.0.1, which will not work.

 

Edit your hosts file, "sudo vi /etc/hosts", to look similar to this. Change the 10.0.2.15 IP to match whatever your primary nic IP is. If you don't have IPv6 enabled, you may not see the ::1 entry. Just skip that line if it is not applicable to you.

 

127.0.0.1   localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost.localdomain localhost6 localhost6.localdomain6
10.0.2.15   localhost podman podman.testdomain

 

 

Enable cgroup v2


By default, cgroup v1 is enabled. We need to add a kernel flag to get v2.


Execute the following to add the flag, and reboot. You need to reboot before you continue. Adding systemd.unified_cgroup_hierarchy=1 enables cgroup v2, and we apply it to all kernel entries.

sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
sudo reboot

 

After reboot, verify cgroupv2 is enabled by executing the following:

[devuser@podman /]$ mount -l | grep cgroup2
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)


If you don't see cgroup2 returned, stop. Diagnose the problem. The rest of the setup will not work.
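
Another quick way to check, if you prefer, is to look at the filesystem type mounted at /sys/fs/cgroup; cgroup2fs indicates cgroup v2 is in use:

stat -fc %T /sys/fs/cgroup
# should return cgroup2fs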

 

Required cgroup v2 configs for podman


Add iptables entries


Create /etc/modules-load.d/iptables.conf by executing the following:


cat <<EOF | sudo tee /etc/modules-load.d/iptables.conf
ip6_tables
ip6table_nat
ip_tables
iptable_nat
EOF
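
These modules will be loaded automatically on the next boot. If you want to load and verify them right away without rebooting, something like this should work:

sudo modprobe -a ip_tables iptable_nat ip6_tables ip6table_nat
lsmod | grep -E 'ip_tables|iptable_nat|ip6_tables|ip6table_nat'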
 

 

Add delegate


Execute the following:


# Create /etc/systemd/system/user@.service.d 
sudo mkdir -p /etc/systemd/system/user@.service.d
# create delegate.conf with the following:
cat <<EOF | sudo tee /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=yes
DelegateControllers=cpu cpuset io memory pids
EOF
 

 

Add user slice config


This is where your user ID is important. The commands below get your user ID for you via the $(id -u) call. If you hard-code a user ID instead, make sure it is your user ID, and use the same ID in all places you see it referenced. The cgroup settings are bound to your user by this ID.

 

Execute the following:



cat <<EOF | sudo tee /etc/systemd/system/user-$(id -u).slice
[Unit]
Description=Fix cgroup controllers
After=user.slice
Requires=user.slice
Before=systemd-logind.service
 
[Install]
WantedBy=multi-user.target
EOF
 

 

Enable user slice


Execute the following:

 

sudo systemctl enable user-$(id -u).slice
# output should be Created symlink /etc/systemd/system/multi-user.target.wants/user-1000.slice -> /etc/systemd/system/user-1000.slice.

 

Lower unprivileged ports


By default, Oracle Linux considers any port below 1024 to be privileged, meaning only a root-owned process can bind to it. We are lowering this threshold to 80 so rootlessport can bind to port 80 later.


Execute the following:

cat <<EOF | sudo tee /etc/sysctl.conf
net.ipv4.ip_unprivileged_port_start=80
EOF
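
If you want to apply and confirm the setting without waiting for the reboot in the next step, you can load it immediately:

# load settings from /etc/sysctl.conf
sudo sysctl -p
# confirm the new threshold
sysctl net.ipv4.ip_unprivileged_port_start
# should return net.ipv4.ip_unprivileged_port_start = 80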

 

Reboot


Reboot before going any further.


Check what cgroup controllers are enabled


It is possible you will not see the correct entries at this step.


There is a bug with the specific combination of technology versions used here, which I will work around in the next steps.


To check your cgroup settings, execute:

# check if things are set correctly. Look carefully. You may be missing the cpu value
cat /sys/fs/cgroup/cgroup.subtree_control
# should return cpuset cpu io memory pids
cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
# should return cpuset cpu io memory pids


You want to see cpuset cpu io memory pids. What you may see is cpuset io memory pids. The cpu entry may be missing.

 

Workaround


If you do not see the correct results in the previous step, try the following.


There is a race condition caused by pulseaudio and rtkit that prevents cgroup from working correctly. We are going to disable them as the workaround.


Execute:

sudo systemctl stop rtkit-daemon.service
sudo systemctl disable rtkit-daemon.service
systemctl --user stop pulseaudio.socket
systemctl --user stop pulseaudio.service
# reload the daemon
sudo systemctl daemon-reload

 

Make sure you reload the daemon.


Any time you reboot your VM, you will need to execute this step.

 

Check cgroup settings again


To check your cgroup settings, execute:

# check if things are set correctly. Look carefully. You may be missing the cpu value
cat /sys/fs/cgroup/cgroup.subtree_control
# should return cpuset cpu io memory pids
cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
# should return cpuset cpu io memory pids

 

If you do not see cpuset cpu io memory pids, stop and diagnose the problem. The rest of the setup will not work.

 

Podman setup

OS packages


Install the required packages for podman


Execute:

sudo dnf install containers-common podman podman-plugins docker


Note that the docker package is not actually Docker. It maps the docker command to podman, so you are still executing podman if you type docker as your command; it is there for compatibility.
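
A quick way to confirm the shim is in place is a version check; the docker command should report podman (exact output depends on your installed version):

docker --version
# e.g. podman version 4.1.1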

 

Install Kind


Execute the following to install kind:

 

curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.11.1/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/bin
which kind
# Should return /usr/bin/kind
kind version
# Should return kind v0.11.1 go1.16.4 linux/amd64

 

Add kubernetes repo


Add the kubernetes repo for dnf/yum by executing the following:

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

 

Install kubectl


Execute:

sudo dnf install kubectl
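
You can verify the client installed correctly (the version reported will be whatever the repo currently provides):

kubectl version --client
# e.g. Client Version: v1.25.2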

 

Create local cluster with kind

 

First create your kind config file. You will need to edit the IP address in this command to match that of your primary nic identified earlier.


Change 10.0.2.15 to match whatever your IP is


Execute the following to create a file in whatever directory you are currently in:

cat <<EOF > kind-config-with-host.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  apiServerAddress: "10.0.2.15"
name: my-local-kind
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
        listenAddress: "10.0.2.15"
      - containerPort: 443
        hostPort: 443
        protocol: TCP
        listenAddress: "10.0.2.15"
EOF

 

Note this is yaml. The spacing matters. Do not use tabs, only spaces. If the formatting is off, the config will be rejected.

 

Now create the cluster by executing kind create cluster --config=kind-config-with-host.yaml. This command may take a minute to complete. You should see output similar to the following:

[devuser@podman blog]$ kind create cluster --config=kind-config-with-host.yaml
enabling experimental podman provider
Cgroup controller detection is not implemented for Podman. If you see cgroup-related errors, you might need to set systemd property "Delegate=yes", see https://kind.sigs.k8s.io/docs/user/rootless/
Creating cluster "my-local-kind" ...
 ✓ Ensuring node image (kindest/node:v1.21.1)
 ✓ Preparing nodes
 ✓ Writing configuration
 ✓ Starting control-plane
 ✓ Installing CNI
 ✓ Installing StorageClass
Set kubectl context to "kind-my-local-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-my-local-kind
Not sure what to do next? Check out https://kind.sigs.k8s.io/docs/user/quick-start/

 

If you check your podman networks, you should now see a kind bridge.

[devuser@podman blog]$ podman network ls
NETWORK ID    NAME        DRIVER
0b27c158feb3  kind        bridge
2f259bab93aa  podman      bridge

 

Check port bindings


If things have gone correctly, you should now have rootlessport processes bound to ports 80, 443, and a random high port.


Execute the following:

[devuser@podman blog]$ sudo netstat -plnt | grep rootlessport
tcp        0      0 10.0.2.15:80            0.0.0.0:*               LISTEN      6704/rootlessport   
tcp        0      0 10.0.2.15:33977         0.0.0.0:*               LISTEN      6704/rootlessport   
tcp        0      0 10.0.2.15:443           0.0.0.0:*               LISTEN      6704/rootlessport

 

You should see your IP address being used in the binding. NOT 127.0.0.1, or 0.0.0.0, or ::1.

 

Deploy applications


Reminder for all of the following steps - this is yaml and formatting/spacing matters.


If you get errors, check whether you have tabs instead of spaces, or whether the number of spaces is off in any of your yaml files.

 

Deploy Kubernetes ingress controller


Execute the following to deploy the Kubernetes ingress controller.

 

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
kubectl -n ingress-nginx rollout status deployment/ingress-nginx-controller

 

Deploy multiple httpbin apps and ingress


The following could all be combined into one yaml file and applied in a single step. I am breaking them out to make it easier to follow what is happening.

 

Create httpbin server A yaml


Execute the following to create the yaml for your httpbin A server. This will create httpbinA.yaml in whatever directory you are currently in.

cat <<EOF > httpbinA.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: httpbin
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin-a
  namespace: httpbin
  labels:
    app: httpbin-a
spec:
  ports:
  - name: http
    port: 80
    targetPort: 80
  selector:
    app.kubernetes.io/name: httpbin-a
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-a
  namespace: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: httpbin-a
  template:
    metadata:
      labels:
        app.kubernetes.io/name: httpbin-a
    spec:
      containers:
      - image: docker.io/kennethreitz/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        ports:
        - containerPort: 80
EOF


This will deploy the httpbin image into the httpbin namespace, with an application name of "httpbin-a". It will respond on port 80 both on the container and on the kubernetes network.

 

Deploy httpbin server A


Execute the following to deploy what we just created.

kubectl apply -f httpbinA.yaml
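
If you want to confirm the deployment came up before moving on, a quick check like the following should show the pod running and the service created (names and timings will differ):

kubectl -n httpbin rollout status deployment/httpbin-a
kubectl -n httpbin get pods,svc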

 

Create an ingress for httpbin server A


Execute the following to create the yaml for your httpbin A server ingress. This will create httpbin-ingressA.yaml in whatever directory you are currently in.

cat <<EOF > httpbin-ingressA.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
  labels:
    app.kubernetes.io/name: httpbin-a
  name: httpbin-a
  namespace: httpbin
spec:
  rules:
  - host: httpbina.podmanlocal
    http:
      paths:
      - backend:
          service:
            name: httpbin-a
            port:
              number: 80
        path: /
        pathType: Prefix
EOF

 

This will create an ingress bound to port 80 for application httpbin-a. The host line means that any request you want routed to this app must use the hostname httpbina.podmanlocal.

We will set up this mapping in later steps.

 

Deploy httpbin server A ingress


Execute the following:

kubectl apply -f httpbin-ingressA.yaml

 

Create httpbin server B yaml


Execute the following to create the yaml for your httpbin B server. This will create httpbinB.yaml in whatever directory you are currently in.

cat <<EOF > httpbinB.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: httpbin
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin-b
  namespace: httpbin
  labels:
    app: httpbin-b
spec:
  ports:
  - name: http
    port: 80
    targetPort: 80
  selector:
    app.kubernetes.io/name: httpbin-b
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-b
  namespace: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: httpbin-b
  template:
    metadata:
      labels:
        app.kubernetes.io/name: httpbin-b
    spec:
      containers:
      - image: docker.io/kennethreitz/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        ports:
        - containerPort: 80
EOF


This will deploy the httpbin image into the httpbin namespace, with an application name of "httpbin-b". It will respond on port 80 both on the container and on the kubernetes network. Note the application name is different. We will have both httpbin-a and httpbin-b running in separate deployments.


Deploy httpbin server B


Execute the following to deploy what we just created.

kubectl apply -f httpbinB.yaml

 

Create an ingress for httpbin server B


Execute the following to create the yaml for your httpbin B server ingress. This will create httpbin-ingressB.yaml in whatever directory you are currently in.

 

cat <<EOF > httpbin-ingressB.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
  labels:
    app.kubernetes.io/name: httpbin-b
  name: httpbin-b
  namespace: httpbin
spec:
  rules:
  - host: httpbinb.podmanlocal
    http:
      paths:
      - backend:
          service:
            name: httpbin-b
            port:
              number: 80
        path: /
        pathType: Prefix
EOF

 

This will create an ingress bound to port 80 for application httpbin-b. The host line means that any request you want routed to this app must use the hostname httpbinb.podmanlocal.

We will set up this mapping in later steps.


Note the different application name and host. We have app httpbin-b and host httpbinb.podmanlocal.

 

Deploy httpbin server B ingress


Execute the following:

kubectl apply -f httpbin-ingressB.yaml
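
At this point both ingresses should exist. A quick check like the following should list them with their hosts (the ADDRESS column may take a moment to populate):

kubectl -n httpbin get ingress
# should list httpbin-a (httpbina.podmanlocal) and httpbin-b (httpbinb.podmanlocal) on port 80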

 

Run some tests to help understand what is happening so far


Your local host network and the kube cluster network are independent.

The rootlessport process is your entry point into the cluster.

We deployed apps to respond to requests to httpbina.podmanlocal and httpbinb.podmanlocal.

Your OS has no idea what those hostnames are at this point.

Execute a test with curl to httpbina.podmanlocal. This should fail.

 

[devuser@podman ~]$ curl httpbina.podmanlocal
curl: (6) Could not resolve host: httpbina.podmanlocal

 

Execute a test to port 80 of just your IP address.


This should respond, but with a 404 Not Found error. This is because the ingress is set up to map requests for a specific host to the appropriate app. Requests to httpbina.podmanlocal get routed to the httpbin-a app deployment.

Since you are hitting the IP directly, there is no matching host in your request headers, so the ingress will not route this to either app.

 

[devuser@podman ~]$ curl 10.0.2.15
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
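
If you want to see the host-based routing work before touching /etc/hosts, you can hit the IP directly and supply the Host header yourself (using the same 10.0.2.15 address as above; your origin value may differ):

curl -H "Host: httpbina.podmanlocal" http://10.0.2.15/ip
# the ingress now matches the httpbin-a rule and returns something like {"origin": "10.89.0.2"}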

 

Add local OS host entries for our ingress hosts.


Execute sudo vi /etc/hosts and add mappings for httpbina.podmanlocal and httpbinb.podmanlocal.

It should look as follows:

 

127.0.0.1   localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost.localdomain localhost6 localhost6.localdomain6
10.0.2.15   localhost podman podman.testdomain httpbina.podmanlocal httpbinb.podmanlocal

 

Now execute the same curl test to httpbina.podmanlocal.
You should receive a lengthy response.

To keep the output short, I will call one of the httpbin endpoints that returns less data.

 

[devuser@podman ~]$ curl httpbina.podmanlocal/ip
{
  "origin": "10.89.0.2"
}

 

By adding the entries to /etc/hosts, your local OS now knows to send requests for those hostnames to your local IP address. The ingress then sees you asked for httpbina.podmanlocal and routes the request to the httpbin-a app.

 

Test DNS between deployments


At this point, your OS knows how to get to the kube cluster. However, different deployments inside the cluster do not know how to reach each other by these hostnames. Different networks, different DNS resolution.

We will test curl calls from one deployment to another to demonstrate.

You will need to install curl in the deployments; it is not there by default.


Execute the following to get curl on both deployments:

 

# update apt
kubectl -n httpbin exec deploy/httpbin-a -- apt-get update
kubectl -n httpbin exec deploy/httpbin-b -- apt-get update
#install curl
kubectl -n httpbin exec deploy/httpbin-a -- apt-get -y install curl
kubectl -n httpbin exec deploy/httpbin-b -- apt-get -y install curl

 

Now you should be able to execute a curl test.
The following calls curl inside the httpbin-b app, and tries to send a request to the httpbin-a app. This should fail.

 

[devuser@podman ~]$ kubectl -n httpbin exec deploy/httpbin-b -- curl http://httpbina.podmanlocal
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: httpbina.podmanlocal
command terminated with exit code 6

 

There is no DNS setup mapping these apps by name to an IP.

 

Edit the CoreDNS service


CoreDNS is the DNS service running inside your kube cluster.
We are going to add DNS mappings for the two deployments.

First, you need the IP address of your cluster control plane.
Execute the following:

 

[devuser@podman ~]$ kubectl get nodes -o jsonpath="{.items[*].status.addresses[?(@.type=='InternalIP')].address}"
10.89.0.2


That tells me my control plane IP is 10.89.0.2.

Our internal app hosts need to map to this address.


Edit the CoreDNS configmap with the following command:

 

kubectl edit configmap coredns -n kube-system


This will open the configmap in vi. Add mappings for httpbina.podmanlocal and httpbinb.podmanlocal to the control plane IP.

This is yaml again, spaces and formatting matter. Do not use tabs. If you try to save and exit, and are immediately returned to the edit window, something with your formatting is off. It is being rejected.

Add the following:

 

hosts custom.hosts podmanlocal {
   10.89.0.2 httpbina.podmanlocal
   10.89.0.2 httpbinb.podmanlocal
   fallthrough
}


This is what your configmap should look like (with your IP if it is different).

I have truncated some lines above and below the relevant section.

 

data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        hosts custom.hosts podmanlocal {
           10.89.0.2 httpbina.podmanlocal
           10.89.0.2 httpbinb.podmanlocal
           fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }

 

After you save these changes, they take time to get picked up.
You can wait, or force the change to apply faster.

If you want to wait, watch the CoreDNS logs until you see the changes applied.

 

kubectl logs --namespace=kube-system -l k8s-app=kube-dns

 

If you want to force the changes, delete the coredns pods. They will regenerate with your changes applied immediately.

 

# find the pods
[devuser@podman ~]$ kubectl get pods -A | grep core
kube-system          coredns-558bd4d5db-jf8j4                              1/1     Running     0          29m
kube-system          coredns-558bd4d5db-vcq4g                              1/1     Running     0          29m
# delete the pods
kubectl delete pod -n kube-system coredns-558bd4d5db-jf8j4  coredns-558bd4d5db-vcq4g
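
Alternatively, a rollout restart accomplishes the same thing without looking up the pod names (this assumes the default coredns deployment in kube-system, which is what kind creates):

kubectl -n kube-system rollout restart deployment coredns
kubectl -n kube-system rollout status deployment coredns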

 

Test deployment communication again


Run the curl test again to execute curl inside deployment httpbin-b with a call to httpbin-a. I am using the /ip call again for reduced output.

 

[devuser@podman ~]$ kubectl -n httpbin exec deploy/httpbin-b -- curl http://httpbina.podmanlocal/ip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0{
  "origin": "10.244.0.1"
}
100    29  100    29    0     0    376      0 --:--:-- --:--:-- --:--:--   376

 

The calls work now. Your deployments can send requests to each other, using the hostnames we told the ingress to route and respond to.


Helpful reset process


If you need to quickly reset your entire podman setup, the following will accomplish that.
You will be back to the state you were in at the start of the podman install, with the exception of edits to your hosts file. Those will still be there.
All the cgroup configs are also still present.
If you start running out of space in your /home/<user> area, this will also clean that up. The prune commands get rid of containers and images.

Execute:

 

# reset environment
podman kill $(podman ps -a -q)
podman rm $(podman ps -a -q)
podman volume prune -f
podman system prune -af
podman network rm kind
sudo dnf remove podman -y

 

Now you can start again from the podman install process.

 

sudo dnf install containers-common podman podman-plugins docker -y

 

This is useful if you want to destroy your kube cluster and start again.

Michael Shanley

