Solution - Troubleshooting 'Init:Error' Status Error Message After Deploying Ondat

Issue

When attempting to deploy Ondat into an OpenShift or Kubernetes cluster, you notice that the Ondat daemonset pods are stuck in an Init:Err state. Below is an example of the status reported under the STATUS column.

# Get the status of the pods in the "storageos" namespace.
kubectl get pods --namespace storageos

NAME                                                 READY   STATUS    RESTARTS   AGE
# Truncated output...
storageos-node-8fhf6                                 0/3     Init:Err  0          6s
storageos-node-8z77g                                 0/3     Init:Err  0          6s
storageos-node-pzvp7                                 0/3     Init:Err  0          6s
storageos-node-qbjbr                                 0/3     Init:Err  0          6s
storageos-node-vkj92                                 0/3     Init:Err  0          6s
# Truncated output...

Root Cause

This issue is caused by missing Linux-IO (LIO) kernel modules on the worker nodes. Ondat requires these modules to start up and run.

  • The Ondat daemonset attempts to load the required kernel modules on the worker nodes. If it cannot load them, the pods report an Init:Err status and Ondat does not start.
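
To confirm which of these modules a worker node is missing, the check can be sketched in shell as below. This is a minimal sketch and not part of Ondat: the function name `missing_modules` is hypothetical, and it assumes that loaded modules are listed in `/proc/modules` (the data that `lsmod` reads). Modules compiled directly into the kernel, which is often the case for `configfs`, do not appear in that file and are reported as missing even though they are available.

```shell
#!/bin/sh
# Hypothetical helper: print each named module that is NOT listed in the
# given modules file (normally /proc/modules).
# Usage: missing_modules <modules_file> <module>...
missing_modules() {
    modules_file="$1"
    shift
    for module in "$@"; do
        # Each line of /proc/modules starts with the name of a loaded module.
        if ! awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$module"
        fi
    done
}
```

On a worker node, `missing_modules /proc/modules target_core_mod tcm_loop configfs target_core_user uio` prints one line per module that is not currently loaded. No output means all five modules named in the Resolution section are loaded.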

Resolution

  1. Check the logs of the init container to see which kernel modules Ondat tried to load and which of them failed:

    # Check the logs of the "init" container to list the missing kernel modules required for Ondat to run.
    kubectl --namespace storageos logs storageos-node-8z77g --container init
    
    # Truncated output...
    Checking configfs
    configfs mounted on sys/kernel/config
    Module target_core_mod is not running
    executing modprobe -b target_core_mod
    Module tcm_loop is not running
    executing modprobe -b tcm_loop
    modprobe: FATAL: Module tcm_loop not found.             # "tcm_loop" kernel module is missing.
    
  2. Install the linux-image-extra-$(uname -r) package for your distribution, which contains extra kernel modules that may have been left out of the base kernel package. You can also use modprobe to load the required kernel modules:

    # Ensure that "kmod" is installed.
    sudo apt install kmod               # Debian based distributions.
    sudo dnf install kmod               # Red Hat based distributions.
    
    # Use "modprobe" to load the kernel modules below on the worker nodes where Ondat will run.
    sudo modprobe --all target_core_mod tcm_loop configfs target_core_user uio
    

    💡 For more information on the required kernel modules for Ondat, review the Ondat Prerequisites page.

  3. Once the kernel modules have been successfully installed on the nodes, restart the Ondat daemonset pods by deleting them. Kubernetes recreates the pods, which then detect the changes on the nodes.
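
The log check in step 1 and the restart in step 3 can be sketched as two small shell functions. This is an illustrative sketch, not an Ondat tool: the function names and the `KUBECTL` variable are hypothetical, and the pattern matched is the `modprobe: FATAL` line from the sample log output above.

```shell
#!/bin/sh
# KUBECTL is a hypothetical override for the client binary; it defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Read init container logs on stdin and print the name of every module that
# modprobe reported as not found.
missing_from_log() {
    sed -n 's/^.*modprobe: FATAL: Module \([^ ]*\) not found.*$/\1/p'
}

# Delete the named Ondat node pods in the "storageos" namespace so that the
# daemonset recreates them and the init container runs again.
restart_node_pods() {
    "$KUBECTL" delete pod --namespace storageos "$@"
}
```

For example, piping `kubectl --namespace storageos logs storageos-node-8z77g --container init` into `missing_from_log` lists the modules that failed to load on that pod's node. After loading them, `restart_node_pods storageos-node-8z77g` deletes that pod, and `kubectl get pods --namespace storageos --watch` shows whether the recreated pod gets past the init stage.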