Version: Operator 2.3.0

Troubleshooting

Pods stuck in pending state

After an Aerospike cluster is created or updated, the pods may be stuck in "Pending" status:

Output:

NAME              READY   STATUS    RESTARTS   AGE
aerocluster-0-0   0/1     Pending   0          48s
aerocluster-0-1   0/1     Pending   0          48s

Describe the pod to find the reason for scheduling failure:

kubectl -n aerospike describe pod aerocluster-0-0

Under the events section, you will find the reason the pod was not scheduled. For example:

Output

QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ---                    ----               -------
  Warning  FailedScheduling  9m27s (x3 over 9m31s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  20s (x9 over 9m23s)    default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity.

Possible reasons include:

  • Storage class is incorrect or not created. See persistent storage configuration for details.
  • 1 node(s) didn't match Pod's node affinity - an invalid zone, region, rack label, or other parameter for the rack configured for this pod.
  • Insufficient resources - not enough CPU or memory available to schedule more pods.
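When scheduling fails on unbound PersistentVolumeClaims, a quick way to narrow down the storage problem is to inspect the PVCs and storage classes directly. A minimal sketch, assuming the cluster runs in the aerospike namespace (the PVC name in the last command is a placeholder; copy the real name from the listing):

```shell
# List PVCs for the cluster; a PVC stuck in "Pending" points at a storage problem
kubectl -n aerospike get pvc

# Verify that the storage class referenced by the cluster spec actually exists
kubectl get storageclass

# Describe a Pending PVC to see why no volume was bound
kubectl -n aerospike describe pvc <pvc-name>
```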

Pods keep crashing

After an Aerospike cluster is created or updated, the pods may be stuck in "Error" or "CrashLoopBackOff" status:

Output:

NAME              READY   STATUS             RESTARTS   AGE
aerocluster-0-0   0/1     Error              0          48s
aerocluster-0-1   0/1     CrashLoopBackOff   2          48s

Check the following logs to see if pod initialization failed or the Aerospike Server stopped.

Init logs:

kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init

Server logs:

kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server

Possible reasons include:

  • Missing or incorrect feature key file - Fix by deleting the Aerospike secret and recreating it with the correct feature key file. See Aerospike secrets for details.
  • Bad Aerospike configuration - The Operator tries to validate the configuration before applying it to the cluster. However, it is still possible to misconfigure the Aerospike server. The offending parameter is logged in the server logs; fix it and apply the configuration to the cluster again. See Aerospike configuration change for details.
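When a container is in CrashLoopBackOff, the plain logs command may only show output from the most recent (possibly empty) restart. kubectl can retrieve the logs of the previous container instance, which usually contain the actual failure. For example:

```shell
# Logs from the previous (crashed) run of the server container
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server --previous

# The same for the init container, if initialization is failing
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init --previous
```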

Error connecting to the cluster from outside Kubernetes

If the cluster runs fine, as verified by the pod status and asadm (see connecting with asadm), ensure that the firewall allows inbound traffic to the Kubernetes cluster on the Aerospike ports. See Port access for details.
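One way to verify port access is to look up the service that exposes the cluster and test that port from outside Kubernetes. A sketch, assuming the cluster is exposed through a NodePort service in the aerospike namespace (the node address and port are placeholders to fill in from the listing):

```shell
# Find the service and the node port mapped to Aerospike's service port (3000)
kubectl -n aerospike get svc

# From a machine outside the cluster, test that the port is reachable
# (replace <node-external-ip> and <node-port> with values from the listing above)
nc -vz <node-external-ip> <node-port>
```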

Events

Kubernetes events are generated on errors, on resource state changes, and at times for informational messages. Common errors include pod scheduling failures, storage unavailability, missing secrets, and missing storage classes.
When the Aerospike cluster is not deployed as expected, events can help you debug and find the cause.

To help troubleshoot issues, the Operator generates events indicating state changes, errors and informational messages for the cluster it is working on.

Run kubectl get event --namespace [namespace] --field-selector involvedObject.name=[cluster name] to see the events generated by the Operator for an Aerospike cluster. For example, this command displays the events generated for cluster aerocluster in the aerospike namespace:

kubectl -n aerospike --field-selector involvedObject.name=aerocluster get events 

Output:

LAST SEEN   TYPE     REASON               OBJECT                         MESSAGE
90s         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
92s         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
2m8s        Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
92s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
92s         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
61s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0

To see all the events in the cluster's namespace (aerospike in this case), run the following command:

kubectl -n aerospike get events
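The default event listing is not sorted chronologically. To follow the sequence of operations, or to focus on problems, you can sort events by timestamp and filter by event type:

```shell
# Events in chronological order
kubectl -n aerospike get events --sort-by=.lastTimestamp

# Only warnings, which usually indicate the actual failure
kubectl -n aerospike get events --field-selector type=Warning
```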

Output:

LAST SEEN   TYPE     REASON               OBJECT                         MESSAGE
15m         Normal   Killing              pod/aerocluster-0-0            Stopping container aerospike-server
15m         Normal   Scheduled            pod/aerocluster-0-0            Successfully assigned aerospike/aerocluster-0-0 to ip-10-0-146-203.ap-south-1.compute.internal
15m         Normal   AddedInterface       pod/aerocluster-0-0            Add eth0 [10.131.1.30/23] from openshift-sdn
15m         Normal   Pulled               pod/aerocluster-0-0            Container image "registry.connect.redhat.com/aerospike/aerospike-kubernetes-init:0.0.17" already present on machine
15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-init
15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-init
15m         Normal   Pulled               pod/aerocluster-0-0            Container image "registry.connect.redhat.com/aerospike/aerospike-server-enterprise-ubi8:6.1.0.1" already present on machine
15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-server
15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-server
16m         Normal   Killing              pod/aerocluster-0-1            Stopping container aerospike-server
16m         Normal   Scheduled            pod/aerocluster-0-1            Successfully assigned aerospike/aerocluster-0-1 to ip-10-0-146-203.ap-south-1.compute.internal
16m         Normal   AddedInterface       pod/aerocluster-0-1            Add eth0 [10.131.1.28/23] from openshift-sdn
16m         Normal   Pulled               pod/aerocluster-0-1            Container image "registry.connect.redhat.com/aerospike/aerospike-kubernetes-init:0.0.17" already present on machine
16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-init
16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-init
16m         Normal   Pulled               pod/aerocluster-0-1            Container image "registry.connect.redhat.com/aerospike/aerospike-server-enterprise-ubi8:6.1.0.1" already present on machine
16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-server
16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-server
15m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-0 in StatefulSet aerocluster-0 successful
16m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-1 in StatefulSet aerocluster-0 successful
16m         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
16m         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
16m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
15m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0

Operator logs

OpenShift

List the operator pods:

kubectl -n openshift-operators get pod | grep aerospike-operator-controller-manager

Output:

aerospike-operator-controller-manager-677f99497c-qrtcl   2/2     Running   0          9h
aerospike-operator-controller-manager-677f99497c-z9t6v   2/2     Running   0          9h

To get the logs for an operator pod, using one of the pod names from the previous example:

kubectl -n openshift-operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

Add the -f flag to follow the logs continuously.

kubectl -n openshift-operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

The series of steps the Operator follows to apply user changes is logged, along with errors and warnings.
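The operator log can be long, so filtering helps isolate failures. For example, using the pod name from the listing above and limiting output to recent error and warning lines:

```shell
# Show only error and warning lines from the last hour of operator logs
kubectl -n openshift-operators logs --since=1h \
  aerospike-operator-controller-manager-677f99497c-qrtcl -c manager \
  | grep -iE 'error|warn'
```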

Other Kubernetes distributions

List the operator pods:

kubectl -n operators get pod | grep aerospike-operator-controller-manager

Output:

aerospike-operator-controller-manager-677f99497c-qrtcl   2/2     Running   0          9h
aerospike-operator-controller-manager-677f99497c-z9t6v   2/2     Running   0          9h

To get the logs for an operator pod, using one of the pod names from the previous example:

kubectl -n operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

Add the -f flag to follow the logs continuously.

kubectl -n operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

The series of steps the Operator follows to apply user changes is logged, along with errors and warnings.