Problem Note 69017: RabbitMQ pods fail to start in a SAS® Viya® environment
In rare situations, RabbitMQ pods are unable to start after a severe issue occurs, such as the PersistentVolume that a RabbitMQ pod uses running out of space or its file system being damaged. To recover RabbitMQ pods after these types of events, you must recreate the PersistentVolume. The steps below do this by deleting the associated PersistentVolumeClaims so that new ones are requested when the pods restart.
Important Notes:
- Recreating the PersistentVolume is a destructive event. You must delete the allocated PersistentVolume. All exchanges, queues, bindings, and any unread messages will be destroyed. You should use this action only as a last resort to allow RabbitMQ pods to run.
- Completing the following steps causes an outage while the RabbitMQ pods are stopped and restarted. Ensure that you plan for an outage before continuing with the steps.
Complete the following steps to recover RabbitMQ pods.
Note: Set ${NS} to the namespace where your SAS Viya environment is deployed. For example, if the namespace is named viya, you would execute the following: NS=viya
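Every command in the steps passes ${NS} to kubectl -n. If the variable is empty or unset, kubectl takes the next word on the command line as the namespace, and the command fails in a confusing way. A minimal guard, assuming a POSIX-compatible shell, might look like the following sketch. The name check_ns is a hypothetical helper, not something that SAS Viya provides.

```shell
# Hypothetical helper: refuse to continue when NS is empty or unset.
check_ns() {
  if [ -z "${NS:-}" ]; then
    echo "NS is not set; set it to your SAS Viya namespace first" >&2
    return 1
  fi
  echo "Using namespace: ${NS}"
}
```

Run check_ns after you set NS (for example, NS=viya) and continue with the steps only if it succeeds.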
- Scale the sas-rabbitmq-server StatefulSet down to zero replicas:
kubectl -n ${NS} scale statefulset sas-rabbitmq-server --replicas=0
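- Wait until all three RabbitMQ pods are terminated completely. To confirm, run the following command and wait until it reports "condition met" for each pod. The {0..2} pattern relies on Bash brace expansion, and kubectl wait gives up after 30 seconds by default, so rerun the command if it times out: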
kubectl -n ${NS} wait --for=delete pod sas-rabbitmq-server-{0..2}
- After you confirm that there are no RabbitMQ pods running, delete all the PersistentVolumeClaims (PVCs) that are associated with RabbitMQ:
kubectl -n ${NS} delete pvc -l app.kubernetes.io/name=sas-rabbitmq-server
- Scale the sas-rabbitmq-server StatefulSet back up to the original three replicas:
kubectl -n ${NS} scale statefulset sas-rabbitmq-server --replicas=3
Note: Use these instructions only for RabbitMQ pods. Do not use them to restore Consul or any other pods.
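The four steps above can be sketched as a single shell function that stops before the destructive PVC deletion if an earlier step fails. This is an illustration under stated assumptions, not a SAS-supplied script: recover_rabbitmq and KUBECTL are hypothetical names, the 300-second timeout is an arbitrary value, and the three pod names are written out so that the function does not depend on Bash brace expansion.

```shell
# Sketch only: recover_rabbitmq and KUBECTL are hypothetical names.
KUBECTL="${KUBECTL:-kubectl}"   # override to point at a different kubectl binary

recover_rabbitmq() {
  ns="${1:-}"
  if [ -z "${ns}" ]; then
    echo "usage: recover_rabbitmq <namespace>" >&2
    return 2
  fi

  # 1. Scale the StatefulSet down to zero replicas.
  "${KUBECTL}" -n "${ns}" scale statefulset sas-rabbitmq-server --replicas=0 || return 1

  # 2. Wait for all three pods to be deleted (300s is an assumed timeout).
  "${KUBECTL}" -n "${ns}" wait --for=delete pod \
    sas-rabbitmq-server-0 sas-rabbitmq-server-1 sas-rabbitmq-server-2 \
    --timeout=300s || return 1

  # 3. DESTRUCTIVE: delete the RabbitMQ PVCs. Reached only if steps 1-2 succeeded.
  "${KUBECTL}" -n "${ns}" delete pvc -l app.kubernetes.io/name=sas-rabbitmq-server || return 1

  # 4. Scale the StatefulSet back up to three replicas.
  "${KUBECTL}" -n "${ns}" scale statefulset sas-rabbitmq-server --replicas=3
}
```

With the function defined, recover_rabbitmq viya would run the sequence against the viya namespace. Chaining each step with || return 1 means that a failed scale-down or a timed-out wait leaves the PVCs untouched.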
Operating System and Release Information
SAS System | SAS Viya | Linux for x64 | 2020.1 | | Viya | |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
Type: | Problem Note |
Priority: | medium |
Date Modified: | 2025-07-31 12:57:21 |
Date Created: | 2022-03-21 14:11:05 |