Restoring Metadata for Doris on K8S
1. Background
I have a Doris cluster running on K8S that consists of one FE and three BE nodes.
After making some configuration changes, I restarted the FE node, but it failed to start and logged the following:
```
RuntimeLogger 2025-02-06 08:49:51,614 INFO (stateListener|92) [Env$5.runOneCycle():2690] begin to transfer FE type from INIT to UNKNOWN
```
Based on the error message `wait catalog to be ready. feType:UNKNOWN isReady:false`, the FE is stuck in the UNKNOWN state and its catalog never becomes ready, which appears to be caused by corrupted metadata. The Doris documentation covers metadata recovery at https://doris.apache.org/docs/admin-manual/trouble-shooting/metadata-operation, but it doesn't explain how to perform the recovery in a K8S environment.
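Before changing anything, it can help to confirm the symptom from the pod's log. This is a minimal sketch, assuming the FE writes its log to the container's stdout (the `kubectl logs` step later in this post relies on the same thing). `fe_symptoms` is a hypothetical helper name, and the namespace and pod defaults are placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the FE log lines related to the symptom.
# Usage: fe_symptoms [namespace] [pod]
fe_symptoms() {
  local ns=${1:-your-namespace} pod=${2:-doris-fe-0}
  # Prefer the log of the previous (terminated) container; fall back to the
  # current container if it has not restarted yet.
  { kubectl logs "$pod" -n "$ns" --previous 2>/dev/null ||
      kubectl logs "$pod" -n "$ns"; } |
    grep -E "wait catalog to be ready|transfer FE type from"
}
```

If the output shows the FE moving from INIT to UNKNOWN and then only `isReady:false` messages, you are looking at the same situation as above.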
2. Solution
In a K8S environment, the FE container is started by `fe_entrypoint.sh`, which contains the following code:
```shell
local recovery=`grep "\<selectdb.com.doris/recovery\>" $ANNOTATION_PATH | grep -v '^\s*#' | sed 's|^\s*'$confkey'\s*=\s*\(.*\)\s*$|\1|g'`
```
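To see what this line extracts, here is a small standalone sketch that runs the same grep/sed pipeline against a fake annotations file. Two things are assumptions, since the rest of the script isn't shown: that `$ANNOTATION_PATH` is a Kubernetes Downward API file with one `key="value"` line per annotation, and that `$confkey` holds the annotation key. It also assumes GNU grep and sed (for `\s`, `\<` and `\>`).

```shell
#!/usr/bin/env bash
# Simulate a Downward API annotations file (one key="value" per line).
ANNOTATION_PATH=$(mktemp)
cat > "$ANNOTATION_PATH" <<'EOF'
kubectl.kubernetes.io/restartedAt="2025-02-06T08:49:00Z"
selectdb.com.doris/recovery="true"
EOF

# Hypothetical: assume confkey holds the annotation key being looked up.
confkey='selectdb.com.doris/recovery'

# Same grep/sed pipeline as fe_entrypoint.sh, written with $(...) instead of backticks.
recovery=$(grep "\<selectdb.com.doris/recovery\>" "$ANNOTATION_PATH" | grep -v '^\s*#' |
  sed 's|^\s*'"$confkey"'\s*=\s*\(.*\)\s*$|\1|g')

# Downward API values are double-quoted; strip the quotes.
recovery=${recovery//\"/}
echo "recovery=$recovery"
rm -f "$ANNOTATION_PATH"
```

With the annotation present, this prints `recovery=true`; without it, `recovery` stays empty, which is presumably what keeps the entrypoint in normal startup mode.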
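The script reads the pod's annotations from `$ANNOTATION_PATH` and looks for the `selectdb.com.doris/recovery` key, so enabling metadata recovery means adding that annotation to the pod template of the FE StatefulSet. Here's how to do it: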
- Add the recovery annotation to the FE StatefulSet:
```shell
kubectl patch sts doris-fe -n your-namespace -p '{"spec":{"template":{"metadata":{"annotations":{"selectdb.com.doris/recovery":"true"}}}}}'
```
- Delete the FE pod so that it is recreated from the updated pod template:
```shell
kubectl delete pod doris-fe-0 -n your-namespace
```
The FE pod will be recreated with the recovery annotation, and `fe_entrypoint.sh` will add the `--metadata_failure_recovery` parameter to the startup command. This should allow the FE to recover from the metadata corruption.
- Monitor the FE logs to verify the recovery process:
```shell
kubectl logs -f doris-fe-0 -n your-namespace
```
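The three steps above can also be wrapped in a pair of shell functions, which is handy if you ever need to repeat the procedure. This is a minimal sketch: `NS`, `STS` and `POD` are placeholder defaults taken from the commands above, and `wait_for_master` assumes the FE logs a line of the form `begin to transfer FE type from ... to MASTER`, by analogy with the INIT-to-UNKNOWN line in the error log. Check your own log for the exact wording.

```shell
#!/usr/bin/env bash
# Placeholders: override NS, STS and POD to match your cluster.
NS=${NS:-your-namespace}
STS=${STS:-doris-fe}
POD=${POD:-doris-fe-0}

# Steps 1 and 2: add the recovery annotation to the pod template,
# then delete the FE pod so it is recreated from the updated template.
enable_recovery() {
  kubectl patch sts "$STS" -n "$NS" \
    -p '{"spec":{"template":{"metadata":{"annotations":{"selectdb.com.doris/recovery":"true"}}}}}'
  kubectl delete pod "$POD" -n "$NS"
}

# Step 3: poll the FE log (every 10 seconds, up to $1 checks) until it
# reports a transition to MASTER. The log pattern is an assumption.
wait_for_master() {
  local tries=${1:-60} i
  for ((i = 0; i < tries; i++)); do
    if kubectl logs "$POD" -n "$NS" 2>/dev/null | grep -q "transfer FE type from .* to MASTER"; then
      echo "FE reached MASTER"
      return 0
    fi
    sleep 10
  done
  echo "FE did not reach MASTER after $tries checks" >&2
  return 1
}
```

For example, run `export NS=<your namespace>`, source the file, then run `enable_recovery && wait_for_master`. With the default of 60 checks, the loop gives the FE roughly ten minutes.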
Once the recovery is complete, the FE should start up successfully and transition to the MASTER state. After confirming that everything works correctly, remove the recovery annotation so that later restarts do not run in recovery mode again:
```shell
kubectl patch sts doris-fe -n your-namespace -p '{"spec":{"template":{"metadata":{"annotations":{"selectdb.com.doris/recovery":null}}}}}'
```
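To double-check the cleanup, you can read the annotation back from the StatefulSet's pod template. A minimal sketch, with `NS` and `STS` as placeholders and `disable_recovery` / `recovery_annotation` as hypothetical helper names; it relies on kubectl's JSONPath output, where dots inside a key are escaped with a backslash:

```shell
#!/usr/bin/env bash
# Placeholders: override NS and STS to match your cluster.
NS=${NS:-your-namespace}
STS=${STS:-doris-fe}

# A null value in the patch deletes the key from the pod template's annotations.
disable_recovery() {
  kubectl patch sts "$STS" -n "$NS" \
    -p '{"spec":{"template":{"metadata":{"annotations":{"selectdb.com.doris/recovery":null}}}}}'
}

# Print the annotation's current value in the pod template (empty if absent).
recovery_annotation() {
  kubectl get sts "$STS" -n "$NS" \
    -o jsonpath='{.spec.template.metadata.annotations.selectdb\.com\.doris/recovery}'
}
```

After `disable_recovery`, `[ -z "$(recovery_annotation)" ]` should succeed. Because this changes the pod template again, a StatefulSet using the `RollingUpdate` update strategy will also replace the FE pod, this time without recovery mode; with `OnDelete`, the pod keeps running until you delete it yourself.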
Note: Make sure to replace `your-namespace` and the resource names (`doris-fe`, `doris-fe-0`) with your own namespace and cluster name.