databases, kubernetes

etcd backup & restore in Kubernetes

etcd is a consistent and highly available key-value store used as Kubernetes’ backing store for all cluster data. If the cluster is disrupted, that data should be recoverable from backups. etcd is a leader-based distributed system. It is preferable to run it as a cluster with an odd number of members; a five-member cluster is recommended in production. An etcd cluster achieves high availability by tolerating the failure of a minority of its members: as long as a majority (quorum) of members is healthy, the cluster keeps serving requests.
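Why odd sizes are preferred follows from majority-quorum arithmetic. The sketch below is only an illustration; the function names quorum and failure_tolerance are made up for this example and are not part of etcd or etcdctl.

```shell
#!/bin/sh
# Quorum is a strict majority of members: floor(n / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Failure tolerance is how many members can be lost while a quorum remains.
failure_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the numbers for a few cluster sizes.
for n in 1 2 3 4 5 6 7; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(failure_tolerance "$n")"
done
```

Going from 3 to 4 members (or from 5 to 6) raises the quorum without raising the number of tolerated failures, which is why an even-sized cluster adds no extra availability.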

Access to etcd is equivalent to root permission in the cluster, so ideally only the API server should have access to it.

If any API servers are running in your cluster, you should not attempt to restore instances of etcd. Instead, follow these steps to restore etcd:

1. Stop all API server instances.
2. Restore state in all etcd instances.
3. Restart all API server instances.

# check where etcd stores data
etcd.service
[...]
--data-dir=/var/lib/etcd

# force etcdctl to use API v3
export ETCDCTL_API=3

etcdctl snapshot save snapshot.db

etcdctl snapshot status snapshot.db

# to restore, first stop the API server:
service kube-apiserver stop

# restore the snapshot into a new data-dir
etcdctl snapshot restore snapshot.db --data-dir=/var/lib/etcd-from-backup

# change etcd service configuration to use new data-dir
etcd.service
[...]
--data-dir=/var/lib/etcd-from-backup

# restart services
systemctl daemon-reload
service etcd restart
service kube-apiserver start

# remember to pass the endpoint and certificate files to all of the etcdctl commands above!
# exception: when etcdctl is invoked on the control plane node where etcd runs, --endpoints can be omitted (it defaults to 127.0.0.1:2379)
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/etcd/ca.crt \
--cert=/etc/etcd/etcd-server.crt \
--key=/etc/etcd/etcd-server.key
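
Repeating these four flags on every command is error-prone, so they can be collected in one place. This is a minimal sketch that assumes the example paths shown here. The ETCD_* variable names and the helper functions etcdctl_tls_flags and etcdctl_tls are hypothetical and are not part of etcdctl.

```shell
#!/bin/sh
# Connection settings; the defaults are the example paths from these notes
# (assumptions, adjust them to your cluster or override via the environment).
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_CACERT="${ETCD_CACERT:-/etc/etcd/ca.crt}"
ETCD_CERT="${ETCD_CERT:-/etc/etcd/etcd-server.crt}"
ETCD_KEY="${ETCD_KEY:-/etc/etcd/etcd-server.key}"

# Print the connection flags on one line (hypothetical helper, handy for inspection).
etcdctl_tls_flags() {
    echo "--endpoints=$ETCD_ENDPOINT --cacert=$ETCD_CACERT --cert=$ETCD_CERT --key=$ETCD_KEY"
}

# Run etcdctl with API v3 and the connection flags, followed by the caller's arguments.
etcdctl_tls() {
    ETCDCTL_API=3 etcdctl \
        --endpoints="$ETCD_ENDPOINT" \
        --cacert="$ETCD_CACERT" \
        --cert="$ETCD_CERT" \
        --key="$ETCD_KEY" \
        "$@"
}
```

With the functions loaded into the shell, a backup would then be taken with: etcdctl_tls snapshot save snapshot.db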

# the values of trusted-ca-file, cert-file and key-file can be read from the description of the etcd Pod or from the etcd service definition file; each etcdctl flag maps to an etcd flag as follows:
etcdctl     etcd.service
--endpoints --listen-client-urls 
--cacert    --trusted-ca-file
--cert      --cert-file 
--key       --key-file
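
The lookup described by this mapping can be scripted. The sketch below pulls the value of a single --name=value flag out of a service definition or Pod manifest. The helper name etcd_flag_value is hypothetical, and it assumes each flag is written as --name=value with no spaces inside the value.

```shell
#!/bin/sh
# Print the value of the first "--<name>=<value>" flag found in a file.
# Usage: etcd_flag_value <flag-name-without-dashes> <file>
etcd_flag_value() {
    # sed keeps only the text after "--<name>=" up to the next whitespace;
    # head returns the first match in case the flag appears more than once.
    sed -n "s/.*--$1=\([^[:space:]]*\).*/\1/p" "$2" | head -n 1
}
```

For example, etcd_flag_value trusted-ca-file /etc/systemd/system/etcd.service would print the path to pass as --cacert. That unit file location is an assumption; on kubeadm-based clusters the etcd flags usually live in /etc/kubernetes/manifests/etcd.yaml instead.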