Cluster-manager is an application specific to NGN Cloud Kubernetes installations. It is responsible for changing the number of worker nodes in the cluster. The application works with the Instance Metadata API and the Kubernetes API to provide the necessary integration of these components. Cluster-manager is responsible for:
cluster state monitoring (Healthy / Unhealthy);
secure deletion of worker nodes;
notification of upcoming expiration of API certificates in the cluster.
Cluster state monitoring#
Cluster-manager continuously requests the number of worker nodes registered in the Kubernetes cluster. If this value does not match the actual number of instances in the cluster, the cluster becomes Unhealthy. This is normal while worker nodes are being added or removed, since an instance is started first and is registered as a worker node of the cluster only after some time. When the number of registered worker nodes matches the number of running instances, the cluster status becomes Healthy.
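The comparison described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the cluster-manager source code: the `ClusterStatus` enum and the `cluster_status` function are hypothetical names, and both counts are assumed to have been obtained elsewhere (from the Kubernetes API and from the cloud).

```python
from enum import Enum


class ClusterStatus(Enum):
    HEALTHY = "Healthy"
    UNHEALTHY = "Unhealthy"


def cluster_status(registered_workers: int, running_instances: int) -> ClusterStatus:
    """Return Healthy only when both counts agree (hypothetical helper)."""
    if registered_workers == running_instances:
        return ClusterStatus.HEALTHY
    return ClusterStatus.UNHEALTHY


# A new instance is already running but has not registered as a node yet.
print(cluster_status(registered_workers=3, running_instances=4).value)
# The node has registered, so the counts match again.
print(cluster_status(registered_workers=4, running_instances=4).value)
```

The first call models the temporary mismatch during worker node addition, the second the state after the new node has registered.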
Cluster-manager regularly requests the lifetime of API certificates for master nodes in the cluster. If the remaining lifetime is less than two weeks, the user is notified of the upcoming certificate expiration. For the certificate lifetime to be displayed correctly, renew the certificates automatically using the Certificate auto-renewal option or manually using the kubeadm utility. When renewing manually, renew all certificates, including etcd, at the same time, for example, using the kubeadm certs renew all command. If certificates are renewed selectively, they might be tracked incorrectly.
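The two-week rule can be illustrated with a short Python sketch. It assumes the certificate expiration time (`not_after`) has already been read from the certificate; the function name `needs_expiry_warning` and the constant `WARNING_WINDOW` are hypothetical and are not taken from cluster-manager.

```python
from datetime import datetime, timedelta, timezone

# Two-week threshold taken from the description above.
WARNING_WINDOW = timedelta(weeks=2)


def needs_expiry_warning(not_after: datetime, now: datetime) -> bool:
    """True when the certificate expires in less than two weeks (or already has)."""
    return not_after - now < WARNING_WINDOW


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Expires in 9 days: a warning is due.
print(needs_expiry_warning(datetime(2024, 1, 10, tzinfo=timezone.utc), now))
# Expires in 60 days: no warning yet.
print(needs_expiry_warning(datetime(2024, 3, 1, tzinfo=timezone.utc), now))
```

Passing `now` explicitly keeps the check deterministic and easy to test.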
Secure node deletion#
When cluster-manager receives a notification that a worker node is being deleted, it deletes the worker node from the cluster using the Kubernetes API. After the worker node has been successfully deleted from the cluster, the instance termination process starts.
To maintain a master node (for example, to update its OS kernel), you need to switch it to the SchedulingDisabled state. Cluster-manager synchronizes the SchedulingDisabled state between the cluster and the cloud.
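The ordering matters here: the instance is terminated only after the node has left the cluster. A minimal Python sketch of that ordering follows. The callbacks `delete_node` and `terminate_instance` are hypothetical stand-ins for the Kubernetes API call and the cloud-side termination; the way cluster-manager actually performs these steps is not shown in this article.

```python
from typing import Callable


def remove_worker(
    node_name: str,
    delete_node: Callable[[str], bool],
    terminate_instance: Callable[[str], None],
) -> bool:
    """Terminate the instance only after the node has left the cluster.

    Both callbacks are hypothetical: delete_node returns True on success.
    """
    if not delete_node(node_name):
        # The node is still registered, so the instance keeps running.
        return False
    terminate_instance(node_name)
    return True


# Usage with stand-in callbacks that only record the call order.
calls = []
ok = remove_worker(
    "worker-1",
    delete_node=lambda name: calls.append(("delete_node", name)) or True,
    terminate_instance=lambda name: calls.append(("terminate", name)),
)
print(ok, calls)
```

If `delete_node` reports a failure, `terminate_instance` is never called, which is the property that makes the deletion safe in this sketch.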
Recovering unavailable master nodes#
Cluster-manager periodically checks the availability of master nodes and, if a node is unavailable, starts the recovery process. Recovery means deleting the unavailable instance and creating a new one while preserving the network interface. Master nodes in the SchedulingDisabled state are not recovered.
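The selection rule above (unavailable, but not in the SchedulingDisabled state) can be expressed as a small Python filter. The `MasterNode` record and the `masters_to_recover` function are hypothetical names used for illustration only. How availability is detected and how the instance is recreated are outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MasterNode:
    name: str
    available: bool
    scheduling_disabled: bool  # node is in the SchedulingDisabled state


def masters_to_recover(masters: List[MasterNode]) -> List[str]:
    """Pick unavailable masters, skipping those under maintenance."""
    return [
        m.name
        for m in masters
        if not m.available and not m.scheduling_disabled
    ]


masters = [
    MasterNode("master-1", available=True, scheduling_disabled=False),
    MasterNode("master-2", available=False, scheduling_disabled=False),
    MasterNode("master-3", available=False, scheduling_disabled=True),
]
print(masters_to_recover(masters))
```

In this example only `master-2` is selected: `master-3` is also unavailable, but it is in the SchedulingDisabled state and is therefore left alone.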
The consequences of cluster-manager deletion#
We don’t recommend changing the settings of this application or the configuration of its deployment in the cluster. The user won’t be able to reduce the number of worker nodes in the cluster via the NGN Cloud web interface:
if cluster-manager is deleted from the cluster;
if the running process is moved from a master node to one of the worker nodes.
Cluster state monitoring will also stop.
Installation in cluster#
The Kubernetes Clusters service of NGN Cloud automatically installs cluster-manager in each cluster when the cluster is created. The cluster-manager process is launched on a master node of the cluster.
Settings in use
- kind: ServiceAccount
- name: cluster-manager
- name: PYTHONUNBUFFERED
- key: node-role.kubernetes.io/master
When a new version is released, users are notified that cluster-manager needs to be updated. The notifications are displayed on the Kubernetes Clusters resource page, in the Warnings tab. The notification text contains the kubectl command required for the update and a link to the manifest in YAML format.