Monitoring

General information

The Prometheus cloud monitoring service makes it easy to integrate NGN Cloud PaaS services into a single monitoring system. You can also connect your own services to it.

Prometheus, a popular open-source monitoring system, interacts with the monitored systems and receives their status, health and performance data. A Prometheus server periodically polls monitoring agents (called “exporters” in Prometheus) and writes the obtained metrics to its time series database (TSDB).
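To illustrate the polling model, the sketch below parses the plain-text format that an exporter returns when Prometheus scrapes it (by default at the /metrics path). It is a simplified parser written for this page, not the one Prometheus uses: it skips comment lines and does not handle escaped characters or braces inside label values. The sample payload is hypothetical.

```python
import re

# One sample line: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return (metric_name, labels, value) tuples from an exporter response."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and "# HELP" / "# TYPE" comments carry no samples.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # a line this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical response from a node exporter's /metrics endpoint.
EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_avail_bytes{device="/dev/vda1",mountpoint="/"} 1.2e+10
"""

for name, labels, value in parse_exposition(EXAMPLE):
    print(name, labels, value)
```

Each parsed tuple corresponds to one sample that Prometheus would store in its TSDB together with the scrape time.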

Prometheus has a simple web interface and supports PromQL, a flexible query language. The web interface lets you view all monitored systems, detect failures, and analyze metrics over the time period you need. This way, you can monitor many aspects of system operation accurately and promptly and keep services up and running.
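Besides the web interface, upstream Prometheus answers PromQL queries over an HTTP API (GET /api/v1/query for instant queries). Whether the cloud service exposes this API to you, and at which address, is an assumption; the address and the helper names below are hypothetical. The sketch builds a query URL and turns an instant-vector response into a mapping from the instance label to the value.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, expr):
    """Build a URL for the Prometheus instant-query endpoint."""
    query = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_vector(body):
    """Map each series' 'instance' label to its value in an instant-vector response."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    result = {}
    for series in doc["data"]["result"]:
        # Each value is a [unix_timestamp, "string_value"] pair.
        result[series["metric"].get("instance", "")] = float(series["value"][1])
    return result


def run_query(base_url, expr, timeout=10):
    """Send the query and return the parsed result (requires network access)."""
    with urllib.request.urlopen(instant_query_url(base_url, expr), timeout=timeout) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

For example, `run_query("http://10.0.0.5:9090", 'avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))')` would return the average non-idle CPU rate per host over the last 5 minutes, assuming node exporters are being scraped and the (hypothetical) address is reachable.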

Error notifications are configured using the Alertmanager component included in the Prometheus stack. A system administrator defines rules that specify the events that trigger notifications (for example, a metric staying outside a predefined range for a certain period of time). To keep you informed of changes in system performance, the service can send notifications to Telegram, email, and third-party services via webhooks.
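The example condition above (a metric outside a range for a certain time) can be modeled as follows. In upstream Prometheus, such a condition is an alerting rule with a `for` clause evaluated by the Prometheus server, while Alertmanager routes and delivers the resulting notifications. This is a simplified, hypothetical simulation of the pending/firing logic, not the service's implementation; `RangeRule` and `evaluate` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class RangeRule:
    """Hypothetical rule: alert when a metric leaves [low, high] for for_seconds."""
    low: float
    high: float
    for_seconds: float


def evaluate(rule, samples):
    """Replay (timestamp, value) samples and return the alert state after each one.

    States use Prometheus terminology: inactive, pending, firing.
    """
    states = []
    breach_started = None
    for ts, value in samples:
        if rule.low <= value <= rule.high:
            breach_started = None  # back in range: the timer starts over
            states.append("inactive")
            continue
        if breach_started is None:
            breach_started = ts  # first out-of-range sample
        if ts - breach_started >= rule.for_seconds:
            states.append("firing")  # out of range long enough: notify
        else:
            states.append("pending")  # out of range, but not yet for long enough
    return states


# CPU usage in percent, sampled every 15 s (hypothetical data).
rule = RangeRule(low=0, high=80, for_seconds=60)
samples = [(0, 50), (15, 90), (30, 95), (45, 91), (60, 99), (75, 92), (90, 40)]
print(evaluate(rule, samples))
# ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']
```

The pending state is what keeps short spikes from producing notifications: only a breach that lasts the whole period turns into a firing alert.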

The cloud service also includes Grafana, a powerful visualization tool that helps you build intuitive dashboards for your systems. Grafana can display graphs of various metrics and their changes over time on one screen, and supports time interval selection and filtering by system or host. This makes it easier to find the causes of problems.

When a monitoring service is deployed for installed PaaS services, the corresponding notification rules and preconfigured Grafana dashboards are created automatically.

Key concepts

Scrape job – A data collection task for a monitored system. Scrape jobs let you monitor not only PaaS services but any system: install an exporter on the system and specify the exporter’s address and port in a scrape job.

Source – The service or system covered by monitoring.

Exporters – Monitoring agents that interact with monitored systems and provide data about their status, health and performance.

Notification channel (recipient) – The channel to which notifications are sent. In NGN Cloud, notifications can be sent to Telegram, email and other services. In the latter case, a webhook mechanism is used.

Default notification channel – If a default channel is specified, all alerts that do not match any route are sent to it. If no default channel is specified, such alerts are not sent.

Route – Defines the notification channel to which notifications are sent when the route’s selection criteria are met.

Labels – Key-value pairs used to group and filter metrics from different sources, for example, in route selection criteria. For details, see the section on using labels.

Metrics – Quantitative measurements that Prometheus uses to track the health of monitored systems. For details, see the official Prometheus documentation.

Before you begin

To get started with the monitoring service, follow these steps:

  1. Create a project, if you don’t have one.

  2. In the IAM section, create a user with the PaaS Administrator or Cloud Administrator role and add it to the project with the PaaS privilege.

  3. Make sure that the project has all the required resources (subnets, SSH keys, and security groups) and create any that are missing.

  4. Read the recommendations on how to work with the monitoring service in the cloud.

Monitoring service management

Create a monitoring service

To create a monitoring service, go to the Service Store or Installed Services subsection of the PaaS section. In Service Store, click Create in the Monitoring block; in Installed Services, open the Monitoring tab and click Create. This opens the service creation wizard.

  1. Specify the network environment where the monitoring service will run:

    • VPC where the service will be deployed.

    • Security groups to control traffic through the interfaces of the instances on which the monitoring service will run.

    • Subnet to which an instance with the service deployed on it will be attached, or network interface through which this instance will be attached to the subnet.

    Click Next to proceed to the next step.

  2. Specify the configuration of the instance where the monitoring service will run. Select the instance type and parameters of its volumes: type, size and IOPS (if available for the type you choose).

    You can also specify an SSH key. In this case, once the service has been configured automatically, you will have SSH access to the instance.

    Attention

    We provide the option to connect to instances with an SSH key while the new Prometheus service is in beta testing. This feature may be disabled in the future.

    Click Next to proceed to the next step.

  3. Specify the service name. The name must consist of Latin letters, digits, and the characters - and ., begin with a letter, and meet the requirements for domain names.

  4. Click Create.
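If you create services from scripts, a pre-check of the name can save a round trip to the wizard. The helper below is a hypothetical approximation of the naming rules in step 3; it assumes the usual DNS limits (at most 63 characters per dot-separated part and 253 in total, no part starting or ending with a hyphen), which the wizard may apply differently. The wizard's own validation is authoritative.

```python
import re

# One dot-separated part of a domain name: 1-63 characters, letters, digits, and
# hyphens, not starting or ending with a hyphen.
_DOMAIN_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_service_name(name):
    """Approximate check of the service naming rules."""
    if not name or len(name) > 253:  # usual upper limit for a full domain name
        return False
    if not re.match(r"[A-Za-z]", name):  # must begin with a Latin letter
        return False
    return all(_DOMAIN_LABEL.fullmatch(part) for part in name.split("."))


for candidate in ["prom-01.prod", "1prom", "prom_01", "prom-.prod"]:
    print(candidate, is_valid_service_name(candidate))
```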

Note

The service launching process usually takes 5 to 15 minutes.

After the service starts, you can add notification channels and configure routes. To monitor systems other than PaaS services, you must also configure data collection.

Configure data collection

With the Prometheus cloud service, you can monitor both PaaS services and other systems deployed in NGN Cloud. To monitor other systems, first install Prometheus monitoring agents (exporters) on them.
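In practice, you would install a ready-made exporter (for example, node_exporter for host metrics) or use an official Prometheus client library. To show what an exporter does, here is a minimal standard-library sketch that serves two application metrics at /metrics. The metric names, the `get_queue_length` placeholder, and port 9200 are hypothetical; the port must be reachable from the monitoring service instance, for example through your security group rules.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()


def get_queue_length():
    """Hypothetical placeholder: return a value from your own application."""
    return 0


def render_metrics(uptime_seconds, queue_length):
    """Render metrics in the Prometheus text exposition format."""
    lines = [
        "# HELP myapp_uptime_seconds Seconds since the process started.",
        "# TYPE myapp_uptime_seconds gauge",
        f"myapp_uptime_seconds {uptime_seconds:.3f}",
        "# HELP myapp_queue_length Jobs currently waiting in the queue.",
        "# TYPE myapp_queue_length gauge",
        f"myapp_queue_length {queue_length}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":  # Prometheus scrapes /metrics by default
            self.send_error(404)
            return
        body = render_metrics(time.time() - START_TIME, get_queue_length()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9200):
    """Serve metrics on all interfaces until interrupted (blocks the caller)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` on the monitored host would make the metrics available at port 9200, so the scrape job target would be that host’s address followed by :9200.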

Create a scrape job

To collect data from a specific system, create a scrape job first.

  1. Go to the Installed Services subsection of the PaaS section and open the Monitoring tab.

  2. In the resource table, find the monitoring service for which you want to configure data collection, and click on its name to go to the service page.

  3. Open the Scrape Jobs tab and click Create.

  4. In the window that opens, specify the scrape job name and the target (the monitored system). Specify the target as the IP address, FQDN, or hostname of the source followed by the exporter port, for example, 192.168.1.100:9100.

    If you don’t want to assign your own labels, skip the next step. If necessary, you can add labels later by editing the scrape job.

  5. To assign labels, click Add labels. Specify the key and value of the label. If you need to add one more label, click Add label.

  6. To add a scrape job, click Create.
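For readers familiar with upstream Prometheus: a scrape job defined in this window roughly corresponds to one entry in the scrape_configs section of prometheus.yml. How the service stores the job internally is an assumption; the helper below is hypothetical and only illustrates that mapping, with a basic host:port check for targets. Prometheus also attaches the job and instance labels automatically, taken from the job name and the target address.

```python
import json
import re

# host:port, where host is an IP address, FQDN, or hostname
_TARGET_RE = re.compile(r"(?P<host>[A-Za-z0-9.-]+):(?P<port>\d{1,5})")


def scrape_job(name, targets, labels=None):
    """Build a dict shaped like one entry of the scrape_configs section."""
    for target in targets:
        match = _TARGET_RE.fullmatch(target)
        if match is None or not 0 < int(match.group("port")) < 65536:
            raise ValueError(f"expected host:port, got {target!r}")
    static_config = {"targets": list(targets)}
    if labels:
        static_config["labels"] = dict(labels)  # attached to every sample from these targets
    return {"job_name": name, "static_configs": [static_config]}


job = scrape_job(
    "app-servers",                                  # hypothetical job name
    ["192.168.1.100:9100", "app-2.internal:9100"],  # hypothetical targets
    {"env": "prod"},
)
# JSON output is also valid YAML, so it can be read as a prometheus.yml fragment.
print(json.dumps({"scrape_configs": [job]}, indent=2))
```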

Modify a scrape job

  1. Go to the Installed Services subsection of the PaaS section and open the Monitoring tab.

  2. In the resource table, find the monitoring service for which you want to modify a scrape job, and click on the service name to go to its page.

  3. Open the Scrape Jobs tab, select the scrape job in the resource table, and click Modify.

  4. In the window that opens, edit the Targets field by adding new targets and/or deleting the existing ones.

  5. To modify labels, click Edit labels. You can then edit label keys and values, add new labels, and delete existing ones.

  6. Once you have made all the required modifications, click Save.

Delete a scrape job

  1. Go to the Installed Services subsection of the PaaS section and open the Monitoring tab.

  2. In the resource table, find the monitoring service in which you want to delete a scrape job, and click on the service name to go to its page.

  3. Open the Scrape Jobs tab and select a scrape job in the resource table. You can select multiple jobs at the same time.

  4. Click Delete and confirm the action.

Configure notification channels

You can configure notifications to be sent to Telegram and email, as well as to third-party services via a webhook mechanism.
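On the receiving side, a webhook channel is an HTTP endpoint that accepts POST requests with a JSON body. Upstream Alertmanager sends a documented payload with fields such as status, alerts, and truncatedAlerts; this sketch assumes the cloud service forwards that format unchanged, which this page does not confirm. The handler, the `summarize` helper, and port 5001 are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload):
    """Turn an Alertmanager-style webhook payload into one text line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append(
            f"[{alert.get('status', 'unknown').upper()}] "
            f"{labels.get('alertname', 'unnamed')} on {labels.get('instance', 'n/a')}"
        )
    dropped = payload.get("truncatedAlerts", 0)
    if dropped:
        # Set when the channel's maximum number of alerts was exceeded.
        lines.append(f"{dropped} more alert(s) not included in this notification")
    return lines


class AlertWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in summarize(payload):
            print(line)  # hand the alert over to your own system here
        self.send_response(200)
        self.end_headers()


def serve(port=5001):
    """Listen for webhook notifications until interrupted (blocks the caller)."""
    HTTPServer(("0.0.0.0", port), AlertWebhookHandler).serve_forever()
```

With `serve()` running, the Endpoint of a webhook channel would point at this server, for example http://<server-address>:5001/.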

Add a notification channel

  1. Go to the Installed Services subsection of the PaaS section and open the Monitoring tab.

  2. In the resource table, find the monitoring service for which you want to add a notification channel, and click on the service name to go to its page.

  3. Open the Notification Channels tab and click Add.

  4. In the dialog window, specify the channel name and select the channel type. Notifications can be sent to Telegram (type telegram), email (email) and other services using a webhook mechanism (webhook). If you want to make the channel the default one, check the Default notification channel checkbox.

    Note

    If you already have a default channel and want to make a different channel the default, first disable this option for the current default channel.

  5. Depending on the selected channel type, also set the following parameters:

    For the telegram type:

    • Notify about resolved alerts – When metric values return to normal, a corresponding notification is sent to the Telegram channel.

    • Telegram bot token.

    • ID of the chat to which messages are sent.

    For the email type:

    • To – Recipient’s email address.

    • From – Sender’s email address.

    • Smarthost – SMTP server used to send email.

    • Hello – Hostname used to identify the sender to the SMTP server.

    • Authentication username.

    • SMTP over TLS – Use a TLS connection to the SMTP server. A TLS connection is required when authentication is enabled.

    • Authentication password.

    For the webhook type:

    • Endpoint – Endpoint URL.

    • Maximum number of alerts – Maximum number of alerts included in one notification sent to the channel. Alerts beyond this limit are discarded.

  6. Click Add to create a channel.
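The channel parameters above closely match receiver settings in upstream Alertmanager (telegram_configs, email_configs, and webhook_configs). Assuming the service builds an Alertmanager configuration from them, which this page does not state, the mapping looks roughly like the hypothetical helpers below. In upstream Alertmanager, a max_alerts value of 0 means that all alerts are included.

```python
def telegram_receiver(name, bot_token, chat_id, send_resolved=True):
    """Telegram channel: bot token, chat ID, and 'Notify about resolved alerts'."""
    return {"name": name, "telegram_configs": [{
        "bot_token": bot_token,
        "chat_id": int(chat_id),  # Alertmanager expects a numeric chat ID
        "send_resolved": send_resolved,
    }]}


def email_receiver(name, to, sender, smarthost, hello=None,
                   username=None, password=None, require_tls=True):
    """Email channel: To, From, Smarthost, Hello, credentials, and SMTP over TLS."""
    config = {"to": to, "from": sender, "smarthost": smarthost, "require_tls": require_tls}
    if hello:
        config["hello"] = hello
    if username:
        config["auth_username"] = username
        config["auth_password"] = password or ""
    return {"name": name, "email_configs": [config]}


def webhook_receiver(name, url, max_alerts=0):
    """Webhook channel: endpoint URL and maximum number of alerts per notification."""
    return {"name": name, "webhook_configs": [{"url": url, "max_alerts": max_alerts}]}


# Hypothetical values for illustration only.
receivers = [
    telegram_receiver("ops-telegram", "123456:ABC-placeholder", "-1001234567890"),
    email_receiver("ops-email", "ops@example.com", "alerts@example.com",
                   "smtp.example.com:587", username="alerts", password="change-me"),
    webhook_receiver("ops-webhook", "https://hooks.example.com/alerts", max_alerts=10),
]
```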

Modify channel parameters

  1. Go to the Installed Services subsection of the PaaS section and open the Monitoring tab.

  2. In the resource table, find the monitoring service for which you want to modify notification channel parameters, and click on the service name to go to its page.

  3. Open the Notification Channels tab, select the channel in the resource table and click Modify.

  4. In the dialog window that opens, you can enable or disable the Default notification channel option. Additionally, depending on the channel type, you can change the following parameters:

    For the telegram type:

    • Notify about resolved alerts – When metric values return to normal, a corresponding notification is sent to the Telegram channel.

    • Telegram bot token.

    • ID of the chat to which messages are sent.

    For the email type:

    • To – Recipient’s email address.

    • From – Sender’s email address.

    • Smarthost – SMTP server used to send email.

    • Hello – Hostname used to identify the sender to the SMTP server.

    • Authentication username.

    • SMTP over TLS – Use a TLS connection to the SMTP server. A TLS connection is required when authentication is enabled.

    • Authentication password.

    For the webhook type:

    • Endpoint – Endpoint URL.

    • Maximum number of alerts – Maximum number of alerts included in one notification sent to the channel. Alerts beyond this limit are discarded.

  5. Click Save to modify the channel parameters.

Delete a notification channel

  1. Go to the Installed Services subsection of the PaaS section and open the Monitoring tab.

  2. In the resource table, find the monitoring service in which you want to delete a notification channel, and click on the service name to go to its page.

  3. Open the Notification Channels tab and select a channel in the resource table. You can select multiple channels at the same time.

  4. Click Delete and confirm the action in the dialog window.

Configure routes

Routes let you send different notifications to different channels. Routes are checked in order of priority. By default, when a match is found, the notification is sent to the channel specified in the route and route checking stops.

If necessary, you can send the same notifications to several channels: when the Continue option is enabled for a route, the search for matching routes continues after that route matches.
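The route selection rules described above, including the Continue option and the fallback to the default notification channel, can be modeled in a few lines. This is a simplified, hypothetical model that supports equality conditions only; the route and channel names are made up. As in upstream Alertmanager, a label missing from an alert is treated as an empty string.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    channel: str
    matchers: dict = field(default_factory=dict)  # label -> required value (equality only)
    continue_search: bool = False                  # the route's Continue option


def select_channels(alert_labels, routes, default_channel=None):
    """Return the channels an alert is sent to; routes are checked in priority order."""
    channels = []
    for route in routes:
        # All conditions must hold (logical AND); a missing label counts as "".
        if all(alert_labels.get(k, "") == v for k, v in route.matchers.items()):
            channels.append(route.channel)
            if not route.continue_search:
                break  # without Continue, the first matching route ends the search
    if not channels and default_channel is not None:
        channels.append(default_channel)  # no route matched
    return channels


# Hypothetical routes, highest priority first.
routes = [
    Route("telegram-oncall", {"severity": "critical"}, continue_search=True),
    Route("email-dba", {"service": "postgres"}),
    Route("webhook-tickets", {"service": "postgres"}),
]
print(select_channels({"severity": "critical", "service": "postgres"}, routes, "email-default"))
print(select_channels({"severity": "warning", "service": "redis"}, routes, "email-default"))
```

A critical PostgreSQL alert reaches both telegram-oncall (thanks to Continue) and email-dba, but never webhook-tickets, because the search stops at email-dba. An alert that matches no route goes to the default channel, or nowhere if none is set.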

Create a route

  1. Go to the Installed Services subsection of the PaaS section and open the Monitoring tab.

  2. In the resource table, find the monitoring service for which you want to create a route, and click on the service name to go to its page.

  3. Open the Routes tab and click Add.

  4. In the dialog window, specify the route parameters:

    • Name – The route name.

    • Matchers – One or more conditions under which the corresponding notifications are sent to the channel specified in the route. Each matcher is a string consisting of a label name (for details, see the section on using labels), an operator, and a value. Supported operators: =, !=, =~, !~; the operators with a tilde (~) compare the label value with a regular expression. If there are several conditions, all of them must be met (logical AND).

      Note

      After typing each condition, press Enter to add it.

    • Continue – If this option is enabled, the search for other routes will continue even if the route selection criteria have been met.

    • Group by – The list of labels by which notifications are grouped. For example, notifications can be grouped by service, node, or environment name. By default, the special value ... is set, which effectively disables grouping: each alert with a distinct set of labels is handled separately.

    • Group wait – How long to wait before sending the first notification for a new group, so that other alerts belonging to the same group can be included in it.

    • Group interval – How long to wait before notifying about new alerts added to a group for which a notification has already been sent.

    • Repeat interval – How long to wait before re-sending a notification if the problem has not been resolved.

  5. Click Create to create a route.
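Matcher strings such as severity=~"critical|warning" presumably follow upstream Alertmanager semantics, where a regular expression must match the whole label value and a missing label behaves like an empty string. The parser below is a simplified, hypothetical sketch of those rules; it does not handle escaped quotes inside values.

```python
import re

# label, operator, value; the value may be wrapped in double quotes
_MATCHER_RE = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*"?(.*?)"?\s*')


def parse_matcher(text):
    """Split a matcher string such as severity=~"critical|warning" into its parts."""
    match = _MATCHER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse matcher {text!r}")
    return match.groups()


def matcher_holds(matcher, labels):
    name, op, value = matcher
    actual = labels.get(name, "")  # a missing label behaves like an empty string
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    found = re.fullmatch(value, actual) is not None  # regex must match the whole value
    return found if op == "=~" else not found


def route_matches(matcher_strings, labels):
    """True if every matcher holds (logical AND)."""
    return all(matcher_holds(parse_matcher(m), labels) for m in matcher_strings)


labels = {"alertname": "HighLoad", "severity": "critical", "service": "postgres"}
print(route_matches(['severity=~"critical|warning"', 'env!="prod"'], labels))
print(route_matches(['service=~"post"'], labels))
```

The second check fails because the regular expression post does not match the whole value postgres; a prefix match would need post.* instead.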

Modify a route

  1. Go to the Installed Services subsection of the PaaS section and open the Monitoring tab.

  2. In the resource table, find the monitoring service in which you want to modify a route, and click on the service name to go to its page.

  3. Open the Routes tab, select the route in the resource table, and click Modify.

  4. In the dialog window, edit the route parameters:

    • Matchers – One or more conditions, under which the corresponding notifications will be sent to the channel specified in the route.

    • Continue – If this option is enabled, the search for other routes will continue even if the route selection criteria have been met.

    • Group by – The list of labels by which notifications are grouped. For example, notifications can be grouped by node, service, or environment name. By default, the special value ... is set, which effectively disables grouping: each alert with a distinct set of labels is handled separately.

    • Group wait – How long to wait before sending the first notification for a new group, so that other alerts belonging to the same group can be included in it.

    • Group interval – How long to wait before notifying about new alerts added to a group for which a notification has already been sent.

    • Repeat interval – How long to wait before re-sending a notification if the problem has not been resolved.

  5. Click Save to modify the route parameters.
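Grouping determines which alerts share a notification; Group wait, Group interval, and Repeat interval are then applied to each group separately. The hypothetical helper below shows how alerts are split into groups by the Group by labels, including the special value ..., following upstream Alertmanager behavior, where a label missing from an alert is simply left out of its group key. For reference, upstream Alertmanager defaults are 30s for group_wait, 5m for group_interval, and 4h for repeat_interval; the values used by the cloud service may differ.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Split alerts (label dicts) into groups keyed by the Group by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        if group_by == ["..."]:
            # Special value: every distinct label set forms its own group.
            key = tuple(sorted(labels.items()))
        else:
            # Only labels present on the alert take part in the key.
            key = tuple((name, labels[name]) for name in group_by if name in labels)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical alerts.
alerts = [
    {"alertname": "HighLoad", "instance": "192.168.1.100:9100"},
    {"alertname": "HighLoad", "instance": "192.168.1.101:9100"},
    {"alertname": "DiskFull", "instance": "192.168.1.100:9100"},
]
for key, members in group_alerts(alerts, ["alertname"]).items():
    print(key, len(members))
```

Grouping by alertname sends the two HighLoad alerts in one notification; with ... each of the three alerts is sent on its own, and an empty list puts all of them into a single group.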

Delete a route

  1. Go to the Installed Services subsection of the PaaS section and open the Monitoring tab.

  2. In the resource table, find the monitoring service in which you want to delete a route, and click on the service name to go to its page.

  3. Open the Routes tab and select the route to be deleted in the resource table. You can select multiple routes at the same time.

  4. Click Delete and confirm the action in the dialog window.

Delete a monitoring service

Note

If the service is used to monitor PaaS services, disable monitoring for those services before deleting it.

  1. Go to the Installed Services subsection of the PaaS section and open the Monitoring tab.

  2. In the resource table, find the monitoring service that you want to delete.

  3. Click the delete icon for the service in the resource table, or go to the service page and click Delete on the Information tab.

  4. If you want to reuse the network interfaces later, for example, to start a service with the same network parameters, disable the Delete associated network interfaces option in the window that opens.

  5. Click Delete to confirm the action.