RTM - Real Time Monitoring

From Compilenix Wiki

Note

As of 2018-11-09 I would rather use this setup to collect data only via CollectD and drop Netdata from the equation. The amount of data Netdata collects is huge and not really useful in most cases. CollectD provides enough data to monitor most of the system.

Overview

This is a quick-guide to real-time-monitoring with these components:

  • Netdata as a metric-collector on Unix-like-hosts
  • InfluxDB as data-storage
  • Grafana as long-term-frontend
  • CollectD as an additional metric-collector

Please note: I would currently rather use a setup without Netdata. As neat as it is, collectD can supply almost all of the same data. In addition, collectD organizes its data much more neatly in InfluxDB, which makes dashboards covering multiple hosts far less painful to set up.

Netdata

Netdata setup is quite straightforward. For details, see the Netdata Wiki: Installation. It boils down to:

  • Arch Linux: pacman -S netdata
  • Gentoo Linux: emerge --ask netdata
  • Solus Linux: eopkg install netdata

All others:

Setup KSM (Kernel Samepage Merging)

  • echo 1 > /sys/kernel/mm/ksm/run
  • echo 1000 > /sys/kernel/mm/ksm/sleep_millisecs
  • echo "echo 1 > /sys/kernel/mm/ksm/run" >> /etc/rc.local
  • echo "echo 1000 > /sys/kernel/mm/ksm/sleep_millisecs" >> /etc/rc.local
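To confirm the KSM settings took effect (for example after a reboot, when rc.local should have re-applied them), the sysfs files can simply be read back. The sketch below is a hypothetical helper, not part of Netdata; it only assumes the standard /sys/kernel/mm/ksm directory and returns None for anything it cannot read.

```python
"""Report the current Kernel Samepage Merging (KSM) settings."""
from pathlib import Path

# Standard sysfs directory holding the KSM knobs on Linux.
KSM_DIR = Path("/sys/kernel/mm/ksm")
FIELDS = ("run", "sleep_millisecs", "pages_shared", "pages_sharing")


def read_ksm_settings(ksm_dir: Path = KSM_DIR) -> dict:
    """Return each KSM value as an int, or None if it cannot be read.

    A missing file (e.g. a kernel built without KSM) or unparsable content
    yields None instead of an exception.
    """
    settings = {}
    for name in FIELDS:
        try:
            settings[name] = int((ksm_dir / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def ksm_matches_guide(settings: dict) -> bool:
    """True if KSM runs with the 1000 ms scan interval set above."""
    return settings.get("run") == 1 and settings.get("sleep_millisecs") == 1000


if __name__ == "__main__":
    current = read_ksm_settings()
    for key, value in current.items():
        print(f"{key:16} {value}")
    print("matches guide:", ksm_matches_guide(current))
```

A growing pages_sharing value indicates that KSM is actually deduplicating memory pages.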

Netdata defaults

Netdata collects everything on a per-second basis and keeps a history of roughly the last hour in memory. On a medium-sized hardware server it collects about 6000 metrics every second, using about 90 MB of memory. By default, alerts are turned on and sent to root. The dashboard is available via the server IP (or hostname) on port 19999.
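The 90 MB figure can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes a history of 3600 entries (one hour at one-second resolution) and about 4 bytes per stored value, matching Netdata's 32-bit in-memory sample format; per-chart and per-dimension overhead is ignored, so the result is a lower bound.

```python
def netdata_ram_bytes(metrics: int, history_seconds: int = 3600,
                      update_every: int = 1, bytes_per_value: int = 4) -> int:
    """Rough RAM needed for Netdata's in-memory history.

    Assumes one stored value per metric per update interval and ignores
    bookkeeping overhead, so treat the result as a lower bound.
    """
    samples_per_metric = history_seconds // update_every
    return metrics * samples_per_metric * bytes_per_value


if __name__ == "__main__":
    estimate = netdata_ram_bytes(6000)
    print(f"{estimate / 1e6:.1f} MB")  # 6000 * 3600 * 4 B = 86.4 MB
```

The estimate (86.4 MB) lands close to the observed 90 MB, and it shows that memory scales linearly with both the number of metrics and the length of the history: halving the collection frequency halves the figure.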

Netdata config-changes

Once InfluxDB is running, enable the backend in netdata.conf with the following values:

[backend]

       enabled = yes
       data source = average
       type = opentsdb
       destination = tcp:influxdb-server:4242
       prefix = netdata
       hostname = netdatahost
       update every = 10
       buffer on failures = 10
       timeout ms = 20000

Change "netdatahost" to the host's own name and "influxdb-server" to the FQDN or IP address of the InfluxDB server.
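With type = opentsdb, Netdata pushes plain-text lines in the OpenTSDB telnet style (put <metric> <timestamp> <value> <tag=value ...>) to port 4242. Before wiring up Netdata, one such line can be sent by hand to check that InfluxDB's OpenTSDB listener accepts data. This is a minimal sketch; the metric name netdata.test.manual and the host tag are hypothetical, and Netdata's exact metric naming is not reproduced here.

```python
"""Send a single test metric to InfluxDB's OpenTSDB listener."""
import socket
import time
from typing import Optional


def format_put_line(metric: str, value: float, tags: dict,
                    timestamp: Optional[int] = None) -> str:
    """Build one OpenTSDB telnet-style line: put <metric> <ts> <value> <k=v ...>."""
    if timestamp is None:
        timestamp = int(time.time())
    if not tags:
        # OpenTSDB itself rejects points without tags; keep the same rule here.
        raise ValueError("at least one tag is required")
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}\n"


def send_put_line(line: str, host: str, port: int = 4242,
                  timeout: float = 5.0) -> None:
    """Open a TCP connection, send the line and close it again."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("ascii"))


if __name__ == "__main__":
    line = format_put_line("netdata.test.manual", 42, {"host": "netdatahost"})
    print(line, end="")
    # Uncomment once InfluxDB listens on port 4242:
    # send_put_line(line, "influxdb-server")
```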

InfluxDB

Setup of influxDB: InfluxDB-Installation

InfluxDB config

InfluxDB has to be restarted after any configuration change.

http-section

This section manages access to InfluxDB's native HTTP interface. Set the following directives and leave all others commented out:

  • enabled = true
  • bind-address = "127.0.0.1:8086"

This ensures that only localhost is able to access the database over HTTP.
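Whether a given bind-address really restricts access to localhost can be checked mechanically. The sketch below is a hypothetical helper that inspects the configured "host:port" string (not the running socket); it treats an empty host (InfluxDB's ":8086" form) as "all interfaces" and resolves hostnames, requiring every resulting address to be loopback.

```python
import ipaddress
import socket


def bind_is_local_only(bind_address: str) -> bool:
    """True if a "host:port" bind address only listens on loopback."""
    host, sep, port = bind_address.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"expected host:port, got {bind_address!r}")
    if host == "":
        return False  # ":8086" binds to all interfaces
    host = host.strip("[]")  # allow the "[::1]:8086" form
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP: resolve the name and check every address.
        infos = socket.getaddrinfo(host, int(port))
        return all(
            ipaddress.ip_address(info[4][0].split("%")[0]).is_loopback
            for info in infos
        )


if __name__ == "__main__":
    print(bind_is_local_only("127.0.0.1:8086"))
```

A direct consequence of this setting is that Grafana, whose data source URL is http://localhost:8086, has to run on the InfluxDB host itself.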

subscriber-section

Disable the subscriber service, as it's not needed:

  • enabled = false

collectd-section

This section is required for collectD. Set the following directives and leave all others commented out:

  • enabled = true
  • bind-address = "influxdb-server:25826"
  • database = "collectd"

Replace "influxdb-server" with the FQDN or IP address of the server hosting the database.

opentsdb-section

This section is required for Netdata. Set the following directives and leave all others commented out:

  • enabled = true
  • bind-address = "influxdb-server:4242"
  • database = "opentsdb"
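Put together, the directives above end up in four places in influxdb.conf. In InfluxDB 1.x the collectd and opentsdb listeners are arrays of tables ([[collectd]], [[opentsdb]]) so that several listeners can be defined, while http and subscriber are plain tables. The sketch below only prints the resulting fragment for comparison with your own file; the server name is a placeholder and the helper is hypothetical.

```python
"""Render the influxdb.conf fragments described above for comparison."""


def render_influxdb_sections(server: str) -> str:
    """Return TOML text for the http, subscriber, collectd and opentsdb sections.

    `server` is the FQDN or IP the collectd and opentsdb listeners bind to.
    """
    sections = [
        ("[http]", {"enabled": True, "bind-address": "127.0.0.1:8086"}),
        ("[subscriber]", {"enabled": False}),
        ("[[collectd]]", {"enabled": True,
                          "bind-address": f"{server}:25826",
                          "database": "collectd"}),
        ("[[opentsdb]]", {"enabled": True,
                          "bind-address": f"{server}:4242",
                          "database": "opentsdb"}),
    ]
    lines = []
    for header, options in sections:
        lines.append(header)
        for key, value in options.items():
            # TOML booleans are bare lowercase words; everything else is a string.
            if isinstance(value, bool):
                rendered = "true" if value else "false"
            else:
                rendered = f'"{value}"'
            lines.append(f"  {key} = {rendered}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_influxdb_sections("influxdb-server"))
```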

Verify connectivity

  1. On the InfluxDB server, type "influx" to connect to the database. "SHOW DATABASES;" should list an opentsdb database.
  2. On the Netdata dashboard there should be a "backend" section under "Netdata Monitoring"; the buffered and sent values should be roughly equal.
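Step 1 can also be scripted against InfluxDB's HTTP API (GET /query with q=SHOW DATABASES), which is handy on hosts without the influx shell. A minimal sketch, assuming the 127.0.0.1:8086 bind address configured above and that authentication is not enabled; the expected set of database names comes from the configuration in this guide.

```python
import json
import urllib.parse
import urllib.request

EXPECTED = {"opentsdb", "collectd"}


def database_names(payload: dict) -> set:
    """Extract database names from a SHOW DATABASES JSON response."""
    names = set()
    for result in payload.get("results", []):
        for series in result.get("series", []):
            for row in series.get("values", []):
                names.add(row[0])
    return names


def show_databases(base_url: str = "http://127.0.0.1:8086") -> set:
    """Run SHOW DATABASES over HTTP and return the names."""
    query = urllib.parse.urlencode({"q": "SHOW DATABASES"})
    with urllib.request.urlopen(f"{base_url}/query?{query}", timeout=5) as resp:
        return database_names(json.load(resp))


if __name__ == "__main__":
    try:
        found = show_databases()
    except OSError as exc:  # URLError is a subclass of OSError
        print("could not reach InfluxDB:", exc)
    else:
        missing = EXPECTED - found
        print("databases:", sorted(found))
        print("missing:", sorted(missing) if missing else "none")
```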

Grafana

Installation of grafana is documented here: Grafana-Installation

Accessing InfluxDB via Grafana

On the Grafana-Dashboard add a Datasource with:

  • Name: OpenTSDB
  • Default: may be checked
  • Type: InfluxDB
  • URL: http://localhost:8086
  • Access: proxy
  • Http Auth: all unchecked
  • Database: opentsdb
  • User: admin
  • Password: may be "admin"
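The same data source can also be created through Grafana's HTTP API (POST /api/datasources) instead of the web form. A minimal sketch, assuming Grafana listens on its default http://localhost:3000 with the default admin/admin login; the JSON field names follow Grafana's data-source API as I understand it and may vary between Grafana versions, in particular where the password is stored.

```python
import base64
import json
import urllib.request


def datasource_payload(name: str, database: str, is_default: bool = False,
                       user: str = "admin", password: str = "admin") -> dict:
    """Body for POST /api/datasources mirroring the form settings above."""
    return {
        "name": name,
        "type": "influxdb",
        "url": "http://localhost:8086",
        "access": "proxy",
        "basicAuth": False,
        "database": database,
        "user": user,
        "isDefault": is_default,
        # Newer Grafana releases read the password from secureJsonData;
        # older releases used a top-level "password" field instead.
        "secureJsonData": {"password": password},
    }


def create_datasource(payload: dict, grafana_url: str = "http://localhost:3000",
                      auth: tuple = ("admin", "admin")) -> dict:
    """POST the payload using HTTP basic auth and return Grafana's JSON reply."""
    token = base64.b64encode(f"{auth[0]}:{auth[1]}".encode()).decode()
    request = urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    body = datasource_payload("OpenTSDB", "opentsdb", is_default=True)
    print(json.dumps(body, indent=2))
    # create_datasource(body)  # uncomment to actually create it
```

Calling datasource_payload("CollectD", "collectd") produces the body for the second data source described in the CollectD section.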

You may now begin to create Dashboards for the hosts.

CollectD

Install collectD on the system.

In "/etc/collectd.conf", all LoadPlugin statements except syslog should be disabled, as the plugins will be loaded from separate config files under "/etc/collectd.d/". The network config is required for collectD to transmit data to the InfluxDB server, where Grafana can query it.

/etc/collectd.d/network

LoadPlugin network
<Plugin "network">

     Server "influxdb-server" "25826"

</Plugin>

Replace "influxdb-server" with the FQDN or IP address of your InfluxDB server.
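If nothing shows up in the collectd database, it helps to check whether collectD packets reach the InfluxDB host at all. The sketch below is a hypothetical debugging aid: a UDP listener that decodes the string parts (host, plugin, type, ...) of collectD's binary network protocol. It assumes unsigned, unencrypted packets, and InfluxDB's own collectd listener must be stopped while it runs, since only one process can bind port 25826.

```python
"""Show which hosts and plugins send collectD packets to UDP 25826."""
import socket
import struct

# String part types defined by collectD's binary network protocol.
STRING_PARTS = {0x0000: "host", 0x0002: "plugin", 0x0003: "plugin_instance",
                0x0004: "type", 0x0005: "type_instance"}


def parse_string_parts(packet: bytes) -> list:
    """Return (name, value) pairs for the string parts of one packet.

    Each part starts with a big-endian 2-byte type and 2-byte length; the
    length includes that 4-byte header. Non-string parts are skipped.
    """
    parts = []
    offset = 0
    while offset + 4 <= len(packet):
        part_type, length = struct.unpack_from("!HH", packet, offset)
        if length < 4 or offset + length > len(packet):
            break  # malformed or truncated packet
        if part_type in STRING_PARTS:
            raw = packet[offset + 4:offset + length].rstrip(b"\0")
            parts.append((STRING_PARTS[part_type], raw.decode("utf-8", "replace")))
        offset += length
    return parts


def listen(bind: str = "0.0.0.0", port: int = 25826, count: int = 5) -> None:
    """Receive `count` packets and print the sender, hosts and plugins seen."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind, port))
        for _ in range(count):
            packet, sender = sock.recvfrom(65535)
            parts = parse_string_parts(packet)
            hosts = sorted({v for k, v in parts if k == "host"})
            plugins = sorted({v for k, v in parts if k == "plugin"})
            print(f"{sender[0]}: hosts={hosts} plugins={plugins}")
```

Calling listen() on the InfluxDB host blocks until five packets have arrived; if it never returns, check firewalls and the Server line in the network config first.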

CollectD integration into Grafana

Add another data source with the settings:

  • Name: CollectD
  • Default: may be checked
  • Type: InfluxDB
  • All other settings: same as the OpenTSDB data source
  • Database: collectd
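The multi-host convenience mentioned in the overview comes from how InfluxDB's collectd listener stores data: one measurement per plugin and data-source name (for example cpu_value or load_shortterm) with the origin in tags such as host, instance and type_instance. Under that assumption, a single Grafana query can cover several hosts. The helper below is hypothetical and only builds the InfluxQL text; $timeFilter is Grafana's time-range macro for InfluxDB queries, and the values are not escaped, so use it only with trusted host and instance names.

```python
import re
from typing import List, Optional


def collectd_query(plugin: str, ds_name: str, hosts: List[str],
                   type_instance: Optional[str] = None,
                   interval: str = "1m") -> str:
    """Build an InfluxQL query for collectD data, one series per host.

    Assumes measurements named "<plugin>_<ds_name>" and a "host" tag.
    """
    host_regex = "|".join(re.escape(h) for h in hosts)
    conditions = [f'"host" =~ /^({host_regex})$/']
    if type_instance is not None:
        conditions.append(f"\"type_instance\" = '{type_instance}'")
    conditions.append("$timeFilter")  # expanded by Grafana at query time
    return (f'SELECT mean("value") FROM "{plugin}_{ds_name}" '
            f'WHERE {" AND ".join(conditions)} '
            f'GROUP BY time({interval}), "host"')


if __name__ == "__main__":
    print(collectd_query("cpu", "value", ["web01", "db01"], "user"))
```

Grouping by "host" makes Grafana draw one line per machine, so adding a host to the dashboard only means extending the list instead of duplicating the query.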