Authentication and encryption for Prometheus and its exporters

Connections to Prometheus and its exporters are not encrypted or authenticated by default. This post describes one way of fixing that with TLS certificates and stunnel.

Prometheus components do not provide a built-in way to secure their interfaces, at least for now. Unless additional components are set up to enable encryption or authentication (or both), all the traffic between Prometheus and its components is sent in plain text, and there are no access restrictions - anyone who knows where to look can access these interfaces.

Securing this setup usually involves some sort of reverse proxy in front of Prometheus and its components, which can provide both traffic encryption with a TLS certificate and authentication - with, for example, a username and password, or the same TLS certificate. There are numerous options out there for this purpose - nginx, HAProxy, hitch, ghostunnel etc.

This post goes into a bit more technical detail on how to secure communication between Prometheus and node_exporter on a remote system with the help of TLS certificates and one such tool - stunnel.

TLS certificates

First off, we will need TLS certificates for authentication and for securing the traffic. To test this out, we will use an ad-hoc CA and self-signed certificates. For production, using a proper, well-protected CA to sign all of the certificates is a must, so the following two steps would probably not be necessary in that case, since the CA would already be in place. To create the ad-hoc CA key and certificate for testing:

openssl genrsa -des3 -out ca.key 4096
openssl req -new -x509 -days 720 -key ca.key -out ca.crt
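
As an optional sanity check, the subject and validity dates of the resulting CA certificate can be inspected:

openssl x509 -in ca.crt -noout -subject -dates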

Next, we need certificates for the Prometheus server and for the server(s) running the exporter(s), signed by this (or the production) CA.

When creating a CSR (openssl req ...) for the exporter(s), it is important to note what was supplied for the Common Name/CN parameter. It does not have to be an FQDN, but it will become useful later, when configuring the Prometheus server. Other than that, it's a straightforward process, and any values can be used. As mentioned, for production purposes the last step below (signing of the certificate with openssl x509 ...) should be done with a well-protected CA, probably on a separate system, not the local test one we are using here.

openssl genrsa -des3 -out prom_node_exp.key 4096
openssl req -new -key prom_node_exp.key -out prom_node_exp.csr
openssl x509 -req -days 365 -in prom_node_exp.csr -CA ca.crt -CAkey ca.key -set_serial 1 -out prom_node_exp.crt
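
To double-check what ended up in the exporter certificate - in particular the CN - and that it validates against our CA (file names as above):

openssl x509 -in prom_node_exp.crt -noout -subject
openssl verify -CAfile ca.crt prom_node_exp.crt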

We will also need another key pair for the Prometheus server itself. Here too, the last step for production should be done with a production CA (note the different serial number below - serials must be unique within one CA):

openssl genrsa -des3 -out prom_server.key 4096
openssl req -new -key prom_server.key -out prom_server.csr
openssl x509 -req -days 365 -in prom_server.csr -CA ca.crt -CAkey ca.key -set_serial 2 -out prom_server.crt

Since we created the private keys with -des3, openssl requires a passphrase for the private key files. That might not always be convenient, so there is a way to export the private key to a file without a password, for example:

openssl rsa -in prom_node_exp.key -out prom_node_exp_nopass.key

We will use the *_nopass.key files further in service configurations.
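
To confirm that an exported key is really unencrypted (assuming the traditional PEM format produced by the commands above), one can check for the encryption header:

# an encrypted PEM key carries a "Proc-Type: 4,ENCRYPTED" header; the exported one should not
grep ENCRYPTED prom_node_exp_nopass.key || echo "key is not encrypted"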

In the end, we should have 5 files we will need in the next steps:

  • One CA certificate. This is either our ad-hoc test CA certificate created in the beginning, or the one from an existing production CA (ca.crt)
  • One private key and one certificate for the Prometheus server (prom_server_nopass.key, prom_server.crt)
  • One private key and one certificate for the Prometheus exporter (prom_node_exp_nopass.key, prom_node_exp.crt). If securing multiple exporters, a separate key/certificate pair can be created for each of them.

On the Prometheus server we will need the ca.crt file, along with the prom_server_nopass.key and prom_server.crt files. These 3 files should be made available to the Prometheus process. In this case, they were put into the /opt/prometheus/secrets/ directory, and their owner set to the same user that is configured to run the Prometheus server in the systemd unit file.
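
For example, assuming Prometheus runs under a (hypothetical) prometheus user, the placement could look like this:

mkdir -p /opt/prometheus/secrets
cp ca.crt prom_server.crt prom_server_nopass.key /opt/prometheus/secrets/
# the owner must match the user from the Prometheus systemd unit
chown -R prometheus:prometheus /opt/prometheus/secrets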

On the server which runs the exporter(s), we need the prom_node_exp_nopass.key, prom_node_exp.crt and ca.crt files. Same as with the Prometheus server - these files will need to be made available to the stunnel process, so they were put into the /etc/stunnel/tls/ directory.

The private key files should be protected with appropriate filesystem permissions no matter where they go:

chmod 400 *.key

Prometheus exporter setup

By default, Prometheus exporters usually listen on all interfaces (0.0.0.0). For the setup described here to make sense, the exporter must listen on a port only on localhost (127.0.0.1), which makes it unreachable from the outside. All requests from the outside will instead go through stunnel, which will be listening on another port on all (or some specific) interfaces other than localhost.

The ports can be chosen freely, provided no other services are using them. Here, node_exporter itself will listen on another port (9101) on localhost, while stunnel, configured in front of it, will listen on the default node_exporter port (9100) on all network interfaces (0.0.0.0). It could easily be the other way around, or completely different ports altogether.

In the case of node_exporter, the only thing that needs to be added is the --web.listen-address=localhost:9101 parameter to its startup command in the systemd unit file:

...
ExecStart=/path/to/node_exporter/node_exporter --web.listen-address=localhost:9101
...

The node_exporter needs to be restarted after these changes.
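
Assuming a systemd service named node_exporter, the restart and a quick check that the exporter now only answers on localhost could look like this:

systemctl daemon-reload
systemctl restart node_exporter

# should show the exporter bound to localhost:9101 only
ss -tlnp | grep 9101

# metrics should still be reachable locally
curl -s http://127.0.0.1:9101/metrics | head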

stunnel setup

The next step is setting up stunnel. Popular distros should have a stunnel package in their base repositories - it is the case for Debian and CentOS at least. Otherwise, it can be downloaded from the stunnel website or GitHub.

After the installation/extraction, it might be useful to set up a separate system user for stunnel (e.g. stunnel), as well as a PID directory, if they are not present. Both are referenced in the configuration file below.

Care should be taken if the PID directory (e.g. /run/ or /var/run/) is on a tmpfs, which is not persistent across reboots. Extra configuration of tmpfs mounts might be necessary in that case, to create a PID directory for stunnel at boot, owned by the stunnel user.
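
On a systemd-based distro, one way to handle both could be a dedicated system user plus a tmpfiles.d entry that recreates the PID directory at every boot (names and paths here match the configuration below; the nologin path may differ per distro):

# dedicated system user without a login shell
useradd -r -s /sbin/nologin stunnel

# recreate /run/stunnel at every boot, owned by the stunnel user
echo 'd /run/stunnel 0750 stunnel stunnel -' > /etc/tmpfiles.d/stunnel.conf
systemd-tmpfiles --create /etc/tmpfiles.d/stunnel.conf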

The main stunnel configuration goes into /etc/stunnel/stunnel.conf. A working configuration could look something like this:

setuid          = stunnel
setgid          = stunnel
debug           = 5
pid             = /run/stunnel/stunnel.pid
CAfile          = /etc/stunnel/tls/ca.crt

[node-exporter]
accept          = 0.0.0.0:9100
connect         = 127.0.0.1:9101
cert            = /etc/stunnel/tls/prom_node_exp.crt
key             = /etc/stunnel/tls/prom_node_exp_nopass.key
verify          = 2

The first part contains global parameters. We list the CA certificate there, since it will probably be common for all exporter endpoints, with the same Prometheus server connecting to all of them.

The second part is endpoint-specific:

  • Which interface:port stunnel should listen on (accept)
  • Where to send the authenticated requests (connect)
  • Certificate and private key (cert, key)
  • Verification parameter (verify) for securing and authenticating the connection.

The verify = 2 parameter means "verify the peer certificate" - stunnel will check whether the certificate presented by the peer is issued by a trusted CA. See the stunnel documentation for other possible verification levels.

If another exporter needs to be secured on the same host, another section, similar to [node-exporter], needs to be added to stunnel.conf, as sketched below.
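
For example, a hypothetical second section fronting a blackbox_exporter, which would itself be reconfigured to listen only on 127.0.0.1:9116 (names and ports here are illustrative; the key/certificate pair is reused):

[blackbox-exporter]
accept          = 0.0.0.0:9115
connect         = 127.0.0.1:9116
cert            = /etc/stunnel/tls/prom_node_exp.crt
key             = /etc/stunnel/tls/prom_node_exp_nopass.key
verify          = 2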

The stunnel manpage has all the possible configuration parameters listed. Some of them are not backwards-compatible with older versions.
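
stunnel then needs to be (re)started for the configuration to take effect. The service name depends on the distro (e.g. stunnel4 on Debian); it can also be launched directly against the configuration file:

# Debian-style service name; may differ elsewhere
systemctl enable --now stunnel4

# or run it directly
stunnel /etc/stunnel/stunnel.conf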

Prometheus server setup

The last step is to change the Prometheus job configuration for node_exporter, adding the TLS configuration:

- job_name: node-exporter
  static_configs:
  - targets: ['1.2.3.4:9100']
  scheme: https
  tls_config:
    ca_file: /opt/prometheus/secrets/ca.crt
    cert_file: /opt/prometheus/secrets/prom_server.crt
    key_file: /opt/prometheus/secrets/prom_server_nopass.key
    server_name: node_exporter

Here, server_name refers to the caveat mentioned earlier - it should be the same as the CN set in prom_node_exp.crt. Otherwise the check will fail in this case, since the CN is different from the target hostname, and we do not have insecure_skip_verify: true set.
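
Before restarting Prometheus, the changed configuration can be validated with promtool (the configuration file path here is an assumption):

promtool check config /opt/prometheus/prometheus.yml
systemctl restart prometheus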

Verification of the setup

If everything is correct, there should be something like this in the stunnel log after Prometheus scrapes the node-exporter target:

Service [node-exporter] accepted connection from 1.2.3.4:41658
Certificate accepted: depth=1, /C=EU/ST=Latvia/L=Riga/O=SomeCorp/CN=TestingCA/emailAddress=it@somecorp.io
Certificate accepted: depth=0, /C=EU/ST=Latvia/L=Riga/O=SomeCorp/OU=IT/CN=prometheus/emailAddress=it@somecorp.io
connect_blocking: connected 127.0.0.1:9101

It shows that stunnel received a valid certificate from the Prometheus server upon connection, and routed the request to the actual node_exporter interface on localhost port 9101.
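
The TLS handshake can also be exercised by hand from the Prometheus server, without waiting for a scrape, using the same files the scrape job uses (1.2.3.4 again stands in for the exporter host):

# a successful mutual-TLS handshake ends with "Verify return code: 0 (ok)"
openssl s_client -connect 1.2.3.4:9100 \
    -CAfile /opt/prometheus/secrets/ca.crt \
    -cert /opt/prometheus/secrets/prom_server.crt \
    -key /opt/prometheus/secrets/prom_server_nopass.key </dev/null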

If there is a problem validating the certificate presented by Prometheus, it will be evident in the stunnel log as well. In the following example, the verification level was set to 3, which also verifies the peer certificate against a locally installed copy of the client certificate:

Service [node-exporter] accepted connection from 1.2.3.4:41654
Certificate accepted: depth=1, /C=EU/ST=Latvia/L=Riga/O=SomeCorp/CN=TestingCA/emailAddress=it@somecorp.io
CERT: Certificate not found in local repository
Certificate check failed: depth=0, /C=EU/ST=Latvia/L=Riga/O=SomeCorp/OU=IT/CN=prometheus/emailAddress=it@somecorp.io
SSL_accept: 14089086: error:14089086:SSL routines:ssl3_get_client_certificate:certificate verify failed
Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket

Here, the CA certificate was accepted (CN=TestingCA), but not the client certificate (CN=prometheus), since it was not present in the ca.crt file.

Similarly, on the Prometheus server side: if the target address does not match the CN attribute in the certificate presented by the exporter side, and server_name in prometheus.yml is missing or incorrect, an error is logged:

Get https://1.2.3.4:9100/metrics: x509: certificate is valid for node_exporter, not 1.2.3.4

The Prometheus configuration parameter insecure_skip_verify is, as the name suggests, an insecure way of dealing with certificate validation issues on the Prometheus server side, as it makes Prometheus accept and trust any certificate presented by the remote party.

Summary

This setup secures the communication channel between Prometheus and its exporters, which by default have no encryption or access restrictions.

A similar setup would also work with Docker containers, where the exporter container is only reachable from the outside via a stunnel container secured with TLS certificates.