Merge pull request #4355 from esl/docs/instrumentation
Document instrumentation
jacekwegr authored Aug 13, 2024
2 parents 0e8cf4b + e0e819c commit 6087044
Showing 31 changed files with 1,292 additions and 468 deletions.
4 changes: 2 additions & 2 deletions big_tests/tests/bosh_SUITE.erl
@@ -193,7 +193,7 @@ create_and_terminate_session(Config) ->

% Assert that correct events have been executed
[instrument_helper:assert(Event, Label, fun(#{byte_size := BS}) -> BS > 0 end)
|| {Event, Label} <- instrumentation_events(), Event =/= c2s_message_processing_time],
|| {Event, Label} <- instrumentation_events(), Event =/= c2s_message_processed],

%% Verify C2S listener is not used
instrument_helper:assert_not_emitted(negative_instrumentation_events()),
@@ -954,7 +954,7 @@ wait_for_zero_bosh_sessions() ->
instrumentation_events() ->
instrument_helper:declared_events(mod_bosh, [])
++ instrument_helper:declared_events(mongoose_c2s, [global])
++ [{c2s_message_processing_time, #{host_type => host_type()}}].
++ [{c2s_message_processed, #{host_type => host_type()}}].

negative_instrumentation_events() ->
[{Name, #{}} || Name <- negative_instrumentation_events_names()].
4 changes: 2 additions & 2 deletions big_tests/tests/connect_SUITE.erl
@@ -407,7 +407,7 @@ metrics_test(Config) ->
[instrument_helper:assert(Event, Label, fun(#{byte_size := BS}) -> BS > 0;
(#{time := Time}) -> Time > 0 end)
|| {Event, Label} <- instrumentation_events(),
Event =/= c2s_message_processing_time].
Event =/= c2s_message_processed].

tls_authenticate(Config) ->
%% Given
@@ -818,4 +818,4 @@ proxy_info() ->
instrumentation_events() ->
instrument_helper:declared_events(mongoose_c2s_listener, [#{}])
++ instrument_helper:declared_events(mongoose_c2s, [global])
++ [{c2s_message_processing_time, #{host_type => domain_helper:host_type()}}].
++ [{c2s_message_processed, #{host_type => domain_helper:host_type()}}].
4 changes: 2 additions & 2 deletions big_tests/tests/mim_c2s_SUITE.erl
@@ -241,7 +241,7 @@ escalus_start(Cfg, FlatCDs) ->
instrumentation_events() ->
instrument_helper:declared_events(mongoose_c2s_listener, [#{}])
++ instrument_helper:declared_events(mongoose_c2s, [global])
++ [{c2s_message_processing_time, #{host_type => domain_helper:host_type()}}].
++ [{c2s_message_processed, #{host_type => domain_helper:host_type()}}].

tcp_instrumentation_events() ->
[{c2s_tcp_data_out, #{}},
@@ -253,6 +253,6 @@ tls_instrumentation_events() ->

common_instrumentation_events() ->
HostType = domain_helper:host_type(),
[{c2s_message_processing_time, #{host_type => HostType}},
[{c2s_message_processed, #{host_type => HostType}},
{c2s_xmpp_element_size_in, #{}},
{c2s_xmpp_element_size_out, #{}}].
2 changes: 1 addition & 1 deletion big_tests/tests/websockets_SUITE.erl
@@ -171,7 +171,7 @@ escape_attrs(Config) ->
instrumentation_events() ->
instrument_helper:declared_events(mod_websockets, [])
++ instrument_helper:declared_events(mongoose_c2s, [global])
++ [{c2s_message_processing_time, #{host_type => domain_helper:host_type()}}].
++ [{c2s_message_processed, #{host_type => domain_helper:host_type()}}].

negative_instrumentation_events() ->
[{Name, #{}} || Name <- negative_instrumentation_events_names()].
113 changes: 113 additions & 0 deletions doc/configuration/instrumentation.md
@@ -0,0 +1,113 @@
This section configures MongooseIM instrumentation.
Instrumentation is a system that executes events when something of interest happens in the server.
These events are mainly used for metrics.

Instrumentation events are acted upon by handlers. Available instrumentation handlers are:

* `prometheus` - exposes a metrics endpoint for [Prometheus](https://prometheus.io/).
* `exometer` - starts [Exometer](https://github.com/esl/exometer_core), a metrics server capable of exporting metrics using reporters. Currently available is a [Graphite](https://graphiteapp.org/) reporter.
* `log` - logs instrumentation events to disk.

Enable them by adding the corresponding sections, together with any configuration values they need.
We recommend choosing either Prometheus or Exometer as a solution for exposing metrics.

## General options

### `instrumentation.probe_interval`
* **Syntax:** positive integer
* **Default:** `15` (seconds)
* **Example:** `probe_interval = 60`

Sets the interval for periodic measurements (probes).
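
As a minimal sketch, the option sits directly in the `[instrumentation]` table. The value `30` is only an illustration and is given in seconds, matching the unit of the default:

```toml
[instrumentation]
# Run periodic measurements (probes) every 30 seconds instead of the default 15.
probe_interval = 30
```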

## Exometer options

General options for the Exometer reporter:

### `instrumentation.exometer.all_metrics_are_global`
* **Syntax:** boolean
* **Default:** `false`
* **Example:** `all_metrics_are_global = true`
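
A minimal sketch of setting this option, assuming it is placed directly in the `[instrumentation.exometer]` table, as its dotted name suggests:

```toml
[instrumentation.exometer]
# Assumed placement, derived from the option name instrumentation.exometer.all_metrics_are_global.
all_metrics_are_global = true
```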

## Exometer reporter options

Multiple reporters can be configured.
Because of that, each reporter is configured in a section inside a TOML array, for example: `[[instrumentation.exometer.report.graphite]]`.

### `instrumentation.exometer.report.graphite.interval`
* **Syntax:** positive integer
* **Default:** `60000` (milliseconds)
* **Example:** `interval = 30_000`

Interval at which metrics will be sent to Graphite.

### `instrumentation.exometer.report.graphite.host`
* **Syntax:** string
* **Default:** no default, required
* **Example:** `host = "graphite.local"`

The name or IP address of the Graphite server.
This option is mandatory.

### `instrumentation.exometer.report.graphite.port`
* **Syntax:** integer, between 0 and 65535
* **Default:** `2003`
* **Example:** `port = 2033`

The port on which the Graphite server listens for connections.

### `instrumentation.exometer.report.graphite.connect_timeout`
* **Syntax:** positive integer
* **Default:** `5000` (milliseconds)
* **Example:** `connect_timeout = 10_000`

The amount of time the Graphite reporter will wait for a connection before timing out.

### `instrumentation.exometer.report.graphite.api_key`
* **Syntax:** string
* **Default:** `""`
* **Example:** `api_key = "hosted_graphite_api_key"`

API key to use when reporting to a hosted Graphite server.

### `instrumentation.exometer.report.graphite.prefix`
* **Syntax:** string
* **Default:** no default
* **Example:** `prefix = "mim_stats"`

A prefix to prepend to all metric names before they are sent to the Graphite server.

### `instrumentation.exometer.report.graphite.env_prefix`
* **Syntax:** string
* **Default:** no default
* **Example:** `env_prefix = "GRAPHITE_METRICS_PREFIX"`

Specifies the name of an environment variable from which an additional prefix will be taken.
If both `prefix` and `env_prefix` are defined, the prefix taken from the environment variable is placed before `prefix` and separated from it with a dot.
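
The sketch below combines the two prefix options in one reporter. The host name and the environment variable value `staging` are hypothetical and serve only to illustrate the ordering rule described above:

```toml
[[instrumentation.exometer.report.graphite]]
host = "graphite.example.com"            # hypothetical host
port = 2003
prefix = "mim_stats"
env_prefix = "GRAPHITE_METRICS_PREFIX"
# If GRAPHITE_METRICS_PREFIX=staging is set in the environment (hypothetical value),
# the combined prefix would be "staging.mim_stats":
# the environment prefix comes first, then a dot, then `prefix`.
```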

## Example Prometheus configuration

This configuration enables the `prometheus` and `log` handlers:
```toml
[instrumentation]
probe_interval = 10_000

[instrumentation.prometheus]

[instrumentation.log]
```

## Example Exometer configuration

This configuration enables the `exometer` handler with two different Graphite reporters:
```toml
[[instrumentation.exometer.report.graphite]]
host = "127.0.0.1"
interval = 15_000
prefix = "mongooseim"
connect_timeout = 5000

[[instrumentation.exometer.report.graphite]]
host = "hosted_graphite.com"
prefix = "mim"
```
4 changes: 2 additions & 2 deletions doc/configuration/outgoing-connections.md
@@ -109,7 +109,7 @@ When MongooseIM fails to connect to the DB, it retries with an exponential backo
* **Example:** `host = "localhost"`

#### `outgoing_pools.rdbms.*.connection.port`
* **Syntax:** string
* **Syntax:** integer, between 0 and 65535
* **Default:** `5432` for `pgsql`; `3306` for `mysql`
* **Example:** `port = 5343`

@@ -204,7 +204,7 @@ There are two important limitations:
* **Example:** `host = "redis.local"`

### `outgoing_pools.redis.*.connection.port`
* **Syntax:** integer, between 0 and 65535, non-inclusive
* **Syntax:** integer, between 0 and 65535
* **Default:** `6379`
* **Example:** `port = 9876`

22 changes: 22 additions & 0 deletions doc/listeners/listen-http.md
@@ -13,6 +13,7 @@ Following configuration option is used to set up an HTTP handler:

* `mod_bosh` - for [BOSH](https://xmpp.org/extensions/xep-0124.html) connections,
* `mod_websockets` - for [WebSocket](https://tools.ietf.org/html/rfc6455) connections,
* `mongoose_prometheus_handler` - for [Prometheus](https://prometheus.io/) metrics,
* `mongoose_graphql_handler` - for GraphQL API,
* `mongoose_admin_api`, `mongoose_client_api` - for REST API.

@@ -183,6 +184,12 @@ By default, all modules are enabled, so you don't need to change this option.
The Swagger documentation of the client API is hosted at the `/api-docs` path.
You can disable the hosted documentation by setting this option to `false`.

## Handler types: Prometheus - `mongoose_prometheus_handler`

Requires no additional options other than the [common handler options](#common-handler-options) in the listener section.
In order to collect useful metrics, an `[instrumentation.prometheus]` section has to be added to the [instrumentation configuration](../configuration/instrumentation.md).
The default configuration available with MongooseIM is shown in [Example 7](#example-7-prometheus) below.

## Transport options

The options listed below are used to modify the HTTP transport settings.
@@ -334,3 +341,18 @@ REST API for clients.
host = "_"
path = "/api"
```

### Example 7. Prometheus

Prometheus metrics endpoint.

```toml
[[listen.http]]
port = 9091

transport.num_acceptors = 10

[[listen.http.handlers.mongoose_prometheus_handler]]
host = "_"
path = "/metrics"
```
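
The listener above only exposes the endpoint. As a minimal sketch based on the instrumentation example added in this commit, the matching handler is enabled with an otherwise empty section:

```toml
# Enables the Prometheus instrumentation handler; no further options are set here.
[instrumentation.prometheus]
# With the listener from Example 7, metrics would be served on port 9091 under /metrics.
```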
20 changes: 16 additions & 4 deletions doc/modules/mod_csi.md
@@ -25,7 +25,19 @@ Buffer size for messages queued when session was `inactive`.

If you'd like to learn more about metrics in MongooseIM, please visit [MongooseIM metrics](../operation-and-maintenance/MongooseIM-metrics.md) page.

| Name | Type | Description (when it gets incremented) |
|--------------------|--------|----------------------------------------|
| `mod_csi_active` | spiral | A client becomes active. |
| `mod_csi_inactive` | spiral | A client becomes inactive. |
In Prometheus, these metrics carry a `host_type` label.
Since Exometer doesn't support labels, the host type, or the word `global`, is part of the metric name, depending on the [`instrumentation.exometer.all_metrics_are_global`](../configuration/instrumentation.md#instrumentationexometerall_metrics_are_global) option.

=== "Prometheus"

| Name | Type | Description (when it gets incremented) |
|------|------|----------------------------------------|
| `mod_csi_active_count` | counter | A client becomes active. |
| `mod_csi_inactive_count` | counter | A client becomes inactive. |

=== "Exometer"

| Name | Type | Description (when it gets incremented) |
|------|------|----------------------------------------|
| `[HostType, mod_csi_active, count]` | spiral | A client becomes active. |
| `[HostType, mod_csi_inactive, count]` | spiral | A client becomes inactive. |
23 changes: 18 additions & 5 deletions doc/modules/mod_event_pusher_http.md
@@ -110,10 +110,23 @@ Below is an example of what the body of an HTTP POST request can look like:

If you'd like to learn more about metrics in MongooseIM, please visit [MongooseIM metrics](../operation-and-maintenance/MongooseIM-metrics.md) page.

| Name | Type | Description (when it gets incremented) |
| ---- | ---- | -------------------------------------- |
| `[Host, mod_event_pusher_http, sent]` | spiral | An HTTP notification is sent successfully. |
| `[Host, mod_event_pusher_http, failed]` | spiral | An HTTP notification failed. |
| `[Host, mod_event_pusher_http, response_time]` | histogram | Does not include timings of failed requests. |
In Prometheus, these metrics carry a `host_type` label.
Since Exometer doesn't support labels, the host type, or the word `global`, is part of the metric name, depending on the [`instrumentation.exometer.all_metrics_are_global`](../configuration/instrumentation.md#instrumentationexometerall_metrics_are_global) option.

=== "Prometheus"

| Name | Type | Description (when it gets incremented) |
| ---- | ---- | -------------------------------------- |
| `mod_event_pusher_http_sent_count` | counter | An HTTP notification is sent successfully. |
| `mod_event_pusher_http_sent_failure_count` | counter | An HTTP notification failed. |
| `mod_event_pusher_http_sent_response_time` | histogram | Time taken to send HTTP notification. Does not include timings of failed requests. |

=== "Exometer"

| Name | Type | Description (when it gets incremented) |
| ---- | ---- | -------------------------------------- |
| `[Host, mod_event_pusher_http_sent, count]` | spiral | An HTTP notification is sent successfully. |
| `[Host, mod_event_pusher_http_sent, failure_count]` | spiral | An HTTP notification failed. |
| `[Host, mod_event_pusher_http_sent, response_time]` | histogram | Time taken to send HTTP notification. Does not include timings of failed requests. |

[mod_event_pusher]: ./mod_event_pusher.md