If you are looking for a way to monitor MongoDB that covers most of its important metrics, this article introduces one option: the combination of Telegraf and Prometheus.
TL;DR:
First of all, you need to install Prometheus and Telegraf.
Install Prometheus
$ sudo su -
# useradd --no-create-home --shell /bin/false prome
# mkdir /etc/prometheus
# mkdir /var/lib/prometheus
# chown prome:prome /var/lib/prometheus
# wget https://github.com/prometheus/prometheus/releases/download/v2.28.1/prometheus-2.28.1.linux-amd64.tar.gz
# tar -xzvf prometheus-2.28.1.linux-amd64.tar.gz
# cp prometheus-2.28.1.linux-amd64/prometheus /usr/local/bin/
# cp prometheus-2.28.1.linux-amd64/promtool /usr/local/bin/
# chown prome:prome /usr/local/bin/prometheus
# chown prome:prome /usr/local/bin/promtool
# cp -r prometheus-2.28.1.linux-amd64/consoles /etc/prometheus
# cp -r prometheus-2.28.1.linux-amd64/console_libraries /etc/prometheus
# chown -R prome:prome /etc/prometheus/consoles
# chown -R prome:prome /etc/prometheus/console_libraries
# vim /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
# vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prome
Group=prome
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
# systemctl daemon-reload
# systemctl enable prometheus
# systemctl start prometheus
# systemctl status prometheus
prometheus.service - Prometheus
Loaded: loaded (/etc/systemd/system/prometheus.service; disabled; vendor preset: enabled)
Active: active (running) since Thu 2021-07-15 22:31:10 UTC; 3s ago
Process: 3949 ExecStart=/usr/local/bin/prometheus --config.file /etc/prometheus>
Main PID: 3949 (prometheus)
Tasks: 7
Memory: 13.8M
CPU: 470ms
CGroup: /system.slice/prometheus.service
Open the Prometheus UI in a browser at the server's IP address on port 9090.
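Right after a start, Prometheus may need a moment before it answers on port 9090. The following is a minimal sketch of a retry helper for scripted checks. The function name retry and its arguments are hypothetical, not part of Prometheus. The /-/healthy path in the usage comment is Prometheus's health endpoint, and localhost:9090 assumes the default listen address used above.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Usage (assumes curl is installed and Prometheus listens on localhost:9090):
#   retry 10 1 curl -fsS http://localhost:9090/-/healthy
```

The helper only wraps another command, so the same function can be reused later to wait for the Telegraf endpoint.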
Install Telegraf
After installing Prometheus, install Telegraf.
Ubuntu
# wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -
# source /etc/lsb-release
# echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
# apt-get update && sudo apt-get install telegraf
# service telegraf start
CentOS
$ cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF
$ sudo yum install telegraf
$ sudo service telegraf start
Windows
Download the ZIP file from the InfluxData downloads page.
Extract the downloaded ZIP file to C:\Program Files\InfluxData\Telegraf.
Open CMD and run:
> cd C:\Program Files\InfluxData\Telegraf
> .\telegraf.exe -config <path_to_telegraf.conf>
Or install it as a Windows service:
> cd C:\Program Files\InfluxData\Telegraf
> .\telegraf.exe --service install
> .\telegraf.exe --service start
Configure Telegraf to monitor MongoDB
First, identify the IP address and port of the MongoDB instance you want to monitor.
For example, MongoDB is running on 10.10.0.4, port 27017.
Create /etc/telegraf/telegraf.d/mongodb.conf with the following content:
# vim /etc/telegraf/telegraf.d/mongodb.conf
[[inputs.mongodb]]
  servers = [ "mongodb://10.10.0.4:27017" ]
  gather_perdb_stats = true
  gather_col_stats = true
  interval = "10s"
  [inputs.mongodb.ssl]
    enabled = true
  [inputs.mongodb.tags] # add any tag you want
    host = "mongodb.local"
    hostname = "mongodb.local:27017"
    version = "4.4.0"
    service = "mongodb"
[[outputs.prometheus_client]]
  listen = ":9273" # Prometheus Exporter port
  collectors_exclude = ["gocollector", "process"]
  [outputs.prometheus_client.tagpass]
    host = ["mongodb.local"]
We have just configured Telegraf to read MongoDB metrics and expose them on port 9273 for Prometheus to scrape.
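Once Telegraf is running with this configuration, a quick way to confirm that the exporter serves MongoDB data is to look for one known metric in its output. The sketch below assumes the endpoint is reachable at localhost:9273/metrics (the listen port configured above and the plugin's default path). The function name has_metric is hypothetical.

```shell
# has_metric NAME
# Reads Prometheus exposition text on stdin and succeeds if at least one
# sample line belongs to the metric NAME. Comment lines (# HELP, # TYPE)
# and metrics that merely share a prefix with NAME do not count.
has_metric() {
  grep -Eq "^$1(\{| )"
}

# Usage (assumes curl is installed and Telegraf listens on port 9273):
#   curl -s http://localhost:9273/metrics | has_metric mongodb_active_reads \
#     && echo "MongoDB metrics are exposed"
```

If the check fails, running Telegraf once with its --test flag against the same configuration file prints the gathered metrics to the terminal, which helps tell a collection problem apart from an output problem.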
The collected metrics include:
mongodb
tags:
hostname
node_type
rs_name
fields:
active_reads (integer)
active_writes (integer)
aggregate_command_failed (integer)
aggregate_command_total (integer)
assert_msg (integer)
assert_regular (integer)
assert_rollovers (integer)
assert_user (integer)
assert_warning (integer)
available_reads (integer)
available_writes (integer)
commands (integer)
connections_available (integer)
connections_current (integer)
connections_total_created (integer)
count_command_failed (integer)
count_command_total (integer)
cursor_no_timeout_count (integer)
cursor_pinned_count (integer)
cursor_timed_out_count (integer)
cursor_total_count (integer)
delete_command_failed (integer)
delete_command_total (integer)
deletes (integer)
distinct_command_failed (integer)
distinct_command_total (integer)
document_deleted (integer)
document_inserted (integer)
document_returned (integer)
document_updated (integer)
find_and_modify_command_failed (integer)
find_and_modify_command_total (integer)
find_command_failed (integer)
find_command_total (integer)
flushes (integer)
flushes_total_time_ns (integer)
get_more_command_failed (integer)
get_more_command_total (integer)
getmores (integer)
insert_command_failed (integer)
insert_command_total (integer)
inserts (integer)
jumbo_chunks (integer)
latency_commands_count (integer)
latency_commands (integer)
latency_reads_count (integer)
latency_reads (integer)
latency_writes_count (integer)
latency_writes (integer)
member_status (string)
net_in_bytes_count (integer)
net_out_bytes_count (integer)
open_connections (integer)
operation_scan_and_order (integer)
operation_write_conflicts (integer)
page_faults (integer)
percent_cache_dirty (float)
percent_cache_used (float)
queries (integer)
queued_reads (integer)
queued_writes (integer)
repl_apply_batches_num (integer)
repl_apply_batches_total_millis (integer)
repl_apply_ops (integer)
repl_buffer_count (integer)
repl_buffer_size_bytes (integer)
repl_commands (integer)
repl_deletes (integer)
repl_executor_pool_in_progress_count (integer)
repl_executor_queues_network_in_progress (integer)
repl_executor_queues_sleepers (integer)
repl_executor_unsignaled_events (integer)
repl_getmores (integer)
repl_inserts (integer)
repl_lag (integer)
repl_network_bytes (integer)
repl_network_getmores_num (integer)
repl_network_getmores_total_millis (integer)
repl_network_ops (integer)
repl_queries (integer)
repl_updates (integer)
repl_oplog_window_sec (integer)
repl_state (integer)
resident_megabytes (integer)
state (string)
storage_freelist_search_bucket_exhausted (integer)
storage_freelist_search_requests (integer)
storage_freelist_search_scanned (integer)
tcmalloc_central_cache_free_bytes (integer)
tcmalloc_current_allocated_bytes (integer)
tcmalloc_current_total_thread_cache_bytes (integer)
tcmalloc_heap_size (integer)
tcmalloc_max_total_thread_cache_bytes (integer)
tcmalloc_pageheap_commit_count (integer)
tcmalloc_pageheap_committed_bytes (integer)
tcmalloc_pageheap_decommit_count (integer)
tcmalloc_pageheap_free_bytes (integer)
tcmalloc_pageheap_reserve_count (integer)
tcmalloc_pageheap_scavenge_count (integer)
tcmalloc_pageheap_total_commit_bytes (integer)
tcmalloc_pageheap_total_decommit_bytes (integer)
tcmalloc_pageheap_total_reserve_bytes (integer)
tcmalloc_pageheap_unmapped_bytes (integer)
tcmalloc_spinlock_total_delay_ns (integer)
tcmalloc_thread_cache_free_bytes (integer)
tcmalloc_total_free_bytes (integer)
tcmalloc_transfer_cache_free_bytes (integer)
total_available (integer)
total_created (integer)
total_docs_scanned (integer)
total_in_use (integer)
total_keys_scanned (integer)
total_refreshing (integer)
total_tickets_reads (integer)
total_tickets_writes (integer)
ttl_deletes (integer)
ttl_passes (integer)
update_command_failed (integer)
update_command_total (integer)
updates (integer)
uptime_ns (integer)
version (string)
vsize_megabytes (integer)
wtcache_app_threads_page_read_count (integer)
wtcache_app_threads_page_read_time (integer)
wtcache_app_threads_page_write_count (integer)
wtcache_bytes_read_into (integer)
wtcache_bytes_written_from (integer)
wtcache_pages_read_into (integer)
wtcache_pages_requested_from (integer)
wtcache_current_bytes (integer)
wtcache_max_bytes_configured (integer)
wtcache_internal_pages_evicted (integer)
wtcache_modified_pages_evicted (integer)
wtcache_unmodified_pages_evicted (integer)
wtcache_pages_evicted_by_app_thread (integer)
wtcache_pages_queued_for_eviction (integer)
wtcache_server_evicting_pages (integer)
wtcache_tracked_dirty_bytes (integer)
wtcache_worker_thread_evictingpages (integer)
commands_per_sec (integer, deprecated in 1.10; use commands)
cursor_no_timeout (integer, opened/sec, deprecated in 1.10; use cursor_no_timeout_count)
cursor_pinned (integer, opened/sec, deprecated in 1.10; use cursor_pinned_count)
cursor_timed_out (integer, opened/sec, deprecated in 1.10; use cursor_timed_out_count)
cursor_total (integer, opened/sec, deprecated in 1.10; use cursor_total_count)
deletes_per_sec (integer, deprecated in 1.10; use deletes)
flushes_per_sec (integer, deprecated in 1.10; use flushes)
getmores_per_sec (integer, deprecated in 1.10; use getmores)
inserts_per_sec (integer, deprecated in 1.10; use inserts)
net_in_bytes (integer, bytes/sec, deprecated in 1.10; use net_in_bytes_count)
net_out_bytes (integer, bytes/sec, deprecated in 1.10; use net_out_bytes_count)
queries_per_sec (integer, deprecated in 1.10; use queries)
repl_commands_per_sec (integer, deprecated in 1.10; use repl_commands)
repl_deletes_per_sec (integer, deprecated in 1.10; use repl_deletes)
repl_getmores_per_sec (integer, deprecated in 1.10; use repl_getmores)
repl_inserts_per_sec (integer, deprecated in 1.10; use repl_inserts)
repl_queries_per_sec (integer, deprecated in 1.10; use repl_queries)
repl_updates_per_sec (integer, deprecated in 1.10; use repl_updates)
ttl_deletes_per_sec (integer, deprecated in 1.10; use ttl_deletes)
ttl_passes_per_sec (integer, deprecated in 1.10; use ttl_passes)
updates_per_sec (integer, deprecated in 1.10; use updates)
mongodb_db_stats
tags:
db_name
hostname
fields:
avg_obj_size (float)
collections (integer)
data_size (integer)
index_size (integer)
indexes (integer)
num_extents (integer)
objects (integer)
ok (integer)
storage_size (integer)
type (string)
mongodb_col_stats
tags:
hostname
collection
db_name
fields:
size (integer)
avg_obj_size (integer)
storage_size (integer)
total_index_size (integer)
ok (integer)
count (integer)
type (string)
mongodb_shard_stats
tags:
hostname
fields:
in_use (integer)
available (integer)
created (integer)
refreshing (integer)
mongodb_top_stats
tags:
collection
fields:
total_time (integer)
total_count (integer)
read_lock_time (integer)
read_lock_count (integer)
write_lock_time (integer)
write_lock_count (integer)
queries_time (integer)
queries_count (integer)
get_more_time (integer)
get_more_count (integer)
insert_time (integer)
insert_count (integer)
update_time (integer)
update_count (integer)
remove_time (integer)
remove_count (integer)
commands_time (integer)
commands_count (integer)
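The deprecated *_per_sec fields in the list above are replaced by plain counters, so a per-second rate has to be derived from two samples of the same counter. In Prometheus this is normally done at query time with the rate() function. The sketch below only illustrates the arithmetic in shell. The function name counter_rate and the sample numbers are made up for illustration, and treating a drop in the value as a counter reset is a simplifying assumption.

```shell
# counter_rate PREVIOUS CURRENT SECONDS
# Prints the per-second increase of a counter between two samples taken
# SECONDS apart, with two decimal places. If CURRENT is lower than PREVIOUS,
# the counter is assumed to have restarted from zero (for example after a
# mongod restart), so CURRENT itself is used as the increase.
counter_rate() {
  LC_ALL=C awk -v prev="$1" -v cur="$2" -v secs="$3" 'BEGIN {
    if (secs <= 0) exit 1            # an interval must be positive
    delta = cur - prev
    if (delta < 0) delta = cur       # assumed counter reset
    printf "%.2f\n", delta / secs
  }'
}

# Usage: two readings of a counter such as "queries", taken 10 seconds apart
#   counter_rate 1200 1500 10
```

With a 10s collection interval, as configured for the input plugin, consecutive samples are 10 seconds apart, which is why the usage line uses that value.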
Restart Telegraf to apply new configuration:
$ sudo systemctl restart telegraf
Configure Prometheus to scrape MongoDB Metrics
Next, configure Prometheus to scrape the MongoDB metrics from port 9273, which the Telegraf prometheus_client output plugin configured above exposes.
$ sudo vim /etc/prometheus/prometheus.yml
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'mongodb'
    scrape_interval: 10s
    scrape_timeout: 5s
    metrics_path: "/metrics"
    static_configs:
      - targets: ['localhost:9273']
        labels:
          service: mongodb
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: "mongodb_(.+)"
        action: keep
Restart Prometheus to apply the new configuration:
$ sudo systemctl restart prometheus
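The keep rule in the scrape configuration drops every sample whose metric name does not match mongodb_(.+). Prometheus anchors relabel regular expressions at both ends, so the name has to start with mongodb_ and contain at least one more character. The following sketch approximates that filter with grep on exposition text, which can help predict what will remain after relabeling. The function name keep_mongodb_samples and the sample input are hypothetical.

```shell
# keep_mongodb_samples
# Reads Prometheus exposition text on stdin and prints only the sample lines
# whose metric name starts with "mongodb_" followed by at least one character.
# The name ends at the first "{" (labels follow) or space (value follows).
keep_mongodb_samples() {
  grep -E '^mongodb_[^{ ]+(\{| )'
}

# Usage (assumes curl is installed and Telegraf listens on port 9273):
#   curl -s http://localhost:9273/metrics | keep_mongodb_samples
```

This is only an approximation for inspection: the real filtering happens inside Prometheus after each scrape, and comment lines such as # HELP are not part of what the rule evaluates.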
Open the Prometheus UI to query the metrics. They look like this:
mongodb_active_reads{exported_service="mongodb",host="mongodb.local",hostname="mongodb.local:27017",instance="localhost:9273",job="mongodb",member_status="SEC",node_type="SEC",rs_name="atlas-dofij-shard-0",service="mongodb",version="4.2.15"} 1
mongodb_active_writes{exported_service="mongodb",host="mongodb.local",hostname="mongodb.local:27017",instance="localhost:9273",job="mongodb",service="mongodb",version="4.2.15"} 0
This solution supports both standalone MongoDB installations and MongoDB Atlas.