GeekGuy

Monitoring MongoDB with Telegraf and Prometheus

Updated: Jan 25

This article is for anyone looking for a way to monitor MongoDB that covers most of the important MongoDB metrics.



In this article, we will introduce the combination of Telegraf and Prometheus.

 

TL;DR:

 


First of all, you need to install Prometheus and Telegraf.

Install Prometheus

$ sudo su -
# useradd --no-create-home --shell /bin/false prome
# mkdir /etc/prometheus
# mkdir /var/lib/prometheus
# chown prome:prome /var/lib/prometheus
# wget https://github.com/prometheus/prometheus/releases/download/v2.28.1/prometheus-2.28.1.linux-amd64.tar.gz
# tar -xzvf prometheus-2.28.1.linux-amd64.tar.gz
# cp prometheus-2.28.1.linux-amd64/prometheus /usr/local/bin/
# cp prometheus-2.28.1.linux-amd64/promtool /usr/local/bin/
# chown prome:prome /usr/local/bin/prometheus
# chown prome:prome /usr/local/bin/promtool
# cp -r prometheus-2.28.1.linux-amd64/consoles /etc/prometheus
# cp -r prometheus-2.28.1.linux-amd64/console_libraries /etc/prometheus
# chown -R prome:prome /etc/prometheus/consoles
# chown -R prome:prome /etc/prometheus/console_libraries
# vim /etc/prometheus/prometheus.yml

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
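
Before starting the service, it can help to validate the file. The following is a minimal sketch, assuming promtool was copied to /usr/local/bin as in the steps above; check_prometheus_config is a hypothetical helper name, not something Prometheus ships.

```shell
# Hypothetical helper: validate a Prometheus config file with promtool.
# Assumes promtool is on PATH (it was copied to /usr/local/bin above).
check_prometheus_config() {
  config_file="${1:-/etc/prometheus/prometheus.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  # "promtool check config" exits non-zero when the file is invalid.
  promtool check config "$config_file"
}
```

Calling check_prometheus_config with no argument checks /etc/prometheus/prometheus.yml. A non-zero exit status means the file should be fixed before the service is started.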

# vim /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prome
Group=prome
Type=simple
ExecStart=/usr/local/bin/prometheus \
 --config.file /etc/prometheus/prometheus.yml \
 --storage.tsdb.path /var/lib/prometheus/ \
 --web.console.templates=/etc/prometheus/consoles \
 --web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target

# systemctl daemon-reload
# systemctl enable prometheus
# systemctl start prometheus
# systemctl status prometheus

prometheus.service - Prometheus
 Loaded: loaded (/etc/systemd/system/prometheus.service; disabled; vendor preset: enabled)
 Active: active (running) since Thu 2021-07-15 22:31:10 UTC; 3s ago
 Process: 3949 ExecStart=/usr/local/bin/prometheus --config.file /etc/prometheus>
 Main PID: 3949 (prometheus)
 Tasks: 7
 Memory: 13.8M
 CPU: 470ms
 CGroup: /system.slice/prometheus.service

Access the Prometheus UI at your server's IP address on port 9090.
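
If no browser is at hand, the server can also be checked from the command line. This is a sketch, assuming curl is installed and Prometheus listens on the default port 9090; both helper names are hypothetical. /-/healthy is the health endpoint built into Prometheus.

```shell
# Hypothetical helper: build the URL of a Prometheus management endpoint.
prometheus_url() {
  prom_host="${1:-localhost}"
  prom_port="${2:-9090}"
  prom_endpoint="${3:-/-/healthy}"
  printf 'http://%s:%s%s\n' "$prom_host" "$prom_port" "$prom_endpoint"
}

# Hypothetical helper: return 0 if the health endpoint answers successfully.
# --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
prometheus_is_healthy() {
  curl --silent --fail --max-time 5 --output /dev/null \
    "$(prometheus_url "$1" "$2" /-/healthy)"
}

# Usage:
#   prometheus_is_healthy 10.10.0.4 9090 && echo "Prometheus is up"
```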



Install Telegraf

After installing Prometheus, we need to install Telegraf.

Ubuntu

# wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -
# source /etc/lsb-release
# echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
# apt-get update && sudo apt-get install telegraf
# service telegraf start

CentOS

$ cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF
$ sudo yum install telegraf
$ sudo service telegraf start

Windows

Download the ZIP file from the InfluxData downloads page.

Extract the downloaded ZIP file to C:\Program Files\InfluxData\Telegraf.

Open CMD and run:

> cd C:\Program Files\InfluxData\Telegraf
> .\telegraf.exe -config <path_to_telegraf.conf>

Or install it as a Windows service:

> cd C:\Program Files\InfluxData\Telegraf
> .\telegraf.exe --service install
> .\telegraf.exe --service start

Configure Telegraf to monitor MongoDB

First, identify the IP address and port of the MongoDB instance you want to monitor.

For example, MongoDB is running on 10.10.0.4, port 27017.

Create /etc/telegraf/telegraf.d/mongodb.conf with the following content:

# vim /etc/telegraf/telegraf.d/mongodb.conf

[[inputs.mongodb]]
  servers = [ "mongodb://10.10.0.4:27017" ]
  gather_perdb_stats = true
  gather_col_stats = true
  interval = "10s"
  [inputs.mongodb.ssl]
    enabled = true
  [inputs.mongodb.tags] # add any tag you want
    host = "mongodb.local"
    hostname = "mongodb.local:27017"
    version = "4.4.0"
    service = "mongodb"
[[outputs.prometheus_client]]
  listen = ":9273" # Prometheus Exporter port
  collectors_exclude = ["gocollector", "process"]
  [outputs.prometheus_client.tagpass]
    host = ["mongodb.local" ]

We have just configured Telegraf to read MongoDB metrics and expose them on port 9273 for Prometheus to scrape.
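
Before restarting the service, the new input can be tried once in the foreground. This is a rough sketch, assuming the telegraf binary is on PATH and the file above is the only config needed for the check; telegraf_dry_run is a hypothetical helper name. The --test flag makes Telegraf gather metrics once and print them to stdout instead of sending them to any output.

```shell
# Hypothetical helper: run the MongoDB input once and print what it gathers.
telegraf_dry_run() {
  conf="${1:-/etc/telegraf/telegraf.d/mongodb.conf}"
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 1
  fi
  # --input-filter limits the run to the mongodb input plugin.
  telegraf --config "$conf" --input-filter mongodb --test
}

# Usage:
#   telegraf_dry_run /etc/telegraf/telegraf.d/mongodb.conf
```

If the connection string or TLS settings are wrong, the error normally shows up here rather than only in the service log.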

The collected metrics include:

  • mongodb

    • tags:

      • hostname

      • node_type

      • rs_name

    • fields:

      • active_reads (integer)

      • active_writes (integer)

      • aggregate_command_failed (integer)

      • aggregate_command_total (integer)

      • assert_msg (integer)

      • assert_regular (integer)

      • assert_rollovers (integer)

      • assert_user (integer)

      • assert_warning (integer)

      • available_reads (integer)

      • available_writes (integer)

      • commands (integer)

      • connections_available (integer)

      • connections_current (integer)

      • connections_total_created (integer)

      • count_command_failed (integer)

      • count_command_total (integer)

      • cursor_no_timeout_count (integer)

      • cursor_pinned_count (integer)

      • cursor_timed_out_count (integer)

      • cursor_total_count (integer)

      • delete_command_failed (integer)

      • delete_command_total (integer)

      • deletes (integer)

      • distinct_command_failed (integer)

      • distinct_command_total (integer)

      • document_deleted (integer)

      • document_inserted (integer)

      • document_returned (integer)

      • document_updated (integer)

      • find_and_modify_command_failed (integer)

      • find_and_modify_command_total (integer)

      • find_command_failed (integer)

      • find_command_total (integer)

      • flushes (integer)

      • flushes_total_time_ns (integer)

      • get_more_command_failed (integer)

      • get_more_command_total (integer)

      • getmores (integer)

      • insert_command_failed (integer)

      • insert_command_total (integer)

      • inserts (integer)

      • jumbo_chunks (integer)

      • latency_commands_count (integer)

      • latency_commands (integer)

      • latency_reads_count (integer)

      • latency_reads (integer)

      • latency_writes_count (integer)

      • latency_writes (integer)

      • member_status (string)

      • net_in_bytes_count (integer)

      • net_out_bytes_count (integer)

      • open_connections (integer)

      • operation_scan_and_order (integer)

      • operation_write_conflicts (integer)

      • page_faults (integer)

      • percent_cache_dirty (float)

      • percent_cache_used (float)

      • queries (integer)

      • queued_reads (integer)

      • queued_writes (integer)

      • repl_apply_batches_num (integer)

      • repl_apply_batches_total_millis (integer)

      • repl_apply_ops (integer)

      • repl_buffer_count (integer)

      • repl_buffer_size_bytes (integer)

      • repl_commands (integer)

      • repl_deletes (integer)

      • repl_executor_pool_in_progress_count (integer)

      • repl_executor_queues_network_in_progress (integer)

      • repl_executor_queues_sleepers (integer)

      • repl_executor_unsignaled_events (integer)

      • repl_getmores (integer)

      • repl_inserts (integer)

      • repl_lag (integer)

      • repl_network_bytes (integer)

      • repl_network_getmores_num (integer)

      • repl_network_getmores_total_millis (integer)

      • repl_network_ops (integer)

      • repl_queries (integer)

      • repl_updates (integer)

      • repl_oplog_window_sec (integer)

      • repl_state (integer)

      • resident_megabytes (integer)

      • state (string)

      • storage_freelist_search_bucket_exhausted (integer)

      • storage_freelist_search_requests (integer)

      • storage_freelist_search_scanned (integer)

      • tcmalloc_central_cache_free_bytes (integer)

      • tcmalloc_current_allocated_bytes (integer)

      • tcmalloc_current_total_thread_cache_bytes (integer)

      • tcmalloc_heap_size (integer)

      • tcmalloc_max_total_thread_cache_bytes (integer)

      • tcmalloc_pageheap_commit_count (integer)

      • tcmalloc_pageheap_committed_bytes (integer)

      • tcmalloc_pageheap_decommit_count (integer)

      • tcmalloc_pageheap_free_bytes (integer)

      • tcmalloc_pageheap_reserve_count (integer)

      • tcmalloc_pageheap_scavenge_count (integer)

      • tcmalloc_pageheap_total_commit_bytes (integer)

      • tcmalloc_pageheap_total_decommit_bytes (integer)

      • tcmalloc_pageheap_total_reserve_bytes (integer)

      • tcmalloc_pageheap_unmapped_bytes (integer)

      • tcmalloc_spinlock_total_delay_ns (integer)

      • tcmalloc_thread_cache_free_bytes (integer)

      • tcmalloc_total_free_bytes (integer)

      • tcmalloc_transfer_cache_free_bytes (integer)

      • total_available (integer)

      • total_created (integer)

      • total_docs_scanned (integer)

      • total_in_use (integer)

      • total_keys_scanned (integer)

      • total_refreshing (integer)

      • total_tickets_reads (integer)

      • total_tickets_writes (integer)

      • ttl_deletes (integer)

      • ttl_passes (integer)

      • update_command_failed (integer)

      • update_command_total (integer)

      • updates (integer)

      • uptime_ns (integer)

      • version (string)

      • vsize_megabytes (integer)

      • wtcache_app_threads_page_read_count (integer)

      • wtcache_app_threads_page_read_time (integer)

      • wtcache_app_threads_page_write_count (integer)

      • wtcache_bytes_read_into (integer)

      • wtcache_bytes_written_from (integer)

      • wtcache_pages_read_into (integer)

      • wtcache_pages_requested_from (integer)

      • wtcache_current_bytes (integer)

      • wtcache_max_bytes_configured (integer)

      • wtcache_internal_pages_evicted (integer)

      • wtcache_modified_pages_evicted (integer)

      • wtcache_unmodified_pages_evicted (integer)

      • wtcache_pages_evicted_by_app_thread (integer)

      • wtcache_pages_queued_for_eviction (integer)

      • wtcache_server_evicting_pages (integer)

      • wtcache_tracked_dirty_bytes (integer)

      • wtcache_worker_thread_evictingpages (integer)

      • commands_per_sec (integer, deprecated in 1.10; use commands)

      • cursor_no_timeout (integer, opened/sec, deprecated in 1.10; use cursor_no_timeout_count)

      • cursor_pinned (integer, opened/sec, deprecated in 1.10; use cursor_pinned_count)

      • cursor_timed_out (integer, opened/sec, deprecated in 1.10; use cursor_timed_out_count)

      • cursor_total (integer, opened/sec, deprecated in 1.10; use cursor_total_count)

      • deletes_per_sec (integer, deprecated in 1.10; use deletes)

      • flushes_per_sec (integer, deprecated in 1.10; use flushes)

      • getmores_per_sec (integer, deprecated in 1.10; use getmores)

      • inserts_per_sec (integer, deprecated in 1.10; use inserts)

      • net_in_bytes (integer, bytes/sec, deprecated in 1.10; use net_in_bytes_count)

      • net_out_bytes (integer, bytes/sec, deprecated in 1.10; use net_out_bytes_count)

      • queries_per_sec (integer, deprecated in 1.10; use queries)

      • repl_commands_per_sec (integer, deprecated in 1.10; use repl_commands)

      • repl_deletes_per_sec (integer, deprecated in 1.10; use repl_deletes)

      • repl_getmores_per_sec (integer, deprecated in 1.10; use repl_getmores)

      • repl_inserts_per_sec (integer, deprecated in 1.10; use repl_inserts)

      • repl_queries_per_sec (integer, deprecated in 1.10; use repl_queries)

      • repl_updates_per_sec (integer, deprecated in 1.10; use repl_updates)

      • ttl_deletes_per_sec (integer, deprecated in 1.10; use ttl_deletes)

      • ttl_passes_per_sec (integer, deprecated in 1.10; use ttl_passes)

      • updates_per_sec (integer, deprecated in 1.10; use updates)

  • mongodb_db_stats

    • tags:

      • db_name

      • hostname

    • fields:

      • avg_obj_size (float)

      • collections (integer)

      • data_size (integer)

      • index_size (integer)

      • indexes (integer)

      • num_extents (integer)

      • objects (integer)

      • ok (integer)

      • storage_size (integer)

      • type (string)

  • mongodb_col_stats

    • tags:

      • hostname

      • collection

      • db_name

    • fields:

      • size (integer)

      • avg_obj_size (integer)

      • storage_size (integer)

      • total_index_size (integer)

      • ok (integer)

      • count (integer)

      • type (string)

  • mongodb_shard_stats

    • tags:

      • hostname

    • fields:

      • in_use (integer)

      • available (integer)

      • created (integer)

      • refreshing (integer)

  • mongodb_top_stats

    • tags:

      • collection

    • fields:

      • total_time (integer)

      • total_count (integer)

      • read_lock_time (integer)

      • read_lock_count (integer)

      • write_lock_time (integer)

      • write_lock_count (integer)

      • queries_time (integer)

      • queries_count (integer)

      • get_more_time (integer)

      • get_more_count (integer)

      • insert_time (integer)

      • insert_count (integer)

      • update_time (integer)

      • update_count (integer)

      • remove_time (integer)

      • remove_count (integer)

      • commands_time (integer)

      • commands_count (integer)

Restart Telegraf to apply new configuration:

$ sudo systemctl restart telegraf
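
Once Telegraf is back up, a quick check is to confirm that the exporter port actually serves MongoDB samples. This is a small sketch, assuming curl is installed and the listener is on port 9273 as configured above; count_mongodb_samples is a hypothetical helper name.

```shell
# Hypothetical helper: count sample lines whose metric name starts with
# "mongodb_" in Prometheus text-format input read from stdin.
# Comment lines (# HELP / # TYPE) are not counted.
count_mongodb_samples() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" hides that.
  grep -c '^mongodb_' || true
}

# Usage:
#   curl -s http://localhost:9273/metrics | count_mongodb_samples
```

A result of 0 suggests that Telegraf cannot reach MongoDB or that the tagpass filter on the output does not match the tags set on the input.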

Configure Prometheus to scrape MongoDB Metrics

We will configure Prometheus to scrape the MongoDB metrics that the Telegraf output plugin above exposes on port 9273.

$ sudo vim /etc/prometheus/prometheus.yml

global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'mongodb'
    scrape_interval: 10s
    scrape_timeout:  5s
    metrics_path: "/metrics"
    static_configs:
    - targets: ['localhost:9273']
      labels:
        service: mongodb
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: "mongodb_(.+)"
      action: keep
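
To see what the keep rule lets through, its effect can be approximated on a list of metric names. This is only an illustration, assuming one metric name per line on stdin; filter_kept_metrics is a hypothetical helper, and grep stands in for Prometheus's own regex engine. Prometheus anchors relabel regexes at both ends, which the sketch mimics with ^ and $.

```shell
# Hypothetical helper: print only the metric names that the rule
# regex "mongodb_(.+)" with action "keep" would retain.
filter_kept_metrics() {
  # Anchored like Prometheus does; "|| true" avoids a failure exit
  # status when no name matches.
  grep -E '^mongodb_(.+)$' || true
}

# Usage:
#   printf 'mongodb_active_reads\ngo_goroutines\n' | filter_kept_metrics
```

Everything else served by the exporter is dropped after the scrape, so only series beginning with mongodb_ are stored.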

Restart Prometheus to apply the new configuration:

$ sudo systemctl restart prometheus

Open the Prometheus UI to query the metrics. They will look like this:

mongodb_active_reads{exported_service="mongodb",host="mongodb.local",hostname="mongodb.local:27017",instance="localhost:9273",job="mongodb",member_status="SEC",node_type="SEC",rs_name="atlas-dofij-shard-0",service="mongodb",version="4.2.15"} 1
mongodb_active_writes{exported_service="mongodb",host="mongodb.local",hostname="mongodb.local:27017",instance="localhost:9273",job="mongodb",service="mongodb",version="4.2.15"} 0
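
The same data can be fetched without the UI through the Prometheus HTTP API. This is a sketch, assuming curl is available and Prometheus runs on localhost:9090; promql_query is a hypothetical helper name, while /api/v1/query is the standard instant-query endpoint.

```shell
# Hypothetical helper: run an instant PromQL query and print the JSON reply.
#   $1 = PromQL expression
#   $2 = Prometheus base URL (optional, defaults to http://localhost:9090)
promql_query() {
  query="$1"
  base_url="${2:-http://localhost:9090}"
  # --get sends the URL-encoded expression as a query-string parameter.
  curl --silent --get --data-urlencode "query=${query}" \
    "${base_url}/api/v1/query"
}

# Usage:
#   promql_query 'mongodb_connections_current{job="mongodb"}'
```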

This solution supports both standalone MongoDB installations and MongoDB Atlas.




