Metrics and monitoring

Dashboard metrics

You can find system metrics on the Crunchy Bridge Dashboard under the Metrics tab.

Info

Metrics are automatically available on any cluster provisioned after September 28, 2022.

If your Metrics graphs are blank, Metrics are likely not enabled for the cluster. You can enable them by initiating a Cluster Refresh maintenance to replace the instance.

Metrics monitored and displayed in the Dashboard include CPU, IOPS, Load average, Memory, and Postgres Connections. Use the expander arrows to view each metric individually in more detail.

CPU

This graph displays processing load broken out into system load, user load, iowait, and percent CPU steal. System CPU time reflects operating system (i.e. kernel) functions while user time reflects processing in the actual running instance of Postgres.

Hobby-tier burst credit exhaustion

Info

Hobby-tier plans have burstable vCPUs. This means that you can temporarily use up to 20x the CPU your instance is allotted, but the CPU will be throttled back to baseline once all burst credits are depleted. This typically appears as a sudden, large drop in performance on a hobby-tier database.

If you're having performance challenges on a hobby-tier cluster, look for spikes in percent CPU steal in the CPU graph, typically following a spike in another measure of CPU load. High percent CPU steal indicates burst credit exhaustion:

Burst credits will accumulate again over time, but you may need to upgrade your cluster to achieve more consistent performance. Review plans and pricing to determine which tier is right for your use case.

Note: If you don't see % CPU steal in your cluster Metrics, you may need to refresh your cluster to receive the latest Crunchy Bridge features.

IOPS

IOPS (input/output operations per second) is displayed as I/O RTPS (read transactions per second) and I/O WTPS (write transactions per second). IOPS capacity varies by plan; the Plans and Pricing page shows the specifications of each plan.

To determine which queries are contributing to IOPS usage, look for those that read heavily from disk. Crunchy Bridge runs pg_stat_statements by default on all instances, so statistics are available to review.
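As a quick sanity check before digging in, you can confirm that the extension is installed and see how many distinct statements it is currently tracking (standard Postgres catalog and view names):

```sql
-- Confirm pg_stat_statements is installed, and count the
-- distinct statements it is currently tracking.
SELECT extversion FROM pg_extension WHERE extname = 'pg_stat_statements';
SELECT count(*) AS tracked_statements FROM pg_stat_statements;
```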

You can query pg_stat_statements to look for a low hit rate on shared blocks, which would indicate that more data is read proportionally from disk than is being provided by cache: low shared_blks_hit / (shared_blks_hit + shared_blks_read).

Here's an example query you can use to find queries with a low hit rate:

SELECT
	pd.datname AS DB_Name
	,pss.rows AS Total_Row_Count
	,(pss.total_exec_time / 1000 / 60) AS Total_Exec_Mins
	,((pss.total_exec_time / 1000 / 60) / calls) AS Avg_Exec_Mins
	,calls
	-- fraction of shared-buffer reads served from cache; NULL if no reads
	,shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0)::float AS Hit_Rate
	,queryid
FROM pg_stat_statements AS pss
INNER JOIN pg_database AS pd
	ON pss.dbid = pd.oid
WHERE calls > 1000
ORDER BY 6   -- lowest Hit_Rate first
LIMIT 10;

To dig into a query shown in the output, you can run the following statement with a given queryid:

SELECT query FROM pg_stat_statements WHERE queryid = <queryid>;

Query and index tuning can be a big help in increasing the hit rate on the cache and thereby reducing IOPS usage. For a deeper dive, check out Query Optimization in Postgres with pg_stat_statements on the blog.
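For a cluster-wide view of the same idea, you can also compute an overall cache hit ratio from pg_stat_database. A minimal sketch using standard Postgres statistics views; values near 0.99 are typical for a well-cached workload:

```sql
-- Database-wide buffer cache hit ratio: blocks found in shared_buffers
-- divided by all block requests (cache hits + disk reads).
SELECT datname,
       blks_hit::float / nullif(blks_hit + blks_read, 0) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```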

Load average

Load average shows average CPU load over the indicated time period. A load average equal to your vCPU count indicates full utilization of all CPUs. A load average in excess of your CPU count means that processes had to wait for CPU time, with higher values meaning more time spent waiting.

The number of vCPUs varies by plan; check the Plans and Pricing page for details about specific plans. If you are consistently seeing a high load average, look at tuning expensive queries or consider upgrading to a larger plan.

Memory

This shows the amount of process memory and the amount of swap you are using based on the plan you have provisioned. Check the Plans and Pricing page for details about specific plans.

Note that swap usage is not necessarily a bad thing. However, if you often need swap and your baseline memory usage is high, you likely need additional memory.

Postgres uses memory at a few different levels. If you're interested in the details, check out our blog post on data storage and flow.

Additionally, Postgres memory usage can be tricky to interpret; there are three main areas at play.

Process Memory - This is memory being taken up by each backend process for its own use, including:

  • the main Postmaster process
  • utility processes (checkpointer, archiver, autovacuum launcher, etc)
  • any client processes, i.e. those executing query statements

These processes allocate (by default) 4 MB each for process memory, but they also reserve additional memory based on parameters like work_mem, maintenance_work_mem, and temp_buffers.

Shared Memory - This is memory used by all processes for data and transaction log caching. That is the sum of shared_buffers, wal_buffers, CLOG_buffers, etc. By default we allocate 25% of system memory to shared_buffers.
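To see how these knobs are set on your own cluster, you can query pg_settings. A sketch using standard Postgres parameter names (CLOG buffers are internal and have no user-visible setting):

```sql
-- Current values of the main memory-related parameters.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'wal_buffers',
               'work_mem', 'maintenance_work_mem', 'temp_buffers');
```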

Kernel Memory - Memory not being used by Postgres processes is generally used by the kernel for disk cache. The kernel is (generally) smarter about what to keep and what to push to disk.

Info

The memory graph on your Crunchy Bridge dashboard currently shows a memory_used metric which includes all memory allocated by processes. The PostgreSQL server process allocates various buffers shared by all processes, so this value includes the sum of all the Process Memory and Shared Memory described above.

On Standard and Memory instances this will usually account for 25-30% of memory usage, although it may be larger if you have a high connection count or memory-intensive query activity. However, on Hobby instances process memory represents a larger fraction of overall memory usage, and it's not uncommon to see this value consistently reporting 80-85% of memory in use.

The important thing to note is that Linux makes intelligent use of available memory, using it to reduce load on disks. If processes need the memory, the OS will give up some of its disk cache.

Postgres connections

Shows the number of connected Postgres clients over time. The Y-axis scales to your current max_connections setting, which can be changed by updating the max_connections parameter.
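You can compare the live connection count against that limit yourself. A minimal sketch using pg_stat_activity (the backend_type column is available in Postgres 10 and later):

```sql
-- Current client backends versus the configured connection limit.
SELECT count(*) AS current_connections,
       current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```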

Crunchy Bridge offers built-in connection pooling. See Your Guide to Connection Management in Postgres on our blog for more details.