Metrics and Monitoring
Dashboard Metrics
You can find system metrics on the Crunchy Bridge Dashboard under the Metrics tab.
Info
Metrics are automatically available on any cluster provisioned after September 28, 2022.
If your Metrics graphs are blank, Metrics are likely not enabled for the cluster. You can enable them by initiating a Cluster Refresh maintenance to replace the instance.
Metrics monitored and displayed in the Dashboard include CPU, IOPS, Load average, Memory, and Postgres Connections. Use the expander arrows to view each metric individually in more detail.
CPU
This graph displays processing load broken out into system load and user load. System CPU time reflects operating system (i.e. kernel) functions while user time reflects processing in the actual running instance of Postgres.
Info
Note: Hobby-tier plans have burstable vCPUs. This means that you can temporarily use 20x more CPU than your instance is allotted, but the CPU will be throttled back to baseline when all burst credits are depleted. This is likely to manifest as a huge drop in performance on a hobby-tier database.
We're currently unable to display a cluster's burst balance in the Dashboard, but you can always reach out to support if you'd like to discuss burst balance or instance performance.
IOPS
IOPS (input/output operations per second) is reported as I/O RTPS (read transactions per second) and I/O WTPS (write transactions per second). IOPS capacity varies by plan. The Plans and Pricing page shows the specifications of each plan.
To determine which queries are contributing to IOPS usage, look for ones that
use a lot of disk. Crunchy Bridge runs pg_stat_statements by default on all
instances, so statistics are available to review.
You can query pg_stat_statements to look for a low hit rate on shared blocks,
which would indicate that more data is read proportionally from disk than is
being provided by cache: a low shared_blks_hit / (shared_blks_hit + shared_blks_read).
Here's an example query you can use to find queries with a low hit rate:
SELECT
    pd.datname AS DB_Name
    ,pss.rows AS Total_Row_Count
    ,(pss.total_exec_time / 1000 / 60) AS Total_Exec_Mins
    ,((pss.total_exec_time / 1000 / 60) / calls) AS Total_Avg_Exec_Time
    ,calls
    ,shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0)::float AS Hit_Rate
    ,queryid
FROM pg_stat_statements AS pss
INNER JOIN pg_database AS pd
    ON pss.dbid = pd.oid
WHERE calls > 1000
ORDER BY 6
LIMIT 10;
To dig into a query shown in the output, you can run the following statement
with a given queryid:
select query from pg_stat_statements where queryid = <queryid>;
Query and index tuning can be a big help in increasing the hit rate on the cache
and thereby reducing IOPS usage. For a deeper dive, check out
Query Optimization in Postgres with pg_stat_statements
on the blog.
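As a complement to the hit-rate query above, it can help to rank statements by how many blocks they actually read from disk, since a statement with a poor hit rate but few reads contributes little to IOPS. The following is a minimal sketch that uses only standard pg_stat_statements columns; the 80-character truncation and the limit of 10 are arbitrary choices, not recommendations.

```sql
-- Sketch: statements ranked by blocks read from disk (not served from cache).
SELECT
    queryid,
    calls,
    shared_blks_read,
    shared_blks_hit,
    -- nullif() avoids division by zero for statements that touched no shared blocks
    round(100.0 * shared_blks_hit
          / nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct,
    left(query, 80) AS query_start  -- truncated for readability
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;
```

Statements near the top of this list that also show a low hit_pct are usually the most promising candidates for query or index tuning.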
Load average
Load average shows average CPU load over the indicated time period. A load average equal to your vCPU count indicates full utilization of all CPUs. A load average in excess of your CPU count means that processes had to wait for CPU time, with higher values meaning more time spent waiting.
The number of vCPUs varies by plan. Check the Plans and Pricing page for details about specific plans. If you are consistently seeing a high load average, you should look at tuning expensive queries or consider upgrading to a larger plan.
Memory
This shows the amount of process memory and the amount of swap you are using based on the plan you have provisioned. Check the Plans and Pricing page for details about specific plans.
Note that swap usage is not necessarily a bad thing. However, if you’re often needing swap and your baseline memory usage is high, you likely need additional memory.
Postgres uses memory at a few different levels. If you're interested in the details, check out our blog post on data storage and flow.
Postgres memory usage can also be tricky to interpret. There are three main things at play.
Process Memory - This is memory being taken up by each backend process for its own use, including:
- the main Postmaster process
- utility processes (checkpointer, archiver, autovacuum launcher, etc)
- any client processes, i.e. those executing query statements
These processes allocate (by default) 4 MB each for process memory, but they
also reserve additional memory based on parameters like work_mem,
maintenance_work_mem, and temp_buffers.
Shared Memory - This is memory used by all processes for data and
transaction log caching. That is the sum of shared_buffers, wal_buffers,
CLOG_buffers, etc. By default we allocate 25% of system memory to
shared_buffers.
Kernel Memory - Memory not being used by Postgres processes is generally used by the kernel for disk cache. The kernel is (generally) smarter about what to keep and what to push to disk.
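To see how these parameters are currently set on a cluster, you can read them from pg_settings. This is a small sketch covering a subset of the parameters named above; the list of names is illustrative and can be adjusted.

```sql
-- Sketch: current values of some memory-related parameters.
-- "unit" shows how to interpret "setting" (for example 8kB pages or kB).
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN (
    'work_mem',
    'maintenance_work_mem',
    'temp_buffers',
    'shared_buffers',
    'wal_buffers'
)
ORDER BY name;
```

The context column indicates when a change takes effect; a value of postmaster means the parameter can only change at server start.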
Info
The memory graph on your Crunchy Bridge dashboard currently shows a memory_used metric which includes all memory allocated by processes. The PostgreSQL server process allocates various buffers shared by all processes, so this value includes the sum of all the Process Memory and Shared Memory described above.
On Standard and Memory instances this will usually account for 25%-30% of memory usage, although it may be larger if you have a high connection count or query activity which consumes a lot of memory. However, on Hobby instances this process memory will represent a larger fraction of overall memory usage, and it's not uncommon to see this value consistently reporting 80-85% of memory in use.
The important thing to note is that Linux makes intelligent use of available memory, using it to reduce load on disks. If processes need the memory, the OS will give up some of its disk cache.
Postgres Connections
Shows the number of connected Postgres clients over time. The Y axis uses your
existing setting for max_connections. This can be altered with
ALTER SYSTEM SET max_connections.
Crunchy Bridge offers built-in connection pooling. See Your Guide to Connection Management in Postgres on our blog for more details.
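If you want to compare current connections against the limit from a SQL session rather than the Dashboard, something like the following may help. It is a sketch: the filter on client backends is a choice made here so that background processes are not counted, and the value 200 in the commented ALTER SYSTEM line is purely a placeholder.

```sql
-- Sketch: client connections in use compared with max_connections.
SELECT
    count(*) AS connections,
    current_setting('max_connections')::int AS max_connections,
    round(100.0 * count(*) / current_setting('max_connections')::int, 1) AS pct_used
FROM pg_stat_activity
WHERE backend_type = 'client backend';

-- Placeholder value; pick one appropriate for your plan.
-- In Postgres, max_connections only takes effect after a server restart.
-- ALTER SYSTEM SET max_connections = 200;
```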
Checking metrics with pg_proctab
pg_proctab is a set of stored functions installed as a Postgres extension that
will let you access operating system statistics from the underlying server, such
as I/O, processor load, and memory usage.
You can enable pg_proctab for Crunchy Bridge by running:
CREATE EXTENSION pg_proctab;
Once you've enabled the extension, you have access to new functions to monitor system metrics.
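If you are unsure whether the extension is already present in the current database, a quick check against the pg_extension catalog looks roughly like this. It is a sketch using only standard catalog columns.

```sql
-- Sketch: confirm pg_proctab is installed in the current database.
-- Returns one row with the installed version, or no rows if it is missing.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_proctab';
```

Extensions are created per database, so the check needs to run in the same database where you plan to call the functions.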
System load
You can query load using pg_proctab by running:
SELECT * FROM pg_loadavg();
pg_proctab will provide you with:
- load1: load average of the last minute
- load5: load average of the last 5 minutes
- load15: load average of the last 15 minutes
- last_pid: last PID running
Load is reported relative to the number of CPUs you have, so you'll need to know the total number of cores/virtual CPUs available. As the load numbers get closer to the number of CPUs, you are running on the high end of load. On a single-core machine, you want the number below 1.0. With 4 cores, you want the number to be below 4.0.
The query below gives your load as a percentage on a 4-core setup.
SELECT load15/4*100
FROM pg_loadavg();
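The same calculation can be applied to all three load averages at once. In this sketch the vCPU count is kept in a single place; the params CTE and its vcpus value of 4 are assumptions made for illustration, so replace the number with the vCPU count of your plan.

```sql
-- Sketch: load averages as a percentage of available vCPUs.
WITH params AS (
    SELECT 4 AS vcpus  -- assumption: set this to your plan's vCPU count
)
SELECT
    round((load1  / vcpus * 100)::numeric, 1) AS load1_pct,
    round((load5  / vcpus * 100)::numeric, 1) AS load5_pct,
    round((load15 / vcpus * 100)::numeric, 1) AS load15_pct
FROM pg_loadavg(), params;
```

Values that stay near or above 100 mean processes are regularly waiting for CPU time.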
Memory usage
To query memory usage you can run:
SELECT *
FROM pg_memusage();
The above will return:
memused
memfree
memshared
membuffers
memcached
swapused
swapfree
swapcached
pg_memusage() reports memory used, free, shared, buffered, and cached, along with swap used, free, and cached. If you start regularly seeing swap used, it's time to look at upgrading to a plan with additional memory.
Here's an easy one to check the free memory:
SELECT pg_size_pretty(memfree*1024)
FROM pg_memusage();
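Since regular swap usage is the signal to watch for, a similar query can report how much swap is in use. This sketch assumes, as the memfree example above does, that pg_memusage() reports values in kilobytes, and it treats swapused plus swapfree as total swap.

```sql
-- Sketch: swap in use, assuming values are reported in kB.
SELECT
    pg_size_pretty(swapused * 1024) AS swap_used,
    pg_size_pretty((swapused + swapfree) * 1024) AS swap_total,
    -- nullif() returns NULL instead of failing when no swap is configured
    round(100.0 * swapused / nullif(swapused + swapfree, 0), 1) AS swap_used_pct
FROM pg_memusage();
```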
CPU time and I/O
To monitor CPU and I/O you can run:
SELECT *
FROM pg_cputime();
This is going to get you:
- user
- nice
- system
- idle
- iowait
User covers all non-system programs using CPU, system covers system/kernel CPU use, and idle is the time the CPU spent idle. You can get pretty close to CPU usage by dividing idle by the total of idle, user, and system to get the idle percentage, then subtracting that from 100. The example below does that, multiplying by 100.0 first so that the integer counters are not truncated by integer division:
SELECT 100 - (100.0 * idle / (idle + "user" + system)) AS cpu_used_pct
FROM pg_cputime();
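To see where CPU time is going, including time spent waiting on I/O, each counter can be expressed as a share of the total. This is a sketch under the assumption that the five columns are cumulative counters, which would make the result an average over the server's uptime rather than a snapshot of the current moment.

```sql
-- Sketch: each pg_cputime() counter as a percentage of the total.
-- Multiplying by 100.0 forces numeric arithmetic instead of integer division.
SELECT
    round(100.0 * "user" / total, 2) AS user_pct,
    round(100.0 * nice   / total, 2) AS nice_pct,
    round(100.0 * system / total, 2) AS system_pct,
    round(100.0 * idle   / total, 2) AS idle_pct,
    round(100.0 * iowait / total, 2) AS iowait_pct
FROM (
    SELECT *,
           nullif("user" + nice + system + idle + iowait, 0) AS total
    FROM pg_cputime()
) AS t;
```

A consistently high iowait_pct points toward disk pressure, which ties back to the IOPS section above.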