Bloat and vacuum

What is bloat

At a high level, bloat refers to the accumulation of dead and unused rows in a database, resulting in disk space consumption and performance degradation. It primarily affects databases with high transaction workloads. To comprehend bloat, it is essential to grasp two key topics in Postgres: multiversion concurrency control (MVCC) and vacuuming.

Multiversion Concurrency Control (MVCC) is a fundamental process utilized in Postgres and other relational database systems to enable concurrent access to data while maintaining transaction isolation. MVCC allows multiple transactions to read and write data simultaneously without blocking or interfering with one another. It achieves this by generating multiple versions of each row or tuple, allowing each transaction to access a snapshot of the data as it existed at the start of the transaction. When a transaction reads a row, it retrieves the version that was valid at the start, ensuring a consistent snapshot even if other transactions are concurrently modifying or deleting data.

MVCC creates multiple versions of a row to handle concurrent transactions. When a row is updated or deleted, a new version is created, while the old version is marked as dead. These dead rows are not immediately removed from the table to preserve transactional integrity and ensure data consistency during concurrent operations.

To reclaim the disk space occupied by dead rows, Postgres periodically performs vacuuming. This process identifies and eliminates dead rows from the table, freeing up the disk space for reuse. Bloat occurs when high transactions generate a substantial number of dead rows between vacuum processes.

Measuring bloat

We provide a percentage of bloat to show the amount of space taken up by dead rows compared to the total size of the table or index. The bloat displayed is an estimate or approximation. If you need a more precise view, you can use the extension pgstattuple, though this can be a resource intensive operation.

Low Bloat: Bloat below 50% is generally considered acceptable and does not normally require action. It is still recommended to monitor bloat for further growth and check vacuum configurations and settings.

High Bloat: Bloat above 50% suggests a high level of bloat that can begin to severely impact performance and disk space utilization. You may need to consider action, such as performing a manual vacuum operation, if you notice slow queries or performance issues.

We do not display a bloat percentage for tables under 1GB or with a bloat percentage less than 10%.

Vacuum operations

Autovacuum

Postgres has an autovacuum process that runs in the background and automatically performs vacuum operations on tables and indexes when necessary. Autovacuum is enabled and configured by default for all Crunchy Bridge clusters. The settings out of the box are:

autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2

For large tables with lots of turnover you may want to tune the autovacuum_vacuum_scale_factor so that Autovaccum is triggered more often. A good rule of thumb is when ~2M rows deleted/updated. You can use the current row count to find an appropriate scale factor. For example if you have a table with 20M rows, you would want to vacuum this at 2M rows of dead tuples, so this would be a scale factor of .1.

ALTER TABLE {tablename}
SET autovacuum_vacuum_scale_factor = 0.1;

As your table grows, the scale factor will need to be adjusted to an even smaller number.

Manual vacuum

In addition to autovacuum, manual vacuum operations can be performed on specific tables or indexes as needed. Manual vacuuming allows for more control over the process and can useful in scenarios where immediate reclamation of space is needed or when specific tables or indexes require attention. PostgreSQL provides different types of manual vacuum operations. The two commonly used methods are:

VACUUM: This method reclaims space and updates statistics. It is suitable for most general maintenance scenarios.

VACUUM FULL: This method performs a more aggressive cleanup by rewriting the entire table or index. It requires exclusive table locks and can be resource-intensive, so use it carefully.

To perform a manual full vacuum on a table:

VACUUM FULL table_name;

Vacuum statistics in insights

The insights panel also includes vacuum statistics. You can check on the table names, the last vacuum and last autovacuum. You can also get insights on how many dead rows exist, when vacuum last cleaned up dead rows, and more.

Vacuum statistics include:

  • Table name
  • Last vacuum: last time a manual vacuum operation was run
  • Last autovacuum: last time autovacuum ran
  • Row count: total row count for the table
  • Dead row count: number of un-vacuumed / dead rows in the table presently
  • Scale factor: the current scale factor set in the autovacuum settings
  • Threshold: the total number of rows, using the scale factor, that would require a vacuum operation
  • Should vacuum: if you should manually vacuum the the table

Get support anytime

Worried about bloat and vacuum on your database? Get in touch with support anytime for assistance.