Disaster Recovery and High Availability
Ensuring your database is safe and available is a key tenet of Crunchy Bridge. Disaster recovery is automatically enabled for all databases, ensuring your database can be recovered in the event of a failure or disaster. High availability can be enabled for your database during provisioning at an additional charge.
Disaster recovery is the process of archiving your data to an additional data store, typically cold storage. The cold storage offers 99.999999999% (eleven nines) durability, providing strong protection against data loss. You do not have to do anything to enable disaster recovery on your Crunchy Bridge database; it is automatically enabled on all databases.
Disaster recovery is accomplished by capturing a base backup from Postgres and continuously streaming the WAL (write-ahead log). In the event of a disaster, we automatically restore from the most recent base backup and then replay the WAL up to the most recent moment in time.
Disaster recovery times are not guaranteed, because the amount of WAL that must be replayed correlates with your transaction volume, not your database size.
As rough guidance, disaster recovery takes about 1 hour per 100 GB of data, though it can take longer for databases with high transaction throughput.
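As a back-of-the-envelope illustration of that guidance, a recovery-time estimate could be sketched as follows. The function name and the multiplier for high-throughput databases are illustrative assumptions, not documented figures:

```python
def estimate_recovery_hours(data_gb, high_throughput=False):
    """Rough disaster-recovery estimate: ~1 hour per 100 GB.

    The 1.5x multiplier for high-throughput workloads is an
    illustrative assumption, since WAL replay time depends on
    transaction volume rather than database size alone.
    """
    base = data_gb / 100.0
    return base * 1.5 if high_throughput else base

# A 250 GB database: roughly 2.5 hours to restore.
print(estimate_recovery_hours(250))  # → 2.5
```

Treat the result as a planning aid, not a commitment; actual restore time depends on how much WAL accumulated since the last base backup.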
While disaster recovery is intended to ensure resiliency in the event of a failure, high availability is an additional mechanism that reduces downtime when a failure occurs.
You can enable high availability when provisioning your database. When you do, we provision a standby of the same instance type and continuously stream transactions from the primary to the standby as they are committed. This provides an up-to-date copy that can be promoted to primary in the event of a failure.
To provide high availability, the Crunchy Bridge control plane continuously monitors the health of both your primary and your standby. If a health check against your primary fails, we run a deeper set of checks. If the primary proves to be unhealthy, we initiate a failover to your standby.
During the failover, your connections will drop, so it is important to have reconnection logic in your database library (most ORMs and drivers support this out of the box). Your connections should reconnect automatically within seconds once the failover has completed, and your connection string stays the same throughout.
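If your driver does not reconnect automatically, a minimal retry wrapper can bridge the failover window. The sketch below is driver-agnostic and assumes names of our own choosing: `connect_fn` stands in for your real connection call (for example, a call to `psycopg2.connect` with your existing connection string), and the attempt count and backoff values are illustrative:

```python
import time

def connect_with_retry(connect_fn, attempts=5, base_delay=1.0):
    """Retry a connection attempt with exponential backoff.

    connect_fn: zero-argument callable that returns a connection.
    Because the connection string does not change during a failover,
    the same callable works before and after the standby is promoted.
    """
    for attempt in range(attempts):
        try:
            return connect_fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            # Back off 1s, 2s, 4s, ... while the failover completes.
            time.sleep(base_delay * (2 ** attempt))
```

Since failovers typically complete in seconds, a handful of attempts with short backoff is usually enough; tune the values to your application's tolerance for connection errors.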