Analytics

Crunchy Bridge for Analytics is a new Crunchy Bridge cluster type that lets you query and interact with your object storage using Postgres. The analytics engine is integrated into existing PostgreSQL commands through extensions that add a vectorized, parallel query engine. With Bridge for Analytics, you can set up foreign tables that point directly to Parquet, CSV, or JSON files in object storage, without having to specify column metadata, and run fast analytical queries.
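
As a rough illustration of the foreign table setup described above, the following is a minimal sketch rather than reference syntax. The server name crunchy_lake_analytics, the path option, the bucket URL, and the table name page_hits are all assumptions made for this example and are not stated on this page; check the product reference for the exact form.

```sql
-- Hypothetical bucket and file name. The server name and the "path" option
-- are assumptions for illustration and should be verified before use.
CREATE FOREIGN TABLE page_hits ()   -- empty column list: columns are detected from the file
SERVER crunchy_lake_analytics
OPTIONS (path 's3://example-bucket/hits/2024.parquet');

-- Once defined, the foreign table is queried like any other Postgres table.
SELECT count(*) FROM page_hits;
```

Leaving the column list empty reflects the point made above that column metadata does not have to be specified by hand.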

Included in the Analytics documentation is:

Crunchy Bridge for Analytics Overview

Crunchy Bridge for Analytics instances have all the existing benefits of Crunchy Bridge and powerful new analytics capabilities such as:

  • Efficient, easy analytical queries on Parquet, CSV, and newline-delimited JSON files stored in data lakes (e.g. object storage such as S3).
  • Data import from and export to object storage using conventional PostgreSQL commands (COPY, CREATE TABLE).
  • High-performance, parallel, vectorized query engine with a local caching layer.

Crunchy Bridge for Analytics streamlines the user experience of various analytics workloads. Minimizing the need for third-party analytics and data movement tools can also bring cost efficiencies. A few example scenarios showcasing the benefits of Crunchy Bridge for Analytics are:

  • Analytics on large historical data sets. Easily copy and compress historical data to Parquet files on cost-effective cloud storage, then run high-performance analytical queries against them. Currently, Bridge for Analytics storage is append-only.
  • Data lake export and import. Bridge for Analytics lets you export regular Postgres tables or query results to your data lake in cloud object storage using the familiar COPY syntax, with support for various file formats and compression types. This makes it easy to share operational data with other teams and tools that regularly access your data lake. You can also import data from your data lake with automatic column detection. CSV, newline-delimited JSON, and Parquet files are all supported.
  • Integration of operational and analytics systems. For additional analytical insights, join historical data stored efficiently in cloud object storage to regular Postgres tables.
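
The three scenarios above can be sketched together as one workflow: archive old rows to Parquet, expose the archive as a foreign table, and join it with a regular table. This is a sketch under stated assumptions, not the product's documented syntax. The tables customers and orders, the bucket URL, the format option on COPY, the server name crunchy_lake_analytics, and the path option are all hypothetical or assumed here.

```sql
-- Hypothetical regular Postgres tables holding operational data.
CREATE TABLE customers (customer_id bigint PRIMARY KEY, name text);
CREATE TABLE orders (
    order_id    bigint,
    customer_id bigint,
    total       numeric,
    ordered_at  timestamptz
);

-- 1. Export historical rows to a Parquet file in object storage.
--    The s3:// destination and the format option are assumptions.
COPY (SELECT * FROM orders WHERE ordered_at < '2023-01-01')
TO 's3://example-bucket/archive/orders_pre_2023.parquet'
WITH (format 'parquet');

-- 2. Expose the archived file as a foreign table with detected columns.
--    The server name and "path" option are assumptions.
CREATE FOREIGN TABLE orders_archive ()
SERVER crunchy_lake_analytics
OPTIONS (path 's3://example-bucket/archive/orders_pre_2023.parquet');

-- 3. Join the archived data with a regular Postgres table.
SELECT c.name, sum(a.total) AS archived_total
FROM orders_archive AS a
JOIN customers AS c USING (customer_id)
GROUP BY c.name
ORDER BY archived_total DESC
LIMIT 10;
```

Because the storage is described as append-only, a sketch like this would write a new file per archive run rather than updating an existing one.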

The Crunchy Bridge for Analytics instance connects to cloud object storage in S3 (support for Google Cloud Storage and Azure Blob Storage is in development). Data can be written to object storage with COPY commands or loaded into the database from object storage. OLAP queries run efficiently and directly in the Postgres database engine, reading from foreign tables that point to files in object storage.