Getting started

Create a team for your Crunchy Data warehouse

Before you create your warehouse cluster, you'll need to decide which Team it belongs in. You can create a new Team for your warehouse clusters or provision them in existing Teams.

Creating a new warehouse

Provision a new warehouse cluster inside your team by clicking the down arrow next to the Create Cluster button. The option to create a Warehouse cluster appears in the dropdown.

Select the region and instance size for the cluster.

Using a database other than postgres

Our system installs the Crunchy Data Warehouse extensions from the system template into the postgres database. If you're using additional databases, connect to each one and add the extensions with:

CREATE EXTENSION crunchy_data_warehouse CASCADE;
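To confirm the extension is present in an additional database, you can check the standard PostgreSQL pg_extension catalog. This is a minimal sketch: it only verifies the top-level extension named above and does not list the dependencies that CASCADE may have pulled in.

```sql
-- Run while connected to the additional database.
-- pg_extension is a standard PostgreSQL system catalog.
select extname, extversion
from pg_extension
where extname = 'crunchy_data_warehouse';
```

If the query returns no rows, the extension has not been created in the database you are connected to.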

Upgrading from Crunchy Bridge to Crunchy Data Warehouse

If you currently have a Crunchy Bridge standard or memory instance type, you can change it to a Crunchy Data Warehouse instance by scheduling a resize and specifying the desired warehouse cluster type and size.

Block storage

When provisioning a new warehouse cluster, you'll be asked to choose the amount of block storage for the local Postgres instance. This storage holds the PostgreSQL data directory and heap tables.

Managed object storage

In addition to local storage, Crunchy Data Warehouse comes with object storage in Amazon S3 built into the appliance. This is the default location for storing the managed Iceberg tables. You also have the option to provide your own S3 bucket for storage, provided that it is in the same AWS region as your warehouse cluster.

See Iceberg tables for specific commands you can use to work with Iceberg tables. You can also set up external projects to connect to Iceberg.

Warehouse object storage has no fixed capacity limit and is billed based on usage ($0.046/GB/month).
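As a rough illustration of usage-based billing, the monthly charge is the stored volume in GB multiplied by the per-GB rate. The 500 GB figure below is a made-up example, not a measured value, and the sketch ignores any other charges that may apply.

```sql
-- Hypothetical volume: 500 GB of Iceberg data in managed object storage.
-- Rate taken from the text above: $0.046 per GB per month.
select 500 * 0.046 as estimated_monthly_storage_usd;
```

At that rate, 500 GB works out to $23 per month.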

Creating and querying Iceberg tables

Crunchy Data Warehouse comes with specific syntax for creating Iceberg tables. Iceberg tables can be created from data inside your database or from external sources. Here's an example that uses the using iceberg clause to load data from an external URL:

-- Convert a file directly into an Iceberg table
create table taxi_yellow ()
using iceberg
with (load_from = 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet');

No further setup is needed to query the Iceberg table. Use regular SQL syntax with the table name from the create statement:

select * from taxi_yellow limit 10;
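The other path mentioned above, creating an Iceberg table from data already inside your database, might look like the sketch below. It assumes that CREATE TABLE ... AS can be combined with the using iceberg clause. The table and column names (page_views, page_views_iceberg) are hypothetical and exist only for illustration.

```sql
-- Hypothetical heap table holding data that already lives in Postgres
create table page_views (
    viewed_at timestamptz not null,
    path      text        not null
);

insert into page_views values
    (now(), '/pricing'),
    (now(), '/docs');

-- Assumption: CREATE TABLE ... AS is accepted together with "using iceberg"
create table page_views_iceberg
using iceberg
as select * from page_views;

-- Query the Iceberg copy like any other table
select path, count(*) as views
from page_views_iceberg
group by path;
```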

Regions

The region of the managed object storage for the Iceberg tables will automatically be set to the same region as your cluster. If you want to specify a different S3 bucket location for Iceberg tables, it will have to be in the same region as well. You can import, export and query data lake files in different S3 regions, although that may result in network charges. In such cases, the warehouse will automatically detect regions and configure S3 access accordingly.

If you want to pass in a special region parameter when specifying the URL of a bucket, you can add the optional ?s3_region=[bucket_region] parameter, but in the vast majority of cases you should not need to do this.
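For the rare case where the region has to be stated explicitly, the parameter is appended to the bucket URL. The sketch below reuses the load_from syntax shown earlier and assumes it accepts s3:// URLs. The bucket name, object path, table name, and region value are all hypothetical.

```sql
-- Hypothetical bucket, path, and region; replace with your own values.
-- The ?s3_region=... suffix overrides automatic region detection.
create table events_archive ()
using iceberg
with (load_from = 's3://example-bucket/events/2024-01.parquet?s3_region=us-west-2');
```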