Storage
Managed storage
Crunchy Data Warehouse comes with object storage built into the appliance. Files
are written to managed storage when you create a table with the USING iceberg
syntax (see Iceberg tables for details). Warehouse object storage is
billed based on usage ($0.046/GB/month) and has no fixed capacity limit.
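As a minimal sketch of the syntax mentioned above, the following creates an Iceberg table in managed storage. The table name measurements and its columns are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: the name and columns are illustrative assumptions.
CREATE TABLE measurements (
    station_id  int,
    recorded_at timestamptz,
    reading     double precision
)
USING iceberg;

-- Rows written to the table end up as files in managed storage.
INSERT INTO measurements VALUES (1, now(), 21.5);
```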
Files in your storage can be accessed from the cluster overview screen.
Note that the name of your storage location will be 'database name'/'schema
name'/'table name'/'table OID'. If your table is in the Postgres public
schema, the word "public" will appear in the location name; it refers only to
the schema. Regardless of the name displayed, storage repositories reside in
a private network and are not publicly accessible.
Connecting to external storage
Crunchy Data Warehouse also supports connecting to external Amazon S3 buckets, either to store managed Iceberg tables or to import, export, and query data lake files. Support for Google Cloud Storage and Azure Blob Storage is in development but not currently available in Crunchy Data Warehouse.
Credentials for your object store are managed under Team settings > Data lake. This is where you'll enter the connection details. Each new warehouse cluster can be connected to new or existing credentials.
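Once credentials are in place, querying a file in an external bucket might look like the sketch below. It assumes that an s3:// path is accepted in the same path option used for URLs elsewhere on this page; the bucket my-example-bucket, the object key, and the table name are hypothetical.

```sql
-- Assumption: an s3:// path is accepted in the path option.
-- The bucket, object key, and table name are hypothetical.
CREATE FOREIGN TABLE s3_events()
SERVER crunchy_lake_analytics
OPTIONS (path 's3://my-example-bucket/events/2024-01.parquet');

-- Query the external file like any other table.
SELECT count(*) FROM s3_events;
```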
Connecting via any URL
Crunchy Data Warehouse can also read any publicly available Parquet file via
an HTTPS URL path with the CREATE FOREIGN TABLE
command:
CREATE FOREIGN TABLE taxi_another_trips()
SERVER crunchy_lake_analytics
OPTIONS (path 'https://d37ci6vzurychx.cloudfront.net/trip-data/fhvhv_tripdata_2023-01.parquet');
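After the foreign table exists, it can be queried like a regular table. The sketch below assumes that the empty column list lets the warehouse infer columns from the Parquet file, so it lists the resulting columns from the standard Postgres information_schema instead of guessing their names.

```sql
-- Count the rows in the remote Parquet file.
SELECT count(*) FROM taxi_another_trips;

-- List the columns the foreign table ended up with
-- (assumed to be inferred from the Parquet schema).
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'taxi_another_trips'
ORDER BY ordinal_position;
```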
Granting URL access to database users
By default, regular database users cannot perform operations that read from or write to an arbitrary URL, regardless of whether that URL points to public or private data.
Crunchy Data Warehouse comes with three different roles that you can GRANT
to existing users/roles.
- crunchy_lake_read: permission to read from an arbitrary URL via COPY ... FROM or by creating a crunchy_lake_analytics table
- crunchy_lake_write: permission to write to an arbitrary URL via COPY ... TO or by creating an Iceberg table
- crunchy_lake_read_write: permission to both read and write
For example, you can give a user permission to import from URLs with:
GRANT crunchy_lake_read TO importer;
Note that granting one of these roles implicitly gives access to all data in managed and external storage. You can use table-level grants to give users only read access on specific tables.
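Table-level grants use the standard Postgres GRANT statement. As a sketch, assuming a role named analyst already exists (a hypothetical name) and using the taxi_another_trips table from the earlier example, the following gives read access to that one table without granting any of the crunchy_lake roles:

```sql
-- Assumption: the role analyst already exists; the name is hypothetical.
-- Read access to a single table, with no URL-level role granted.
GRANT SELECT ON taxi_another_trips TO analyst;

-- Withdraw the access again if it is no longer needed.
REVOKE SELECT ON taxi_another_trips FROM analyst;
```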