Getting started
Create a team for analytics
Before you create your analytics cluster, you'll need to decide which Team it belongs in. You can create a new Team for your analytics clusters or provision them in existing Teams.
Creating a new analytics cluster
Provision a new analytics cluster inside your team by clicking the down arrow next to the Create Cluster button. The option to create an Analytics cluster will appear in the dropdown.
Then select the region and instance size for the cluster.
Granting access to the existing users
Crunchy Bridge for Analytics comes with 3 different roles that you should GRANT to your existing users/roles:
- crunchy_lake_read
- crunchy_lake_write
- crunchy_lake_read_write
GRANT crunchy_lake_read_write TO application;
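The same statement works for the other roles. A minimal sketch, assuming a hypothetical reporting_user role that already exists in your database:

```sql
-- reporting_user is a hypothetical role name; substitute an existing user or role.
GRANT crunchy_lake_read TO reporting_user;
```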
Connecting to cloud object store
Crunchy Bridge for Analytics currently supports connections to Amazon S3 (support for Google Cloud Storage and Azure Blob Storage is in development). Object store credentials are managed under Team Settings > Analytics Credentials, where you enter the connection details. Each Analytics cluster can be connected to new or existing credentials.
Connecting via any URL
Crunchy Bridge for Analytics can also read any publicly available Parquet file
at an HTTPS URL with the CREATE FOREIGN TABLE command:
CREATE FOREIGN TABLE taxi_another_trips()
SERVER crunchy_lake_analytics OPTIONS
(path 'https://d37ci6vzurychx.cloudfront.net/trip-data/fhvhv_tripdata_2023-01.parquet');
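Once the foreign table exists, it can be queried like any other table. A minimal sketch using the table defined above; the queries themselves are only illustrative, and the second one assumes the columns are inferred from the Parquet file because the column list was left empty:

```sql
-- Count the rows in the remote Parquet file.
SELECT count(*) FROM taxi_another_trips;

-- Preview a few rows (assumes columns are inferred from the Parquet file).
SELECT * FROM taxi_another_trips LIMIT 5;
```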
Regions
We recommend that your Analytics cluster and the S3 buckets you will access
frequently are in the same region to maximize query performance and avoid
potential network charges. Whether they are in the same region or not, the
experience will be seamless. Analytics will automatically detect the regions and
configure S3 access accordingly. If you want to set the region explicitly
when specifying the URL of a bucket, you can add the optional
?s3_region=[bucket_region] parameter, but in the vast majority of cases you
should not need to.
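As a sketch of how the optional parameter might be appended to a path, assuming a hypothetical bucket named my-example-bucket in us-west-2 and an s3:// path scheme (the bucket, file, and table names are illustrative and not taken from this guide):

```sql
-- Hypothetical bucket, file, and table names; replace them with your own.
-- The ?s3_region= suffix sets the bucket region explicitly; it is usually unnecessary.
CREATE FOREIGN TABLE events_us_west()
SERVER crunchy_lake_analytics OPTIONS
(path 's3://my-example-bucket/events/2023-01.parquet?s3_region=us-west-2');
```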
Connecting to public S3
For testing purposes or experimentation, you may want to connect a Bridge Analytics cluster to public data in S3.
There are two kinds of public S3 buckets:
- Public access (open to everyone). For public access buckets, you can skip the credential setup during cluster initialization. You will continue to see a warning on the cluster overview page to enter credentials for full functionality, but your Analytics cluster will still be operational without them.
- Authenticated users group (anyone with an AWS account). For these, you will need to set up S3 credentials so that your AWS account is recognized.