Connecting your Postgres resources
The core of Crunchy Data Warehouse is still pure PostgreSQL, just with extensions adding the Iceberg file handling and vectorized query engine features. You can connect your transactional Postgres instances to your warehouse cluster using the Postgres foreign data wrapper (postgres_fdw) without the need for any third-party integrations.
The postgres_fdw foreign data wrapper lets you query an external Postgres resource from the active database you're connected to. In this example, the two sides we'll look at are:
- Foreign instance: the one that holds the standard Postgres data, perhaps a database in Crunchy Bridge (it could also be another warehouse instance).
- Destination instance: the warehouse database that queries the foreign data to analyze it.
Starting on the foreign instance, create a new user for the destination server to connect as:
CREATE USER fdw_user WITH PASSWORD 'pizza1';
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE your_table_name TO fdw_user;
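If you plan to expose several tables, you can grant read access across the whole schema in one statement rather than per table; a minimal sketch, assuming the same fdw_user role and the public schema:
-- Optional: grant read access on every existing table in the public schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO fdw_user;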
On the Crunchy Data Warehouse destination, first enable the postgres_fdw extension, which allows Postgres to connect to other databases, then create the foreign server, which tells Postgres where to connect for the remote data:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER foreigndb_fdw
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'p.vbjrfujvlw725gvi3i.db.postgresbridge.com',
port '5432', dbname 'postgres', sslmode 'require');
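To confirm the server object was created, you can inspect the pg_foreign_server catalog (the server name here matches the example above):
-- List foreign servers and their connection options
SELECT srvname, srvoptions FROM pg_foreign_server;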
Next create the user mapping. This tells Postgres which user on the foreign side to connect as. In this case, all users on the destination side will connect as the same user:
CREATE USER MAPPING FOR PUBLIC SERVER foreigndb_fdw
OPTIONS (user 'fdw_user', password 'pizza1');
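If you'd rather not share one mapping across every role, you can scope it to a single role instead; a sketch mapping only the role you're currently connected as:
-- Map only the current role, rather than PUBLIC
CREATE USER MAPPING FOR CURRENT_USER SERVER foreigndb_fdw
OPTIONS (user 'fdw_user', password 'pizza1');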
Finally on the destination side, import the schema and limit it to the table names you want. This gives Postgres a local table definition that matches the remote table's definition and can be queried on the destination server.
IMPORT FOREIGN SCHEMA "public"
LIMIT TO (your_table_name)
FROM SERVER foreigndb_fdw INTO public;
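At this point the foreign table behaves like a local one; a quick sanity check that the connection works, assuming the table name imported above:
-- Runs on the warehouse, but reads rows from the foreign instance
SELECT count(*) FROM your_table_name;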
Now that the foreign data wrapper is configured, you can use that connection to shadow data from a standard operational database into a warehouse instance backed by optimized object storage, then start running data analytics on it with the vectorized engine:
COPY (SELECT * FROM your_table_name WHERE time = '2023-01-01')
TO 's3://cdwtestdatasets/logs_demo/log_2023_01_01.parquet'
WITH (format 'parquet');
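Because the wildcard path in the next step matches every Parquet file under the prefix, you can repeat the COPY for other days to grow the dataset; a sketch for a second day, following the same pattern:
COPY (SELECT * FROM your_table_name WHERE time = '2023-01-02')
TO 's3://cdwtestdatasets/logs_demo/log_2023_01_02.parquet'
WITH (format 'parquet');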
CREATE FOREIGN TABLE analytics_demo ()
SERVER crunchy_lake_analytics
OPTIONS (path 's3://cdwtestdatasets/logs_demo/*.parquet');
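With the foreign table in place, analytical queries run against the Parquet files directly; for example, assuming the time column from the export above:
-- The vectorized engine scans the Parquet files in object storage
SELECT count(*) FROM analytics_demo WHERE time = '2023-01-01';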