Parquet

Crunchy Bridge for Analytics includes read and write support for many of Parquet’s more complex types, including List, Array, Struct, Map, and Binary data.

These types will be created for you automatically when you use CREATE FOREIGN TABLE with type detection.

LIST and ARRAY types are automatically turned into array types on the Postgres side and operate just like Postgres arrays.

STRUCT types in Parquet get turned into composite types stored in a crunchy_struct schema; this lets you view STRUCT data in your Parquet files as if they are normal Postgres records.

Since nested composite types are not super common in Postgres you need to use syntax you may not be familiar to query directly from Postgres:

CREATE FOREIGN TABLE ov_places ()
SERVER crunchy_lake_analytics
OPTIONS (path 's3://overturemaps-us-west-2/release/2024-04-16-beta.0/theme=places/type=*/*/?s3_region=us-west-2', format 'parquet');
CREATE FOREIGN TABLE
postgres=# \d ov_places
                                                 Foreign table "public.ov_places"
   Column    |                                Type                                 | Collation | Nullable | Default | FDW options
-------------+---------------------------------------------------------------------+-----------+----------+---------+-------------
 id          | text                                                                |           |          |         |
 geometry    | bytea                                                               |           |          |         |
 bbox        | crunchy_struct.xmin_xmax_ymin_ymax_35464140                         |           |          |         |
 version     | integer                                                             |           |          |         |
 update_time | text                                                                |           |          |         |
 sources     | crunchy_struct.property_dataset_record_id_confidence_acf6e375[]     |           |          |         |
 names       | crunchy_struct.primary_common_rules_d47e3951                        |           |          |         |
 categories  | crunchy_struct.main_alternate_58e0a237                              |           |          |         |
 confidence  | double precision                                                    |           |          |         |
 websites    | text[]                                                              |           |          |         |
 socials     | text[]                                                              |           |          |         |
 emails      | text[]                                                              |           |          |         |
 phones      | text[]                                                              |           |          |         |
 brand       | crunchy_struct.wikidata_names_b40589be                              |           |          |         |
 addresses   | crunchy_struct.freeform_locality_postcode_region_country__4d9a9db[] |           |          |         |
 theme       | text                                                                |           |          |         |
 type        | text                                                                |           |          |         |
Server: crunchy_lake_analytics
FDW options: (path 's3://overturemaps-us-west-2/release/2024-04-16-beta.0/theme=places/type=*/*/?s3_region=us-west-2', format 'parquet')

The types that live in the crunchy_struct schema are the auto-generated composite types for the underlying STRUCT fields. You can get information about the types via \d:

postgres=# \d crunchy_struct.wikidata_names_b40589be
                 Composite type "crunchy_struct.wikidata_names_b40589be"
  Column  |                     Type                     | Collation | Nullable | Default
----------+----------------------------------------------+-----------+----------+---------
 wikidata | text                                         |           |          |
 names    | crunchy_struct.primary_common_rules_d47e3951 |           |          |

In this case, you can see that the wikidata_names_b40589be type corresponds to the original type STRUCT { wikidata TEXT, names STRUCT { ... }} . In this case, the names field is a sub-struct, which is also created as a composite type, which you can see in a similar fashion:

postgres=# \d crunchy_struct.primary_common_rules_d47e3951
                      Composite type "crunchy_struct.primary_common_rules_d47e3951"
 Column  |                             Type                              | Collation | Nullable | Default
---------+---------------------------------------------------------------+-----------+----------+---------
 primary | text                                                          |           |          |
 common  | crunchy_map.key_text_val_text                                 |           |          |
 rules   | crunchy_struct.variant_language_value_between_side_3e87a822[] |           |          |

If you were to look at a single record of the top-level struct, it will display in a serialized form, since Postgres does not natively decompose multiple levels of types when it works:

select brand from ov_places where brand is not null limit 1;
brand
-----------------
 ("(Civeco,,)")
(1 row)

When you query inside Postgres against fields which are using STRUCT types you will need to utilize the composite type decomposition syntax; this looks like the following:

select brand from ov_places where brand is not null limit 1;
brand
-----------------
 (,''(Civeco,,)'')
(1 row)
select (brand).* from ov_places where brand is not null limit 1;
 wikidata |   names
----------+------------
          | (Civeco,,)
(1 row)
select ((brand).names).* from ov_places where brand is not null limit 1;
 primary | common | rules
---------+--------+-------
 Civeco  |        |
(1 row)

As you can see, the (foo).* syntax will decompose the composite type into the fields for that subtype, meaning you could do checks like:

select ((brand).names).* from ov_places where ((brand).names).primary <> 'Civeco' limit 1;
      primary       | common | rules
--------------------+--------+-------
 City Impact Church |        |
(1 row)

BINARY data used to be treated as TEXT, but now works transparently using the bytea type in Postgres, so binary data is fully readable/writeable in this form.

MAP types are imported as a custom type, managed by the crunchy_map extension. These types are created in the crunchy_map schema and are effectively an array of a composite key/value pairs. To lookup a specific value from a MAP field, you can use the -> operator (shorthand for the crunchy_map.extract function) which will automatically return the looked-up value, or NULL if such a value does not exist:

select attributes->2 from test_lookups where name = 'Colors';
column
----------
 Green
(1 row)