Parquet
Crunchy Bridge for Analytics includes read and write support for many of Parquet’s more complex types, including List, Array, Struct, Map, and Binary data.
These types will be created for you automatically when you use
CREATE FOREIGN TABLE
with type detection.
LIST
and ARRAY
types are automatically turned into array types on the
Postgres side and operate just like Postgres arrays.
STRUCT
types in Parquet get turned into composite types stored in a
crunchy_struct
schema; this lets you view STRUCT
data in your Parquet files
as if they are normal Postgres records.
Since nested composite types are not super common in Postgres you need to use syntax you may not be familiar to query directly from Postgres:
CREATE FOREIGN TABLE ov_places ()
SERVER crunchy_lake_analytics
OPTIONS (path 's3://overturemaps-us-west-2/release/2024-04-16-beta.0/theme=places/type=*/*/?s3_region=us-west-2', format 'parquet');
CREATE FOREIGN TABLE
postgres=# \d ov_places
Foreign table "public.ov_places"
Column | Type | Collation | Nullable | Default | FDW options
-------------+---------------------------------------------------------------------+-----------+----------+---------+-------------
id | text | | | |
geometry | bytea | | | |
bbox | crunchy_struct.xmin_xmax_ymin_ymax_35464140 | | | |
version | integer | | | |
update_time | text | | | |
sources | crunchy_struct.property_dataset_record_id_confidence_acf6e375[] | | | |
names | crunchy_struct.primary_common_rules_d47e3951 | | | |
categories | crunchy_struct.main_alternate_58e0a237 | | | |
confidence | double precision | | | |
websites | text[] | | | |
socials | text[] | | | |
emails | text[] | | | |
phones | text[] | | | |
brand | crunchy_struct.wikidata_names_b40589be | | | |
addresses | crunchy_struct.freeform_locality_postcode_region_country__4d9a9db[] | | | |
theme | text | | | |
type | text | | | |
Server: crunchy_lake_analytics
FDW options: (path 's3://overturemaps-us-west-2/release/2024-04-16-beta.0/theme=places/type=*/*/?s3_region=us-west-2', format 'parquet')
The types that live in the crunchy_struct
schema are the auto-generated
composite types for the underlying STRUCT
fields. You can get information
about the types via \d
:
postgres=# \d crunchy_struct.wikidata_names_b40589be
Composite type "crunchy_struct.wikidata_names_b40589be"
Column | Type | Collation | Nullable | Default
----------+----------------------------------------------+-----------+----------+---------
wikidata | text | | |
names | crunchy_struct.primary_common_rules_d47e3951 | | |
In this case, you can see that the wikidata_names_b40589be
type corresponds to
the original type STRUCT { wikidata TEXT, names STRUCT { ... }}
. In this
case, the names
field is a sub-struct, which is also created as a composite
type, which you can see in a similar fashion:
postgres=# \d crunchy_struct.primary_common_rules_d47e3951
Composite type "crunchy_struct.primary_common_rules_d47e3951"
Column | Type | Collation | Nullable | Default
---------+---------------------------------------------------------------+-----------+----------+---------
primary | text | | |
common | crunchy_map.key_text_val_text | | |
rules | crunchy_struct.variant_language_value_between_side_3e87a822[] | | |
If you were to look at a single record of the top-level struct, it will display in a serialized form, since Postgres does not natively decompose multiple levels of types when it works:
select brand from ov_places where brand is not null limit 1;
brand
-----------------
("(Civeco,,)")
(1 row)
When you query inside Postgres against fields which are using STRUCT
types you
will need to utilize the composite type decomposition syntax; this looks like
the following:
select brand from ov_places where brand is not null limit 1;
brand
-----------------
(,''(Civeco,,)'')
(1 row)
select (brand).* from ov_places where brand is not null limit 1;
wikidata | names
----------+------------
| (Civeco,,)
(1 row)
select ((brand).names).* from ov_places where brand is not null limit 1;
primary | common | rules
---------+--------+-------
Civeco | |
(1 row)
As you can see, the (foo).*
syntax will decompose the composite type into the
fields for that subtype, meaning you could do checks like:
select ((brand).names).* from ov_places where ((brand).names).primary <> 'Civeco' limit 1;
primary | common | rules
--------------------+--------+-------
City Impact Church | |
(1 row)
BINARY
data used to be treated as TEXT
, but now works transparently using
the bytea
type in Postgres, so binary data is fully readable/writeable in this
form.
MAP
types are imported as a custom type, managed by the crunchy_map
extension. These types are created in the crunchy_map
schema and are
effectively an array of a composite key/value pairs. To lookup a specific value
from a MAP
field, you can use the ->
operator (shorthand for the
crunchy_map.extract
function) which will automatically return the looked-up
value, or NULL
if such a value does not exist:
select attributes->2 from test_lookups where name = 'Colors';
column
----------
Green
(1 row)