Partitioning an existing table
There are many helpful guides out there for setting up table partitions, including our Partitioning with Native Postgres and pg_partman blog post. However, when partitioning an an existing table, managing the historical data adds another layer of complexity. The method described below allows for partitioning an existing table and its data while minimizing any necessary table locks.
As always, we recommend that you test any procedures in a staging environment before taking any action in your production environment. Open a ticket if you have questions or need help working through this example.
Overview
This method starts off by creating a top-level partition that differentiates your existing, "unsorted" data.
Once the partitioning structure is set up, the existing data is moved from "unsorted" to "sorted" simply by updating a sorted
boolean field in batches.
Once everything is moved into the new "sorted" partition, you will detach and rename it in a transaction, making it your new main table. The result is a partitioned table with the same name as your original table.
Here is a diagram that shows the basic process of this partitioning strategy:
Hint
The example shown here uses a hash partition but the same could be done with range partitioning. Be sure to create a "default" to catch any data that doesn't fall into the ranges you define.
Table partitioning example
These instructions walk you through partitioning an existing table that we will generate with some dummy data. The goal is to provide an example of how you might do this with your own existing table.
Create the "source" table
Before we start, we need a table of existing "unsorted" data for the purposes of our walkthrough. We create a table to hold some data with a date range and insert some random dummy data:
CREATE TABLE orders
(
id serial NOT NULL CONSTRAINT orders_pkey primary key,
orderdate date NOT NULL,
data text
);
-- Create dummy data
INSERT INTO orders(orderdate, data)
SELECT timestamp '2019-01-01 00:00:00' + random() * (timestamp '2020-01-01 00:00:00' - timestamp '2019-01-01 00:00:00'),
sha512(((random()::text)::bytea))::text
FROM generate_series(1,10000) id;
Create the first level partition
Now that we have an existing table, we can start the process by adding our boolean column sorted
to distinguish between the sorted and unsorted data in our table. This will be the top-level partition:
ALTER TABLE orders ADD COLUMN sorted boolean DEFAULT NULL;
Create the new "root" table
Next we create a table that will contain the sorted data. It will ultimately become our "root" table in place of the existing table we started with:
CREATE TABLE orders_partitioned (LIKE orders) PARTITION BY LIST(sorted);
ALTER TABLE orders_partitioned ALTER column id SET DEFAULT nextval('orders_id_seq'::regclass);
CREATE INDEX idx_orders_id ON orders_partitioned(id);
Create the hash partition for sorted data
Now we can create our hash partition on the new "root" table:
CREATE TABLE orders_sorted PARTITION OF orders_partitioned FOR VALUES IN (TRUE) PARTITION BY hash(id);
Create child partitions
Next we can create the set of child partitions we want to use. For illustration purposes, we'll just create five partitions:
CREATE TABLE orders_0 PARTITION OF orders_sorted FOR VALUES WITH (modulus 5, remainder 0);
CREATE TABLE orders_1 PARTITION OF orders_sorted FOR VALUES WITH (modulus 5, remainder 1);
CREATE TABLE orders_2 PARTITION OF orders_sorted FOR VALUES WITH (modulus 5, remainder 2);
CREATE TABLE orders_3 PARTITION OF orders_sorted FOR VALUES WITH (modulus 5, remainder 3);
CREATE TABLE orders_4 PARTITION OF orders_sorted FOR VALUES WITH (modulus 5, remainder 4);
Attach original table as "unsorted" partition
Attach the table with the original data as a partition where the data is not sorted, which we know because sorted
is NULL:
ALTER TABLE orders_partitioned ATTACH PARTITION orders FOR VALUES IN (FALSE, NULL);
Rename tables to identify sorted and unsorted data
In a transaction, rename your existing data table to indicate that its data is "unsorted", and rename the new "root" table to the original name of the table. This will help you keep everything clear as we proceed.
BEGIN TRANSACTION;
ALTER TABLE orders RENAME TO orders_unsorted;
ALTER TABLE orders_partitioned RENAME TO orders;
COMMIT;
Set sorted=TRUE to the default
Now that we have the partitioning set up and have renamed our new table to the original table's name, any newly inserted data should already be sorted. So we'll set this as the default:
ALTER TABLE orders ALTER COLUMN sorted SET DEFAULT TRUE;
Move unsorted data over
Now we need to move our existing data from the unsorted partition to the new, sorted one. Doing this in batches reduces the locks that are taken out in the process. Run the command below repeatedly until nothing is left in orders_unsorted:
UPDATE orders SET sorted = TRUE WHERE id IN (SELECT id FROM orders_unsorted LIMIT 10000);
Hint
If a client attempts to delete a row that is currently being updated by the above query, the delete will queue up but then fail because the row has been moved. If this happens you'll receive the following error:
ERROR: tuple to be locked was already moved to another partition due to concurrent update
If you do encounter these errors, you'll need to re-run any affected deletes.
Check the number of rows in each partition
You can use this command to see the number of rows in each partition at any time:
SELECT tableoid::regclass, count(id) FROM orders GROUP BY tableoid::regclass;
Drop the old "unsorted" table
Once all the existing data has been moved into the new table, the old table containing our "unsorted" order data should be empty and can be dropped:
ALTER SEQUENCE orders_id_seq OWNED BY orders_sorted.id;
DROP TABLE orders_unsorted;
Detach and rename the new table
In a transaction, you can now detach the sorted partition and rename it to be the new "root" table. You can also rename (or just drop) the old table that contained the unsorted data.
BEGIN TRANSACTION;
ALTER TABLE orders DETACH PARTITION orders_sorted;
ALTER TABLE orders RENAME TO orders_old;
ALTER TABLE orders_sorted RENAME TO orders;
COMMIT;
Cleaning up
Finally, you can re-add your primary key constraint, and remove the sorted
boolean field since it is no longer needed:
CREATE UNIQUE INDEX orders_pkey ON orders (id);
ALTER TABLE ORDERS DROP COLUMN sorted;