PostgreSQL - Out of Memory with large INSERT

I’ve been working with moving large databases in PostgreSQL and have found that PostgreSQL is a true memory hog when inserting millions of rows in one statement. If the target table has a number of foreign key references PostgreSQL seems to choke horribly and use insane amounts of memory, 1GB per million rows in my original case. We have a rather simple table that is composed of six integer columns, two timestamps, and one boolean. I first ran into this issue when inserting with a select statement that had a simple join and where clause. Initially I thought the select statement was the one causing the out of memory issues. But I found running the select alone worked and changing the statement to a select into piped the data into a new table without any issues.

I created a simple test case to reproduce this with the following schema,

CREATE TABLE table1 (
    table1_id integer NOT NULL,
    name character varying NOT NULL,
    PRIMARY KEY(table1_id)
);
CREATE TABLE table2 (
    table2_id integer NOT NULL,
    name character varying NOT NULL,
    PRIMARY KEY(table2_id)
);
CREATE TABLE table3 (
    table3_id integer NOT NULL,
    name character varying NOT NULL,
    PRIMARY KEY(table3_id)
);
CREATE TABLE table4 (
    table4_id integer NOT NULL,
    name character varying NOT NULL,
    PRIMARY KEY(table4_id)
);
CREATE TABLE test_target (
    table1_id integer NOT NULL,
    table2_id integer NOT NULL,
    visible boolean NOT NULL,
    date_added timestamp with time zone NOT NULL,
    date_updated timestamp with time zone NOT NULL,
    table3_id1 integer NOT NULL,
    table3_id2 integer NOT NULL,
    table4_id1 integer NOT NULL,
    table4_id2 integer NOT NULL,
    FOREIGN KEY (table1_id) REFERENCES table1(table1_id),
    FOREIGN KEY (table2_id) REFERENCES table2(table2_id),
    FOREIGN KEY (table3_id1) REFERENCES table3(table3_id),
    FOREIGN KEY (table3_id2) REFERENCES table3(table3_id),
    FOREIGN KEY (table4_id1) REFERENCES table4(table4_id),
    FOREIGN KEY (table4_id2) REFERENCES table4(table4_id)
);

Running this INSERT / SELECT below can cause the PostgreSQL to keep expanding in memory usage. On a 32-bit machine it aborts due to an out of memory error around 2GB, but on a 64-bit Linux machine it keeps using memory until all of the main memory and swap is full and then the oom-killer process is spawned and starts killing processes.

INSERT INTO test_target
(
    table1_id, table2_id, visible,
    date_added, date_updated,
    table3_id1, table3_id2,
    table4_id1, table4_id2
)
SELECT
    0 as table1_id, 0 as table2_id,
    TRUE as visible,
    now() as date_added,
    now() as date_updated,
    0 as table3_id1, 0 as table3_id2,
    0 as table4_id1, 0 as table4_id2
FROM generate_series(1, 13000000);

I've tried this on PostgreSQL 8.3.7 under Linux and I found the same behavior in PostgreSQL 8.4 Beta for Windows.

A simple workaround I found is to simply drop the foreign keys before the insert, and re-add the foreign keys after the data is loaded into the table.

By @Jory Stone in
Tags :