Postgres tips

A quick reference for working with PostgreSQL databases.

Oct 15, 2018 4 min read

Total rows

Get an estimate of the number of rows for an enormous table called “events”.

SELECT reltuples::bigint AS rows_estimate FROM pg_class WHERE relname="events";

Total size

Or get a table of a database’s largest relations and their total size in bytes, total size pretty, and rows estimate.

SELECT
    nspname || '.' || relname AS "relation",
    pg_total_relation_size(C.oid) AS "total_size_bytes",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size_pretty",
    reltuples::BIGINT AS "rows_estimate"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 200;

More size queries

Get table, index, and total size.

SELECT
  table_name,
  pg_size_pretty(table_size) AS table_size,
  pg_size_pretty(indexes_size) AS indexes_size,
  pg_size_pretty(total_size) AS total_size
FROM (
  SELECT
    table_name,
    pg_table_size(table_name) AS table_size,
    pg_indexes_size(table_name) AS indexes_size,
    pg_total_relation_size(table_name) AS total_size
  FROM (
    SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name, table_catalog
    FROM information_schema.tables
  ) AS all_tables
  ORDER BY total_size DESC
) AS pretty_sizes;

Or get the size of the largest relations.

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_relation_size(C.oid)) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER BY pg_relation_size(C.oid) DESC
  LIMIT 20;

Find more disk usage related queries at PostgreSQL Disk Usage

pg_dump

cat .../<your-app>/.../database.yml
DB_HOST=<from database.yml>
DB_NAME=<from database.yml>
DB_USER=<from database.yml>
DB_DUMP_FILE=db.dump
# -Fc format custom, -v verbose
time (pg_dump -Fc -v -U $DB_USER -h $DB_HOST -d $DB_NAME -f $DB_DUMP_FILE)

SERVER_USER=<ie deploy>
SERVER_IP=<ie 1.2.3.4 >
DB_NAME=<ie database_prod>
REMOTE_DB_DUMP_FILE=db.dump
DATE=$(date +"%Y%m%d%H%M")
LOCAL_DB_DUMP=$HOME/.../$DB_NAME\-$DATE.dump

echo Fetching production database dump
scp $SERVER_USER@$SERVER_IP:$REMOTE_DB_DUMP_FILE $LOCAL_DB_DUMP

pg_restore

Simple pg_restore for local development

cat .../<your-app>/.../database.yml
DB_NAME=<from database.yml>
DB_DUMP_FILE=db.dump

# -Fc --format=custom, -v --verbose
# -c --clean (clean/drop database objects before recreating them!)
time (pg_restore -Fc -v -c -d $DB_NAME $DB_DUMP_FILE)

More realistic scenario on production or staging server, using a pg_restore.list option with -L pg_restore.list

DB_HOST=<from database.yml>
DB_NAME=<from database.yml>
DB_USER=<from database.yml>
DB_DUMP_FILE=db.dump
PG_RESTORE_LIST=pg_restore.list

pg_restore -l db.dump | grep -Fv -e 'COMMENT - EXTENSION' -e 'master' -e 'rdsadmin' > $PG_RESTORE_LIST
# -Fc --format=custom, -v --verbose, -O --no-owner, -1 --single-transaction
# -e --exit-on-error
time (pg_restore -Fc -v -O -1 -e -h $DB_HOST -d $DB_NAME -U $DB_USER -L $PG_RESTORE_LIST $DB_DUMP_FILE)

Restore specific table(s)

DB_HOST=<from database.yml>
DB_NAME=<from database.yml>
DB_USER=<from database.yml>
DB_DUMP_FILE=db.dump
DB_TABLE1=<ie users>
DB_TABLE2=<ie accounts>

# -Fc --format=custom, -v --verbose, -a --data-only, -1 --single-transaction
# -e --exit-on-error, -t --table (as many as needed)
time (pg_restore -Fc -v -a -1 -e -h $DB_HOST -d $DB_NAME -U $DB_USER -t $DB_TABLE1 -t $DB_TABLE2 $DB_DUMP_FILE)

EXPLAIN

Add EXPLAIN before a query to get its execution plan. Example: EXPLAIN SELECT * FROM users WHERE id = 42;

Add EXPLAIN ANALYZE for the statement to be actually executed, not only planned. This is fine for SELECT. But, since the query is executed, be careful with INSERT, UPDATE, DELETE, CREATE TABLE AS. Wrap those with a BEGIN and ROLLBACK:

BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

VACUUM

Why VACUUM?

In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it’s necessary to do VACUUM periodically, especially on frequently-updated tables. (reference)

VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected table.

ANALYZE updates statistics used by the planner to determine the most efficient way to execute a query.

VACUUM FULL reclaims more space, but takes much longer and exclusively locks the table. This method also requires extra disk space, since it writes a new copy of the table and doesn’t release the old copy until the operation is complete.

Rails console SQL commands

From the Rails console, call ActiveRecord::Base.connection.execute "..." to execute arbitrary SQL commands:

ActiveRecord::Base.connection.execute "VACUUM VERBOSE ANALYZE users"

Maintaining a Database

Find unused indexes that can be pruned.

Since indexes add significant overhead to any table change operation, they should be removed if they are not being used for either queries or constraint enforcement (such as making sure a value is unique).

SELECT
    t.schemaname,
    t.tablename,
    indexname,
    c.reltuples AS num_rows,
    pg_size_pretty(pg_relation_size(quote_ident(t.schemaname)::text || '.' || quote_ident(t.tablename)::text)) AS table_size,
    pg_size_pretty(pg_relation_size(quote_ident(t.schemaname)::text || '.' || quote_ident(indexrelname)::text)) AS index_size,
    pg_relation_size(quote_ident(t.schemaname)::text || '.' || quote_ident(t.tablename)::text) AS table_size_raw,
    pg_relation_size(quote_ident(t.schemaname)::text || '.' || quote_ident(indexrelname)::text) AS index_size_raw,
    CASE WHEN indisunique THEN 'Y'
        ELSE 'N'
    END AS UNIQUE,
    number_of_scans,
    tuples_read,
    tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename = c.relname
LEFT OUTER JOIN (
    SELECT
        c.relname AS ctablename,
        ipg.relname AS indexname,
        x.indnatts AS number_of_columns,
        idx_scan AS number_of_scans,
        idx_tup_read AS tuples_read,
        idx_tup_fetch AS tuples_fetched,
        indexrelname,
        indisunique,
        schemaname
    FROM pg_index x
    JOIN pg_class c ON c.oid = x.indrelid
    JOIN pg_class ipg ON ipg.oid = x.indexrelid
    JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid
) AS foo ON t.tablename = foo.ctablename AND t.schemaname = foo.schemaname
WHERE t.schemaname
    NOT IN ('pg_catalog', 'information_schema')
--    AND number_of_scans < 200
ORDER BY number_of_scans, 1, 2, 3;

Find more index maintenance related queries at PostgreSQL Disk Usage

Read more posts like this in the Software Engineering Toolbox collection.

Visit homepage