A quick reference for working with PostgreSQL databases.
See also: Language syntax review: SQL
Get an estimate of the number of rows for an enormous table called “events”.
SELECT reltuples::bigint AS rows_estimate FROM pg_class WHERE relname="events";
Or get a table of a database’s largest relations and their total size in bytes, total size pretty, and rows estimate.
SELECT
nspname || '.' || relname AS "relation",
pg_total_relation_size(C.oid) AS "total_size_bytes",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size_pretty",
reltuples::BIGINT AS "rows_estimate"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 200;
Get table, index, and total size.
SELECT
table_name,
pg_size_pretty(table_size) AS table_size,
pg_size_pretty(indexes_size) AS indexes_size,
pg_size_pretty(total_size) AS total_size
FROM (
SELECT
table_name,
pg_table_size(table_name) AS table_size,
pg_indexes_size(table_name) AS indexes_size,
pg_total_relation_size(table_name) AS total_size
FROM (
SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name, table_catalog
FROM information_schema.tables
) AS all_tables
ORDER BY total_size DESC
) AS pretty_sizes;
Or get the size of the largest relations.
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
Find more disk usage related queries at PostgreSQL Disk Usage
cat .../<your-app>/.../database.yml
DB_HOST=<from database.yml>
DB_NAME=<from database.yml>
DB_USER=<from database.yml>
DB_DUMP_FILE=db.dump
# -Fc format custom, -v verbose
time (pg_dump -Fc -v -U $DB_USER -h $DB_HOST -d $DB_NAME -f $DB_DUMP_FILE)
SERVER_USER=<ie deploy>
SERVER_IP=<ie 1.2.3.4 >
DB_NAME=<ie database_prod>
REMOTE_DB_DUMP_FILE=db.dump
DATE=$(date +"%Y%m%d%H%M")
LOCAL_DB_DUMP=$HOME/.../$DB_NAME\-$DATE.dump
echo Fetching production database dump
scp $SERVER_USER@$SERVER_IP:$REMOTE_DB_DUMP_FILE $LOCAL_DB_DUMP
Simple pg_restore for local development
cat .../<your-app>/.../database.yml
DB_NAME=<from database.yml>
DB_DUMP_FILE=db.dump
# -Fc --format=custom, -v --verbose
# -c --clean (clean/drop database objects before recreating them!)
time (pg_restore -Fc -v -c -d $DB_NAME $DB_DUMP_FILE)
More realistic scenario on production or staging server, using a pg_restore.list
option with -L pg_restore.list
DB_HOST=<from database.yml>
DB_NAME=<from database.yml>
DB_USER=<from database.yml>
DB_DUMP_FILE=db.dump
PG_RESTORE_LIST=pg_restore.list
pg_restore -l db.dump | grep -Fv -e 'COMMENT - EXTENSION' -e 'master' -e 'rdsadmin' > $PG_RESTORE_LIST
# -Fc --format=custom, -v --verbose, -O --no-owner, -1 --single-transaction
# -e --exit-on-error
time (pg_restore -Fc -v -O -1 -e -h $DB_HOST -d $DB_NAME -U $DB_USER -L $PG_RESTORE_LIST $DB_DUMP_FILE)
DB_HOST=<from database.yml>
DB_NAME=<from database.yml>
DB_USER=<from database.yml>
DB_DUMP_FILE=db.dump
DB_TABLE1=<ie users>
DB_TABLE2=<ie accounts>
# -Fc --format=custom, -v --verbose, -a --data-only, -1 --single-transaction
# -e --exit-on-error, -t --table (as many as needed)
time (pg_restore -Fc -v -a -1 -e -h $DB_HOST -d $DB_NAME -U $DB_USER -t $DB_TABLE1 -t $DB_TABLE2 $DB_DUMP_FILE)
Add EXPLAIN
before a query to get its execution plan.
Example: EXPLAIN SELECT * FROM users WHERE id = 42;
Add EXPLAIN ANALYZE
for the statement to be actually executed, not only
planned. This is fine for SELECT
. But, since the query is executed, be careful
with INSERT, UPDATE, DELETE, CREATE TABLE AS
. Wrap those with a BEGIN
and ROLLBACK
:
BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;
Why VACUUM
?
In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it’s necessary to do VACUUM periodically, especially on frequently-updated tables. (reference)
VACUUM ANALYZE
performs a VACUUM
and then an ANALYZE
for each selected
table.
ANALYZE
updates statistics used by the planner to determine the most efficient
way to execute a query.
VACUUM FULL
reclaims more space, but takes much longer and exclusively locks
the table. This method also requires extra disk space, since it writes a new
copy of the table and doesn’t release the old copy until the operation is
complete.
From the Rails console, call ActiveRecord::Base.connection.execute "..."
to execute arbitrary SQL commands:
ActiveRecord::Base.connection.execute "VACUUM VERBOSE ANALYZE users"
Find unused indexes that can be pruned.
Since indexes add significant overhead to any table change operation, they should be removed if they are not being used for either queries or constraint enforcement (such as making sure a value is unique).
SELECT
t.schemaname,
t.tablename,
indexname,
c.reltuples AS num_rows,
pg_size_pretty(pg_relation_size(quote_ident(t.schemaname)::text || '.' || quote_ident(t.tablename)::text)) AS table_size,
pg_size_pretty(pg_relation_size(quote_ident(t.schemaname)::text || '.' || quote_ident(indexrelname)::text)) AS index_size,
pg_relation_size(quote_ident(t.schemaname)::text || '.' || quote_ident(t.tablename)::text) AS table_size_raw,
pg_relation_size(quote_ident(t.schemaname)::text || '.' || quote_ident(indexrelname)::text) AS index_size_raw,
CASE WHEN indisunique THEN 'Y'
ELSE 'N'
END AS UNIQUE,
number_of_scans,
tuples_read,
tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename = c.relname
LEFT OUTER JOIN (
SELECT
c.relname AS ctablename,
ipg.relname AS indexname,
x.indnatts AS number_of_columns,
idx_scan AS number_of_scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched,
indexrelname,
indisunique,
schemaname
FROM pg_index x
JOIN pg_class c ON c.oid = x.indrelid
JOIN pg_class ipg ON ipg.oid = x.indexrelid
JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid
) AS foo ON t.tablename = foo.ctablename AND t.schemaname = foo.schemaname
WHERE t.schemaname
NOT IN ('pg_catalog', 'information_schema')
-- AND number_of_scans < 200
ORDER BY number_of_scans, 1, 2, 3;
Find more index maintenance related queries at PostgreSQL Disk Usage