databases, google cloud

Point in time recovery in PostgreSQL 11 with WAL-E

Point in time recovery (PITR) – as the name suggests – means restoring a database to a specific point in time. There are two requirements for this operation: a base backup and WAL archiving. To minimize restore time, base backups should be taken as often as possible – in case of disaster there will be fewer WAL files to replay, and as a result recovery will complete much faster.

Environment

One VM runs a Postgres 11 master database, and a second one will be used to restore the broken cluster. The master sends base backups and WAL files to a bucket on Google Cloud Storage. The restore will also be done using the storage bucket.

Prepare master to write WAL files

In order to send WAL files to a GCP bucket, the master must be configured and running with the following settings:

wal_level = archive
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
archive_timeout = 60
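
Once the cluster is started with these settings, one extra sanity check (not part of the original setup, just a handy test) is to confirm that archiving actually works by querying the pg_stat_archiver view:

$ psql -U postgres -c "SELECT archived_count, last_archived_wal, last_failed_wal FROM pg_stat_archiver;"

A growing archived_count and an empty last_failed_wal mean that WAL segments are reaching the bucket.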

As we use envdir to protect environment variables – so that we do not have to put them directly into the Postgres configuration – the line below must be executed.

$ echo gs://<bucket_name> > /etc/wal-e.d/env/WALE_GS_PREFIX

In this example there is only the bucket name, which could be written directly into the config with no harm done, because this VM sits in a private VPC on GCP and has direct access to the buckets. Otherwise, a second parameter, GOOGLE_APPLICATION_CREDENTIALS, should be provided as well – and putting that one in the Postgres configuration is not recommended, which is why using envdir is highly justified. More info on this tool can be found in the WAL-E README on GitHub.

From now on, envdir will inject all of these variables into the wal-e script, and wal-e needs this information to know where to send WAL files. Again – in our case this will be a Google Storage bucket.
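
For completeness, a minimal sketch of preparing the envdir directory itself – the layout matches the paths used above, while the service-account path is only an assumption for the case where GOOGLE_APPLICATION_CREDENTIALS is actually needed:

$ sudo mkdir -p /etc/wal-e.d/env
$ echo gs://<bucket_name> | sudo tee /etc/wal-e.d/env/WALE_GS_PREFIX
$ echo /path/to/service-account.json | sudo tee /etc/wal-e.d/env/GOOGLE_APPLICATION_CREDENTIALS
$ sudo chown -R root:postgres /etc/wal-e.d

envdir simply exports each file in that directory as an environment variable named after the file, with the file's content as its value.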

Start cluster

Right now we are ready to start the database. In order to use PITR restoration, we have to take a base backup, on which we will rely later. After starting the DB, let's use the wal-e backup-push command to send it to GCP. Here it is done manually for testing purposes; in a production environment this operation should be run automatically via cron, e.g. once a day at night.

$ envdir /etc/wal-e.d/env/ wal-e backup-push /var/lib/pgsql/11/data/
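
As an illustration of the production setup mentioned above, a crontab entry for the postgres user could look roughly like this – the schedule and paths are assumptions, so adjust them to your environment:

$ sudo crontab -u postgres -e
# take a fresh base backup every night at 02:00
# (use the full path to wal-e if cron's PATH does not include it)
0 2 * * * envdir /etc/wal-e.d/env wal-e backup-push /var/lib/pgsql/11/data/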

Disaster is coming

OK. We have a running cluster with a base backup done and live streaming of WALs to the GCP bucket enabled. Unfortunately, some errors occurred and the data in Postgres got corrupted. We decide to restore the database to its state from a few hours earlier – when we are sure the data was still correct.

Bring to life new cluster

In my case, I used a separate VM on GCP to set up the new Postgres cluster. To minimize risk, I use Ansible to set up both machines in the same manner. After that, there are two machines with Postgres – the broken master and a new one, ready to handle traffic once the restore is done.

Things to do on clean DB

Fortunately, we were prepared for such a situation, so the restoration should not be very complicated. Here we will also use wal-e. First, let's tell it where to look for the files we need to perform the restoration. The command is the same as on the master:

$ echo gs://<MASTER_bucket_name> > /etc/wal-e.d/env/WALE_GS_PREFIX

Then we have to stop Postgres and apply the base backup using the backup-fetch command.

$ sudo systemctl stop postgresql-11
$ envdir /etc/wal-e.d/env /usr/local/bin/wal-e backup-fetch /var/lib/pgsql/11/data/ LATEST
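
Depending on which user executed backup-fetch, it may also be necessary to fix ownership and permissions of the data directory before Postgres will agree to start (the paths below simply mirror the ones used above):

$ sudo chown -R postgres:postgres /var/lib/pgsql/11/data
$ sudo chmod 700 /var/lib/pgsql/11/data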

When the base backup is recovered, we are ready to configure WAL restoration.

Create a recovery.conf file in the Postgres data directory to let Postgres know how to fetch WALs and at what point in time to stop applying them.

restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch %f %p'
standby_mode = on
recovery_target_time = '2021-02-05 10:00:00'              # remember about the DB TIMEZONE differences!

If a recovery.conf file is present, Postgres will know to start in recovery mode. Until recovery is finished, it will accept read-only connections at most, so it cannot handle regular traffic yet.

restore_command tells Postgres what command to run to fetch archived WAL segments once the cluster is started. In this case we instruct it to run wal-fetch, which downloads the files (from the source we specified via envdir) so that they can be replayed.

standby_mode – if this parameter is on, the server will not stop recovery when the end of archived WAL is reached, but will keep trying to continue recovery by fetching new WAL segments using restore_command indefinitely. In our case we could omit it, as we are going to end the restoration at a specific point in time.

recovery_target_time indicates the point in time we want to get back to.
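
Since recovery_target_time is interpreted using the server's time zone setting (hence the warning in the config above), it is worth double-checking that setting before choosing the target – for example on the still running master:

postgres=# SHOW timezone;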

Restoration

We have configured the new machine to handle the restoration. After restarting Postgres, let's look at the logs to see whether the operation is being handled correctly.

$ sudo systemctl restart postgresql-11
$ tail -f /var/lib/pgsql/11/data/log/postgresql-Fri.log
2021-02-05 11:18:56.700 UTC [47958] LOG:  database system is shut down
2021-02-05 11:18:56.789 UTC [95963] LOG:  database system was shut down at 2021-02-05 11:18:56 UTC
2021-02-05 11:43:51.561 UTC [96022] LOG:  database system is ready to accept read only connections
wal_e.operator.backup INFO     MSG: promoted prefetched wal segment
        STRUCTURED: time=2021-02-05T11:43:52.079899-00 pid=96108 action=wal-fetch key=gs://<bucket_name>/wal_005/000000010000000000000009.lzo prefix=gs://<bucket_name>/ seg=000000010000000000000009
2021-02-05 11:43:52.173 UTC [96024] LOG:  restored log file "000000010000000000000009" from archive
wal_e.operator.backup INFO     MSG: promoted prefetched wal segment

[...]

        STRUCTURED: time=2021-02-05T11:43:55.677788-00 pid=96171 action=wal-fetch key=gs://<bucket_name>/wal_005/00000001000000000000000D.lzo prefix=gs://<bucket_name> seg=00000001000000000000000D
2021-02-05 11:43:55.785 UTC [96024] LOG:  restored log file "00000001000000000000000D" from archive
wal_e.operator.backup INFO     MSG: promoted prefetched wal segment
        STRUCTURED: time=2021-02-05T11:43:56.590586-00 pid=96194 action=wal-fetch key=gs://<bucket_name>/wal_005/00000001000000000000000E.lzo prefix=gs://<bucket_name> seg=00000001000000000000000E
2021-02-05 11:43:56.697 UTC [96024] LOG:  restored log file "00000001000000000000000E" from archive
2021-02-05 11:43:57.289 UTC [96024] LOG:  recovery stopping before commit of transaction 590, time 2021-02-05 11:27:58.183341+00
2021-02-05 11:43:57.289 UTC [96024] LOG:  recovery has paused
2021-02-05 11:43:57.289 UTC [96024] HINT:  Execute pg_wal_replay_resume() to continue.

It looks like the restore has completed successfully. As mentioned before, the cluster is still in recovery mode, so it accepts only read-only connections. Let's promote it to master.

postgres=# SELECT pg_wal_replay_resume();

After running this command, the recovery.conf file will be renamed to recovery.done and from now on the database will run as a fully recovered master. In other words – it's ready to handle connections, so don't hesitate to switch the load balancer to point at it. To make sure everything is OK, let's have a look at the logs again:

2021-02-05 11:45:43.400 UTC [96024] LOG:  redo done at 0/E61FCF8
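
As an extra, purely illustrative check, we can also confirm that the server has left recovery mode:

postgres=# SELECT pg_is_in_recovery();

It should return false (f) now that the promotion is complete.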

Final words

It’s important to remember that there are other backup mechanisms available in Postgres – like streaming replication. However, it’s even more important to distinct PITR from it – when disaster appears, corrupted data is replicated almost instantaneously to replica – therefore it’s not a valid backup anymore. It’s copy of broken master. Thus – replication is no backup.

Further reading