Point in time recovery in PostgreSQL 11 with WAL-E
Point-in-time recovery (PITR), as the name suggests, means restoring a database to a specific point in time. Two things are required for this operation: a base backup and WAL archiving. To minimize restore time, base backups should be taken as often as practical: in case of disaster there will be fewer WAL files to replay, and as a result recovery will finish much faster.
The setup consists of one VM running a Postgres 11 master database and a second one, which we will use to restore the broken cluster. The master will send base backups and WAL files to a bucket on Google Cloud Storage. The restore will also be done using the storage bucket.
Prepare master to write WAL files
In order to send WAL files to the GCP bucket, the master must run with the following configuration:
wal_level = archive
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
archive_timeout = 60
We use envdir to protect environment variables, so we do not need to put them directly in the Postgres configuration. To set the bucket prefix, the line below must be executed:
$ echo gs://<bucket_name> > /etc/wal-e.d/env/WALE_GS_PREFIX
In the given example there is only a bucket name, which could be written directly into the config with no harm done. This is because the VM sits in a private VPC on GCP and has direct access to the buckets. Otherwise, a second parameter, GOOGLE_APPLICATION_CREDENTIALS, should be provided, and in that case putting it in the Postgres configuration is not recommended, so envdir usage is fully justified. More info on this tool can be found in the wal-e README on GitHub.
envdir injects all of these variables into the wal-e script, and wal-e needs them to know where to send WAL files. Again, in our case this is a Google Cloud Storage bucket.
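For completeness, the envdir layout can be prepared like this; this is a sketch, and the bucket name and key path are placeholders for your own values:

```shell
# Create the envdir directory; each file name becomes an environment
# variable and the file's content becomes its value.
mkdir -p /etc/wal-e.d/env

# Placeholder bucket name -- replace with your own.
echo 'gs://my-wal-e-bucket' > /etc/wal-e.d/env/WALE_GS_PREFIX

# Only needed when the VM has no implicit access to the bucket;
# the key path below is a placeholder.
echo '/etc/wal-e.d/gcs-key.json' > /etc/wal-e.d/env/GOOGLE_APPLICATION_CREDENTIALS

# envdir (from daemontools) then exposes these variables to whatever
# command it wraps, e.g.: envdir /etc/wal-e.d/env wal-e backup-list
```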
Now we are ready to start the database. To use PITR restoration later, we first have to take a base backup to rely on. After starting the DB, let's use backup-push to send it to GCP. Here this is done manually for testing purposes; in a production environment, this operation should run automatically via cron, e.g. once a day at night.
$ envdir /etc/wal-e.d/env/ wal-e backup-push /var/lib/pgsql/11/data/
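A minimal crontab entry for the nightly backup mentioned above could look like this (the 02:00 schedule is just an example; the paths match the ones used in this walkthrough):

```
# m h dom mon dow  command -- run a base backup every night at 02:00
0 2 * * * envdir /etc/wal-e.d/env wal-e backup-push /var/lib/pgsql/11/data/
```

This should go into the crontab of a user with access to the data directory, typically postgres.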
Disaster is coming
OK. We have a running cluster with a base backup taken and WALs streaming live to the GCP bucket. Unfortunately, some errors occurred and the data in Postgres got corrupted. We decide to restore the database to its state from a few hours earlier, when we are sure the data was still correct.
Bring to life new cluster
In my case, I used a separate VM on GCP to set up the new Postgres cluster. To minimize risk, I use Ansible to set up both machines in the same manner. After doing so, there are two machines with Postgres: the broken master and a new one, ready to handle traffic once the restore is done.
Things to do on clean DB
Fortunately, we were prepared for such a situation, and the restoration should not be very complicated. First, we will again use wal-e. Let's tell it where to look for the files we need to perform the restoration. The command is the same as on the master:
$ echo gs://<MASTER_bucket_name> > /etc/wal-e.d/env/WALE_GS_PREFIX
Then we have to stop Postgres and apply the base backup (again via envdir, so wal-e knows which bucket to read from):
$ sudo systemctl stop postgresql-11
$ envdir /etc/wal-e.d/env /usr/local/bin/wal-e backup-fetch /var/lib/pgsql/11/data/ LATEST
When the base backup is recovered, we are ready to configure WAL restoration. Create a recovery.conf file in the Postgres data directory to let Postgres know how to fetch WALs and at what point in time to stop applying them.
restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch %f %p'
standby_mode = on
recovery_target_time = '2021-02-05 10:00:00' # remember about the DB TIMEZONE differences!
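Putting these three lines together, the file can be written in one step; the data directory and target time mirror the values used in this walkthrough:

```shell
# Create recovery.conf in the Postgres data directory (run as the postgres user).
# mkdir is a no-op on a real server; it just keeps this sketch self-contained.
mkdir -p /var/lib/pgsql/11/data
cat > /var/lib/pgsql/11/data/recovery.conf <<'EOF'
restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch %f %p'
standby_mode = on
recovery_target_time = '2021-02-05 10:00:00'
EOF
```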
While the recovery.conf file is present, Postgres knows to run in recovery mode. Thus, it will not accept regular connections until recovery finishes.
restore_command tells Postgres what command to run after the cluster is started. In this case, we instruct it to run wal-fetch, i.e. to download files (from the source we specified via envdir) and try to restore them.
standby_mode: if this parameter is on, the server will not stop recovery when the end of archived WAL is reached, but will keep trying to continue recovery by fetching new WAL segments using restore_command indefinitely. In our case we could omit it, as we are going to stop the restoration at a specific point in time anyway.
recovery_target_time indicates the point in time we want to get back to.
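Since recovery_target_time is interpreted in the server's timezone, it is worth double-checking the conversion before writing it down. With GNU date, a local timestamp can be converted to UTC like this (Europe/Warsaw is just an assumed example source zone):

```shell
# Convert a local-time restore target to UTC before putting it into
# recovery_target_time. Europe/Warsaw is an example timezone; in
# February it is CET (UTC+1), so 11:00 local is 10:00 UTC.
TZ=UTC date -d 'TZ="Europe/Warsaw" 2021-02-05 11:00:00' '+%Y-%m-%d %H:%M:%S'
# prints: 2021-02-05 10:00:00
```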
We have configured the new machine to handle the restoration. After a restart, let's look at the logs to check whether the operation is proceeding correctly.
$ sudo systemctl restart postgresql-11
$ tail -f /var/lib/pgsql/11/data/log/postgresql-Fri.log
2021-02-05 11:18:56.700 UTC  LOG: database system is shut down
2021-02-05 11:18:56.789 UTC  LOG: database system was shut down at 2021-02-05 11:18:56 UTC
2021-02-05 11:43:51.561 UTC  LOG: database system is ready to accept read only connections
wal_e.operator.backup INFO MSG: promoted prefetched wal segment
        STRUCTURED: time=2021-02-05T11:43:52.079899-00 pid=96108 action=wal-fetch key=gs://<bucket_name>/wal_005/000000010000000000000009.lzo prefix=gs://<bucket_name>/ seg=000000010000000000000009
2021-02-05 11:43:52.173 UTC  LOG: restored log file "000000010000000000000009" from archive
wal_e.operator.backup INFO MSG: promoted prefetched wal segment
[...]
        STRUCTURED: time=2021-02-05T11:43:55.677788-00 pid=96171 action=wal-fetch key=gs://<bucket_name>/wal_005/00000001000000000000000D.lzo prefix=gs://<bucket_name> seg=00000001000000000000000D
2021-02-05 11:43:55.785 UTC  LOG: restored log file "00000001000000000000000D" from archive
wal_e.operator.backup INFO MSG: promoted prefetched wal segment
        STRUCTURED: time=2021-02-05T11:43:56.590586-00 pid=96194 action=wal-fetch key=gs://<bucket_name>/wal_005/00000001000000000000000E.lzo prefix=gs://<bucket_name> seg=00000001000000000000000E
2021-02-05 11:43:56.697 UTC  LOG: restored log file "00000001000000000000000E" from archive
2021-02-05 11:43:57.289 UTC  LOG: recovery stopping before commit of transaction 590, time 2021-02-05 11:27:58.183341+00
2021-02-05 11:43:57.289 UTC  LOG: recovery has paused
2021-02-05 11:43:57.289 UTC  HINT: Execute pg_wal_replay_resume() to continue.
It looks like the restore has completed successfully. As mentioned before, the cluster is still running in recovery mode, so it accepts only read-only connections. Let's finish recovery and promote it to master.
postgres=# SELECT pg_wal_replay_resume();
After running this command, the recovery.conf file will be renamed to recovery.done, and from now on the database will run as a fully recovered master. In other words, it is ready to handle connections, so don't hesitate to point the load balancer at it. To be sure everything is OK, let's have one more look at the logs:
2021-02-05 11:45:43.400 UTC  LOG: redo done at 0/E61FCF8
It's important to remember that there are other mechanisms available in Postgres, like streaming replication. However, it's even more important to distinguish PITR from them: when disaster strikes, the corrupted data is replicated almost instantaneously to the replica, so the replica is no longer a valid backup; it's a copy of the broken master. In short: replication is not backup.