To practice this guide, PostgreSQL version should be at least 9.6
First, setup a master-slave replication for PostgreSQL.
Then switch the slave server to master.
In our case, the database is almost 300GB, it would take at least 3 hours to migrate to a new server, including export, transfer and import time.
Sometimes, it’s acceptable to shutdown sites or apps for a short time to do some maintain, but what if the data grows and the migration may takes days or weeks?
For services in-production, you would expect it always online.
Yes, we hope we can keep the service online while transferring data and switch it in minutes.
I have checked How to Set-Up Master-Slave Replication for PostgreSQL 9.6 on Ubuntu 16.04 and tried many times of making replicas on the new server but always fail.
It reports the same error:
requested WAL segment 00000001000001CD00000055 has already been removed every time.
After did some research and I found that’s because my database is too big, transfer would take too much time, and the
wal_keep_segments is not enough for that long time of transferring files.
Maybe it can be fixed by increasing
wal_keep_segments to a bigger number, but that introduces another problem: we need to calculate the proper number, enough keeping WAL files after the long and unsure time of data transferring, and not too big that would slow down the running service.
Of course, that’s not the best solution.
Later, I found a more reliable and more robust solution in PostgreSQL’s mail list:
You could use
-X streaminstead of
--xlog(which is an alias for
-X fetch). This consumes two wal senders instead of one, but greatly reduce the probability of having this error.
The only way to really prevent this error is using replication slots, but the support for pg_basebackup is only available in 9.6.
I would show you how to use
replication slots and use stream to transfer the transaction log files(WAL files) below.
PostgreSQL disabled replication slots by default, we need to setup by adding lines to the bottom of
synchronous_standby_names = 'slave001'
Then restart the PostgreSQL service by
service postgresql-9.6 restart.
Connect to PostgreSQL by
sudo -u postgres psql
select pg_create_physical_replication_slot('slot_for_migration', true);
We can check the progress by selecting:
select * from pg_replication_slots;
If you want to delete this slot later after migration, you can use this command:
Use it after remove the initial prime if you know what this command would do.
Run in psql as user postgres:
CREATE USER replica REPLICATION LOGIN ENCRYPTED PASSWORD 'your password here';
Add these lines below to master’s pg_hba.conf
Then restart the PostgreSQL server.
Run on the slave server:
pg_basebackup -h 10.10.10.10 -p 5432 -S slot_for_migration -U replica -D /var/lib/pgsql/9.6/data --checkpoint=fast -R -P -X stream
It would ask you to type password, and after that, you need to wait for it until it finished syncing. This step is very time consuming, always takes hours or even days.
service postgresql-9.6 restart, there should be a sync before the standby server up and run, wouldn’t take too long if you didn’t leave the console too long after step 4.
Then we can connect to the server now by using psql and do some check, make sure your data are the latest.
pg_ctl promote -D