pg_probackup, a backup and recovery tool for PostgreSQL, was first released in 2016 and has been substantially reworked since then. The latest open-source version, 2.5.15, is available on GitHub, while version 2.8.6 ships with the Postgres Pro product suite. The tool supports 14 platforms and offers three types of incremental backups, along with features such as backup merging and quick replica restoration. It also integrates with S3 and CFS, and users can join a dedicated Telegram chat for community support.
Versions of Pg_probackup
- 2.5.15 Community: Available on GitHub with only critical bug fixes applied; open-source.
- 2.8.6 STD: A reworked and optimized version designed for large datasets, included with Postgres Pro Standard; closed-source.
- 2.8.6 ENT: Similar to STD but adds CFS support and S3 compatibility; closed-source.
- 3 and above: A complete overhaul aimed at improved speed and efficiency; closed-source for the next few years.
The New Architecture of Pg_probackup 3
The development of pg_probackup 3 was driven by a backlog of client requests and internal ideas, alongside the realization that the original architecture had become cumbersome due to numerous quick fixes. The previous design had reached a point where maintenance was increasingly challenging, necessitating a fresh start.
Clients expressed dissatisfaction with several aspects, including the need to maintain separate builds for different PostgreSQL versions, the cumbersome backup format, and the requirement for multiple SSH connections. To address these issues, the decision was made to redesign pg_probackup from the ground up, resulting in a modular architecture comprising three independent components:
- A DB core extension for reading files and tracking changes for incremental backups.
- Libprobackup3, the core library that connects the backup application to the database, serving as a versatile SDK for custom solutions.
- The backup application itself, which manages communication with the database and writes backup data to disk.
Initially, the plan involved developing core functions and reimplementing all features from the previous version. However, it became clear that a more focused approach was necessary. The primary goal was to ensure data integrity and facilitate rapid recovery, leading to a revised plan that prioritized key elements for backup creation and recovery.
A Closer Look at Libprobackup3
Libprobackup3 is pivotal in managing communication with the Postgres Pro core and allows integration with third-party backup systems. Written in C++, it provides a C interface that simplifies integration with various programming languages, including Python and Go. This library directly interacts with the DB core extension to manage archival file streams, metadata, and backup logic.
Here is how a sample program written in Go can be invoked to take a full backup:
~$ golang_sample backup -pgport $SOURCEPORT -pgdata $BASE -backup-id $FULL -backup-mode FULL -backup-source=pro -storage=fs
The New Replication Protocol in Pg_probackup 3
Pg_probackup 3 introduces a custom communication protocol designed to optimize data reading, writing, and transfer without relying on physical files. This new approach enables multi-threaded I/O and simplifies the backup process.
The protocol is inspired by pg_basebackup and introduces commands that allow for full and incremental backups without direct database access. The new replication command structure facilitates efficient data handling, ensuring that both data and WAL files can be transferred within a single connection, thus eliminating timing issues.
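To make the single-connection idea concrete, here is a minimal sketch of how data pages and WAL segments could share one byte stream by tagging each chunk. The frame layout (a one-byte kind followed by a four-byte length) and the names `encode_frame` and `decode_frames` are assumptions made purely for illustration; they are not the actual wire format or API of pg_probackup 3.

```python
import struct

# Hypothetical frame header: 1-byte kind + 4-byte big-endian payload length.
# This layout is an illustration only, not the real pg_probackup 3 protocol.
HEADER = struct.Struct("!cI")

KIND_DATA = b"D"  # chunk of a data file
KIND_WAL = b"W"   # chunk of a WAL segment


def encode_frame(kind: bytes, payload: bytes) -> bytes:
    """Prefix a payload with its kind and length so streams can be interleaved."""
    return HEADER.pack(kind, len(payload)) + payload


def decode_frames(stream: bytes):
    """Yield (kind, payload) pairs from a buffer of concatenated frames."""
    offset = 0
    while offset < len(stream):
        kind, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        yield kind, stream[offset:offset + length]
        offset += length


# Data and WAL chunks interleaved on what would be a single connection.
stream = (
    encode_frame(KIND_DATA, b"page-bytes")
    + encode_frame(KIND_WAL, b"wal-bytes")
    + encode_frame(KIND_DATA, b"more-pages")
)
frames = list(decode_frames(stream))
wal_chunks = [payload for kind, payload in frames if kind == KIND_WAL]
```

Because every chunk carries its own tag, the receiver can route data and WAL to different writers without opening a second connection.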
The backup modes now include:
- FULL: Creates a complete database backup.
- DELTA: Incremental mode based on page-by-page file comparison.
- PTRACK: Tracks changes and records the corresponding LSN at the moment of the event.
- PAGE: Creates a change map during backup using LSN and checksums.
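The incremental modes all come down to deciding which pages changed since the parent backup. The toy sketch below illustrates one common way to make that decision, comparing each page's LSN with the LSN at which the parent backup started. The `Page` type, the `changed_pages` function, and the sample LSN values are hypothetical, and the text above does not confirm that any of the modes is implemented exactly this way.

```python
from dataclasses import dataclass

PAGE_SIZE = 8192  # PostgreSQL's default block size


@dataclass(frozen=True)
class Page:
    block_no: int
    lsn: int     # LSN of the last WAL record that modified this page
    data: bytes


def changed_pages(pages, parent_start_lsn):
    """Keep only pages modified after the parent backup started (assumed rule)."""
    return [page for page in pages if page.lsn > parent_start_lsn]


# Three pages of a hypothetical relation file; the parent backup started at LSN 150.
pages = [
    Page(0, lsn=100, data=b"a" * PAGE_SIZE),
    Page(1, lsn=250, data=b"b" * PAGE_SIZE),
    Page(2, lsn=180, data=b"c" * PAGE_SIZE),
]
delta = changed_pages(pages, parent_start_lsn=150)
changed_blocks = [page.block_no for page in delta]  # blocks 1 and 2
```

A scan like this has to read every page, whereas a change-tracking mode such as PTRACK aims to know the changed blocks up front.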
A Bit of Practice
To use the new protocol, users need to adjust their configuration files accordingly. The application interface remains largely unchanged; the main addition is the --backup-source=pro option when invoking pg_probackup.
For example:
pg_probackup3 backup -B $BACKUPDIR3 --instance $INSTANCE -b FULL --backup-source=pro --backup-id=0-full
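For scripted backups, the same command line can be assembled programmatically. The sketch below uses only the options shown in the command above; the helper name `build_backup_command` and the example paths are assumptions, and the command is built but not executed.

```python
import shlex

# Backup modes listed in this article.
BACKUP_MODES = {"FULL", "DELTA", "PTRACK", "PAGE"}


def build_backup_command(backup_dir, instance, mode="FULL", backup_id=None):
    """Build an argv list for a pg_probackup3 backup over the new protocol."""
    if mode not in BACKUP_MODES:
        raise ValueError(f"unknown backup mode: {mode}")
    argv = [
        "pg_probackup3", "backup",
        "-B", backup_dir,
        "--instance", instance,
        "-b", mode,
        "--backup-source=pro",
    ]
    if backup_id is not None:
        argv.append(f"--backup-id={backup_id}")
    return argv


# Hypothetical catalog path and instance name.
cmd = build_backup_command("/var/backups/pg", "main", backup_id="0-full")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the backup without going through a shell, which avoids quoting problems with paths.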
What’s New with Recovery in Pg_probackup 3?
One of the standout features of pg_probackup 3 is the FUSE capability, allowing users to restore data directly from a backup without unpacking it entirely. This innovative approach enables quick recovery from failures, as the backup file can be mounted in memory, facilitating immediate access to critical data.
Recovery options have also been enhanced, allowing for selective restoration of databases and WAL files, while ensuring that service data is copied and discrepancies are automatically resolved during the recovery process.
Performance Testing
Comprehensive performance testing was conducted to compare version 2.8.6 with the latest pg_probackup 3.0.0. The tests involved creating full and incremental backups on a randomly generated 1 TB database, analyzing performance across various thread counts.
The results indicated that while version 2.8.6 performed better in single-threaded scenarios, pg_probackup 3.0.0 pulled ahead as the number of threads increased, particularly in PRO mode (the --backup-source=pro option) for PTRACK incrementals.
What’s Next?
The roadmap for pg_probackup 3 is clear, focusing on the release of the current version, preparation of an open-source variant, optimization of multi-threading capabilities, and implementation of additional planned features for pg_probackup 3.1. Feedback and suggestions from the community are welcomed as development continues.