Once, a moment of fatigue led to a significant mishap in my production database, resulting in the loss of approximately 30% of revenue and countless hours of stress. This experience ignited a passion within me to create a PostgreSQL backup and monitoring tool, which I have since open-sourced and now utilize across various projects.
Table of content
- About the open-source PostgreSQL backup project
- The story of how I broke the DB and couldn’t fully recover
- How I started building the project
- Roadmap & future plans
- DB safety rules I now enforce on every project
About the open-source PostgreSQL backup project
After two years of development, I decided to open-source my PostgreSQL backup and monitoring tool. Initially crafted for my professional needs and a few personal projects, it has evolved into a robust solution that I believe could benefit the wider community.
Stack: The tool is built using Go, gin, gorm, React, TypeScript, and PostgreSQL, all neatly packaged in Docker. The original version was developed in Java, but a transition to Go allowed for improved performance and functionality.
At its core, the tool serves as a user-friendly interface for the standard pg_dump command, enhanced with additional features for a smoother user experience and integrations with external storage and notification services.
What it can do:
- Schedule backups at specified intervals, such as daily at 4 AM or weekly at midnight, for PostgreSQL versions 13 to 17.
- Store compressed backups locally, or in cloud storage options like S3 or Google Drive, with plans to include additional storage solutions.
- Provide notifications after each backup, confirming success or failure, with optional alerts.
- Alert users via Discord, email, Telegram, or Slack if the database becomes unresponsive, employing a threshold to avoid false alarms and displaying an availability graph.
This project is free, open-source (MIT license), self-hosted, and features an intuitive web UI. For more information, visit the project site or check out the GitHub repository. If you find it useful, I would greatly appreciate a star on GitHub, as those initial stars can be quite challenging to obtain.
The story of how I broke the DB and couldn’t fully recover
In 2023, I was managing a pet project that served as a wrapper for ChatGPT (3.5), essentially reselling API access with a user-friendly interface. As the project gained traction, it reached a revenue peak of about ,500 per month. However, during this time, I encountered a critical issue: I inadvertently corrupted the database.
It was a Friday night, and fatigue had set in as I transitioned from coding to responding to messages. A customer requested a change to their account email. In a moment of distraction, I accessed the production VPS via SSH and executed an UPDATE command that inadvertently affected around 10,000 rows. The realization of my mistake hit me like a cold wave.
Despite being aware of best practices—such as running a SELECT first or setting a SAVEPOINT—I had neglected these crucial steps. The result was catastrophic: all user emails were overwritten, leading to a cascade of complaints as users were locked out of their accounts.
Upon checking the backups, I was met with further dismay; the most recent backup was over a month old. With new payments rolling in and subscriptions being canceled, I scrambled to reconstruct about 65% of the database using scripts based on user IDs. The remainder required canceling subscriptions and issuing refunds, a painful and costly process that left a lasting impression on me.
How I started building the project
This experience prompted me to create a reliable backup tool that would notify me daily of the system’s status and allow for quick restorations. The first iteration of Postgresus was developed in Java within a month, and after testing it with friends, I realized its potential value.
As I continued to refine the tool based on feedback, it became evident that it was not only beneficial for me but also for others. The name “Postgresus” emerged two months later, replacing the initial title of “pg-web-backup”.
Currently, Postgresus addresses several key needs:
- Main backup tool: Ideal for smaller projects hosted on a VPS.
- Fallback backup tool: Provides an additional layer of security for larger projects utilizing a Database as a Service (DBaaS) with its own backups, ensuring that there is always a backup plan in case of unforeseen issues.
Roadmap & future plans I’m planning to push the project in these directions:
- Enhance PostgreSQL-specific load monitoring (pg_stat_activity, pg_stat_system, pg_locks) with a user-friendly interface, offering an alternative to postgres_exporter + Grafana, bundled with backup features.
- Implement monitoring and alerts for slow query performance, allowing for proactive optimization of key functions as they grow.
- Collect query statistics based on CPU time and execution frequency to identify resource allocation and improvement opportunities.
- Expand notification channels and add support for more storage providers.
DB safety rules I now enforce on every project
In light of my experience, I have adopted several essential safety rules:
- Before executing any
UPDATE, always perform aSELECTto confirm that only 1-2 rows will be affected. - For significant changes, manually set a
SAVEPOINT. - Conduct restoration drills at least once every three months to ensure that backups are functional and that the restoration process is efficient.
Over the past two years, I have successfully restored from backups multiple times, both in the cloud and using Postgresus, thanks to the rigorous testing and adherence to these safety protocols. The lessons learned have proven invaluable, reinforcing the importance of basic safety measures in database management.