
How components work together

This document explains how components of the proposed high-availability architecture work together.

Database and DCS layers

Let’s start with the database and DCS layers as they are interconnected and work closely together.

Every database node hosts PostgreSQL and Patroni instances.

Each PostgreSQL instance in the cluster maintains consistency with the other members through streaming replication. Streaming replication is asynchronous by default, meaning that the primary does not wait for the secondaries to acknowledge that they received the data before it considers a transaction complete.
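You can observe this on the primary itself. The query below uses the standard pg_stat_replication view; with the default asynchronous replication, each connected standby is reported with sync_state set to async:

```sql
-- Run on the primary: one row per connected standby.
-- With the default asynchronous replication, sync_state is 'async'.
SELECT application_name, state, sync_state
FROM pg_stat_replication;
```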

Each Patroni instance runs on top of and manages its own PostgreSQL instance. This means that Patroni starts and stops PostgreSQL and manages its configuration.

Patroni is also responsible for creating and managing the PostgreSQL cluster. It performs the initial cluster initialization and monitors the cluster state. To do so, Patroni relies on and uses the Distributed Configuration Store (DCS), represented by etcd in our architecture.
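To make this concrete, here is a heavily trimmed sketch of a per-node Patroni configuration file. The cluster name, node name, addresses and paths are placeholders; your deployment will differ:

```yaml
# /etc/patroni/patroni.yml (sketch; all names and addresses are placeholders)
scope: ha-cluster                  # cluster name, shared by all nodes
name: node1                        # unique name of this database node

etcd3:
  hosts:                           # the DCS this Patroni instance talks to
    - 10.0.0.11:2379
    - 10.0.0.12:2379
    - 10.0.0.13:2379

restapi:
  listen: 0.0.0.0:8008             # REST API used later by HAProxy health checks
  connect_address: 10.0.0.21:8008

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.21:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    replication:
      username: replicator
      password: replicator-password
```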

Though Patroni supports various Distributed Configuration Stores like ZooKeeper, etcd, Consul or Kubernetes, we recommend and support etcd as the most popular DCS due to its simplicity, consistency and reliability.

Note that the PostgreSQL cluster and Patroni cluster are the same thing, and we will use these names interchangeably.

When you start Patroni, it writes the cluster configuration information to etcd. During cluster initialization, Patroni uses the etcd locking mechanism to ensure that only one instance becomes the primary. This mechanism guarantees that only a single process can hold a resource at a time, avoiding race conditions and inconsistencies.

You start Patroni instances one by one so the first instance acquires the lock with a lease in etcd and becomes the primary PostgreSQL node. The other instances join the primary as replicas, waiting for the lock to be released.
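If you are curious, you can inspect the lock directly in etcd. A sketch, assuming Patroni's default /service namespace and a cluster named ha-cluster:

```shell
# Everything Patroni stores for the cluster lives under one prefix.
etcdctl get --prefix /service/ha-cluster/

# The leader key holds the name of the node that currently owns the lock.
etcdctl get /service/ha-cluster/leader
```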

If the current primary node crashes, its lease on the lock in etcd is no longer renewed and expires. The lock is automatically released after its expiration time, a new leader election takes place, and a standby node acquires the lock to become the new primary.
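How quickly this happens is governed by the timeouts in the cluster-wide Patroni configuration stored in the DCS. A sketch with the default values; verify them against your own configuration:

```yaml
# Cluster-wide timeouts kept in the DCS (sketch; Patroni defaults shown)
ttl: 30            # lifetime of the leader lease in seconds; failover starts once it expires
loop_wait: 10      # how often Patroni runs its housekeeping loop and renews the lease
retry_timeout: 10  # how long DCS and PostgreSQL operations are retried before giving up
```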

Patroni uses etcd for more than just locking. It also stores the current state of the cluster in etcd, ensuring that all nodes are aware of the latest changes.
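You can view that shared state at any time with the patronictl tool that ships with Patroni; the configuration path below is a placeholder:

```shell
# Prints the cluster topology (member names, hosts, roles, state, timeline, lag)
# as Patroni sees it in the DCS.
patronictl -c /etc/patroni/patroni.yml list
```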

Another important component is the watchdog. It runs on each database node. The purpose of the watchdog is to prevent split-brain scenarios, where multiple nodes might mistakenly think they are the primary node. The watchdog monitors the node’s health by receiving periodic “keepalive” signals from Patroni. If these signals stop due to a crash, high system load or any other reason, the watchdog resets the node to ensure it does not cause inconsistencies.
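The watchdog is enabled in the per-node Patroni configuration. A sketch, assuming the kernel softdog module exposes /dev/watchdog:

```yaml
# Watchdog section of the per-node Patroni configuration (sketch)
watchdog:
  mode: automatic        # use the watchdog if available; 'required' refuses to start without one
  device: /dev/watchdog  # watchdog device to use
  safety_margin: 5       # seconds between the watchdog firing and the leader lease expiring
```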

Load balancing layer

This layer consists of HAProxy and keepalived.

HAProxy acts as a single point of entry to your cluster for client applications. It accepts all requests from client applications and distributes the load evenly across the cluster nodes. It can route read/write requests to the primary and read-only requests to the secondary nodes. This behavior is defined in the HAProxy configuration. To determine the current primary node, HAProxy queries the Patroni REST API.
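A heavily trimmed haproxy.cfg sketch illustrates the idea. Node names, addresses and front-end ports are placeholders; the Patroni REST API listens on port 8008 by default and answers /primary with 200 only on the primary and /replica with 200 only on a healthy replica:

```
# haproxy.cfg (sketch): writes go to whichever node Patroni reports as primary
listen primary
    bind *:5000
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 10.0.0.21:5432 check port 8008
    server node2 10.0.0.22:5432 check port 8008
    server node3 10.0.0.23:5432 check port 8008

# Read-only traffic is spread across the healthy replicas
listen replicas
    bind *:5001
    balance roundrobin
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 10.0.0.21:5432 check port 8008
    server node2 10.0.0.22:5432 check port 8008
    server node3 10.0.0.23:5432 check port 8008
```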

HAProxy also serves as the connection pooler. It manages a pool of reusable database connections to optimize performance and resource usage. Instead of creating and closing a new connection for every database request, HAProxy maintains a set of open connections that can be shared among multiple clients.

HAProxy itself must also be redundant. You need at least two HAProxy instances (one active and one standby) to eliminate this single point of failure and to be able to fail over. This is where keepalived comes in.

Keepalived is the failover tool for HAProxy. It provides a virtual IP address (VIP) for HAProxy and monitors its state. When the active HAProxy node goes down, keepalived transfers the VIP to the remaining node and fails the service over to it.
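A sketch of keepalived.conf for the active node; the interface name, VIP, password and priorities are placeholders, and the standby node uses state BACKUP with a lower priority:

```
# /etc/keepalived/keepalived.conf (sketch, active node)
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # succeeds while an haproxy process is running
    interval 2
    weight -20                    # lower the priority when the check fails
}

vrrp_instance VI_1 {
    state MASTER                  # BACKUP on the standby node
    interface eth0
    virtual_router_id 51
    priority 101                  # the standby node uses a lower value, e.g. 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass change-me
    }
    virtual_ipaddress {
        10.0.0.100                # the VIP that client applications connect to
    }
    track_script {
        chk_haproxy
    }
}
```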

Services layer

Finally, the services layer is represented by pgBackRest and PMM (Percona Monitoring and Management).

pgBackRest is deployed as a separate backup server and as agents on every database node. It takes backups from one of the secondary nodes and archives WAL from the primary. By communicating with its agents, pgBackRest determines the current primary PostgreSQL node.
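A trimmed sketch of the repository-server side of pgbackrest.conf; the stanza name, hosts and paths are placeholders:

```ini
# /etc/pgbackrest/pgbackrest.conf on the backup server (sketch)
[global]
# Where backups and archived WAL are stored, and how many full backups to keep
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
# Take backups from a standby rather than from the primary
backup-standby=y

# One stanza per PostgreSQL cluster; pgN-host points at the pgBackRest agents
[ha-cluster]
pg1-host=10.0.0.21
pg1-path=/var/lib/postgresql/data
pg2-host=10.0.0.22
pg2-path=/var/lib/postgresql/data
pg3-host=10.0.0.23
pg3-path=/var/lib/postgresql/data
```

On the database nodes, WAL archiving is typically wired up by pointing PostgreSQL’s archive_command at pgbackrest archive-push for the same stanza.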

The monitoring solution is optional but nice to have. It enables you to monitor the health of your high-availability architecture, receive timely alerts should performance issues occur, and react to them proactively.

Get expert help

If you need assistance, visit the community forum for comprehensive and free database knowledge, or contact our Percona Database Experts for professional support and services.