feat: add resource to pacemaker
alemorvan committed Jul 9, 2024
1 parent 1bca7f2 commit 9d3c40c
Showing 1 changed file with 255 additions and 43 deletions.
298 changes: 255 additions & 43 deletions docs/books/web_services/07-high-availability.md
This availability is a performance measure expressed as a percentage, obtained by comparing the time a service is actually operational with the total time it was expected to be operational.

| Availability rate | Annual downtime |
|-------------------|-----------------|
| 99.999% | 5 minutes, 15 seconds |
| 99.9999% | 31.68 seconds |
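
For example, an availability rate of 99.999% allows at most 0.001% of the year in downtime: 0.00001 × 365.25 days × 24 hours × 60 minutes ≈ 5.26 minutes, which corresponds to the 5 minutes and 15 seconds shown above.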

High Availability" (**HA**) refers to all the measures taken to guarantee the highest possible availability of a service, i.e. its correct operation 24 hours a day.
"High Availability" (**HA**) refers to all the measures taken to guarantee the highest possible availability of a service, i.e. its correct operation 24 hours a day.

### Overview

In this chapter, you will learn about Pacemaker, a clustering solution.

**Objectives**: In this chapter, you will learn how to:

:heavy_check_mark: install and configure a Pacemaker cluster;
:heavy_check_mark: administer a Pacemaker cluster.

:checkered_flag: **clustering**, **ha**, **high availability**, **pcs**, **pacemaker**

```bash
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --reload
```

!!! NOTE

    Do not start the services now: since they are not yet configured, they will not work.

### Cluster management

The packages create a `hacluster` system user on each node:

`hacluster:x:189:189:cluster user:/var/lib/pacemaker:/sbin/nologin`
On all nodes, assign an identical password to the hacluster user:

```bash
echo "mdphacluster" | sudo passwd --stdin hacluster
echo "pwdhacluster" | sudo passwd --stdin hacluster
```

!!! NOTE

    Please replace "pwdhacluster" with a more secure password.
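
If you need inspiration, one quick way to generate a random password on the command line is `openssl` (any password generator will do):

```bash
# 16 random bytes, base64-encoded
openssl rand -base64 16
```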

From any node, it is possible to authenticate as a hacluster user on all nodes, then use the `pcs` commands on them:

```bash
$ sudo pcs host auth server1 server2 -u hacluster
...
server2: Authorized
```
From the node on which pcs is authenticated, launch the cluster configuration:

```bash
$ sudo pcs cluster setup mycluster server1 server2
No addresses specified for host 'server1', using 'server1'
No addresses specified for host 'server2', using 'server2'
Destroying cluster on hosts: 'server1', 'server2'...
...
server1: Starting Cluster...
server2: Starting Cluster...
```

Enable the cluster service to start on boot:

```bash
sudo pcs cluster enable --all
```

Check the service status:

```bash
$ sudo pcs status
Cluster name: mycluster
...
Daemon Status:
pcsd: active/enabled
```

#### Adding resources

Before we can configure the resources, we'll need to deal with the alert message:

```bash
WARNINGS:
No stonith devices and stonith-enabled is not false
```

In this state, Pacemaker will refuse to start our new resources.

You have two choices:

* disable `stonith`
* configure it

First, we will disable stonith until we learn how to configure it:

```bash
sudo pcs property set stonith-enabled=false
```

!!! WARNING

    Be careful not to leave stonith disabled on a production environment!!!!
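
When you are ready to configure fencing properly, STONITH devices are created with `pcs stonith create`. The sketch below is only an illustration: the `fence_ipmilan` agent and every address and credential in it are placeholder assumptions to adapt to your actual hardware:

```bash
# List the fence agents installed on the system
sudo pcs stonith list

# Hypothetical example: one IPMI-based fence device for server1
sudo pcs stonith create fence_server1 fence_ipmilan \
    pcmk_host_list="server1" ip="192.168.1.201" \
    username="admin" password="secret" lanplus=1 \
    op monitor interval=60s

# Once fencing actually works, re-enable stonith
sudo pcs property set stonith-enabled=true
```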

##### VIP configuration

The first resource we're going to create on our cluster is a VIP.

The standard resources available are provided by the `pcs resource standards` command:

```bash
$ pcs resource standards
lsb
ocf
service
systemd
```

This VIP, corresponding to the IP address used by customers to access future cluster services, will be assigned to one of the nodes. Then, in the event of failure, the cluster will switch this resource from one node to another to ensure continuity of service.

```bash
pcs resource create myclusterVIP ocf:heartbeat:IPaddr2 ip=192.168.1.12 cidr_netmask=24 op monitor interval=30s
```

The `ocf:heartbeat:IPaddr2` argument is made up of three fields that provide Pacemaker with:

* the standard (here `ocf`),
* the script namespace (here `heartbeat`),
* the resource script name.
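
If you want to check which scripts a namespace provides, or which parameters a given script accepts, `pcs` can list and describe them:

```bash
# Resource scripts available in the ocf:heartbeat namespace
pcs resource agents ocf:heartbeat

# Parameters accepted by the IPaddr2 script (ip, cidr_netmask, nic, ...)
pcs resource describe ocf:heartbeat:IPaddr2
```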

The result is the addition of a virtual IP address to the list of managed resources:

```bash
$ sudo pcs status
Cluster name: mycluster
Cluster Summary:
* Stack: corosync (Pacemaker is running)
...
* 2 nodes configured
* 1 resource instance configured

Full List of Resources:
* myclusterVIP (ocf:heartbeat:IPaddr2): Started server1
...
```

In this case, VIP is active on server1, which can be verified with the `ip` command:

```bash
$ ip add show dev enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:df:29:09 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.10/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.12/24 brd 192.168.1.255 scope global secondary enp0s3
valid_lft forever preferred_lft forever
```

###### Toggle tests

From anywhere on the network, run the ping command on the VIP:

```bash
ping 192.168.1.12
```

Put the active node on standby:

```bash
sudo pcs node standby server1
```

Check that no ping was lost during the operation (no missing `icmp_seq`):

```bash
64 bytes from 192.168.1.12: icmp_seq=39 ttl=64 time=0.419 ms
64 bytes from 192.168.1.12: icmp_seq=40 ttl=64 time=0.043 ms
64 bytes from 192.168.1.12: icmp_seq=41 ttl=64 time=0.129 ms
64 bytes from 192.168.1.12: icmp_seq=42 ttl=64 time=0.074 ms
64 bytes from 192.168.1.12: icmp_seq=43 ttl=64 time=0.099 ms
64 bytes from 192.168.1.12: icmp_seq=44 ttl=64 time=0.044 ms
64 bytes from 192.168.1.12: icmp_seq=45 ttl=64 time=0.021 ms
64 bytes from 192.168.1.12: icmp_seq=46 ttl=64 time=0.058 ms
```

Check the cluster status:

```bash
$ sudo pcs status
Cluster name: mycluster
Cluster Summary:
...
* 2 nodes configured
* 1 resource instance configured

Node List:
* Node server1: standby
* Online: [ server2 ]

Full List of Resources:
* myclusterVIP (ocf:heartbeat:IPaddr2): Started server2
```

The VIP has been moved to server2. Check with the `ip add` command as seen previously.
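
For example, on server2 (assuming the same interface name `enp0s3` as on server1), the VIP should now appear as a secondary address:

```bash
ip add show dev enp0s3
```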

Return server1 to the pool:

```bash
sudo pcs node unstandby server1
```

Note that once server1 is taken out of standby, the cluster returns to its normal state, but the resource is not transferred back to server1: it remains on server2.
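
If you want the resource to run on server1 again, you can move it back manually. Note that `pcs resource move` works by adding a location constraint, which you should clear afterwards so the cluster remains free to place the resource wherever it needs to:

```bash
sudo pcs resource move myclusterVIP server1
sudo pcs resource clear myclusterVIP
```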

##### Service configuration

We will install the Apache service on both nodes of our cluster. This service will only be started on the active node, and will switch nodes at the same time as the VIP in the event of failure of the active node.

Please refer to the Apache chapter for detailed installation instructions.

Installation must be performed on both nodes:

```bash
$ sudo dnf install -y httpd
$ sudo firewall-cmd --permanent --add-service=http
$ sudo firewall-cmd --reload
```

!!! WARNING

    Do not start or enable the service yourself; Pacemaker will take care of it.

An HTML page containing the server name will be displayed by default:

```bash
echo "<html><body>Node $(hostname -f)</body></html>" | sudo tee "/var/www/html/index.html"
```

The Pacemaker resource agent will use the `/server-status` page (see the Apache chapter) to determine the health of the service. You must activate it by creating the file `/etc/httpd/conf.d/status.conf` on both servers:

```bash
sudo vim /etc/httpd/conf.d/status.conf
<Location /server-status>
SetHandler server-status
Require local
</Location>
```

To create a resource that we will call "WebSite", we use the `apache` script of the OCF standard, in the `heartbeat` namespace.

```bash
sudo pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=1min
```

The cluster will check Apache's health every minute (`op monitor interval=1min`).
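
To double-check what has just been defined (the agent, its options and the monitor operation), you can display the resource configuration. On recent pcs releases the sub-command is `config`; on older ones it was `show`:

```bash
sudo pcs resource config WebSite
```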

Finally, to ensure that the Apache service is started on the same node as the VIP address, a constraint must be added to the cluster:

```bash
sudo pcs constraint colocation add WebSite with myclusterVIP INFINITY
```

The Apache service can also be configured to start after the VIP, which can be useful if Apache VHosts are configured to listen to the VIP address (`Listen 192.168.1.12`):

```bash
$ sudo pcs constraint order myclusterVIP then WebSite
Adding myclusterVIP WebSite (kind: Mandatory) (Options: first-action=start then-action=start)
```
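
Both constraints can be reviewed at any time:

```bash
sudo pcs constraint
```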

###### Testing the failover

We will perform a failover and test that our webserver is still available:

```bash
$ sudo pcs status
Cluster name: mycluster
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: server1 (version 2.1.7-5.el9_4-0f7f88312) - partition with quorum
...

Node List:
* Online: [ server1 server2 ]

Full List of Resources:
* myclusterVIP (ocf:heartbeat:IPaddr2): Started server1
* WebSite (ocf:heartbeat:apache): Started server1
```

Our resources are currently running on server1.

```bash
$ curl http://192.168.1.12/
<html><body>Node server1</body></html>
```

Simulate a failure on server1:

```bash
sudo pcs node standby server1
```

```bash
$ curl http://192.168.1.12/
<html><body>Node server2</body></html>
```

As you can see, our web service is still working, but it is now running on server2.

```bash
sudo pcs node unstandby server1
```

Please note that the service was only interrupted for a few seconds while the VIP was switched over and the services restarted.
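
To watch a switchover as it happens, you can keep a live view of the cluster state open in another terminal (for example with `crm_mon`, which ships with Pacemaker) while you put a node on standby:

```bash
sudo crm_mon
```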

### Cluster troubleshooting

### Configuration

### Security

### Workshop

For this workshop, you will need two servers with Pacemaker services installed, configured and secured as described in the previous chapters.

You will configure a highly available Apache cluster.

Our two servers have the following IP addresses:

* server1: 192.168.1.10
* server2: 192.168.1.11

If you do not have a service to resolve names, fill the `/etc/hosts` file with content like the following:

```bash
$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.1.10 server1 server1.rockylinux.lan
192.168.1.11 server2 server2.rockylinux.lan
```

#### Task 1 : XXX

#### Task 2 : XXX
* [ ] Answer 2
* [ ] Answer 3
* [ ] Answer 4

