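Building a Rocky Linux Cluster in Practice: An Efficient, Stable Enterprise Server Architecture

Introduction

As organizations go through digital transformation, they expect more and more stability, efficiency, and reliability from their server infrastructure. Rocky Linux, a replacement for CentOS Linux, has become one of the leading choices among enterprise Linux distributions. This article walks through building an efficient, stable cluster environment on Rocky Linux to support enterprise server workloads.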

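1. Rocky Linux Overview

Rocky Linux is a community-supported enterprise operating system designed to be 100% binary compatible with Red Hat Enterprise Linux (RHEL). It was started by CentOS co-founder Gregory Kurtzer to fill the gap left when CentOS Linux was replaced by CentOS Stream.

1.1 Advantages of Rocky Linux

Stability: each major release follows the long RHEL support lifecycle, which suits business-critical applications
Security: security updates and patches are published promptly
Compatibility: binary compatible with RHEL, so software certified for RHEL generally runs without modification
Community support: an active community provides help and ongoing development
Free of charge: no license fees, which lowers IT costs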

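2. Cluster Fundamentals

2.1 What Is a Cluster

A cluster is a group of interconnected computers that work together as a single computing resource. It provides high availability, load balancing, and parallel processing.

2.2 Cluster Types

High-availability (HA) cluster: keeps critical applications available and minimizes downtime
Load-balancing cluster: spreads the workload across nodes to make better use of resources
High-performance computing (HPC) cluster: runs computationally intensive jobs in parallel
Storage cluster: provides centralized, highly available storage

2.3 Key Components of a Cluster Architecture

Node: an individual server in the cluster
Load balancer: distributes requests across the nodes
Cluster management software: e.g. Pacemaker (resource manager) and Corosync (membership and messaging)
Shared storage: e.g. NFS, iSCSI, or a distributed storage system
Heartbeat: monitors node health
Failover: moves services to a healthy node automatically when a node fails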

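3. Planning a Rocky Linux Cluster

3.1 Requirements Analysis

Before building the cluster, define the following requirements:

Application type: web services, databases, file services, etc.
Performance: CPU, memory, storage, and network bandwidth
Availability: the target uptime percentage
Scalability: how far the cluster may need to grow
Budget: hardware, software, and maintenance costs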
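An availability target becomes concrete once you convert it into the downtime it allows. The bash sketch below does a rough calculation: it assumes a 365-day year and treats all downtime alike. `allowed_downtime_minutes` is an illustrative helper name.

```bash
#!/usr/bin/env bash
# Convert an availability target (in percent) into the maximum downtime it
# allows per year. Assumes a 365-day year (525600 minutes); leap years ignored.
allowed_downtime_minutes() {
  local availability="$1"
  # awk does the floating-point arithmetic that bash cannot do natively;
  # LC_ALL=C keeps "." as the decimal separator in the output
  LC_ALL=C awk -v a="$availability" 'BEGIN { printf "%.1f\n", (100 - a) / 100 * 525600 }'
}

for target in 99 99.9 99.99; do
  echo "${target}% -> $(allowed_downtime_minutes "$target") min/year"
done
```

For example, 99.9% leaves about 525.6 minutes (roughly 8.8 hours) of downtime per year, while 99.99% leaves about 52.6 minutes.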

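3.2 Sizing the Cluster

Choose the cluster size based on the requirements:

Small cluster: 2-3 nodes, for small businesses or departmental applications
Medium cluster: 4-8 nodes, for mid-sized enterprise applications
Large cluster: 9 or more nodes, for large enterprises or cloud service providers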
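The node count also decides how many failures the cluster survives. The bash sketch below shows the arithmetic. It assumes Corosync's default majority quorum with one vote per node and does not model the special two_node mode that pcs uses for 2-node clusters. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Majority quorum with one vote per node: a partition keeps quorum only if
# it holds more than half of all votes.
quorum_votes() {
  local nodes="$1"
  echo $(( nodes / 2 + 1 ))
}

# How many nodes can fail while the survivors still have quorum.
tolerated_failures() {
  local nodes="$1"
  echo $(( nodes - $(quorum_votes "$nodes") ))
}

for n in 2 3 4 5; do
  echo "nodes=$n quorum=$(quorum_votes "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

The output shows that 3 and 4 nodes each tolerate one failure and 5 nodes tolerate two. This is why odd node counts are usually preferred.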

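3.3 Topology Design

Common cluster topologies:

Active/passive: one node provides the service while another stands by as a backup
Active/active: all nodes provide the service at the same time
N-tier architecture: separate tiers for front-end web servers, middle-tier application servers, and back-end database servers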

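4. Hardware and Network Preparation

4.1 Hardware Requirements

4.1.1 Server Hardware

CPU: 64-bit processors; Intel Xeon or AMD EPYC are recommended
Memory: at least 16 GB RAM, adjusted to application needs
Storage: SSDs for the system disk; HDDs or high-performance SSDs for data
Network interfaces: at least two NICs; 10 GbE is recommended

4.1.2 Shared Storage

SAN: attached over Fibre Channel or iSCSI
NAS: accessed over NFS or SMB
Distributed storage: e.g. Ceph or GlusterFS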

4.2 Network Configuration

4.2.1 Network Topology

[Internet] -> [Firewall] -> [Load balancer] -> [Cluster nodes]
                                                     |
                                                     v
                                            [Management network]

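4.2.2 IP Address Planning

Assign each node several IP addresses:

Public IP: serves external clients
Private IP: internal communication between nodes
Heartbeat IP: dedicated to cluster heartbeat traffic
Management IP: system administration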

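4.2.3 Network Configuration Example

The example below uses an ifcfg file. Rocky Linux 8 uses this format by default, and NetworkManager on Rocky Linux 9 still reads it, although new connections on Rocky 9 are stored as keyfiles. The legacy network service no longer exists, so apply the change with nmcli instead of systemctl restart network.

# Edit the interface configuration file (eth0 is an example name; list your interfaces with nmcli device)
vi /etc/sysconfig/network-scripts/ifcfg-eth0

# Example configuration
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
NAME=eth0
DEVICE=eth0
ONBOOT=yes
IPADDR=192.168.1.10
PREFIX=24
GATEWAY=192.168.1.1
DNS1=8.8.8.8
DNS2=8.8.4.4

# Reload the connection profiles and re-activate the interface
nmcli connection reload
nmcli connection up eth0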
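Before applying a static configuration, it is worth checking that the gateway is in the same subnet as the node address. The same check applies to the floating IP used in section 7. The following is a pure-bash sketch; `ip_to_int` and `same_subnet` are hypothetical helper names, not system tools.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Exit status 0 if both addresses are in the same subnet for the given prefix length.
same_subnet() {
  local a b prefix mask
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  prefix="$3"
  # Build the netmask, e.g. prefix 24 -> 0xFFFFFF00 (bash integers are 64-bit)
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( (a & mask) == (b & mask) ))
}

if same_subnet 192.168.1.10 192.168.1.1 24; then
  echo "gateway 192.168.1.1 is on the 192.168.1.10/24 subnet"
fi
```

For example, `same_subnet 192.168.1.100 192.168.1.10 24` succeeds. This confirms that the floating IP used later is on the node subnet.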

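5. Installing and Configuring Rocky Linux

5.1 System Installation

5.1.1 Downloading Rocky Linux

Download a Rocky Linux ISO image from the official site. This example uses 9.1; substitute the file name of the current release, because older point releases are eventually moved to the vault archive:

wget https://download.rockylinux.org/pub/rocky/9/isos/x86_64/Rocky-9.1-x86_64-dvd.iso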

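5.1.2 Creating Boot Media

Use dd to write a bootable USB drive:

# Identify the USB device name
lsblk

# Write the image (assuming the USB device is /dev/sdb; this erases everything on it)
dd if=Rocky-9.1-x86_64-dvd.iso of=/dev/sdb bs=4M status=progress conv=fsync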

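5.1.3 Installation Steps

Boot the machine from the USB drive
Select "Install Rocky Linux" in the boot menu
Configure language, keyboard, and time zone
Configure networking and the hostname
Configure disk partitioning (LVM is recommended because it makes later expansion easier)
Set the root password and create a user
Start the installation and wait for it to finish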

5.2 Basic System Configuration

5.2.1 Updating the System

# Update system packages
dnf update -y

# Install common tools
dnf install -y vim wget curl net-tools telnet

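5.2.2 Configuring the Hostname and hosts File

# Set the hostname (use node2.example.com and node3.example.com on the other nodes)
hostnamectl set-hostname node1.example.com

# Edit the hosts file on every node
vi /etc/hosts

# Add the following entries
192.168.1.10 node1.example.com node1
192.168.1.11 node2.example.com node2
192.168.1.12 node3.example.com node3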
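When node names and addresses follow a consistent pattern, you can generate the hosts entries instead of typing them. The bash sketch below assumes nodes named nodeN with consecutive addresses starting at 192.168.1.10, as above. `hosts_entries` is an illustrative helper.

```bash
#!/usr/bin/env bash
# Print /etc/hosts lines for node1..nodeN, assuming consecutive last octets
# starting at first_octet (192.168.1.10 -> node1, .11 -> node2, ...).
hosts_entries() {
  local count="$1" prefix="$2" first_octet="$3" domain="$4"
  local i
  for (( i = 1; i <= count; i++ )); do
    printf '%s.%d node%d.%s node%d\n' "$prefix" $(( first_octet + i - 1 )) "$i" "$domain" "$i"
  done
}

hosts_entries 3 192.168.1 10 example.com
# To append on a node (as root, once): hosts_entries 3 192.168.1 10 example.com >> /etc/hosts
```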

5.2.3 Configuring Time Synchronization

# Install the chrony time service
dnf install -y chrony

# Start chronyd and enable it at boot
systemctl enable --now chronyd

# Check the time sources
chronyc sources

5.2.4 Configuring the Firewall

# Start the firewall and enable it at boot
systemctl enable --now firewalld

# Open the required services (web service example)
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-service=https
firewall-cmd --permanent --add-service=ssh

# Reload the firewall configuration
firewall-cmd --reload

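5.2.5 Disabling SELinux (Optional)

Disabling SELinux removes an important layer of protection. Permissive mode, which logs denials without enforcing them, is usually enough for troubleshooting.

# Check the SELinux status
sestatus

# Switch to permissive mode until the next reboot
setenforce 0

# Make the change persistent: in the configuration file, change SELINUX=enforcing to SELINUX=permissive (or SELINUX=disabled)
vi /etc/selinux/config

# On Rocky Linux 9, SELINUX=disabled no longer disables SELinux completely; to do that, also add the kernel parameter and reboot
grubby --update-kernel ALL --args selinux=0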

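6. Installing and Configuring Cluster Software

6.1 High-Availability Cluster Software

6.1.1 Installing Pacemaker and Corosync

Run these steps on every node. The packages come from the High Availability repository, which is disabled by default. Its repository ID is highavailability on Rocky Linux 9 and ha on Rocky Linux 8.

# Enable the High Availability repository (Rocky Linux 9)
dnf config-manager --set-enabled highavailability

# Install the HA cluster packages
dnf install -y pcs pacemaker corosync fence-agents-all

# Allow cluster traffic (pcsd, Corosync, Pacemaker) through the firewall
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

# Set the hacluster user's password (replace "password" with a strong one; use the same password on every node, because the next step authenticates all nodes with it)
echo "password" | passwd --stdin hacluster

# Start pcsd and enable it at boot
systemctl enable --now pcsd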

6.1.2 Authenticating the Cluster Nodes

# Authenticate all cluster nodes to pcsd (run on one node only)
pcs host auth node1.example.com node2.example.com node3.example.com -u hacluster -p password

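6.1.3 Creating the Cluster

# Create the cluster (run on one node only; pcs 0.10 and later take the cluster name as a positional argument, not --name)
pcs cluster setup mycluster node1.example.com node2.example.com node3.example.com

# Start the cluster on all nodes
pcs cluster start --all

# Start the cluster automatically at boot
pcs cluster enable --all

# Check the cluster status
pcs status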

6.2 Load-Balancing Software

6.2.1 Installing HAProxy

Install HAProxy on a load-balancer host that is separate from the back-end web servers. If it runs on a node where Apache also listens on port 80, one of the two must use a different port.

# Install HAProxy
dnf install -y haproxy

# Edit the HAProxy configuration
vi /etc/haproxy/haproxy.cfg

# Basic configuration example
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

frontend http-in
    bind *:80
    default_backend servers

backend servers
    balance roundrobin
    server node1 192.168.1.10:80 check
    server node2 192.168.1.11:80 check
    server node3 192.168.1.12:80 check

# Validate the configuration, then start HAProxy and enable it at boot
haproxy -c -f /etc/haproxy/haproxy.cfg
systemctl enable --now haproxy
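`balance roundrobin` sends each new connection to the next server in turn. The bash sketch below models only the simplest case: equal weights and all servers healthy. HAProxy's real implementation is weighted, reacts to health checks, and need not start with node1. `roundrobin` is an illustrative name.

```bash
#!/usr/bin/env bash
# Simplified round-robin model: request i (0-based) goes to server i mod N.
roundrobin() {
  local requests="$1"; shift
  local -a servers=("$@")
  local i
  for (( i = 0; i < requests; i++ )); do
    echo "${servers[i % ${#servers[@]}]}"
  done
}

# Five requests across the three back ends from the configuration above
roundrobin 5 node1 node2 node3
```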

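6.2.2 Using Nginx as a Load Balancer

# Install Nginx
dnf install -y nginx

# /etc/nginx/nginx.conf already contains an http { } block that includes /etc/nginx/conf.d/*.conf,
# so put the load-balancer configuration in its own file instead of adding a second http block
vi /etc/nginx/conf.d/loadbalancer.conf

# File contents
upstream backend {
    server 192.168.1.10:80;
    server 192.168.1.11:80;
    server 192.168.1.12:80;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

# If the default server { } block in /etc/nginx/nginx.conf also listens on port 80, comment it out so requests reach this server

# With SELinux enforcing, allow Nginx to connect to the back ends
setsebool -P httpd_can_network_connect 1

# Check the configuration, then start Nginx and enable it at boot
nginx -t
systemctl enable --now nginx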

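7. High-Availability Configuration

7.1 Configuring a Floating IP

Pacemaker does not start resources until fencing is configured (see 7.3). For a quick test without fencing you can temporarily run pcs property set stonith-enabled=false, but never leave it that way in production.

# Create the floating IP resource (the address must be unused and in the same subnet as the node addresses)
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s

# The resource starts automatically once the cluster is running; check which node holds it
pcs status resources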

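7.2 Making the Web Service Highly Available

7.2.1 Installing the Web Server

# Install Apache on all nodes
dnf install -y httpd

# Create a test page
echo "<h1>Node $(hostname)</h1>" > /var/www/html/index.html

# The apache resource agent checks /server-status, so enable it for local requests
cat > /etc/httpd/conf.d/status.conf <<'EOF'
<Location /server-status>
    SetHandler server-status
    Require local
</Location>
EOF

# Pacemaker starts and stops httpd, so make sure it is stopped and not enabled at boot
systemctl disable --now httpd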

7.2.2 Adding Apache as a Cluster Resource

# Create the Apache resource
pcs resource create WebServer ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=30s

# Constraints: run Apache on the node that holds the floating IP, and start the IP first
pcs constraint colocation add WebServer with ClusterIP INFINITY
pcs constraint order ClusterIP then WebServer

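7.3 Configuring a STONITH Device

STONITH (Shoot The Other Node In The Head) is Pacemaker's fencing mechanism for protecting data integrity. When a node fails or stops responding, the cluster forcibly powers it off or reboots it before recovering its resources elsewhere, so two nodes never write to the same data at once.

# Configure a fence_xvm device (example for KVM guests; it assumes fence_virtd runs on the hypervisor
# and the shared key /etc/cluster/fence_xvm.key is present on every node)
pcs stonith create vm-fence fence_xvm pcmk_host_map="node1.example.com:node1;node2.example.com:node2;node3.example.com:node3" op monitor interval=60s

# Enable STONITH
pcs property set stonith-enabled=true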

8. Load Balancer Setup

8.1 Making HAProxy Highly Available

In this setup Pacemaker runs HAProxy on whichever node holds ClusterIP, so do not enable the haproxy service with systemctl. If Apache runs on the same nodes, the two cannot both listen on port 80. Either move Apache to another port (updating the server lines below and, with SELinux enforcing, the port labeling) or run HAProxy on separate hosts.

# Install HAProxy (if not already installed)
dnf install -y haproxy

# Edit the HAProxy configuration
vi /etc/haproxy/haproxy.cfg

# Add the following configuration
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

listen stats
    bind *:9000
    stats enable
    stats uri /stats
    stats refresh 30s
    stats show-node
    stats auth admin:password

frontend http-in
    bind *:80
    default_backend servers

backend servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server node1 192.168.1.10:80 check cookie node1
    server node2 192.168.1.11:80 check cookie node2
    server node3 192.168.1.12:80 check cookie node3

# Create the HAProxy resource (Pacemaker manages the systemd unit)
pcs resource create HAProxy systemd:haproxy op monitor interval=20s

# Constraints: run HAProxy with the floating IP and start the IP first
pcs constraint colocation add HAProxy with ClusterIP
pcs constraint order ClusterIP then HAProxy

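8.2 Using Keepalived for a Highly Available VIP

Keepalived is an alternative to the Pacemaker-managed ClusterIP above: use one or the other for 192.168.1.100, not both. Here HAProxy runs as an ordinary service on each load-balancer node (systemctl enable --now haproxy), and Keepalived moves the VIP to a node where HAProxy is running.

# Install Keepalived
dnf install -y keepalived

# Edit the Keepalived configuration
vi /etc/keepalived/keepalived.conf

# MASTER node example
# The check script succeeds while the haproxy service is active
vrrp_script chk_haproxy {
    script "/usr/bin/systemctl is-active --quiet haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    track_script {
        chk_haproxy
    }
}

# BACKUP node example (same vrrp_script block, lower priority)
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    track_script {
        chk_haproxy
    }
}

# VRRP uses IP protocol 112; allow it through firewalld on both nodes
firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
firewall-cmd --reload

# Start Keepalived and enable it at boot
systemctl enable --now keepalived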
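The weight of 2 matters here. Keepalived adds a positive weight to a node's priority while its tracked script succeeds, so the MASTER (101) loses to the BACKUP (100) only when its own HAProxy check fails. The bash sketch below models that election in simplified form: preemption timing and the IP-address tie-break are left out, and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Effective VRRP priority for a tracked script.
# Positive weight: added while the script succeeds.
# Negative weight: applied (subtracted) while the script fails.
effective_priority() {
  local base="$1" weight="$2" script_ok="$3"   # script_ok: 1 = check passes, 0 = fails
  if (( weight > 0 && script_ok == 1 )); then
    echo $(( base + weight ))
  elif (( weight < 0 && script_ok == 0 )); then
    echo $(( base + weight ))
  else
    echo "$base"
  fi
}

# The higher effective priority owns the VIP (real VRRP breaks ties by IP address; not modeled)
vip_owner() {
  local master_prio="$1" backup_prio="$2"
  if (( master_prio > backup_prio )); then echo MASTER; else echo BACKUP; fi
}

# Both HAProxy instances healthy: 103 vs 102, the MASTER keeps the VIP
vip_owner "$(effective_priority 101 2 1)" "$(effective_priority 100 2 1)"
# HAProxy down on the MASTER: 101 vs 102, the BACKUP takes over
vip_owner "$(effective_priority 101 2 0)" "$(effective_priority 100 2 1)"
```

With a weight of 1 instead, a failed check would leave the two nodes tied at 101, so a weight larger than the priority gap is the safer choice.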

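9. Storage Solutions

9.1 Configuring NFS Shared Storage

9.1.1 NFS Server Configuration

# Install the NFS server
dnf install -y nfs-utils

# Create the shared directory (mode 777 and no_root_squash below are convenient for testing but weaken security;
# prefer proper ownership and root_squash in production)
mkdir -p /data/shared
chmod 777 /data/shared

# Configure the NFS export
vi /etc/exports
# Add the following line
/data/shared 192.168.1.0/24(rw,sync,no_root_squash)

# Start the NFS server and enable it at boot
systemctl enable --now nfs-server

# Open NFS in the firewall (NFSv4; NFSv3 clients also need the mountd and rpc-bind services)
firewall-cmd --permanent --add-service=nfs
firewall-cmd --reload

# Re-export all entries in /etc/exports
exportfs -ra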

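9.1.2 NFS Client Configuration

# Install the NFS client
dnf install -y nfs-utils

# Create the mount point
mkdir -p /mnt/nfs

# Mount the NFS share (192.168.1.100 stands for the NFS server's address in this example)
mount 192.168.1.100:/data/shared /mnt/nfs

# Add it to /etc/fstab so it is mounted at boot (_netdev waits for the network)
echo "192.168.1.100:/data/shared /mnt/nfs nfs defaults,_netdev 0 0" >> /etc/fstab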

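9.2 Configuring iSCSI Shared Storage

9.2.1 iSCSI Target Server Configuration

# Install the iSCSI target tools
dnf install -y targetcli

# Start the target service and enable it at boot
systemctl enable --now target

# Configure the iSCSI target (targetcli accepts each command as arguments, so no interactive shell is needed)
# Create the block backstore
targetcli /backstores/block create disk1 /dev/sdb1
# Create the iSCSI target (a default portal on 0.0.0.0:3260 is created automatically)
targetcli /iscsi create iqn.2023-01.com.example:storage.disk1
# Create the LUN
targetcli /iscsi/iqn.2023-01.com.example:storage.disk1/tpg1/luns create /backstores/block/disk1
# Create the ACL for the initiator
targetcli /iscsi/iqn.2023-01.com.example:storage.disk1/tpg1/acls create iqn.2023-01.com.example:client
# Save the configuration
targetcli saveconfig

# Open the iSCSI port in the firewall
firewall-cmd --permanent --add-port=3260/tcp
firewall-cmd --reload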

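9.2.2 iSCSI Initiator Configuration

# Install the iSCSI initiator tools
dnf install -y iscsi-initiator-utils

# Set the initiator name
vi /etc/iscsi/initiatorname.iscsi
# It must match the ACL created on the target server
InitiatorName=iqn.2023-01.com.example:client

# Restart iscsid so the new name takes effect, and enable it at boot
systemctl restart iscsid
systemctl enable iscsid

# Discover targets
iscsiadm -m discovery -t st -p 192.168.1.100

# Log in to the discovered targets
iscsiadm -m node -l

# Identify the new disk (/dev/sdb below is an assumption; it may get a different name on your system)
lsblk

# Partition and format the new disk
fdisk /dev/sdb
mkfs.ext4 /dev/sdb1

# Mount the new disk (ext4 is not cluster-aware: mount it on one node at a time, or use a cluster file system such as GFS2)
mkdir -p /mnt/iscsi
mount /dev/sdb1 /mnt/iscsi

# Add it to /etc/fstab (_netdev delays the mount until the network is up; a UUID= entry from blkid is more robust than /dev/sdb1)
echo "/dev/sdb1 /mnt/iscsi ext4 defaults,_netdev 0 0" >> /etc/fstab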

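9.3 Configuring Ceph Distributed Storage

9.3.1 Installing the Ceph Deployment Tool

# Install cephadm (the package must be available from an enabled repository, such as a Ceph or distribution storage repository)
dnf install -y cephadm

# Configure the Ceph repository (Pacific has reached end of life; substitute a current release that provides packages for your Rocky Linux version)
cephadm add-repo --release pacific

# Install the Ceph client tools
dnf install -y ceph-common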

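9.3.2 Deploying the Ceph Cluster

# Bootstrap the Ceph cluster on the first node
cephadm bootstrap --mon-ip 192.168.1.10

# Copy the cluster's SSH key to the other hosts so cephadm can manage them
ssh-copy-id -f -i /etc/ceph/ceph.pub root@node2
ssh-copy-id -f -i /etc/ceph/ceph.pub root@node3

# Add the other nodes to the cluster
ceph orch host add node2 192.168.1.11
ceph orch host add node3 192.168.1.12

# Deploy OSDs (assuming /dev/sdb is an empty disk on each node)
ceph orch daemon add osd node1:/dev/sdb
ceph orch daemon add osd node2:/dev/sdb
ceph orch daemon add osd node3:/dev/sdb

# Create a Ceph pool
ceph osd pool create mypool 64 64

# Create a Ceph file system; this creates its data and metadata pools and deploys the MDS daemons
ceph fs volume create myfs

# Store the admin key for the kernel client
ceph auth get-key client.admin > /etc/ceph/admin.secret
chmod 600 /etc/ceph/admin.secret

# Mount the Ceph file system
mkdir -p /mnt/cephfs
mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

# Add it to /etc/fstab so it is mounted at boot
echo "192.168.1.10:6789:/ /mnt/cephfs ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev 0 0" >> /etc/fstab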
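The pool above uses 64 placement groups (PGs). Older Ceph documentation gave a rule of thumb for the total PG count across all pools: (OSDs × 100) / replica count, rounded up to a power of two. Newer releases enable the PG autoscaler by default, so treat this bash sketch as a sanity check rather than a requirement. `pg_count` is an illustrative name.

```bash
#!/usr/bin/env bash
# Rule-of-thumb total PG count: (OSDs * 100) / replicas, rounded up to the next power of two.
pg_count() {
  local osds="$1" replicas="$2"
  local target=$(( osds * 100 / replicas ))
  local pgs=1
  while (( pgs < target )); do
    pgs=$(( pgs * 2 ))
  done
  echo "$pgs"
}

# Three OSDs with the default replica count of 3
pg_count 3 3
```

For the three OSDs in this example, this gives 128 PGs in total across all pools.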

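10. Monitoring and Maintenance

10.1 Installing and Configuring Zabbix

10.1.1 Installing the Zabbix Server

# Install the Zabbix repository. The repository URL, package names, and schema path below differ between
# Zabbix versions; check the official Zabbix download page for the release that supports your Rocky Linux version.
rpm -Uvh https://repo.zabbix.com/zabbix/5.0/rhel/9/x86_64/zabbix-release-5.0-1.el9.noarch.rpm
dnf clean all

# Install the Zabbix server, web front end, and agent
dnf install -y zabbix-server-mysql zabbix-web-mysql zabbix-apache-conf zabbix-sql-scripts zabbix-agent

# Install the MariaDB database
dnf install -y mariadb-server mariadb

# Start MariaDB and enable it at boot
systemctl enable --now mariadb

# Secure the MariaDB installation
mysql_secure_installation

# Create the Zabbix database and user (replace 'password' with a strong password)
mysql -u root -p <<'EOF'
create database zabbix character set utf8 collate utf8_bin;
create user zabbix@localhost identified by 'password';
grant all privileges on zabbix.* to zabbix@localhost;
EOF

# Import the Zabbix database schema
zcat /usr/share/doc/zabbix-sql-scripts/mysql/create.sql.gz | mysql -uzabbix -p zabbix

# Configure the Zabbix server
vi /etc/zabbix/zabbix_server.conf
# Set the database password
DBPassword=password

# Open the web front end and the port that active agents connect to
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-port=10051/tcp
firewall-cmd --reload

# Restart the Zabbix server, agent, and web stack, and enable them at boot
systemctl restart zabbix-server zabbix-agent httpd php-fpm
systemctl enable zabbix-server zabbix-agent httpd php-fpm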

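10.1.2 Configuring the Zabbix Web Front End

Open http://zabbix-server-ip/zabbix
Complete the setup wizard
Log in with the default account (user Admin, password zabbix) and change the password immediately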

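10.1.3 Monitoring the Cluster Nodes

# Install the Zabbix agent on every cluster node (use the same repository release as the server)
rpm -Uvh https://repo.zabbix.com/zabbix/5.0/rhel/9/x86_64/zabbix-release-5.0-1.el9.noarch.rpm
dnf clean all
dnf install -y zabbix-agent

# Configure the Zabbix agent
vi /etc/zabbix/zabbix_agentd.conf
# Server and ServerActive are the Zabbix server's address (192.168.1.100 in this example); Hostname must be unique per node
Server=192.168.1.100
ServerActive=192.168.1.100
Hostname=node1.example.com

# Allow the Zabbix server to reach the agent for passive checks
firewall-cmd --permanent --add-port=10050/tcp
firewall-cmd --reload

# Start the Zabbix agent and enable it at boot
systemctl enable --now zabbix-agent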

10.2 Installing and Configuring Prometheus and Grafana

10.2.1 Installing Prometheus

# Create the prometheus system user
useradd --no-create-home --shell /bin/false prometheus

# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.36.2/prometheus-2.36.2.linux-amd64.tar.gz
tar -xvzf prometheus-2.36.2.linux-amd64.tar.gz

# Copy the files into place
mkdir -p /etc/prometheus /var/lib/prometheus
cp prometheus-2.36.2.linux-amd64/prometheus /usr/local/bin/
cp prometheus-2.36.2.linux-amd64/promtool /usr/local/bin/
cp -r prometheus-2.36.2.linux-amd64/console* /etc/prometheus/
cp prometheus-2.36.2.linux-amd64/prometheus.yml /etc/prometheus/

# Set ownership
chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

# Create the systemd unit file
vi /etc/systemd/system/prometheus.service
# Add the following content
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path /var/lib/prometheus/ --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target

# Reload systemd so it sees the new unit, then start Prometheus and enable it at boot
systemctl daemon-reload
systemctl enable --now prometheus

# Open the Prometheus web port
firewall-cmd --permanent --add-port=9090/tcp
firewall-cmd --reload

10.2.2 Installing Node Exporter

# Install Node Exporter on every cluster node
useradd --no-create-home --shell /bin/false node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar -xvzf node_exporter-1.3.1.linux-amd64.tar.gz
cp node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin/
chown node_exporter:node_exporter /usr/local/bin/node_exporter

# Create the systemd unit file
vi /etc/systemd/system/node_exporter.service
# Add the following content
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

# Reload systemd, then start Node Exporter and enable it at boot
systemctl daemon-reload
systemctl enable --now node_exporter

# Allow Prometheus to scrape port 9100
firewall-cmd --permanent --add-port=9100/tcp
firewall-cmd --reload

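10.2.3 Scraping the Cluster Nodes with Prometheus

# Edit the Prometheus configuration file
vi /etc/prometheus/prometheus.yml

# Add the following job under scrape_configs (indentation matters in YAML)
  - job_name: 'cluster_nodes'
    static_configs:
      - targets: ['node1.example.com:9100', 'node2.example.com:9100', 'node3.example.com:9100']

# Check the configuration, then restart Prometheus
promtool check config /etc/prometheus/prometheus.yml
systemctl restart prometheus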
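With more nodes, it is easier to generate the targets list than to type it. The small bash sketch below prints the job above for any list of hosts. It assumes node_exporter listens on its default port 9100; `node_exporter_job` is an illustrative name.

```bash
#!/usr/bin/env bash
# Print a scrape_configs job for node_exporter (port 9100) on the given hosts.
node_exporter_job() {
  local job="$1"; shift
  local targets="" host
  for host in "$@"; do
    # ${targets:+, } inserts a separator only after the first entry
    targets+="${targets:+, }'${host}:9100'"
  done
  printf "  - job_name: '%s'\n    static_configs:\n      - targets: [%s]\n" "$job" "$targets"
}

node_exporter_job cluster_nodes node1.example.com node2.example.com node3.example.com
```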

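10.2.4 Installing and Configuring Grafana

# Install Grafana (packaged in the Rocky Linux AppStream repository; the upstream Grafana repository provides newer versions)
dnf install -y grafana

# Start Grafana and enable it at boot
systemctl enable --now grafana-server

# Open the Grafana port in the firewall
firewall-cmd --permanent --add-port=3000/tcp
firewall-cmd --reload

Open the Grafana web UI at http://grafana-server-ip:3000 and log in with the default account admin/admin. You are prompted to change the password.

Add Prometheus as a data source (menu names vary between Grafana versions):
1. Log in to Grafana
2. Go to Configuration > Data Sources
3. Click Add data source
4. Select Prometheus
5. Set the URL to http://prometheus-server-ip:9090
6. Click Save & Test

Import the Node Exporter dashboard:
1. Go to Dashboards > Import
2. Enter dashboard ID 1860
3. Click Load
4. Select the Prometheus data source
5. Click Import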

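11. Security Considerations

11.1 System Hardening

11.1.1 Securing SSH

Before disabling password authentication, confirm that key-based login works for adminuser, and keep an existing session open until the new settings are verified. cephadm (section 9.3) manages hosts over SSH as root, so these settings need adapting on Ceph hosts.

# Edit the SSH daemon configuration
vi /etc/ssh/sshd_config
# Change the following settings
PermitRootLogin no
PasswordAuthentication no
Port 2222
AllowUsers adminuser

# With SELinux enforcing, label the new port for sshd, and open it in the firewall
semanage port -a -t ssh_port_t -p tcp 2222
firewall-cmd --permanent --add-port=2222/tcp
firewall-cmd --reload

# Check the configuration, then restart sshd
sshd -t
systemctl restart sshd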

11.1.2 Configuring Firewall Rules

# Allow management, web, and cluster traffic only from the internal subnet.
# These rules have no effect while the same ports are open to all sources, so first remove the broad
# --add-service rules added earlier (e.g. firewall-cmd --permanent --remove-service=ssh).
# SSH on the port set in 11.1.1
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="2222" protocol="tcp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" service name="http" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" service name="https" accept'
# pcsd
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="2224" protocol="tcp" accept'
# Corosync
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="5404-5405" protocol="udp" accept'
# DLM
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="21064" protocol="tcp" accept'
firewall-cmd --reload
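Repeating the rich-rule syntax for every port invites typos. The bash sketch below only prints the firewall-cmd commands for a list of port/protocol pairs, so you can review them before piping them to sh as root. `rich_rules` is an illustrative helper, and the ports in the example come from this section.

```bash
#!/usr/bin/env bash
# Print (do not run) firewall-cmd commands that accept each port/protocol
# only from one IPv4 source subnet, followed by a reload.
rich_rules() {
  local source="$1"; shift
  local spec port proto
  for spec in "$@"; do              # each spec is "port/protocol", e.g. 2224/tcp
    port="${spec%/*}"
    proto="${spec#*/}"
    printf "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"%s\" port port=\"%s\" protocol=\"%s\" accept'\n" \
      "$source" "$port" "$proto"
  done
  echo "firewall-cmd --reload"
}

rich_rules 192.168.1.0/24 2222/tcp 2224/tcp 5404-5405/udp 21064/tcp
```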

11.1.3 Installing and Configuring Fail2ban

# Fail2ban is provided by EPEL
dnf install -y epel-release
dnf install -y fail2ban

# Put local settings in jail.local; it overrides jail.conf, which package updates may replace
vi /etc/fail2ban/jail.local
# Add the following content (port matches the SSH port set in 11.1.1)
[sshd]
enabled = true
port = 2222
filter = sshd
logpath = /var/log/secure
maxretry = 3
bantime = 3600

# Start Fail2ban and enable it at boot
systemctl enable --now fail2ban
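The sshd jail matches failed-login lines in /var/log/secure and bans a source once it reaches maxretry. The bash sketch below imitates only the counting step on a log file. It is a simplified model that ignores Fail2ban's findtime window and bantime, and the sample log lines are made up for illustration.

```bash
#!/usr/bin/env bash
# List source IPs with at least maxretry "Failed password" lines in a log file.
offenders() {
  local maxretry="$1" logfile="$2"
  grep -oE 'Failed password for .* from [0-9.]+' "$logfile" \
    | awk -v max="$maxretry" '{ count[$NF]++ } END { for (ip in count) if (count[ip] >= max) print ip }' \
    | sort
}

# Made-up sample log for demonstration
log=$(mktemp)
cat > "$log" <<'EOF'
sshd[101]: Failed password for root from 203.0.113.5 port 40022 ssh2
sshd[102]: Failed password for root from 203.0.113.5 port 40023 ssh2
sshd[103]: Failed password for invalid user admin from 203.0.113.5 port 40024 ssh2
sshd[104]: Failed password for alice from 198.51.100.7 port 51000 ssh2
EOF
offenders 3 "$log"
rm -f "$log"
```

With maxretry = 3, only 203.0.113.5 is listed; 198.51.100.7 has a single failure.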

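11.2 Cluster Security

11.2.1 Encrypting Cluster Communication

pcs cluster setup already generates /etc/corosync/authkey and distributes it to all nodes. Clusters created with pcs on the knet transport (the default on Rocky Linux 8 and 9) also encrypt Corosync traffic by default. Regenerate the key manually only when needed. Every node must use the same key, so restart the cluster afterwards.

# Generate a new Corosync key
corosync-keygen

# Copy the key to the other nodes
scp /etc/corosync/authkey node2.example.com:/etc/corosync/
scp /etc/corosync/authkey node3.example.com:/etc/corosync/

# Set the correct permissions (on every node)
chmod 400 /etc/corosync/authkey

# Restart the cluster so all nodes load the new key
pcs cluster stop --all
pcs cluster start --all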

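11.2.2 Pacemaker Security Settings

# Require fencing, stop resources in a partition that loses quorum,
# and let resources run on any node by default
pcs property set stonith-enabled=true
pcs property set no-quorum-policy=stop
pcs property set symmetric-cluster=true

# Make resources prefer to stay on their current node after a failure is resolved
# (current pcs syntax; older pcs versions use: pcs resource defaults resource-stickiness=100)
pcs resource defaults update resource-stickiness=100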

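12. Case Studies

12.1 Web Server Cluster

12.1.1 Scenario

Build a highly available three-node web server cluster, using Apache as the web server, HAProxy as the load balancer, and NFS for shared storage.

12.1.2 Architecture

[Internet] -> [Firewall] -> [HAProxy (VIP: 192.168.1.100)] -> [Web server cluster]
                                                                     |
                                                                     v
                                                           [NFS shared storage]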

12.1.3 Implementation Steps

System preparation

# Update the system on all nodes
dnf update -y

# Install common packages
dnf install -y vim wget curl net-tools telnet

# Set the hostname (node1 shown; use node2/node3 on the other nodes) and add the hosts entries
hostnamectl set-hostname node1.example.com
echo "192.168.1.10 node1.example.com node1" >> /etc/hosts
echo "192.168.1.11 node2.example.com node2" >> /etc/hosts
echo "192.168.1.12 node3.example.com node3" >> /etc/hosts

# Configure time synchronization
dnf install -y chrony
systemctl enable --now chronyd

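Installing and configuring the cluster software

# Enable the High Availability repository (its ID is highavailability on Rocky Linux 9, ha on Rocky Linux 8)
dnf config-manager --set-enabled highavailability

# Install the cluster software on all nodes
dnf install -y pcs pacemaker corosync fence-agents-all

# Allow cluster traffic through the firewall
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

# Set the hacluster user's password
echo "password" | passwd --stdin hacluster

# Start pcsd and enable it at boot
systemctl enable --now pcsd

# On node1, authenticate the cluster nodes
pcs host auth node1.example.com node2.example.com node3.example.com -u hacluster -p password

# Create the cluster (pcs 0.10 and later syntax)
pcs cluster setup webcluster node1.example.com node2.example.com node3.example.com

# Start the cluster and enable it at boot
pcs cluster start --all
pcs cluster enable --all

# Configure fencing as in 7.3 before creating resources; Pacemaker does not start resources without it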

Configuring the floating IP

# Create the floating IP resource
pcs resource create WebVIP ocf:heartbeat:IPaddr2 ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s

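Installing and configuring the web server

# Install Apache on all nodes
dnf install -y httpd

# Create a test page
echo "<h1>Web Server Cluster</h1>" > /var/www/html/index.html

# Enable the status page that the apache resource agent checks
cat > /etc/httpd/conf.d/status.conf <<'EOF'
<Location /server-status>
    SetHandler server-status
    Require local
</Location>
EOF

# Pacemaker will manage httpd, so make sure it is stopped and not enabled at boot
systemctl disable --now httpd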

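Configuring Apache as a cluster resource

# Create the Apache resource
pcs resource create WebServer ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=30s

# Constraints: run Apache with the VIP and start the VIP first
pcs constraint colocation add WebServer with WebVIP INFINITY
pcs constraint order WebVIP then WebServer

With these constraints Apache runs on only one node at a time (active/passive). For HAProxy to balance across all three nodes as in the architecture above, Apache must run everywhere instead. In that case, skip the two constraints and clone the resource with pcs resource clone WebServer.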

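Installing and configuring NFS shared storage

# On the dedicated storage server, install NFS
dnf install -y nfs-utils

# Create the shared directory
mkdir -p /data/web
chmod 777 /data/web

# Configure the NFS export
echo "/data/web 192.168.1.0/24(rw,sync,no_root_squash)" >> /etc/exports

# Start the NFS server, open it in the firewall, and export the share
systemctl enable --now nfs-server
firewall-cmd --permanent --add-service=nfs
firewall-cmd --reload
exportfs -ra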

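Mounting the NFS share on the web server nodes

# Install the NFS client on all web server nodes
dnf install -y nfs-utils

# Create the mount point (already present if httpd is installed)
mkdir -p /var/www/html

# Mount the NFS share. 192.168.1.100 is also the VIP in the architecture above;
# in practice, use the storage server's own address here and in fstab.
mount 192.168.1.100:/data/web /var/www/html

# With SELinux enforcing, allow Apache to serve content from NFS
setsebool -P httpd_use_nfs 1

# The mount hides any files created in /var/www/html beforehand, so create the test page on the share (once, from any node)
echo "<h1>Web Server Cluster</h1>" > /var/www/html/index.html

# Add the mount to /etc/fstab
echo "192.168.1.100:/data/web /var/www/html nfs defaults,_netdev 0 0" >> /etc/fstab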

Installing and configuring HAProxy

# On the dedicated load-balancer node, install HAProxy
dnf install -y haproxy

# Edit the HAProxy configuration
vi /etc/haproxy/haproxy.cfg

# Add the following configuration
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

listen stats
    bind *:9000
    stats enable
    stats uri /stats
    stats refresh 30s
    stats show-node
    stats auth admin:password

frontend http-in
    bind *:80
    default_backend servers

backend servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server node1 192.168.1.10:80 check cookie node1
    server node2 192.168.1.11:80 check cookie node2
    server node3 192.168.1.12:80 check cookie node3

# Start HAProxy and enable it at boot
systemctl enable --now haproxy

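Testing the cluster

# Check the cluster status
pcs status

# Test the web service
curl http://192.168.1.100

# Simulate a node failure by putting node1 in standby
pcs node standby node1.example.com

# Test the web service again
curl http://192.168.1.100

# Bring the node back
pcs node unstandby node1.example.com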

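12.2 Database Cluster

12.2.1 Scenario

Build a highly available three-node database cluster using MariaDB with Galera Cluster for synchronous multi-primary replication, and HAProxy as the load balancer.

12.2.2 Architecture

[Application servers] -> [HAProxy (VIP: 192.168.1.100)] -> [MariaDB Galera cluster (dbnode1, dbnode2, dbnode3)]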

12.2.3 Implementation Steps

System preparation

# Update the system on all nodes
dnf update -y

# Install common packages
dnf install -y vim wget curl net-tools telnet

# Set the hostname (dbnode1 shown; use dbnode2/dbnode3 on the other nodes) and add the hosts entries
hostnamectl set-hostname dbnode1.example.com
echo "192.168.1.10 dbnode1.example.com dbnode1" >> /etc/hosts
echo "192.168.1.11 dbnode2.example.com dbnode2" >> /etc/hosts
echo "192.168.1.12 dbnode3.example.com dbnode3" >> /etc/hosts

# Configure time synchronization
dnf install -y chrony
systemctl enable --now chronyd

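Installing MariaDB and Galera

# Install MariaDB with Galera support on all nodes (on Rocky Linux, mariadb-server-galera pulls in mariadb-server and galera)
dnf install -y mariadb-server-galera mariadb

# Open the MariaDB and Galera ports: 3306 client, 4567 replication, 4568 IST, 4444 SST
firewall-cmd --permanent --add-port=3306/tcp
firewall-cmd --permanent --add-port=4567/tcp
firewall-cmd --permanent --add-port=4568/tcp
firewall-cmd --permanent --add-port=4444/tcp
firewall-cmd --reload

# Start MariaDB and enable it at boot
systemctl enable --now mariadb

# Run the security script (needed only on the first node; the others receive its data by state transfer when they join)
mysql_secure_installation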

Configuring the Galera cluster

# Create the Galera configuration on all nodes (mariadb-server-galera may already ship this file; edit it to match)
vi /etc/my.cnf.d/galera.cnf

# Add the following content
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera provider configuration (verify the library path with: rpm -ql galera | grep libgalera_smm.so)
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so

# Galera cluster configuration
wsrep_cluster_name="my_galera_cluster"
wsrep_cluster_address="gcomm://dbnode1.example.com,dbnode2.example.com,dbnode3.example.com"

# Galera synchronization configuration
wsrep_sst_method=rsync

# Galera node configuration: on each node, use that node's own host name and node name
wsrep_node_address="dbnode1.example.com"
wsrep_node_name="dbnode1"
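Only the node-specific lines differ between nodes, so you can derive them from each host's name. The bash sketch below assumes the dbnodeN.example.com naming scheme used here; the helper names are illustrative.

```bash
#!/usr/bin/env bash
# Node-specific Galera lines; the short node name is the FQDN's first label
# (dbnode1.example.com -> dbnode1).
galera_node_settings() {
  local fqdn="$1"
  printf 'wsrep_node_address="%s"\nwsrep_node_name="%s"\n' "$fqdn" "${fqdn%%.*}"
}

# wsrep_cluster_address built from all member hosts, joined with commas.
galera_cluster_address() {
  local IFS=,
  printf 'wsrep_cluster_address="gcomm://%s"\n' "$*"
}

galera_cluster_address dbnode1.example.com dbnode2.example.com dbnode3.example.com
# On each node, pass its own FQDN, e.g. galera_node_settings "$(hostname -f)"
galera_node_settings dbnode2.example.com
```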

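Starting the cluster

# On the first node, stop MariaDB and bootstrap a new cluster
systemctl stop mariadb
galera_new_cluster

# On each of the other nodes, restart MariaDB (it is already running standalone) so it joins with the Galera settings
systemctl restart mariadb

# Check the cluster size; it should report 3
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"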

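Installing and configuring HAProxy

# On the dedicated load-balancer node, install HAProxy
dnf install -y haproxy

# Edit the HAProxy configuration
vi /etc/haproxy/haproxy.cfg

# Add the following configuration
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon

# TCP mode: use tcplog (httplog requires mode http) and no HTTP-only timeouts
defaults
    mode tcp
    log global
    option tcplog
    option dontlognull
    option redispatch
    retries 3
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout check 10s
    maxconn 3000

# The statistics page needs HTTP mode
listen stats
    bind *:9000
    mode http
    stats enable
    stats uri /stats
    stats refresh 30s
    stats show-node
    stats auth admin:password

listen mysql-cluster
    bind *:3306
    mode tcp
    balance roundrobin
    option mysql-check user haproxy_check
    server dbnode1 192.168.1.10:3306 check
    server dbnode2 192.168.1.11:3306 check
    server dbnode3 192.168.1.12:3306 check

# With SELinux enforcing, allow HAProxy to bind and connect to non-HTTP ports such as 3306
setsebool -P haproxy_connect_any 1

# Open the MySQL and statistics ports
firewall-cmd --permanent --add-port=3306/tcp
firewall-cmd --permanent --add-port=9000/tcp
firewall-cmd --reload

# Start HAProxy and enable it at boot
systemctl enable --now haproxy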

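Creating the monitoring user

# Create the passwordless, privilege-free user used by HAProxy's mysql-check.
# Galera replicates the statement, so run it on one node only.
mysql -u root -p <<'EOF'
CREATE USER 'haproxy_check'@'%';
FLUSH PRIVILEGES;
EOF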

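Testing the cluster

# mysql_secure_installation disables remote root logins, so first create a test account on one node
# (testuser and its password are example values)
mysql -u root -p <<'EOF'
CREATE USER 'testuser'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON testdb.* TO 'testuser'@'%';
EOF

# Connect through the VIP
mysql -h 192.168.1.100 -u testuser -p

# Create a test database and table
CREATE DATABASE testdb;
USE testdb;
CREATE TABLE testtable (id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(50));
INSERT INTO testtable (name) VALUES ('test');

# Verify replication on another node
mysql -h 192.168.1.11 -u testuser -p
USE testdb;
SELECT * FROM testtable;

# Simulate a node failure (run on one database node, e.g. dbnode1)
systemctl stop mariadb

# Test availability through the VIP
mysql -h 192.168.1.100 -u testuser -p
USE testdb;
INSERT INTO testtable (name) VALUES ('test2');

# Recover the node (on the node that was stopped)
systemctl start mariadb

# Verify that the recovered node has caught up
mysql -h 192.168.1.10 -u testuser -p
USE testdb;
SELECT * FROM testtable;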