
Building and Maintaining an Etcd Cluster in a Kubekey Environment

wubx
Data AI

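etcd is a distributed, highly available key-value store designed for configuration management, service discovery, and distributed coordination. It was developed by the CoreOS team in Go and uses the Raft consensus algorithm to provide data consistency and high availability. etcd is a core component of cloud-native ecosystems such as Kubernetes, where it stores cluster metadata and configuration. This post documents how an etcd failure in a kubekey environment was repaired.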

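Normally kubekey sets up the etcd cluster automatically, but things do not always go that smoothly. For example, when the author first used kubekey, one of the etcd nodes failed to initialize correctly. This post covers: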

  1. etcd cluster initialization
  2. Adding cluster members
  3. Viewing cluster members
  4. Regular cluster backups

1. etcd cluster initialization

Basic environment

The three machines that run etcd:

  • 192.168.126.67
  • 192.168.126.68
  • 192.168.126.69

Starting the first node

Configuration files

After downloading etcd, write the following two configuration files by hand:

cat /etc/systemd/system/etcd.service

[Unit]
Description=etcd
After=network.target

[Service]
User=root
Type=notify
Nice=-20
OOMScoreAdjust=-1000
EnvironmentFile=/etc/etcd.env
ExecStart=/usr/local/bin/etcd
NotifyAccess=all
RestartSec=10s
LimitNOFILE=40000
Restart=always

[Install]
WantedBy=multi-user.target

cat /etc/etcd.env

# Environment file for etcd v3.5.13
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://192.168.126.67:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.126.67:2380
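# Use "new" when bootstrapping node1 from an empty data-dir; "existing" is for members joining a running cluster
ETCD_INITIAL_CLUSTER_STATE=existing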
ETCD_METRICS=basic
ETCD_LISTEN_CLIENT_URLS=https://192.168.126.67:2379,https://127.0.0.1:2379
ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
ETCD_LISTEN_PEER_URLS=https://192.168.126.67:2380
ETCD_NAME=etcd-node1
ETCD_PROXY=off
ETCD_ENABLE_V2=true
ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250
ETCD_AUTO_COMPACTION_RETENTION=8
ETCD_SNAPSHOT_COUNT=10000

# TLS settings
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-node1.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-node1-key.pem
ETCD_CLIENT_CERT_AUTH=true

ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-node1.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-node1-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=true

# CLI settings
ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-node1-key.pem
ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-node1.pem

Certificate generation is not covered here.

Start etcd:

sudo systemctl start etcd

TIPS: After all nodes have joined, change ETCD_INITIAL_CLUSTER to list all three nodes:

ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380,etcd-node3=https://192.168.126.69:2380
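The three-node value follows a fixed pattern: name=https://IP:2380 entries joined with commas. The following sketch generates that string from the node list so it does not have to be typed by hand. The helper name initial_cluster is hypothetical, and the sketch assumes every peer listens on port 2380 over https, as in the env file above.

```shell
#!/bin/sh
# Hypothetical helper, not part of etcd or kubekey.
# Builds an ETCD_INITIAL_CLUSTER value from "name=ip" arguments,
# assuming each peer listens on https://<ip>:2380.
initial_cluster() {
  out=""
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    ip=${pair#*=}      # text after the first "="
    # ${out:+,} inserts a comma separator only once out is non-empty
    out="${out}${out:+,}${name}=https://${ip}:2380"
  done
  printf '%s\n' "$out"
}
```

For example, `initial_cluster etcd-node1=192.168.126.67 etcd-node2=192.168.126.68 etcd-node3=192.168.126.69` prints the same three-node value shown above.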

Starting the second node

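Use the first node's configuration files as a template. The main change is that certificates must be generated for the second node. In /etc/etcd.env, ETCD_NAME, the node's own IP in the listen and advertise URLs, and the certificate paths must also be updated to match.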
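Only a few fields differ between nodes, so each node's file can be rendered from a template. The function below is a hypothetical sketch: the name render_etcd_env and its arguments are illustrative. It assumes the certificates for each node follow node1's naming pattern (member-<suffix>.pem and admin-<suffix>.pem). The function prints to stdout, so you can review the result before writing it to /etc/etcd.env.

```shell
#!/bin/sh
# Hypothetical template for /etc/etcd.env, mirroring node1's file above.
# Usage: render_etcd_env <etcd-name> <node-ip> <cert-suffix> <initial-cluster>
# Assumes certificates named member-<suffix>.pem and admin-<suffix>.pem.
render_etcd_env() {
  name=$1 ip=$2 suffix=$3 cluster=$4
  ssl=/etc/ssl/etcd/ssl
  cat <<EOF
# Environment file for etcd v3.5.13
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${ip}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${ip}:2380
ETCD_INITIAL_CLUSTER_STATE=existing
ETCD_METRICS=basic
ETCD_LISTEN_CLIENT_URLS=https://${ip}:2379,https://127.0.0.1:2379
ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
ETCD_LISTEN_PEER_URLS=https://${ip}:2380
ETCD_NAME=${name}
ETCD_PROXY=off
ETCD_ENABLE_V2=true
ETCD_INITIAL_CLUSTER=${cluster}
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250
ETCD_AUTO_COMPACTION_RETENTION=8
ETCD_SNAPSHOT_COUNT=10000

# TLS settings
ETCD_TRUSTED_CA_FILE=${ssl}/ca.pem
ETCD_CERT_FILE=${ssl}/member-${suffix}.pem
ETCD_KEY_FILE=${ssl}/member-${suffix}-key.pem
ETCD_CLIENT_CERT_AUTH=true

ETCD_PEER_TRUSTED_CA_FILE=${ssl}/ca.pem
ETCD_PEER_CERT_FILE=${ssl}/member-${suffix}.pem
ETCD_PEER_KEY_FILE=${ssl}/member-${suffix}-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=true

# CLI settings
ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
ETCDCTL_CACERT=${ssl}/ca.pem
ETCDCTL_KEY=${ssl}/admin-${suffix}-key.pem
ETCDCTL_CERT=${ssl}/admin-${suffix}.pem
EOF
}
```

For the second node, for example: `render_etcd_env etcd-node2 192.168.126.68 node2 "etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380" | sudo tee /etc/etcd.env`.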

# 1. Stop etcd
sudo systemctl stop etcd

# 2. Remove the existing etcd data-dir
sudo rm -rf /var/lib/etcd

# 3. In /etc/etcd.env, set ETCD_INITIAL_CLUSTER to the first node plus this node
ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380

# 4. On the first node, add the new member
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-node1-key.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-node1.pem
etcdctl member add etcd-node2 --peer-urls=https://192.168.126.68:2380

Confirm that the member exists:
etcdctl member list

# 5. Start etcd on the second node
sudo systemctl start etcd

The data is synchronized from the existing members automatically.
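You can check whether the new member has actually joined, rather than only been registered, by inspecting the output of `etcdctl member list`. In etcd v3.5, the default output prints one comma-separated line per member: ID, status, name, peer URLs, client URLs, and is-learner. A member that was added but has not started yet shows as unstarted with an empty name. The helper below relies on that format. The name member_started is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `etcdctl member list` output (default simple format)
# on stdin and succeeds only if member <name> is listed with status "started".
member_started() {
  awk -F', ' -v want="$1" '
    $2 == "started" && $3 == want { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example (on a node where the ETCDCTL_* variables are exported):
#   until etcdctl member list | member_started etcd-node2; do sleep 5; done
```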

Finally, update /etc/etcd.env:

ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380,etcd-node3=https://192.168.126.69:2380

Starting the third node

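Use the first node's configuration files as a template. The main change is that certificates must be generated for the third node. In /etc/etcd.env, ETCD_NAME, the node's own IP in the listen and advertise URLs, and the certificate paths must also be updated to match.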

# 1. Stop etcd
sudo systemctl stop etcd

# 2. Remove the existing etcd data-dir
sudo rm -rf /var/lib/etcd

# 3. In /etc/etcd.env, set ETCD_INITIAL_CLUSTER to all three nodes
ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380,etcd-node3=https://192.168.126.69:2380

# 4. On the first node, add the new member
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-node1-key.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-node1.pem
etcdctl member add etcd-node3 --peer-urls=https://192.168.126.69:2380

Confirm that the member exists:
etcdctl member list

# 5. Start etcd on the third node
sudo systemctl start etcd

The data is synchronized from the existing members automatically.

Finally, make sure /etc/etcd.env on every node contains:

ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380,etcd-node3=https://192.168.126.69:2380

Cluster verification

etcdctl member list 
etcdctl endpoint status --cluster -w table

❯ etcdctl endpoint status --cluster -w table

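+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT                    | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.126.67:2379 | 13328a4825671d43 | 3.5.13  | 19 MB   | true      | false      | 2         | 3392456    | 3392456            |        |
| https://192.168.126.69:2379 | 20087e80a529fd83 | 3.5.13  | 20 MB   | false     | false      | 2         | 3392456    | 3392456            |        |
| https://192.168.126.68:2379 | f5c10ae40fa61a7e | 3.5.13  | 19 MB   | false     | false      | 2         | 3392456    | 3392456            |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+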
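Instead of reading the table by eye, you can script a quick check that exactly one endpoint reports IS LEADER = true. The sketch below is hypothetical; the function name one_leader is illustrative. It assumes the `-w simple` output of etcdctl v3.5, which prints the same columns as the table above, comma-separated, with IS LEADER as the fifth field. Later etcd versions may add columns.

```shell
#!/bin/sh
# Hypothetical check: reads `etcdctl endpoint status --cluster -w simple`
# output on stdin. Assumes the etcd v3.5 column order shown in the table
# (ENDPOINT, ID, VERSION, DB SIZE, IS LEADER, ...), so IS LEADER is field 5.
one_leader() {
  awk -F', ' '
    $5 == "true" { leaders++; print "leader: " $1 }
    END {
      if (leaders != 1) print "expected exactly 1 leader, found " leaders + 0
      exit (leaders == 1 ? 0 : 1)
    }
  '
}

# Example:
#   etcdctl endpoint status --cluster -w simple | one_leader
```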

Cluster backup

export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-node1-key.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-node1.pem
etcdctl snapshot save "etcd_$(date +%Y_%m_%d).db"

A daily backup is recommended.
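One way to follow the daily-backup advice is a small script, run from cron, that takes the snapshot and keeps only the most recent files. This is a sketch with these assumptions: the backup directory /var/backups/etcd, the retention count of 7, and the function names are all hypothetical. The script loads the ETCDCTL_* settings by sourcing /etc/etcd.env, whose KEY=value lines are valid shell assignments. It wraps the source in `set -a` so the variables are exported to etcdctl.

```shell
#!/bin/sh
# Hypothetical daily backup helpers; directory and retention are assumptions.

# Delete all but the newest <keep> etcd_YYYY_MM_DD.db files in <dir>.
# The date-based names sort chronologically, so reverse order is newest first.
prune_backups() {
  dir=$1 keep=$2
  ls -1 "$dir" | grep -E '^etcd_[0-9]{4}_[0-9]{2}_[0-9]{2}\.db$' | sort -r |
    tail -n +"$((keep + 1))" |
    while IFS= read -r f; do rm -f -- "$dir/$f"; done
}

# Take today's snapshot, then prune old ones. Defined here, not invoked.
backup_etcd() {
  dir=${1:-/var/backups/etcd}
  set -a                 # export every variable assigned while sourcing
  . /etc/etcd.env        # provides ETCDCTL_ENDPOINTS, ETCDCTL_CACERT, ...
  set +a
  mkdir -p "$dir" &&
    etcdctl snapshot save "$dir/etcd_$(date +%Y_%m_%d).db" &&
    prune_backups "$dir" 7
}
```

To run it daily, append a call to `backup_etcd` to the end of the script and add a crontab entry such as `0 2 * * * /usr/local/bin/etcd-backup.sh` (the script path is hypothetical).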


💬 Community support
Contact our team with any questions: Slack

WeChat: 82565387