Setting Up and Maintaining an etcd Cluster in a KubeKey Environment
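etcd is a distributed, highly available key-value store designed for configuration management, service discovery, and distributed coordination. It was developed by the CoreOS team, is written in Go, and uses the Raft consensus algorithm to keep data consistent and available. etcd is a core component of Kubernetes and other cloud-native systems, where it stores cluster metadata and configuration. This post records how an etcd failure was repaired in a KubeKey environment.
Normally KubeKey brings up the etcd cluster automatically, but things do not always go that smoothly. For example, when the author first deployed etcd with KubeKey, one of the etcd nodes failed to initialize correctly. This post covers:
- etcd cluster initialization
- Adding cluster members
- Viewing cluster members
- Regular cluster backups
1. etcd cluster initialization
Environment
The three machines that run etcd: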
- 192.168.126.67
- 192.168.126.68
- 192.168.126.69
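The ETCD_INITIAL_CLUSTER value used throughout this post is just these addresses joined with member names and the peer port. As a minimal sketch, the helper below (a hypothetical function, not part of etcd or KubeKey) builds that string, assuming the etcd-node1 to etcd-node3 names and port 2380 used in the rest of this post:

```shell
#!/usr/bin/env bash
# Build an ETCD_INITIAL_CLUSTER value from "name=ip" pairs.
# Assumes every peer listens on https port 2380, as in this post.
build_initial_cluster() {
  local out="" pair name ip
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    ip="${pair#*=}"      # text after the first "="
    out="${out:+${out},}${name}=https://${ip}:2380"
  done
  printf '%s\n' "$out"
}

build_initial_cluster \
  etcd-node1=192.168.126.67 \
  etcd-node2=192.168.126.68 \
  etcd-node3=192.168.126.69
```

Passing only the first two pairs gives the two-member value needed while the second node is joining.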
Starting the first node
Configuration files
After downloading etcd, two configuration files need to be written by hand:
cat /etc/systemd/system/etcd.service
[Unit]
Description=etcd
After=network.target
[Service]
User=root
Type=notify
Nice=-20
OOMScoreAdjust=-1000
EnvironmentFile=/etc/etcd.env
ExecStart=/usr/local/bin/etcd
NotifyAccess=all
RestartSec=10s
LimitNOFILE=40000
Restart=always
[Install]
WantedBy=multi-user.target
cat /etc/etcd.env
# Environment file for etcd v3.5.13
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://192.168.126.67:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.126.67:2380
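# "existing" applies when joining a running cluster or restarting a node that already has data; a brand-new cluster is bootstrapped with "new"
ETCD_INITIAL_CLUSTER_STATE=existing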
ETCD_METRICS=basic
ETCD_LISTEN_CLIENT_URLS=https://192.168.126.67:2379,https://127.0.0.1:2379
ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
ETCD_LISTEN_PEER_URLS=https://192.168.126.67:2380
ETCD_NAME=etcd-node1
ETCD_PROXY=off
ETCD_ENABLE_V2=true
ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250
ETCD_AUTO_COMPACTION_RETENTION=8
ETCD_SNAPSHOT_COUNT=10000
# TLS settings
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-node1.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-node1-key.pem
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-node1.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-node1-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=true
# CLI settings
ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-node1-key.pem
ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-node1.pem
Certificate generation is not covered here.
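Most of /etc/etcd.env is identical on every node; only the name, the addresses, and the certificate paths change. The sketch below prints just those per-node lines. The function name is hypothetical, and it assumes the certificates for the other nodes follow the same member-nodeN / admin-nodeN naming shown for node 1, which may not match your setup:

```shell
#!/usr/bin/env bash
# Print the node-specific lines of /etc/etcd.env.
# $1 = node index (1, 2, 3), $2 = node IP address.
# Assumption: certificate files are named member-nodeN*.pem and admin-nodeN*.pem.
node_env() {
  n="$1"; ip="$2"
  cat <<EOF
ETCD_NAME=etcd-node${n}
ETCD_ADVERTISE_CLIENT_URLS=https://${ip}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${ip}:2380
ETCD_LISTEN_CLIENT_URLS=https://${ip}:2379,https://127.0.0.1:2379
ETCD_LISTEN_PEER_URLS=https://${ip}:2380
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-node${n}.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-node${n}-key.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-node${n}.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-node${n}-key.pem
ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-node${n}-key.pem
ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-node${n}.pem
EOF
}

# Lines to change when copying the file to the second node.
node_env 2 192.168.126.68
```

The remaining settings (data dir, cluster token, timeouts, CA file) can be copied from the first node unchanged.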
Start
sudo systemctl start etcd
TIP: Once all three nodes are up, update ETCD_INITIAL_CLUSTER so that it lists all three nodes:
ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380,etcd-node3=https://192.168.126.69:2380
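Editing this line by hand on three machines is easy to get wrong, so here is a rough sketch of a helper that rewrites only the ETCD_INITIAL_CLUSTER line of an env file. The function name is hypothetical; the pattern includes the trailing "=" so that ETCD_INITIAL_CLUSTER_STATE and ETCD_INITIAL_CLUSTER_TOKEN are left alone:

```shell
#!/usr/bin/env bash
# Replace the ETCD_INITIAL_CLUSTER line in an env file.
# $1 = env file path, $2 = new value (must not contain "|" or "&").
set_initial_cluster() {
  file="$1"; value="$2"
  tmp="$(mktemp)" || return 1
  # Write the edited copy to a temp file, then copy it back so the
  # original file keeps its owner and permissions.
  sed "s|^ETCD_INITIAL_CLUSTER=.*|ETCD_INITIAL_CLUSTER=${value}|" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (as root, on each node):
#   set_initial_cluster /etc/etcd.env \
#     "etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380,etcd-node3=https://192.168.126.69:2380"
```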
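Starting the second node
Use the first node's configuration files as a reference, changing the node name and IP addresses; the main extra work is generating certificates for the second node.
# 1. Stop etcd (on the second node)
sudo systemctl stop etcd
# 2. Remove the existing etcd data-dir
sudo rm -rf /var/lib/etcd
# 3. In /etc/etcd.env, list the first node and this node
ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380
# 4. On the first node, add the new member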
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-node1-key.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-node1.pem
etcdctl member add etcd-node2 --peer-urls=https://192.168.126.68:2380
Confirm that the member exists:
etcdctl member list
# 5. Start etcd on the second node
sudo systemctl start etcd
The data syncs over automatically.
Finally, update /etc/etcd.env:
ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380,etcd-node3=https://192.168.126.69:2380
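Step 2 deletes the data directory outright. A more cautious variant, sketched here as a suggestion rather than what was done in the original repair, moves the directory aside with a timestamp so it can still be inspected later. The function name and the ".bak." suffix are my own choices:

```shell
#!/usr/bin/env bash
# Move an etcd data directory aside instead of deleting it.
# Prints the backup path, or nothing if the directory does not exist.
set_aside_data_dir() {
  dir="$1"
  [ -d "$dir" ] || return 0
  backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
  # Refuse to overwrite an earlier backup made in the same second.
  [ -e "$backup" ] && return 1
  mv "$dir" "$backup" && printf '%s\n' "$backup"
}

# Usage on the joining node, as root, after "systemctl stop etcd":
#   set_aside_data_dir /var/lib/etcd
```

etcd recreates the data directory on the next start, so the rest of the procedure stays the same. Remove the backup once the member is healthy.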
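Starting the third node
Use the first node's configuration files as a reference, changing the node name and IP addresses; the main extra work is generating certificates for the third node.
# 1. Stop etcd (on the third node)
sudo systemctl stop etcd
# 2. Remove the existing etcd data-dir
sudo rm -rf /var/lib/etcd
# 3. In /etc/etcd.env, list the first two nodes and this node
ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380,etcd-node3=https://192.168.126.69:2380
# 4. On the first node, add the new member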
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-node1-key.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-node1.pem
etcdctl member add etcd-node3 --peer-urls=https://192.168.126.69:2380
Confirm that the member exists:
etcdctl member list
# 5. Start etcd on the third node
sudo systemctl start etcd
The data syncs over automatically.
Finally, update /etc/etcd.env:
ETCD_INITIAL_CLUSTER=etcd-node1=https://192.168.126.67:2380,etcd-node2=https://192.168.126.68:2380,etcd-node3=https://192.168.126.69:2380
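Members are best added one at a time, with each new member up and running before the next `etcdctl member add`, so that the cluster keeps quorum. The sketch below filters the default (simple) output of `etcdctl member list`, where the second comma-separated field is the member status, and prints the IDs of members that are not yet started. The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Read "etcdctl member list" output on stdin and print the IDs of
# members whose status is not "started".
# Returns 1 if any such member exists, 0 otherwise.
unstarted_members() {
  awk -F', ' '
    NF >= 2 && $2 != "started" { print $1; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage (with the ETCDCTL_* variables exported as above):
#   etcdctl member list | unstarted_members || echo "wait before adding another member"
```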
Verifying the cluster
etcdctl member list
etcdctl endpoint status --cluster -w table
❯ etcdctl endpoint status --cluster -w table
ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
---|---|---|---|---|---|---|---|---|---|
https://192.168.126.67:2379 | 13328a4825671d43 | 3.5.13 | 19 MB | true | false | 2 | 3392456 | 3392456 | |
https://192.168.126.69:2379 | 20087e80a529fd83 | 3.5.13 | 20 MB | false | false | 2 | 3392456 | 3392456 | |
https://192.168.126.68:2379 | f5c10ae40fa61a7e | 3.5.13 | 19 MB | false | false | 2 | 3392456 | 3392456 | |
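A healthy cluster has exactly one leader, as in the output above. As a small sketch, the helper below (a hypothetical function) counts the rows whose IS LEADER column is true. It finds the column by its header name, so it should cope with both the table as rendered here and the bordered table that etcdctl prints in a terminal:

```shell
#!/usr/bin/env bash
# Read "etcdctl endpoint status -w table" output on stdin and print
# the number of endpoints that report IS LEADER = true.
count_leaders() {
  awk -F'|' '
    col == 0 {
      # Still looking for the header row: locate the IS LEADER column.
      for (i = 1; i <= NF; i++) {
        h = $i
        gsub(/^[ \t]+|[ \t]+$/, "", h)
        if (h == "IS LEADER") col = i
      }
      next
    }
    {
      v = $col
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (v == "true") n++
    }
    END { print n + 0 }
  '
}

# Usage:
#   etcdctl endpoint status --cluster -w table | count_leaders
```

Any result other than 1 is worth investigating before relying on the cluster.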
Backing up the cluster
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-node1-key.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-node1.pem
etcdctl snapshot save "etcd_$(date +%Y_%m_%d).db"
Backing up once a day is recommended.
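A daily backup is easier to keep up with cron. The following is a minimal sketch of such a script, not a tested production job: the function names, the backup directory /var/backups/etcd, the 7-day retention, and the script path in the cron example are all assumptions to adapt. It keeps the same etcd_YYYY_MM_DD.db naming as the command above.

```shell
#!/usr/bin/env bash
# Build the snapshot path for a given directory and date string.
snapshot_path() {
  printf '%s/etcd_%s.db\n' "$1" "$2"
}

# Delete snapshots in directory $1 last modified more than $2 days ago.
prune_snapshots() {
  find "$1" -maxdepth 1 -type f -name 'etcd_*.db' -mtime "+$2" -exec rm -f {} +
}

# Take today's snapshot, then prune old ones.
# Requires etcdctl and the ETCDCTL_* variables exported as above;
# cron does not inherit them, so export them in the script itself.
backup_etcd() {
  dir="$1"; keep_days="$2"
  mkdir -p "$dir" || return 1
  etcdctl snapshot save "$(snapshot_path "$dir" "$(date +%Y_%m_%d)")" || return 1
  prune_snapshots "$dir" "$keep_days"
}

# At the end of the cron script (assumed directory and retention):
#   backup_etcd /var/backups/etcd 7
# Example /etc/cron.d entry (assumed script path), daily at 02:00:
#   0 2 * * * root /usr/local/bin/etcd-backup.sh
```

Pruning runs only after a successful snapshot, so a failing backup never removes the older copies.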
References:
- https://etcd.io/docs/v3.6/tutorials/how-to-deal-with-membership/
- https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/