For an ODF cluster deployed in internal mode, there are two ways to add capacity:
- Scale out (add new storage nodes)
- Scale up (add or enable new disks on existing nodes)
Some limitations when expanding capacity:
- From a technical point of view, 2000 nodes is the upper limit for ODF
- ODF does not support heterogeneous OSD/disk sizes
- For deployments with three failure domains, expand storage by adding disks in multiples of three, with an equal number of disks coming from the nodes in each failure domain
Adding disks to existing storage nodes
To expand a cluster created with local storage devices, the following prerequisites must be met:
- A running ODF cluster
- The disks to be used for expansion have already been attached to the storage nodes
- The LocalVolumeDiscovery and LocalVolumeSet objects have been created
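To confirm the last prerequisite, the two objects can be listed; the openshift-local-storage namespace below is an assumption based on the Local Storage Operator's default install location:
# List the discovery and volume-set objects (namespace is an assumption; adjust if needed)
oc get localvolumediscovery,localvolumeset -n openshift-local-storage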
Normally, when disks are added to an existing storage node, LocalVolumeDiscovery discovers them automatically and the LocalVolumeSet creates the corresponding PVs. In the earlier deployment, however, I manually set maxDeviceCount in the LocalVolumeSet to 1, so each node provides only one PV. Update the resource's YAML definition here to either remove that limit or change the value to what you expect, for example 2:
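As a minimal sketch, the same change can be applied with a patch instead of editing the YAML; the LocalVolumeSet name imxcai-lvmset and the openshift-local-storage namespace are assumptions inferred from the storage class name shown below:
# Raise maxDeviceCount so each node can contribute more than one PV
# (object name and namespace are assumptions; adjust to your environment)
oc -n openshift-local-storage patch localvolumeset imxcai-lvmset --type merge -p '{"spec":{"maxDeviceCount":2}}'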
After the change, check the PVs: the corresponding storage class has provisioned new PVs, and they are in the Available state:
oc get pv | grep imxcai-lvmset
local-pv-373f3bb3 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-imxcai-lvmset-0-data-2c2jjc imxcai-lvmset 18h
local-pv-60eced28 500Gi RWO Delete Available imxcai-lvmset 87s
local-pv-74b62d61 500Gi RWO Delete Available imxcai-lvmset 87s
local-pv-77d7b4ac 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-imxcai-lvmset-0-data-1n4688 imxcai-lvmset 18h
local-pv-78888abc 500Gi RWO Delete Available imxcai-lvmset 87s
local-pv-8786c248 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-imxcai-lvmset-0-data-0jdx57 imxcai-lvmset 18h
Next, go to the OCS Operator and select Add Capacity on the Storage Cluster:
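For reference, the same expansion can also be sketched from the CLI by raising the device-set count in the StorageCluster CR; the CR name ocs-storagecluster and a single device set at index 0 are assumptions:
# Increase the number of device sets (each set consumes one disk per failure domain);
# CR name and device-set index are assumptions for this sketch
oc -n openshift-storage patch storagecluster ocs-storagecluster --type json -p '[{"op":"replace","path":"/spec/storageDeviceSets/0/count","value":2}]'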
Wait for the status to become Ready.
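The phase can also be watched from the command line, for example:
# Watch the StorageCluster until its phase reports Ready
oc get storagecluster -n openshift-storage -w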
Before the expansion, there are three OSDs:
oc get pods -n openshift-storage | grep osd
rook-ceph-osd-0-cfb48979c-vcqc6 2/2 Running 0 18h
rook-ceph-osd-1-54b679bb85-st2qw 2/2 Running 0 18h
rook-ceph-osd-2-5569b7cc47-mqw46 2/2 Running 0 18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-0q8gj7 0/1 Completed 0 18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-1drqvg 0/1 Completed 0 18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-2f77k9 0/1 Completed 0 18h
After the expansion completes, there are six OSDs:
oc get pods -n openshift-storage | grep osd
rook-ceph-osd-0-cfb48979c-vcqc6 2/2 Running 0 18h
rook-ceph-osd-1-54b679bb85-st2qw 2/2 Running 0 18h
rook-ceph-osd-2-5569b7cc47-mqw46 2/2 Running 0 18h
rook-ceph-osd-3-55c845784f-dj7tb 2/2 Running 0 12s
rook-ceph-osd-4-66ffdf545-scgld 2/2 Running 0 11s
rook-ceph-osd-5-6c49476c9c-n4k9m 2/2 Running 0 11s
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-0q8gj7 0/1 Completed 0 18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-1drqvg 0/1 Completed 0 18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-2f77k9 0/1 Completed 0 18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-37rb52 0/1 Completed 0 25s
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-45wqhk 0/1 Completed 0 24s
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-5bwkpz 0/1 Completed 0 24s
Capacity has now been expanded; wait for the cluster to rebalance and return to a healthy state.
The status can also be verified by accessing the cluster through the Ceph client:
oc exec -it -n openshift-storage rook-ceph-operator-7df548cc9-gb8tr -- /bin/bash
bash-4.4$ ceph -c /var/lib/rook/openshift-storage/openshift-storage.config -s
  cluster:
    id:     43d582a9-10cd-4830-8ad7-fbf45a04598c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 12m)
    mgr: a(active, since 12m)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-b=up:active} 1 up:standby-replay
    osd: 6 osds: 6 up (since 2m), 6 in (since 2m)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

  data:
    pools:   10 pools, 176 pgs
    objects: 366 objects, 138 MiB
    usage:   6.3 GiB used, 2.9 TiB / 2.9 TiB avail
    pgs:     176 active+clean

  io:
    client:   938 B/s rd, 29 KiB/s wr, 1 op/s rd, 3 op/s wr
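While still in the operator pod's shell, a command such as ceph osd tree (with the same -c config path) additionally shows which host each new OSD landed on, which helps confirm they are spread across the failure domains:
# Show OSDs grouped by host/failure domain to confirm placement of the new OSDs
ceph -c /var/lib/rook/openshift-storage/openshift-storage.config osd tree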
Adding new storage nodes
If new storage nodes are being added, there are two steps:
- Add the new node
- Expand the storage capacity
Adding a new node
After the new node has joined the cluster, simply apply the node-role.kubernetes.io/worker= and cluster.ocs.openshift.io/openshift-storage= labels to it, and ODF will use the new node automatically:
oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
Verify that the new node is present under that label:
oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
Confirm that at least the following pods are in the Running state on the new node:
- csi-cephfsplugin-*
- csi-rbdplugin-*
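A quick way to check is to list the openshift-storage pods by node; <new_node_name> below is a placeholder:
# Show pods scheduled on the new node, including the CSI plugin pods
oc get pods -n openshift-storage -o wide | grep <new_node_name>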
Expanding the storage capacity
This step follows the same procedure as adding new disks to existing storage nodes described above.