ODF 02: Expanding the Storage Cluster

There are two ways to add capacity to an internal-mode ODF cluster:

  • Scale out (add new storage nodes)
  • Scale up (add or enable new disks on existing nodes)

Some limits on capacity expansion:

  • From a technical standpoint, ODF is limited to 2000 nodes
  • ODF does not support heterogeneous OSD/disk sizes
  • For deployments with three failure domains, expand storage by adding disks in multiples of three, with an equal number of disks coming from the nodes in each failure domain

Adding Disks to Storage Nodes

To expand a cluster created with local storage devices, the following prerequisites must be met:

  • A running ODF cluster
  • The disks intended for the expansion have already been attached to the storage nodes
  • The LocalVolumeDiscovery and LocalVolumeSet objects have been created (see the check after this list)
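
The last prerequisite can be double-checked from the CLI; a quick sketch, assuming the Local Storage Operator's default openshift-local-storage namespace:

oc get localvolumediscovery,localvolumeset -n openshift-local-storage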

Normally, when a disk is added to an existing storage node, LocalVolumeDiscovery discovers it automatically and the LocalVolumeSet creates a corresponding PV from it. In the earlier deployment, however, I manually set maxDeviceCount in the LocalVolumeSet to 1, so each node provides only one PV. Updating the resource's YAML definition now, you can either delete that restriction or change the value to whatever you expect, for example 2:
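
A minimal sketch of the edit, showing only the fields relevant here (the name imxcai-lvmset is assumed to match the storage class in the output below; openshift-local-storage is the operator's default namespace):

apiVersion: local.storage.openshift.io/v1alpha1
kind: LocalVolumeSet
metadata:
  name: imxcai-lvmset
  namespace: openshift-local-storage
spec:
  storageClassName: imxcai-lvmset
  volumeMode: Block
  maxDeviceCount: 2    # raised from 1; deleting the field removes the limit

The same change can be applied in one step with a merge patch:

oc -n openshift-local-storage patch localvolumeset imxcai-lvmset --type merge -p '{"spec":{"maxDeviceCount":2}}'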

After the change, check the PVs: the storage class has provided new PVs, and they are in the Available state:

oc get pv | grep imxcai-lvmset
local-pv-373f3bb3                          500Gi      RWO            Delete           Bound       openshift-storage/ocs-deviceset-imxcai-lvmset-0-data-2c2jjc   imxcai-lvmset                          18h
local-pv-60eced28                          500Gi      RWO            Delete           Available                                                                 imxcai-lvmset                          87s
local-pv-74b62d61                          500Gi      RWO            Delete           Available                                                                 imxcai-lvmset                          87s
local-pv-77d7b4ac                          500Gi      RWO            Delete           Bound       openshift-storage/ocs-deviceset-imxcai-lvmset-0-data-1n4688   imxcai-lvmset                          18h
local-pv-78888abc                          500Gi      RWO            Delete           Available                                                                 imxcai-lvmset                          87s
local-pv-8786c248                          500Gi      RWO            Delete           Bound       openshift-storage/ocs-deviceset-imxcai-lvmset-0-data-0jdx57   imxcai-lvmset                          18h

Next, go to the OCS Operator and select Add Capacity on the Storage Cluster:
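
If you prefer the CLI to the console, Add Capacity corresponds to raising the count of the storage device set in the StorageCluster resource; a sketch, assuming the default resource name ocs-storagecluster and device set index 0:

oc -n openshift-storage patch storagecluster ocs-storagecluster --type json -p '[{"op": "replace", "path": "/spec/storageDeviceSets/0/count", "value": 2}]'

With replica set to 3, each increment of count adds three OSDs, one per failure domain.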


Wait for the status to become Ready.
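
The progress can also be watched from the CLI; the PHASE column shows Ready once the expansion is complete:

oc get storagecluster -n openshift-storage -w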

Before the expansion, there are currently three OSDs:

oc get pods -n openshift-storage | grep osd
rook-ceph-osd-0-cfb48979c-vcqc6                                   2/2     Running     0          18h
rook-ceph-osd-1-54b679bb85-st2qw                                  2/2     Running     0          18h
rook-ceph-osd-2-5569b7cc47-mqw46                                  2/2     Running     0          18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-0q8gj7   0/1     Completed   0          18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-1drqvg   0/1     Completed   0          18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-2f77k9   0/1     Completed   0          18h

After the expansion completes, there are six OSDs:

oc get pods -n openshift-storage | grep osd
rook-ceph-osd-0-cfb48979c-vcqc6                                   2/2     Running     0          18h
rook-ceph-osd-1-54b679bb85-st2qw                                  2/2     Running     0          18h
rook-ceph-osd-2-5569b7cc47-mqw46                                  2/2     Running     0          18h
rook-ceph-osd-3-55c845784f-dj7tb                                  2/2     Running     0          12s
rook-ceph-osd-4-66ffdf545-scgld                                   2/2     Running     0          11s
rook-ceph-osd-5-6c49476c9c-n4k9m                                  2/2     Running     0          11s
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-0q8gj7   0/1     Completed   0          18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-1drqvg   0/1     Completed   0          18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-2f77k9   0/1     Completed   0          18h
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-37rb52   0/1     Completed   0          25s
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-45wqhk   0/1     Completed   0          24s
rook-ceph-osd-prepare-ocs-deviceset-imxcai-lvmset-0-data-5bwkpz   0/1     Completed   0          24s

The capacity has been expanded; now wait for the cluster to rebalance and return to a healthy (HEALTH_OK) state.
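
One way to follow the rebalance from the CLI is through the CephCluster resource, whose HEALTH column mirrors Ceph's own health state:

oc get cephcluster -n openshift-storage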

You can also verify the state by accessing the cluster through the Ceph client:

oc exec -it -n openshift-storage rook-ceph-operator-7df548cc9-gb8tr -- /bin/bash
bash-4.4$ ceph -c /var/lib/rook/openshift-storage/openshift-storage.config -s
  cluster:
    id:     43d582a9-10cd-4830-8ad7-fbf45a04598c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 12m)
    mgr: a(active, since 12m)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-b=up:active} 1 up:standby-replay
    osd: 6 osds: 6 up (since 2m), 6 in (since 2m)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

  data:
    pools:   10 pools, 176 pgs
    objects: 366 objects, 138 MiB
    usage:   6.3 GiB used, 2.9 TiB / 2.9 TiB avail
    pgs:     176 active+clean

  io:
    client:   938 B/s rd, 29 KiB/s wr, 1 op/s rd, 3 op/s wr
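
From the same shell, ceph osd tree (with the same config flag as above) confirms whether the three new OSDs landed evenly across the failure domains:

bash-4.4$ ceph -c /var/lib/rook/openshift-storage/openshift-storage.config osd tree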

Adding a New Storage Node

Adding a new storage node involves two steps:

  • Add the new node
  • Expand the storage capacity

Adding the New Node

After the new node has joined the cluster, simply apply the node-role.kubernetes.io/worker= and cluster.ocs.openshift.io/openshift-storage= labels to it; ODF will then use the new node automatically:

oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""

Verify that the new node shows up under this label:

oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1

Confirm that at least the following Pods are in the Running state on the new node (a quick check follows the list):

  • csi-cephfsplugin-*
  • csi-rbdplugin-*
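
A quick way to check both, using the same <new_node_name> placeholder as above:

oc get pods -n openshift-storage -o wide --field-selector spec.nodeName=<new_node_name> | grep csi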

Expanding the Storage Capacity

This step follows the same process described above for adding new disks to existing storage nodes.
