Building a Ceph cluster on two servers (unresolved)

Having built a Proxmox VE cluster out of two Proxmox VE servers plus a corosync qdevice server (three servers in total), I wanted to see whether Ceph could be set up on it, so I ran an experiment.

Each Proxmox VE server was created with 6 CPU cores, 16GB of RAM and a 120GB system disk, and ran three 16GB disks as Ceph storage.

For now, it is running.

One monitor and one manager were configured on each server.

Checking the details

First, run "ceph health" and "ceph health detail" to check.

root@proxmoxa:~# ceph health
HEALTH_WARN clock skew detected on mon.proxmoxb; Degraded data redundancy: 28/90 objects degraded (31.111%), 25 pgs degraded, 128 pgs undersized
root@proxmoxa:~# ceph health detail
HEALTH_WARN clock skew detected on mon.proxmoxb; Degraded data redundancy: 39/123 objects degraded (31.707%), 35 pgs degraded, 128 pgs undersized
[WRN] MON_CLOCK_SKEW: clock skew detected on mon.proxmoxb
    mon.proxmoxb clock skew 0.305298s > max 0.05s (latency 0.00675958s)
[WRN] PG_DEGRADED: Degraded data redundancy: 39/123 objects degraded (31.707%), 35 pgs degraded, 128 pgs undersized
    pg 2.0 is stuck undersized for 26m, current state active+undersized, last acting [3,1]
    pg 2.1 is stuck undersized for 26m, current state active+undersized, last acting [2,5]
    pg 2.2 is stuck undersized for 26m, current state active+undersized, last acting [5,1]
    pg 2.3 is stuck undersized for 26m, current state active+undersized, last acting [5,2]
    pg 2.4 is stuck undersized for 26m, current state active+undersized+degraded, last acting [1,4]
    pg 2.5 is stuck undersized for 26m, current state active+undersized, last acting [3,0]
    pg 2.6 is stuck undersized for 26m, current state active+undersized, last acting [1,3]
    pg 2.7 is stuck undersized for 26m, current state active+undersized+degraded, last acting [3,2]
    pg 2.8 is stuck undersized for 26m, current state active+undersized, last acting [3,0]
    pg 2.9 is stuck undersized for 26m, current state active+undersized, last acting [1,4]
    pg 2.a is stuck undersized for 26m, current state active+undersized+degraded, last acting [1,4]
    pg 2.b is stuck undersized for 26m, current state active+undersized, last acting [3,0]
    pg 2.c is stuck undersized for 26m, current state active+undersized, last acting [2,3]
    pg 2.d is stuck undersized for 26m, current state active+undersized, last acting [1,3]
    pg 2.e is stuck undersized for 26m, current state active+undersized+degraded, last acting [2,3]
    pg 2.f is stuck undersized for 26m, current state active+undersized, last acting [4,0]
    pg 2.10 is stuck undersized for 26m, current state active+undersized, last acting [2,4]
    pg 2.11 is stuck undersized for 26m, current state active+undersized, last acting [4,1]
    pg 2.1c is stuck undersized for 26m, current state active+undersized+degraded, last acting [4,2]
    pg 2.1d is stuck undersized for 26m, current state active+undersized, last acting [3,0]
    pg 2.1e is stuck undersized for 26m, current state active+undersized+degraded, last acting [2,5]
    pg 2.1f is stuck undersized for 26m, current state active+undersized+degraded, last acting [0,3]
    pg 2.20 is stuck undersized for 26m, current state active+undersized+degraded, last acting [5,1]
    pg 2.21 is stuck undersized for 26m, current state active+undersized, last acting [2,4]
    pg 2.22 is stuck undersized for 26m, current state active+undersized, last acting [3,2]
    pg 2.23 is stuck undersized for 26m, current state active+undersized, last acting [0,3]
    pg 2.24 is stuck undersized for 26m, current state active+undersized, last acting [5,1]
    pg 2.25 is stuck undersized for 26m, current state active+undersized, last acting [4,1]
    pg 2.26 is stuck undersized for 26m, current state active+undersized, last acting [5,2]
    pg 2.27 is stuck undersized for 26m, current state active+undersized, last acting [3,0]
    pg 2.28 is stuck undersized for 26m, current state active+undersized, last acting [2,3]
    pg 2.29 is stuck undersized for 26m, current state active+undersized+degraded, last acting [3,1]
    pg 2.2a is stuck undersized for 26m, current state active+undersized, last acting [5,0]
    pg 2.2b is stuck undersized for 26m, current state active+undersized, last acting [2,4]
    pg 2.2c is stuck undersized for 26m, current state active+undersized, last acting [2,5]
    pg 2.2d is stuck undersized for 26m, current state active+undersized, last acting [5,2]
    pg 2.2e is stuck undersized for 26m, current state active+undersized+degraded, last acting [5,0]
    pg 2.2f is stuck undersized for 26m, current state active+undersized+degraded, last acting [5,0]
    pg 2.30 is stuck undersized for 26m, current state active+undersized+degraded, last acting [4,0]
    pg 2.31 is stuck undersized for 26m, current state active+undersized, last acting [0,5]
    pg 2.32 is stuck undersized for 26m, current state active+undersized, last acting [5,1]
    pg 2.33 is stuck undersized for 26m, current state active+undersized, last acting [3,1]
    pg 2.34 is stuck undersized for 26m, current state active+undersized+degraded, last acting [5,0]
    pg 2.35 is stuck undersized for 26m, current state active+undersized, last acting [1,3]
    pg 2.36 is stuck undersized for 26m, current state active+undersized, last acting [1,4]
    pg 2.37 is stuck undersized for 26m, current state active+undersized, last acting [3,1]
    pg 2.38 is stuck undersized for 26m, current state active+undersized+degraded, last acting [0,5]
    pg 2.39 is stuck undersized for 26m, current state active+undersized, last acting [1,5]
    pg 2.7d is stuck undersized for 26m, current state active+undersized, last acting [0,4]
    pg 2.7e is stuck undersized for 26m, current state active+undersized+degraded, last acting [0,4]
    pg 2.7f is stuck undersized for 26m, current state active+undersized+degraded, last acting [4,1]
root@proxmoxa:~#

Next, "ceph -s".

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            clock skew detected on mon.proxmoxb
            Degraded data redundancy: 120/366 objects degraded (32.787%), 79 pgs degraded, 128 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 27m)
    mgr: proxmoxa(active, since 34m), standbys: proxmoxb
    osd: 6 osds: 6 up (since 28m), 6 in (since 29m); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 122 objects, 436 MiB
    usage:   1.0 GiB used, 95 GiB / 96 GiB avail
    pgs:     120/366 objects degraded (32.787%)
             2/366 objects misplaced (0.546%)
             79 active+undersized+degraded
             49 active+undersized
             1  active+clean+remapped

  io:
    client:   15 KiB/s rd, 8.4 MiB/s wr, 17 op/s rd, 13 op/s wr

root@proxmoxa:~#

clock skew detected

First, look into "clock skew detected".

[WRN] MON_CLOCK_SKEW: clock skew detected on mon.proxmoxb
    mon.proxmoxb clock skew 0.305298s > max 0.05s (latency 0.00675958s)

As "MON_CLOCK_SKEW" indicates, this means there is a time difference between the servers.

mon_clock_drift_allowed defaults to 0.05 seconds, and since "mon.proxmoxb clock skew 0.305298s" exceeds that, a warning is raised.

This test environment runs on top of ESXi 8.0 (free edition), so the skew is probably just the overall lack of processing power causing delays, and I decided to ignore it.
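
Before raising the threshold, it may be worth confirming that time synchronization itself is working on both nodes; a minimal check, assuming chrony (the default on recent Proxmox VE):

# is the system clock considered synchronized?
timedatectl
# how far off is chrony, and which NTP sources is it using?
chronyc tracking
chronyc sources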

To ignore it via configuration, use the "ceph config" command as described in Ceph Configuration on the Proxmox wiki.

Check the current value.

root@proxmoxa:~# ceph config get mon mon_clock_drift_allowed
0.050000
root@proxmoxa:~#

Change the setting; let's make it about 0.5 this time.

root@proxmoxa:~# ceph config set mon mon_clock_drift_allowed 0.5
root@proxmoxa:~# ceph config get mon mon_clock_drift_allowed
0.500000
root@proxmoxa:~#

Confirm that the message has disappeared.

Confirm that it is also gone when running the commands.

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 427/1287 objects degraded (33.178%), 124 pgs degraded, 128 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 40m)
    mgr: proxmoxa(active, since 47m), standbys: proxmoxb
    osd: 6 osds: 6 up (since 41m), 6 in (since 41m); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 429 objects, 1.6 GiB
    usage:   3.4 GiB used, 93 GiB / 96 GiB avail
    pgs:     427/1287 objects degraded (33.178%)
             2/1287 objects misplaced (0.155%)
             124 active+undersized+degraded
             4   active+undersized
             1   active+clean+remapped

  io:
    client:   685 KiB/s rd, 39 KiB/s wr, 7 op/s rd, 2 op/s wr

root@proxmoxa:~#

Confirm that it is gone from "ceph health detail" as well.

root@proxmoxa:~# ceph health
HEALTH_WARN Degraded data redundancy: 426/1284 objects degraded (33.178%), 124 pgs degraded, 128 pgs undersized
root@proxmoxa:~# ceph health detail
HEALTH_WARN Degraded data redundancy: 422/1272 objects degraded (33.176%), 124 pgs degraded, 128 pgs undersized
[WRN] PG_DEGRADED: Degraded data redundancy: 422/1272 objects degraded (33.176%), 124 pgs degraded, 128 pgs undersized
    pg 2.0 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,1]
    pg 2.1 is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,5]
    pg 2.2 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,1]
    pg 2.3 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,2]
    pg 2.4 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,4]
    pg 2.5 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,0]
    pg 2.6 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,3]
    pg 2.7 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,2]
    pg 2.8 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,0]
    pg 2.9 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,4]
    pg 2.a is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,4]
    pg 2.b is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,0]
    pg 2.c is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,3]
    pg 2.d is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,3]
    pg 2.e is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,3]
    pg 2.f is stuck undersized for 39m, current state active+undersized+degraded, last acting [4,0]
    pg 2.10 is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,4]
    pg 2.11 is active+undersized+degraded, acting [4,1]
    pg 2.1c is stuck undersized for 39m, current state active+undersized+degraded, last acting [4,2]
    pg 2.1d is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,0]
    pg 2.1e is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,5]
    pg 2.1f is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,3]
    pg 2.20 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,1]
    pg 2.21 is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,4]
    pg 2.22 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,2]
    pg 2.23 is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,3]
    pg 2.24 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,1]
    pg 2.25 is stuck undersized for 39m, current state active+undersized+degraded, last acting [4,1]
    pg 2.26 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,2]
    pg 2.27 is stuck undersized for 39m, current state active+undersized, last acting [3,0]
    pg 2.28 is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,3]
    pg 2.29 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,1]
    pg 2.2a is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,0]
    pg 2.2b is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,4]
    pg 2.2c is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,5]
    pg 2.2d is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,2]
    pg 2.2e is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,0]
    pg 2.2f is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,0]
    pg 2.30 is stuck undersized for 39m, current state active+undersized+degraded, last acting [4,0]
    pg 2.31 is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,5]
    pg 2.32 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,1]
    pg 2.33 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,1]
    pg 2.34 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,0]
    pg 2.35 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,3]
    pg 2.36 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,4]
    pg 2.37 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,1]
    pg 2.38 is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,5]
    pg 2.39 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,5]
    pg 2.7d is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,4]
    pg 2.7e is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,4]
    pg 2.7f is stuck undersized for 39m, current state active+undersized+degraded, last acting [4,1]
root@proxmoxa:~#

PG_DEGRADED: Degraded data redundancy

Investigating the warnings that appear in large numbers.

First, look up PG_DEGRADED...

No OSDs are down, so that documentation does not seem to apply here.

For now, check the state of anything that looks related.

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 803/2415 objects degraded (33.251%), 128 pgs degraded, 128 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 89m)
    mgr: proxmoxa(active, since 96m), standbys: proxmoxb
    osd: 6 osds: 6 up (since 90m), 6 in (since 91m); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 805 objects, 3.1 GiB
    usage:   6.4 GiB used, 90 GiB / 96 GiB avail
    pgs:     803/2415 objects degraded (33.251%)
             2/2415 objects misplaced (0.083%)
             128 active+undersized+degraded
             1   active+clean+remapped

  io:
    client:   20 KiB/s rd, 13 MiB/s wr, 15 op/s rd, 29 op/s wr

root@proxmoxa:~#
root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  2/6 objects misplaced (33.333%)

pool cephpool id 2
  826/2478 objects degraded (33.333%)
  client io 14 KiB/s rd, 8.4 MiB/s wr, 14 op/s rd, 15 op/s wr

root@proxmoxa:~#
root@proxmoxa:~# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 6.06
pool 2 'cephpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 39 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_bytes 21474836480 application rbd read_balance_score 1.41
        removed_snaps_queue [2~1]

root@proxmoxa:~#

The current cephpool was created with pg_num=128 and pgp_num=128.

Take a look at the autoscale settings.

root@proxmoxa:~# ceph osd pool autoscale-status
POOL        SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr      452.0k                3.0        98280M  0.0000                                  1.0       1              on         False
cephpool   2234M       20480M   3.0        98280M  0.6252                                  1.0     128              on         False
root@proxmoxa:~#

"How I Built a 2-Node HA Proxmox Cluster with Ceph, Podman, and a Raspberry Pi (Yes, It Works)" seems to describe what I want to do.

That page runs "ceph config set osd osd_default_size 2" and "ceph config set osd osd_default_min_size 1", but checking with ceph config get, those keys do not seem to exist.

root@proxmoxa:~# ceph config get osd osd_default_size
Error ENOENT: unrecognized key 'osd_default_size'
root@proxmoxa:~# ceph config get osd osd_default_min_size
Error ENOENT: unrecognized key 'osd_default_min_size'
root@proxmoxa:~#

Just in case, I checked whether they could be set anyway, but that also resulted in an error.

root@proxmoxa:~# ceph config set osd osd_default_size 2
Error EINVAL: unrecognized config option 'osd_default_size'
root@proxmoxa:~# ceph config set osd osd_default_min_size 1
Error EINVAL: unrecognized config option 'osd_default_min_size'
root@proxmoxa:~#
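
Which option names actually exist can be searched with "ceph config ls"; a quick check (output omitted):

ceph config ls | grep -i pool_default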

osd_pool_default_size and osd_pool_default_min_size do exist, so I decided to set those instead.

root@proxmoxa:~# ceph config get osd osd_pool_default_size
3
root@proxmoxa:~# ceph config get osd osd_pool_default_min_size
0
root@proxmoxa:~#
root@proxmoxa:~# ceph config set osd osd_pool_default_size 2
root@proxmoxa:~# ceph config set osd osd_pool_default_min_size 1
root@proxmoxa:~# ceph config get osd osd_pool_default_size
2
root@proxmoxa:~# ceph config get osd osd_pool_default_min_size
1
root@proxmoxa:~#

The status does not seem to change, which makes sense: osd_pool_default_size and osd_pool_default_min_size only apply to pools created afterwards, while existing pools keep their own size and min_size.

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 885/2661 objects degraded (33.258%), 128 pgs degraded, 128 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 2h)
    mgr: proxmoxa(active, since 2h), standbys: proxmoxb
    osd: 6 osds: 6 up (since 2h), 6 in (since 2h); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   7.1 GiB used, 89 GiB / 96 GiB avail
    pgs:     885/2661 objects degraded (33.258%)
             2/2661 objects misplaced (0.075%)
             128 active+undersized+degraded
             1   active+clean+remapped

root@proxmoxa:~# ceph osd pool autoscale-status
POOL        SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr      452.0k                3.0        98280M  0.0000                                  1.0       1              on         False
cephpool   2319M       20480M   3.0        98280M  0.6252                                  1.0     128              on         False
root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  2/6 objects misplaced (33.333%)

pool cephpool id 2
  885/2655 objects degraded (33.333%)
  client io 170 B/s wr, 0 op/s rd, 0 op/s wr

root@proxmoxa:~# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 6.06
pool 2 'cephpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 39 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_bytes 21474836480 application rbd read_balance_score 1.41
        removed_snaps_queue [2~1]

root@proxmoxa:~#

Check each pool's size and min_size with the ceph osd pool get command.

root@proxmoxa:~# ceph osd pool get cephpool size
size: 3
root@proxmoxa:~# ceph osd pool get cephpool min_size
min_size: 2
root@proxmoxa:~#

Change the settings.

root@proxmoxa:~# ceph osd pool set cephpool size 2
set pool 2 size to 2
root@proxmoxa:~# ceph osd pool set cephpool min_size 1
set pool 2 min_size to 1
root@proxmoxa:~# ceph osd pool get cephpool size
size: 2
root@proxmoxa:~# ceph osd pool get cephpool min_size
min_size: 1
root@proxmoxa:~#
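
After lowering size, Ceph removes the surplus replicas in the background; progress can be followed by streaming the cluster log or just polling the summary:

ceph -w
watch -n 5 ceph -s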

Checking the status, ceph health is now HEALTH_OK.

root@proxmoxa:~# ceph health
HEALTH_OK
root@proxmoxa:~# ceph health detail
HEALTH_OK
root@proxmoxa:~#

Checking the other status outputs as well, everything looks fine.

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 2h)
    mgr: proxmoxa(active, since 2h), standbys: proxmoxb
    osd: 6 osds: 6 up (since 2h), 6 in (since 2h); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   7.1 GiB used, 89 GiB / 96 GiB avail
    pgs:     2/1776 objects misplaced (0.113%)
             128 active+clean
             1   active+clean+remapped

  io:
    recovery: 1.3 MiB/s, 0 objects/s

root@proxmoxa:~# ceph osd pool autoscale-status
POOL        SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr      452.0k                3.0        98280M  0.0000                                  1.0       1              on         False
cephpool   3479M       20480M   2.0        98280M  0.4168                                  1.0     128              on         False
root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  2/6 objects misplaced (33.333%)

pool cephpool id 2
  nothing is going on

root@proxmoxa:~# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 6.06
pool 2 'cephpool' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 42 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_bytes 21474836480 application rbd read_balance_score 1.17

root@proxmoxa:~#

The GUI also shows HEALTH_OK.

Failure test

What happens when one side is stopped?

The PVE cluster side is still alive.

root@proxmoxa:~# ha-manager status
quorum OK
master proxmoxa (active, Wed Jan 21 17:43:39 2026)
lrm proxmoxa (active, Wed Jan 21 17:43:40 2026)
lrm proxmoxb (old timestamp - dead?, Wed Jan 21 17:43:08 2026)
service vm:100 (proxmoxa, started)
root@proxmoxa:~# pvecm status
Cluster information
-------------------
Name:             cephcluster
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Jan 21 17:44:11 2026
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.3b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.2.64 (local)
0x00000000          1            Qdevice
root@proxmoxa:~#

However, the Ceph side is dead.

Running the "ceph health" command gets no response. This is expected: with only two monitors, the survivor cannot form a majority on its own, so the monitors lose quorum and every ceph command simply hangs.

root@proxmoxa:~# ceph health
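
When the monitors have no quorum, normal ceph commands just hang. To avoid waiting forever, a connect timeout can be given, and the surviving monitor can still be queried locally through its admin socket; a sketch, assuming the monitor on this node is named mon.proxmoxa:

# give up after 10 seconds instead of hanging
ceph --connect-timeout 10 health
# ask the local monitor daemon directly (works without quorum)
ceph daemon mon.proxmoxa mon_status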

This looks like a dead end, so bring the stopped node back up.

Change size and min_size for the .mgr pool as well.

root@proxmoxa:~# ceph osd pool ls
.mgr
cephpool
root@proxmoxa:~# ceph osd pool get .mgr size
size: 3
root@proxmoxa:~# ceph osd pool get .mgr min_size
min_size: 2
root@proxmoxa:~# ceph osd pool set .mgr size 2
set pool 1 size to 2
root@proxmoxa:~# ceph osd pool set .mgr min_size 1
set pool 1 min_size to 1
root@proxmoxa:~# ceph osd pool get .mgr size
size: 2
root@proxmoxa:~# ceph osd pool get .mgr min_size
min_size: 1
root@proxmoxa:~#

Then, to change the mon parameters as described in the blog above, check the current values.

root@proxmoxa:~# ceph config get mon mon_osd_min_down_reporters
2
root@proxmoxa:~# ceph config get mon mon_osd_down_out_interval
600
root@proxmoxa:~# ceph config get mon mon_osd_report_timeout
900
root@proxmoxa:~#

Change each of these.

root@proxmoxa:~# ceph config set mon mon_osd_min_down_reporters 1
root@proxmoxa:~# ceph config set mon mon_osd_down_out_interval 120
root@proxmoxa:~# ceph config set mon mon_osd_report_timeout 90
root@proxmoxa:~# ceph config get mon mon_osd_min_down_reporters
1
root@proxmoxa:~# ceph config get mon mon_osd_down_out_interval
120
root@proxmoxa:~# ceph config get mon mon_osd_report_timeout
90
root@proxmoxa:~#

...ceph -s still stops responding, as before. Not surprising: these settings only tune how OSD failures are reported and handled, and do not address the underlying problem that a single monitor out of two cannot hold quorum.

How to create the third node needed to keep Ceph alive?

The section "Faking a Third Node with a Containerized MON" in the article above describes running a third ceph mon as a container.

From the Proxmox VE forum thread "3rd Ceph MON on external QDevice (Podman) – 4-node / 2-site cluster" I got to the Proxmox VE wiki page "Stretch Cluster", which says to set up a "tie-breaker node" rather than a ceph mon.

The Stretch Cluster setup also uses size=4 and min_size=2 for the pools (changing what I set earlier), so that two replicas are guaranteed on each of the two nodes.

For now, change size/min_size accordingly.

root@proxmoxa:~# ceph osd pool ls
.mgr
cephpool
root@proxmoxa:~# ceph osd pool get .mgr size
size: 2
root@proxmoxa:~# ceph osd pool get .mgr min_size
min_size: 1
root@proxmoxa:~# ceph osd pool set .mgr size 4
set pool 1 size to 4
root@proxmoxa:~# ceph osd pool set .mgr min_size 2
set pool 1 min_size to 2
root@proxmoxa:~# ceph osd pool get .mgr size
size: 4
root@proxmoxa:~# ceph osd pool get .mgr min_size
min_size: 2
root@proxmoxa:~# ceph osd pool get cephpool size
size: 2
root@proxmoxa:~# ceph osd pool get cephpool min_size
min_size: 1
root@proxmoxa:~# ceph osd pool set cephpool size 4
set pool 2 size to 4
root@proxmoxa:~# ceph osd pool set cephpool min_size 2
set pool 2 min_size to 2
root@proxmoxa:~# ceph osd pool get cephpool size
size: 4
root@proxmoxa:~# ceph osd pool get cephpool min_size
min_size: 2
root@proxmoxa:~#

Taking ceph osd pool stats in this state, what had been 33.333% is now 50.000%.

root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  4/8 objects degraded (50.000%)

pool cephpool id 2
  1770/3540 objects degraded (50.000%)

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 1774/3548 objects degraded (50.000%), 129 pgs degraded, 129 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 6h)
    mgr: proxmoxb(active, since 6h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 6h), 6 in (since 24h)

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   7.0 GiB used, 89 GiB / 96 GiB avail
    pgs:     1774/3548 objects degraded (50.000%)
             129 active+undersized+degraded

root@proxmoxa:~#
root@proxmoxa:~# ceph health
HEALTH_WARN Degraded data redundancy: 1774/3548 objects degraded (50.000%), 129 pgs degraded, 129 pgs undersized
root@proxmoxa:~# ceph health detail
HEALTH_WARN Degraded data redundancy: 1774/3548 objects degraded (50.000%), 129 pgs degraded, 129 pgs undersized
[WRN] PG_DEGRADED: Degraded data redundancy: 1774/3548 objects degraded (50.000%), 129 pgs degraded, 129 pgs undersized
    pg 1.0 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,0]
    pg 2.0 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,1]
    pg 2.1 is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,5]
    pg 2.2 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,1]
    pg 2.3 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,2]
    pg 2.4 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,4]
    pg 2.5 is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,0]
    pg 2.6 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,3]
    pg 2.7 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,2]
    pg 2.8 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,0]
    pg 2.9 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,4]
    pg 2.a is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,4]
    pg 2.b is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,0]
    pg 2.c is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,3]
    pg 2.d is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,3]
    pg 2.e is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,3]
    pg 2.f is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,0]
    pg 2.10 is active+undersized+degraded, acting [2,4]
    pg 2.1c is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,2]
    pg 2.1d is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,0]
    pg 2.1e is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,5]
    pg 2.1f is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,3]
    pg 2.20 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,1]
    pg 2.21 is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,4]
    pg 2.22 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,2]
    pg 2.23 is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,3]
    pg 2.24 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,1]
    pg 2.25 is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,2]
    pg 2.26 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,2]
    pg 2.27 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,0]
    pg 2.28 is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,3]
    pg 2.29 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,1]
    pg 2.2a is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,0]
    pg 2.2b is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,4]
    pg 2.2c is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,5]
    pg 2.2d is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,2]
    pg 2.2e is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,0]
    pg 2.2f is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,0]
    pg 2.30 is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,0]
    pg 2.31 is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,5]
    pg 2.32 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,1]
    pg 2.33 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,1]
    pg 2.34 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,0]
    pg 2.35 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,3]
    pg 2.36 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,4]
    pg 2.37 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,1]
    pg 2.38 is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,5]
    pg 2.39 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,5]
    pg 2.7d is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,4]
    pg 2.7e is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,4]
    pg 2.7f is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,1]
root@proxmoxa:~#

At this point, ceph osd tree looks like this.

root@proxmoxa:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default
-5         0.04678      host proxmoxa
 3    ssd  0.01559          osd.3          up   1.00000  1.00000
 4    ssd  0.01559          osd.4          up   1.00000  1.00000
 5    ssd  0.01559          osd.5          up   1.00000  1.00000
-3         0.04678      host proxmoxb
 0    ssd  0.01559          osd.0          up   1.00000  1.00000
 1    ssd  0.01559          osd.1          up   1.00000  1.00000
 2    ssd  0.01559          osd.2          up   1.00000  1.00000
root@proxmoxa:~#

Next, create two CRUSH buckets (room1 and room2).

root@proxmoxa:~# ceph osd crush add-bucket room1 room
added bucket room1 type room to crush map
root@proxmoxa:~# ceph osd crush add-bucket room2 room
added bucket room2 type room to crush map
root@proxmoxa:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-8               0  room room2
-7               0  room room1
-1         0.09357  root default
-5         0.04678      host proxmoxa
 3    ssd  0.01559          osd.3          up   1.00000  1.00000
 4    ssd  0.01559          osd.4          up   1.00000  1.00000
 5    ssd  0.01559          osd.5          up   1.00000  1.00000
-3         0.04678      host proxmoxb
 0    ssd  0.01559          osd.0          up   1.00000  1.00000
 1    ssd  0.01559          osd.1          up   1.00000  1.00000
 2    ssd  0.01559          osd.2          up   1.00000  1.00000
root@proxmoxa:~#

Then, move them under the default root.

root@proxmoxa:~# ceph osd crush move room1 root=default
moved item id -7 name 'room1' to location {root=default} in crush map
root@proxmoxa:~# ceph osd crush move room2 root=default
moved item id -8 name 'room2' to location {root=default} in crush map
root@proxmoxa:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default
-5         0.04678      host proxmoxa
 3    ssd  0.01559          osd.3          up   1.00000  1.00000
 4    ssd  0.01559          osd.4          up   1.00000  1.00000
 5    ssd  0.01559          osd.5          up   1.00000  1.00000
-3         0.04678      host proxmoxb
 0    ssd  0.01559          osd.0          up   1.00000  1.00000
 1    ssd  0.01559          osd.1          up   1.00000  1.00000
 2    ssd  0.01559          osd.2          up   1.00000  1.00000
-7               0      room room1
-8               0      room room2
root@proxmoxa:~#

Next, move each node into a different room.

root@proxmoxa:~# ceph osd crush move proxmoxa room=room1
moved item id -5 name 'proxmoxa' to location {room=room1} in crush map
root@proxmoxa:~# ceph osd crush move proxmoxb room=room2
moved item id -3 name 'proxmoxb' to location {room=room2} in crush map
root@proxmoxa:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME              STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default
-7         0.04678      room room1
-5         0.04678          host proxmoxa
 3    ssd  0.01559              osd.3          up   1.00000  1.00000
 4    ssd  0.01559              osd.4          up   1.00000  1.00000
 5    ssd  0.01559              osd.5          up   1.00000  1.00000
-8         0.04678      room room2
-3         0.04678          host proxmoxb
 0    ssd  0.01559              osd.0          up   1.00000  1.00000
 1    ssd  0.01559              osd.1          up   1.00000  1.00000
 2    ssd  0.01559              osd.2          up   1.00000  1.00000
root@proxmoxa:~#

Create a CRUSH rule.

root@proxmoxa:~# ceph osd getcrushmap > crush.map.bin
25
root@proxmoxa:~# ls -l crush.map.bin
-rw-r--r-- 1 root root 1104 Jan 22 16:31 crush.map.bin
root@proxmoxa:~# crushtool -d crush.map.bin -o crush.map.txt
root@proxmoxa:~# ls -l crush.map*
-rw-r--r-- 1 root root 1104 Jan 22 16:31 crush.map.bin
-rw-r--r-- 1 root root 1779 Jan 22 16:31 crush.map.txt
root@proxmoxa:~#

crush.map.bin is a binary file, so create a text version of it with crushtool.

The current contents were as follows.

root@proxmoxa:~# cat crush.map.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host proxmoxa {
        id -5           # do not change unnecessarily
        id -6 class ssd         # do not change unnecessarily
        # weight 0.04678
        alg straw2
        hash 0  # rjenkins1
        item osd.3 weight 0.01559
        item osd.4 weight 0.01559
        item osd.5 weight 0.01559
}
room room1 {
        id -7           # do not change unnecessarily
        id -10 class ssd                # do not change unnecessarily
        # weight 0.04678
        alg straw2
        hash 0  # rjenkins1
        item proxmoxa weight 0.04678
}
host proxmoxb {
        id -3           # do not change unnecessarily
        id -4 class ssd         # do not change unnecessarily
        # weight 0.04678
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 0.01559
        item osd.1 weight 0.01559
        item osd.2 weight 0.01559
}
room room2 {
        id -8           # do not change unnecessarily
        id -9 class ssd         # do not change unnecessarily
        # weight 0.04678
        alg straw2
        hash 0  # rjenkins1
        item proxmoxb weight 0.04678
}
root default {
        id -1           # do not change unnecessarily
        id -2 class ssd         # do not change unnecessarily
        # weight 0.09357
        alg straw2
        hash 0  # rjenkins1
        item room1 weight 0.04678
        item room2 weight 0.04678
}

# rules
rule replicated_rule {
        id 0
        type replicated
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
root@proxmoxa:~#

Add replicated_stretch_rule at the end of the text file. For the id, look at the other rules in the text and use the next free number. As I read it, the rule first selects every room ("choose firstn 0 type room") and then picks OSDs from up to two different hosts within each room ("chooseleaf firstn 2 type host").

root@proxmoxa:~# cp crush.map.txt crush-new.map.txt
root@proxmoxa:~# vi crush-new.map.txt
root@proxmoxa:~# diff -u crush.map.txt crush-new.map.txt
--- crush.map.txt       2026-01-22 16:31:44.837979276 +0900
+++ crush-new.map.txt   2026-01-22 16:35:12.146622553 +0900
@@ -87,3 +87,13 @@
 }

 # end crush map
+
+rule replicated_stretch_rule {
+        id 1
+        type replicated
+        step take default
+        step choose firstn 0 type room
+        step chooseleaf firstn 2 type host
+        step emit
+}
+
root@proxmoxa:~#

Load the created file into Ceph.

root@proxmoxa:~# ls -l
total 12
-rw-r--r-- 1 root root 1104 Jan 22 16:31 crush.map.bin
-rw-r--r-- 1 root root 1779 Jan 22 16:31 crush.map.txt
-rw-r--r-- 1 root root 1977 Jan 22 16:35 crush-new.map.txt
root@proxmoxa:~# crushtool -c crush-new.map.txt -o crush-new.map.bin
root@proxmoxa:~# ls -l
total 16
-rw-r--r-- 1 root root 1104 Jan 22 16:31 crush.map.bin
-rw-r--r-- 1 root root 1779 Jan 22 16:31 crush.map.txt
-rw-r--r-- 1 root root 1195 Jan 22 16:36 crush-new.map.bin
-rw-r--r-- 1 root root 1977 Jan 22 16:35 crush-new.map.txt
root@proxmoxa:~# ceph osd setcrushmap -i crush-new.map.bin
26
root@proxmoxa:~#

This adds the crush rule.

root@proxmoxa:~# ceph osd crush rule ls
replicated_rule
replicated_stretch_rule
root@proxmoxa:~#
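
The new rule can also be checked offline with crushtool's test mode to see which OSDs it would select; a sketch, with --rule 1 matching the id given to replicated_stretch_rule above:

crushtool -i crush-new.map.bin --test --rule 1 --num-rep 4 --show-mappings | head
crushtool -i crush-new.map.bin --test --rule 1 --num-rep 4 --show-statistics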

The osd tree itself does not seem to have changed.

root@proxmoxa:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME              STATUS  REWEIGHT  PRI-AFF
-1         0.09354  root default
-7         0.04677      room room1
-5         0.04677          host proxmoxa
 3    ssd  0.01558              osd.3          up   1.00000  1.00000
 4    ssd  0.01558              osd.4          up   1.00000  1.00000
 5    ssd  0.01558              osd.5          up   1.00000  1.00000
-8         0.04677      room room2
-3         0.04677          host proxmoxb
 0    ssd  0.01558              osd.0          up   1.00000  1.00000
 1    ssd  0.01558              osd.1          up   1.00000  1.00000
 2    ssd  0.01558              osd.2          up   1.00000  1.00000
root@proxmoxa:~#
root@proxmoxa:~# ceph health
HEALTH_WARN Degraded data redundancy: 448/3548 objects degraded (12.627%), 32 pgs degraded, 91 pgs undersized
root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 448/3548 objects degraded (12.627%), 32 pgs degraded, 91 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 7h)
    mgr: proxmoxb(active, since 7h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 7h), 6 in (since 25h); 97 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   10 GiB used, 86 GiB / 96 GiB avail
    pgs:     448/3548 objects degraded (12.627%)
             1276/3548 objects misplaced (35.964%)
             59 active+undersized+remapped
             34 active+clean+remapped
             32 active+undersized+degraded
             4  active+clean

root@proxmoxa:~#

The data does not seem to be migrating?

As I did before, try changing pgp_num from 128 to 32.

root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  4/8 objects misplaced (50.000%)

pool cephpool id 2
  448/3540 objects degraded (12.655%)
  1272/3540 objects misplaced (35.932%)

root@proxmoxa:~# ceph osd pool get cephpool pgp_num
pgp_num: 128
root@proxmoxa:~# ceph osd pool set cephpool pgp_num 32
set pool 2 pgp_num to 32
root@proxmoxa:~# ceph osd pool get cephpool pgp_num
pgp_num: 128
root@proxmoxa:~#

It is not changing...

root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  4/8 objects misplaced (50.000%)

pool cephpool id 2
  448/3540 objects degraded (12.655%)
  1272/3540 objects misplaced (35.932%)
  client io 170 B/s wr, 0 op/s rd, 0 op/s wr

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 448/3548 objects degraded (12.627%), 32 pgs degraded, 91 pgs undersized
            1 pools have pg_num > pgp_num

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 7h)
    mgr: proxmoxb(active, since 7h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 7h), 6 in (since 25h); 97 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   10 GiB used, 86 GiB / 96 GiB avail
    pgs:     448/3548 objects degraded (12.627%)
             1276/3548 objects misplaced (35.964%)
             59 active+undersized+remapped
             34 active+clean+remapped
             32 active+undersized+degraded
             4  active+clean

  io:
    client:   170 B/s wr, 0 op/s rd, 0 op/s wr

root@proxmoxa:~#

Since "1 pools have pg_num > pgp_num" is being reported, should pg_num be changed as well?

root@proxmoxa:~# ceph osd pool set cephpool pg_num 32
set pool 2 pg_num to 32
root@proxmoxa:~#
root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 448/3548 objects degraded (12.627%), 32 pgs degraded, 91 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 7h)
    mgr: proxmoxb(active, since 7h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 7h), 6 in (since 25h); 97 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   10 GiB used, 86 GiB / 96 GiB avail
    pgs:     448/3548 objects degraded (12.627%)
             1276/3548 objects misplaced (35.964%)
             59 active+undersized+remapped
             34 active+clean+remapped
             32 active+undersized+degraded
             4  active+clean

root@proxmoxa:~#

Even after waiting a while, nothing changes.
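
One likely reason is that autoscale_mode is still on for the pool, so the pg_autoscaler keeps the target at 128 and overrides manual values; pg_num reductions also happen gradually through PG merging rather than instantly. A sketch of what would probably be needed to make a manual value stick (not tried to completion here):

# stop the autoscaler from managing this pool
ceph osd pool set cephpool pg_autoscale_mode off
# then retry the manual values
ceph osd pool set cephpool pg_num 32
ceph osd pool set cephpool pgp_num 32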

Is the crush rule actually applied?

6.3. Verifying that CRUSH rules have been created and that pools are set to the correct CRUSH rule (Red Hat documentation)

Check the rule ids of the current rules.

root@proxmoxa:~# ceph osd crush rule dump | grep -E "rule_(id|name)"
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "rule_id": 1,
        "rule_name": "replicated_stretch_rule",
root@proxmoxa:~#

Check the rule ID set on the actual pool.

root@proxmoxa:~# ceph osd dump|grep cephpool
pool 2 'cephpool' replicated size 4 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 133 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_bytes 21474836480 application rbd read_balance_score 1.22
root@proxmoxa:~#

It says "crush_rule 0", so it has apparently not been changed.

The way to apply a crush rule to an existing pool, from the Red Hat documentation:

root@proxmoxa:~# ceph osd pool get cephpool crush_rule
crush_rule: replicated_rule
root@proxmoxa:~# ceph osd pool set cephpool crush_rule replicated_stretch_rule
set pool 2 crush_rule to replicated_stretch_rule
root@proxmoxa:~# ceph osd pool get cephpool crush_rule
crush_rule: replicated_stretch_rule
root@proxmoxa:~# ceph osd dump|grep cephpool
pool 2 'cephpool' replicated size 4 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 134 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_bytes 21474836480 application rbd read_balance_score 1.22
root@proxmoxa:~#

It changed.
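
For completeness, the .mgr pool is still on replicated_rule; presumably it could be switched to the same rule in the same way (not done here):

ceph osd pool set .mgr crush_rule replicated_stretch_rule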

Hmm...?

root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  4/8 objects misplaced (50.000%)

pool cephpool id 2
  194/3540 objects degraded (5.480%)
  1562/3540 objects misplaced (44.124%)

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 194/3548 objects degraded (5.468%), 14 pgs degraded, 83 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 8h)
    mgr: proxmoxb(active, since 8h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 8h), 6 in (since 26h); 115 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   11 GiB used, 85 GiB / 96 GiB avail
    pgs:     194/3548 objects degraded (5.468%)
             1566/3548 objects misplaced (44.138%)
             69 active+undersized+remapped
             45 active+clean+remapped
             14 active+undersized+degraded
             1  active+clean

root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  4/8 objects misplaced (50.000%)

pool cephpool id 2
  194/3540 objects degraded (5.480%)
  1562/3540 objects misplaced (44.124%)
  client io 1.4 KiB/s wr, 0 op/s rd, 0 op/s wr

root@proxmoxa:~#

The procedure "Increasing the PG count" talks about changing pg_num and pgp_num, and since it had set pgp_num to 4, I tried that.


root@proxmoxa:~# ceph osd pool get cephpool pg_num
pg_num: 128
root@proxmoxa:~# ceph osd pool get cephpool pgp_num
pgp_num: 128
root@proxmoxa:~# ceph osd pool set cephpool pgp_num 4
set pool 2 pgp_num to 4
root@proxmoxa:~# ceph osd pool get cephpool pgp_num
pgp_num: 128
root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 194/3548 objects degraded (5.468%), 14 pgs degraded, 83 pgs undersized
            1 pools have pg_num > pgp_num

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 9h)
    mgr: proxmoxb(active, since 9h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 9h), 6 in (since 27h); 115 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   11 GiB used, 85 GiB / 96 GiB avail
    pgs:     194/3548 objects degraded (5.468%)
             1566/3548 objects misplaced (44.138%)
             69 active+undersized+remapped
             45 active+clean+remapped
             14 active+undersized+degraded
             1  active+clean

root@proxmoxa:~#

Now the output "1 pools have pg_num > pgp_num" has started appearing.

So, try setting pg_num to 4 as well.


root@proxmoxa:~# ceph osd pool set cephpool pg_num 4
set pool 2 pg_num to 4
root@proxmoxa:~# ceph osd pool get cephpool pg_num
pg_num: 128
root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 194/3548 objects degraded (5.468%), 14 pgs degraded, 83 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 9h)
    mgr: proxmoxb(active, since 9h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 9h), 6 in (since 27h); 115 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   11 GiB used, 85 GiB / 96 GiB avail
    pgs:     194/3548 objects degraded (5.468%)
             1566/3548 objects misplaced (44.138%)
             69 active+undersized+remapped
             45 active+clean+remapped
             14 active+undersized+degraded
             1  active+clean

root@proxmoxa:~#

That does not seem related either. Most likely the fundamental problem is that with only one host in each room, neither the default host-level rule nor the stretch rule can ever place more than two replicas in total, so a pool with size=4 will always stay undersized on this two-node layout.

Fixing the issue where the functional level of an Active Directory server built with samba 4.23.3 cannot be moved from 2008 R2

I decided to set up an Active Directory server on the ESXi 8 free environment, so I compiled Samba 4.23.3 from source on AlmaLinux 9 and built one.

# /usr/local/samba/bin/samba-tool domain provision --use-rfc2307 --interactive
Realm [ADSAMPLE.LOCAL]:
Domain [ADSAMPLE]:
Server Role (dc, member, standalone) [dc]:
DNS backend (SAMBA_INTERNAL, BIND9_FLATFILE, BIND9_DLZ, NONE) [SAMBA_INTERNAL]:
DNS forwarder IP address (write 'none' to disable forwarding) [8.8.8.8]:  8.8.8.8
Administrator password:
Retype password:
INFO 2025-11-10 14:24:37,370 pid:1551 /usr/local/samba/lib64/python3.9/site-packages/samba/provision/__init__.py #2112: Looking up IPv4 addresses
<略>
INFO 2025-11-10 14:24:49,826 pid:1551 /usr/local/samba/lib64/python3.9/site-packages/samba/provision/__init__.py #501: DOMAIN SID:            S-1-5-21-1830428519-1651848948-1698044471
#

The forest level / domain level of the Active Directory server brought up this way were both Windows 2008 R2, as shown below.

# samba-tool domain level show
Domain and forest function level for domain 'DC=adsample,DC=local'

Forest function level: (Windows) 2008 R2
Domain function level: (Windows) 2008 R2
Lowest function level of a DC: (Windows) 2008 R2
#

Trying to upgrade this by running the samba-tool domain level raise command results in errors.

# samba-tool domain level raise --forest-level=2012_R2
ERROR: Forest function level can't be higher than the domain function level(s). Please raise it/them first!
# samba-tool domain level raise --domain-level=2012_R2
ERROR: Domain function level can't be higher than the lowest function level of a DC!
#

This is because the default Samba configuration caps "ad dc functional level" at 2008_R2 (reference: Samba domain controller: raising (all kinds of) level).

Run the testparm command to check the current setting.

# /usr/local/samba/bin/testparm -s --section-name=global --parameter-name="ad dc functional level"
Load smb config files from /usr/local/samba/etc/smb.conf
Loaded services file OK.
Weak crypto is allowed by GnuTLS (e.g. NTLM as a compatibility fallback)

2008_R2
#

There is no such entry in the current /usr/local/samba/etc/smb.conf, but this confirms that Samba treats the setting as 2008_R2.

Given this result, add the line "ad dc functional level = 2016" to the global section of /usr/local/samba/etc/smb.conf.

# cat /usr/local/samba/etc/smb.conf
# Global parameters
[global]
        dns forwarder = 8.8.8.8
        netbios name = ADSERVER
        realm = ADSAMPLE.LOCAL
        server role = active directory domain controller
        workgroup = ADSAMPLE
        idmap_ldb:use rfc2307 = yes
        ad dc functional level = 2016

[sysvol]
        path = /usr/local/samba/var/locks/sysvol
        read only = No

[netlogon]
        path = /usr/local/samba/var/locks/sysvol/adsample.local/scripts
        read only = No
#

Check with testparm that the new entry is picked up.

# /usr/local/samba/bin/testparm -s --section-name=global --parameter-name="ad dc functional level"
Load smb config files from /usr/local/samba/etc/smb.conf
Loaded services file OK.
Weak crypto is allowed by GnuTLS (e.g. NTLM as a compatibility fallback)

2016
#

Restart Samba and check what the functional levels look like now.

# systemctl restart samba-ad-dc
# samba-tool domain level show
Domain and forest function level for domain 'DC=adsample,DC=local'

Forest function level: (Windows) 2008 R2
Domain function level: (Windows) 2008 R2
Lowest function level of a DC: (Windows) 2016
#

Lowest function level of a DC has changed, so the two levels above can now be raised as well.

First, raise the domain functional level.

# samba-tool domain level raise --domain-level=2012_R2
Domain function level changed!
All changes applied successfully!
# samba-tool domain level show
Domain and forest function level for domain 'DC=adsample,DC=local'

Forest function level: (Windows) 2008 R2
Domain function level: (Windows) 2012 R2
Lowest function level of a DC: (Windows) 2016
#

Next, raise the forest functional level.

# samba-tool domain level raise --forest-level=2012_R2
Forest function level changed!
All changes applied successfully!
# samba-tool domain level show
Domain and forest function level for domain 'DC=adsample,DC=local'

Forest function level: (Windows) 2012 R2
Domain function level: (Windows) 2012 R2
Lowest function level of a DC: (Windows) 2016
#

With this, the problem is gone.
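
Since the DC itself now reports functional level 2016, it should presumably also be possible to raise the domain and forest levels all the way to 2016 with the same command; this is an assumption I did not test here:

# untested assumption: raise directly to 2016 (the DC functional level is already 2016)
samba-tool domain level raise --domain-level=2016
samba-tool domain level raise --forest-level=2016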

Notes on the iscsiadm command

There were many points where the behavior of HPE VM Essentials with iSCSI storage attached was unclear, and I ended up digging around from the CLI rather than the Web UI, so here are some notes.

Most of this applies to Linux in general.

Checking the connection

Check whether iSCSI is connected with "iscsiadm -m session".

pcuser@hpevme6:~$ sudo iscsiadm -m session
tcp: [1] 192.168.3.34:3260,1029 iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3 (non-flash)
tcp: [2] 192.168.2.34:3260,1028 iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3 (non-flash)
pcuser@hpevme6:~$

If nothing is connected, it looks like this.

pcuser@hpevme6:~$ sudo iscsiadm -m session
iscsiadm: No active sessions.
pcuser@hpevme6:~$

To see more detail, add the "-P <number>" option. 0, 1, 2 and 3 can be specified; "-P 0" is the same as not specifying the option at all.

Levels 0 to 2 cover things like the target IP addresses and login information.
At level 3 you can see whether devices have been recognized, so "sudo iscsiadm -m session -P 3" is essential when troubleshooting.

pcuser@hpevme6:~$ sudo iscsiadm -m session --print=3
iSCSI Transport Class version 2.0-870
version 2.1.9
Target: iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3 (non-flash)
        Current Portal: 192.168.3.34:3260,1029
        Persistent Portal: 192.168.3.34:3260,1029
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-12.com.hpe:hpevme6:59012
                Iface IPaddress: 192.168.3.60
                Iface HWaddress: default
                Iface Netdev: default
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 1048576
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 33 State: running
                scsi33 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdb          State: running
                scsi33 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sdd          State: running
        Current Portal: 192.168.2.34:3260,1028
        Persistent Portal: 192.168.2.34:3260,1028
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-12.com.hpe:hpevme6:59012
                Iface IPaddress: 192.168.2.60
                Iface HWaddress: default
                Iface Netdev: default
                SID: 2
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 1048576
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 34 State: running
                scsi34 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdc          State: running
                scsi34 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sde          State: running
pcuser@hpevme6:~$

Check whether there are scsi... entries after "Attached SCSI devices:".
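
A quick way to look at only that part of the long -P 3 output (plain grep, nothing iSCSI-specific):

sudo iscsiadm -m session -P 3 | grep -A 4 "Attached SCSI devices"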

If there are none, access may not be permitted on the iSCSI storage side, so check its configuration.

First, check the InitiatorName on the Linux side. On Linux it is recorded in /etc/iscsi/initiatorname.iscsi, and right after OS installation it is often set to something like "InitiatorName=iqn.2004-10.com.ubuntu:01:<random>".

With HPE VME, it is the ubuntu-style random value right after the hpe-vm setup, but once an iSCSI connection is configured from the Web UI it is switched to a hostname-plus-random value like the one below.

pcuser@hpevme6:~$ sudo cat /etc/iscsi/initiatorname.iscsi
## DO NOT EDIT OR REMOVE THIS FILE!
## If you remove this file, the iSCSI daemon will not start.
## If you change the InitiatorName, existing access control lists
## may reject this initiator.  The InitiatorName must be unique
## for each iSCSI initiator.  Do NOT duplicate iSCSI InitiatorNames.
InitiatorName=iqn.2024-12.com.hpe:hpevme6:59012
pcuser@hpevme6:~$

This "InitiatorName" value must be added to the initiator registration on the iSCSI storage side.

Example configuration for NetApp

With HPE VME, the Manager virtual machine rewrites the value in /etc/iscsi/initiatorname.iscsi on each server when iSCSI is configured, so if the connection fails even though you are sure you set it up, check that the latest name is registered on the iSCSI storage side.

After changing the configuration, run "sudo iscsiadm -m session --rescan" to rescan.

The log below shows disks that were not recognized becoming recognized after running --rescan.

pcuser@hpevme6:~$ sudo iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
version 2.1.9
Target: iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3 (non-flash)
        Current Portal: 192.168.3.34:3260,1029
        Persistent Portal: 192.168.3.34:3260,1029
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-12.com.hpe:hpevme6:59012
                Iface IPaddress: 192.168.3.60
                Iface HWaddress: default
                Iface Netdev: default
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 120
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 1048576
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 33 State: running
        Current Portal: 192.168.2.34:3260,1028
        Persistent Portal: 192.168.2.34:3260,1028
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-12.com.hpe:hpevme6:59012
                Iface IPaddress: 192.168.2.60
                Iface HWaddress: default
                Iface Netdev: default
                SID: 2
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 120
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 1048576
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 34 State: running
pcuser@hpevme6:~$ sudo iscsiadm -m session --rescan
Rescanning session [sid: 1, target: iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3, portal: 192.168.3.34,3260]
Rescanning session [sid: 2, target: iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3, portal: 192.168.2.34,3260]
pcuser@hpevme6:~$ sudo iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
version 2.1.9
Target: iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3 (non-flash)
        Current Portal: 192.168.3.34:3260,1029
        Persistent Portal: 192.168.3.34:3260,1029
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-12.com.hpe:hpevme6:59012
                Iface IPaddress: 192.168.3.60
                Iface HWaddress: default
                Iface Netdev: default
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 1048576
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 33 State: running
                scsi33 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdb          State: running
                scsi33 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sdd          State: running
        Current Portal: 192.168.2.34:3260,1028
        Persistent Portal: 192.168.2.34:3260,1028
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-12.com.hpe:hpevme6:59012
                Iface IPaddress: 192.168.2.60
                Iface HWaddress: default
                Iface Netdev: default
                SID: 2
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 1048576
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 34 State: running
                scsi34 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdc          State: running
                scsi34 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sde          State: running
pcuser@hpevme6:~$

Multipath recognition

iSCSI storage is connected over multiple sessions, i.e. multipath, so in the example below the same LUNs are visible through two SCSI hosts, scsi33 and scsi34.

pcuser@hpevme6:~$ sudo iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
version 2.1.9
Target: iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3 (non-flash)
        Current Portal: 192.168.3.34:3260,1029
        Persistent Portal: 192.168.3.34:3260,1029
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-12.com.hpe:hpevme6:59012
                Iface IPaddress: 192.168.3.60
                Iface HWaddress: default
                Iface Netdev: default
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 1048576
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 33 State: running
                scsi33 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdb          State: running
                scsi33 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sdd          State: running
        Current Portal: 192.168.2.34:3260,1028
        Persistent Portal: 192.168.2.34:3260,1028
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-12.com.hpe:hpevme6:59012
                Iface IPaddress: 192.168.2.60
                Iface HWaddress: default
                Iface Netdev: default
                SID: 2
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 1048576
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 34 State: running
                scsi34 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdc          State: running
                scsi34 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sde          State: running
pcuser@hpevme6:~$

Consolidating what is seen over two paths into a single device is the job of multipathd.

Run "sudo multipath -ll" to check how the paths are recognized.

pcuser@hpevme6:~$ sudo multipath -ll
3600a09807770457a795d5a4159416c34 dm-2 NETAPP,LUN C-Mode
size=70G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 34:0:0:0 sdc 8:32 active ready running
  `- 33:0:0:0 sdb 8:16 active ready running
3600a09807770457a795d5a4159416c35 dm-1 NETAPP,LUN C-Mode
size=5.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 33:0:0:1 sdd 8:48 active ready running
  `- 34:0:0:1 sde 8:64 active ready running
pcuser@hpevme6:~$

The devices consolidated by multipathd have device files under /dev/mapper.

pcuser@hpevme6:~$ ls /dev/mapper/*
/dev/mapper/3600a09807770457a795d5a4159416c34  /dev/mapper/control
/dev/mapper/3600a09807770457a795d5a4159416c35  /dev/mapper/ubuntu--vg-ubuntu--lv
pcuser@hpevme6:~$

If "sudo multipath -ll" shows nothing, register the devices manually.

First, run "/lib/udev/scsi_id -g -u -d /dev/sd?" to find the WWID that corresponds to each recognized /dev/sd? device.

/lib/udev/scsi_id -g -u -d /dev/sdX

To register this WWID with multipath, run "multipath -a WWID".

multipath -a WWID

After registering, reload with "multipath -r" and confirm with "multipath -ll" that the device has been added.
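Putting those steps together, a minimal sketch of the manual registration flow looks like this (/dev/sdb and the <WWID> placeholder are examples, not values from this environment):

sudo /lib/udev/scsi_id -g -u -d /dev/sdb     # print the WWID of the device
sudo multipath -a <WWID>                     # add the WWID to the multipath wwids file
sudo multipath -r                            # reload the multipath maps
sudo multipath -ll                           # confirm the new dm device appears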

Initial setup such as target login

Discover the targets with "iscsiadm -m discovery -t sendtargets -p <IP address>" and then log in.
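A minimal sketch of the usual open-iscsi sequence, using the portal address from this environment as an example:

sudo iscsiadm -m discovery -t sendtargets -p 192.168.2.34   # register the target/portal records
sudo iscsiadm -m node --login                               # log in to all discovered targets
sudo iscsiadm -m session                                    # confirm the sessions are established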

Changing connection parameters

To check the current parameters, first list the portal names with "sudo iscsiadm -m node".

pcuser@hpevme6:~$ sudo iscsiadm -m node
192.168.2.34:3260,1028 iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3
192.168.3.34:3260,1029 iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3
pcuser@hpevme6:~$

Check the parameters set for each portal with "sudo iscsiadm -m node -p <portal name>".

pcuser@hpevme6:~$ sudo iscsiadm -m node -p 192.168.2.34:3260,1028
# BEGIN RECORD 2.1.9
node.name = iqn.1992-08.com.netapp:sn.e56cfbb6bab111f09b2a000c2980b7f5:vs.3
node.tpgt = 1028
node.startup = automatic
node.leading_login = No
iface.iscsi_ifacename = default
iface.net_ifacename = <empty>
iface.ipaddress = <empty>
iface.prefix_len = 0
iface.hwaddress = <empty>
iface.transport_name = tcp
iface.initiatorname = <empty>
iface.state = <empty>
iface.vlan_id = 0
iface.vlan_priority = 0
iface.vlan_state = <empty>
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
iface.bootproto = <empty>
iface.subnet_mask = <empty>
iface.gateway = <empty>
iface.dhcp_alt_client_id_state = <empty>
iface.dhcp_alt_client_id = <empty>
iface.dhcp_dns = <empty>
iface.dhcp_learn_iqn = <empty>
iface.dhcp_req_vendor_id_state = <empty>
iface.dhcp_vendor_id_state = <empty>
iface.dhcp_vendor_id = <empty>
iface.dhcp_slp_da = <empty>
iface.fragmentation = <empty>
iface.gratuitous_arp = <empty>
iface.incoming_forwarding = <empty>
iface.tos_state = <empty>
iface.tos = 0
iface.ttl = 0
iface.delayed_ack = <empty>
iface.tcp_nagle = <empty>
iface.tcp_wsf_state = <empty>
iface.tcp_wsf = 0
iface.tcp_timer_scale = 0
iface.tcp_timestamp = <empty>
iface.redirect = <empty>
iface.def_task_mgmt_timeout = 0
iface.header_digest = <empty>
iface.data_digest = <empty>
iface.immediate_data = <empty>
iface.initial_r2t = <empty>
iface.data_seq_inorder = <empty>
iface.data_pdu_inorder = <empty>
iface.erl = 0
iface.max_receive_data_len = 0
iface.first_burst_len = 0
iface.max_outstanding_r2t = 0
iface.max_burst_len = 0
iface.chap_auth = <empty>
iface.bidi_chap = <empty>
iface.strict_login_compliance = <empty>
iface.discovery_auth = <empty>
iface.discovery_logout = <empty>
node.discovery_address = 192.168.2.34
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 8
node.session.xmit_thread_priority = 0
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.nr_sessions = 1
node.session.auth.authmethod = None
node.session.auth.username = <empty>
node.session.auth.password = <empty>
node.session.auth.username_in = <empty>
node.session.auth.password_in = <empty>
node.session.auth.chap_algs = MD5
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.session.scan = auto
node.session.reopen_max = 0
node.conn[0].address = 192.168.2.34
node.conn[0].port = 3260
node.conn[0].startup = automatic
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.HeaderDigest = None
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No
# END RECORD
pcuser@hpevme6:~$

The time it takes to reconnect when some of the multipath sessions drop is controlled by node.session.timeo.replacement_timeout, and the default is 120 seconds.

That is rather long; for example, HPE's "HPE Primera Red Hat Enterprise Linux Implementation Guide" sets it to 10 seconds.

To change it right away, run iscsiadm:

pcuser@hpevme6:~$ sudo iscsiadm -m node -p 192.168.2.34:3260,1028 |grep node.session.timeo.replacement_timeout
node.session.timeo.replacement_timeout = 120
pcuser@hpevme6:~$ sudo iscsiadm -m node -p 192.168.2.34:3260,1028 -o update -n node.session.timeo.replacement_timeout -v 10
pcuser@hpevme6:~$ sudo iscsiadm -m node -p 192.168.2.34:3260,1028 |grep node.session.timeo.replacement_timeout
node.session.timeo.replacement_timeout = 10
pcuser@hpevme6:~$

To change it permanently, edit the corresponding line in /etc/iscsi/iscsid.conf.
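A minimal sketch of the permanent change (the value 10 follows the HPE guide mentioned above; note that iscsid.conf defaults are applied to node records at discovery time, so existing records still need the iscsiadm -o update shown above or a re-discovery):

node.session.timeo.replacement_timeout = 10

After editing, restarting the iSCSI daemon (for example "sudo systemctl restart iscsid") ensures that newly created sessions pick up the value.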

Building an HCI configuration with HPE Morpheus VM Essentials

HPE Morpheus VM Essentials (also called HPE VME, HVM, or HPE VM Essentials), which is sometimes described as a vSphere alternative, offers the following choice when setting up a cluster.

There is apparently an HCI configuration that uses Ceph as shared storage, namely "HVM 1.2 HCI Ceph Cluster on HVM/Ubuntu 24.04".

I did not know what kind of configuration to build, but after digging through the documentation I found the requirements for an HCI configuration in a very hard-to-find place: [Infrastructure]-[Clusters]-[HVM Clusters]-[Base Cluster Details].

・At least 3 physical servers
・1 or more CPU cores
・4GB or more of memory, plus an additional 4GB per disk when using Ceph
・OS disk of 20GB or more, and Ceph data disks of 500GB or more

There is a sample configuration under "Example Cluster Deployment".

To start, I installed Ubuntu 24.04 + HVM with the minimum specs and set up the Manager virtual machine.

The steps up to creating the cluster are omitted here.

Now, one of the points I got stuck on was buggy behavior in the layout selection.

When I tried this on ver 8.0.10 and selected "HVM 1.2 HCI Ceph Cluster on HVM/Ubuntu 24.04", only one SSH host could be selected, as shown below.

Assuming that was simply how it worked, I went ahead with the procedure and started the creation, only to find the process kicking off with two servers I had never specified...

After redoing the cluster layout selection several times, there were occasions where blank fields for three servers were displayed, as shown below. In that state the HCI cluster was created successfully.

I entered the values as shown below and started the creation.

Creation completed.

Well, once I understood it, the configuration itself was simple...

That said, I do think the lack of a screen for checking Ceph status is a problem...

For example, if you display the created HCI cluster under [Infrastructure]-[Clusters], there is a "CEPH" indicator as shown below.

The problem is that there is no interface for digging into anything beyond the "Status: WARN" indication.

All you can see under [Storage] is the information below; the state of Ceph itself cannot be checked.
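As a workaround, the Ceph status can be checked from the command line by logging in to one of the HVM hosts over SSH, assuming the ceph CLI and an admin keyring are available on that host (on cephadm-based deployments, "sudo cephadm shell -- ceph -s" would be the equivalent):

sudo ceph -s              # overall cluster status
sudo ceph health detail   # details behind a WARN status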

Migrating data on ESXi 8.0 with an NVMe SSD attached via a USB enclosure turned into a hassle

On the mini PC with ESXi 8.0 Free that I set up the other day, I had been using a spare M.2 SATA 256GB for the system and an M.2 NVMe 512GB as the main datastore.

Looking through my spare M.2 storage, I found two leftover 2TB M.2 NVMe SSDs, so I decided to use one for ESXi: I first put it in a USB NVMe enclosure, formatted it with VMFS, and moved the data over from the M.2 NVMe 512GB.

Incidentally, to get the USB NVMe enclosure recognized, I had to stop the USB Arbitrator service used for USB passthrough, which runs by default on ESXi 8.0.

Source: Configuring a vSphere ESXi host to use a local USB device for VMkernel coredumps

# /etc/init.d/usbarbitrator stop
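Stopping the service this way does not persist across reboots; if it needs to stay disabled, the service can also be turned off persistently (a hedged aside based on VMware's documentation for the USB arbitrator):

# chkconfig usbarbitrator off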

I then installed the M.2 NVMe 2TB internally and checked it from the ESXi Host Client.

Under [Storage]-[Devices], the SPD SP7002D2TNGH is properly recognized.

Clicking on it shows that the VMFS partition inside is also recognized.

However, it does not appear under [Storage]-[Datastores].

After a lot of investigation into what was going on, I managed to resolve it.

Apparently, because I never explicitly unmounted the VMFS volume that had been mounted via the USB NVMe enclosure, things ended up in a state that, I suspect, tripped up various internal processes.

First, check whether it is recognized as M.2 NVMe storage by running "esxcli nvme namespace list" and "esxcli nvme controller list".

[root@esxi:~] esxcli nvme namespace list
Name                                                                   Controller Number  Namespace ID  Block Size  Capacity in MB
---------------------------------------------------------------------  -----------------  ------------  ----------  --------------
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000                256             1         512         1953514
[root@esxi:~] esxcli nvme controller list
Name                                                                                      Controller Number  Adapter  Transport Type  Is Online  Controller Type  Is VVOL  Keep Alive Timeout  IO Queue Number  IO Queue Size
----------------------------------------------------------------------------------------  -----------------  -------  --------------  ---------  ---------------  -------  ------------------  ---------------  -------------
nqn.2014-08.org.nvmexpress_1e4b_SPD_SP700-2TNGH_________________________0901SP7007D00399                256  vmhba1   PCIe                 true                     false                   0                1           1024
[root@esxi:~]

Next, check whether the device exists under /vmfs/devices/disks/.

[root@esxi:~] ls /vmfs/devices/disks/
t10.ATA_____W800S_256GB_____________________________2202211088199_______
t10.ATA_____W800S_256GB_____________________________2202211088199_______:1
t10.ATA_____W800S_256GB_____________________________2202211088199_______:5
t10.ATA_____W800S_256GB_____________________________2202211088199_______:6
t10.ATA_____W800S_256GB_____________________________2202211088199_______:7
t10.ATA_____W800S_256GB_____________________________2202211088199_______:8
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000:1
vml.0100000000303230305f303030305f303030305f3030303000535044205350
vml.0100000000303230305f303030305f303030305f3030303000535044205350:1
vml.01000000003232303232313130383831393920202020202020573830305320
vml.01000000003232303232313130383831393920202020202020573830305320:1
vml.01000000003232303232313130383831393920202020202020573830305320:5
vml.01000000003232303232313130383831393920202020202020573830305320:6
vml.01000000003232303232313130383831393920202020202020573830305320:7
vml.01000000003232303232313130383831393920202020202020573830305320:8
vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2
vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2:1
[root@esxi:~]

Check the Devfs path of the SPD device, the one not showing up as a datastore this time, with "esxcli storage core device list".

[root@esxi:~] esxcli storage core device list
t10.ATA_____W800S_256GB_____________________________2202211088199_______
   Display Name: Local ATA Disk (t10.ATA_____W800S_256GB_____________________________2202211088199_______)
   Has Settable Display Name: true
   Size: 244198
   Device Type: Direct-Access
   Multipath Plugin: HPP
   Devfs Path: /vmfs/devices/disks/t10.ATA_____W800S_256GB_____________________________2202211088199_______
   Vendor: ATA
   Model: W800S 256GB
   Revision: 3G5A
   SCSI Level: 5
   Is Pseudo: false
   Status: on
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: yes
   Attached Filters:
   VAAI Status: unsupported
   Other UIDs: vml.01000000003232303232313130383831393920202020202020573830305320
   Is Shared Clusterwide: false
   Is SAS: false
   Is USB: false
   Is Boot Device: true
   Device Max Queue Depth: 31
   No of outstanding IOs with competing worlds: 31
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false

t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
   Display Name: Local NVMe Disk (t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000)
   Has Settable Display Name: true
   Size: 1953514
   Device Type: Direct-Access
   Multipath Plugin: HPP
   Devfs Path: /vmfs/devices/disks/t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
   Vendor: NVMe
   Model: SPD SP700-2TNGH
   Revision: SP02203A
   SCSI Level: 0
   Is Pseudo: false
   Status: on
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: no
   Attached Filters:
   VAAI Status: unsupported
   Other UIDs: vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2
   Is Shared Clusterwide: false
   Is SAS: false
   Is USB: false
   Is Boot Device: false
   Device Max Queue Depth: 1023
   No of outstanding IOs with competing worlds: 32
   Drive Type: physical
   RAID Level: NA
   Number of Physical Drives: 1
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false
[root@esxi:~]

The VMFS partition is partition 1, so run the following.

[root@esxi:~] voma -m vmfs -f check -N -d /vmfs/devices/disks/vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2:1
Running VMFS Checker version 2.1 in check mode
Initializing LVM metadata, Basic Checks will be done

Checking for filesystem activity
Performing filesystem liveness check..|Scanning for VMFS-6 host activity (4096 bytes/HB, 1024 HBs).
         Reservation Support is not present for NVME devices
Performing filesystem liveness check..|
########################################################################
#   Warning !!!                                                        #
#                                                                      #
#   You are about to execute VOMA without device reservation.          #
#   Any access to this device from other hosts when VOMA is running    #
#   can cause severe data corruption                                   #
#                                                                      #
#   This mode is supported only under VMware support supervision.      #
########################################################################
Do you want to continue (Y/N)?

0) _Yes
1) _No

Select a number from 0-1: 0
Phase 1: Checking VMFS header and resource files
   Detected VMFS-6 file system (labeled:'nvme2tb') with UUID:68e4cab1-0a865c28-49c0-04ab182311d3, Version 6:82
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.

Total Errors Found:           0
[root@esxi:~]

After running "esxcli storage filesystem rescan", the VMFS volume is recognized either as a filesystem or as a snapshot.

[root@esxi:~] esxcli storage filesystem rescan
[root@esxi:~] esxcli storage filesystem list
Mount Point                                        Volume Name                                 UUID                                 Mounted  Type            Size          Free
-------------------------------------------------  ------------------------------------------  -----------------------------------  -------  ------  ------------  ------------
/vmfs/volumes/68cad69a-e82d8e40-5b65-5bb7fb6107f2  datastore1                                  68cad69a-e82d8e40-5b65-5bb7fb6107f2     true  VMFS-6  118380036096   91743059968
/vmfs/volumes/68cad69a-d23fb18e-73e5-5bb7fb6107f2  OSDATA-68cad69a-d23fb18e-73e5-5bb7fb6107f2  68cad69a-d23fb18e-73e5-5bb7fb6107f2     true  VMFSOS  128580583424  125363552256
/vmfs/volumes/fa8a25f7-ba40ebee-45ac-f419c9f388e0  BOOTBANK1                                   fa8a25f7-ba40ebee-45ac-f419c9f388e0     true  vfat      4293591040    4022075392
/vmfs/volumes/f43b0450-7e4d6762-c6be-52e6552cc1f8  BOOTBANK2                                   f43b0450-7e4d6762-c6be-52e6552cc1f8     true  vfat      4293591040    4021354496
[root@esxi:~] esxcli storage vmfs snapshot lis
Error: Unknown command or namespace storage vmfs snapshot lis

[root@esxi:~] esxcli storage vmfs snapshot list
68e4cab1-0a865c28-49c0-04ab182311d3
   Volume Name: nvme2tb
   VMFS UUID: 68e4cab1-0a865c28-49c0-04ab182311d3
   Can mount: true
   Reason for un-mountability:
   Can resignature: true
   Reason for non-resignaturability:
   Unresolved Extent Count: 1
[root@esxi:~]

In this case it was recognized as a snapshot, so perform a resignature.

[root@esxi:~] esxcli storage vmfs snapshot resignature --volume-label=nvme2tb
[root@esxi:~]

After the resignature, it was recognized as an ordinary filesystem, albeit under a name starting with "snap".

[root@esxi:~] esxcli storage vmfs snapshot list
[root@esxi:~] esxcli storage filesystem list
Mount Point                                        Volume Name                                 UUID                                 Mounted  Type             Size           Free
-------------------------------------------------  ------------------------------------------  -----------------------------------  -------  ------  -------------  -------------
/vmfs/volumes/68cad69a-e82d8e40-5b65-5bb7fb6107f2  datastore1                                  68cad69a-e82d8e40-5b65-5bb7fb6107f2     true  VMFS-6   118380036096    91743059968
/vmfs/volumes/68e5b682-56352c06-7c60-04ab182311d3  snap-444b0642-nvme2tb                       68e5b682-56352c06-7c60-04ab182311d3     true  VMFS-6  2048162529280  1222844088320
/vmfs/volumes/68cad69a-d23fb18e-73e5-5bb7fb6107f2  OSDATA-68cad69a-d23fb18e-73e5-5bb7fb6107f2  68cad69a-d23fb18e-73e5-5bb7fb6107f2     true  VMFSOS   128580583424   125363552256
/vmfs/volumes/fa8a25f7-ba40ebee-45ac-f419c9f388e0  BOOTBANK1                                   fa8a25f7-ba40ebee-45ac-f419c9f388e0     true  vfat       4293591040     4022075392
/vmfs/volumes/f43b0450-7e4d6762-c6be-52e6552cc1f8  BOOTBANK2                                   f43b0450-7e4d6762-c6be-52e6552cc1f8     true  vfat       4293591040     4021354496
[root@esxi:~]

The recognition state did not change after a reboot and the volume could be used as a normal VMFS datastore, so I renamed it back to its original name and put it back into service.
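For reference, esxcli can also mount a snapshot-detected volume while keeping its original signature; this environment used resignaturing instead, so the command below is only a hedged aside using the same esxcli namespace, and it is only safe when the original copy of the volume is no longer presented to the host:

esxcli storage vmfs snapshot mount --volume-label=nvme2tb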


Below this point is the investigation log

What follows is a memo listing the information I referred to while investigating the situation.


Try running some of the commands from the KB "Identifying disks when working with VMware ESXi/ESX".

[root@esxi:~] esxcli storage core path list
sata.vmhba0-sata.0:1-t10.ATA_____W800S_256GB_____________________________2202211088199_______
   UID: sata.vmhba0-sata.0:1-t10.ATA_____W800S_256GB_____________________________2202211088199_______
   Runtime Name: vmhba0:C0:T1:L0
   Device: t10.ATA_____W800S_256GB_____________________________2202211088199_______
   Device Display Name: Local ATA Disk (t10.ATA_____W800S_256GB_____________________________2202211088199_______)
   Adapter: vmhba0
   Controller: Not Applicable
   Channel: 0
   Target: 1
   LUN: 0
   Plugin: HPP
   State: active
   Transport: sata
   Adapter Identifier: sata.vmhba0
   Target Identifier: sata.0:1
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
   Maximum IO Size: 33554432

pcie.300-pcie.0:0-t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
   UID: pcie.300-pcie.0:0-t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
   Runtime Name: vmhba1:C0:T0:L0
   Device: t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
   Device Display Name: Local NVMe Disk (t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000)
   Adapter: vmhba1
   Controller: nqn.2014-08.org.nvmexpress_1e4b_SPD_SP700-2TNGH_________________________0901SP7007D00399
   Channel: 0
   Target: 0
   LUN: 0
   Plugin: HPP
   State: active
   Transport: pcie
   Adapter Identifier: pcie.300
   Target Identifier: pcie.0:0
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
   Maximum IO Size: 524288
[root@esxi:~]
[root@esxi:~] esxcfg-mpath -b
t10.ATA_____W800S_256GB_____________________________2202211088199_______ : Local ATA Disk (t10.ATA_____W800S_256GB_____________________________2202211088199_______)
   vmhba0:C0:T1:L0 LUN:0 state:active Local HBA vmhba0 channel 0 target 1

t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000 : Local NVMe Disk (t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000)
   vmhba1:C0:T0:L0 LUN:0 state:active Local HBA vmhba1 channel 0 target 0

[root@esxi:~]
[root@esxi:~] esxcli storage core device list
t10.ATA_____W800S_256GB_____________________________2202211088199_______
   Display Name: Local ATA Disk (t10.ATA_____W800S_256GB_____________________________2202211088199_______)
   Has Settable Display Name: true
   Size: 244198
   Device Type: Direct-Access
   Multipath Plugin: HPP
   Devfs Path: /vmfs/devices/disks/t10.ATA_____W800S_256GB_____________________________2202211088199_______
   Vendor: ATA
   Model: W800S 256GB
   Revision: 3G5A
   SCSI Level: 5
   Is Pseudo: false
   Status: on
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: yes
   Attached Filters:
   VAAI Status: unsupported
   Other UIDs: vml.01000000003232303232313130383831393920202020202020573830305320
   Is Shared Clusterwide: false
   Is SAS: false
   Is USB: false
   Is Boot Device: true
   Device Max Queue Depth: 31
   No of outstanding IOs with competing worlds: 31
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false

t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
   Display Name: Local NVMe Disk (t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000)
   Has Settable Display Name: true
   Size: 1953514
   Device Type: Direct-Access
   Multipath Plugin: HPP
   Devfs Path: /vmfs/devices/disks/t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
   Vendor: NVMe
   Model: SPD SP700-2TNGH
   Revision: SP02203A
   SCSI Level: 0
   Is Pseudo: false
   Status: on
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: no
   Attached Filters:
   VAAI Status: unsupported
   Other UIDs: vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2
   Is Shared Clusterwide: false
   Is SAS: false
   Is USB: false
   Is Boot Device: false
   Device Max Queue Depth: 1023
   No of outstanding IOs with competing worlds: 32
   Drive Type: physical
   RAID Level: NA
   Number of Physical Drives: 1
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false
[root@esxi:~]
[root@esxi:~] esxcfg-scsidevs -c
Device UID                                                                Device Type      Console Device                                                                                Size      Multipath PluginDisplay Name
t10.ATA_____W800S_256GB_____________________________2202211088199_______  Direct-Access    /vmfs/devices/disks/t10.ATA_____W800S_256GB_____________________________2202211088199_______  244198MB  HPP     Local ATA Disk (t10.ATA_____W800S_256GB_____________________________2202211088199_______)
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000     Direct-Access    /vmfs/devices/disks/t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000     1953514MB HPP     Local NVMe Disk (t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000)
[root@esxi:~]

When it comes to VMFS-related output, the 2TB device does not appear.

[root@esxi:~] esxcli storage vmfs extent list
Volume Name                                 VMFS UUID                            Extent Number  Device Name                                                               Partition
------------------------------------------  -----------------------------------  -------------  ------------------------------------------------------------------------  ---------
datastore1                                  68cad69a-e82d8e40-5b65-5bb7fb6107f2              0  t10.ATA_____W800S_256GB_____________________________2202211088199_______          8
OSDATA-68cad69a-d23fb18e-73e5-5bb7fb6107f2  68cad69a-d23fb18e-73e5-5bb7fb6107f2              0  t10.ATA_____W800S_256GB_____________________________2202211088199_______          7
[root@esxi:~]
[root@esxi:~] esxcfg-scsidevs -m
t10.ATA_____W800S_256GB_____________________________2202211088199_______:8 /vmfs/devices/disks/t10.ATA_____W800S_256GB_____________________________2202211088199_______:8 68cad69a-e82d8e40-5b65-5bb7fb6107f2  0  datastore1
t10.ATA_____W800S_256GB_____________________________2202211088199_______:7 /vmfs/devices/disks/t10.ATA_____W800S_256GB_____________________________2202211088199_______:7 68cad69a-d23fb18e-73e5-5bb7fb6107f2  0  OSDATA-68cad69a-d23fb18e-73e5-5bb7fb6107f2
[root@esxi:~]
[root@esxi:~] esxcli storage filesystem list
Mount Point                                        Volume Name                                 UUID                                 Mounted  Type            Size          Free
-------------------------------------------------  ------------------------------------------  -----------------------------------  -------  ------  ------------  ------------
/vmfs/volumes/68cad69a-e82d8e40-5b65-5bb7fb6107f2  datastore1                                  68cad69a-e82d8e40-5b65-5bb7fb6107f2     true  VMFS-6  118380036096   91743059968
/vmfs/volumes/68cad69a-d23fb18e-73e5-5bb7fb6107f2  OSDATA-68cad69a-d23fb18e-73e5-5bb7fb6107f2  68cad69a-d23fb18e-73e5-5bb7fb6107f2     true  VMFSOS  128580583424  125363552256
/vmfs/volumes/fa8a25f7-ba40ebee-45ac-f419c9f388e0  BOOTBANK1                                   fa8a25f7-ba40ebee-45ac-f419c9f388e0     true  vfat      4293591040    4022075392
/vmfs/volumes/f43b0450-7e4d6762-c6be-52e6552cc1f8  BOOTBANK2                                   f43b0450-7e4d6762-c6be-52e6552cc1f8     true  vfat      4293591040    4021354496
[root@esxi:~]

Take a look under /vmfs/devices/disks.

[root@esxi:~] ls -alh /vmfs/devices/disks
total 4500908025
drwxr-xr-x    2 root     root         512 Oct  8 00:11 .
drwxr-xr-x   16 root     root         512 Oct  8 00:11 ..
-rw-------    1 root     root      238.5G Oct  8 00:11 t10.ATA_____W800S_256GB_____________________________2202211088199_______
-rw-------    1 root     root      100.0M Oct  8 00:11 t10.ATA_____W800S_256GB_____________________________2202211088199_______:1
-rw-------    1 root     root        4.0G Oct  8 00:11 t10.ATA_____W800S_256GB_____________________________2202211088199_______:5
-rw-------    1 root     root        4.0G Oct  8 00:11 t10.ATA_____W800S_256GB_____________________________2202211088199_______:6
-rw-------    1 root     root      119.9G Oct  8 00:11 t10.ATA_____W800S_256GB_____________________________2202211088199_______:7
-rw-------    1 root     root      110.5G Oct  8 00:11 t10.ATA_____W800S_256GB_____________________________2202211088199_______:8
-rw-------    1 root     root        1.9T Oct  8 00:11 t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
-rw-------    1 root     root        1.9T Oct  8 00:11 t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000:1
lrwxrwxrwx    1 root     root          69 Oct  8 00:11 vml.0100000000303230305f303030305f303030305f3030303000535044205350 -> t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
lrwxrwxrwx    1 root     root          71 Oct  8 00:11 vml.0100000000303230305f303030305f303030305f3030303000535044205350:1 -> t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000:1
lrwxrwxrwx    1 root     root          72 Oct  8 00:11 vml.01000000003232303232313130383831393920202020202020573830305320 -> t10.ATA_____W800S_256GB_____________________________2202211088199_______
lrwxrwxrwx    1 root     root          74 Oct  8 00:11 vml.01000000003232303232313130383831393920202020202020573830305320:1 -> t10.ATA_____W800S_256GB_____________________________2202211088199_______:1
lrwxrwxrwx    1 root     root          74 Oct  8 00:11 vml.01000000003232303232313130383831393920202020202020573830305320:5 -> t10.ATA_____W800S_256GB_____________________________2202211088199_______:5
lrwxrwxrwx    1 root     root          74 Oct  8 00:11 vml.01000000003232303232313130383831393920202020202020573830305320:6 -> t10.ATA_____W800S_256GB_____________________________2202211088199_______:6
lrwxrwxrwx    1 root     root          74 Oct  8 00:11 vml.01000000003232303232313130383831393920202020202020573830305320:7 -> t10.ATA_____W800S_256GB_____________________________2202211088199_______:7
lrwxrwxrwx    1 root     root          74 Oct  8 00:11 vml.01000000003232303232313130383831393920202020202020573830305320:8 -> t10.ATA_____W800S_256GB_____________________________2202211088199_______:8
lrwxrwxrwx    1 root     root          69 Oct  8 00:11 vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2 -> t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
lrwxrwxrwx    1 root     root          71 Oct  8 00:11 vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2:1 -> t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000:1
[root@esxi:~]

From the KB "Detach a LUN device from ESXi hosts":

[root@esxi:~] esxcli storage core device world list
Device                                                                    World ID  Open Count  World Name
------------------------------------------------------------------------  --------  ----------  ----------
t10.ATA_____W800S_256GB_____________________________2202211088199_______    524300           1  idle0
t10.ATA_____W800S_256GB_____________________________2202211088199_______    524399           1  OCFlush
t10.ATA_____W800S_256GB_____________________________2202211088199_______    524403           1  bcflushd
t10.ATA_____W800S_256GB_____________________________2202211088199_______    524728           1  Vol3JournalExtendMgrWorld
t10.ATA_____W800S_256GB_____________________________2202211088199_______    524813           1  J6AsyncReplayManager
t10.ATA_____W800S_256GB_____________________________2202211088199_______    525311           1  hostd
t10.ATA_____W800S_256GB_____________________________2202211088199_______    525543           1  healthd
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000       525311           1  hostd
[root@esxi:~] esxcli storage core device world list -d vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2
Device                                                                 World ID  Open Count  World Name
---------------------------------------------------------------------  --------  ----------  ----------
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000    525311           1  hostd
[root@esxi:~]
[root@esxi:~] esxcli storage core adapter list
HBA Name  Driver     Link State  UID          Capabilities  Description
--------  ---------  ----------  -----------  ------------  -----------
vmhba0    vmw_ahci   link-n/a    sata.vmhba0                (0000:00:17.0) Intel Corporation Alder Lake-N SATA AHCI Controller
vmhba1    nvme_pcie  link-n/a    pcie.300                   (0000:03:00.0) MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602 (DRAM-less)
[root@esxi:~]
[root@esxi:~] esxcli storage core device partition list
Device                                                                    Partition  Start Sector  End Sector  Type           Size
------------------------------------------------------------------------  ---------  ------------  ----------  ----  -------------
t10.ATA_____W800S_256GB_____________________________2202211088199_______          0             0   500118191     0   256060514304
t10.ATA_____W800S_256GB_____________________________2202211088199_______          1            64      204863     0      104857600
t10.ATA_____W800S_256GB_____________________________2202211088199_______          5        208896     8595455     6     4293918720
t10.ATA_____W800S_256GB_____________________________2202211088199_______          6       8597504    16984063     6     4293918720
t10.ATA_____W800S_256GB_____________________________2202211088199_______          7      16986112   268435455    f8   128742064128
t10.ATA_____W800S_256GB_____________________________2202211088199_______          8     268437504   500118158    fb   118620495360
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000             0             0  4000797359     0  2048408248320
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000             1          2048  4000794624    fb  2048405799424
[root@esxi:~] esxcli storage core device partition showguid
Device                                                                    Partition  Layout  GUID
------------------------------------------------------------------------  ---------  ------  ----
t10.ATA_____W800S_256GB_____________________________2202211088199_______          0  GPT     00000000000000000000000000000000
t10.ATA_____W800S_256GB_____________________________2202211088199_______          1  GPT     c12a7328f81f11d2ba4b00a0c93ec93b
t10.ATA_____W800S_256GB_____________________________2202211088199_______          5  GPT     ebd0a0a2b9e5443387c068b6b72699c7
t10.ATA_____W800S_256GB_____________________________2202211088199_______          6  GPT     ebd0a0a2b9e5443387c068b6b72699c7
t10.ATA_____W800S_256GB_____________________________2202211088199_______          7  GPT     4eb2ea3978554790a79efae495e21f8d
t10.ATA_____W800S_256GB_____________________________2202211088199_______          8  GPT     aa31e02a400f11db9590000c2911d1b8
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000             0  GPT     00000000000000000000000000000000
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000             1  GPT     aa31e02a400f11db9590000c2911d1b8
[root@esxi:~]

Investigating the parameters of ESXi vmkernel modules.

First, list the modules that look related to NVMe.

[root@esxi:~] esxcli system module list|grep nvme
vmknvme                              true        true
vmknvme_vmkapi_compat                true        true
nvme_pcie                            true        true
[root@esxi:~]

Check the parameters available in each module.

[root@esxi:~] esxcli system module parameters list --module=nvme_pcie
Name                         Type  Value  Description
---------------------------  ----  -----  -----------
nvmePCIEBlkSizeAwarePollAct  int          NVMe PCIe block size aware poll activate. Valid if poll activated. Default activated.
nvmePCIEDebugMask            int          NVMe PCIe driver debug mask
nvmePCIEDma4KSwitch          int          NVMe PCIe 4k-alignment DMA
nvmePCIEFakeAdminQSize       uint         NVMe PCIe fake ADMIN queue size. 0's based
nvmePCIELogLevel             int          NVMe PCIe driver log level
nvmePCIEMsiEnbaled           int          NVMe PCIe MSI interrupt enable
nvmePCIEPollAct              int          NVMe PCIe hybrid poll activate, MSIX interrupt must be enabled. Default activated.
nvmePCIEPollInterval         uint         NVMe PCIe hybrid poll interval between each poll in microseconds. Valid if poll activated. Default 50us.
nvmePCIEPollOIOThr           uint         NVMe PCIe hybrid poll OIO threshold of automatic switch from interrupt to poll. Valid if poll activated. Default 30 OIO commands per IO queue.
[root@esxi:~] esxcli system module parameters list --module=vmknvme
Name                                   Type  Value  Description
-------------------------------------  ----  -----  -----------
vmknvme_adapter_num_cmpl_queues        uint         Number of PSA completion queues for NVMe-oF adapter, min: 1, max: 16, default: 4
vmknvme_bind_intr                      uint         If enabled, the interrupt cookies are binded to completion worlds. This parameter is only applied when using driver completion worlds.
vmknvme_compl_world_type               uint         completion world type, PSA: 0, VMKNVME: 1
vmknvme_ctlr_recover_initial_attempts  uint         Number of initial controller recover attempts, MIN: 2, MAX: 30
vmknvme_ctlr_recover_method            uint         controller recover method after initial recover attempts, RETRY: 0, DELETE: 1
vmknvme_cw_rate                        uint         Number of completion worlds per IO queue (NVMe/PCIe only). Number is a power of 2. Applies when number of queues less than 4.
vmknvme_enable_noiob                   uint         If enabled, driver will split the commands based on NOIOB.
vmknvme_hostnqn_format                 uint         HostNQN format, UUID: 0, HostName: 1
vmknvme_io_queue_num                   uint         vmknvme IO queue number for NVMe/PCIe adapter: pow of 2 in [1, 16]
vmknvme_io_queue_size                  uint         IO queue size: [8, 1024]
vmknvme_iosplit_workaround             uint         If enabled, qdepth in PSA layer is half size of vmknvme settings.
vmknvme_log_level                      uint         log level: [0, 20]
vmknvme_max_prp_entries_num            uint         User defined maximum number of PRP entries per controller:default value is 0
vmknvme_stats                          uint         Nvme statistics per controller (NVMe/PCIe only now). Logical OR of flags for collecting. 0x0 for disable, 0x1 for basic data (IO pattern), 0x2 for histogram without IO block size, 0x4 for histogram with IO block size. Default 0x2.
vmknvme_total_io_queue_size            uint         Aggregated IO queue size of a controller, MIN: 64, MAX: 4096
vmknvme_use_default_domain_name        uint         If set to 1, the default domain name "com.vmware", not the system domain name will always be used to generate host NQN. Not used: 0, used: 1, default: 0
[root@esxi:~] esxcli system module parameters list --module=vmknvme_vmkapi_compat
[root@esxi:~]

Nothing there seems related to how the device is handled as a datastore.

Looking into esxcli, there is an esxcli nvme command group.

The "HPE Alletra 9000: VMware ESXi Implementation Guide", in its section on discovering and connecting to namespaces from an ESXi host, has examples of running esxcli nvme commands for NVMe over FC, so let's try them.

[root@esxi:~] esxcli nvme adapter list
Adapter  Adapter Qualified Name                                                               Transport Type  Driver     Associated Devices
-------  -----------------------------------------------------------------------------------  --------------  ---------  ------------------
vmhba1   aqn:nvme_pcie:nqn.2014-08.org.nvmexpress1e4b1e4b0901SP7007D00399____SPD_SP700-2TNGH  PCIe            nvme_pcie
[root@esxi:~]

The namespace already exists.

[root@esxi:~] esxcli nvme namespace list
Name                                                                   Controller Number  Namespace ID  Block Size  Capacity in MB
---------------------------------------------------------------------  -----------------  ------------  ----------  --------------
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000                256             1         512         1953514
[root@esxi:~] esxcli nvme controller list
Name                                                                                      Controller Number  Adapter  Transport Type  Is Online  Controller Type  Is VVOL  Keep Alive Timeout  IO Queue Number  IO Queue Size
----------------------------------------------------------------------------------------  -----------------  -------  --------------  ---------  ---------------  -------  ------------------  ---------------  -------------
nqn.2014-08.org.nvmexpress_1e4b_SPD_SP700-2TNGH_________________________0901SP7007D00399                256  vmhba1   PCIe                 true                     false                   0                1           1024
[root@esxi:~]

The article "PowerEdge: NVMe LED management on Dell servers and VMware ESXi" has, as a preliminary step to LED management, an item on displaying what settings are available for a device.

[root@esxi:~] esxcli nvme device list
HBA Name  Status  Signature
--------  ------  ---------
vmhba1    Online  nvmeMgmt-nvmhba0
[root@esxi:~] esxcli nvme device get -A vmhba1
Controller Identify Info:
   PCIVID: 0x1e4b
   PCISSVID: 0x1e4b
   Serial Number: 0901SP7007D00399
   Model Number: SPD SP700-2TNGH
   Firmware Revision: SP02203A
   Recommended Arbitration Burst: 0
   IEEE OUI Identifier: 000000
   Controller Associated with an SR-IOV Virtual Function: false
   Controller Associated with a PCI Function: true
   NVM Subsystem May Contain Two or More Controllers: false
   NVM Subsystem Contains Only One Controller: true
   NVM Subsystem May Contain Two or More PCIe Ports: false
   NVM Subsystem Contains Only One PCIe Port: true
   Max Data Transfer Size: 7
   Controller ID: 0
   Version: 1.4
   RTD3 Resume Latency: 500000 us
   RTD3 Entry Latency: 2000000 us
   Optional Firmware Activation Event Support: true
   Optional Namespace Attribute Changed Event Support: false
   Host Identifier Support: false
   Namespace Management and Attachment Support: false
   Firmware Activate and Download Support: true
   Format NVM Support: true
   Security Send and Receive Support: true
   Abort Command Limit: 2
   Async Event Request Limit: 3
   Firmware Activate Without Reset Support: true
   Firmware Slot Number: 3
   The First Slot Is Read-only: false
   Telemetry Log Page Support: false
   Command Effects Log Page Support: true
   SMART/Health Information Log Page per Namespace Support: false
   Error Log Page Entries: 63
   Number of Power States Support: 4
   Format of Admin Vendor Specific Commands Is Same: true
   Format of Admin Vendor Specific Commands Is Vendor Specific: false
   Autonomous Power State Transitions Support: true
   Warning Composite Temperature Threshold: 363
   Critical Composite Temperature Threshold: 368
   Max Time for Firmware Activation: 200 * 100ms
   Host Memory Buffer Preferred Size: 8192 * 4KB
   Host Memory Buffer Min Size: 8192 * 4KB
   Total NVM Capacity: 0x1dceea56000
   Unallocated NVM Capacity: 0x0
   Access Size: 0 * 512B
   Total Size: 0 * 128KB
   Authentication Method: 0
   Number of RPMB Units: 0
   Keep Alive Support: 0
   Max Submission Queue Entry Size: 64 Bytes
   Required Submission Queue Entry Size: 64 Bytes
   Max Completion Queue Entry Size: 16 Bytes
   Required Completion Queue Entry Size: 16 Bytes
   Max Outstanding Commands: 0
   Number of Namespaces: 1
   Reservation Support: false
   Save/Select Field in Set/Get Feature Support: true
   Write Zeroes Command Support: true
   Dataset Management Command Support: true
   Write Uncorrectable Command Support: true
   Compare Command Support: true
   Fused Operation Support: false
   Cryptographic Erase as Part of Secure Erase Support: false
   Cryptographic Erase and User Data Erase to All Namespaces: false
   Cryptographic Erase and User Data Erase to One Particular Namespace: true
   Format Operation to All Namespaces: false
   Format Opertaion to One Particular Namespace: true
   Volatile Write Cache Is Present: true
   Atomic Write Unit Normal: 0 Logical Blocks
   Atomic Write Unit Power Fail: 0 Logical Blocks
   Format of All NVM Vendor Specific Commands Is Same: false
   Format of All NVM Vendor Specific Commands Is Vendor Specific: true
   Atomic Compare and Write Unit: 0
   SGL Address Specify Offset Support: false
   MPTR Contain SGL Descriptor Support: false
   SGL Length Able to Larger than Data Amount: false
   SGL Length Shall Be Equal to Data Amount: true
   Byte Aligned Contiguous Physical Buffer of Metadata Support: false
   SGL Bit Bucket Descriptor Support: false
   SGL Keyed SGL Data Block Descriptor Support: false
   SGL for NVM Command Set Support: false
   NVM Subsystem NVMe Qualified Name:
   NVM Subsystem NVMe Qualified Name (hex format):
[root@esxi:~]
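
As a side note, the same adapter can also be looked at from the storage-stack side, and the namespaces behind the controller can be listed. A minimal sketch follows; `esxcli storage core adapter list` is a standard command, while the `esxcli nvme device namespace list` form is my assumption and may differ by ESXi build (vmhba1 is the adapter name from the listing above).

# which vmhba the NVMe driver claimed
esxcli storage core adapter list
# namespaces behind the controller (subcommand syntax assumed; adjust to your build)
esxcli nvme device namespace list -A vmhba1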

Offline migration procedure from a SCSI to an NVMe VMware VMFS datastore

[root@esxi:~] esxcli storage vmfs lockmode list
Volume Name                                 UUID                                 Type      Locking Mode  ATS Compatible  ATS Upgrade Modes  ATS Incompatibility Reason
------------------------------------------  -----------------------------------  --------  ------------  --------------  -----------------  --------------------------
datastore1                                  68cad69a-e82d8e40-5b65-5bb7fb6107f2  VMFS-6    ATS+SCSI               false  None               Device does not support ATS
OSDATA-68cad69a-d23fb18e-73e5-5bb7fb6107f2  68cad69a-d23fb18e-73e5-5bb7fb6107f2  Non-VMFS  ATS+SCSI               false  None               Device does not support ATS
[root@esxi:~]
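
Before moving anything, it also helps to confirm which physical device backs each VMFS datastore. A minimal sketch using the standard extent listing (output was not captured at the time):

# map each VMFS datastore to its backing device (t10.ATA... vs t10.NVMe...)
esxcli storage vmfs extent list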

Filesystem check with the voma command

[root@esxi:~] ls /vmfs/devices/disks/
t10.ATA_____W800S_256GB_____________________________2202211088199_______
t10.ATA_____W800S_256GB_____________________________2202211088199_______:1
t10.ATA_____W800S_256GB_____________________________2202211088199_______:5
t10.ATA_____W800S_256GB_____________________________2202211088199_______:6
t10.ATA_____W800S_256GB_____________________________2202211088199_______:7
t10.ATA_____W800S_256GB_____________________________2202211088199_______:8
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000
t10.NVMe____SPD_SP7002D2TNGH_________________________0200000000000000:1
vml.0100000000303230305f303030305f303030305f3030303000535044205350
vml.0100000000303230305f303030305f303030305f3030303000535044205350:1
vml.01000000003232303232313130383831393920202020202020573830305320
vml.01000000003232303232313130383831393920202020202020573830305320:1
vml.01000000003232303232313130383831393920202020202020573830305320:5
vml.01000000003232303232313130383831393920202020202020573830305320:6
vml.01000000003232303232313130383831393920202020202020573830305320:7
vml.01000000003232303232313130383831393920202020202020573830305320:8
vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2
vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2:1
[root@esxi:~] voma -m vmfs -f check -N -d /vmfs/devices/disks/vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2:1
Running VMFS Checker version 2.1 in check mode
Initializing LVM metadata, Basic Checks will be done

Checking for filesystem activity
Performing filesystem liveness check..|Scanning for VMFS-6 host activity (4096 bytes/HB, 1024 HBs).
         Reservation Support is not present for NVME devices
Performing filesystem liveness check..|
########################################################################
#   Warning !!!                                                        #
#                                                                      #
#   You are about to execute VOMA without device reservation.          #
#   Any access to this device from other hosts when VOMA is running    #
#   can cause severe data corruption                                   #
#                                                                      #
#   This mode is supported only under VMware support supervision.      #
########################################################################
Do you want to continue (Y/N)?

0) _Yes
1) _No

Select a number from 0-1: 0
Phase 1: Checking VMFS header and resource files
   Detected VMFS-6 file system (labeled:'nvme2tb') with UUID:68e4cab1-0a865c28-49c0-04ab182311d3, Version 6:82
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.

Total Errors Found:           0
[root@esxi:~]
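
For reference, `-f check` only reads the metadata. If voma had reported errors, VMware also documents a fix function; a hedged sketch of what that would look like (the datastore must not be in use, and, per the warning above, this is really meant to be run under VMware support guidance):

# the read-only check was clean; a repair pass would look roughly like this
voma -m vmfs -f fix -d /vmfs/devices/disks/<device>:1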

So the filesystem hasn't actually been added?

[root@esxi:~] esxcli storage filesystem rescan
[root@esxi:~] esxcli storage filesystem list
Mount Point                                        Volume Name                                 UUID                                 Mounted  Type            Size          Free
-------------------------------------------------  ------------------------------------------  -----------------------------------  -------  ------  ------------  ------------
/vmfs/volumes/68cad69a-e82d8e40-5b65-5bb7fb6107f2  datastore1                                  68cad69a-e82d8e40-5b65-5bb7fb6107f2     true  VMFS-6  118380036096   91743059968
/vmfs/volumes/68cad69a-d23fb18e-73e5-5bb7fb6107f2  OSDATA-68cad69a-d23fb18e-73e5-5bb7fb6107f2  68cad69a-d23fb18e-73e5-5bb7fb6107f2     true  VMFSOS  128580583424  125363552256
/vmfs/volumes/fa8a25f7-ba40ebee-45ac-f419c9f388e0  BOOTBANK1                                   fa8a25f7-ba40ebee-45ac-f419c9f388e0     true  vfat      4293591040    4022075392
/vmfs/volumes/f43b0450-7e4d6762-c6be-52e6552cc1f8  BOOTBANK2                                   f43b0450-7e4d6762-c6be-52e6552cc1f8     true  vfat      4293591040    4021354496
[root@esxi:~]
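
A plain rescan does not mount unresolved (snapshot/replica) VMFS volumes, which would explain why nothing new appeared here. Besides the esxcli snapshot commands used next, the older esxcfg-volume tool can list them too; a minimal sketch:

# list VMFS volumes detected as snapshots/replicas (unresolved)
esxcfg-volume -l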

Is there a snapshot?

[root@esxi:~] esxcli storage vmfs snapshot list
68e4cab1-0a865c28-49c0-04ab182311d3
   Volume Name: nvme2tb
   VMFS UUID: 68e4cab1-0a865c28-49c0-04ab182311d3
   Can mount: true
   Reason for un-mountability:
   Can resignature: true
   Reason for non-resignaturability:
   Unresolved Extent Count: 1
[root@esxi:~]
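
At this point there are two ways forward, sketched below: mount the volume as-is, keeping its original UUID and label, or resignature it so it gets a new UUID (and the snap- prefixed label). I went with resignaturing, as the next step shows.

# option 1: mount keeping the original UUID and the "nvme2tb" label
#           (only safe if no other host still uses the original UUID)
esxcli storage vmfs snapshot mount --volume-label=nvme2tb
# option 2: resignature; the volume gets a new UUID and a snap-xxxxxxxx- label
esxcli storage vmfs snapshot resignature --volume-label=nvme2tb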

Tried resignaturing it

[root@esxi:~] esxcli storage vmfs snapshot resignature --volume-label=nvme2tb
[root@esxi:~] esxcli storage vmfs snapshot list
[root@esxi:~] esxcli storage filesystem list
Mount Point                                        Volume Name                                 UUID                                 Mounted  Type             Size           Free
-------------------------------------------------  ------------------------------------------  -----------------------------------  -------  ------  -------------  -------------
/vmfs/volumes/68cad69a-e82d8e40-5b65-5bb7fb6107f2  datastore1                                  68cad69a-e82d8e40-5b65-5bb7fb6107f2     true  VMFS-6   118380036096    91743059968
/vmfs/volumes/68e5b682-56352c06-7c60-04ab182311d3  snap-444b0642-nvme2tb                       68e5b682-56352c06-7c60-04ab182311d3     true  VMFS-6  2048162529280  1222844088320
/vmfs/volumes/68cad69a-d23fb18e-73e5-5bb7fb6107f2  OSDATA-68cad69a-d23fb18e-73e5-5bb7fb6107f2  68cad69a-d23fb18e-73e5-5bb7fb6107f2     true  VMFSOS   128580583424   125363552256
/vmfs/volumes/fa8a25f7-ba40ebee-45ac-f419c9f388e0  BOOTBANK1                                   fa8a25f7-ba40ebee-45ac-f419c9f388e0     true  vfat       4293591040     4022075392
/vmfs/volumes/f43b0450-7e4d6762-c6be-52e6552cc1f8  BOOTBANK2                                   f43b0450-7e4d6762-c6be-52e6552cc1f8     true  vfat       4293591040     4021354496
[root@esxi:~]

So the snapshot volume is now recognized as a filesystem?

Is it just an ordinary datastore that happens to have "snap" in its name?
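
Apart from the voma recheck below, one way to confirm it is just an ordinary VMFS-6 datastore would be to query the volume attributes; a minimal sketch (output not captured here, datastore name taken from the listing above):

# show VMFS version, UUID, capacity and the backing partition of the volume
vmkfstools -P -v 10 /vmfs/volumes/snap-444b0642-nvme2tb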

Checked with voma once more

[root@esxi:~] voma -m vmfs -f check -N -d /vmfs/devices/disks/vml.05c56298c6cae09f64ef49957d1d7af93c98b2a5792c87d191b47f87ea5b89f9e2:1
Running VMFS Checker version 2.1 in check mode
Initializing LVM metadata, Basic Checks will be done

Checking for filesystem activity
Performing filesystem liveness check..|Scanning for VMFS-6 host activity (4096 bytes/HB, 1024 HBs).
         Reservation Support is not present for NVME devices
Performing filesystem liveness check..|
########################################################################
#   Warning !!!                                                        #
#                                                                      #
#   You are about to execute VOMA without device reservation.          #
#   Any access to this device from other hosts when VOMA is running    #
#   can cause severe data corruption                                   #
#                                                                      #
#   This mode is supported only under VMware support supervision.      #
########################################################################
Do you want to continue (Y/N)?

0) _Yes
1) _No

Select a number from 0-1: 0
Phase 1: Checking VMFS header and resource files
   Detected VMFS-6 file system (labeled:'snap-444b0642-nvme2tb') with UUID:68e5b682-56352c06-7c60-04ab182311d3, Version 6:82
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.

Total Errors Found:           0
[root@esxi:~] esxcli storage vmfs snapshot list
[root@esxi:~] esxcli storage filesystem list
Mount Point                                        Volume Name                                 UUID                                 Mounted  Type             Size           Free
-------------------------------------------------  ------------------------------------------  -----------------------------------  -------  ------  -------------  -------------
/vmfs/volumes/68cad69a-e82d8e40-5b65-5bb7fb6107f2  datastore1                                  68cad69a-e82d8e40-5b65-5bb7fb6107f2     true  VMFS-6   118380036096    91743059968
/vmfs/volumes/68e5b682-56352c06-7c60-04ab182311d3  snap-444b0642-nvme2tb                       68e5b682-56352c06-7c60-04ab182311d3     true  VMFS-6  2048162529280  1222844088320
/vmfs/volumes/68cad69a-d23fb18e-73e5-5bb7fb6107f2  OSDATA-68cad69a-d23fb18e-73e5-5bb7fb6107f2  68cad69a-d23fb18e-73e5-5bb7fb6107f2     true  VMFSOS   128580583424   125363552256
/vmfs/volumes/fa8a25f7-ba40ebee-45ac-f419c9f388e0  BOOTBANK1                                   fa8a25f7-ba40ebee-45ac-f419c9f388e0     true  vfat       4293591040     4022075392
/vmfs/volumes/f43b0450-7e4d6762-c6be-52e6552cc1f8  BOOTBANK2                                   f43b0450-7e4d6762-c6be-52e6552cc1f8     true  vfat       4293591040     4021354496
[root@esxi:~]

Nothing in particular has changed.

Checking lockmode again, the new volume shows up there as well.

[root@esxi:~] esxcli storage vmfs lockmode list
Volume Name                                 UUID                                 Type      Locking Mode  ATS Compatible  ATS Upgrade Modes  ATS Incompatibility Reason
------------------------------------------  -----------------------------------  --------  ------------  --------------  -----------------  --------------------------
datastore1                                  68cad69a-e82d8e40-5b65-5bb7fb6107f2  VMFS-6    ATS+SCSI               false  None               Device does not support ATS
snap-444b0642-nvme2tb                       68e5b682-56352c06-7c60-04ab182311d3  VMFS-6    ATS+SCSI               false  None               Device does not support ATS
OSDATA-68cad69a-d23fb18e-73e5-5bb7fb6107f2  68cad69a-d23fb18e-73e5-5bb7fb6107f2  Non-VMFS  ATS+SCSI               false  None               Device does not support ATS
[root@esxi:~]
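
The "Device does not support ATS" reason appears consistent with the controller identify output from earlier: on NVMe, the ATS locking primitive is reportedly implemented with the fused Compare + Write operation, and this controller reports "Fused Operation Support: false". A small sketch to pull just those bits out again (adapter name vmhba1 from the earlier listing):

# check the capability bits behind the "Device does not support ATS" reason
esxcli nvme device get -A vmhba1 | grep -E 'Fused Operation|Compare Command'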

Rebooted ESXi for good measure

Even after the reboot the volume was recognized the same way, so I decided to rename the snap-prefixed datastore to a normal name and keep using it.
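
For the record, the rename itself can be done from the Host Client UI or from the shell. A hedged sketch of the shell variant (the vim-cmd rename syntax is my assumption and may differ by ESXi build; names are the ones from this article):

# rename the resignatured datastore back to a plain label
vim-cmd hostsvc/datastore/rename snap-444b0642-nvme2tb nvme2tb
# confirm the new label is in place
esxcli storage filesystem list | grep nvme2tb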