OSDs keep dropping in my test Ceph environment, so I changed osd heartbeat grace (still watching how it goes)


To check how Proxmox VE's Ceph storage behaves, I'm testing with four VMs (16 GB RAM each) built on ESXi (plus one more VM running a corosync qnetd server, used to maintain Proxmox VE cluster quorum).

At some point, OSDs on the various nodes started going down frequently.

root@pve37:~# ceph osd df tree
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-1         0.62549         -  480 GiB  210 GiB  204 GiB  178 KiB  5.4 GiB  270 GiB  43.65  1.00    -          root default
-3         0.15637         -  160 GiB   59 GiB   58 GiB   47 KiB  1.4 GiB  101 GiB  36.92  0.85    -              host pve36
 0    hdd  0.03909   1.00000   40 GiB   15 GiB   14 GiB    7 KiB  403 MiB   25 GiB  36.27  0.83   36      up          osd.0
 1    hdd  0.03909   1.00000   40 GiB   18 GiB   17 GiB   13 KiB  332 MiB   22 GiB  44.03  1.01   52      up          osd.1
 2    hdd  0.03909   1.00000   40 GiB   11 GiB   10 GiB   18 KiB  337 MiB   29 GiB  26.46  0.61   27      up          osd.2
 3    hdd  0.03909   1.00000   40 GiB   16 GiB   16 GiB    9 KiB  393 MiB   24 GiB  40.91  0.94   48      up          osd.3
-5         0.15637         -  160 GiB   67 GiB   66 GiB   75 KiB  1.6 GiB   93 GiB  41.95  0.96    -              host pve37
 4    hdd  0.03909   1.00000   40 GiB   19 GiB   18 GiB   24 KiB  443 MiB   21 GiB  46.87  1.07   51      up          osd.4
 5    hdd  0.03909   1.00000   40 GiB   11 GiB   11 GiB   21 KiB  201 MiB   29 GiB  28.58  0.65   30      up          osd.5
 6    hdd  0.03909   1.00000   40 GiB   16 GiB   16 GiB   12 KiB  294 MiB   24 GiB  39.51  0.91   40      up          osd.6
 7    hdd  0.03909   1.00000   40 GiB   21 GiB   20 GiB   18 KiB  693 MiB   19 GiB  52.84  1.21   61      up          osd.7
-7         0.15637         -   80 GiB   49 GiB   47 GiB   36 KiB  1.3 GiB   31 GiB  60.91  1.40    -              host pve38
 8    hdd  0.03909         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down          osd.8
 9    hdd  0.03909         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down          osd.9
10    hdd  0.03909   1.00000   40 GiB   20 GiB   20 GiB   17 KiB  415 MiB   20 GiB  51.02  1.17   53      up          osd.10
11    hdd  0.03909   1.00000   40 GiB   28 GiB   27 GiB   19 KiB  922 MiB   12 GiB  70.80  1.62   73      up          osd.11
-9         0.15637         -   80 GiB   35 GiB   34 GiB   20 KiB  1.1 GiB   45 GiB  43.27  0.99    -              host pve39
12    hdd  0.03909   1.00000   40 GiB   20 GiB   20 GiB    7 KiB  824 MiB   20 GiB  50.81  1.16   63      up          osd.12
13    hdd  0.03909         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down          osd.13
14    hdd  0.03909   1.00000   40 GiB   14 GiB   14 GiB   13 KiB  303 MiB   26 GiB  35.72  0.82    0    down          osd.14
15    hdd  0.03909         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down          osd.15
                       TOTAL  480 GiB  210 GiB  204 GiB  183 KiB  5.4 GiB  270 GiB  43.65
MIN/MAX VAR: 0.61/1.62  STDDEV: 11.55
root@pve37:~#

osd.8 and osd.9 on pve38 are down. Logging in to pve38 and checking processes shows the ceph-osd services for --id 8 and --id 9 are not running, so restart them:

root@pve38:~# ps -ef|grep osd
ceph        1676       1  1 12:14 ?        00:02:01 /usr/bin/ceph-osd -f --cluster ceph --id 10 --setuser ceph --setgroup ceph
ceph        1681       1  2 12:14 ?        00:02:45 /usr/bin/ceph-osd -f --cluster ceph --id 11 --setuser ceph --setgroup ceph
root       30916   30893  0 14:10 pts/0    00:00:00 grep osd
root@pve38:~# systemctl restart ceph-osd@8
root@pve38:~# systemctl restart ceph-osd@9
root@pve38:~#

After waiting a while, they come back up.

root@pve38:~# ceph osd df tree
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-1         0.62549         -  600 GiB  227 GiB  221 GiB  229 KiB  5.6 GiB  373 GiB  37.84  1.00    -          root default
-3         0.15637         -  160 GiB   53 GiB   52 GiB   47 KiB  1.4 GiB  107 GiB  33.11  0.88    -              host pve36
 0    hdd  0.03909   1.00000   40 GiB   13 GiB   12 GiB    7 KiB  403 MiB   27 GiB  31.50  0.83   28      up          osd.0
 1    hdd  0.03909   1.00000   40 GiB   16 GiB   16 GiB   13 KiB  332 MiB   24 GiB  40.30  1.07   47      up          osd.1
 2    hdd  0.03909   1.00000   40 GiB  9.7 GiB  9.4 GiB   18 KiB  337 MiB   30 GiB  24.21  0.64   22      up          osd.2
 3    hdd  0.03909   1.00000   40 GiB   15 GiB   14 GiB    9 KiB  393 MiB   25 GiB  36.41  0.96   41      up          osd.3
-5         0.15637         -  160 GiB   61 GiB   59 GiB   75 KiB  1.6 GiB   99 GiB  37.89  1.00    -              host pve37
 4    hdd  0.03909   1.00000   40 GiB   16 GiB   15 GiB   24 KiB  443 MiB   24 GiB  39.75  1.05   41      up          osd.4
 5    hdd  0.03909   1.00000   40 GiB   10 GiB   10 GiB   21 KiB  201 MiB   30 GiB  25.52  0.67   26      up          osd.5
 6    hdd  0.03909   1.00000   40 GiB   14 GiB   13 GiB   12 KiB  278 MiB   26 GiB  34.26  0.91   32      up          osd.6
 7    hdd  0.03909   1.00000   40 GiB   21 GiB   20 GiB   18 KiB  693 MiB   19 GiB  52.02  1.37   52      up          osd.7
-7         0.15637         -  120 GiB   57 GiB   55 GiB   54 KiB  1.5 GiB   63 GiB  47.17  1.25    -              host pve38
 8    hdd  0.03909   1.00000   40 GiB   14 GiB   14 GiB   18 KiB  132 MiB   26 GiB  35.75  0.94   30      up          osd.8
 9    hdd  0.03909   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0   22      up          osd.9
10    hdd  0.03909   1.00000   40 GiB   18 GiB   18 GiB   17 KiB  419 MiB   22 GiB  45.92  1.21   42      up          osd.10
11    hdd  0.03909   1.00000   40 GiB   24 GiB   23 GiB   19 KiB  939 MiB   16 GiB  59.84  1.58   42      up          osd.11
-9         0.15637         -  160 GiB   57 GiB   56 GiB   53 KiB  1.2 GiB  103 GiB  35.51  0.94    -              host pve39
12    hdd  0.03909   1.00000   40 GiB   15 GiB   14 GiB    7 KiB  841 MiB   25 GiB  37.05  0.98   37      up          osd.12
13    hdd  0.03909   1.00000   40 GiB   16 GiB   16 GiB   16 KiB  144 MiB   24 GiB  39.70  1.05   42      up          osd.13
14    hdd  0.03909   1.00000   40 GiB   14 GiB   14 GiB   16 KiB   84 MiB   26 GiB  35.82  0.95   39      up          osd.14
15    hdd  0.03909   1.00000   40 GiB   12 GiB   12 GiB   14 KiB  127 MiB   28 GiB  29.48  0.78   30      up          osd.15
                       TOTAL  600 GiB  227 GiB  221 GiB  236 KiB  5.6 GiB  373 GiB  37.84
MIN/MAX VAR: 0/1.58  STDDEV: 12.91
root@pve38:~#

But... after a while, other OSDs would go down again, and so on.

Checking "Chapter 5. Troubleshooting Ceph OSDs", section 5.1.7 "Flapping OSDs", in the Red Hat Ceph Storage 7 Troubleshooting Guide, the cause appears to be that the disks backing the OSDs are slow.

I figured that raising the osd_heartbeat_grace_time parameter from its default of 20 seconds might relax the timeout, but how to set it was unclear...

Looking at OSD Setting on ceph.org, it seems you add it to /etc/ceph/ceph.conf (on PVE, /etc/pve/ceph.conf ), but neither the OSD Config Reference nor Configuring Monitor/OSD Interaction lists a parameter named osd_heartbeat_grace_time... (osd_heartbeat_grace does exist, though.)

Reading on in the Red Hat document, under "To resolve this issue:", it says to set "ceph osd set noup" and "ceph osd set nodown" to stop OSDs from being marked down and up.

I tried setting both noup and nodown, but then even after starting an OSD service, ceph osd df tree kept showing it as down.

Well, of course: noup means an OSD won't be marked up even when it comes up...

So instead I went with "ceph osd unset noup" and "ceph osd set nodown", i.e., only keep OSDs from being marked down.
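In other words:

ceph osd unset noup
ceph osd set nodown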

With that set, "ceph osd stat" now shows "flags nodown":

root@pve38:~# ceph osd stat
16 osds: 16 up (since 62m), 16 in (since 62m); epoch: e4996
flags nodown
root@pve38:~#

That gave me a temporary band-aid, at least.

However, it also means an OSD won't be marked down even if the disk behind it actually fails.

So running permanently with the "nodown" flag set is quite inappropriate.

For a proper fix, run "ceph health detail" to see what exactly is wrong, specifically how slow the Slow OSD heartbeats are:

root@pve38:~# ceph health detail
HEALTH_WARN nodown flag(s) set; Slow OSD heartbeats on back (longest 5166.450ms); Slow OSD heartbeats on front (longest 5467.151ms)
[WRN] OSDMAP_FLAGS: nodown flag(s) set
[WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 5166.450ms)
    Slow OSD heartbeats on back from osd.13 [] to osd.8 [] 5166.450 msec
    Slow OSD heartbeats on back from osd.13 [] to osd.0 [] 3898.044 msec
    Slow OSD heartbeats on back from osd.12 [] to osd.9 [] 3268.881 msec
    Slow OSD heartbeats on back from osd.10 [] to osd.3 [] 2610.064 msec possibly improving
    Slow OSD heartbeats on back from osd.12 [] to osd.8 [] 2588.321 msec
    Slow OSD heartbeats on back from osd.6 [] to osd.14 [] 2565.141 msec
    Slow OSD heartbeats on back from osd.8 [] to osd.7 [] 2385.851 msec possibly improving
    Slow OSD heartbeats on back from osd.13 [] to osd.11 [] 2324.505 msec
    Slow OSD heartbeats on back from osd.8 [] to osd.12 [] 2305.474 msec possibly improving
    Slow OSD heartbeats on back from osd.14 [] to osd.11 [] 2275.033 msec
    Truncated long network list.  Use ceph daemon mgr.# dump_osd_network for more information
[WRN] OSD_SLOW_PING_TIME_FRONT: Slow OSD heartbeats on front (longest 5467.151ms)
    Slow OSD heartbeats on front from osd.13 [] to osd.8 [] 5467.151 msec
    Slow OSD heartbeats on front from osd.13 [] to osd.0 [] 3956.364 msec
    Slow OSD heartbeats on front from osd.12 [] to osd.9 [] 3513.493 msec
    Slow OSD heartbeats on front from osd.12 [] to osd.8 [] 2657.999 msec
    Slow OSD heartbeats on front from osd.6 [] to osd.14 [] 2657.486 msec
    Slow OSD heartbeats on front from osd.10 [] to osd.3 [] 2610.558 msec possibly improving
    Slow OSD heartbeats on front from osd.8 [] to osd.7 [] 2436.661 msec possibly improving
    Slow OSD heartbeats on front from osd.14 [] to osd.11 [] 2351.914 msec
    Slow OSD heartbeats on front from osd.14 [] to osd.10 [] 2351.812 msec
    Slow OSD heartbeats on front from osd.13 [] to osd.11 [] 2335.698 msec
    Truncated long network list.  Use ceph daemon mgr.# dump_osd_network for more information
root@pve38:~#
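The truncation notice at the end points at dumping the full table from the active mgr's admin socket. On this cluster the active mgr happened to be pve38, so presumably something like this, run on the node hosting the mgr (the mgr name is my assumption):

ceph daemon mgr.pve38 dump_osd_network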

Log in to pve37, where osd.7's log lives, and grep /var/log/ceph/ceph-osd.7.log for "no reply from" and "osd.8".

Presumably the part corresponding to "Slow OSD heartbeats on front from osd.8 [] to osd.7 [] 2436.661 msec" is this:
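Something along these lines (the exact grep invocation is mine, not from any doc):

grep 'no reply from' /var/log/ceph/ceph-osd.7.log | grep 'osd.8'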


2024-11-14T14:46:05.457+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:02.037605+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:06.454+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:02.037605+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:07.467+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:07.338127+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:08.418+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:07.338127+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:09.371+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:09.038264+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:10.416+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:09.038264+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:11.408+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:11.338592+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)

The gap between the time in oldest deadline and the timestamp before it is 20 seconds, so presumably the default value of 20 for osd_heartbeat_grace (or osd_heartbeat_grace_time) is what's in effect.

I looked around for documentation on how to configure it, but there's surprisingly little.

Ceph Block Device » 3rd Party Integration » Ceph iSCSI Gateway » iSCSI Gateway Requirements has a configuration example like this:

[osd]
osd heartbeat grace = 20
osd heartbeat interval = 5

It also appears the value can be set on individual OSDs like this:

ceph tell osd.* config set osd_heartbeat_grace 20
ceph tell osd.* config set osd_heartbeat_interval 5
ceph daemon osd.0 config set osd_heartbeat_grace 20
ceph daemon osd.0 config set osd_heartbeat_interval 5

Checking the ceph tell syntax, the current value can be read with "ceph tell osd.* config get osd_heartbeat_grace":

root@pve37:~# ceph tell osd.* config get osd_heartbeat_grace
osd.0: {
    "osd_heartbeat_grace": "20"
}
osd.1: {
    "osd_heartbeat_grace": "20"
}
osd.2: {
    "osd_heartbeat_grace": "20"
}
osd.3: {
    "osd_heartbeat_grace": "20"
}
osd.4: {
    "osd_heartbeat_grace": "20"
}
osd.5: {
    "osd_heartbeat_grace": "20"
}
osd.6: {
    "osd_heartbeat_grace": "20"
}
osd.7: {
    "osd_heartbeat_grace": "20"
}
osd.8: {
    "osd_heartbeat_grace": "20"
}
osd.9: {
    "osd_heartbeat_grace": "20"
}
osd.10: {
    "osd_heartbeat_grace": "20"
}
osd.11: {
    "osd_heartbeat_grace": "20"
}
osd.12: {
    "osd_heartbeat_grace": "20"
}
osd.13: {
    "osd_heartbeat_grace": "20"
}
osd.14: {
    "osd_heartbeat_grace": "20"
}
osd.15: {
    "osd_heartbeat_grace": "20"
}
root@pve37:~#

For a start, I ran "ceph tell osd.* config set osd_heartbeat_grace 30" to set it to 30:

root@pve37:~# ceph tell osd.* config set osd_heartbeat_grace 30
osd.0: {
    "success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.1: {
    "success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.2: {
    "success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.3: {
    "success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.4: {
    "success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.5: {
    "success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.6: {
    "success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.7: {
    "success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.8: {
    "success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.9: {
    "success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.10: {
    "success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.11: {
    "success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.12: {
    "success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.13: {
    "success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.14: {
    "success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.15: {
    "success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
root@pve37:~#

Everything says "success", so I assume the change went through, but why are there two kinds of responses?

Confirm the setting was changed:

root@pve37:~# ceph tell osd.* config get osd_heartbeat_grace
osd.0: {
    "osd_heartbeat_grace": "30"
}
osd.1: {
    "osd_heartbeat_grace": "30"
}
osd.2: {
    "osd_heartbeat_grace": "30"
}
osd.3: {
    "osd_heartbeat_grace": "30"
}
osd.4: {
    "osd_heartbeat_grace": "30"
}
osd.5: {
    "osd_heartbeat_grace": "30"
}
osd.6: {
    "osd_heartbeat_grace": "30"
}
osd.7: {
    "osd_heartbeat_grace": "30"
}
osd.8: {
    "osd_heartbeat_grace": "30"
}
osd.9: {
    "osd_heartbeat_grace": "30"
}
osd.10: {
    "osd_heartbeat_grace": "30"
}
osd.11: {
    "osd_heartbeat_grace": "30"
}
osd.12: {
    "osd_heartbeat_grace": "30"
}
osd.13: {
    "osd_heartbeat_grace": "30"
}
osd.14: {
    "osd_heartbeat_grace": "30"
}
osd.15: {
    "osd_heartbeat_grace": "30"
}
root@pve37:~#

Still, as the set output said "(not observed, change may require restart)", a ceph-osd restart appears to be required.

The changed parameter doesn't get written back to /etc/pve/ceph.conf, and indeed, after rebooting the server hosting osd.4 through osd.7 and checking the values again, those four had reverted to 20:

root@pve38:~# ceph tell osd.* config get osd_heartbeat_grace
osd.0: {
    "osd_heartbeat_grace": "30"
}
osd.1: {
    "osd_heartbeat_grace": "30"
}
osd.2: {
    "osd_heartbeat_grace": "30"
}
osd.3: {
    "osd_heartbeat_grace": "30"
}
osd.4: {
    "osd_heartbeat_grace": "20"
}
osd.5: {
    "osd_heartbeat_grace": "20"
}
osd.6: {
    "osd_heartbeat_grace": "20"
}
osd.7: {
    "osd_heartbeat_grace": "20"
}
osd.8: {
    "osd_heartbeat_grace": "30"
}
osd.9: {
    "osd_heartbeat_grace": "30"
}
osd.10: {
    "osd_heartbeat_grace": "30"
}
osd.11: {
    "osd_heartbeat_grace": "30"
}
osd.12: {
    "osd_heartbeat_grace": "30"
}
osd.13: {
    "osd_heartbeat_grace": "30"
}
osd.14: {
    "osd_heartbeat_grace": "30"
}
osd.15: {
    "osd_heartbeat_grace": "30"
}
root@pve38:~#

So append the following at the end of /etc/pve/ceph.conf:

[osd]
        osd heartbeat grace = 30

After adding that and rebooting, I confirmed the value is 30 as expected. Then again, isn't osd_heartbeat_grace supposed to be one of those settings where the ceph tell change takes effect without a restart in the first place?

root@pve38:~# ceph tell osd.* config get osd_heartbeat_grace
osd.0: {
    "osd_heartbeat_grace": "30"
}
osd.1: {
    "osd_heartbeat_grace": "30"
}
osd.2: {
    "osd_heartbeat_grace": "30"
}
osd.3: {
    "osd_heartbeat_grace": "30"
}
osd.4: {
    "osd_heartbeat_grace": "30"
}
osd.5: {
    "osd_heartbeat_grace": "30"
}
osd.6: {
    "osd_heartbeat_grace": "30"
}
osd.7: {
    "osd_heartbeat_grace": "30"
}
osd.8: {
    "osd_heartbeat_grace": "30"
}
osd.9: {
    "osd_heartbeat_grace": "30"
}
osd.10: {
    "osd_heartbeat_grace": "30"
}
osd.11: {
    "osd_heartbeat_grace": "30"
}
osd.12: {
    "osd_heartbeat_grace": "30"
}
osd.13: {
    "osd_heartbeat_grace": "30"
}
osd.14: {
    "osd_heartbeat_grace": "30"
}
osd.15: {
    "osd_heartbeat_grace": "30"
}
root@pve38:~#
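As an aside, newer Ceph releases also have a centralized configuration database (ceph config), so the value could presumably be persisted without touching ceph.conf at all; untested here:

ceph config set osd osd_heartbeat_grace 30
ceph config get osd.0 osd_heartbeat_grace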

Notes on shutting down a Proxmox VE cluster from a UPS (research stage)


I'm testing an environment where Proxmox VE provides a Ceph-backed storage area, so that multiple servers share a single filesystem without physically shared disks.

I'm working out what should happen when a UPS triggers a shutdown of the Proxmox VE cluster, but the official documentation has nothing directly usable, so I'm collecting information...

Proxmox VE official documentation: Shutdown Proxmox VE + Ceph HCI cluster
This covers only the Ceph side; the VMs have to be stopped before stopping Ceph, but it doesn't touch on that at all.

Searching the Proxmox VE forums:
Clean shutdown of whole cluster (2023/01/16)
Shutdown of the Hyper-Converged Cluster (CEPH) (2020/04/05)

The scripts in those threads look usable, but to stop the VMs/containers, rather than "run pvenode stopall on each node", the recommended approach appears to be "stop everything via the API".

Testing starts here.

I built a node list with "pvesh get /nodes" and tried "ssh <hostname> pvenode stopall" to stop the VMs; the management web UI reported "Bulk shutdown of VMs and Containers", but the VMs never actually stopped.

According to 3.11.3. Bulk Guest Power Management they ought to stop, though...

Looking into the pvesh command, it can also stop them, so I ran the following:

for hostname in `pvesh ls /nodes/|awk '{ print $2 }'`; do for vmid in `pvesh ls /nodes/$hostname/qemu/|awk '{ print $2 }'`; do pvesh create /nodes/$hostname/qemu/$vmid/status/shutdown; done; done

Example run:

root@pve36:~# for hostname in `pvesh ls /nodes/|awk '{ print $2 }'`
> do for vmid in `pvesh ls /nodes/$hostname/qemu/|awk '{ print $2 }'`
>   do
>     pvesh create /nodes/$hostname/qemu/$vmid/status/shutdown
>   done
> done
VM 102 not running

Requesting HA stop for VM 102
UPID:pve36:00014A47:0018FC0A:6731A591:hastop:102:root@pam:
VM 100 not running
Requesting HA stop for VM 100"UPID:pve37:00013A71:0018FDDC:6731A597:hastop:100:root@pam:"
VM 101 not running
Requesting HA stop for VM 101"UPID:pve38:000144BF:00192F03:6731A59D:hastop:101:root@pam:"
Requesting HA stop for VM 103"UPID:pve38:000144DC:0019305D:6731A5A0:hastop:103:root@pam:"
root@pve36:~#

Confirmed that this stops them.

Next, verify that the VMs have stopped:

root@pve36:~# pvesh get /nodes/pve36/qemu
┌─────────┬──────┬──────┬──────┬───────────┬──────────┬─────────┬─────┬───────────┬─────────────────┬──────────────┬──────┬────────┐
│ status  │ vmid │ cpus │ lock │   maxdisk │   maxmem │ name    │ pid │ qmpstatus │ running-machine │ running-qemu │ tags │ uptime │
├─────────┼──────┼──────┼──────┼───────────┼──────────┼─────────┼─────┼───────────┼─────────────────┼──────────────┼──────┼────────┤
│ stopped │  102 │    2 │      │ 32.00 GiB │ 2.00 GiB │ testvm2 │     │           │                 │              │      │     0s │
└─────────┴──────┴──────┴──────┴───────────┴──────────┴─────────┴─────┴───────────┴─────────────────┴──────────────┴──────┴────────┘
root@pve36:~#

Output without the header:

root@pve36:~# pvesh get /nodes/pve36/qemu --noborder --noheader
stopped  102    2      32.00 GiB 2.00 GiB testvm2                                                 0s
root@pve36:~#

Any line that isn't "stopped" means something is still running, so we can decide based on whether 'pvesh get /nodes/pve36/qemu --noborder --noheader | grep -v "stopped"' produces any output.

Also, all of the above covers only QEMU VMs, not LXC containers, so those need handling too.

Stopping:

for hostname in `pvesh ls /nodes/|awk '{ print $2 }'`; do for vmid in `pvesh ls /nodes/$hostname/lxc/|awk '{ print $2 }'`; do pvesh create /nodes/$hostname/lxc/$vmid/status/shutdown;done;done

To decide the VMs have stopped by checking that every VM's state is "stopped":

for hostname in `pvesh ls /nodes/|awk '{ print $2 }'`; do  echo "=== $hostname ===";    flag=0;  while [ $flag -eq 0 ];  do    pvesh get /nodes/$hostname/qemu --noborder --noheader|grep -v "stopped" > /dev/null;    flag=$?;    echo $flag;  done; done

Or by checking that no VM reports "running":

for hostname in `pvesh ls /nodes/|awk '{ print $2 }'`; do  echo "=== $hostname ===";  flag=0;  while [ $flag -eq 0 ];  do    pvesh get /nodes/$hostname/qemu --noborder --noheader|grep "running" > /dev/null;    flag=$?;    echo $flag;  done; done

I'm torn on which is better.
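For reference, a sketch combining the wait loop for both QEMU VMs and LXC containers (my own composite, assuming pvesh output stays in this format):

for hostname in `pvesh ls /nodes/|awk '{ print $2 }'`; do
  for type in qemu lxc; do
    while pvesh get /nodes/$hostname/$type --noborder --noheader 2>/dev/null | grep -q "running"; do
      sleep 5
    done
  done
done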


For stopping Ceph, I consulted Red Hat's "2.10. Powering down and rebooting a Red Hat Ceph Storage cluster" and Proxmox's "Shutdown Proxmox VE + Ceph HCI cluster".

On the Proxmox VE side it's only this:

ceph osd set noout
ceph osd set norecover
ceph osd set norebalance
ceph osd set nobackfill
ceph osd set nodown
ceph osd set pause

The Red Hat document additionally has a procedure for stopping CephFS before running those:

ceph fs set FS_NAME max_mds 1
ceph mds deactivate FS_NAME:1 # rank 2 of 2
ceph status # wait for rank 1 to finish stopping
ceph fs set FS_NAME cluster_down true
ceph mds fail FS_NAME:0

Check the current values of max_mds and cluster_down, which ceph fs set is manipulating there:

root@pve36:~# ceph fs get cephfs
Filesystem 'cephfs' (1)
fs_name cephfs
epoch   65
flags   12 joinable allow_snaps allow_multimds_snaps
created 2024-11-05T14:29:45.941671+0900
modified        2024-11-11T11:04:06.223151+0900
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
max_xattr_size  65536
required_client_features        {}
last_failure    0
last_failure_osd_epoch  3508
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in      0
up      {0=784113}
failed
damaged
stopped
data_pools      [3]
metadata_pool   4
inline_data     disabled
balancer
bal_rank_mask   -1
standby_count_wanted    1
[mds.pve36{0:784113} state up:active seq 29 addr [v2:172.17.44.36:6800/1472122357,v1:172.17.44.36:6801/1472122357] compat {c=[1],r=[1],i=[7ff]}]
root@pve36:~#

No cluster_down anywhere?

root@pve36:~# ceph mds stat
cephfs:1 {0=pve36=up:active} 1 up:standby
root@pve36:~# ceph mds compat show
compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
root@pve36:~#

Hmm? Let's try setting cluster_down via ceph fs:

root@pve36:~# ceph fs set cephfs cluster_down true
cephfs marked not joinable; MDS cannot join as newly active. WARNING: cluster_down flag is deprecated and will be removed in a future version. Please use "joinable".
root@pve36:~#

It says to use joinable, so that notation is apparently outdated.

Huh? Searching again, the Red Hat document I had was simply old. See instead Red Hat's "2.5. Powering down and rebooting a Red Hat Ceph Storage cluster", or IBM's "Shutting down a Ceph File System cluster":

ceph fs set FS_NAME max_mds 1
ceph fs fail FS_NAME
ceph status
ceph fs set FS_NAME joinable false

The IBM procedure does this without touching max_mds:

root@pve36:~# ceph fs status
cephfs - 4 clients
======
RANK  STATE    MDS      ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  pve36  Reqs:    0 /s    21     20     16     23
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   244M   122G
  cephfs_data      data    31.8G   122G
STANDBY MDS
   pve37
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
root@pve36:~# ceph fs fail cephfs
cephfs marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
root@pve36:~# ceph fs status
cephfs - 0 clients
======
RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
 0    failed
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   244M   122G
  cephfs_data      data    31.8G   122G
STANDBY MDS
   pve37
   pve36
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
root@pve36:~# ceph fs fail cephfs
cephfs marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
root@pve36:~# echo $?
0
root@pve36:~#  ceph fs set cephfs joinable false
cephfs marked not joinable; MDS cannot join as newly active.
root@pve36:~# echo $?
0
root@pve36:~# ceph fs status
cephfs - 0 clients
======
RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
 0    failed
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   244M   122G
  cephfs_data      data    31.8G   122G
STANDBY MDS
   pve37
   pve36
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
root@pve36:~#

...Careful: once the filesystem is stopped, the df command no longer completes.

root@pve36:~# ceph osd stat
16 osds: 16 up (since 6h), 16 in (since 3d); epoch: e3599
root@pve36:~# ceph osd set noout
noout is set
root@pve36:~# ceph osd set norecover
norecover is set
root@pve36:~# ceph osd set norebalance
norebalance is set
root@pve36:~# ceph osd set nobackfill
nobackfill is set
root@pve36:~# ceph osd set nodown
nodown is set
root@pve36:~# ceph osd set pause
pauserd,pausewr is set
root@pve36:~# ceph osd stat
16 osds: 16 up (since 6h), 16 in (since 3d); epoch: e3605
flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
root@pve36:~#

The documentation warns, "Important: the above example is only for stopping the service and each OSD on an OSD node, and must be repeated on each OSD node." But checking each server, running these on every server doesn't actually appear necessary:

root@pve36:~# ceph osd stat
16 osds: 16 up (since 6h), 16 in (since 3d); epoch: e3605
flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
root@pve36:~# ssh pve37 ceph osd stat
16 osds: 16 up (since 6h), 16 in (since 3d); epoch: e3605
flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
root@pve36:~# ssh pve38 ceph osd stat
16 osds: 16 up (since 6h), 16 in (since 3d); epoch: e3605
flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
root@pve36:~# ssh pve39 ceph osd stat
16 osds: 16 up (since 6h), 16 in (since 3d); epoch: e3605
flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
root@pve36:~#
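Since the flags turn out to be cluster-wide, the whole pre-shutdown sequence can be wrapped in a loop and run once from any node, and likewise unset after boot (a sketch):

for flag in noout norecover norebalance nobackfill nodown pause; do ceph osd set $flag; done
for flag in noout norecover norebalance nobackfill nodown pause; do ceph osd unset $flag; done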

After this, I ran shutdown -h now on each server to power them down.

After starting back up:

root@pve36:~# ceph status
  cluster:
    id:     4647497d-17da-46f4-8e7b-231365d96e42
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set

  services:
    mon: 3 daemons, quorum pve36,pve37,pve38 (age 41s)
    mgr: pve38(active, since 30s), standbys: pve37, pve36
    mds: 0/1 daemons up (1 failed), 2 standby
    osd: 16 osds: 16 up (since 48s), 16 in (since 3d)
         flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover

  data:
    volumes: 0/1 healthy, 1 failed
    pools:   4 pools, 193 pgs
    objects: 17.68k objects, 69 GiB
    usage:   206 GiB used, 434 GiB / 640 GiB avail
    pgs:     193 active+clean

root@pve36:~# ceph osd stat
16 osds: 16 up (since 69s), 16 in (since 3d); epoch: e3621
flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
root@pve36:~#

root@pve36:~# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
root@pve36:~# ceph fs status
cephfs - 0 clients
======
RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
 0    failed
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   244M   125G
  cephfs_data      data    31.8G   125G
STANDBY MDS
   pve37
   pve36
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
root@pve36:~#

Recovery commands, part 1:

root@pve36:~# ceph osd unset noout
noout is unset
root@pve36:~# ceph osd unset norecover
norecover is unset
root@pve36:~# ceph osd unset norebalance
norebalance is unset
root@pve36:~#  ceph osd unset nobackfill
nobackfill is unset
root@pve36:~# ceph osd unset nodown
nodown is unset
root@pve36:~# ceph osd unset pause
pauserd,pausewr is unset
root@pve36:~# ceph osd stat
16 osds: 16 up (since 100s), 16 in (since 3d); epoch: e3627
root@pve36:~# ceph status
  cluster:
    id:     4647497d-17da-46f4-8e7b-231365d96e42
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline

  services:
    mon: 3 daemons, quorum pve36,pve37,pve38 (age 102s)
    mgr: pve38(active, since 90s), standbys: pve37, pve36
    mds: 0/1 daemons up (1 failed), 2 standby
    osd: 16 osds: 16 up (since 108s), 16 in (since 3d)

  data:
    volumes: 0/1 healthy, 1 failed
    pools:   4 pools, 193 pgs
    objects: 17.68k objects, 69 GiB
    usage:   206 GiB used, 434 GiB / 640 GiB avail
    pgs:     193 active+clean

  io:
    client:   21 KiB/s rd, 0 B/s wr, 9 op/s rd, 1 op/s wr

root@pve36:~#

Resuming the filesystem:

root@pve36:~# ceph fs set cephfs joinable true
cephfs marked joinable; MDS may join as newly active.
root@pve36:~# ceph fs status
cephfs - 4 clients
======
RANK    STATE     MDS   ACTIVITY   DNS    INOS   DIRS   CAPS
 0    reconnect  pve36              10     10      6      0
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   244M   125G
  cephfs_data      data    31.8G   125G
STANDBY MDS
   pve37
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
root@pve36:~# ceph osd stat
16 osds: 16 up (since 2m), 16 in (since 3d); epoch: e3627
root@pve36:~# cpeh status
-bash: cpeh: command not found
root@pve36:~# ceph status
  cluster:
    id:     4647497d-17da-46f4-8e7b-231365d96e42
    health: HEALTH_WARN
            1 filesystem is degraded

  services:
    mon: 3 daemons, quorum pve36,pve37,pve38 (age 2m)
    mgr: pve38(active, since 2m), standbys: pve37, pve36
    mds: 1/1 daemons up, 1 standby
    osd: 16 osds: 16 up (since 2m), 16 in (since 3d)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   4 pools, 193 pgs
    objects: 17.68k objects, 69 GiB
    usage:   206 GiB used, 434 GiB / 640 GiB avail
    pgs:     193 active+clean

root@pve36:~#
root@pve36:~# ceph fs status
cephfs - 4 clients
======
RANK  STATE    MDS   ACTIVITY   DNS    INOS   DIRS   CAPS
 0    rejoin  pve36              10     10      6      0
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   244M   125G
  cephfs_data      data    31.8G   125G
STANDBY MDS
   pve37
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
root@pve36:~# df
Filesystem           1K-blocks     Used Available Use% Mounted on
udev                   8156156        0   8156156   0% /dev
tmpfs                  1638000     1124   1636876   1% /run
/dev/mapper/pve-root  28074060 14841988  11780656  56% /
tmpfs                  8189984    73728   8116256   1% /dev/shm
tmpfs                     5120        0      5120   0% /run/lock
/dev/fuse               131072       36    131036   1% /etc/pve
tmpfs                  8189984       28   8189956   1% /var/lib/ceph/osd/ceph-2
tmpfs                  8189984       28   8189956   1% /var/lib/ceph/osd/ceph-0
tmpfs                  8189984       28   8189956   1% /var/lib/ceph/osd/ceph-1
tmpfs                  8189984       28   8189956   1% /var/lib/ceph/osd/ceph-3
tmpfs                  1637996        0   1637996   0% /run/user/0
root@pve36:~# ceph fs status
cephfs - 0 clients
======
RANK  STATE    MDS      ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  pve36  Reqs:    0 /s    10     10      6      0
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   244M   125G
  cephfs_data      data    31.8G   125G
STANDBY MDS
   pve37
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
root@pve36:~# df
Filesystem                               1K-blocks     Used Available Use% Mounted on
udev                                       8156156        0   8156156   0% /dev
tmpfs                                      1638000     1128   1636872   1% /run
/dev/mapper/pve-root                      28074060 14841992  11780652  56% /
tmpfs                                      8189984    73728   8116256   1% /dev/shm
tmpfs                                         5120        0      5120   0% /run/lock
/dev/fuse                                   131072       36    131036   1% /etc/pve
tmpfs                                      8189984       28   8189956   1% /var/lib/ceph/osd/ceph-2
tmpfs                                      8189984       28   8189956   1% /var/lib/ceph/osd/ceph-0
tmpfs                                      8189984       28   8189956   1% /var/lib/ceph/osd/ceph-1
tmpfs                                      8189984       28   8189956   1% /var/lib/ceph/osd/ceph-3
tmpfs                                      1637996        0   1637996   0% /run/user/0
172.17.44.36,172.17.44.37,172.17.44.38:/ 142516224 11116544 131399680   8% /mnt/pve/cephfs
root@pve36:~#

However, the "bulk start of VMs and containers" that runs during the normal PVE boot process didn't start the VMs this way. Odd. Turns out the settings had changed:

root@pve36:~# ha-manager status
quorum OK
master pve39 (active, Mon Nov 11 18:24:13 2024)
lrm pve36 (idle, Mon Nov 11 18:24:15 2024)
lrm pve37 (idle, Mon Nov 11 18:24:18 2024)
lrm pve38 (idle, Mon Nov 11 18:24:18 2024)
lrm pve39 (idle, Mon Nov 11 18:24:15 2024)
service vm:100 (pve37, stopped)
service vm:101 (pve38, stopped)
service vm:102 (pve36, stopped)
service vm:103 (pve38, stopped)
root@pve36:~# ha-manager config
vm:100
        state stopped

vm:101
        state stopped

vm:102
        state stopped

vm:103
        state stopped

root@pve36:~#
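The earlier HA stop requests apparently recorded "state stopped" as the requested state for each resource, so presumably setting the requested state back restores autostart behavior. ha-manager set is the documented way to change that state, though I haven't tested this here:

ha-manager set vm:100 --state started
ha-manager set vm:101 --state started
ha-manager set vm:102 --state started
ha-manager set vm:103 --state started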

eLxr, a Debian-based commercial distribution


ITmedia ran an article asking whether this could be the real CentOS successor, with Wind River's CTO discussing the aims of their latest Linux distribution.

Expecting a RHEL-based distribution, I read on and found it was a promotional piece for eLxr Pro, a Debian-based commercial distribution from Wind River, which holds a strong share in embedded.

Wondering about the annual license fee, I checked the page, but there's only a "Contact us" link and no price range.

The English eLxr Pro page has a bit more substance, and its FAQ mentions the eLxr project, but no pricing there either, and the Store has no eLxr entries.

Searching for articles, a ZDNet Japan piece on Wind River announcing domestic availability of its commercial enterprise Linux for edge deployments mentions pricing from 70,000 yen per year:

eLxr Server Pro and its services are sold as annual per-server (virtual or physical) subscriptions, with two tiers of support. The standard "Enterprise" tier starts at 70,000 yen per year; "Priority", which adds advanced SLA-based technical support and carrier-grade support on top of Enterprise, starts at 152,000 yen.

Compared with RHEL at roughly 130,000 yen per year via SIOS Technology, that's cheaper, but...


As an aside, when choosing a RHEL-family Linux to replace CentOS, the candidates would be:

Red Hat Enterprise Linux (RHEL)
 The official option, of course, but price is the issue.
Alma Linux / Rocky Linux
 There's currently no decisive reason to pick one over the other, so choose by preference or by where you can buy support. For example:
  CIQ, the company behind Rocky Linux, formed the openELA association together with Oracle and SUSE.
  Alma Linux obtained certification on Azure.
Oracle Linux
 Paid, but quite usable for free as well.
 Given past events around Oracle DB/MySQL/OpenSolaris, I'm a little afraid the free tier could suddenly change.
 If you want to run on Oracle Cloud (OCI), you should use Oracle Linux.

Distributions that come up in CentOS-replacement discussions but don't personally appeal to me:
SUSE Liberty Linux
 A paid service providing patches for CentOS 7 until the end of June 2028, and for CentOS 9 for the time being.
MIRACLE LINUX
 A RHEL-compatible distribution with Japanese support, usable for free; now developed by Cybertrust.
 Also involved in AlmaLinux development, and expected to merge into AlmaLinux for the RHEL 10 generation.


Anyway, back to the topic: the eLxr project side now offers community-edition install images.

So I installed eLxr Server version 12 for x86 (12.6.1.0) in an ESXi environment.

The current eLxr 12.6.1.0 doesn't appear to support Secure Boot; with it enabled, booting fails with an error.

Disable Secure Boot and proceed with the installation.

Hostname settings

Password and user

Partitioning

The final question

Clicking "Continue" above started the installation.

And the install finished.

Reboot

Login

openssh-server is not installed by default, so if you want to log in via ssh, you need to install it with "apt install openssh-server".
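That is, as root (standard Debian procedure):

apt update
apt install openssh-server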

root@elxr:~# uname -a
Linux elxr 6.1.0-23-amd64 #1 SMP PREEMPT_DYNAMIC eLxr Linux 6.1.99-elxr3-1 (2024-09-12) x86_64 GNU/Linux
root@elxr:~#

Partition layout from a default install:

osakanataro@elxr:~$ df -h
ファイルシス   サイズ  使用  残り 使用% マウント位置
udev             2.9G     0  2.9G    0% /dev
tmpfs            593M  712K  592M    1% /run
/dev/sda2         48G  1.8G   44G    4% /
tmpfs            2.9G     0  2.9G    0% /dev/shm
tmpfs            5.0M     0  5.0M    0% /run/lock
/dev/sda1        511M  5.9M  506M    2% /boot/efi
osakanataro@elxr:~$ cat /proc/mounts
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,nosuid,relatime,size=3014364k,nr_inodes=753591,mode=755,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=606692k,mode=755,inode64 0 0
/dev/sda2 / ext4 rw,relatime,errors=remount-ro 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13633 0 0
ramfs /run/credentials/systemd-sysctl.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
tracefs /sys/kernel/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
configfs /sys/kernel/config configfs rw,nosuid,nodev,noexec,relatime 0 0
ramfs /run/credentials/systemd-sysusers.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
ramfs /run/credentials/systemd-tmpfiles-setup-dev.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda1 /boot/efi vfat rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro 0 0
ramfs /run/credentials/systemd-tmpfiles-setup.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
osakanataro@elxr:~$

I created a regular user during installation and assumed sudo would work, but it doesn't...

osakanataro@elxr:~$ sudo -i
[sudo] osakanataro のパスワード:
osakanataro は sudoers ファイルにありません。
osakanataro@elxr:~$

For now, su it is.
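(Since this is standard Debian underneath, adding the user to the sudo group from a root shell should make sudo work; my assumption, not tested here:)

su -
/usr/sbin/usermod -aG sudo osakanataro
# log out and back in for the group change to take effect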

osakanataro@elxr:~$ su -
パスワード:
root@elxr:~# fdisk -l /dev/sda
Disk /dev/sda: 50 GiB, 53687091200 bytes, 104857600 sectors
Disk model: Virtual disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 65F27FE7-BB4C-46DE-A0B7-46C31D3A2F61

Device         Start       End   Sectors  Size Type
/dev/sda1       2048   1050623   1048576  512M EFI System
/dev/sda2    1050624 102856703 101806080 48.5G Linux filesystem
/dev/sda3  102856704 104855551   1998848  976M Linux swap
root@elxr:~#

Checking the repositories:

root@elxr:~# ls -l /etc/apt
total 28
drwxr-xr-x 2 root root 4096 Oct 31 12:34 apt.conf.d
drwxr-xr-x 2 root root 4096 May 25  2023 auth.conf.d
drwxr-xr-x 2 root root 4096 May 25  2023 keyrings
drwxr-xr-x 2 root root 4096 May 25  2023 preferences.d
-rw-r--r-- 1 root root   57 Oct 31 12:32 sources.list
drwxr-xr-x 2 root root 4096 May 25  2023 sources.list.d
-rw-r--r-- 1 root root    0 Oct 31 12:29 sources.list~
drwxr-xr-x 2 root root 4096 Oct 31 12:32 trusted.gpg.d
root@elxr:~# ls -l /etc/apt/*
-rw-r--r-- 1 root root   57 Oct 31 12:32 /etc/apt/sources.list
-rw-r--r-- 1 root root    0 Oct 31 12:29 /etc/apt/sources.list~

/etc/apt/apt.conf.d:
total 16
-rw-r--r-- 1 root root  82 Oct 31 12:29 00CDMountPoint
-rw-r--r-- 1 root root  40 Oct 31 12:29 00trustcdrom
-rw-r--r-- 1 root root 399 May 25  2023 01autoremove
-rw-r--r-- 1 root root 182 Jan  9  2023 70debconf

/etc/apt/auth.conf.d:
total 0

/etc/apt/keyrings:
total 0

/etc/apt/preferences.d:
total 0

/etc/apt/sources.list.d:
total 0

/etc/apt/trusted.gpg.d:
total 88
-rw-r--r-- 1 root root 11861 Jul 31  2023 debian-archive-bookworm-automatic.asc
-rw-r--r-- 1 root root 11873 Jul 31  2023 debian-archive-bookworm-security-automatic.asc
-rw-r--r-- 1 root root   461 Jul 31  2023 debian-archive-bookworm-stable.asc
-rw-r--r-- 1 root root 11861 Jul 31  2023 debian-archive-bullseye-automatic.asc
-rw-r--r-- 1 root root 11873 Jul 31  2023 debian-archive-bullseye-security-automatic.asc
-rw-r--r-- 1 root root  3403 Jul 31  2023 debian-archive-bullseye-stable.asc
-rw-r--r-- 1 root root 11093 Jul 31  2023 debian-archive-buster-automatic.asc
-rw-r--r-- 1 root root 11105 Jul 31  2023 debian-archive-buster-security-automatic.asc
-rw-r--r-- 1 root root  1704 Jul 31  2023 debian-archive-buster-stable.asc
-r--r--r-- 1 root root  2789 Oct 31 12:32 eLxr-base.asc
lrwxrwxrwx 1 root root    44 Oct 31 12:29 elxr-archive-keyring.gpg -> /usr/share/keyrings/elxr-archive-keyring.gpg
root@elxr:~# cat /etc/apt/sources.list
deb [trusted=yes] https://mirror.elxr.dev/elxr aria main
root@elxr:~#

Currently only the single eLxr repository is registered; the original Debian repositories are not.
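If you wanted packages from the original Debian repositories as well, you could presumably add a bookworm entry yourself, though whether mixing repositories is supported by eLxr is unclear (my assumption, untested):

deb https://deb.debian.org/debian bookworm main
deb https://security.debian.org/debian-security bookworm-security main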

I installed on ESXi this time, and open-vm-tools... was already installed.

root@elxr:~# dpkg -l|grep open-vm
ii  open-vm-tools                 2:12.2.0-1+deb12u2            amd64        Open VMware Tools for virtual machines hosted on VMware (CLI)
root@elxr:~# systemctl status vmtoolsd
* open-vm-tools.service - Service for virtual machines hosted on VMware
     Loaded: loaded (/lib/systemd/system/open-vm-tools.service; enabled; preset>
     Active: active (running) since Thu 2024-10-31 12:34:57 JST; 34min ago
       Docs: http://open-vm-tools.sourceforge.net/about.php
   Main PID: 592 (vmtoolsd)
      Tasks: 3 (limit: 7064)
     Memory: 11.5M
        CPU: 1.562s
     CGroup: /system.slice/open-vm-tools.service
             `-592 /usr/bin/vmtoolsd

Oct 31 12:34:57 elxr systemd[1]: Started open-vm-tools.service - Service for vi>
root@elxr:~#

Package state of eLxr 12.6.1.0 after a default install, then adding openssh-server and updating packages:

root@elxr:~# dpkg -l|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                          Version                       Architecture Description
+++-=============================-=============================-============-=============================================================================
ii  acpi-support-base             0.143-5.1                     all          scripts for handling base ACPI events such as the power button
ii  acpid                         1:2.0.33-2+b1                 amd64        Advanced Configuration and Power Interface event daemon
ii  adduser                       3.134                         all          add and remove users and groups
ii  apparmor                      3.0.8-3                       amd64        user-space parser utility for AppArmor
ii  apt                           2.6.1                         amd64        commandline package manager
ii  apt-utils                     2.6.1                         amd64        package management related utility programs
ii  base-files                    12.5elxr6                     amd64        Debian base system miscellaneous files
ii  base-passwd                   3.6.1                         amd64        Debian base system master password and group files
ii  bash                          5.2.15-2+b7                   amd64        GNU Bourne Again SHell
ii  bsdextrautils                 2.38.1-5+deb12u1              amd64        extra utilities from 4.4BSD-Lite
ii  bsdutils                      1:2.38.1-5+deb12u1            amd64        basic utilities from 4.4BSD-Lite
ii  busybox                       1:1.35.0-4+b3elxr1            amd64        Tiny utilities for small and embedded systems
ii  ca-certificates               20230311                      all          Common CA certificates
ii  console-setup                 1.221                         all          console font and keymap setup program
ii  console-setup-linux           1.221                         all          Linux specific part of console-setup
ii  coreutils                     9.1-1                         amd64        GNU core utilities
ii  cpio                          2.13+dfsg-7.1                 amd64        GNU cpio -- a program to manage archives of files
ii  cron                          3.0pl1-162                    amd64        process scheduling daemon
ii  cron-daemon-common            3.0pl1-162                    all          process scheduling daemon's configuration files
ii  dash                          0.5.12-2                      amd64        POSIX-compliant shell
ii  dbus                          1.14.10-1~deb12u1             amd64        simple interprocess messaging system (system message bus)
ii  dbus-bin                      1.14.10-1~deb12u1             amd64        simple interprocess messaging system (command line utilities)
ii  dbus-daemon                   1.14.10-1~deb12u1             amd64        simple interprocess messaging system (reference message bus)
ii  dbus-session-bus-common       1.14.10-1~deb12u1             all          simple interprocess messaging system (session bus configuration)
ii  dbus-system-bus-common        1.14.10-1~deb12u1             all          simple interprocess messaging system (system bus configuration)
ii  dbus-user-session             1.14.10-1~deb12u1             amd64        simple interprocess messaging system (systemd --user integration)
ii  debconf                       1.5.82                        all          Debian configuration management system
ii  debconf-i18n                  1.5.82                        all          full internationalization support for debconf
ii  debian-archive-keyring        2023.3+deb12u1                all          GnuPG archive keys of the Debian archive
ii  debianutils                   5.7-0.5~deb12u1               amd64        Miscellaneous utilities specific to Debian
ii  diffutils                     1:3.8-4                       amd64        File comparison utilities
ii  discover                      2.1.2-10                      amd64        hardware identification system
ii  discover-data                 2.2013.01.13                  all          Data lists for Discover hardware detection system
ii  dmidecode                     3.4-1                         amd64        SMBIOS/DMI table decoder
ii  dmsetup                       2:1.02.185-2                  amd64        Linux Kernel Device Mapper userspace library
ii  dpkg                          1.21.22                       amd64        Debian package management system
ii  e2fsprogs                     1.47.0-2                      amd64        ext2/ext3/ext4 file system utilities
ii  efibootmgr                    17-2                          amd64        Interact with the EFI Boot Manager
ii  eject                         2.38.1-5+deb12u1              amd64        ejects CDs and operates CD-Changers under Linux
ii  elxr-archive-keyring          2024.1-2                      all          GnuPG archive keys of the eLxr archive
ii  ethtool                       1:6.1-1                       amd64        display or change Ethernet device settings
ii  fdisk                         2.38.1-5+deb12u1              amd64        collection of partitioning utilities
ii  findutils                     4.9.0-4                       amd64        utilities for finding files--find, xargs
ii  firmware-linux-free           20200122-1                    all          Binary firmware for various drivers in the Linux kernel
ii  firstboot-config              1.0                           amd64        Firstboot configure Service
ii  fuse3                         3.14.0-4                      amd64        Filesystem in Userspace (3.x version)
ii  gcc-12-base:amd64             12.2.0-14                     amd64        GCC, the GNU Compiler Collection (base package)
ii  gettext-base                  0.21-12                       amd64        GNU Internationalization utilities for the base system
ii  gpgv                          2.2.40-1.1                    amd64        GNU privacy guard - signature verification tool
ii  grep                          3.8-5                         amd64        GNU grep, egrep and fgrep
ii  groff-base                    1.22.4-10                     amd64        GNU troff text-formatting system (base system components)
ii  grub-common                   2.06-14elxr5                  amd64        GRand Unified Bootloader (common files)
ii  grub-efi-amd64                2.06-14elxr5                  amd64        GRand Unified Bootloader, version 2 (EFI-AMD64 version)
ii  grub-efi-amd64-bin            2.06-14elxr5                  amd64        GRand Unified Bootloader, version 2 (EFI-AMD64 modules)
ii  grub-efi-amd64-signed         1+2.06+13+deb12u1             amd64        GRand Unified Bootloader, version 2 (amd64 UEFI signed by Debian)
ii  grub2-common                  2.06-14elxr5                  amd64        GRand Unified Bootloader (common files for version 2)
ii  gzip                          1.12-1                        amd64        GNU compression utilities
ii  hostname                      3.23+nmu1                     amd64        utility to set/show the host name or domain name
ii  ifupdown                      0.8.41                        amd64        high level tools to configure network interfaces
ii  info                          6.8-6+b1                      amd64        Standalone GNU Info documentation browser
ii  init                          1.65.2                        amd64        metapackage ensuring an init system is installed
ii  init-system-helpers           1.65.2                        all          helper tools for all init systems
ii  initramfs-tools               0.142+deb12u1                 all          generic modular initramfs generator (automation)
ii  initramfs-tools-core          0.142+deb12u1                 all          generic modular initramfs generator (core tools)
ii  install-info                  6.8-6+b1                      amd64        Manage installed documentation in info format
ii  installation-report           2.89                          all          system installation report
ii  iproute2                      6.1.0-3                       amd64        networking and traffic control tools
ii  iputils-ping                  3:20221126-1                  amd64        Tools to test the reachability of network hosts
ii  isc-dhcp-client               4.4.3-P1-2                    amd64        DHCP client for automatically obtaining an IP address
ii  isc-dhcp-common               4.4.3-P1-2                    amd64        common manpages relevant to all of the isc-dhcp packages
ii  kbd                           2.5.1-1+b1                    amd64        Linux console font and keytable utilities
ii  keyboard-configuration        1.221                         all          system-wide keyboard preferences
ii  klibc-utils                   2.0.12-1                      amd64        small utilities built with klibc for early boot
ii  kmod                          30+20221128-1                 amd64        tools for managing Linux kernel modules
ii  laptop-detect                 0.16                          all          system chassis type checker
ii  less                          590-2.1~deb12u2               amd64        pager program similar to more
ii  libacl1:amd64                 2.3.1-3                       amd64        access control list - shared library
ii  libapparmor1:amd64            3.0.8-3                       amd64        changehat AppArmor library
ii  libapt-pkg6.0:amd64           2.6.1                         amd64        package management runtime library
ii  libargon2-1:amd64             0~20171227-0.3+deb12u1        amd64        memory-hard hashing function - runtime library
ii  libattr1:amd64                1:2.5.1-4                     amd64        extended attribute handling - shared library
ii  libaudit-common               1:3.0.9-1                     all          Dynamic library for security auditing - common files
ii  libaudit1:amd64               1:3.0.9-1                     amd64        Dynamic library for security auditing
ii  libblkid1:amd64               2.38.1-5+deb12u1              amd64        block device ID library
ii  libbpf1:amd64                 1:1.1.0-1                     amd64        eBPF helper library (shared library)
ii  libbrotli1:amd64              1.0.9-2+b6                    amd64        library implementing brotli encoder and decoder (shared libraries)
ii  libbsd0:amd64                 0.11.7-2                      amd64        utility functions from BSD systems - shared library
ii  libbz2-1.0:amd64              1.0.8-5+b1                    amd64        high-quality block-sorting file compressor library - runtime
ii  libc-bin                      2.36-9+deb12u8                amd64        GNU C Library: Binaries
ii  libc-l10n                     2.36-9+deb12u8                all          GNU C Library: localization files
ii  libc6:amd64                   2.36-9+deb12u8                amd64        GNU C Library: Shared libraries
ii  libcap-ng0:amd64              0.8.3-1+b3                    amd64        alternate POSIX capabilities library
ii  libcap2:amd64                 1:2.66-4                      amd64        POSIX 1003.1e capabilities (library)
ii  libcap2-bin                   1:2.66-4                      amd64        POSIX 1003.1e capabilities (utilities)
ii  libcbor0.8:amd64              0.8.0-2+b1                    amd64        library for parsing and generating CBOR (RFC 7049)
ii  libcom-err2:amd64             1.47.0-2                      amd64        common error description library
ii  libcrypt1:amd64               1:4.4.33-2                    amd64        libcrypt shared library
ii  libcryptsetup12:amd64         2:2.6.1-4~deb12u2             amd64        disk encryption support - shared library
ii  libdb5.3:amd64                5.3.28+dfsg2-1                amd64        Berkeley v5.3 Database Libraries [runtime]
ii  libdbus-1-3:amd64             1.14.10-1~deb12u1             amd64        simple interprocess messaging system (library)
ii  libdebconfclient0:amd64       0.270                         amd64        Debian Configuration Management System (C-implementation library)
ii  libdevmapper1.02.1:amd64      2:1.02.185-2                  amd64        Linux Kernel Device Mapper userspace library
ii  libdiscover2                  2.1.2-10                      amd64        hardware identification library
ii  libdrm-common                 2.4.114-1                     all          Userspace interface to kernel DRM services -- common files
ii  libdrm2:amd64                 2.4.114-1+b1                  amd64        Userspace interface to kernel DRM services -- runtime
ii  libedit2:amd64                3.1-20221030-2                amd64        BSD editline and history libraries
ii  libefiboot1:amd64             37-6                          amd64        Library to manage UEFI variables
ii  libefivar1:amd64              37-6                          amd64        Library to manage UEFI variables
ii  libelf1:amd64                 0.188-2.1                     amd64        library to read and write ELF files
ii  libexpat1:amd64               2.5.0-1+deb12u1               amd64        XML parsing C library - runtime library
ii  libext2fs2:amd64              1.47.0-2                      amd64        ext2/ext3/ext4 file system libraries
ii  libfdisk1:amd64               2.38.1-5+deb12u1              amd64        fdisk partitioning library
ii  libffi8:amd64                 3.4.4-1                       amd64        Foreign Function Interface library runtime
ii  libfido2-1:amd64              1.12.0-2+b1                   amd64        library for generating and verifying FIDO 2.0 objects
ii  libfreetype6:amd64            2.12.1+dfsg-5+deb12u3         amd64        FreeType 2 font engine, shared library files
ii  libfuse2:amd64                2.9.9-6+b1                    amd64        Filesystem in Userspace (library)
ii  libfuse3-3:amd64              3.14.0-4                      amd64        Filesystem in Userspace (library) (3.x version)
ii  libgcc-s1:amd64               12.2.0-14                     amd64        GCC support library
ii  libgcrypt20:amd64             1.10.1-3                      amd64        LGPL Crypto library - runtime library
ii  libgdbm6:amd64                1.23-3                        amd64        GNU dbm database routines (runtime version)
ii  libglib2.0-0:amd64            2.74.6-2+deb12u3              amd64        GLib library of C routines
ii  libglib2.0-data               2.74.6-2+deb12u3              all          Common files for GLib library
ii  libgmp10:amd64                2:6.2.1+dfsg1-1.1             amd64        Multiprecision arithmetic library
ii  libgnutls30:amd64             3.7.9-2+deb12u3               amd64        GNU TLS library - main runtime library
ii  libgpg-error0:amd64           1.46-1                        amd64        GnuPG development runtime library
ii  libgssapi-krb5-2:amd64        1.20.1-2+deb12u2              amd64        MIT Kerberos runtime libraries - krb5 GSS-API Mechanism
ii  libhogweed6:amd64             3.8.1-2                       amd64        low level cryptographic library (public-key cryptos)
ii  libicu72:amd64                72.1-3                        amd64        International Components for Unicode
ii  libidn2-0:amd64               2.3.3-1+b1                    amd64        Internationalized domain names (IDNA2008/TR46) library
ii  libip4tc2:amd64               1.8.9-2                       amd64        netfilter libip4tc library
ii  libjansson4:amd64             2.14-2                        amd64        C library for encoding, decoding and manipulating JSON data
ii  libjson-c5:amd64              0.16-2                        amd64        JSON manipulation library - shared library
ii  libk5crypto3:amd64            1.20.1-2+deb12u2              amd64        MIT Kerberos runtime libraries - Crypto Library
ii  libkeyutils1:amd64            1.6.3-2                       amd64        Linux Key Management Utilities (library)
ii  libklibc:amd64                2.0.12-1                      amd64        minimal libc subset for use with initramfs
ii  libkmod2:amd64                30+20221128-1                 amd64        libkmod shared library
ii  libkrb5-3:amd64               1.20.1-2+deb12u2              amd64        MIT Kerberos runtime libraries
ii  libkrb5support0:amd64         1.20.1-2+deb12u2              amd64        MIT Kerberos runtime libraries - Support library
ii  liblocale-gettext-perl        1.07-5                        amd64        module using libc functions for internationalization in Perl
ii  liblz4-1:amd64                1.9.4-1                       amd64        Fast LZ compression algorithm library - runtime
ii  liblzma5:amd64                5.4.1-0.2                     amd64        XZ-format compression library
ii  libmd0:amd64                  1.0.4-2                       amd64        message digest functions from BSD systems - shared library
ii  libmnl0:amd64                 1.0.4-3                       amd64        minimalistic Netlink communication library
ii  libmount1:amd64               2.38.1-5+deb12u1              amd64        device mounting library
ii  libmspack0:amd64              0.11-1                        amd64        library for Microsoft compression formats (shared library)
ii  libncursesw6:amd64            6.4-4                         amd64        shared libraries for terminal handling (wide character support)
ii  libnettle8:amd64              3.8.1-2                       amd64        low level cryptographic library (symmetric and one-way cryptos)
ii  libnftables1:amd64            1.0.6-2+deb12u2               amd64        Netfilter nftables high level userspace API library
ii  libnftnl11:amd64              1.2.4-2                       amd64        Netfilter nftables userspace API library
ii  libnsl2:amd64                 1.3.0-2                       amd64        Public client interface for NIS(YP) and NIS+
ii  libp11-kit0:amd64             0.24.1-2                      amd64        library for loading and coordinating access to PKCS#11 modules - runtime
ii  libpam-modules:amd64          1.5.2-6+deb12u1               amd64        Pluggable Authentication Modules for PAM
ii  libpam-modules-bin            1.5.2-6+deb12u1               amd64        Pluggable Authentication Modules for PAM - helper binaries
ii  libpam-runtime                1.5.2-6+deb12u1               all          Runtime support for the PAM library
ii  libpam-systemd:amd64          252.30-1~deb12u2              amd64        system and service manager - PAM module
ii  libpam0g:amd64                1.5.2-6+deb12u1               amd64        Pluggable Authentication Modules library
ii  libpci3:amd64                 1:3.9.0-4                     amd64        PCI utilities (shared library)
ii  libpcre2-8-0:amd64            10.42-1                       amd64        New Perl Compatible Regular Expression Library- 8 bit runtime files
ii  libpipeline1:amd64            1.5.7-1                       amd64        Unix process pipeline manipulation library
ii  libpng16-16:amd64             1.6.39-2                      amd64        PNG library - runtime (version 1.6)
ii  libpopt0:amd64                1.19+dfsg-1                   amd64        lib for parsing cmdline parameters
ii  libproc2-0:amd64              2:4.0.2-3                     amd64        library for accessing process information from /proc
ii  libreadline8:amd64            8.2-1.3                       amd64        GNU readline and history libraries, run-time libraries
ii  libseccomp2:amd64             2.5.4-1+deb12u1               amd64        high level interface to Linux seccomp filter
ii  libselinux1:amd64             3.4-1+b6                      amd64        SELinux runtime shared libraries
ii  libsemanage-common            3.4-1                         all          Common files for SELinux policy management libraries
ii  libsemanage2:amd64            3.4-1+b5                      amd64        SELinux policy management library
ii  libsepol2:amd64               3.4-2.1                       amd64        SELinux library for manipulating binary security policies
ii  libsmartcols1:amd64           2.38.1-5+deb12u1              amd64        smart column output alignment library
ii  libss2:amd64                  1.47.0-2                      amd64        command-line interface parsing library
ii  libssl3:amd64                 3.0.14-1~deb12u2              amd64        Secure Sockets Layer toolkit - shared libraries
ii  libstdc++6:amd64              12.2.0-14                     amd64        GNU Standard C++ Library v3
ii  libsystemd-shared:amd64       252.30-1~deb12u2              amd64        systemd shared private library
ii  libsystemd0:amd64             252.30-1~deb12u2              amd64        systemd utility library
ii  libtasn1-6:amd64              4.19.0-2                      amd64        Manage ASN.1 structures (runtime)
ii  libtext-charwidth-perl:amd64  0.04-11                       amd64        get display widths of characters on the terminal
ii  libtext-iconv-perl:amd64      1.7-8                         amd64        module to convert between character sets in Perl
ii  libtext-wrapi18n-perl         0.06-10                       all          internationalized substitute of Text::Wrap
ii  libtinfo6:amd64               6.4-4                         amd64        shared low-level terminfo library for terminal handling
ii  libtirpc-common               1.3.3+ds-1                    all          transport-independent RPC library - common files
ii  libtirpc3:amd64               1.3.3+ds-1                    amd64        transport-independent RPC library
ii  libuchardet0:amd64            0.0.7-1                       amd64        universal charset detection library - shared library
ii  libudev1:amd64                252.30-1~deb12u2              amd64        libudev shared library
ii  libunistring2:amd64           1.0-2                         amd64        Unicode string library for C
ii  libusb-1.0-0:amd64            2:1.0.26-1                    amd64        userspace USB programming library
ii  libuuid1:amd64                2.38.1-5+deb12u1              amd64        Universally Unique ID library
ii  libwrap0:amd64                7.6.q-32                      amd64        Wietse Venema's TCP wrappers library
ii  libx11-6:amd64                2:1.8.4-2+deb12u2             amd64        X11 client-side library
ii  libx11-data                   2:1.8.4-2+deb12u2             all          X11 client-side library
ii  libxau6:amd64                 1:1.0.9-1                     amd64        X11 authorisation library
ii  libxcb1:amd64                 1.15-1                        amd64        X C Binding
ii  libxdmcp6:amd64               1:1.1.2-3                     amd64        X11 Display Manager Control Protocol library
ii  libxext6:amd64                2:1.3.4-1+b1                  amd64        X11 miscellaneous extension library
ii  libxml2:amd64                 2.9.14+dfsg-1.3~deb12u1       amd64        GNOME XML library
ii  libxmlsec1:amd64              1.2.37-2                      amd64        XML security library
ii  libxmlsec1-openssl:amd64      1.2.37-2                      amd64        Openssl engine for the XML security library
ii  libxmuu1:amd64                2:1.1.3-3                     amd64        X11 miscellaneous micro-utility library
ii  libxslt1.1:amd64              1.1.35-1                      amd64        XSLT 1.0 processing library - runtime library
ii  libxtables12:amd64            1.8.9-2                       amd64        netfilter xtables library
ii  libxxhash0:amd64              0.8.1-1                       amd64        shared library for xxhash
ii  libzstd1:amd64                1.5.4+dfsg2-5                 amd64        fast lossless compression algorithm
ii  linux-base                    4.9                           all          Linux image base package
ii  linux-image-6.1.0-23-amd64    6.1.99-elxr3-1                amd64        Linux 6.1 for 64-bit PCs (signed)
ii  linux-image-6.1.0-26-amd64    6.1.112-elxr2-1               amd64        Linux 6.1 for 64-bit PCs (signed)
ii  linux-image-6.1.0-26-rt-amd64 6.1.112-elxr2-1               amd64        Linux 6.1 for 64-bit PCs, PREEMPT_RT (signed)
ii  linux-image-amd64             6.1.112-elxr2-1               amd64        Linux for 64-bit PCs (meta-package)
ii  linux-image-rt-amd64          6.1.112-elxr2-1               amd64        Linux for 64-bit PCs (meta-package)
ii  locales                       2.36-9+deb12u8                all          GNU C Library: National Language (locale) data [support]
ii  login                         1:4.13+dfsg1-1+b1             amd64        system login tools
ii  logrotate                     3.21.0-1                      amd64        Log rotation utility
ii  logsave                       1.47.0-2                      amd64        save the output of a command in a log file
ii  lsb-release                   12.0-1                        all          Linux Standard Base version reporting utility (minimal implementation)
ii  man-db                        2.11.2-2                      amd64        tools for reading manual pages
ii  mawk                          1.3.4.20200120-3.1            amd64        Pattern scanning and text processing language
ii  mokutil                       0.6.0-2                       amd64        tools for manipulating machine owner keys
ii  mount                         2.38.1-5+deb12u1              amd64        tools for mounting and manipulating filesystems
ii  nano                          7.2-1+deb12u1                 amd64        small, friendly text editor inspired by Pico
ii  ncurses-base                  6.4-4                         all          basic terminal type definitions
ii  ncurses-bin                   6.4-4                         amd64        terminal-related programs and man pages
ii  ncurses-term                  6.4-4                         all          additional terminal type definitions
ii  net-tools                     2.10-0.1                      amd64        NET-3 networking toolkit
ii  netbase                       6.4                           all          Basic TCP/IP networking system
ii  netcat-openbsd                1.219-1                       amd64        TCP/IP swiss army knife
ii  nftables                      1.0.6-2+deb12u2               amd64        Program to control packet filtering rules by Netfilter project
ii  open-vm-tools                 2:12.2.0-1+deb12u2            amd64        Open VMware Tools for virtual machines hosted on VMware (CLI)
ii  openssh-client                1:9.2p1-2+deb12u3             amd64        secure shell (SSH) client, for secure access to remote machines
ii  openssh-server                1:9.2p1-2+deb12u3             amd64        secure shell (SSH) server, for secure access from remote machines
ii  openssh-sftp-server           1:9.2p1-2+deb12u3             amd64        secure shell (SSH) sftp server module, for SFTP access from remote machines
ii  openssl                       3.0.14-1~deb12u2              amd64        Secure Sockets Layer toolkit - cryptographic utility
ii  os-prober                     1.81                          amd64        utility to detect other OSes on a set of drives
ii  passwd                        1:4.13+dfsg1-1+b1             amd64        change and administer password and group data
ii  pci.ids                       0.0~2023.04.11-1              all          PCI ID Repository
ii  pciutils                      1:3.9.0-4                     amd64        PCI utilities
ii  perl-base                     5.36.0-7+deb12u1              amd64        minimal Perl system
ii  procps                        2:4.0.2-3                     amd64        /proc file system utilities
ii  qat2.0.l                      1.1.40-00018elxr2b3           amd64        Intel(r) QuickAssist Technology API
ii  qat2.0.l-common               1.1.40-00018elxr2b3           amd64        This package provides the common files of Intel(r) QuickAssist Technology API
ii  qatengine                     1.5.0-1elxr1                  amd64        Intel QuickAssist Technology (QAT) OpenSSL Engine
ii  qatzip                        1.1.2-1elxr2                  amd64        Compression Library accelerated by Intel QuickAssist Technology
ii  readline-common               8.2-1.3                       all          GNU readline and history libraries, common files
ii  runit-helper                  2.15.2                        all          dh-runit implementation detail
ii  sed                           4.9-1                         amd64        GNU stream editor for filtering/transforming text
ii  sensible-utils                0.0.17+nmu1                   all          Utilities for sensible alternative selection
ii  shared-mime-info              2.2-1                         amd64        FreeDesktop.org shared MIME database and spec
ii  shim-helpers-amd64-signed     1+15.8+1~deb12u1              amd64        boot loader to chain-load signed boot loaders (signed by Debian)
ii  shim-signed:amd64             1.44~1+deb12u1+15.8-1~deb12u1 amd64        Secure Boot chain-loading bootloader (Microsoft-signed binary)
ii  shim-signed-common            1.44~1+deb12u1+15.8-1~deb12u1 all          Secure Boot chain-loading bootloader (common helper scripts)
ii  shim-unsigned:amd64           15.8-1~deb12u1                amd64        boot loader to chain-load signed boot loaders under Secure Boot
ii  sudo                          1.9.13p3-1+deb12u1            amd64        Provide limited super user privileges to specific users
ii  systemd                       252.30-1~deb12u2              amd64        system and service manager
ii  systemd-sysv                  252.30-1~deb12u2              amd64        system and service manager - SysV compatibility symlinks
ii  sysvinit-utils                3.06-4                        amd64        System-V-like utilities
ii  tar                           1.34+dfsg-1.2+deb12u1         amd64        GNU version of the tar archiving utility
ii  traceroute                    1:2.1.2-1                     amd64        Traces the route taken by packets over an IPv4/IPv6 network
ii  tzdata                        2024a-0+deb12u1               all          time zone and daylight-saving time data
ii  ucf                           3.0043+nmu1                   all          Update Configuration File(s): preserve user changes to config files
ii  udev                          252.30-1~deb12u2              amd64        /dev/ and hotplug management daemon
ii  usbutils                      1:014-1+deb12u1               amd64        Linux USB utilities
ii  usr-is-merged                 37~deb12u1                    all          Transitional package to assert a merged-/usr system
ii  util-linux                    2.38.1-5+deb12u1              amd64        miscellaneous system utilities
ii  util-linux-extra              2.38.1-5+deb12u1              amd64        interactive login tools
ii  vim-common                    2:9.0.1378-2                  all          Vi IMproved - Common files
ii  vim-tiny                      2:9.0.1378-2                  amd64        Vi IMproved - enhanced vi editor - compact version
ii  xauth                         1:1.1.2-1                     amd64        X authentication utility
ii  xdg-user-dirs                 0.18-1                        amd64        tool to manage well known user directories
ii  xkb-data                      2.35.1-1                      all          X Keyboard Extension (XKB) configuration data
ii  zerofree                      1.1.1-1                       amd64        zero free blocks from ext2, ext3 and ext4 file-systems
ii  zlib1g:amd64                  1:1.2.13.dfsg-1elxr1          amd64        compression library - runtime
ii  zstd                          1.5.4+dfsg2-5                 amd64        fast lossless compression algorithm -- CLI tool
root@elxr:~#

I get the feeling that if you want a Debian with vendor support behind it, picking eLxr is a reasonable choice, but...

For LLM or GPU-related work, the extra repositories that Ubuntu provides are often a prerequisite, so it's hard to see the point of using a distribution that lacks them.


I wondered whether it could serve as an external corosync server for Proxmox, but the repository had no corosync-qdevice.

osakanataro@elxr:~$ apt search corosync-
ソート中... 完了
全文検索... 完了
corosync-doc/aria 3.1.7-1 all
  cluster engine HTML documentation

corosync-notifyd/aria 3.1.7-1 amd64
  cluster engine notification daemon

corosync-vqsim/aria 3.1.7-1 amd64
  cluster engine votequorum simulator

libcorosync-common-dev/aria 3.1.7-1 amd64
  cluster engine common development

libcorosync-common4/aria 3.1.7-1 amd64
  cluster engine common library

resource-agents/aria 1:4.12.0-2 amd64
  Cluster Resource Agents

osakanataro@elxr:~$


Update 2024/11/05

Rereading the documentation site, I found that "eLxr 12 Quick Start – Adding and Updating Packages" describes adding the Debian repository to /etc/apt/sources.list, so I tried that:

root@elxr:/etc/apt# cat sources.list
deb [trusted=yes] https://mirror.elxr.dev/elxr aria main
root@elxr:/etc/apt# vi sources.list
root@elxr:/etc/apt# cat sources.list
deb [trusted=yes] https://mirror.elxr.dev/elxr aria main
deb https://deb.debian.org/debian bookworm contrib main
root@elxr:/etc/apt#

After adding it, I searched again for corosync-qdevice, which had been missing before, and this time it showed up, coming from the Debian side.

root@elxr:/etc/apt# apt update
Get:1 https://deb.debian.org/debian bookworm InRelease [151 kB]
Get:2 https://deb.debian.org/debian bookworm/contrib amd64 Packages [54.1 kB]
Hit:3 https://mirror.elxr.dev/elxr aria InRelease
Get:4 https://deb.debian.org/debian bookworm/contrib Translation-en [48.8 kB]
Get:5 https://deb.debian.org/debian bookworm/main amd64 Packages [8787 kB]
Get:6 https://deb.debian.org/debian bookworm/main Translation-en [6109 kB]
Fetched 15.2 MB in 1min 52s (135 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
root@elxr:/etc/apt# apt search corosync-
Sorting... Done
Full Text Search... Done
corosync-doc/aria,stable 3.1.7-1 all
  cluster engine HTML documentation

corosync-notifyd/aria,stable 3.1.7-1 amd64
  cluster engine notification daemon

corosync-qdevice/stable 3.0.3-1 amd64
  cluster engine quorum device daemon

corosync-qnetd/stable 3.0.3-1 amd64
  cluster engine quorum device network daemon

corosync-vqsim/aria,stable 3.1.7-1 amd64
  cluster engine votequorum simulator

libcorosync-common-dev/aria,stable 3.1.7-1 amd64
  cluster engine common development

libcorosync-common4/aria,stable 3.1.7-1 amd64
  cluster engine common library

resource-agents/aria,stable 1:4.12.0-2 amd64
  Cluster Resource Agents

root@elxr:/etc/apt#

Dealing with the WARNING that appeared after casually creating Ceph storage on Proxmox 8.1.4


I installed Proxmox 8.1.4 on three servers and casually set up Ceph storage, figuring I'd just see how it went.

I basically followed "Deploy Hyper-Converged Ceph Cluster". When creating the CephFS, the procedure says to run "pveceph fs create --pg_num 128 --add-storage", but I wondered what would happen with just "pveceph fs create", so I tried it, and a warning came up.

Note: as it turned out, the default pg_num when not specified is also 128.

HEALTH_WARN: 1 pools have too many placement groups
Pool storagepool has 128 placement groups, should have 32
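
For reference, here are the two invocations side by side (the first is the form given in the Proxmox guide; the second is what I actually ran):

pveceph fs create --pg_num 128 --add-storage    # form from the Proxmox guide
pveceph fs create                               # defaults only; pg_num still ends up as 128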

Searching around, I found a somewhat old thread on the Proxmox forum, "CEPH pools have too many placement groups" (from 2020, the Proxmox 6.3 era).

Check the current settings with "pveceph pool ls":

root@zstack137:~# ceph -v
ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
root@zstack137:~# pveversion
pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-8-pve)
root@zstack137:~# pveceph pool ls
lqqqqqqqqqqqqqqqqqwqqqqqqwqqqqqqqqqqwqqqqqqqqwqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqk
x Name            x Size x Min Size x PG Num x min. PG Num x Optimal PG Num x PG Autoscale Mode x PG Autoscale Target Size x PG Autoscale Target Ratio x Crush Rule Name x               %-Used x       Used x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x .mgr            x    3 x        2 x      1 x           1 x              1 x on                x                          x                           x replicated_rule x 3.08950029648258e-06 x    1388544 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_data     x    3 x        2 x     32 x             x             32 x on                x                          x                           x replicated_rule x                    0 x          0 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_metadata x    3 x        2 x     32 x          16 x             16 x on                x                          x                           x replicated_rule x 4.41906962578287e-07 x     198610 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x storagepool     x    3 x        2 x    128 x             x             32 x warn              x                          x                           x replicated_rule x   0.0184257291257381 x 8436679796 x
mqqqqqqqqqqqqqqqqqvqqqqqqvqqqqqqqqqqvqqqqqqqqvqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqj
root@zstack137:~#

And here is "ceph osd pool autoscale-status":

root@zstack137:~# ceph osd pool autoscale-status
POOL               SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr             452.0k                3.0        449.9G  0.0000                                  1.0       1              on         False
cephfs_data          0                 3.0        449.9G  0.0000                                  1.0      32              on         False
cephfs_metadata  66203                 3.0        449.9G  0.0000                                  4.0      32              on         False
storagepool       2681M                3.0        449.9G  0.0175                                  1.0     128              warn       False
root@zstack137:~#

That reminded me that something similar had come up when I test-built Ceph long ago; checking, I found a 2018 memo of mine titled "CephのOSD毎のPlacement Groupの数を確認する" (checking the number of Placement Groups per Ceph OSD).

Running "ceph health", though, the situation looked different this time.

root@zstack137:~# ceph health
HEALTH_WARN 1 pools have too many placement groups

root@zstack137:~# ceph health detail
HEALTH_WARN 1 pools have too many placement groups
[WRN] POOL_TOO_MANY_PGS: 1 pools have too many placement groups
    Pool storagepool has 128 placement groups, should have 32
root@zstack137:~#
root@zstack137:~# ceph -s
  cluster:
    id:     9e085d6a-77f3-41f1-8f6d-71fadc9c011b
    health: HEALTH_WARN
            1 pools have too many placement groups

  services:
    mon: 3 daemons, quorum zstack136,zstack135,zstack137 (age 3h)
    mgr: zstack136(active, since 3h), standbys: zstack135
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 9 up (since 3h), 9 in (since 3d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 716 objects, 2.7 GiB
    usage:   8.3 GiB used, 442 GiB / 450 GiB avail
    pgs:     193 active+clean

root@zstack137:~#

Still, let me check whether the following command, which reformats the output of "ceph pg dump", still runs.

ceph pg dump | awk '
BEGIN { IGNORECASE = 1 }
 # on the header line, find the UP column and advance one field past it
 /^PG_STAT/ { col=1; while($col!="UP") {col++}; col++ }
 # on each PG line ("<poolid>.<pgid>"), record the pool id and pull the OSD number(s) out of that field
 /^[0-9a-f]+\.[0-9a-f]+/ { match($0,/^[0-9a-f]+/); pool=substr($0, RSTART, RLENGTH); poollist[pool]=0;
 up=$col; i=0; RSTART=0; RLENGTH=0; delete osds; while(match(up,/[0-9]+/)>0) { osds[++i]=substr(up,RSTART,RLENGTH); up = substr(up, RSTART+RLENGTH) }
 # tally one PG per (OSD, pool) pair
 for(i in osds) {array[osds[i],pool]++; osdlist[osds[i]];}
}
END {
 # print an OSD x pool matrix of PG counts, with row and column totals
 printf("\n");
 printf("pool :\t"); for (i in poollist) printf("%s\t",i); printf("| SUM \n");
 for (i in poollist) printf("--------"); printf("----------------\n");
 for (i in osdlist) { printf("osd.%i\t", i); sum=0;
   for (j in poollist) { printf("%i\t", array[i,j]); sum+=array[i,j]; sumpool[j]+=array[i,j] }; printf("| %i\n",sum) }
 for (i in poollist) printf("--------"); printf("----------------\n");
 printf("SUM :\t"); for (i in poollist) printf("%s\t",sumpool[i]); printf("|\n");
}'

It ran without problems.

root@zstack137:~# ceph pg dump | awk '
BEGIN { IGNORECASE = 1 }
 /^PG_STAT/ { col=1; while($col!="UP") {col++}; col++ }
 /^[0-9a-f]+\.[0-9a-f]+/ { match($0,/^[0-9a-f]+/); pool=substr($0, RSTART, RLENGTH); poollist[pool]=0;
 up=$col; i=0; RSTART=0; RLENGTH=0; delete osds; while(match(up,/[0-9]+/)>0) { osds[++i]=substr(up,RSTART,RLENGTH); up = substr(up, RSTART+RLENGTH) }
 for(i in osds) {array[osds[i],pool]++; osdlist[osds[i]];}
}
END {
 printf("\n");
 printf("pool :\t"); for (i in poollist) printf("%s\t",i); printf("| SUM \n");
 for (i in poollist) printf("--------"); printf("----------------\n");
 for (i in osdlist) { printf("osd.%i\t", i); sum=0;
   for (j in poollist) { printf("%i\t", array[i,j]); sum+=array[i,j]; sumpool[j]+=array[i,j] }; printf("| %i\n",sum) }
 for (i in poollist) printf("--------"); printf("----------------\n");
 printf("SUM :\t"); for (i in poollist) printf("%s\t",sumpool[i]); printf("|\n");
}'
dumped all

pool :  3       2       1       4       | SUM
------------------------------------------------
osd.3   4       5       1       13      | 23
osd.8   4       6       0       12      | 22
osd.6   2       4       0       15      | 21
osd.5   6       4       0       16      | 26
osd.2   3       3       0       15      | 21
osd.1   4       3       0       10      | 17
osd.4   1       1       0       16      | 18
osd.0   5       2       0       10      | 17
osd.7   3       4       0       21      | 28
------------------------------------------------
SUM :   32      32      1       128     |
root@zstack137:~#

The imbalance looks rather large depending on the pool?

A Chinese page called "ceph使用问题积累" covers remedies for both "HEALTH_WARN: pools have too many placement groups" and "HEALTH_WARN: mons are allowing insecure global_id reclaim".

For the former, citing the Proxmox forum thread mentioned above, it says to disable the autoscale feature by running "ceph mgr module disable pg_autoscaler".

For the latter, it gives "ceph config set mon auth_allow_insecure_global_id_reclaim false".
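
Before flipping that one, it would make sense to check the current value first, using the same "ceph config get" syntax that comes up later in this post (a sketch; I did not run this at this point):

ceph config get mon auth_allow_insecure_global_id_reclaim         # check the current value
ceph config set mon auth_allow_insecure_global_id_reclaim false   # the fix the page suggests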

Before changing any module settings, check the current state with "ceph mgr module ls":

root@zstack137:~# ceph mgr module ls
MODULE
balancer           on (always on)
crash              on (always on)
devicehealth       on (always on)
orchestrator       on (always on)
pg_autoscaler      on (always on)
progress           on (always on)
rbd_support        on (always on)
status             on (always on)
telemetry          on (always on)
volumes            on (always on)
iostat             on
nfs                on
restful            on
alerts             -
influx             -
insights           -
localpool          -
mirroring          -
osd_perf_query     -
osd_support        -
prometheus         -
selftest           -
snap_schedule      -
stats              -
telegraf           -
test_orchestrator  -
zabbix             -
root@zstack137:~#

The SUSE Enterprise Storage 7 Documentation's Administration and Operations Guide, chapter "12 Determine the cluster state", lists various commands for checking cluster state.

root@zstack137:~# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    450 GiB  442 GiB  8.3 GiB   8.3 GiB       1.85
TOTAL  450 GiB  442 GiB  8.3 GiB   8.3 GiB       1.85

--- POOLS ---
POOL             ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr              1    1  449 KiB        2  1.3 MiB      0    140 GiB
cephfs_data       2   32      0 B        0      0 B      0    140 GiB
cephfs_metadata   3   32   35 KiB       22  194 KiB      0    140 GiB
storagepool       4  128  2.6 GiB      692  7.9 GiB   1.84    140 GiB
root@zstack137:~#  ceph df detail
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    450 GiB  442 GiB  8.3 GiB   8.3 GiB       1.85
TOTAL  450 GiB  442 GiB  8.3 GiB   8.3 GiB       1.85

--- POOLS ---
POOL             ID  PGS   STORED   (DATA)   (OMAP)  OBJECTS     USED   (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
.mgr              1    1  449 KiB  449 KiB      0 B        2  1.3 MiB  1.3 MiB      0 B      0    140 GiB            N/A          N/A    N/A         0 B          0 B
cephfs_data       2   32      0 B      0 B      0 B        0      0 B      0 B      0 B      0    140 GiB            N/A          N/A    N/A         0 B          0 B
cephfs_metadata   3   32   35 KiB   18 KiB   17 KiB       22  194 KiB  144 KiB   50 KiB      0    140 GiB            N/A          N/A    N/A         0 B          0 B
storagepool       4  128  2.6 GiB  2.6 GiB  3.0 KiB      692  7.9 GiB  7.9 GiB  9.1 KiB   1.84    140 GiB            N/A          N/A    N/A         0 B          0 B
root@zstack137:~# 

The remedy for TOO_MANY_PGS is described as follows:

TOO_MANY_PGS
The number of PGs in use is above the configurable threshold of mon_pg_warn_max_per_osd PGs per OSD. This can lead to higher memory usage for OSD daemons, slower peering after cluster state changes (for example OSD restarts, additions, or removals), and higher load on the Ceph Managers and Ceph Monitors.

While the pg_num value for existing pools cannot be reduced, the pgp_num value can. This effectively co-locates some PGs on the same sets of OSDs, mitigating some of the negative impacts described above. The pgp_num value can be adjusted with:
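
The quote is cut off there, but the command the guide is referring to is presumably the same one used later in this post:

ceph osd pool set <pool-name> pgp_num <value>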

Looking again at Proxmox's "Deploy Hyper-Converged Ceph Cluster", it appears that PG Autoscale Mode being set to "warn" is the standard default.

Ceph's documentation on automated scaling says the value can be set with "ceph config set global mon_target_pg_per_osd 100", but it doesn't say how to check the current value.

I figured out that the syntax is "ceph config get <who> <key>", but couldn't tell what goes in the <who> part (it wasn't "global").

Running "ceph config dump" listed the settings that have presumably been changed from their defaults, and "mon" appeared in the WHO column. Guessing that the <who> for mon_target_pg_per_osd would therefore be "mon", I tried it and got what looks like the current value.

root@zstack137:~# ceph config dump
WHO  MASK  LEVEL     OPTION                                 VALUE  RO
mon        advanced  auth_allow_insecure_global_id_reclaim  false
root@zstack137:~# ceph config get mon  mon_target_pg_per_osd
100
root@zstack137:~#
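
As an aside (a command I noticed separately, not one the docs pointed me to), "ceph config help" will also show a single option's description and default:

ceph config help mon_target_pg_per_osd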

I went ahead and tried "ceph mgr module disable pg_autoscaler" anyway, but it cannot be changed:

root@zstack137:~# ceph mgr module disable pg_autoscaler
Error EINVAL: module 'pg_autoscaler' cannot be disabled (always-on)
root@zstack137:~#
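
Since the module is always-on, a per-pool alternative (untested here; pg_autoscale_mode is the same per-pool property shown in the pveceph pool ls output above) would be something like:

ceph osd pool set storagepool pg_autoscale_mode off   # silence the autoscaler for this pool only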

So instead, let's run "ceph osd pool set storagepool pgp_num 32" to change pgp_num from 128 to 32.

root@zstack137:~# ceph osd pool stats
pool .mgr id 1
  nothing is going on

pool cephfs_data id 2
  nothing is going on

pool cephfs_metadata id 3
  nothing is going on

pool storagepool id 4
  nothing is going on

root@zstack137:~# ceph osd pool get storagepool pgp_num
pgp_num: 128
root@zstack137:~# ceph osd pool set storagepool pgp_num 32
set pool 4 pgp_num to 32
root@zstack137:~# ceph osd pool get storagepool pgp_num
pgp_num: 125
root@zstack137:~# ceph osd pool get storagepool pgp_num
pgp_num: 119
root@zstack137:~#

It seems to change gradually.
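
To follow the progress without retyping the command, a simple poll works:

watch -n 10 'ceph osd pool get storagepool pgp_num'   # re-check every 10 seconds until it reaches 32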

root@zstack137:~# ceph -s
  cluster:
    id:     9e085d6a-77f3-41f1-8f6d-71fadc9c011b
    health: HEALTH_WARN
            Reduced data availability: 1 pg peering
            1 pools have too many placement groups
            1 pools have pg_num > pgp_num

  services:
    mon: 3 daemons, quorum zstack136,zstack135,zstack137 (age 5h)
    mgr: zstack136(active, since 5h), standbys: zstack135
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 9 up (since 5h), 9 in (since 3d); 2 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 716 objects, 2.7 GiB
    usage:   8.4 GiB used, 442 GiB / 450 GiB avail
    pgs:     0.518% pgs not active
             16/2148 objects misplaced (0.745%)
             190 active+clean
             2   active+recovering
             1   remapped+peering

  io:
    recovery: 2.0 MiB/s, 0 objects/s

root@zstack137:~# ceph health
HEALTH_WARN Reduced data availability: 1 pg peering; 1 pools have too many placement groups; 1 pools have pg_num > pgp_num
root@zstack137:~# ceph health detail
HEALTH_WARN 1 pools have too many placement groups; 1 pools have pg_num > pgp_num
[WRN] POOL_TOO_MANY_PGS: 1 pools have too many placement groups
    Pool storagepool has 128 placement groups, should have 32
[WRN] SMALLER_PGP_NUM: 1 pools have pg_num > pgp_num
    pool storagepool pg_num 128 > pgp_num 32
root@zstack137:~#

After some time had passed:

root@zstack137:~# ceph health detail
HEALTH_WARN 1 pools have too many placement groups; 1 pools have pg_num > pgp_num
[WRN] POOL_TOO_MANY_PGS: 1 pools have too many placement groups
    Pool storagepool has 128 placement groups, should have 32
[WRN] SMALLER_PGP_NUM: 1 pools have pg_num > pgp_num
    pool storagepool pg_num 128 > pgp_num 32
root@zstack137:~# ceph pg dump | awk '
BEGIN { IGNORECASE = 1 }
 /^PG_STAT/ { col=1; while($col!="UP") {col++}; col++ }
 /^[0-9a-f]+\.[0-9a-f]+/ { match($0,/^[0-9a-f]+/); pool=substr($0, RSTART, RLENGTH); poollist[pool]=0;
 up=$col; i=0; RSTART=0; RLENGTH=0; delete osds; while(match(up,/[0-9]+/)>0) { osds[++i]=substr(up,RSTART,RLENGTH); up = substr(up, RSTART+RLENGTH) }
 for(i in osds) {array[osds[i],pool]++; osdlist[osds[i]];}
}
END {
 printf("\n");
 printf("pool :\t"); for (i in poollist) printf("%s\t",i); printf("| SUM \n");
 for (i in poollist) printf("--------"); printf("----------------\n");
 for (i in osdlist) { printf("osd.%i\t", i); sum=0;
   for (j in poollist) { printf("%i\t", array[i,j]); sum+=array[i,j]; sumpool[j]+=array[i,j] }; printf("| %i\n",sum) }
 for (i in poollist) printf("--------"); printf("----------------\n");
 printf("SUM :\t"); for (i in poollist) printf("%s\t",sumpool[i]); printf("|\n");
}'
dumped all

pool :  3       2       1       4       | SUM
------------------------------------------------
osd.3   4       5       1       15      | 25
osd.8   4       6       0       16      | 26
osd.6   2       4       0       16      | 22
osd.5   6       4       0       4       | 14
osd.2   3       3       0       11      | 17
osd.1   4       3       0       13      | 20
osd.4   1       1       0       17      | 19
osd.0   5       2       0       20      | 27
osd.7   3       4       0       16      | 23
------------------------------------------------
SUM :   32      32      1       128     |
root@zstack137:~# ceph osd pool autoscale-status
POOL               SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr             452.0k                3.0        449.9G  0.0000                                  1.0       1              on         False
cephfs_data          0                 3.0        449.9G  0.0000                                  1.0      32              on         False
cephfs_metadata  66203                 3.0        449.9G  0.0000                                  4.0      32              on         False
storagepool       2681M                3.0        449.9G  0.0175                                  1.0     128              warn       False
root@zstack137:~# pveceph pool ls
lqqqqqqqqqqqqqqqqqwqqqqqqwqqqqqqqqqqwqqqqqqqqwqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqk
x Name            x Size x Min Size x PG Num x min. PG Num x Optimal PG Num x PG Autoscale Mode x PG Autoscale Target Size x PG Autoscale Target Ratio x Crush Rule Name x               %-Used x       Used x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x .mgr            x    3 x        2 x      1 x           1 x              1 x on                x                          x                           x replicated_rule x 3.09735719383752e-06 x    1388544 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_data     x    3 x        2 x     32 x             x             32 x on                x                          x                           x replicated_rule x                    0 x          0 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_metadata x    3 x        2 x     32 x          16 x             16 x on                x                          x                           x replicated_rule x 4.43030785390874e-07 x     198610 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x storagepool     x    3 x        2 x    128 x             x             32 x warn              x                          x                           x replicated_rule x    0.018471721559763 x 8436679796 x
mqqqqqqqqqqqqqqqqqvqqqqqqvqqqqqqqqqqvqqqqqqqqvqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqj
root@zstack137:~#

Can pg_num itself be reduced?

root@zstack137:~# ceph osd pool get storagepool pg_num
pg_num: 128
root@zstack137:~# ceph osd pool set storagepool pg_num 32
set pool 4 pg_num to 32
root@zstack137:~# ceph osd pool get storagepool pg_num
pg_num: 128
root@zstack137:~# ceph osd pool get storagepool pg_num
pg_num: 124
root@zstack137:~#

It's going down gradually.

The status changed to HEALTH_OK.

root@zstack137:~# ceph osd pool get storagepool pg_num
pg_num: 119
root@zstack137:~# ceph health detail
HEALTH_OK
root@zstack137:~# pveceph pool ls
lqqqqqqqqqqqqqqqqqwqqqqqqwqqqqqqqqqqwqqqqqqqqwqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqk
x Name            x Size x Min Size x PG Num x min. PG Num x Optimal PG Num x PG Autoscale Mode x PG Autoscale Target Size x PG Autoscale Target Ratio x Crush Rule Name x               %-Used x       Used x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x .mgr            x    3 x        2 x      1 x           1 x              1 x on                x                          x                           x replicated_rule x 3.10063592223742e-06 x    1388544 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_data     x    3 x        2 x     32 x             x             32 x on                x                          x                           x replicated_rule x                    0 x          0 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_metadata x    3 x        2 x     32 x          16 x             16 x on                x                          x                           x replicated_rule x 4.43499772018185e-07 x     198610 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x storagepool     x    3 x        2 x    117 x             x             32 x warn              x                          x                           x replicated_rule x   0.0184909123927355 x 8436679796 x
mqqqqqqqqqqqqqqqqqvqqqqqqvqqqqqqqqqqvqqqqqqqqvqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqj
root@zstack137:~#

The PG_NUM shown by "ceph osd pool autoscale-status", on the other hand, is reflected immediately:

root@zstack137:~# ceph osd pool autoscale-status
POOL               SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr             452.0k                3.0        449.9G  0.0000                                  1.0       1              on         False
cephfs_data          0                 3.0        449.9G  0.0000                                  1.0      32              on         False
cephfs_metadata  66203                 3.0        449.9G  0.0000                                  4.0      32              on         False
storagepool       2705M                3.0        449.9G  0.0176                                  1.0      32              warn       False
root@zstack137:~#

While this was running it sometimes went HEALTH_WARN, but it returned to HEALTH_OK fairly quickly.

root@zstack137:~# ceph health detail
HEALTH_WARN Reduced data availability: 2 pgs inactive, 2 pgs peering
[WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs peering
    pg 4.22 is stuck peering for 2d, current state peering, last acting [6,5,2]
    pg 4.62 is stuck peering for 6h, current state peering, last acting [6,5,2]
root@zstack137:~#

After a while, once the changes had finished, I captured the state again.

root@zstack137:~# ceph health detail
HEALTH_OK
root@zstack137:~# pveceph pool ls
lqqqqqqqqqqqqqqqqqwqqqqqqwqqqqqqqqqqwqqqqqqqqwqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqk
x Name            x Size x Min Size x PG Num x min. PG Num x Optimal PG Num x PG Autoscale Mode x PG Autoscale Target Size x PG Autoscale Target Ratio x Crush Rule Name x               %-Used x       Used x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x .mgr            x    3 x        2 x      1 x           1 x              1 x on                x                          x                           x replicated_rule x 3.13595910483855e-06 x    1388544 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_data     x    3 x        2 x     32 x             x             32 x on                x                          x                           x replicated_rule x                    0 x          0 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_metadata x    3 x        2 x     32 x          16 x             16 x on                x                          x                           x replicated_rule x  4.4855224246021e-07 x     198610 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x storagepool     x    3 x        2 x     32 x             x             32 x warn              x                          x                           x replicated_rule x   0.0186976287513971 x 8436679796 x
mqqqqqqqqqqqqqqqqqvqqqqqqvqqqqqqqqqqvqqqqqqqqvqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqj
root@zstack137:~# ceph -s
  cluster:
    id:     9e085d6a-77f3-41f1-8f6d-71fadc9c011b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum zstack136,zstack135,zstack137 (age 6h)
    mgr: zstack136(active, since 6h), standbys: zstack135
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 9 up (since 6h), 9 in (since 3d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 716 objects, 2.7 GiB
    usage:   8.6 GiB used, 441 GiB / 450 GiB avail
    pgs:     97 active+clean

root@zstack137:~# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    450 GiB  441 GiB  8.7 GiB   8.7 GiB       1.94
TOTAL  450 GiB  441 GiB  8.7 GiB   8.7 GiB       1.94

--- POOLS ---
POOL             ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr              1    1  449 KiB        2  1.3 MiB      0    137 GiB
cephfs_data       2   32      0 B        0      0 B      0    137 GiB
cephfs_metadata   3   32   35 KiB       22  194 KiB      0    137 GiB
storagepool       4   32  2.7 GiB      692  8.0 GiB   1.89    137 GiB
root@zstack137:~#

For now, it seems the issue has been dealt with?

Windows Update broke connections to a Samba AD


I have an AD environment built with Samba.

Recently, after creating a new virtual machine and joining it to the domain, running "Test-ComputerSecureChannel" in PowerShell returned "False".

Running a repair (Test-ComputerSecureChannel -Repair) didn't change the situation.

(Meaning communication between the AD server and that machine was not going over a secure channel.)

Checking the Samba version currently in use: 4.18.2.

# /usr/local/samba/sbin/smbd --version
Version 4.18.2
#

Looking at Samba's release history, fixes had been released relating to the security patches that also caused problems on NetApp:

CVE-2023-34967, CVE-2022-2127, CVE-2023-34968, CVE-2023-34966, and CVE-2023-3347.

The directly relevant bug appears to be "Bug 15418 – secure channel faulty since Windows 10/11 update 07/2023".

If it's a bug, the only real fix is to update, so I updated to Samba 4.18.5.

Extract the source, then: make; make install; systemctl restart samba-ad-dc.
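
Roughly, the steps were as follows (a sketch; the tarball URL and --prefix are assumptions based on where the existing binaries live):

wget https://download.samba.org/pub/samba/stable/samba-4.18.5.tar.gz
tar xf samba-4.18.5.tar.gz
cd samba-4.18.5
./configure --prefix=/usr/local/samba   # assuming the same prefix as the current install
make && make install
systemctl restart samba-ad-dc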

# /usr/local/samba/sbin/smbd --version
Version 4.18.5
#

And this time, the repair succeeded.