To check how Proxmox VE's Ceph storage behaves, I have been testing with four VMs (16 GB RAM each) built on ESXi, plus one more VM running corosync-qnetd to keep the Proxmox VE cluster quorate.
At some point, OSDs on the various nodes started going down frequently.
root@pve37:~# ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 0.62549 - 480 GiB 210 GiB 204 GiB 178 KiB 5.4 GiB 270 GiB 43.65 1.00 - root default
-3 0.15637 - 160 GiB 59 GiB 58 GiB 47 KiB 1.4 GiB 101 GiB 36.92 0.85 - host pve36
0 hdd 0.03909 1.00000 40 GiB 15 GiB 14 GiB 7 KiB 403 MiB 25 GiB 36.27 0.83 36 up osd.0
1 hdd 0.03909 1.00000 40 GiB 18 GiB 17 GiB 13 KiB 332 MiB 22 GiB 44.03 1.01 52 up osd.1
2 hdd 0.03909 1.00000 40 GiB 11 GiB 10 GiB 18 KiB 337 MiB 29 GiB 26.46 0.61 27 up osd.2
3 hdd 0.03909 1.00000 40 GiB 16 GiB 16 GiB 9 KiB 393 MiB 24 GiB 40.91 0.94 48 up osd.3
-5 0.15637 - 160 GiB 67 GiB 66 GiB 75 KiB 1.6 GiB 93 GiB 41.95 0.96 - host pve37
4 hdd 0.03909 1.00000 40 GiB 19 GiB 18 GiB 24 KiB 443 MiB 21 GiB 46.87 1.07 51 up osd.4
5 hdd 0.03909 1.00000 40 GiB 11 GiB 11 GiB 21 KiB 201 MiB 29 GiB 28.58 0.65 30 up osd.5
6 hdd 0.03909 1.00000 40 GiB 16 GiB 16 GiB 12 KiB 294 MiB 24 GiB 39.51 0.91 40 up osd.6
7 hdd 0.03909 1.00000 40 GiB 21 GiB 20 GiB 18 KiB 693 MiB 19 GiB 52.84 1.21 61 up osd.7
-7 0.15637 - 80 GiB 49 GiB 47 GiB 36 KiB 1.3 GiB 31 GiB 60.91 1.40 - host pve38
8 hdd 0.03909 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.8
9 hdd 0.03909 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.9
10 hdd 0.03909 1.00000 40 GiB 20 GiB 20 GiB 17 KiB 415 MiB 20 GiB 51.02 1.17 53 up osd.10
11 hdd 0.03909 1.00000 40 GiB 28 GiB 27 GiB 19 KiB 922 MiB 12 GiB 70.80 1.62 73 up osd.11
-9 0.15637 - 80 GiB 35 GiB 34 GiB 20 KiB 1.1 GiB 45 GiB 43.27 0.99 - host pve39
12 hdd 0.03909 1.00000 40 GiB 20 GiB 20 GiB 7 KiB 824 MiB 20 GiB 50.81 1.16 63 up osd.12
13 hdd 0.03909 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.13
14 hdd 0.03909 1.00000 40 GiB 14 GiB 14 GiB 13 KiB 303 MiB 26 GiB 35.72 0.82 0 down osd.14
15 hdd 0.03909 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.15
TOTAL 480 GiB 210 GiB 204 GiB 183 KiB 5.4 GiB 270 GiB 43.65
MIN/MAX VAR: 0.61/1.62 STDDEV: 11.55
root@pve37:~#
osd.8 and osd.9 on pve38 are down, so I logged in to pve38 and checked the processes. The ceph-osd services for --id 8 and --id 9 were not running, so I restarted them:
root@pve38:~# ps -ef|grep osd
ceph 1676 1 1 12:14 ? 00:02:01 /usr/bin/ceph-osd -f --cluster ceph --id 10 --setuser ceph --setgroup ceph
ceph 1681 1 2 12:14 ? 00:02:45 /usr/bin/ceph-osd -f --cluster ceph --id 11 --setuser ceph --setgroup ceph
root 30916 30893 0 14:10 pts/0 00:00:00 grep osd
root@pve38:~# systemctl restart ceph-osd@8
root@pve38:~# systemctl restart ceph-osd@9
root@pve38:~#
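For cases where a plain restart does not bring an OSD back, it can help to first look at why the unit stopped; these are generic systemd checks (not something that turned out to be needed here):
systemctl status ceph-osd@8
journalctl -u ceph-osd@8 --since "1 hour ago"
If the unit has hit its start-rate limit, systemctl reset-failed ceph-osd@8 may also be needed before the restart succeeds.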
After waiting a while, they came back up:
root@pve38:~# ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 0.62549 - 600 GiB 227 GiB 221 GiB 229 KiB 5.6 GiB 373 GiB 37.84 1.00 - root default
-3 0.15637 - 160 GiB 53 GiB 52 GiB 47 KiB 1.4 GiB 107 GiB 33.11 0.88 - host pve36
0 hdd 0.03909 1.00000 40 GiB 13 GiB 12 GiB 7 KiB 403 MiB 27 GiB 31.50 0.83 28 up osd.0
1 hdd 0.03909 1.00000 40 GiB 16 GiB 16 GiB 13 KiB 332 MiB 24 GiB 40.30 1.07 47 up osd.1
2 hdd 0.03909 1.00000 40 GiB 9.7 GiB 9.4 GiB 18 KiB 337 MiB 30 GiB 24.21 0.64 22 up osd.2
3 hdd 0.03909 1.00000 40 GiB 15 GiB 14 GiB 9 KiB 393 MiB 25 GiB 36.41 0.96 41 up osd.3
-5 0.15637 - 160 GiB 61 GiB 59 GiB 75 KiB 1.6 GiB 99 GiB 37.89 1.00 - host pve37
4 hdd 0.03909 1.00000 40 GiB 16 GiB 15 GiB 24 KiB 443 MiB 24 GiB 39.75 1.05 41 up osd.4
5 hdd 0.03909 1.00000 40 GiB 10 GiB 10 GiB 21 KiB 201 MiB 30 GiB 25.52 0.67 26 up osd.5
6 hdd 0.03909 1.00000 40 GiB 14 GiB 13 GiB 12 KiB 278 MiB 26 GiB 34.26 0.91 32 up osd.6
7 hdd 0.03909 1.00000 40 GiB 21 GiB 20 GiB 18 KiB 693 MiB 19 GiB 52.02 1.37 52 up osd.7
-7 0.15637 - 120 GiB 57 GiB 55 GiB 54 KiB 1.5 GiB 63 GiB 47.17 1.25 - host pve38
8 hdd 0.03909 1.00000 40 GiB 14 GiB 14 GiB 18 KiB 132 MiB 26 GiB 35.75 0.94 30 up osd.8
9 hdd 0.03909 1.00000 0 B 0 B 0 B 0 B 0 B 0 B 0 0 22 up osd.9
10 hdd 0.03909 1.00000 40 GiB 18 GiB 18 GiB 17 KiB 419 MiB 22 GiB 45.92 1.21 42 up osd.10
11 hdd 0.03909 1.00000 40 GiB 24 GiB 23 GiB 19 KiB 939 MiB 16 GiB 59.84 1.58 42 up osd.11
-9 0.15637 - 160 GiB 57 GiB 56 GiB 53 KiB 1.2 GiB 103 GiB 35.51 0.94 - host pve39
12 hdd 0.03909 1.00000 40 GiB 15 GiB 14 GiB 7 KiB 841 MiB 25 GiB 37.05 0.98 37 up osd.12
13 hdd 0.03909 1.00000 40 GiB 16 GiB 16 GiB 16 KiB 144 MiB 24 GiB 39.70 1.05 42 up osd.13
14 hdd 0.03909 1.00000 40 GiB 14 GiB 14 GiB 16 KiB 84 MiB 26 GiB 35.82 0.95 39 up osd.14
15 hdd 0.03909 1.00000 40 GiB 12 GiB 12 GiB 14 KiB 127 MiB 28 GiB 29.48 0.78 30 up osd.15
TOTAL 600 GiB 227 GiB 221 GiB 236 KiB 5.6 GiB 373 GiB 37.84
MIN/MAX VAR: 0/1.58 STDDEV: 12.91
root@pve38:~#
But then, after a while, other OSDs would go down again.
Checking "5.1.7. Flapping OSDs" under "Chapter 5. Troubleshooting Ceph OSDs" in the Red Hat Ceph Storage 7 Troubleshooting Guide, the cause apparently comes down to the disks backing the OSDs being too slow.
It sounded like raising the osd_heartbeat_grace_time parameter from its default of 20 seconds would relax the timeout, but how to actually set it was unclear...
Looking at OSD Settings on ceph.org, it seems you add it to /etc/ceph/ceph.conf (on PVE, /etc/pve/ceph.conf), but neither the OSD Config Reference nor Configuring Monitor/OSD Interaction lists a parameter called osd_heartbeat_grace_time (there is osd_heartbeat_grace, though).
The Red Hat document continues under "To troubleshoot this problem:" by suggesting "ceph osd set noup" and "ceph osd set nodown" to stop OSDs from being marked down and up.
As a test I set both noup and nodown, but then even after starting an OSD service, ceph osd df tree kept showing it as down.
Which is only natural: not marking an OSD up even when it comes up is exactly what "noup" means...
So instead I ran "ceph osd unset noup" and "ceph osd set nodown", i.e. kept only the "don't mark OSDs down" part.
With that in place, "ceph osd stat" shows "flags nodown":
root@pve38:~# ceph osd stat
16 osds: 16 up (since 62m), 16 in (since 62m); epoch: e4996
flags nodown
root@pve38:~#
For the time being, that papers over the problem.
However, it also means an OSD will not be marked down even if the disk behind it actually fails.
So leaving the "nodown" flag set permanently is quite inappropriate.
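Once the real cause has been dealt with, the flag can be cleared again with:
ceph osd unset nodown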
To fix this properly, the first step is to run "ceph health detail" and see what exactly is wrong, in particular how slow the Slow OSD heartbeats actually are:
root@pve38:~# ceph health detail
HEALTH_WARN nodown flag(s) set; Slow OSD heartbeats on back (longest 5166.450ms); Slow OSD heartbeats on front (longest 5467.151ms)
[WRN] OSDMAP_FLAGS: nodown flag(s) set
[WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 5166.450ms)
Slow OSD heartbeats on back from osd.13 [] to osd.8 [] 5166.450 msec
Slow OSD heartbeats on back from osd.13 [] to osd.0 [] 3898.044 msec
Slow OSD heartbeats on back from osd.12 [] to osd.9 [] 3268.881 msec
Slow OSD heartbeats on back from osd.10 [] to osd.3 [] 2610.064 msec possibly improving
Slow OSD heartbeats on back from osd.12 [] to osd.8 [] 2588.321 msec
Slow OSD heartbeats on back from osd.6 [] to osd.14 [] 2565.141 msec
Slow OSD heartbeats on back from osd.8 [] to osd.7 [] 2385.851 msec possibly improving
Slow OSD heartbeats on back from osd.13 [] to osd.11 [] 2324.505 msec
Slow OSD heartbeats on back from osd.8 [] to osd.12 [] 2305.474 msec possibly improving
Slow OSD heartbeats on back from osd.14 [] to osd.11 [] 2275.033 msec
Truncated long network list. Use ceph daemon mgr.# dump_osd_network for more information
[WRN] OSD_SLOW_PING_TIME_FRONT: Slow OSD heartbeats on front (longest 5467.151ms)
Slow OSD heartbeats on front from osd.13 [] to osd.8 [] 5467.151 msec
Slow OSD heartbeats on front from osd.13 [] to osd.0 [] 3956.364 msec
Slow OSD heartbeats on front from osd.12 [] to osd.9 [] 3513.493 msec
Slow OSD heartbeats on front from osd.12 [] to osd.8 [] 2657.999 msec
Slow OSD heartbeats on front from osd.6 [] to osd.14 [] 2657.486 msec
Slow OSD heartbeats on front from osd.10 [] to osd.3 [] 2610.558 msec possibly improving
Slow OSD heartbeats on front from osd.8 [] to osd.7 [] 2436.661 msec possibly improving
Slow OSD heartbeats on front from osd.14 [] to osd.11 [] 2351.914 msec
Slow OSD heartbeats on front from osd.14 [] to osd.10 [] 2351.812 msec
Slow OSD heartbeats on front from osd.13 [] to osd.11 [] 2335.698 msec
Truncated long network list. Use ceph daemon mgr.# dump_osd_network for more information
root@pve38:~#
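As the "Truncated long network list" notice suggests, the full list can be dumped from the active mgr's admin socket on the node where that mgr runs. Something along these lines should work (mgr.pve36 is only an assumed example ID; substitute the actual active mgr):
ceph daemon mgr.pve36 dump_osd_network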
osd.7's log lives on pve37, so I logged in there and grepped /var/log/ceph/ceph-osd.7.log for "no reply from" and "osd.8".
The lines below are presumably the counterpart of the "Slow OSD heartbeats on front from osd.8 [] to osd.7 [] 2436.661 msec" entry above:
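The grep was along these lines (reconstructed here for illustration, not the exact command that was typed):
grep "no reply from" /var/log/ceph/ceph-osd.7.log | grep "osd.8"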
2024-11-14T14:46:05.457+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:02.037605+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:06.454+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:02.037605+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:07.467+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:07.338127+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:08.418+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:07.338127+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:09.371+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:09.038264+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:10.416+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:09.038264+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:11.408+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:11.338592+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
The gap between the timestamps in each line and the time shown as the oldest deadline is roughly 20 seconds, so it seems safe to assume the default value of 20 for osd_heartbeat_grace (or osd_heartbeat_grace_time) is what is in effect here.
I went looking for documentation on how to change this setting, but there is surprisingly little.
The Ceph Block Device docs, under 3rd Party Integration » Ceph iSCSI Gateway » iSCSI Gateway Requirements, contain a configuration example like this:
[osd]
osd heartbeat grace = 20
osd heartbeat interval = 5
It also appears possible to set the value on individual OSDs, for example:
ceph tell osd.* config set osd_heartbeat_grace 20
ceph tell osd.* config set osd_heartbeat_interval 5
ceph daemon osd.0 config set osd_heartbeat_grace 20
ceph daemon osd.0 config set osd_heartbeat_interval 5
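Another route, not tried in this article, would be Ceph's centralized configuration database (stored on the monitors), which persists across daemon restarts:
ceph config set osd osd_heartbeat_grace 30
ceph config get osd osd_heartbeat_grace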
Checking the syntax of ceph tell, the current value can apparently be read with "ceph tell osd.* config get osd_heartbeat_grace":
root@pve37:~# ceph tell osd.* config get osd_heartbeat_grace
osd.0: {
"osd_heartbeat_grace": "20"
}
osd.1: {
"osd_heartbeat_grace": "20"
}
osd.2: {
"osd_heartbeat_grace": "20"
}
osd.3: {
"osd_heartbeat_grace": "20"
}
osd.4: {
"osd_heartbeat_grace": "20"
}
osd.5: {
"osd_heartbeat_grace": "20"
}
osd.6: {
"osd_heartbeat_grace": "20"
}
osd.7: {
"osd_heartbeat_grace": "20"
}
osd.8: {
"osd_heartbeat_grace": "20"
}
osd.9: {
"osd_heartbeat_grace": "20"
}
osd.10: {
"osd_heartbeat_grace": "20"
}
osd.11: {
"osd_heartbeat_grace": "20"
}
osd.12: {
"osd_heartbeat_grace": "20"
}
osd.13: {
"osd_heartbeat_grace": "20"
}
osd.14: {
"osd_heartbeat_grace": "20"
}
osd.15: {
"osd_heartbeat_grace": "20"
}
root@pve37:~#
For now, I ran "ceph tell osd.* config set osd_heartbeat_grace 30" to set it to 30:
root@pve37:~# ceph tell osd.* config set osd_heartbeat_grace 30
osd.0: {
"success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.1: {
"success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.2: {
"success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.3: {
"success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.4: {
"success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.5: {
"success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.6: {
"success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.7: {
"success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.8: {
"success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.9: {
"success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.10: {
"success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.11: {
"success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.12: {
"success": "osd_heartbeat_grace = '' (not observed, change may require restart) "
}
osd.13: {
"success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.14: {
"success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
osd.15: {
"success": "osd_delete_sleep = '' osd_delete_sleep_hdd = '' osd_delete_sleep_hybrid = '' osd_delete_sleep_ssd = '' osd_heartbeat_grace = '' (not observed, change may require restart) osd_max_backfills = '' osd_pg_delete_cost = '' (not observed, change may require restart) osd_recovery_max_active = '' osd_recovery_max_active_hdd = '' osd_recovery_max_active_ssd = '' osd_recovery_sleep = '' osd_recovery_sleep_hdd = '' osd_recovery_sleep_hybrid = '' osd_recovery_sleep_ssd = '' osd_scrub_sleep = '' osd_snap_trim_sleep = '' osd_snap_trim_sleep_hdd = '' osd_snap_trim_sleep_hybrid = '' osd_snap_trim_sleep_ssd = '' "
}
root@pve37:~#
Every OSD reports "success", so the change itself presumably went through, although I am not sure why there are two different kinds of responses.
Checking whether the value was actually changed:
root@pve37:~# ceph tell osd.* config get osd_heartbeat_grace
osd.0: {
"osd_heartbeat_grace": "30"
}
osd.1: {
"osd_heartbeat_grace": "30"
}
osd.2: {
"osd_heartbeat_grace": "30"
}
osd.3: {
"osd_heartbeat_grace": "30"
}
osd.4: {
"osd_heartbeat_grace": "30"
}
osd.5: {
"osd_heartbeat_grace": "30"
}
osd.6: {
"osd_heartbeat_grace": "30"
}
osd.7: {
"osd_heartbeat_grace": "30"
}
osd.8: {
"osd_heartbeat_grace": "30"
}
osd.9: {
"osd_heartbeat_grace": "30"
}
osd.10: {
"osd_heartbeat_grace": "30"
}
osd.11: {
"osd_heartbeat_grace": "30"
}
osd.12: {
"osd_heartbeat_grace": "30"
}
osd.13: {
"osd_heartbeat_grace": "30"
}
osd.14: {
"osd_heartbeat_grace": "30"
}
osd.15: {
"osd_heartbeat_grace": "30"
}
root@pve37:~#
That said, as the "(not observed, change may require restart)" in the set output indicates, a restart of ceph-osd appears to be required.
The changed parameter had not been written to /etc/pve/ceph.conf, and when I rebooted the server hosting osd.4 through osd.7 and checked again, those OSDs had reverted to 20:
root@pve38:~# ceph tell osd.* config get osd_heartbeat_grace
osd.0: {
"osd_heartbeat_grace": "30"
}
osd.1: {
"osd_heartbeat_grace": "30"
}
osd.2: {
"osd_heartbeat_grace": "30"
}
osd.3: {
"osd_heartbeat_grace": "30"
}
osd.4: {
"osd_heartbeat_grace": "20"
}
osd.5: {
"osd_heartbeat_grace": "20"
}
osd.6: {
"osd_heartbeat_grace": "20"
}
osd.7: {
"osd_heartbeat_grace": "20"
}
osd.8: {
"osd_heartbeat_grace": "30"
}
osd.9: {
"osd_heartbeat_grace": "30"
}
osd.10: {
"osd_heartbeat_grace": "30"
}
osd.11: {
"osd_heartbeat_grace": "30"
}
osd.12: {
"osd_heartbeat_grace": "30"
}
osd.13: {
"osd_heartbeat_grace": "30"
}
osd.14: {
"osd_heartbeat_grace": "30"
}
osd.15: {
"osd_heartbeat_grace": "30"
}
root@pve38:~#
So I appended the following to /etc/pve/ceph.conf:
[osd]
osd heartbeat grace = 30
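A full node reboot is probably more than strictly needed here: restarting just the OSD daemons on each node should be enough for a new [osd] section to be picked up, for example with the command below (not verified in this test):
systemctl restart ceph-osd.target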
After the change and another reboot, the value was 30 everywhere, as expected. Then again, maybe osd_heartbeat_grace is actually one of those settings that takes effect after a ceph tell change without needing a restart at all?
root@pve38:~# ceph tell osd.* config get osd_heartbeat_grace
osd.0: {
"osd_heartbeat_grace": "30"
}
osd.1: {
"osd_heartbeat_grace": "30"
}
osd.2: {
"osd_heartbeat_grace": "30"
}
osd.3: {
"osd_heartbeat_grace": "30"
}
osd.4: {
"osd_heartbeat_grace": "30"
}
osd.5: {
"osd_heartbeat_grace": "30"
}
osd.6: {
"osd_heartbeat_grace": "30"
}
osd.7: {
"osd_heartbeat_grace": "30"
}
osd.8: {
"osd_heartbeat_grace": "30"
}
osd.9: {
"osd_heartbeat_grace": "30"
}
osd.10: {
"osd_heartbeat_grace": "30"
}
osd.11: {
"osd_heartbeat_grace": "30"
}
osd.12: {
"osd_heartbeat_grace": "30"
}
osd.13: {
"osd_heartbeat_grace": "30"
}
osd.14: {
"osd_heartbeat_grace": "30"
}
osd.15: {
"osd_heartbeat_grace": "30"
}
root@pve38:~#
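Incidentally, to check which value a running daemon is actually using and where it came from (config file, mon database, or a runtime override), something like the following should also work; osd.8 here is just an example:
ceph config show osd.8 | grep osd_heartbeat_grace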