vmadmin@hpevm:~$ virsh net-list --all
Name State Autostart Persistent
----------------------------------------
vmadmin@hpevm:~$
Output after rebooting:
vmadmin@hpevm:~$ virsh net-list --all
Name State Autostart Persistent
--------------------------------------------
default active yes yes
vmadmin@hpevm:~$
Cloud name: two types of cloud can be registered, an "HPE VM Essentials environment" and a "vSphere environment".
Cluster name: a cluster in roughly the same sense as a vSphere cluster. The actual physical Ubuntu servers running HPE VM are registered under it.
Settings required at first login
The following items have to be configured at the first login:
Master tenant name: set whatever name you like.
Master user creation: create the primary administrative user. An email address is also required.
Initial setup: specify the values entered during Install Morpheus.
Finally, license registration. For the evaluation version, enter nothing.
That completes the initial setup.
Checking virsh net-list --all after the HPE VM Manager setup shows that the configuration has changed.
vmadmin@hpevm:~$ virsh net-list --all
Name State Autostart Persistent
-----------------------------------------------
default active yes yes
Management active yes yes
vmadmin@hpevm:~$
Enter the network interface names to use for Management Net Interface, Compute Net Interface, and Overlay Net Interface. It also worked with the same interface used for all of them.
(When I left Overlay Net Interface blank, I got an error that the eno0 device, which I had never specified, did not exist, so it apparently has to be specified.)
When using tagged VLANs, enter the interface name to use for Compute Net Interface and list the VLAN tag IDs in COMPUTE VLANS.
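Before filling in these fields, it helps to list the interfaces the Ubuntu host actually has; a minimal check with standard iproute2 commands (nothing HPE-specific):

ip -br link show             # brief list of all interfaces and their link state
ip -br link show type vlan   # any tagged VLAN sub-interfaces already defined on the host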
The layout choices are "HPE VM 1.1 Cluster on Existing Ubuntu 22.04" and "HPE VM 1.1 HCI Ceph Cluster on Existing Ubuntu 22.04", but I have not tried the HCI configuration yet because I do not know what it requires.
vmadmin@hpevm:~$ virsh net-list --all
Name State Autostart Persistent
-----------------------------------------------
Compute active yes yes
Management active yes yes
vmadmin@hpevm:~$
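The networks that HPE VM Manager created can also be inspected from the libvirt side; a quick check with standard virsh commands:

virsh net-dumpxml Management   # XML definition (bridge, forward mode) of the Management network
virsh net-dumpxml Compute      # same for the Compute network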
To deal with this properly, run "ceph health detail" to find out what exactly is wrong and how slow the Slow OSD heartbeats actually are.
root@pve38:~# ceph health detail
HEALTH_WARN nodown flag(s) set; Slow OSD heartbeats on back (longest 5166.450ms); Slow OSD heartbeats on front (longest 5467.151ms)
[WRN] OSDMAP_FLAGS: nodown flag(s) set
[WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 5166.450ms)
Slow OSD heartbeats on back from osd.13 [] to osd.8 [] 5166.450 msec
Slow OSD heartbeats on back from osd.13 [] to osd.0 [] 3898.044 msec
Slow OSD heartbeats on back from osd.12 [] to osd.9 [] 3268.881 msec
Slow OSD heartbeats on back from osd.10 [] to osd.3 [] 2610.064 msec possibly improving
Slow OSD heartbeats on back from osd.12 [] to osd.8 [] 2588.321 msec
Slow OSD heartbeats on back from osd.6 [] to osd.14 [] 2565.141 msec
Slow OSD heartbeats on back from osd.8 [] to osd.7 [] 2385.851 msec possibly improving
Slow OSD heartbeats on back from osd.13 [] to osd.11 [] 2324.505 msec
Slow OSD heartbeats on back from osd.8 [] to osd.12 [] 2305.474 msec possibly improving
Slow OSD heartbeats on back from osd.14 [] to osd.11 [] 2275.033 msec
Truncated long network list. Use ceph daemon mgr.# dump_osd_network for more information
[WRN] OSD_SLOW_PING_TIME_FRONT: Slow OSD heartbeats on front (longest 5467.151ms)
Slow OSD heartbeats on front from osd.13 [] to osd.8 [] 5467.151 msec
Slow OSD heartbeats on front from osd.13 [] to osd.0 [] 3956.364 msec
Slow OSD heartbeats on front from osd.12 [] to osd.9 [] 3513.493 msec
Slow OSD heartbeats on front from osd.12 [] to osd.8 [] 2657.999 msec
Slow OSD heartbeats on front from osd.6 [] to osd.14 [] 2657.486 msec
Slow OSD heartbeats on front from osd.10 [] to osd.3 [] 2610.558 msec possibly improving
Slow OSD heartbeats on front from osd.8 [] to osd.7 [] 2436.661 msec possibly improving
Slow OSD heartbeats on front from osd.14 [] to osd.11 [] 2351.914 msec
Slow OSD heartbeats on front from osd.14 [] to osd.10 [] 2351.812 msec
Slow OSD heartbeats on front from osd.13 [] to osd.11 [] 2335.698 msec
Truncated long network list. Use ceph daemon mgr.# dump_osd_network for more information
root@pve38:~#
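The list is truncated, so the full set of slow heartbeat pairs can be dumped through the mgr admin socket, as the message itself suggests. A sketch, assuming the active mgr happens to be mgr.pve38 (check ceph status for the real id):

ceph daemon mgr.pve38 dump_osd_network     # dump slow OSD ping entries above the reporting threshold (1000 ms by default)
ceph daemon mgr.pve38 dump_osd_network 0   # threshold 0: dump every measured heartbeat time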
Presumably the part that corresponds to "Slow OSD heartbeats on front from osd.8 [] to osd.7 [] 2436.661 msec" is something like this:
2024-11-14T14:46:05.457+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:02.037605+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:06.454+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:02.037605+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:07.467+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:07.338127+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:08.418+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:07.338127+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:09.371+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:09.038264+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:10.416+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:09.038264+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
2024-11-14T14:46:11.408+0900 7e72364006c0 -1 osd.7 4996 heartbeat_check: no reply from 172.17.44.38:6802 osd.8 since back 2024-11-14T14:46:11.338592+0900 front 2024-11-14T14:45:41.850539+0900 (oldest deadline 2024-11-14T14:46:05.334473+0900)
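These heartbeat_check messages can be pulled back out of the log of the affected OSD; the path below is the Ceph default log location, which Proxmox VE also appears to use:

grep heartbeat_check /var/log/ceph/ceph-osd.7.log | tail -n 20   # most recent "no reply" heartbeat entries for osd.7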
for hostname in `pvesh ls /nodes/|awk '{ print $2 }'`; do for vmid in `pvesh ls /nodes/$hostname/qemu/|awk '{ print $2 }'`; do pvesh create /nodes/$hostname/qemu/$vmid/status/shutdown; done; done
Example run:
root@pve36:~# for hostname in `pvesh ls /nodes/|awk '{ print $2 }'`
> do for vmid in `pvesh ls /nodes/$hostname/qemu/|awk '{ print $2 }'`
> do
> pvesh create /nodes/$hostname/qemu/$vmid/status/shutdown
> done
> done
VM 102 not running
Requesting HA stop for VM 102
UPID:pve36:00014A47:0018FC0A:6731A591:hastop:102:root@pam:
VM 100 not running
Requesting HA stop for VM 100"UPID:pve37:00013A71:0018FDDC:6731A597:hastop:100:root@pam:"
VM 101 not running
Requesting HA stop for VM 101"UPID:pve38:000144BF:00192F03:6731A59D:hastop:101:root@pam:"
Requesting HA stop for VM 103"UPID:pve38:000144DC:0019305D:6731A5A0:hastop:103:root@pam:"
root@pve36:~#
This gets the VMs to shut down.
Next, confirm that the virtual machines have actually stopped:
root@pve36:~# pvesh get /nodes/pve36/qemu
┌─────────┬──────┬──────┬──────┬───────────┬──────────┬─────────┬─────┬───────────┬─────────────────┬──────────────┬──────┬────────┐
│ status  │ vmid │ cpus │ lock │ maxdisk   │ maxmem   │ name    │ pid │ qmpstatus │ running-machine │ running-qemu │ tags │ uptime │
├─────────┼──────┼──────┼──────┼───────────┼──────────┼─────────┼─────┼───────────┼─────────────────┼──────────────┼──────┼────────┤
│ stopped │  102 │    2 │      │ 32.00 GiB │ 2.00 GiB │ testvm2 │     │           │                 │              │      │     0s │
└─────────┴──────┴──────┴──────┴───────────┴──────────┴─────────┴─────┴───────────┴─────────────────┴──────────────┴──────┴────────┘
root@pve36:~#
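The same check can be run against every node rather than just pve36; a sketch using the same nested pvesh loop (the json-pretty output format is only for readability):

for hostname in `pvesh ls /nodes/|awk '{ print $2 }'`; do
  echo "== $hostname =="
  pvesh get /nodes/$hostname/qemu --output-format json-pretty   # status of every VM defined on that node
done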
for hostname in `pvesh ls /nodes/|awk '{ print $2 }'`; do for vmid in `pvesh ls /nodes/$hostname/lxc/|awk '{ print $2 }'`; do pvesh create /nodes/$hostname/lxc/$vmid/status/shutdown;done;done
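Container status can be confirmed the same way as for VMs (pve36 is just an example node):

pvesh get /nodes/pve36/lxc   # list the containers on pve36 and their current status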
root@pve36:~# ceph fs get cephfs
Filesystem 'cephfs' (1)
fs_name cephfs
epoch 65
flags 12 joinable allow_snaps allow_multimds_snaps
created 2024-11-05T14:29:45.941671+0900
modified 2024-11-11T11:04:06.223151+0900
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
max_xattr_size 65536
required_client_features {}
last_failure 0
last_failure_osd_epoch 3508
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in 0
up {0=784113}
failed
damaged
stopped
data_pools [3]
metadata_pool 4
inline_data disabled
balancer
bal_rank_mask -1
standby_count_wanted 1
[mds.pve36{0:784113} state up:active seq 29 addr [v2:172.17.44.36:6800/1472122357,v1:172.17.44.36:6801/1472122357] compat {c=[1],r=[1],i=[7ff]}]
root@pve36:~#
Is there no cluster_down?
root@pve36:~# ceph mds stat
cephfs:1 {0=pve36=up:active} 1 up:standby
root@pve36:~# ceph mds compat show
compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
root@pve36:~#
Hmm? Let's try running cluster_down on ceph fs.
root@pve36:~# ceph fs set cephfs cluster_down true
cephfs marked not joinable; MDS cannot join as newly active. WARNING: cluster_down flag is deprecated and will be removed in a future version. Please use "joinable".
root@pve36:~#
ceph fs set FS_NAME max_mds 1
ceph fs fail FS_NAME
ceph status
ceph fs set FS_NAME joinable false
The IBM procedure does this without touching max_mds.
root@pve36:~# ceph fs status
cephfs - 4 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active pve36 Reqs: 0 /s 21 20 16 23
POOL TYPE USED AVAIL
cephfs_metadata metadata 244M 122G
cephfs_data data 31.8G 122G
STANDBY MDS
pve37
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
root@pve36:~# ceph fs fail cephfs
cephfs marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
root@pve36:~# ceph fs status
cephfs - 0 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 failed
POOL TYPE USED AVAIL
cephfs_metadata metadata 244M 122G
cephfs_data data 31.8G 122G
STANDBY MDS
pve37
pve36
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
root@pve36:~# ceph fs fail cephfs
cephfs marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
root@pve36:~# echo $?
0
root@pve36:~# ceph fs set cephfs joinable false
cephfs marked not joinable; MDS cannot join as newly active.
root@pve36:~# echo $?
0
root@pve36:~# ceph fs status
cephfs - 0 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 failed
POOL TYPE USED AVAIL
cephfs_metadata metadata 244M 122G
cephfs_data data 31.8G 122G
STANDBY MDS
pve37
pve36
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
root@pve36:~#
…Note that once it has been stopped, the df command can no longer be run.
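If df output is still needed on the hosts afterwards, excluding the CephFS filesystem type avoids the hang, assuming a kernel CephFS mount (fstype "ceph"):

df -h -x ceph   # show all filesystems except CephFS mounts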
root@pve36:~# ceph osd stat
16 osds: 16 up (since 6h), 16 in (since 3d); epoch: e3599
root@pve36:~# ceph osd set noout
noout is set
root@pve36:~# ceph osd set norecover
norecover is set
root@pve36:~# ceph osd set norebalance
norebalance is set
root@pve36:~# ceph osd set nobackfill
nobackfill is set
root@pve36:~# ceph osd set nodown
nodown is set
root@pve36:~# ceph osd set pause
pauserd,pausewr is set
root@pve36:~# ceph osd stat
16 osds: 16 up (since 6h), 16 in (since 3d); epoch: e3605
flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
root@pve36:~#
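For reference, when bringing the cluster back up these flags are cleared again with unset; a sketch, commonly done in the reverse order of setting them:

ceph osd unset pause        # clears pauserd,pausewr
ceph osd unset nodown
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset norecover
ceph osd unset noout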
root@pve36:~# ha-manager status
quorum OK
master pve39 (active, Mon Nov 11 18:24:13 2024)
lrm pve36 (idle, Mon Nov 11 18:24:15 2024)
lrm pve37 (idle, Mon Nov 11 18:24:18 2024)
lrm pve38 (idle, Mon Nov 11 18:24:18 2024)
lrm pve39 (idle, Mon Nov 11 18:24:15 2024)
service vm:100 (pve37, stopped)
service vm:101 (pve38, stopped)
service vm:102 (pve36, stopped)
service vm:103 (pve38, stopped)
root@pve36:~# ha-manager config
vm:100
state stopped
vm:101
state stopped
vm:102
state stopped
vm:103
state stopped
root@pve36:~#
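Since the VMs are managed as HA resources, starting them back up later is just a matter of flipping the requested state again; a sketch for a single resource (vm:100 as an example):

ha-manager set vm:100 --state started   # ask the HA stack to start vm:100 again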