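Notes on cluster control in HPE VME

These are notes on what actually controls the cluster state once an HPE VME cluster has been built.

They only cover what I ran into while troubleshooting, so other mechanisms may exist as well.

GFS2 file system

When several servers share a GFS2 file system, Pacemaker and Corosync are used to control it.

In a three-server configuration with GFS2 already set up, starting the servers from a powered-off state leaves the GFS2 file system unmounted for a while.

This happens because Pacemaker and Corosync treat the cluster as healthy only after votes from a majority of the member servers have been collected. Until that quorum is reached, the GFS2 file system is not mounted.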
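As a rough illustration of the majority rule, the sketch below computes the vote threshold for a given number of expected votes. The function names are my own, one vote per server is assumed, and votequorum options such as two_node or wait_for_all, which change the behaviour, are deliberately ignored.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Return the smallest vote count that is a strict majority."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes currently present reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


# Three servers with one vote each, as in this cluster.
for present in (1, 2, 3):
    state = "quorate" if is_quorate(present, 3) else "not quorate"
    print(present, "of 3 votes ->", state)
```

With three servers the threshold is 2, which is why nothing is mounted until a second server has joined.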

To check how the GFS2 mount is being controlled, start by running "sudo pcs status". The output below is from a three-server cluster with every server running.

pcuser@hpevme1:~$ sudo pcs status
Cluster name: 2rouf0rh0q94m4i
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: hpevme2 (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Wed Mar  4 10:57:03 2026 on hpevme1
  * Last change:  Fri Feb 27 17:56:49 2026 by root via cibadmin on hpevme3
  * 3 nodes configured
  * 7 resource instances configured

Node List:
  * Online: [ hpevme1 hpevme2 hpevme3 ]

Full List of Resources:
  * Clone Set: dlm-clone [dlm]:
    * Started: [ hpevme1 hpevme2 hpevme3 ]
  * hpevm_gfs2_scsi     (stonith:fence_scsi_hpevm):      Started hpevme1
  * Clone Set: gfs2datastore_1ffa9-clone [gfs2datastore_1ffa9]:
    * Started: [ hpevme1 hpevme2 hpevme3 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
pcuser@hpevme1:~$

As the "Daemon Status" section shows, corosync and pacemaker are both involved.
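For scripted checks, the node list can be pulled out of the pcs status text. This is only a sketch: parse_node_list is a hypothetical helper, the sample string is copied from output in these notes, and other states that pcs can print (standby, for example) are not handled.

```python
import re

# Matches lines such as "  * Online: [ hpevme2 hpevme3 ]".
NODE_LINE = re.compile(
    r"^\s*\*\s+(Online|OFFLINE):\s+\[\s*(.*?)\s*\]", re.MULTILINE
)


def parse_node_list(pcs_status_text: str) -> dict:
    """Group node names by the state shown in the 'Node List' section."""
    nodes = {}
    for match in NODE_LINE.finditer(pcs_status_text):
        state = match.group(1).lower()
        nodes.setdefault(state, []).extend(match.group(2).split())
    return nodes


SAMPLE = """\
Node List:
  * Online: [ hpevme2 hpevme3 ]
  * OFFLINE: [ hpevme1 ]
"""

print(parse_node_list(SAMPLE))
```

In practice the text would come from running "sudo pcs status" on a cluster member.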

Running "sudo pcs status --full" shows more detail.

pcuser@hpevme1:~$ sudo pcs status --full
Cluster name: 2rouf0rh0q94m4i
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: hpevme2 (2) (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Wed Mar  4 10:57:07 2026 on hpevme1
  * Last change:  Fri Feb 27 17:56:49 2026 by root via cibadmin on hpevme3
  * 3 nodes configured
  * 7 resource instances configured

Node List:
  * Node hpevme1 (1): online, feature set 3.17.4
  * Node hpevme2 (2): online, feature set 3.17.4
  * Node hpevme3 (3): online, feature set 3.17.4

Full List of Resources:
  * Clone Set: dlm-clone [dlm]:
    * dlm       (ocf:pacemaker:controld):        Started hpevme2
    * dlm       (ocf:pacemaker:controld):        Started hpevme1
    * dlm       (ocf:pacemaker:controld):        Started hpevme3
  * hpevm_gfs2_scsi     (stonith:fence_scsi_hpevm):      Started hpevme1
  * Clone Set: gfs2datastore_1ffa9-clone [gfs2datastore_1ffa9]:
    * gfs2datastore_1ffa9       (ocf:heartbeat:Filesystem):      Started hpevme2
    * gfs2datastore_1ffa9       (ocf:heartbeat:Filesystem):      Started hpevme1
    * gfs2datastore_1ffa9       (ocf:heartbeat:Filesystem):      Started hpevme3

Migration Summary:

Fencing History:
  * unfencing of hpevme2 successful: delegate=hpevme2, client=pacemaker-fenced.2051, origin=hpevme3, completed='2026-03-04 10:50:33.927998 +09:00'
  * unfencing of hpevme1 successful: delegate=hpevme1, client=pacemaker-controld.1993, origin=hpevme2, completed='2026-03-04 10:50:33.860998 +09:00'
  * unfencing of hpevme3 successful: delegate=hpevme3, client=pacemaker-controld.1993, origin=hpevme2, completed='2026-03-04 10:50:33.718998 +09:00'

Tickets:

PCSD Status:
  hpevme1: Online
  hpevme2: Online
  hpevme3: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
pcuser@hpevme1:~$

Corosync sits underneath what the pcs command manages, so the status can also be checked on the corosync side.

The majority check itself is done by corosync's quorum service, so run "sudo corosync-quorumtool -s" to see its state.

pcuser@hpevme1:~$ sudo corosync-quorumtool -s
Quorum information
------------------
Date:             Wed Mar  4 11:04:48 2026
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1
Ring ID:          1.3d
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
         1          1 hpevme1 (local)
         2          1 hpevme2
         3          1 hpevme3
pcuser@hpevme1:~$
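The key: value lines of this output are easy to read from a script. The sketch below is a hypothetical helper that keeps only the fields relevant to the majority check; the field names are taken from the output above, and nothing else about the format is assumed.

```python
WANTED = {"Nodes", "Quorate", "Expected votes", "Total votes", "Quorum"}


def parse_quorumtool(text: str) -> dict:
    """Extract selected 'key: value' fields from corosync-quorumtool -s output."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in WANTED:
            info[key.strip()] = value.strip()
    return info


SAMPLE = """\
Quorum information
------------------
Date:             Wed Mar  4 11:04:48 2026
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1
Ring ID:          1.3d
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate
"""

info = parse_quorumtool(SAMPLE)
print(info["Quorate"], info["Total votes"], "/", info["Expected votes"])
```

Comparing "Total votes" with "Quorum" tells how many more servers can be lost before the cluster stops being quorate.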

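To see which IP addresses corosync uses for communication between the servers, run "sudo corosync-cfgtool -n". "sudo corosync-cfgtool -s" shows the link status as seen from the local node.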

pcuser@hpevme1:~$ sudo corosync-cfgtool -n
Local node ID 1, transport knet
nodeid: 2 reachable
   LINK: 0 udp (192.168.1.51->192.168.1.52) enabled connected mtu: 1397

nodeid: 3 reachable
   LINK: 0 udp (192.168.1.51->192.168.1.53) enabled connected mtu: 1397

pcuser@hpevme1:~$ sudo corosync-cfgtool -s
Local node ID 1, transport knet
LINK ID 0 udp
        addr    = 192.168.1.51
        status:
                nodeid:          1:     localhost
                nodeid:          2:     connected
                nodeid:          3:     connected
pcuser@hpevme1:~$

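Note that the example above uses the 192.168.1.0/24 network. This address range is the management network assigned to each physical server and to the HPE VME Manager server (the one used for web and ssh access).

As far as I remember, even v8.0.13 has no GUI setting for moving the corosync network to a different subnet.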
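To confirm from a script which network corosync is really using, the LINK lines of "corosync-cfgtool -n" can be reduced to address pairs. The helper names below are hypothetical, and the pattern assumes IPv4 addresses in the layout seen in the output above.

```python
import ipaddress
import re

# Matches e.g. "LINK: 0 udp (192.168.1.51->192.168.1.52) enabled connected mtu: 1397".
LINK_RE = re.compile(
    r"LINK:\s+(\d+)\s+(\w+)\s+\(([\d.]+)->([\d.]+)\)\s+(\w+)\s+(\w+)"
)


def parse_links(cfgtool_text: str) -> list:
    """Return one dict per LINK line found in the text."""
    links = []
    for m in LINK_RE.finditer(cfgtool_text):
        links.append({
            "link": int(m.group(1)),
            "protocol": m.group(2),
            "local": m.group(3),
            "remote": m.group(4),
            "connected": m.group(6) == "connected",
        })
    return links


def all_in_network(links: list, cidr: str) -> bool:
    """True when every local and remote address belongs to the given network."""
    net = ipaddress.ip_network(cidr)
    return all(
        ipaddress.ip_address(link[side]) in net
        for link in links
        for side in ("local", "remote")
    )


SAMPLE = """\
Local node ID 1, transport knet
nodeid: 2 reachable
   LINK: 0 udp (192.168.1.51->192.168.1.52) enabled connected mtu: 1397

nodeid: 3 reachable
   LINK: 0 udp (192.168.1.51->192.168.1.53) enabled connected mtu: 1397
"""

links = parse_links(SAMPLE)
print(all_in_network(links, "192.168.1.0/24"))
```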


Command output with one server stopped

"sudo pcs status"

pcuser@hpevme2:~$ sudo pcs status
[sudo] password for pcuser:
Cluster name: 2rouf0rh0q94m4i
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: hpevme2 (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Wed Mar  4 13:11:33 2026 on hpevme2
  * Last change:  Fri Feb 27 17:56:49 2026 by root via cibadmin on hpevme3
  * 3 nodes configured
  * 7 resource instances configured

Node List:
  * Online: [ hpevme2 hpevme3 ]
  * OFFLINE: [ hpevme1 ]

Full List of Resources:
  * Clone Set: dlm-clone [dlm]:
    * Started: [ hpevme2 hpevme3 ]
    * Stopped: [ hpevme1 ]
  * hpevm_gfs2_scsi     (stonith:fence_scsi_hpevm):      Started hpevme2
  * Clone Set: gfs2datastore_1ffa9-clone [gfs2datastore_1ffa9]:
    * Started: [ hpevme2 hpevme3 ]
    * Stopped: [ hpevme1 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
pcuser@hpevme2:~$

"sudo pcs status --full"

pcuser@hpevme2:~$ sudo pcs status --full
Cluster name: 2rouf0rh0q94m4i
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: hpevme2 (2) (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Wed Mar  4 13:12:26 2026 on hpevme2
  * Last change:  Fri Feb 27 17:56:49 2026 by root via cibadmin on hpevme3
  * 3 nodes configured
  * 7 resource instances configured

Node List:
  * Node hpevme1 (1): OFFLINE
  * Node hpevme2 (2): online, feature set 3.17.4
  * Node hpevme3 (3): online, feature set 3.17.4

Full List of Resources:
  * Clone Set: dlm-clone [dlm]:
    * dlm       (ocf:pacemaker:controld):        Started hpevme2
    * dlm       (ocf:pacemaker:controld):        Started hpevme3
    * dlm       (ocf:pacemaker:controld):        Stopped
  * hpevm_gfs2_scsi     (stonith:fence_scsi_hpevm):      Started hpevme2
  * Clone Set: gfs2datastore_1ffa9-clone [gfs2datastore_1ffa9]:
    * gfs2datastore_1ffa9       (ocf:heartbeat:Filesystem):      Started hpevme2
    * gfs2datastore_1ffa9       (ocf:heartbeat:Filesystem):      Started hpevme3
    * gfs2datastore_1ffa9       (ocf:heartbeat:Filesystem):      Stopped

Migration Summary:

Fencing History:
  * reboot of hpevme1 successful: delegate=hpevme2, client=stonith-api.30408, origin=hpevme2, completed='2026-03-04 13:05:43.082713 +09:00'
  * unfencing of hpevme2 successful: delegate=hpevme2, client=pacemaker-fenced.2051, origin=hpevme3, completed='2026-03-04 10:50:33.986998 +09:00'
  * unfencing of hpevme1 successful: delegate=hpevme1, client=pacemaker-controld.1993, origin=hpevme2, completed='2026-03-04 10:50:33.916998 +09:00'
  * unfencing of hpevme3 successful: delegate=hpevme3, client=pacemaker-controld.1993, origin=hpevme2, completed='2026-03-04 10:50:33.773998 +09:00'

Tickets:

PCSD Status:
  hpevme1: Offline
  hpevme2: Online
  hpevme3: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
pcuser@hpevme2:~$

"sudo corosync-quorumtool -s"

pcuser@hpevme2:~$ sudo corosync-quorumtool -s
Quorum information
------------------
Date:             Wed Mar  4 13:13:02 2026
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          2
Ring ID:          2.41
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
         2          1 hpevme2 (local)
         3          1 hpevme3
pcuser@hpevme2:~$

"sudo corosync-cfgtool -n" and "sudo corosync-cfgtool -s"

pcuser@hpevme2:~$ sudo corosync-cfgtool -n
Local node ID 2, transport knet
nodeid: 3 reachable
   LINK: 0 udp (192.168.1.52->192.168.1.53) enabled connected mtu: 1397

pcuser@hpevme2:~$ sudo corosync-cfgtool -s
Local node ID 2, transport knet
LINK ID 0 udp
        addr    = 192.168.1.52
        status:
                nodeid:          1:     disconnected
                nodeid:          2:     localhost
                nodeid:          3:     connected
pcuser@hpevme2:~$

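Registering iSCSI storage and creating a GFS2 datastore in HPE VME

I built an HPE VME ver 8.0.13 environment and found the GFS2 datastore creation process improved over earlier versions, hence these notes.

(1) Register the iSCSI target IP on the HPE VME side

Select the cluster under [Infrastructure]-[Clusters], then register the target IP address of the iSCSI storage under [Storage]-[iSCSI].

A short while after this is done, each server starts accessing the iSCSI storage using the name set in InitiatorName in /etc/iscsi/initiatorname.iscsi.

(2) Register the InitiatorName on the iSCSI storage side

On the iSCSI storage, register the InitiatorName found in each server's /etc/iscsi/initiatorname.iscsi and allow access.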
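Collecting the names from several servers is less error-prone with a small script. The sketch below reads the InitiatorName value from the text of /etc/iscsi/initiatorname.iscsi; read_initiator_name is a hypothetical helper, the comment line in the sample is made up, and the IQN is the one that appears in the iscsiadm output in these notes.

```python
from typing import Optional


def read_initiator_name(conf_text: str) -> Optional[str]:
    """Return the InitiatorName value, skipping blank and comment lines."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "InitiatorName":
            return value.strip()
    return None


SAMPLE = """\
# illustrative comment line
InitiatorName=iqn.2024-12.com.hpe:hpevme1:42939
"""

print(read_initiator_name(SAMPLE))
```

On a real server the text would come from open("/etc/iscsi/initiatorname.iscsi").read(), which normally needs root privileges.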

NetApp example

If the connection status does not change after waiting a while, log in to the VME server in question and run "sudo iscsiadm -m session --rescan" to rescan.

pcuser@hpevme1:~$ sudo iscsiadm -m session --rescan
Rescanning session [sid: 1, target: iqn.1992-08.com.netapp:sn.588844e7ec3411f0a4bd000c292a75e7:vs.8, portal: 192.168.3.35,3260]
Rescanning session [sid: 2, target: iqn.1992-08.com.netapp:sn.588844e7ec3411f0a4bd000c292a75e7:vs.8, portal: 192.168.2.35,3260]
pcuser@hpevme1:~$

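(3) Check that the disk is recognized on the HPE VME side

Select the cluster under [Infrastructure]-[Clusters], click "Add" under [Storage]-[Datastores], and in the "Add Datastore" dialog that opens,

select "Type: GFS2 Pool (Global File System 2)" and check what is offered under "BLOCK DEVICE".

If a disk named "/dev/mapper/..." is listed there, it is an iSCSI disk recognized through multipath.

If no mapper device is listed, check whether iSCSI multipath is working.

First check that the disk devices are recognized: run "sudo iscsiadm -m session -P 3" and confirm that a line such as "scsi33 Channel 00 Id 0 Lun: 1" appears after "Attached SCSI devices:".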

pcuser@hpevme1:~$ sudo iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
version 2.1.9
Target: iqn.1992-08.com.netapp:sn.588844e7ec3411f0a4bd000c292a75e7:vs.8 (non-flash)
        Current Portal: 192.168.3.35:3260,1028
        Persistent Portal: 192.168.3.35:3260,1028
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-12.com.hpe:hpevme1:42939
                Iface IPaddress: 192.168.3.51
                Iface HWaddress: default
                Iface Netdev: default
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 1048576
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 33 State: running
                scsi33 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sdc          State: running
        Current Portal: 192.168.2.35:3260,1027
        Persistent Portal: 192.168.2.35:3260,1027
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-12.com.hpe:hpevme1:42939
                Iface IPaddress: 192.168.2.51
                Iface HWaddress: default
                Iface Netdev: default
                SID: 2
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 1048576
                ImmediateData: Yes
                InitialR2T: Yes
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 34 State: running
                scsi34 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sdb          State: running
pcuser@hpevme1:~$

If no disk shows up here, review the LUN mapping and the initiator mapping settings on the iSCSI storage.
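Because the "-P 3" output is long, a script can help pick out just the attached disks per portal. The following is a sketch with a hypothetical helper name; the sample is an abbreviated copy of the output above, and only the two line formats shown there are assumed.

```python
import re


def attached_disks(session_text: str) -> list:
    """Return (portal, disk, state) tuples from 'iscsiadm -m session -P 3' text."""
    disks = []
    portal = None
    for line in session_text.splitlines():
        m = re.search(r"Current Portal:\s*(\S+)", line)
        if m:
            portal = m.group(1)  # remember which portal the next disks belong to
            continue
        m = re.search(r"Attached scsi disk\s+(\S+)\s+State:\s*(\S+)", line)
        if m:
            disks.append((portal, m.group(1), m.group(2)))
    return disks


SAMPLE = """\
        Current Portal: 192.168.3.35:3260,1028
                scsi33 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sdc          State: running
        Current Portal: 192.168.2.35:3260,1027
                scsi34 Channel 00 Id 0 Lun: 1
                        Attached scsi disk sdb          State: running
"""

for portal, disk, state in attached_disks(SAMPLE):
    print(portal, disk, state)
```

An empty result for a logged-in session points at the storage-side mapping rather than at the initiator.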

To check whether multipath is working, run "sudo multipath -ll".

pcuser@hpevme1:~$ sudo multipath -ll
3600a09807770457a795d5a554c634a58 dm-1 NETAPP,LUN C-Mode
size=100G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 34:0:0:1 sdb 8:16 active ready running
  `- 33:0:0:1 sdc 8:32 active ready running
pcuser@hpevme1:~$

If a tree like the one above is shown, multipath is working.

Earlier versions required configuring multipath by hand, but with ver 8.0.13 that was not needed.
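The path lines of "multipath -ll" can also be checked from a script. This sketch uses a hypothetical helper and assumes the path line layout seen above (H:C:T:L, device name, major:minor, then three state words).

```python
import re

# Matches e.g. "|- 34:0:0:1 sdb 8:16 active ready running".
PATH_RE = re.compile(
    r"(\d+:\d+:\d+:\d+)\s+(sd[a-z]+)\s+(\d+:\d+)\s+(\w+)\s+(\w+)\s+(\w+)"
)


def multipath_paths(text: str) -> list:
    """Return one dict per path line found in 'multipath -ll' output."""
    return [
        {
            "hctl": m.group(1),
            "device": m.group(2),
            "states": (m.group(4), m.group(5), m.group(6)),
        }
        for m in PATH_RE.finditer(text)
    ]


def looks_healthy(paths: list, minimum: int = 2) -> bool:
    """True with at least `minimum` paths, all 'active ready running'."""
    good = ("active", "ready", "running")
    return len(paths) >= minimum and all(p["states"] == good for p in paths)


SAMPLE = """\
3600a09807770457a795d5a554c634a58 dm-1 NETAPP,LUN C-Mode
size=100G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 34:0:0:1 sdb 8:16 active ready running
  `- 33:0:0:1 sdc 8:32 active ready running
"""

paths = multipath_paths(SAMPLE)
print([p["device"] for p in paths], looks_healthy(paths))
```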

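(4) Create the GFS2 file system on the HPE VME side

Select the cluster under [Infrastructure]-[Clusters], click "Add" under [Storage]-[Datastores], and in the "Add Datastore" dialog that opens,

select "Type: GFS2 Pool (Global File System 2)" and, under "BLOCK DEVICE", choose the multipath device whose name starts with /dev/mapper/.

File system creation then runs for a while.

As of ver 8.0.13 there is no notification when the file system is finished, so reload the page to refresh the display.

The GFS2 datastore is now created.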

Get-ADDomain and similar commands fail against an Active Directory built with samba

In an Active Directory server environment built with samba, I ran the PowerShell Get-ADDomain command from a Windows Server and got an error.

PS C:\Users\administrator.ADSAMPLE> Get-ADDomain
Get-ADDomain : Active Directory Web サービスが実行されている状態で既定のサーバーを検索することはできません。
発生場所 行:1 文字:1
+ Get-ADDomain
+ ~~~~~~~~~~~~
    + CategoryInfo          : ResourceUnavailable: (ADSAMPLE:ADDomain) [Get-ADDomain], ADServerDownException
    + FullyQualifiedErrorId : ActiveDirectoryServer:1355,Microsoft.ActiveDirectory.Management.Commands.GetADDomain

PS C:\Users\administrator.ADSAMPLE>

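The message says that no default server running Active Directory Web Services could be found. Thinking that specifying the server might help, I tried "Get-ADDomain -Server <server name>", but that failed as well.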

PS C:\Users\administrator.ADSAMPLE> Get-ADDomain -Server adsample.local
Get-ADDomain : サーバーと通信できません。サーバーが存在しないか、現在ダウンしているか、サーバー上で Active Directory Web サービスが実行されていない可能性があります。
発生場所 行:1 文字:1
+ Get-ADDomain -Server adsample.local
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ResourceUnavailable: (:) [Get-ADDomain], ADServerDownException
    + FullyQualifiedErrorId : ActiveDirectoryServer:0,Microsoft.ActiveDirectory.Management.Commands.GetADDomain

PS C:\Users\administrator.ADSAMPLE>

This time the error names a possible cause: Active Directory Web Services is not running on the server.

Looking into Active Directory Web Services (ADWS), it appears that samba does not provide it.
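One quick way to see whether a domain controller offers ADWS at all is to test the TCP port the service listens on, which is 9389 on Windows domain controllers. The sketch below is an assumption-laden illustration: tcp_port_open is a hypothetical helper, and an open port only shows that something is listening, not that it is a working ADWS endpoint.

```python
import socket

ADWS_PORT = 9389  # TCP port used by Active Directory Web Services


def tcp_port_open(host: str, port: int = ADWS_PORT, timeout: float = 3.0) -> bool:
    """Return True when a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call (not executed here): tcp_port_open("adsample.local")
```

Against a samba domain controller without ADWS this would be expected to return False, matching the error from Get-ADDomain.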

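The samba wiki page "ADWS / AD Powershell compatibility", created on April 13, 2021, states: "Samba does not support many of the AD PowerShell commands that manipulate AD objects,".

According to that page, a project called "samba-adws" was started around 2016 to make the PowerShell commands usable, and it is described as under development.

The master branch was last updated in December 2018, but the "garming-main" branch shows work continuing until around September 2024. Development seems to have taken place mainly at https://github.com/GSam/samba-adws, and the result appears to be usable with the patched samba published at https://github.com/GSam/samba.

Either way, there have been no updates for more than a year.

The conclusion seems to be that the PowerShell Active Directory cmdlets cannot be used in a samba environment.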

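Passwords of users created in vCenter expire after 90 days by default

In a vSphere 8.0 environment, a new user created on vCenter has a password lifetime of 90 days under the default settings.

To disable expiration only for special users, such as a new user created solely for backups, you need to log in to the VCSA virtual machine over ssh and configure it there.

Documentation: vSphere IaaS Control Plane 7.0, "dir-cli Command Reference"

(1) Log in to the VCSA virtual machine over ssh

Connect with ssh and log in as the root user.

(2) Switch to shell mode

After login a "Command>" prompt is shown.
Enter "shell" there to make UNIX commands available.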

Connected to service

    * List APIs: "help api list"
    * List Plugins: "help pi list"
    * Launch BASH: "shell"

Command> shell
Shell access is granted to root
root@vcsa [ ~ ]#

(3) Check the current account state with dir-cli

/usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account <account name> --level 2
If "Password never expires:" is "FALSE", password expiration is enabled, and the password expires when the time shown in "Password expiry" runs out.

root@vcsa [ ~ ]# /usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account backupuser --level 2
Enter password for administrator@vsphere.local:
Account: backupuser
UPN: backupuser@VSPHERE.LOCAL
Account disabled: FALSE
Account locked: FALSE
Password never expires: FALSE
Password expired: FALSE
Password expiry: 874 day(s) 18 hour(s) 17 minute(s) 48 second(s)
root@vcsa [ ~ ]#
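When several accounts need checking, the dir-cli output can be read as key: value pairs. The sketch below uses hypothetical helper names and relies only on the field labels visible in the output above.

```python
def parse_dircli_user(text: str) -> dict:
    """Turn 'Key: value' lines of dir-cli user output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def password_never_expires(fields: dict) -> bool:
    """True when the account is flagged as having a non-expiring password."""
    return fields.get("Password never expires", "").upper() == "TRUE"


SAMPLE = """\
Account: backupuser
UPN: backupuser@VSPHERE.LOCAL
Account disabled: FALSE
Account locked: FALSE
Password never expires: FALSE
Password expired: FALSE
Password expiry: 874 day(s) 18 hour(s) 17 minute(s) 48 second(s)
"""

fields = parse_dircli_user(SAMPLE)
print(fields["Account"], password_never_expires(fields))
```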

(4) Disable password expiration with dir-cli

/usr/lib/vmware-vmafd/bin/dir-cli user modify --account <account name> --password-never-expires

root@vcsa [ ~ ]# /usr/lib/vmware-vmafd/bin/dir-cli user modify --account backupuser --password-never-expires
Enter password for administrator@vsphere.local:
Password set to never expire for [backupuser].
root@vcsa [ ~ ]#

(5) Confirm with dir-cli that the account state has changed

/usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account <account name> --level 2
If "Password never expires:" is "TRUE", password expiration is disabled.
"Password expiry: N/A" shows that no expiry is set either.

root@vcsa [ ~ ]# /usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account backupuser --level 2
Enter password for administrator@vsphere.local:
Account: backupuser
UPN: backupuser@VSPHERE.LOCAL
Account disabled: FALSE
Account locked: FALSE
Password never expires: TRUE
Password expired: FALSE
Password expiry: N/A
root@vcsa [ ~ ]#

For reference, running the command without the --level 2 option shows only the following information.

root@vcsa [ ~ ]# /usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account backupuser
Enter password for administrator@vsphere.local:
Account: backupuser
UPN: backupuser@VSPHERE.LOCAL
root@vcsa [ ~ ]#

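Building ceph on two servers (unresolved)

Having built a Proxmox VE cluster from three servers (two Proxmox VE servers plus a corosync qdevice server), I experimented to see whether ceph could be set up on it.

Each Proxmox VE server was created with 6 CPU cores, 16 GB of memory, and a 120 GB system disk, and runs three 16 GB disks as ceph storage.

It works, for now.

One monitor and one manager were configured on each server.

However, when one server is stopped, ceph becomes unusable.

Even with the setting described later that keeps identical data on both nodes, ceph's own majority vote cannot reach a majority unless one more node is added, even a virtual one, so I am still wondering what to do.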
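The monitor count explains why stopping one server is fatal here. As a small sketch (the function name is my own, and one monitor per server is assumed), the number of monitors that can fail while a strict majority remains is:

```python
def tolerated_monitor_failures(monitors: int) -> int:
    """Monitors that may be lost while a strict majority is still reachable."""
    if monitors < 1:
        raise ValueError("at least one monitor is required")
    majority = monitors // 2 + 1
    return monitors - majority


for count in (1, 2, 3, 5):
    print(count, "monitors ->", tolerated_monitor_failures(count), "failure(s) tolerated")
```

Two monitors tolerate no failure at all, while a third monitor, even on a small virtual machine, would allow one to be lost.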

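It is good enough for checking how virtual machines behave when placed on ceph storage, so I am leaving it as is for now.

That said, with two nodes the data is mirrored, so no more than half of the allocated disk capacity is usable, whereas with three nodes roughly 1/2 to 2/3 would be usable by my calculation, so that configuration might have been the better choice. (That figure assumes an erasure-coded layout; a replicated pool that keeps three copies would leave only 1/3 usable.)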
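The capacity comparison can be written down as simple arithmetic. This is a back-of-the-envelope sketch with hypothetical function names: it ignores ceph's own overhead and full-ratio limits, uses the 96 GiB raw capacity of this test cluster (6 disks of 16 GiB), and the erasure-coding profile k=2, m=1 is only an illustrative assumption.

```python
def usable_replicated(raw_gib: float, size: int) -> float:
    """Usable capacity when every object is stored `size` times."""
    return raw_gib / size


def usable_erasure(raw_gib: float, k: int, m: int) -> float:
    """Usable capacity with k data chunks and m coding chunks per object."""
    return raw_gib * k / (k + m)


RAW_GIB = 96  # 6 OSDs x 16 GiB

print("replicated, 2 copies:", usable_replicated(RAW_GIB, 2), "GiB")
print("replicated, 3 copies:", usable_replicated(RAW_GIB, 3), "GiB")
print("erasure coded k=2 m=1:", usable_erasure(RAW_GIB, 2, 1), "GiB")
```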

Checking the details

First, run "ceph health" and "ceph health detail".

root@proxmoxa:~# ceph health
HEALTH_WARN clock skew detected on mon.proxmoxb; Degraded data redundancy: 28/90 objects degraded (31.111%), 25 pgs degraded, 128 pgs undersized
root@proxmoxa:~# ceph health detail
HEALTH_WARN clock skew detected on mon.proxmoxb; Degraded data redundancy: 39/123 objects degraded (31.707%), 35 pgs degraded, 128 pgs undersized
[WRN] MON_CLOCK_SKEW: clock skew detected on mon.proxmoxb
    mon.proxmoxb clock skew 0.305298s > max 0.05s (latency 0.00675958s)
[WRN] PG_DEGRADED: Degraded data redundancy: 39/123 objects degraded (31.707%), 35 pgs degraded, 128 pgs undersized
    pg 2.0 is stuck undersized for 26m, current state active+undersized, last acting [3,1]
    pg 2.1 is stuck undersized for 26m, current state active+undersized, last acting [2,5]
    pg 2.2 is stuck undersized for 26m, current state active+undersized, last acting [5,1]
    pg 2.3 is stuck undersized for 26m, current state active+undersized, last acting [5,2]
    pg 2.4 is stuck undersized for 26m, current state active+undersized+degraded, last acting [1,4]
    pg 2.5 is stuck undersized for 26m, current state active+undersized, last acting [3,0]
    pg 2.6 is stuck undersized for 26m, current state active+undersized, last acting [1,3]
    pg 2.7 is stuck undersized for 26m, current state active+undersized+degraded, last acting [3,2]
    pg 2.8 is stuck undersized for 26m, current state active+undersized, last acting [3,0]
    pg 2.9 is stuck undersized for 26m, current state active+undersized, last acting [1,4]
    pg 2.a is stuck undersized for 26m, current state active+undersized+degraded, last acting [1,4]
    pg 2.b is stuck undersized for 26m, current state active+undersized, last acting [3,0]
    pg 2.c is stuck undersized for 26m, current state active+undersized, last acting [2,3]
    pg 2.d is stuck undersized for 26m, current state active+undersized, last acting [1,3]
    pg 2.e is stuck undersized for 26m, current state active+undersized+degraded, last acting [2,3]
    pg 2.f is stuck undersized for 26m, current state active+undersized, last acting [4,0]
    pg 2.10 is stuck undersized for 26m, current state active+undersized, last acting [2,4]
    pg 2.11 is stuck undersized for 26m, current state active+undersized, last acting [4,1]
    pg 2.1c is stuck undersized for 26m, current state active+undersized+degraded, last acting [4,2]
    pg 2.1d is stuck undersized for 26m, current state active+undersized, last acting [3,0]
    pg 2.1e is stuck undersized for 26m, current state active+undersized+degraded, last acting [2,5]
    pg 2.1f is stuck undersized for 26m, current state active+undersized+degraded, last acting [0,3]
    pg 2.20 is stuck undersized for 26m, current state active+undersized+degraded, last acting [5,1]
    pg 2.21 is stuck undersized for 26m, current state active+undersized, last acting [2,4]
    pg 2.22 is stuck undersized for 26m, current state active+undersized, last acting [3,2]
    pg 2.23 is stuck undersized for 26m, current state active+undersized, last acting [0,3]
    pg 2.24 is stuck undersized for 26m, current state active+undersized, last acting [5,1]
    pg 2.25 is stuck undersized for 26m, current state active+undersized, last acting [4,1]
    pg 2.26 is stuck undersized for 26m, current state active+undersized, last acting [5,2]
    pg 2.27 is stuck undersized for 26m, current state active+undersized, last acting [3,0]
    pg 2.28 is stuck undersized for 26m, current state active+undersized, last acting [2,3]
    pg 2.29 is stuck undersized for 26m, current state active+undersized+degraded, last acting [3,1]
    pg 2.2a is stuck undersized for 26m, current state active+undersized, last acting [5,0]
    pg 2.2b is stuck undersized for 26m, current state active+undersized, last acting [2,4]
    pg 2.2c is stuck undersized for 26m, current state active+undersized, last acting [2,5]
    pg 2.2d is stuck undersized for 26m, current state active+undersized, last acting [5,2]
    pg 2.2e is stuck undersized for 26m, current state active+undersized+degraded, last acting [5,0]
    pg 2.2f is stuck undersized for 26m, current state active+undersized+degraded, last acting [5,0]
    pg 2.30 is stuck undersized for 26m, current state active+undersized+degraded, last acting [4,0]
    pg 2.31 is stuck undersized for 26m, current state active+undersized, last acting [0,5]
    pg 2.32 is stuck undersized for 26m, current state active+undersized, last acting [5,1]
    pg 2.33 is stuck undersized for 26m, current state active+undersized, last acting [3,1]
    pg 2.34 is stuck undersized for 26m, current state active+undersized+degraded, last acting [5,0]
    pg 2.35 is stuck undersized for 26m, current state active+undersized, last acting [1,3]
    pg 2.36 is stuck undersized for 26m, current state active+undersized, last acting [1,4]
    pg 2.37 is stuck undersized for 26m, current state active+undersized, last acting [3,1]
    pg 2.38 is stuck undersized for 26m, current state active+undersized+degraded, last acting [0,5]
    pg 2.39 is stuck undersized for 26m, current state active+undersized, last acting [1,5]
    pg 2.7d is stuck undersized for 26m, current state active+undersized, last acting [0,4]
    pg 2.7e is stuck undersized for 26m, current state active+undersized+degraded, last acting [0,4]
    pg 2.7f is stuck undersized for 26m, current state active+undersized+degraded, last acting [4,1]
root@proxmoxa:~#

Next, "ceph -s".

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            clock skew detected on mon.proxmoxb
            Degraded data redundancy: 120/366 objects degraded (32.787%), 79 pgs degraded, 128 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 27m)
    mgr: proxmoxa(active, since 34m), standbys: proxmoxb
    osd: 6 osds: 6 up (since 28m), 6 in (since 29m); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 122 objects, 436 MiB
    usage:   1.0 GiB used, 95 GiB / 96 GiB avail
    pgs:     120/366 objects degraded (32.787%)
             2/366 objects misplaced (0.546%)
             79 active+undersized+degraded
             49 active+undersized
             1  active+clean+remapped

  io:
    client:   15 KiB/s rd, 8.4 MiB/s wr, 17 op/s rd, 13 op/s wr

root@proxmoxa:~#
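The degraded figure is easier to interpret as a ratio. The sketch below (hypothetical helper name) extracts it from the health line. In the output above the denominator equals three times the object count (122 objects, 366 copies), so a ratio close to 1/3 would be consistent with one of three copies not being placed; that reading is my assumption, not something the output states.

```python
import re
from typing import Optional, Tuple


def degraded_ratio(health_text: str) -> Optional[Tuple[int, int, float]]:
    """Return (degraded, total, fraction) from an 'objects degraded' message."""
    m = re.search(r"(\d+)/(\d+) objects degraded", health_text)
    if m is None:
        return None
    degraded, total = int(m.group(1)), int(m.group(2))
    return degraded, total, degraded / total


LINE = (
    "Degraded data redundancy: 120/366 objects degraded (32.787%), "
    "79 pgs degraded, 128 pgs undersized"
)

degraded, total, fraction = degraded_ratio(LINE)
print(degraded, total, round(fraction * 100, 3))
```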

clock skew detected

First, the "clock skew detected" warning.

[WRN] MON_CLOCK_SKEW: clock skew detected on mon.proxmoxb
    mon.proxmoxb clock skew 0.305298s > max 0.05s (latency 0.00675958s)

As described under "MON_CLOCK_SKEW", this means the clocks of the servers differ.

mon_clock_drift_allowed defaults to 0.05 seconds, and the warning is raised because the reported value is "mon.proxmoxb clock skew 0.305298s".
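To pick a new threshold it helps to pull the measured skew out of the warning text. This is a sketch with a hypothetical helper name that assumes the message layout shown above.

```python
import re
from typing import Optional, Tuple


def parse_clock_skew(line: str) -> Optional[Tuple[str, float, float]]:
    """Return (monitor, skew_seconds, allowed_seconds) from a skew detail line."""
    m = re.search(r"(mon\.\S+) clock skew ([\d.]+)s > max ([\d.]+)s", line)
    if m is None:
        return None
    return m.group(1), float(m.group(2)), float(m.group(3))


LINE = "    mon.proxmoxb clock skew 0.305298s > max 0.05s (latency 0.00675958s)"

monitor, skew, allowed = parse_clock_skew(LINE)
print(monitor, "exceeds the limit by", round(skew - allowed, 6), "seconds")
```

A skew of about 0.3 seconds stays below a limit of 0.5 seconds, but fixing time synchronization is the cleaner remedy outside a test setup.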

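This test environment runs on top of ESXi 8.0 Free, so I suspect the skew comes from a general lack of processing power causing delays, and I will ignore it.

To ignore it by configuration, use the "ceph config" command as described under Ceph Configuration in the proxmox wiki.

Check the current value: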

root@proxmoxa:~# ceph config get mon mon_clock_drift_allowed
0.050000
root@proxmoxa:~#

Change the setting; about 0.5 should do for this test.

root@proxmoxa:~# ceph config set mon mon_clock_drift_allowed 0.5
root@proxmoxa:~# ceph config get mon mon_clock_drift_allowed
0.500000
root@proxmoxa:~#

Confirm that the message has disappeared.

Confirm that it is also gone from the command output.

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 427/1287 objects degraded (33.178%), 124 pgs degraded, 128 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 40m)
    mgr: proxmoxa(active, since 47m), standbys: proxmoxb
    osd: 6 osds: 6 up (since 41m), 6 in (since 41m); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 429 objects, 1.6 GiB
    usage:   3.4 GiB used, 93 GiB / 96 GiB avail
    pgs:     427/1287 objects degraded (33.178%)
             2/1287 objects misplaced (0.155%)
             124 active+undersized+degraded
             4   active+undersized
             1   active+clean+remapped

  io:
    client:   685 KiB/s rd, 39 KiB/s wr, 7 op/s rd, 2 op/s wr

root@proxmoxa:~#

Confirm that it is also gone from ceph health detail.

root@proxmoxa:~# ceph health
HEALTH_WARN Degraded data redundancy: 426/1284 objects degraded (33.178%), 124 pgs degraded, 128 pgs undersized
root@proxmoxa:~# ceph health detail
HEALTH_WARN Degraded data redundancy: 422/1272 objects degraded (33.176%), 124 pgs degraded, 128 pgs undersized
[WRN] PG_DEGRADED: Degraded data redundancy: 422/1272 objects degraded (33.176%), 124 pgs degraded, 128 pgs undersized
    pg 2.0 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,1]
    pg 2.1 is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,5]
    pg 2.2 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,1]
    pg 2.3 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,2]
    pg 2.4 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,4]
    pg 2.5 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,0]
    pg 2.6 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,3]
    pg 2.7 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,2]
    pg 2.8 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,0]
    pg 2.9 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,4]
    pg 2.a is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,4]
    pg 2.b is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,0]
    pg 2.c is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,3]
    pg 2.d is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,3]
    pg 2.e is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,3]
    pg 2.f is stuck undersized for 39m, current state active+undersized+degraded, last acting [4,0]
    pg 2.10 is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,4]
    pg 2.11 is active+undersized+degraded, acting [4,1]
    pg 2.1c is stuck undersized for 39m, current state active+undersized+degraded, last acting [4,2]
    pg 2.1d is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,0]
    pg 2.1e is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,5]
    pg 2.1f is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,3]
    pg 2.20 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,1]
    pg 2.21 is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,4]
    pg 2.22 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,2]
    pg 2.23 is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,3]
    pg 2.24 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,1]
    pg 2.25 is stuck undersized for 39m, current state active+undersized+degraded, last acting [4,1]
    pg 2.26 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,2]
    pg 2.27 is stuck undersized for 39m, current state active+undersized, last acting [3,0]
    pg 2.28 is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,3]
    pg 2.29 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,1]
    pg 2.2a is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,0]
    pg 2.2b is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,4]
    pg 2.2c is stuck undersized for 39m, current state active+undersized+degraded, last acting [2,5]
    pg 2.2d is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,2]
    pg 2.2e is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,0]
    pg 2.2f is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,0]
    pg 2.30 is stuck undersized for 39m, current state active+undersized+degraded, last acting [4,0]
    pg 2.31 is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,5]
    pg 2.32 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,1]
    pg 2.33 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,1]
    pg 2.34 is stuck undersized for 39m, current state active+undersized+degraded, last acting [5,0]
    pg 2.35 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,3]
    pg 2.36 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,4]
    pg 2.37 is stuck undersized for 39m, current state active+undersized+degraded, last acting [3,1]
    pg 2.38 is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,5]
    pg 2.39 is stuck undersized for 39m, current state active+undersized+degraded, last acting [1,5]
    pg 2.7d is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,4]
    pg 2.7e is stuck undersized for 39m, current state active+undersized+degraded, last acting [0,4]
    pg 2.7f is stuck undersized for 39m, current state active+undersized+degraded, last acting [4,1]
root@proxmoxa:~#

PG_DEGRADED: Degraded data redundancy

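Investigating the warning that shows up in such large numbers

First, I looked up PG_DEGRADED...

No OSD is actually down here, so that description does not look applicable

For now, check the status output that seems related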

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 803/2415 objects degraded (33.251%), 128 pgs degraded, 128 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 89m)
    mgr: proxmoxa(active, since 96m), standbys: proxmoxb
    osd: 6 osds: 6 up (since 90m), 6 in (since 91m); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 805 objects, 3.1 GiB
    usage:   6.4 GiB used, 90 GiB / 96 GiB avail
    pgs:     803/2415 objects degraded (33.251%)
             2/2415 objects misplaced (0.083%)
             128 active+undersized+degraded
             1   active+clean+remapped

  io:
    client:   20 KiB/s rd, 13 MiB/s wr, 15 op/s rd, 29 op/s wr

root@proxmoxa:~#
root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  2/6 objects misplaced (33.333%)

pool cephpool id 2
  826/2478 objects degraded (33.333%)
  client io 14 KiB/s rd, 8.4 MiB/s wr, 14 op/s rd, 15 op/s wr

root@proxmoxa:~#
root@proxmoxa:~# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 6.06
pool 2 'cephpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 39 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_bytes 21474836480 application rbd read_balance_score 1.41
        removed_snaps_queue [2~1]

root@proxmoxa:~#

cephpool currently exists with pg_num=128 and pgp_num=128

Take a look at the autoscale settings

root@proxmoxa:~# ceph osd pool autoscale-status
POOL        SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr      452.0k                3.0        98280M  0.0000                                  1.0       1              on         False
cephpool   2234M       20480M   3.0        98280M  0.6252                                  1.0     128              on         False
root@proxmoxa:~#

The article "How I Built a 2-Node HA Proxmox Cluster with Ceph, Podman, and a Raspberry Pi (Yes, It Works)" seems to cover what I want to do

That page runs "ceph config set osd osd_default_size 2" and "ceph config set osd osd_default_min_size 1", but checking with ceph config get shows that these keys do not exist

root@proxmoxa:~# ceph config get osd osd_default_size
Error ENOENT: unrecognized key 'osd_default_size'
root@proxmoxa:~# ceph config get osd osd_default_min_size
Error ENOENT: unrecognized key 'osd_default_min_size'
root@proxmoxa:~#

Just in case, I tried setting them anyway, but that also failed

root@proxmoxa:~# ceph config set osd osd_default_size 2
Error EINVAL: unrecognized config option 'osd_default_size'
root@proxmoxa:~# ceph config set osd osd_default_min_size 1
Error EINVAL: unrecognized config option 'osd_default_min_size'
root@proxmoxa:~#

osd_pool_default_size and osd_pool_default_min_size do exist, so I decided to set those instead

root@proxmoxa:~# ceph config get osd osd_pool_default_size
3
root@proxmoxa:~# ceph config get osd osd_pool_default_min_size
0
root@proxmoxa:~#
root@proxmoxa:~# ceph config set osd osd_pool_default_size 2
root@proxmoxa:~# ceph config set osd osd_pool_default_min_size 1
root@proxmoxa:~# ceph config get osd osd_pool_default_size
2
root@proxmoxa:~# ceph config get osd osd_pool_default_min_size
1
root@proxmoxa:~#

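The status does not seem to change. These two options are only defaults for pools created afterwards, so the existing pools still show size 3 / min_size 2 below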

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 885/2661 objects degraded (33.258%), 128 pgs degraded, 128 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 2h)
    mgr: proxmoxa(active, since 2h), standbys: proxmoxb
    osd: 6 osds: 6 up (since 2h), 6 in (since 2h); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   7.1 GiB used, 89 GiB / 96 GiB avail
    pgs:     885/2661 objects degraded (33.258%)
             2/2661 objects misplaced (0.075%)
             128 active+undersized+degraded
             1   active+clean+remapped

root@proxmoxa:~# ceph osd pool autoscale-status
POOL        SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr      452.0k                3.0        98280M  0.0000                                  1.0       1              on         False
cephpool   2319M       20480M   3.0        98280M  0.6252                                  1.0     128              on         False
root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  2/6 objects misplaced (33.333%)

pool cephpool id 2
  885/2655 objects degraded (33.333%)
  client io 170 B/s wr, 0 op/s rd, 0 op/s wr

root@proxmoxa:~# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 6.06
pool 2 'cephpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 39 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_bytes 21474836480 application rbd read_balance_score 1.41
        removed_snaps_queue [2~1]

root@proxmoxa:~#

Check the pool's size and min_size with the ceph osd pool get command

root@proxmoxa:~# ceph osd pool get cephpool size
size: 3
root@proxmoxa:~# ceph osd pool get cephpool min_size
min_size: 2
root@proxmoxa:~#

Change the settings

root@proxmoxa:~# ceph osd pool set cephpool size 2
set pool 2 size to 2
root@proxmoxa:~# ceph osd pool set cephpool min_size 1
set pool 2 min_size to 1
root@proxmoxa:~# ceph osd pool get cephpool size
size: 2
root@proxmoxa:~# ceph osd pool get cephpool min_size
min_size: 1
root@proxmoxa:~#

Checking the status, ceph health now reports HEALTH_OK

root@proxmoxa:~# ceph health
HEALTH_OK
root@proxmoxa:~# ceph health detail
HEALTH_OK
root@proxmoxa:~#

What about the other status output? It looks fine as well

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 2h)
    mgr: proxmoxa(active, since 2h), standbys: proxmoxb
    osd: 6 osds: 6 up (since 2h), 6 in (since 2h); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   7.1 GiB used, 89 GiB / 96 GiB avail
    pgs:     2/1776 objects misplaced (0.113%)
             128 active+clean
             1   active+clean+remapped

  io:
    recovery: 1.3 MiB/s, 0 objects/s

root@proxmoxa:~# ceph osd pool autoscale-status
POOL        SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr      452.0k                3.0        98280M  0.0000                                  1.0       1              on         False
cephpool   3479M       20480M   2.0        98280M  0.4168                                  1.0     128              on         False
root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  2/6 objects misplaced (33.333%)

pool cephpool id 2
  nothing is going on

root@proxmoxa:~# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 6.06
pool 2 'cephpool' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 42 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_bytes 21474836480 application rbd read_balance_score 1.17

root@proxmoxa:~#

The GUI also shows HEALTH_OK

Failure test

What happens if one side is stopped?

The PVE cluster side stays alive

root@proxmoxa:~# ha-manager status
quorum OK
master proxmoxa (active, Wed Jan 21 17:43:39 2026)
lrm proxmoxa (active, Wed Jan 21 17:43:40 2026)
lrm proxmoxb (old timestamp - dead?, Wed Jan 21 17:43:08 2026)
service vm:100 (proxmoxa, started)
root@proxmoxa:~# pvecm status
Cluster information
-------------------
Name:             cephcluster
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Jan 21 17:44:11 2026
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.3b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.2.64 (local)
0x00000000          1            Qdevice
root@proxmoxa:~#

However, Ceph is unresponsive

Running the "ceph health" command never returns

root@proxmoxa:~# ceph health

That is not going to work, so I brought the stopped node back

Change size and min_size for the .mgr pool as well

root@proxmoxa:~# ceph osd pool ls
.mgr
cephpool
root@proxmoxa:~# ceph osd pool get .mgr size
size: 3
root@proxmoxa:~# ceph osd pool get .mgr min_size
min_size: 2
root@proxmoxa:~# ceph osd pool set .mgr size 2
set pool 1 size to 2
root@proxmoxa:~# ceph osd pool set .mgr min_size 1
set pool 1 min_size to 1
root@proxmoxa:~# ceph osd pool get .mgr size
size: 2
root@proxmoxa:~# ceph osd pool get .mgr min_size
min_size: 1
root@proxmoxa:~#

Next, to change the mon parameters as the blog post above does, check the current values

root@proxmoxa:~# ceph config get mon mon_osd_min_down_reporters
2
root@proxmoxa:~# ceph config get mon mon_osd_down_out_interval
600
root@proxmoxa:~# ceph config get mon mon_osd_report_timeout
900
root@proxmoxa:~#

Change each of them

root@proxmoxa:~# ceph config set mon mon_osd_min_down_reporters 1
root@proxmoxa:~# ceph config set mon mon_osd_down_out_interval 120
root@proxmoxa:~# ceph config set mon mon_osd_report_timeout 90
root@proxmoxa:~# ceph config get mon mon_osd_min_down_reporters
1
root@proxmoxa:~# ceph config get mon mon_osd_down_out_interval
120
root@proxmoxa:~# ceph config get mon mon_osd_report_timeout
90
root@proxmoxa:~#

...ceph -s still stops responding, same as before

How do I create a third node to keep Ceph alive?

The section "Faking a Third Node with a Containerized MON" of the article above describes running a third ceph mon as a container
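Why a third monitor matters can be sketched with plain majority arithmetic. This assumes that Ceph monitors need a strict majority of the configured monitors to form quorum; the function names are made up for illustration and are not part of any Ceph API.

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority among `voters` members."""
    return voters // 2 + 1


def has_quorum(voters: int, alive: int) -> bool:
    """True if `alive` members are enough for a strict majority."""
    return alive >= quorum_size(voters)


if __name__ == "__main__":
    # Compare the 2-mon setup from this post with a 3-mon setup.
    for voters in (2, 3):
        for alive in range(voters, 0, -1):
            print(f"mons={voters} alive={alive} quorum={has_quorum(voters, alive)}")
```

With two monitors, losing either one leaves one of two, which is not a majority, and that would match ceph commands hanging during the failure test. The corosync side stayed quorate in the same test because the Qdevice supplied the third vote (Expected votes 3, Quorum 2 in the pvecm output).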

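The Proxmox VE forum thread "3rd Ceph MON on external QDevice (Podman) – 4-node / 2-site cluster" led me to the Proxmox VE wiki page "Stretch Cluster", which says to set up a "tie-breaker node" rather than just a ceph mon

The Stretch Cluster page also uses size=4 and min_size=2 for the pool size/min_size changed above, so that two replicas are guaranteed on each of the two nodes.

For now, change the pools' size/min_size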

root@proxmoxa:~# ceph osd pool ls
.mgr
cephpool
root@proxmoxa:~# ceph osd pool get .mgr size
size: 2
root@proxmoxa:~# ceph osd pool get .mgr min_size
min_size: 1
root@proxmoxa:~# ceph osd pool set .mgr size 4
set pool 1 size to 4
root@proxmoxa:~# ceph osd pool set .mgr min_size 2
set pool 1 min_size to 2
root@proxmoxa:~# ceph osd pool get .mgr size
size: 4
root@proxmoxa:~# ceph osd pool get .mgr min_size
min_size: 2
root@proxmoxa:~# ceph osd pool get cephpool size
size: 2
root@proxmoxa:~# ceph osd pool get cephpool min_size
min_size: 1
root@proxmoxa:~# ceph osd pool set cephpool size 4
set pool 2 size to 4
root@proxmoxa:~# ceph osd pool set cephpool min_size 2
set pool 2 min_size to 2
root@proxmoxa:~# ceph osd pool get cephpool size
size: 4
root@proxmoxa:~# ceph osd pool get cephpool min_size
min_size: 2
root@proxmoxa:~#

Running ceph osd pool stats in this state shows 50.0% where it used to show 33.333%

root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  4/8 objects degraded (50.000%)

pool cephpool id 2
  1770/3540 objects degraded (50.000%)

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 1774/3548 objects degraded (50.000%), 129 pgs degraded, 129 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 6h)
    mgr: proxmoxb(active, since 6h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 6h), 6 in (since 24h)

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   7.0 GiB used, 89 GiB / 96 GiB avail
    pgs:     1774/3548 objects degraded (50.000%)
             129 active+undersized+degraded

root@proxmoxa:~#
root@proxmoxa:~# ceph health
HEALTH_WARN Degraded data redundancy: 1774/3548 objects degraded (50.000%), 129 pgs degraded, 129 pgs undersized
root@proxmoxa:~# ceph health detail
HEALTH_WARN Degraded data redundancy: 1774/3548 objects degraded (50.000%), 129 pgs degraded, 129 pgs undersized
[WRN] PG_DEGRADED: Degraded data redundancy: 1774/3548 objects degraded (50.000%), 129 pgs degraded, 129 pgs undersized
    pg 1.0 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,0]
    pg 2.0 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,1]
    pg 2.1 is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,5]
    pg 2.2 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,1]
    pg 2.3 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,2]
    pg 2.4 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,4]
    pg 2.5 is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,0]
    pg 2.6 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,3]
    pg 2.7 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,2]
    pg 2.8 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,0]
    pg 2.9 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,4]
    pg 2.a is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,4]
    pg 2.b is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,0]
    pg 2.c is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,3]
    pg 2.d is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,3]
    pg 2.e is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,3]
    pg 2.f is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,0]
    pg 2.10 is active+undersized+degraded, acting [2,4]
    pg 2.1c is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,2]
    pg 2.1d is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,0]
    pg 2.1e is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,5]
    pg 2.1f is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,3]
    pg 2.20 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,1]
    pg 2.21 is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,4]
    pg 2.22 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,2]
    pg 2.23 is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,3]
    pg 2.24 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,1]
    pg 2.25 is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,2]
    pg 2.26 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,2]
    pg 2.27 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,0]
    pg 2.28 is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,3]
    pg 2.29 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,1]
    pg 2.2a is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,0]
    pg 2.2b is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,4]
    pg 2.2c is stuck undersized for 5m, current state active+undersized+degraded, last acting [2,5]
    pg 2.2d is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,2]
    pg 2.2e is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,0]
    pg 2.2f is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,0]
    pg 2.30 is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,0]
    pg 2.31 is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,5]
    pg 2.32 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,1]
    pg 2.33 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,1]
    pg 2.34 is stuck undersized for 5m, current state active+undersized+degraded, last acting [5,0]
    pg 2.35 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,3]
    pg 2.36 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,4]
    pg 2.37 is stuck undersized for 5m, current state active+undersized+degraded, last acting [3,1]
    pg 2.38 is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,5]
    pg 2.39 is stuck undersized for 5m, current state active+undersized+degraded, last acting [1,5]
    pg 2.7d is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,4]
    pg 2.7e is stuck undersized for 5m, current state active+undersized+degraded, last acting [0,4]
    pg 2.7f is stuck undersized for 5m, current state active+undersized+degraded, last acting [4,1]
root@proxmoxa:~#
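The 33.333% and 50.000% figures can be reproduced with simple arithmetic. This is a sketch under two assumptions: the pool uses the default replicated_rule with "chooseleaf firstn 0 type host" (at most one replica per host), and only two hosts hold OSDs. The helper names are illustrative only.

```python
def replica_accounting(objects: int, size: int, failure_domains: int):
    """Return (degraded_copies, total_copies) for one pool.

    size            -- replica count requested by the pool
    failure_domains -- number of hosts CRUSH can spread replicas over
    """
    placed = min(size, failure_domains)    # copies that can actually be placed
    degraded = objects * (size - placed)   # copies with nowhere to go
    total = objects * size
    return degraded, total


def percent(degraded: int, total: int) -> float:
    """Degraded ratio as a percentage, rounded to 3 decimals."""
    return round(100.0 * degraded / total, 3)


if __name__ == "__main__":
    # Object counts are taken from the ceph osd pool stats output in this post.
    for objects, size in ((826, 3), (885, 2), (885, 4)):
        degraded, total = replica_accounting(objects, size, failure_domains=2)
        print(f"size={size}: {degraded}/{total} degraded ({percent(degraded, total)}%)")
```

Under this model, size=3 leaves one copy in three unplaced, size=2 leaves none (hence HEALTH_OK), and size=4 leaves two in four, which lines up with the 826/2478 and 1770/3540 counts shown by ceph osd pool stats.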

ceph osd tree at this point looks like this

root@proxmoxa:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default
-5         0.04678      host proxmoxa
 3    ssd  0.01559          osd.3          up   1.00000  1.00000
 4    ssd  0.01559          osd.4          up   1.00000  1.00000
 5    ssd  0.01559          osd.5          up   1.00000  1.00000
-3         0.04678      host proxmoxb
 0    ssd  0.01559          osd.0          up   1.00000  1.00000
 1    ssd  0.01559          osd.1          up   1.00000  1.00000
 2    ssd  0.01559          osd.2          up   1.00000  1.00000
root@proxmoxa:~#

Next, create two CRUSH structures (buckets of type room)

root@proxmoxa:~# ceph osd crush add-bucket room1 room
added bucket room1 type room to crush map
root@proxmoxa:~# ceph osd crush add-bucket room2 room
added bucket room2 type room to crush map
root@proxmoxa:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-8               0  room room2
-7               0  room room1
-1         0.09357  root default
-5         0.04678      host proxmoxa
 3    ssd  0.01559          osd.3          up   1.00000  1.00000
 4    ssd  0.01559          osd.4          up   1.00000  1.00000
 5    ssd  0.01559          osd.5          up   1.00000  1.00000
-3         0.04678      host proxmoxb
 0    ssd  0.01559          osd.0          up   1.00000  1.00000
 1    ssd  0.01559          osd.1          up   1.00000  1.00000
 2    ssd  0.01559          osd.2          up   1.00000  1.00000
root@proxmoxa:~#

Then move them under the default root?

root@proxmoxa:~# ceph osd crush move room1 root=default
moved item id -7 name 'room1' to location {root=default} in crush map
root@proxmoxa:~# ceph osd crush move room2 root=default
moved item id -8 name 'room2' to location {root=default} in crush map
root@proxmoxa:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default
-5         0.04678      host proxmoxa
 3    ssd  0.01559          osd.3          up   1.00000  1.00000
 4    ssd  0.01559          osd.4          up   1.00000  1.00000
 5    ssd  0.01559          osd.5          up   1.00000  1.00000
-3         0.04678      host proxmoxb
 0    ssd  0.01559          osd.0          up   1.00000  1.00000
 1    ssd  0.01559          osd.1          up   1.00000  1.00000
 2    ssd  0.01559          osd.2          up   1.00000  1.00000
-7               0      room room1
-8               0      room room2
root@proxmoxa:~#

Next, move each node into its own room

root@proxmoxa:~# ceph osd crush move proxmoxa room=room1
moved item id -5 name 'proxmoxa' to location {room=room1} in crush map
root@proxmoxa:~# ceph osd crush move proxmoxb room=room2
moved item id -3 name 'proxmoxb' to location {room=room2} in crush map
root@proxmoxa:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME              STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default
-7         0.04678      room room1
-5         0.04678          host proxmoxa
 3    ssd  0.01559              osd.3          up   1.00000  1.00000
 4    ssd  0.01559              osd.4          up   1.00000  1.00000
 5    ssd  0.01559              osd.5          up   1.00000  1.00000
-8         0.04678      room room2
-3         0.04678          host proxmoxb
 0    ssd  0.01559              osd.0          up   1.00000  1.00000
 1    ssd  0.01559              osd.1          up   1.00000  1.00000
 2    ssd  0.01559              osd.2          up   1.00000  1.00000
root@proxmoxa:~#

Create the CRUSH rule

root@proxmoxa:~# ceph osd getcrushmap > crush.map.bin
25
root@proxmoxa:~# ls -l crush.map.bin
-rw-r--r-- 1 root root 1104 Jan 22 16:31 crush.map.bin
root@proxmoxa:~# crushtool -d crush.map.bin -o crush.map.txt
root@proxmoxa:~# ls -l crush.map*
-rw-r--r-- 1 root root 1104 Jan 22 16:31 crush.map.bin
-rw-r--r-- 1 root root 1779 Jan 22 16:31 crush.map.txt
root@proxmoxa:~#

crush.map.bin is a binary file, so crushtool is used to decompile it into a text file

The current contents were as follows

root@proxmoxa:~# cat crush.map.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host proxmoxa {
        id -5           # do not change unnecessarily
        id -6 class ssd         # do not change unnecessarily
        # weight 0.04678
        alg straw2
        hash 0  # rjenkins1
        item osd.3 weight 0.01559
        item osd.4 weight 0.01559
        item osd.5 weight 0.01559
}
room room1 {
        id -7           # do not change unnecessarily
        id -10 class ssd                # do not change unnecessarily
        # weight 0.04678
        alg straw2
        hash 0  # rjenkins1
        item proxmoxa weight 0.04678
}
host proxmoxb {
        id -3           # do not change unnecessarily
        id -4 class ssd         # do not change unnecessarily
        # weight 0.04678
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 0.01559
        item osd.1 weight 0.01559
        item osd.2 weight 0.01559
}
room room2 {
        id -8           # do not change unnecessarily
        id -9 class ssd         # do not change unnecessarily
        # weight 0.04678
        alg straw2
        hash 0  # rjenkins1
        item proxmoxb weight 0.04678
}
root default {
        id -1           # do not change unnecessarily
        id -2 class ssd         # do not change unnecessarily
        # weight 0.09357
        alg straw2
        hash 0  # rjenkins1
        item room1 weight 0.04678
        item room2 weight 0.04678
}

# rules
rule replicated_rule {
        id 0
        type replicated
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
root@proxmoxa:~#

Append replicated_stretch_rule to the end of the text file. For id, look at the existing rules in the file and use the next free number

root@proxmoxa:~# cp crush.map.txt crush-new.map.txt
root@proxmoxa:~# vi crush-new.map.txt
root@proxmoxa:~# diff -u crush.map.txt crush-new.map.txt
--- crush.map.txt       2026-01-22 16:31:44.837979276 +0900
+++ crush-new.map.txt   2026-01-22 16:35:12.146622553 +0900
@@ -87,3 +87,13 @@
 }

 # end crush map
+
+rule replicated_stretch_rule {
+        id 1
+        type replicated
+        step take default
+        step choose firstn 0 type room
+        step chooseleaf firstn 2 type host
+        step emit
+}
+
root@proxmoxa:~#
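To get a feel for what the new rule asks for, here is a count-only model of its two steps. It is a deliberately simplified sketch, not real CRUSH: real CRUSH picks buckets pseudo-randomly by hash and weight, while this model just takes the first entries. The topology dict mirrors the ceph osd tree output in this post; FOUR_HOSTS and its host names are hypothetical.

```python
from typing import Dict, List

# room -> host -> OSD ids, as shown by ceph osd tree in this post
TOPOLOGY: Dict[str, Dict[str, List[int]]] = {
    "room1": {"proxmoxa": [3, 4, 5]},
    "room2": {"proxmoxb": [0, 1, 2]},
}

# Hypothetical layout with two hosts per room, for comparison only
FOUR_HOSTS: Dict[str, Dict[str, List[int]]] = {
    "room1": {"hostA1": [0, 1], "hostA2": [2, 3]},
    "room2": {"hostB1": [4, 5], "hostB2": [6, 7]},
}


def stretch_rule_osds(topology, size: int, hosts_per_room: int = 2) -> List[int]:
    """Model of: choose firstn 0 type room, then chooseleaf firstn 2 type host."""
    chosen: List[int] = []
    rooms = list(topology)[:size]  # "firstn 0" asks for up to `size` rooms
    for room in rooms:
        # chooseleaf needs distinct hosts; a room cannot supply more than it has
        hosts = list(topology[room])[:hosts_per_room]
        for host in hosts:
            chosen.append(topology[room][host][0])  # one OSD (leaf) per host
    return chosen[:size]


if __name__ == "__main__":
    print(len(stretch_rule_osds(TOPOLOGY, size=4)), "OSDs for the layout in this post")
    print(len(stretch_rule_osds(FOUR_HOSTS, size=4)), "OSDs with two hosts per room")
```

If this simplified reading is right, a room that contains a single host can contribute only one OSD per PG, so a size=4 pool would still end up with two copies. That may be worth keeping in mind when reading the undersized PG counts after the rule is applied.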

Compile the edited file and load it into Ceph

root@proxmoxa:~# ls -l
total 12
-rw-r--r-- 1 root root 1104 Jan 22 16:31 crush.map.bin
-rw-r--r-- 1 root root 1779 Jan 22 16:31 crush.map.txt
-rw-r--r-- 1 root root 1977 Jan 22 16:35 crush-new.map.txt
root@proxmoxa:~# crushtool -c crush-new.map.txt -o crush-new.map.bin
root@proxmoxa:~# ls -l
total 16
-rw-r--r-- 1 root root 1104 Jan 22 16:31 crush.map.bin
-rw-r--r-- 1 root root 1779 Jan 22 16:31 crush.map.txt
-rw-r--r-- 1 root root 1195 Jan 22 16:36 crush-new.map.bin
-rw-r--r-- 1 root root 1977 Jan 22 16:35 crush-new.map.txt
root@proxmoxa:~# ceph osd setcrushmap -i crush-new.map.bin
26
root@proxmoxa:~#
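A compiled map can also be dry-run before it is injected. The sketch below only builds and runs a crushtool command line; it assumes crushtool's test mode accepts --test, --rule, --num-rep and --show-mappings as described in the upstream documentation, and the Python wrapper names are made up for illustration.

```python
import subprocess
from typing import List


def crushtool_test_cmd(map_path: str, rule_id: int, num_rep: int) -> List[str]:
    """Build a crushtool dry-run command for one rule and replica count."""
    return [
        "crushtool", "-i", map_path,
        "--test",
        "--rule", str(rule_id),
        "--num-rep", str(num_rep),
        "--show-mappings",
    ]


def run_crushtool_test(map_path: str, rule_id: int, num_rep: int) -> str:
    """Run the dry-run and return crushtool's output (requires crushtool)."""
    cmd = crushtool_test_cmd(map_path, rule_id, num_rep)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For the files in this post the call would be run_crushtool_test("crush-new.map.bin", 1, 4); the listed mappings show how many OSDs the rule actually returns per input, without touching the running cluster.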

After that, the new crush rule shows up

root@proxmoxa:~# ceph osd crush rule ls
replicated_rule
replicated_stretch_rule
root@proxmoxa:~#

The osd tree does not seem to have changed

root@proxmoxa:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME              STATUS  REWEIGHT  PRI-AFF
-1         0.09354  root default
-7         0.04677      room room1
-5         0.04677          host proxmoxa
 3    ssd  0.01558              osd.3          up   1.00000  1.00000
 4    ssd  0.01558              osd.4          up   1.00000  1.00000
 5    ssd  0.01558              osd.5          up   1.00000  1.00000
-8         0.04677      room room2
-3         0.04677          host proxmoxb
 0    ssd  0.01558              osd.0          up   1.00000  1.00000
 1    ssd  0.01558              osd.1          up   1.00000  1.00000
 2    ssd  0.01558              osd.2          up   1.00000  1.00000
root@proxmoxa:~#
root@proxmoxa:~# ceph health
HEALTH_WARN Degraded data redundancy: 448/3548 objects degraded (12.627%), 32 pgs degraded, 91 pgs undersized
root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 448/3548 objects degraded (12.627%), 32 pgs degraded, 91 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 7h)
    mgr: proxmoxb(active, since 7h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 7h), 6 in (since 25h); 97 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   10 GiB used, 86 GiB / 96 GiB avail
    pgs:     448/3548 objects degraded (12.627%)
             1276/3548 objects misplaced (35.964%)
             59 active+undersized+remapped
             34 active+clean+remapped
             32 active+undersized+degraded
             4  active+clean

root@proxmoxa:~#

Is the data not moving?

As I did before, try changing pgp_num from 128 to 32

root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  4/8 objects misplaced (50.000%)

pool cephpool id 2
  448/3540 objects degraded (12.655%)
  1272/3540 objects misplaced (35.932%)

root@proxmoxa:~# ceph osd pool get cephpool pgp_num
pgp_num: 128
root@proxmoxa:~# ceph osd pool set cephpool pgp_num 32
set pool 2 pgp_num to 32
root@proxmoxa:~# ceph osd pool get cephpool pgp_num
pgp_num: 128
root@proxmoxa:~#

It does not change...

root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  4/8 objects misplaced (50.000%)

pool cephpool id 2
  448/3540 objects degraded (12.655%)
  1272/3540 objects misplaced (35.932%)
  client io 170 B/s wr, 0 op/s rd, 0 op/s wr

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 448/3548 objects degraded (12.627%), 32 pgs degraded, 91 pgs undersized
            1 pools have pg_num > pgp_num

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 7h)
    mgr: proxmoxb(active, since 7h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 7h), 6 in (since 25h); 97 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   10 GiB used, 86 GiB / 96 GiB avail
    pgs:     448/3548 objects degraded (12.627%)
             1276/3548 objects misplaced (35.964%)
             59 active+undersized+remapped
             34 active+clean+remapped
             32 active+undersized+degraded
             4  active+clean

  io:
    client:   170 B/s wr, 0 op/s rd, 0 op/s wr

root@proxmoxa:~#

Since it says "1 pools have pg_num > pgp_num", maybe change pg_num as well?

root@proxmoxa:~# ceph osd pool set cephpool pg_num 32
set pool 2 pg_num to 32
root@proxmoxa:~#
root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 448/3548 objects degraded (12.627%), 32 pgs degraded, 91 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 7h)
    mgr: proxmoxb(active, since 7h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 7h), 6 in (since 25h); 97 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   10 GiB used, 86 GiB / 96 GiB avail
    pgs:     448/3548 objects degraded (12.627%)
             1276/3548 objects misplaced (35.964%)
             59 active+undersized+remapped
             34 active+clean+remapped
             32 active+undersized+degraded
             4  active+clean

root@proxmoxa:~#

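I waited for a while, but nothing changed

Is the crush rule actually being applied?

"6.3. Verifying that the CRUSH rules are created and the pools are set to the correct CRUSH rule"

Check the rule ids of the current rules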

root@proxmoxa:~# ceph osd crush rule dump | grep -E "rule_(id|name)"
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "rule_id": 1,
        "rule_name": "replicated_stretch_rule",
root@proxmoxa:~#

Check the rule ID that is actually set on the pool

root@proxmoxa:~# ceph osd dump|grep cephpool
pool 2 'cephpool' replicated size 4 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 133 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_bytes 21474836480 application rbd read_balance_score 1.22
root@proxmoxa:~#

It says "crush_rule 0", so the pool apparently has not been switched to the new rule
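The same check can be scripted by pulling the fields out of the pool line. This is a minimal sketch that assumes the text layout of the "pool ..." lines shown in this post for replicated pools; the regular expression and function name are illustrative and not part of Ceph.

```python
import re
from typing import Dict, Optional, Union

# Matches the leading part of a replicated pool line from "ceph osd dump"
POOL_LINE = re.compile(
    r"^pool (?P<id>\d+) '(?P<name>[^']+)' replicated "
    r"size (?P<size>\d+) min_size (?P<min_size>\d+) crush_rule (?P<crush_rule>\d+)"
)


def parse_pool_line(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return name, id, size, min_size and crush_rule, or None if no match."""
    match = POOL_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "name": fields["name"],
        "id": int(fields["id"]),
        "size": int(fields["size"]),
        "min_size": int(fields["min_size"]),
        "crush_rule": int(fields["crush_rule"]),
    }


if __name__ == "__main__":
    sample = ("pool 2 'cephpool' replicated size 4 min_size 2 crush_rule 0 "
              "object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on")
    print(parse_pool_line(sample))
```

Comparing the parsed crush_rule with the rule_id values from ceph osd crush rule dump shows at a glance which pools still use rule 0.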

How to apply a crush rule to an existing pool, taken from the Red Hat documentation

root@proxmoxa:~# ceph osd pool get cephpool crush_rule
crush_rule: replicated_rule
root@proxmoxa:~# ceph osd pool set cephpool crush_rule replicated_stretch_rule
set pool 2 crush_rule to replicated_stretch_rule
root@proxmoxa:~# ceph osd pool get cephpool crush_rule
crush_rule: replicated_stretch_rule
root@proxmoxa:~# ceph osd dump|grep cephpool
pool 2 'cephpool' replicated size 4 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 134 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_bytes 21474836480 application rbd read_balance_score 1.22
root@proxmoxa:~#

The rule was changed

Hmm...?

root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  4/8 objects misplaced (50.000%)

pool cephpool id 2
  194/3540 objects degraded (5.480%)
  1562/3540 objects misplaced (44.124%)

root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 194/3548 objects degraded (5.468%), 14 pgs degraded, 83 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 8h)
    mgr: proxmoxb(active, since 8h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 8h), 6 in (since 26h); 115 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   11 GiB used, 85 GiB / 96 GiB avail
    pgs:     194/3548 objects degraded (5.468%)
             1566/3548 objects misplaced (44.138%)
             69 active+undersized+remapped
             45 active+clean+remapped
             14 active+undersized+degraded
             1  active+clean

root@proxmoxa:~# ceph osd pool stats
pool .mgr id 1
  4/8 objects misplaced (50.000%)

pool cephpool id 2
  194/3540 objects degraded (5.480%)
  1562/3540 objects misplaced (44.124%)
  client io 1.4 KiB/s wr, 0 op/s rd, 0 op/s wr

root@proxmoxa:~#
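To compare runs of `ceph -s` without adding up the state lines by hand, the per-state PG counts can be tallied with a little awk. This is a sketch under the assumption that the state lines look like `69 active+undersized+remapped`, as in the output above; `pg_count_with_flag` is a hypothetical name.

```shell
# Sum PG counts for one state flag from `ceph -s` text on stdin.
# Usage: ceph -s | pg_count_with_flag undersized
pg_count_with_flag() {
    awk -v flag="$1" '
        NF < 2 { next }                      # skip blank / one-word lines
        {
            n = $(NF - 1); states = $NF      # e.g. "69" and "active+undersized+remapped"
            if (n !~ /^[0-9]+$/ || states !~ /^[a-z_+]+$/) next
            k = split(states, parts, "+")
            for (i = 1; i <= k; i++)
                if (parts[i] == flag) total += n
        }
        END { print total + 0 }
    '
}
```

For the status above, `undersized` adds up to 69 + 14 = 83, which matches the "83 pgs undersized" in the health line, and `active` adds up to all 129 PGs.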

The section "Procedure: Increasing the PG count" talks about changing pg_num and pgp_num, and its example sets pgp_num to 4, so I tried running that as well.

root@proxmoxa:~# ceph osd pool get cephpool pg_num
pg_num: 128
root@proxmoxa:~# ceph osd pool get cephpool pgp_num
pgp_num: 128
root@proxmoxa:~# ceph osd pool set cephpool pgp_num 4
set pool 2 pgp_num to 4
root@proxmoxa:~# ceph osd pool get cephpool pgp_num
pgp_num: 128
root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 194/3548 objects degraded (5.468%), 14 pgs degraded, 83 pgs undersized
            1 pools have pg_num > pgp_num

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 9h)
    mgr: proxmoxb(active, since 9h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 9h), 6 in (since 27h); 115 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   11 GiB used, 85 GiB / 96 GiB avail
    pgs:     194/3548 objects degraded (5.468%)
             1566/3548 objects misplaced (44.138%)
             69 active+undersized+remapped
             45 active+clean+remapped
             14 active+undersized+degraded
             1  active+clean

root@proxmoxa:~#

pgp_num still reads 128 right after the set, and `ceph -s` now shows the warning "1 pools have pg_num > pgp_num".

In that case, let's set pg_num to 4 as well.

root@proxmoxa:~# ceph osd pool set cephpool pg_num 4
set pool 2 pg_num to 4
root@proxmoxa:~# ceph osd pool get cephpool pg_num
pg_num: 128
root@proxmoxa:~# ceph -s
  cluster:
    id:     26b59237-5bed-45fe-906e-aa3b13033b86
    health: HEALTH_WARN
            Degraded data redundancy: 194/3548 objects degraded (5.468%), 14 pgs degraded, 83 pgs undersized

  services:
    mon: 2 daemons, quorum proxmoxa,proxmoxb (age 9h)
    mgr: proxmoxb(active, since 9h), standbys: proxmoxa
    osd: 6 osds: 6 up (since 9h), 6 in (since 27h); 115 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 887 objects, 3.4 GiB
    usage:   11 GiB used, 85 GiB / 96 GiB avail
    pgs:     194/3548 objects degraded (5.468%)
             1566/3548 objects misplaced (44.138%)
             69 active+undersized+remapped
             45 active+clean+remapped
             14 active+undersized+degraded
             1  active+clean

root@proxmoxa:~#

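pg_num also still reads 128. The warning is gone, but the PG states have not changed at all, so this does not seem to be related.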
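Since both `get` calls kept returning 128, it may be worth looking at the pool line in `ceph osd dump` instead. One possible explanation, which is my assumption and not confirmed by anything in the output above, is that recent Ceph releases record the requested value as a target and adjust the real value gradually; if so, the pool line may carry extra `pg_num_target` / `pgp_num_target` fields. The sketch below just prints whichever of those fields are present. `pool_pg_fields` is a made-up helper name.

```shell
# Print pg_num / pgp_num for one pool from `ceph osd dump` text on stdin.
# Also prints pg_num_target / pgp_num_target if the line contains them
# (assumption: those fields only show up while a change is pending).
pool_pg_fields() {
    awk -v pool="'$1'" '
        $1 == "pool" && $3 == pool {
            for (i = 4; i < NF; i++)
                if ($i ~ /^pgp?_num(_target)?$/) printf "%s=%s\n", $i, $(i + 1)
        }
    '
}
```

Usage would be `ceph osd dump | pool_pg_fields cephpool`. If only `pg_num` and `pgp_num` come back, then no pending target is recorded and the explanation above does not apply.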