CentOS7からOracle Linux9へWebサーバを置き換えたメモ 2024年4月


Solaris 2.5.1時代に原型を作ったperl CGIも存在してるWebサーバは時代を経てCentOS4→CentOS7→Oracle Autonomous Linux 7(OCI上)と移転しつつ運用していた。ただ、それもいい加減置き換えるかとOracle Linux9(OCI上)へ移行した時のメモ書き

個人ユーザディレクトリに置いたファイルのWeb公開

各ユーザのホームディレクトリにwebとかpublic_htmlとかのディレクトリを作ってファイルを置いたけど見れない場合

/var/log/httpd/access_log での出力例

xxx.xxx.xxx.xxx - - [25/Apr/2024:13:24:45 +0900] "GET / HTTP/1.1" 403 3539 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"

/var/log/httpd/error_log での出力例

[Thu Apr 25 13:24:45.157549 2024] [core:error] [pid 6267:tid 6455] (13)Permission denied: [client 118.238.215.174:55006] AH00035: access to /index.html denied (filesystem path '/home/todoroki/web/index.html') because search permissions are missing on a component of the path

/var/log/audit/audit.logでの出力内容

time->Thu Apr 25 13:24:45 2024
type=PROCTITLE msg=audit(1714019085.156:779): proctitle=2F7573722F7362696E2F6874747064002D44464F524547524F554E44
type=SYSCALL msg=audit(1714019085.156:779): arch=c000003e syscall=262 success=no exit=-13 a0=ffffff9c a1=7fe6a4017568 a2=7fe6b6ffc7c0 a3=100 items=0 ppid=6263 pid=6267 auid=4294967295 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=4294967295 comm="httpd" exe="/usr/sbin/httpd" subj=system_u:system_r:httpd_t:s0 key=(null)
type=AVC msg=audit(1714019085.156:779): avc:  denied  { getattr } for  pid=6267 comm="httpd" path="/home/osakanataro/web/index.html" dev="dm-0" ino=17989882 scontext=system_u:system_r:httpd_t:s0 tcontext=unconfined_u:object_r:httpd_user_content_t:s0 tclass=file permissive=0

この時の対処としては該当ディレクトリに対してSELinuxのhttpd_sys_rw_content_tラベルを書いてあげる、というものとした

# chcon -t httpd_sys_rw_content_t -R /home/osakanataro/web/
#

SSLUseStapling onだとうまく動かない?

Oracle Cloud上にサーバを立てて、Mozilla SSL Configuration Generator で作った設定ファイルを/etc/httpd/conf.d/ssl-mozilla.conf で設定したところ、/var/log/httpd/error_log に下記のような出力があり、うまく動かなかった。

[Thu Apr 25 11:53:34.401905 2024] [core:notice] [pid 4789:tid 4789] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
[Thu Apr 25 11:53:34.406356 2024] [suexec:notice] [pid 4789:tid 4789] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Apr 25 11:53:34.415259 2024] [ssl:warn] [pid 4789:tid 4789] AH01909: xxxxxxx.subnet.vcn.oraclevcn.com:443:0 server certificate does NOT include an ID which matches the server name
[Thu Apr 25 11:53:34.415362 2024] [ssl:error] [pid 4789:tid 4789] AH02217: ssl_stapling_init_cert: can't retrieve issuer certificate! [subje
ct: CN=ドメイン名 / issuer: CN=R3,O=Let's Encrypt,C=US / serial:040D1B85397F73xxxxxxxxxxxx / notbefore: Apr 25 01:32:39 2024
 GMT / notafter: Jul 24 01:32:38 2024 GMT]
[Thu Apr 25 11:53:34.415367 2024] [ssl:error] [pid 4789:tid 4789] AH02604: Unable to configure certificate xxxxxxx.subnet.vcn.oracl
evcn.com:443:0 for stapling
[Thu Apr 25 11:53:34.415875 2024] [ssl:emerg] [pid 4789:tid 4789] AH02311: Fatal error initialising mod_ssl, exiting. See /var/log/httpd/pbm
.osakana.net-error_log for more information
AH00016: Configuration Failed

すぐに解決できそうにないなぁ、とエラー内に「ssl_stapling_init_cert」とあるので”SSLUseStapling on”設定が問題に違いない、と、該当設定をコメントアウトしたところ、とりあえず動くようになった。

ただ、ひとしきり設定調整したあと、改めてSSLUseStapling on設定を入れてみたところ、今度は動作した・・・なぜ?

SSLCertificateFileに指定するファイルは何が適切か?

前述の Mozilla SSL Configuration Generator で出力した設定では “curl https://ssl-config.mozilla.org/ffdhe2048.txt” でダウンロードしたファイルを保存して SSLCertificateFile で指定しろ、とある。

でも、SSLCertificateFile って、SSL証明書で発行したやつを指定するんじゃないの?と思って調べると Apache Module mod_sslマニュアルの”SSLCertificateFile Directive”に書いてあった。

特に「DH parameter interoperability with primes > 1024 bit」の時の取り扱いとして、FAQ「Why do I get handshake failures with Java-based clients when using a certificate with more than 1024 bits?」にもあるような形で「DH PARAMETERS」としての指定が認められているようだった。

なるほど

s3fsでエラーが出てた

s3fs-fuse をインストールして /etc/passwd-s3fs に設定書いて/etc/fstabで自動マウントを設定してみたところエラーが・・・

# mount -a
s3fs: There is no enough disk space for used as cache(or temporary) directory by s3fs. Requires 3061.600 MB, already has 2580.949 MB.
# df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                    4.0M     0  4.0M   0% /dev
tmpfs                       475M     0  475M   0% /dev/shm
tmpfs                       190M  5.4M  185M   3% /run
/dev/mapper/ocivolume-root   30G   27G  2.6G  92% /
/dev/mapper/ocivolume-oled   15G  329M   15G   3% /var/oled
/dev/sda2                   2.0G  342M  1.7G  18% /boot
/dev/sda1                   100M  6.2M   94M   7% /boot/efi
tmpfs                        95M  4.0K   95M   1% /run/user/982
tmpfs                        95M  4.0K   95M   1% /run/user/1000
#

/var/oled を使うように設定できないかな。もしくはキャッシュ無効にするか?

FAQを見ると「-o use_cache=/tmp」でキャッシュディレクトリを指定できる。「-o use_cache=””」でキャッシュ無効化となるようだ.。

が、「use_cache=””」「use_cache=disable」「use_cache」を試してみたが、相変わらず容量警告となる。

「use_cache=/var/oled/s3fs/」とするとキャッシュディレクトリ指定はきちんと反映されていたので、とりあえずこちらをキャッシュとして設定することで回避とした。

perl CGIを動かす

[Thu Apr 25 19:06:57.798078 2024] [cgid:error] [pid 2555:tid 2693] [client 118.238.215.174:52675] AH01241: error spawning CGI child: exec of '/home/user/web/chat/comchatq.cgi' failed (Permission denied): /home/user/web/chat/comchatq.cgi
[Thu Apr 25 19:06:57.798743 2024] [cgid:error] [pid 2555:tid 2693] [client 118.238.215.174:52675] End of script output before headers: comchatq.cgi

以下を実行

chcon -t httpd_sys_script_exec_t *.cgi

エラーは下記に変わる

[Thu Apr 25 19:10:21.631944 2024] [cgid:error] [pid 4785:tid 4836] [client 118.238.215.174:53013] Can't open perl script "/home/user/web/chat/comchatq.cgi": Permission denied: /home/user/web/chat/comchatq.cgi
[Thu Apr 25 19:10:21.631989 2024] [cgid:error] [pid 4785:tid 4836] [client 118.238.215.174:53013] End of script output before headers: comchatq.cgi

/var/log/audit/audit.logの出力

type=SYSCALL msg=audit(1714040034.887:220): arch=c000003e syscall=257 success=no exit=-13 a0=ffffff9c a1=55b4c3978cc0 a2=80000 a3=0 items=0 ppid=2551 pid=7775 auid=4294967295 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=4294967295 comm="comchatq.cgi" exe="/usr/bin/perl" subj=system_u:system_r:httpd_sys_script_t:s0 key=(null)ARCH=x86_64 SYSCALL=openat AUID="unset" UID="apache" GID="apache" EUID="apache" SUID="apache" FSUID="apache" EGID="apache" SGID="apache" FSGID="apache"

type=AVC msg=audit(1714044210.729:383): avc:  denied  { search } for  pid=9361 comm="comchatq.cgi" name="ユーザ名" dev="dm-0" ino=16778313 scontext=system_u:system_r:httpd_sys_script_t:s0 tcontext=unconfined_u:object_r:user_home_dir_t:s0 tclass=dir permissive=0
type=SYSCALL msg=audit(1714044210.729:383): arch=c000003e syscall=257 success=no exit=-13 a0=ffffff9c a1=5601baca2c00 a2=80000 a3=0 items=0 ppid=2551 pid=9361 auid=4294967295 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=4294967295 comm="comchatq.cgi" exe="/usr/bin/perl" subj=system_u:system_r:httpd_sys_script_t:s0 key=(null)ARCH=x86_64 SYSCALL=openat AUID="unset" UID="apache" GID="apache" EUID="apache" SUID="apache" FSUID="apache" EGID="apache" SGID="apache" FSGID="apache"

このときの「ausearch -m AVC|grep denied|grep comchat」の結果

type=AVC msg=audit(1714044410.481:392): avc:  denied  { search } for  pid=9803 comm="comchatq.cgi" name="ユーザ名" dev="dm-0" ino=16778313 scontext=system_u:system_r:httpd_sys_script_t:s0 tcontext=unconfined_u:object_r:user_home_dir_t:s0 tclass=dir permissive=0

めんどくさくなってきたので「setenforce Permissive」で一時的にSELinux緩和

# getenforce
Enforcing
# setenforce Permissive
# getenforce
Permissive
#

そしてモジュール化するため「ausearch -m AVC|grep denied|grep cgi | audit2allow -M comchatq」を実行

# ausearch -m AVC|grep denied|grep cgi | audit2allow -M comchatq
******************** IMPORTANT ***********************
To make this policy package active, execute:

semodule -i comchatq.pp

# ls -l comchatq*
-rw-r--r--. 1 root root 1329 Apr 26 15:23 comchatq.pp
-rw-r--r--. 1 root root  586 Apr 26 15:23 comchatq.te
# cat comchatq.te

module comchatq 1.0;

require {
        type user_home_dir_t;
        type httpd_sys_rw_content_t;
        type httpd_t;
        type httpd_sys_script_t;
        class file { execute execute_no_trans };
        class dir search;
}

#============= httpd_sys_script_t ==============

#!!!! This avc can be allowed using one of the these booleans:
#     httpd_enable_homedirs, httpd_read_user_content
allow httpd_sys_script_t user_home_dir_t:dir search;

#============= httpd_t ==============

#!!!! This avc can be allowed using the boolean 'httpd_unified'
allow httpd_t httpd_sys_rw_content_t:file { execute execute_no_trans };
#

そして、SELinuxモジュールの読み込み「semodule -i comchatq.pp」を実行(前後でsemodule -l|grep chatを実行して、読み込み済みSELinuxモジュールの出力の変化を確認)

# semodule -l|grep chat
# semodule -i|grep chat
# semodule -l|grep chat
comchatq
#

で、「setenforce Enforcing」を実行して元に戻して、念のため再起動もしておく

# setenforce Enforcing
# getenforce
Enforcing
#

jcode.plを使うperl CGIを動かせない?

自分で作ったCGIはjcode.pmを使うように仕様変更してたけど、イベントをやるにあたってほかから持ってきたCGIにはjcode.plを使ってるものがあった。

そんなjcode.plを使ってるperl CGIをperl v5.32.1で動かそうとしたら、以下のエラー

$ ./joyful.cgi
Can't use 'defined(%hash)' (Maybe you should just omit the defined()?) at ./jcode.pl line 684.
Compilation failed in require at ./joyful.cgi line 44.
$

たぶん昔ながらの外部ファイル読み込みが廃止されたんだろうなぁ、と調べてみると「Perl Hackers Hub 第46回 Perl 5.26で変わること(1)」に記載があった。

が・・・ここだと「require “./jcode.pl”;」と指定すれば大丈夫、とあるが、今回エラー出てるCGIはもともとその指定となっていた。

よく読むとdefinedがhashに使えなくなった、というjcode.pl内部の作りの問題を指摘されていた。

で・・・探すとそこらへんが修正されているjacode.pl というjcode.plを置き換えられるように作られたものがあったので、それを使って対処した。

$ curl -O https://cpan.metacpan.org/authors/id/I/IN/INA/Jacode/Jacode-2.13.4.31.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5408k  100 5408k    0     0  5803k      0 --:--:-- --:--:-- --:--:-- 5797k
$ tar xfz Jacode-2.13.4.31.tar.gz
$ cd Jacode-2.13.4.31/lib
$ ls -l
total 220
-rwxr-xr-x. 1 todoroki todoroki 217584 Mar 21  2023 jacode.pl
-rw-r--r--. 1 todoroki todoroki   2432 Mar 21  2023 Jacode.pm
$

で、このjacode.plをjcode.plにファイル名変更して置き換えたところ、特に問題なく動作した。

(SELinuxに関する問題はおそらく↑のSELinuxモジュールで一緒に対処できているっぽい)

lv viewerがない

Solaris時代からless/moreだと ShiftJIS/EUC-JP/UTF-8 の自動変換をしてくれないけど、 lv というコマンドならできる、というのでずっと使ってきた。(なぜlvを知ったかというと、 MS-DOS時代にアマチュア無線のBBS(RBBS)ソフト dNet にお世話になってたから。作者の人に浦和の宇宙科学館であったりしてた)

いつのまにかFedora EPELに収録されていて使いやすくなってたんだけど、EPEL8から収録されなくなった。

Fedora側では現在も続いているので、メンテナーがいないからEPEL8での提供が終わったのかな、といった感じ・・・

# curl -O https://kojipkgs.fedoraproject.org//packages/lv/4.51/52.fc40/src/lv-4.51-52.fc40.src.rpm
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  629k  100  629k    0     0  46005      0  0:00:14  0:00:14 --:--:-- 36245
# rpmbuild --rebuild lv-4.51-52.fc40.src.rpm
lv-4.51-52.fc40.src.rpm をインストール中です。
setting SOURCE_DATE_EPOCH=1706140800
エラー: ビルド依存性の失敗:
        ncurses-devel は lv-4.51-52.el9.x86_64 に必要とされています
#

というわけで「dnf install ncurses-devel」を実行した後に、再度「rpmbuild –rebuild lv-4.51-52.fc40.src.rpm」を実行して ~/rpmbuild/RPMS/x86_64/lv-4.51-52.el9.x86_64.rpm を出力

これをインストールして対処

proxmox 8.1.4で適当にcephストレージ作ったらWARNINGが出た件の対処


proxmox 8.1.4を3サーバで作って、cpehストレージ作ってみるかー、と適当に設定した。

基本は「Deploy Hyper-Converged Ceph Cluster」を見ながらやったんだけど、CephFSを作成するときに、手順だと「pveceph fs create –pg_num 128 –add-storage」と書いてあったんだけど、「pveceph fs create」だけで実行したらどうなるんだろ?と思ってやってみたところ、警告が出た

注:いろいろ対処方法を検討したところ、指定しない場合のデフォルト値も128だった。

HEALTH_WARN: 1 pools have too many placement groups
Pool storagepool has 128 placement groups, should have 32

調べてみるとproxmoxのフォーラムに「CEPH pools have too many placement groups」という若干古め(2020年のpoxmox 6.3時代)のものが見つかった。

「pveceph pool ls」で現在の設定を確認

root@zstack137:~# ceph -v
ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
root@zstack137:~# pveversion
pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-8-pve)
root@zstack137:~# pveceph pool ls
lqqqqqqqqqqqqqqqqqwqqqqqqwqqqqqqqqqqwqqqqqqqqwqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqk
x Name            x Size x Min Size x PG Num x min. PG Num x Optimal PG Num x PG Autoscale Mode x PG Autoscale Target Size x PG Autoscale Target Ratio x Crush Rule Name x               %-Used x       Used x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x .mgr            x    3 x        2 x      1 x           1 x              1 x on                x                          x                           x replicated_rule x 3.08950029648258e-06 x    1388544 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_data     x    3 x        2 x     32 x             x             32 x on                x                          x                           x replicated_rule x                    0 x          0 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_metadata x    3 x        2 x     32 x          16 x             16 x on                x                          x                           x replicated_rule x 4.41906962578287e-07 x     198610 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x storagepool     x    3 x        2 x    128 x             x             32 x warn              x                          x                           x replicated_rule x   0.0184257291257381 x 8436679796 x
mqqqqqqqqqqqqqqqqqvqqqqqqvqqqqqqqqqqvqqqqqqqqvqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqj
root@zstack137:~#

「ceph osd pool autoscale-status」

root@zstack137:~# ceph osd pool autoscale-status
POOL               SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr             452.0k                3.0        449.9G  0.0000                                  1.0       1              on         False
cephfs_data          0                 3.0        449.9G  0.0000                                  1.0      32              on         False
cephfs_metadata  66203                 3.0        449.9G  0.0000                                  4.0      32              on         False
storagepool       2681M                3.0        449.9G  0.0175                                  1.0     128              warn       False
root@zstack137:~#

そういえば、むかし、cephをテスト構築した時もなんかあったな、と思い出して確認してみると2018年に「CephのOSD毎のPlacement Groupの数を確認する」というメモを残していた。

「ceph health」を実行してみると状況は違うようだった。

root@zstack137:~# ceph health
HEALTH_WARN 1 pools have too many placement groups

root@zstack137:~# ceph health detail
HEALTH_WARN 1 pools have too many placement groups
[WRN] POOL_TOO_MANY_PGS: 1 pools have too many placement groups
    Pool storagepool has 128 placement groups, should have 32
root@zstack137:~#
root@zstack137:~# ceph -s
  cluster:
    id:     9e085d6a-77f3-41f1-8f6d-71fadc9c011b
    health: HEALTH_WARN
            1 pools have too many placement groups

  services:
    mon: 3 daemons, quorum zstack136,zstack135,zstack137 (age 3h)
    mgr: zstack136(active, since 3h), standbys: zstack135
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 9 up (since 3h), 9 in (since 3d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 716 objects, 2.7 GiB
    usage:   8.3 GiB used, 442 GiB / 450 GiB avail
    pgs:     193 active+clean

root@zstack137:~#

とはいえ、「ceph pg dump」の出力結果を整形して表示する下記コマンドが実行できるか確認してみる。

ceph pg dump | awk '
BEGIN { IGNORECASE = 1 }
 /^PG_STAT/ { col=1; while($col!="UP") {col++}; col++ }
 /^[0-9a-f]+\.[0-9a-f]+/ { match($0,/^[0-9a-f]+/); pool=substr($0, RSTART, RLENGTH); poollist[pool]=0;
 up=$col; i=0; RSTART=0; RLENGTH=0; delete osds; while(match(up,/[0-9]+/)>0) { osds[++i]=substr(up,RSTART,RLENGTH); up = substr(up, RSTART+RLENGTH) }
 for(i in osds) {array[osds[i],pool]++; osdlist[osds[i]];}
}
END {
 printf("\n");
 printf("pool :\t"); for (i in poollist) printf("%s\t",i); printf("| SUM \n");
 for (i in poollist) printf("--------"); printf("----------------\n");
 for (i in osdlist) { printf("osd.%i\t", i); sum=0;
   for (j in poollist) { printf("%i\t", array[i,j]); sum+=array[i,j]; sumpool[j]+=array[i,j] }; printf("| %i\n",sum) }
 for (i in poollist) printf("--------"); printf("----------------\n");
 printf("SUM :\t"); for (i in poollist) printf("%s\t",sumpool[i]); printf("|\n");
}'

無事実行できた。

root@zstack137:~# ceph pg dump | awk '
BEGIN { IGNORECASE = 1 }
 /^PG_STAT/ { col=1; while($col!="UP") {col++}; col++ }
 /^[0-9a-f]+\.[0-9a-f]+/ { match($0,/^[0-9a-f]+/); pool=substr($0, RSTART, RLENGTH); poollist[pool]=0;
 up=$col; i=0; RSTART=0; RLENGTH=0; delete osds; while(match(up,/[0-9]+/)>0) { osds[++i]=substr(up,RSTART,RLENGTH); up = substr(up, RSTART+RLENGTH) }
 for(i in osds) {array[osds[i],pool]++; osdlist[osds[i]];}
}
END {
 printf("\n");
 printf("pool :\t"); for (i in poollist) printf("%s\t",i); printf("| SUM \n");
 for (i in poollist) printf("--------"); printf("----------------\n");
 for (i in osdlist) { printf("osd.%i\t", i); sum=0;
   for (j in poollist) { printf("%i\t", array[i,j]); sum+=array[i,j]; sumpool[j]+=array[i,j] }; printf("| %i\n",sum) }
 for (i in poollist) printf("--------"); printf("----------------\n");
 printf("SUM :\t"); for (i in poollist) printf("%s\t",sumpool[i]); printf("|\n");
}'
dumped all

pool :  3       2       1       4       | SUM
------------------------------------------------
osd.3   4       5       1       13      | 23
osd.8   4       6       0       12      | 22
osd.6   2       4       0       15      | 21
osd.5   6       4       0       16      | 26
osd.2   3       3       0       15      | 21
osd.1   4       3       0       10      | 17
osd.4   1       1       0       16      | 18
osd.0   5       2       0       10      | 17
osd.7   3       4       0       21      | 28
------------------------------------------------
SUM :   32      32      1       128     |
root@zstack137:~#

poolによって差がありすぎている?

中国語のページで「ceph使用问题积累」というところがあって「HEALTH_WARN:pools have too many placement groups」と「HEALTH_WARN: mons are allowing insecure global_id reclaim」についての対処方法が載っている。

前者については↑で出てきたproxmoxフォーラム記事を参照元として「ceph mgr module disable pg_autoscaler」を実行してauto scale機能を無効化する、とある

後者については「ceph config set mon auth_allow_insecure_global_id_reclaim false」となっていた。

module設定変える前に「ceph mgr module ls」で状態確認

root@zstack137:~# ceph mgr module ls
MODULE
balancer           on (always on)
crash              on (always on)
devicehealth       on (always on)
orchestrator       on (always on)
pg_autoscaler      on (always on)
progress           on (always on)
rbd_support        on (always on)
status             on (always on)
telemetry          on (always on)
volumes            on (always on)
iostat             on
nfs                on
restful            on
alerts             -
influx             -
insights           -
localpool          -
mirroring          -
osd_perf_query     -
osd_support        -
prometheus         -
selftest           -
snap_schedule      -
stats              -
telegraf           -
test_orchestrator  -
zabbix             -
root@zstack137:~#

SUSEのページにあるSUSE Enterprise Storage 7 DocumentationのAdministration and Operations Guide「12 Determine the cluster state」を見るといろいろな状態確認コマンドがあった。

root@zstack137:~# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    450 GiB  442 GiB  8.3 GiB   8.3 GiB       1.85
TOTAL  450 GiB  442 GiB  8.3 GiB   8.3 GiB       1.85

--- POOLS ---
POOL             ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr              1    1  449 KiB        2  1.3 MiB      0    140 GiB
cephfs_data       2   32      0 B        0      0 B      0    140 GiB
cephfs_metadata   3   32   35 KiB       22  194 KiB      0    140 GiB
storagepool       4  128  2.6 GiB      692  7.9 GiB   1.84    140 GiB
root@zstack137:~#  ceph df detail
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    450 GiB  442 GiB  8.3 GiB   8.3 GiB       1.85
TOTAL  450 GiB  442 GiB  8.3 GiB   8.3 GiB       1.85

--- POOLS ---
POOL             ID  PGS   STORED   (DATA)   (OMAP)  OBJECTS     USED   (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
.mgr              1    1  449 KiB  449 KiB      0 B        2  1.3 MiB  1.3 MiB      0 B      0    140 GiB            N/A          N/A    N/A         0 B          0 B
cephfs_data       2   32      0 B      0 B      0 B        0      0 B      0 B      0 B      0    140 GiB            N/A          N/A    N/A         0 B          0 B
cephfs_metadata   3   32   35 KiB   18 KiB   17 KiB       22  194 KiB  144 KiB   50 KiB      0    140 GiB            N/A          N/A    N/A         0 B          0 B
storagepool       4  128  2.6 GiB  2.6 GiB  3.0 KiB      692  7.9 GiB  7.9 GiB  9.1 KiB   1.84    140 GiB            N/A          N/A    N/A         0 B          0 B
root@zstack137:~# 

TOO_MANY_PGSの時の対処としていかが書かれている

TOO_MANY_PGS
The number of PGs in use is above the configurable threshold of mon_pg_warn_max_per_osd PGs per OSD. This can lead to higher memory usage for OSD daemons, slower peering after cluster state changes (for example OSD restarts, additions, or removals), and higher load on the Ceph Managers and Ceph Monitors.

While the pg_num value for existing pools cannot be reduced, the pgp_num value can. This effectively co-locates some PGs on the same sets of OSDs, mitigating some of the negative impacts described above. The pgp_num value can be adjusted with:

proxmox「Deploy Hyper-Converged Ceph Cluster」のあたりをみると PG Autoscale Modeはwarnで設定されるのが標準であるようだ。

cephのautomated scalingを見ると「ceph config set global mon_target_pg_per_osd 100」で値を設定することが書かれているが、現在値の確認方法が書いてない。

ceph config get <who> <key>というのはわかったのだが、whoの部分がなんなのかがわからなかった。(globalではなかった)

「ceph config dump」を実行したところ、いま標準値から変更されているところであろう設定が出てきて、whoに該当するものとしてmonがあった。であればmon_target_pg_per_osdのwhoはmonだろうと試すと現在値らしきものが確認できた。

root@zstack137:~# ceph config dump
WHO  MASK  LEVEL     OPTION                                 VALUE  RO
mon        advanced  auth_allow_insecure_global_id_reclaim  false
root@zstack137:~# ceph config get mon  mon_target_pg_per_osd
100
root@zstack137:~#

とりあえず、「ceph mgr module disable pg_autoscaler」を実行してみたのだが、変更不可だった

root@zstack137:~# ceph mgr module disable pg_autoscaler
Error EINVAL: module 'pg_autoscaler' cannot be disabled (always-on)
root@zstack137:~#

じゃあ、「ceph osd pool set storagepool pgp_num 32」を実行してpgp_numを128から32に変更してみる

root@zstack137:~# ceph osd pool stats
pool .mgr id 1
  nothing is going on

pool cephfs_data id 2
  nothing is going on

pool cephfs_metadata id 3
  nothing is going on

pool storagepool id 4
  nothing is going on

root@zstack137:~# ceph osd pool get storagepool pgp_num
pgp_num: 128
root@zstack137:~# ceph osd pool set storagepool pgp_num 32
set pool 4 pgp_num to 32
root@zstack137:~# ceph osd pool get storagepool pgp_num
pgp_num: 125
root@zstack137:~# ceph osd pool get storagepool pgp_num
pgp_num: 119
root@zstack137:~#

徐々に変更されていく模様

root@zstack137:~# ceph -s
  cluster:
    id:     9e085d6a-77f3-41f1-8f6d-71fadc9c011b
    health: HEALTH_WARN
            Reduced data availability: 1 pg peering
            1 pools have too many placement groups
            1 pools have pg_num > pgp_num

  services:
    mon: 3 daemons, quorum zstack136,zstack135,zstack137 (age 5h)
    mgr: zstack136(active, since 5h), standbys: zstack135
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 9 up (since 5h), 9 in (since 3d); 2 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 716 objects, 2.7 GiB
    usage:   8.4 GiB used, 442 GiB / 450 GiB avail
    pgs:     0.518% pgs not active
             16/2148 objects misplaced (0.745%)
             190 active+clean
             2   active+recovering
             1   remapped+peering

  io:
    recovery: 2.0 MiB/s, 0 objects/s

root@zstack137:~# ceph health
HEALTH_WARN Reduced data availability: 1 pg peering; 1 pools have too many placement groups; 1 pools have pg_num > pgp_num
root@zstack137:~# ceph health detail
HEALTH_WARN 1 pools have too many placement groups; 1 pools have pg_num > pgp_num
[WRN] POOL_TOO_MANY_PGS: 1 pools have too many placement groups
    Pool storagepool has 128 placement groups, should have 32
[WRN] SMALLER_PGP_NUM: 1 pools have pg_num > pgp_num
    pool storagepool pg_num 128 > pgp_num 32
root@zstack137:~#

ある程度時間が経過したあと

root@zstack137:~# ceph health detail
HEALTH_WARN 1 pools have too many placement groups; 1 pools have pg_num > pgp_num
[WRN] POOL_TOO_MANY_PGS: 1 pools have too many placement groups
    Pool storagepool has 128 placement groups, should have 32
[WRN] SMALLER_PGP_NUM: 1 pools have pg_num > pgp_num
    pool storagepool pg_num 128 > pgp_num 32
root@zstack137:~# ceph pg dump | awk '
BEGIN { IGNORECASE = 1 }
 /^PG_STAT/ { col=1; while($col!="UP") {col++}; col++ }
 /^[0-9a-f]+\.[0-9a-f]+/ { match($0,/^[0-9a-f]+/); pool=substr($0, RSTART, RLENGTH); poollist[pool]=0;
 up=$col; i=0; RSTART=0; RLENGTH=0; delete osds; while(match(up,/[0-9]+/)>0) { osds[++i]=substr(up,RSTART,RLENGTH); up = substr(up, RSTART+RLENGTH) }
 for(i in osds) {array[osds[i],pool]++; osdlist[osds[i]];}
}
END {
 printf("\n");
 printf("pool :\t"); for (i in poollist) printf("%s\t",i); printf("| SUM \n");
 for (i in poollist) printf("--------"); printf("----------------\n");
 for (i in osdlist) { printf("osd.%i\t", i); sum=0;
   for (j in poollist) { printf("%i\t", array[i,j]); sum+=array[i,j]; sumpool[j]+=array[i,j] }; printf("| %i\n",sum) }
 for (i in poollist) printf("--------"); printf("----------------\n");
 printf("SUM :\t"); for (i in poollist) printf("%s\t",sumpool[i]); printf("|\n");
}'
dumped all

pool :  3       2       1       4       | SUM
------------------------------------------------
osd.3   4       5       1       15      | 25
osd.8   4       6       0       16      | 26
osd.6   2       4       0       16      | 22
osd.5   6       4       0       4       | 14
osd.2   3       3       0       11      | 17
osd.1   4       3       0       13      | 20
osd.4   1       1       0       17      | 19
osd.0   5       2       0       20      | 27
osd.7   3       4       0       16      | 23
------------------------------------------------
SUM :   32      32      1       128     |
root@zstack137:~# ceph osd pool autoscale-status
POOL               SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr             452.0k                3.0        449.9G  0.0000                                  1.0       1              on         False
cephfs_data          0                 3.0        449.9G  0.0000                                  1.0      32              on         False
cephfs_metadata  66203                 3.0        449.9G  0.0000                                  4.0      32              on         False
storagepool       2681M                3.0        449.9G  0.0175                                  1.0     128              warn       False
root@zstack137:~# pveceph pool ls
lqqqqqqqqqqqqqqqqqwqqqqqqwqqqqqqqqqqwqqqqqqqqwqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqk
x Name            x Size x Min Size x PG Num x min. PG Num x Optimal PG Num x PG Autoscale Mode x PG Autoscale Target Size x PG Autoscale Target Ratio x Crush Rule Name x               %-Used x       Used x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x .mgr            x    3 x        2 x      1 x           1 x              1 x on                x                          x                           x replicated_rule x 3.09735719383752e-06 x    1388544 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_data     x    3 x        2 x     32 x             x             32 x on                x                          x                           x replicated_rule x                    0 x          0 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_metadata x    3 x        2 x     32 x          16 x             16 x on                x                          x                           x replicated_rule x 4.43030785390874e-07 x     198610 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x storagepool     x    3 x        2 x    128 x             x             32 x warn              x                          x                           x replicated_rule x    0.018471721559763 x 8436679796 x
mqqqqqqqqqqqqqqqqqvqqqqqqvqqqqqqqqqqvqqqqqqqqvqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqj
root@zstack137:~#

pg_numを減らせる?

root@zstack137:~# ceph osd pool get storagepool pg_num
pg_num: 128
root@zstack137:~# ceph osd pool set storagepool pg_num 32
set pool 4 pg_num to 32
root@zstack137:~# ceph osd pool get storagepool pg_num
pg_num: 128
root@zstack137:~# ceph osd pool get storagepool pg_num
pg_num: 124
root@zstack137:~#

徐々に減ってる

ステータスはHEALTH_OLに変わった

root@zstack137:~# ceph osd pool get storagepool pg_num
pg_num: 119
root@zstack137:~# ceph health detail
HEALTH_OK
root@zstack137:~# pveceph pool ls
lqqqqqqqqqqqqqqqqqwqqqqqqwqqqqqqqqqqwqqqqqqqqwqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqk
x Name            x Size x Min Size x PG Num x min. PG Num x Optimal PG Num x PG Autoscale Mode x PG Autoscale Target Size x PG Autoscale Target Ratio x Crush Rule Name x               %-Used x       Used x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x .mgr            x    3 x        2 x      1 x           1 x              1 x on                x                          x                           x replicated_rule x 3.10063592223742e-06 x    1388544 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_data     x    3 x        2 x     32 x             x             32 x on                x                          x                           x replicated_rule x                    0 x          0 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_metadata x    3 x        2 x     32 x          16 x             16 x on                x                          x                           x replicated_rule x 4.43499772018185e-07 x     198610 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x storagepool     x    3 x        2 x    117 x             x             32 x warn              x                          x                           x replicated_rule x   0.0184909123927355 x 8436679796 x
mqqqqqqqqqqqqqqqqqvqqqqqqvqqqqqqqqqqvqqqqqqqqvqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqj
root@zstack137:~#

「ceph osd pool autoscale-status」の方のPG_NUMは即反映

root@zstack137:~# ceph osd pool autoscale-status
POOL               SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr             452.0k                3.0        449.9G  0.0000                                  1.0       1              on         False
cephfs_data          0                 3.0        449.9G  0.0000                                  1.0      32              on         False
cephfs_metadata  66203                 3.0        449.9G  0.0000                                  4.0      32              on         False
storagepool       2705M                3.0        449.9G  0.0176                                  1.0      32              warn       False
root@zstack137:~#

しばらく実行したらHEALTH_WARNになったときもあったが、比較的すぐにHEALTH_OKに戻ったりした。

root@zstack137:~# ceph health detail
HEALTH_WARN Reduced data availability: 2 pgs inactive, 2 pgs peering
[WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs peering
    pg 4.22 is stuck peering for 2d, current state peering, last acting [6,5,2]
    pg 4.62 is stuck peering for 6h, current state peering, last acting [6,5,2]
root@zstack137:~#

しばらく時間がたって変更が終わったあとに状態をとってみた

root@zstack137:~# ceph health detail
HEALTH_OK
root@zstack137:~# pveceph pool ls
lqqqqqqqqqqqqqqqqqwqqqqqqwqqqqqqqqqqwqqqqqqqqwqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqqqqqqqqqqqwqqqqqqqqqqqqk
x Name            x Size x Min Size x PG Num x min. PG Num x Optimal PG Num x PG Autoscale Mode x PG Autoscale Target Size x PG Autoscale Target Ratio x Crush Rule Name x               %-Used x       Used x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x .mgr            x    3 x        2 x      1 x           1 x              1 x on                x                          x                           x replicated_rule x 3.13595910483855e-06 x    1388544 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_data     x    3 x        2 x     32 x             x             32 x on                x                          x                           x replicated_rule x                    0 x          0 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x cephfs_metadata x    3 x        2 x     32 x          16 x             16 x on                x                          x                           x replicated_rule x  4.4855224246021e-07 x     198610 x
tqqqqqqqqqqqqqqqqqnqqqqqqnqqqqqqqqqqnqqqqqqqqnqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqqqqqqqqqqqnqqqqqqqqqqqqu
x storagepool     x    3 x        2 x     32 x             x             32 x warn              x                          x                           x replicated_rule x   0.0186976287513971 x 8436679796 x
mqqqqqqqqqqqqqqqqqvqqqqqqvqqqqqqqqqqvqqqqqqqqvqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqvqqqqqqqqqqqqj
root@zstack137:~# ceph -s
  cluster:
    id:     9e085d6a-77f3-41f1-8f6d-71fadc9c011b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum zstack136,zstack135,zstack137 (age 6h)
    mgr: zstack136(active, since 6h), standbys: zstack135
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 9 up (since 6h), 9 in (since 3d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 716 objects, 2.7 GiB
    usage:   8.6 GiB used, 441 GiB / 450 GiB avail
    pgs:     97 active+clean

root@zstack137:~# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    450 GiB  441 GiB  8.7 GiB   8.7 GiB       1.94
TOTAL  450 GiB  441 GiB  8.7 GiB   8.7 GiB       1.94

--- POOLS ---
POOL             ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr              1    1  449 KiB        2  1.3 MiB      0    137 GiB
cephfs_data       2   32      0 B        0      0 B      0    137 GiB
cephfs_metadata   3   32   35 KiB       22  194 KiB      0    137 GiB
storagepool       4   32  2.7 GiB      692  8.0 GiB   1.89    137 GiB
root@zstack137:~#

とりあえず対処できた模様?

Oracle Cloud上のLinuxサーバからOracle Cloudのオブジェクトストレージをs3fsを使ってファイルシステムとして使う


Oracle Cloud上で運用しているファイルサーバのディスク使用率がいつのまにか96%を超えていた。

え?と思って調べてみたら、外部コンテンツを取り込む際に、コンテンツのバージョンも残しておこうと軽い気持ちで設定したgitが容量を使っていた。

遅くてもいいや、ということで、使っていない50GBのオブジェクトストレージ領域をs3fsでファイルシステムとして使うこととした。

参考文献
s3fs配布元
・Oracle Cloud Infrastructure Blog「Mounting an Object Storage Bucket as File System on Oracle Linux
・Oracle Cloud Infrastructureドキュメント「Amazon S3互換API
・Cloudii「Oracle Cloud オブジェクトストレージをOracle Linuxのファイルシステムとして直接マウントする方法。

基本的には、Oracle Cloud Infrastructure Blogの記述通りにやるだけなのだが、いろいろ勘違いしていて上手くいかなかった。

手順0: Oracle Cloud上でバケット作成

Oracle blog上では手順に書かれていないので手順0として書きます。

オブジェクトストレージにてバケットの作成を行います。

バケット作成後にバケットの詳細を確認し、一般のところにある「ネームスペース」もあとで使用します。

手順1: s3fs-fuseをインストール

epelを有効にしている状態であれば、「yum install s3fs-fuse」を実行するだけでインストール完了。

# yum install s3fs-fuse
読み込んだプラグイン:langpacks, ulninfo
依存性の解決をしています
--> トランザクションの確認を実行しています。
---> パッケージ s3fs-fuse.x86_64 0:1.91-1.el7 を インストール
--> 依存性の処理をしています: fuse >= 2.8.4 のパッケージ: s3fs-fuse-1.91-1.el7.x86_64
--> 依存性の処理をしています: fuse-libs >= 2.8.4 のパッケージ: s3fs-fuse-1.91-1.el7.x86_64
--> 依存性の処理をしています: libfuse.so.2(FUSE_2.2)(64bit) のパッケージ: s3fs-fuse-1.91-1.el7.x86_64
--> 依存性の処理をしています: libfuse.so.2(FUSE_2.5)(64bit) のパッケージ: s3fs-fuse-1.91-1.el7.x86_64
--> 依存性の処理をしています: libfuse.so.2(FUSE_2.6)(64bit) のパッケージ: s3fs-fuse-1.91-1.el7.x86_64
--> 依存性の処理をしています: libfuse.so.2(FUSE_2.8)(64bit) のパッケージ: s3fs-fuse-1.91-1.el7.x86_64
--> 依存性の処理をしています: libfuse.so.2()(64bit) のパッケージ: s3fs-fuse-1.91-1.el7.x86_64
--> トランザクションの確認を実行しています。
---> パッケージ fuse.x86_64 0:2.9.4-1.0.9.el7 を インストール
---> パッケージ fuse-libs.x86_64 0:2.9.4-1.0.9.el7 を インストール
--> 依存性解決を終了しました。

依存性を解決しました

===================================================================================================================================
 Package                     アーキテクチャー         バージョン                        リポジトリー                          容量
===================================================================================================================================
インストール中:
 s3fs-fuse                   x86_64                   1.91-1.el7                        ol7_developer_EPEL                   257 k
依存性関連でのインストールをします:
 fuse                        x86_64                   2.9.4-1.0.9.el7                   ol7_latest                            88 k
 fuse-libs                   x86_64                   2.9.4-1.0.9.el7                   ol7_latest                            97 k

トランザクションの要約
===================================================================================================================================
インストール  1 パッケージ (+2 個の依存関係のパッケージ)

総ダウンロード容量: 442 k
インストール容量: 1.1 M
Is this ok [y/d/N]: y
Downloading packages:
<略>
インストール:
  s3fs-fuse.x86_64 0:1.91-1.el7

依存性関連をインストールしました:
  fuse.x86_64 0:2.9.4-1.0.9.el7                                 fuse-libs.x86_64 0:2.9.4-1.0.9.el7

完了しました!
#

手順2: 認証情報の設定

Oracle Cloudにログインした状態で右上のユーザアイコンから[プロファイル]-[ユーザー設定]を選択します。

画面が変わって、下の方にある[リソース]-[顧客秘密キー]を選択します。

この「顧客秘密キー」がS3 compatibleとして使う場合の認証情報となります。

「秘密キーの生成」をクリックして、何か名前を決めて作成します。

次の画面で表示される「生成されたキー」は「SECRET_ACCESS_KEY」として使いますので、かならず「コピー」してください。

これを忘れた場合は再作成する必要があります。

なお、キーはこんな感じですね。

で・・・Oracle blogだとACCESS_KEY_IDは何を使えばいいのかハッキリ書いていないので、しばらく名前として設定したs3-accessを使ってアクセスを試みていました。

正しくは上記の「アクセスキー」のところの文字列を使います。

Oracle blogでは下記の様に個人ユーザのディレクトリに認証情報を配置しています。

$ echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > ${HOME}/.passwd-s3fs
$ chmod 600 ${HOME}/.passwd-s3fs

今回は個人用ではなく起動時から使えるような形にしたいので /etc/passwd-s3fs として設定しました。

# echo ~b66e06:KZyW~~~BDho= >  /etc/passwd-s3fs 
# chmod 600  /etc/passwd-s3fs 
#

手順3: マウントする

Oracle blogでは個人権限でマウントするための下記が書かれています。

$ s3fs [bucket] [destination directory] -o endpoint=[region] -o passwd_file=${HOME}/.passwd-s3fs -o url=https://[namespace].compat.objectstorage.[region].oraclecloud.com/ -onomultipart -o use_path_request_style

最初はアクセスできることを検証するため、このコマンドで実行します。

各要素は以下のようになっています。

[bucket]=手順0で作成したバケット名
[destination directory]=ローカルLinuxのマウントポイント
[namespace]=手順0で作成したバケットの詳細で確認できるネームスペース

テストとしてホームディレクトリ内にあるs3fsというディレクトリにマウントするべく実行

$ s3fs tw~t ./s3fs -o endpoint=us-phoenix-1 -o passwd_file=/etc/passwd-s3fs -o url=https://ax~b7.compat.objectstorage.us-phoenix-1.oraclecloud.com -onomultipart -o use_path_request_style
fuse: failed to exec fusermount: Permission denied
$

これはfusermountに権限がないためマウントできないというもので、下記で対処します。

$ sudo chmod a+x /usr/bin/fusermount
$ 

再実行

$ s3fs tw~t ./s3fs -o endpoint=us-phoenix-1 -o passwd_file=/etc/passwd-s3fs -o url=https://ax~b7.compat.objectstorage.us-phoenix-1.oraclecloud.com -onomultipart -o use_path_request_style
$ df -h
ファイルシス   サイズ  使用  残り 使用% マウント位置
devtmpfs         460M     0  460M    0% /dev
tmpfs            486M  380K  486M    1% /dev/shm
tmpfs            486M   19M  467M    4% /run
tmpfs            486M     0  486M    0% /sys/fs/cgroup
/dev/sda3         39G   37G  2.0G   95% /
/dev/sda1        200M  7.4M  193M    4% /boot/efi
tmpfs             98M     0   98M    0% /run/user/0
tmpfs             98M     0   98M    0% /run/user/993
tmpfs             98M     0   98M    0% /run/user/1001
$

マウントされていない??/var/log/messages を確認すると認証情報の関連でマウントに失敗していました。

Jul  9 23:07:06 oralinux s3fs[22545]: s3fs version 1.91(unknown) : s3fs -o endpoint=us-phoenix-1 -o passwd_file=/etc/passwd-s3fs -o url=https://ax~fb7.compat.objectstorage.us-phoenix-1.oraclecloud.com -onomultipart -o use_path_request_style twsearch-git ./s3fs
Jul  9 23:07:06 oralinux  s3fs[22545]: Loaded mime information from /etc/mime.types
Jul  9 23:07:06 oralinux  s3fs[22546]: init v1.91(commit:unknown) with OpenSSL
Jul  9 23:07:06 oralinux  s3fs[22546]: s3fs.cpp:s3fs_check_service(3572): Failed to connect by sigv4, so retry to connect by signature version 2.
Jul  9 23:07:06 oralinux  s3fs[22546]: s3fs.cpp:s3fs_check_service(3584): Bad Request(host=https://ax~fb7.compat.objectstorage.us-phoenix-1.oraclecloud.com) - result of checking service.

これは /etc/passwd-s3fsに書いた access-key定義が誤っていた場合のログです。

修正して再実行すると今度はマウントできました。

$ s3fs tw~t ./s3fs -o endpoint=us-phoenix-1 -o passwd_file=/etc/passwd-s3fs -o url=https://ax~b7.compat.objectstorage.us-phoenix-1.oraclecloud.com -onomultipart -o use_path_request_style
$ df -h
ファイルシス   サイズ  使用  残り 使用% マウント位置
devtmpfs         460M     0  460M    0% /dev
tmpfs            486M  400K  486M    1% /dev/shm
tmpfs            486M   19M  467M    4% /run
tmpfs            486M     0  486M    0% /sys/fs/cgroup
/dev/sda3         39G   37G  1.9G   96% /
/dev/sda1        200M  7.4M  193M    4% /boot/efi
tmpfs             98M     0   98M    0% /run/user/0
tmpfs             98M     0   98M    0% /run/user/993
tmpfs             98M     0   98M    0% /run/user/1001
s3fs              16E     0   16E    0% /home/users/s3fs
$

成功した場合は /var/log/messagesは下記の様な感じでした

Jul  9 23:30:18 oralinux s3fs[23933]: s3fs version 1.91(unknown) : s3fs -o endpoint=us-phoenix-1 -o passwd_file=/etc/passwd-s3fs -o url=https://ax~fb7.compat.objectstorage.us-phoenix-1.oraclecloud.com -onomultipart -o use_path_request_style twsearch-git ./s3fs
Jul  9 23:30:18 oralinux s3fs[23933]: Loaded mime information from /etc/mime.types
Jul  9 23:30:18 oralinux s3fs[23936]: init v1.91(commit:unknown) with OpenSSL
Jul  9 23:30:26 oralinux systemd: Configuration file /etc/systemd/system/oracle-cloud-agent.service is marked executable. Please remove executable permission bits. Proceeding anyway.

ただ、この設定だとs3fsを実行したユーザだけがアクセスでき、他のユーザではアクセスできません。

これは「allow_other」というオプションをつけることでアクセスできるようになります。

手順4: /etc/fstab に書く

再起動してもマウントされるようにするには /etc/fstab に書きます。 (/etc/rc.local とかに書く、という手順は不適切です)

上記で使った例であれば /etc/fstab に下記の様に書きます。

tw~it /home/users/s3fs fuse.s3fs use_path_request_style,url=https://axd~fb7.compat.objectstorage.us-phoenix-1.oraclecloud.com,use_path_request_style,_netdev,allow_other

これで、再起動してもマウントされるようになりました。