I set up a PowerScale (Isilon) OneFS simulator on vSphere and left it running unattended. When I came back, alerts had been raised.
One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
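The alert is raised for the drive bays of each node.
Checking the slots of the affected nodes under [Cluster Management] - [Hardware Configuration] - [Drives] shows that they are indeed "Empty".
Log in over ssh and capture the output of "isi status".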
isilon-1# isi status
Cluster Name: isilon
Cluster Health: [ ATTN]
Data Reduction: 1.08 : 1
Storage Efficiency: 0.27 : 1
Cluster Storage: HDD SSD Storage
Size: 403.2G (548.1G Raw) 0 (0 Raw)
VHS Size: 144.9G
Used: 7.3G (2%) 0 (n/a)
Avail: 395.9G (98%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
---+---------------+-----+-----+-----+-----+-----------------+-----------------
1|172.17.44.85 | OK | 0| 1.4M| 1.4M| 1.0G/57.6G( 2%)|(No Storage SSDs)
2|n/a |-A-- | 0| 224k| 224k| 791M/57.6G( 1%)|(No Storage SSDs)
3|172.17.44.87 | OK | 0| 0| 0| 1.2G/57.6G( 2%)|(No Storage SSDs)
4|172.17.44.88 | OK | 0| 526k| 526k|1022M/57.6G( 2%)|(No Storage SSDs)
5|172.17.44.89 | OK | 0|49.7k|49.7k| 922M/57.6G( 2%)|(No Storage SSDs)
6|172.17.44.86 | OK | 0|33.3k|33.3k| 1.3G/57.6G( 2%)|(No Storage SSDs)
7|n/a |-A-- | 0|60.1k|60.1k| 1.2G/57.6G( 2%)|(No Storage SSDs)
---+---------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals: | 0| 2.2M| 2.2M| 7.3G/ 403G( 2%)|(No Storage SSDs)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Critical Events:
Time LNN Event
--------------- ---- -------------------------------------------------------
03/15 19:12:38 2 One or more drives (location(s) Bay 7, Bay 8, Bay ...
03/15 19:23:19 7 One or more drives (location(s) Bay 7, Bay 8, Bay ...
Cluster Job Status:
No running jobs.
No paused or waiting jobs.
No failed jobs.
Recent job results:
Time Job Event
--------------- -------------------------- ------------------------------
03/16 04:00:25 ShadowStoreProtect[63] Succeeded
03/16 02:00:13 WormQueue[62] Succeeded
03/16 00:01:00 ShadowStoreDelete[61] Succeeded
03/15 22:12:38 SnapshotDelete[60] Succeeded
03/15 22:02:43 FSAnalyze[59] Succeeded
03/15 22:01:12 SmartPools[58] Succeeded
03/15 20:00:26 ShadowStoreProtect[57] Succeeded
03/15 19:16:11 MultiScan[56] Succeeded
03/15 19:04:57 MultiScan[55] Succeeded
03/15 19:00:06 MultiScan[53] Succeeded
isilon-1#
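I decided to simply clear the alerts for now. The first step is to find the eventgroup ID of each event, so I ran "isi event events list --format=csv".
(Plain "isi event events list" pads the columns with a lot of whitespace, which makes the output hard to search, so I use the more compact CSV output.)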
isilon-1# isi event events list --format=csv
ID,Occurred,Sev,Lnn,"Eventgroup ID",Message
1.2,1646613393,W,1,1,"The SmartPools upgrade has not completed. Please contact PowerScale support and reference emc321047"
1.176,1646613393,U,1,1,
1.331,1646613875,I,-1,1024,"Resolving event group"
1.273,1646613695,W,-1,1024,"Node 2 is unprovisioned"
1.1120,1647338834,C,0,1051,"Resolved from PAPI"
1.356,1646614051,C,1,1051,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1119,1647338833,C,0,1052,"Resolved from PAPI"
2.314,1646614344,C,-1,1052,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1122,1647338838,C,0,1053,"Resolved from PAPI"
3.263,1646614413,C,-1,1053,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.802,1647310249,C,0,1054,"Resolved from PAPI"
4.254,1646614484,C,4,1054,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
4.269,1646665200,I,4,1060,"Heartbeat Event"
1.376,1646665200,I,1,1061,"Heartbeat Event"
3.304,1646751600,I,-1,1085,"Heartbeat Event"
4.296,1646751600,I,4,1086,"Heartbeat Event"
2.355,1646751600,I,-1,1087,"Heartbeat Event"
1.426,1646751600,I,1,1088,"Heartbeat Event"
2.384,1646838000,I,-1,1111,"Heartbeat Event"
4.325,1646838000,I,4,1112,"Heartbeat Event"
3.333,1646838000,I,-1,1113,"Heartbeat Event"
1.477,1646838000,I,1,1114,"Heartbeat Event"
2.412,1646924400,I,-1,1137,"Heartbeat Event"
1.527,1646924400,I,1,1138,"Heartbeat Event"
4.353,1646924400,I,4,1144,"Heartbeat Event"
3.361,1646924400,I,-1,1145,"Heartbeat Event"
2.441,1647010800,I,-1,1163,"Heartbeat Event"
4.382,1647010800,I,4,1164,"Heartbeat Event"
3.390,1647010800,I,-1,1165,"Heartbeat Event"
1.578,1647010800,I,1,1166,"Heartbeat Event"
1.635,1647097200,I,1,1194,"Heartbeat Event"
3.420,1647097200,I,-1,1200,"Heartbeat Event"
4.412,1647097200,I,4,1201,"Heartbeat Event"
2.471,1647097200,I,-1,1202,"Heartbeat Event"
2.500,1647183600,I,-1,1220,"Heartbeat Event"
1.686,1647183600,I,1,1221,"Heartbeat Event"
3.449,1647183600,I,-1,1227,"Heartbeat Event"
4.441,1647183600,I,4,1228,"Heartbeat Event"
2.529,1647270000,I,-1,1246,"Heartbeat Event"
3.478,1647270000,I,-1,1247,"Heartbeat Event"
4.470,1647270000,I,4,1248,"Heartbeat Event"
1.737,1647270000,I,1,1249,"Heartbeat Event"
1.1121,1647338837,C,0,1457,"Resolved from PAPI"
5.271,1647337391,C,5,1457,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1028,1647337630,I,3,1458,"Resolving event group"
1.1016,1647337569,W,3,1458,"Node 3 is unprovisioned"
1.1054,1647338168,I,6,1471,"Resolving event group"
1.1042,1647338049,W,6,1471,"Node 6 is unprovisioned"
1.1109,1647338661,C,0,1476,"Resolved from PAPI"
6.270,1647338209,C,3,1476,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1137,1647339264,I,7,1520,"Resolving event group"
1.1124,1647339143,C,7,1520,"Node 7 is offline"
8.270,1647339158,C,2,1523,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1138,1647339265,I,7,1525,"Node 7 is online (offline event 1.1124, Tue Mar 15 19:12:23 2022 to Tue Mar 15 19:14:24 2022)"
9.268,1647339799,C,7,1535,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
8.445,1647356400,I,2,1578,"Heartbeat Event"
7.297,1647356401,I,6,1579,"Heartbeat Event"
5.325,1647356400,I,5,1580,"Heartbeat Event"
6.306,1647356400,I,3,1581,"Heartbeat Event"
1.1201,1647356401,I,1,1582,"Heartbeat Event"
9.432,1647356400,I,7,1588,"Heartbeat Event"
4.589,1647356401,I,4,1589,"Heartbeat Event"
1.322,1646613815,I,1,4,"Resolving event group"
1.171,1646613514,W,1,4,"Node 1 is unprovisioned"
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total: 64
isilon-1#
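Looking at the "Lnn" column, you can see that this message was also raised for nodes other than 2 and 7, the two shown in isi status. Those other event groups have already been resolved.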
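As a rough illustration of that observation, the CSV listing can be grouped by eventgroup ID to pick out the critical groups that are still open. This is only a sketch: the function name is hypothetical, the sample rows are copied from the listing with the drive message shortened, and the rule that a group counts as resolved once it contains a "Resolved from PAPI" or "Resolving event group" event is an assumption inferred from this one listing, not documented OneFS behaviour.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the listing above; the drive message is shortened for brevity.
SAMPLE = '''ID,Occurred,Sev,Lnn,"Eventgroup ID",Message
1.1120,1647338834,C,0,1051,"Resolved from PAPI"
1.356,1646614051,C,1,1051,"One or more drives (location(s) Bay 7, Bay 8) are not healthy."
1.1137,1647339264,I,7,1520,"Resolving event group"
1.1124,1647339143,C,7,1520,"Node 7 is offline"
8.270,1647339158,C,2,1523,"One or more drives (location(s) Bay 7, Bay 8) are not healthy."
9.268,1647339799,C,7,1535,"One or more drives (location(s) Bay 7, Bay 8) are not healthy."
1.1201,1647356401,I,1,1582,"Heartbeat Event"
----------
Total: 7
'''

# Assumption: these messages mark an event group as resolved (seen in the listing).
RESOLUTION_MARKERS = ("Resolved from PAPI", "Resolving event group")


def unresolved_critical_groups(csv_text):
    """Return {eventgroup_id: lnn} for critical groups without a resolution event."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The trailing separator and "Total:" lines have no Eventgroup ID column.
        if not row.get("Eventgroup ID"):
            continue
        groups[row["Eventgroup ID"]].append(row)

    result = {}
    for group_id, rows in groups.items():
        resolved = any((r["Message"] or "").startswith(RESOLUTION_MARKERS) for r in rows)
        critical = [r for r in rows if r["Sev"] == "C"]
        if critical and not resolved:
            result[group_id] = int(critical[0]["Lnn"])
    return result


print(unresolved_critical_groups(SAMPLE))  # {'1523': 2, '1535': 7}
```

With these sample rows only event groups 1523 (node 2) and 1535 (node 7) remain, which matches the two entries under "Critical Events" in isi status.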
There does not seem to be an option to filter by Lnn, so I use grep instead.
I search by the time shown in "isi status", starting with "03/15 19:12:38" for node 2.
Critical Events:
Time LNN Event
--------------- ---- -------------------------------------------------------
03/15 19:12:38 2 One or more drives (location(s) Bay 7, Bay 8, Bay ...
03/15 19:23:19 7 One or more drives (location(s) Bay 7, Bay 8, Bay ...
With --format=csv the times are given as unixtime, so I convert the time above with the date command. Note that the date command on OneFS is BSD date, so its options differ from those of GNU date.
isilon-1# date -j -f "%Y-%m-%d %H:%M:%S" "2022-03-15 19:12:38" +%s
1647339158
isilon-1#
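The same conversion can be cross-checked off the cluster with a few lines of Python. This is a minimal sketch: the helper name is hypothetical, the year is passed in because isi status does not print it, and the UTC+9 offset is an assumption inferred from the values above (19:12:38 local time corresponds to 1647339158).

```python
from datetime import datetime, timedelta, timezone

# Assumption: the cluster clock runs at UTC+9, inferred from the sample values.
CLUSTER_TZ = timezone(timedelta(hours=9))


def to_unixtime(text, year=2022):
    """Convert an 'MM/DD HH:MM:SS' time as shown by isi status to unixtime."""
    naive = datetime.strptime(f"{year}/{text}", "%Y/%m/%d %H:%M:%S")
    return int(naive.replace(tzinfo=CLUSTER_TZ).timestamp())


print(to_unixtime("03/15 19:12:38"))  # 1647339158
```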
Now that the unixtime is known, grep for it.
isilon-1# isi event events list --format=csv|grep 1647339158
8.270,1647339158,C,2,1523,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
isilon-1#
The columns are in the order "ID,Occurred,Sev,Lnn,"Eventgroup ID",Message", so this row gives:
ID: 8.270
Lnn: 2
Eventgroup ID: 1523
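Because the Message column itself contains commas, splitting the row on "," by hand is error-prone. A small sketch with Python's csv module, which honours the quoting, is shown here; the function name is hypothetical and the header and row are copied from the output above.

```python
import csv

HEADER = 'ID,Occurred,Sev,Lnn,"Eventgroup ID",Message'
ROW = ('8.270,1647339158,C,2,1523,"One or more drives (location(s) Bay 7, Bay 8, '
       'Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, '
       'HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."')


def parse_event_row(header, row):
    """Map the CSV column names to the values of one event row."""
    names, values = csv.reader([header, row])
    return dict(zip(names, values))


event = parse_event_row(HEADER, ROW)
print(event["ID"], event["Lnn"], event["Eventgroup ID"])  # 8.270 2 1523
```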
To look at a single event, run "isi event events view <ID>".
isilon-1# isi event events view 8.270
ID: 8.270
Eventgroup ID: 1523
Event Type: 100010011
Message: One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
Devid: 8
Lnn: 2
Time: 2022-03-15T19:12:38
Severity: critical
Value: 9.0
isilon-1#
To look it up by eventgroup ID instead, use "isi event view --id=<EventGroupID>".
isilon-1# isi event view --id=1523
ID: 1523
Started: 03/15 19:12
Causes Long: One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
Lnn: 2
Devid: 8
Last Event: 2022-03-15T19:12:38
Ignore: No
Ignore Time: Never
Resolved: No
Resolve Time: Never
Ended: --
Events: 1
Severity: critical
isilon-1#
To resolve it, run "isi event modify --id=<EventGroupID> --resolved=true".
isilon-1# isi event modify --id=1523 --resolved=true
isilon-1# isi status
Cluster Name: isilon
<snip>
Critical Events:
Time LNN Event
--------------- ---- -------------------------------------------------------
03/15 19:23:19 7 One or more drives (location(s) Bay 7, Bay 8, Bay ...
<snip>
isilon-1#
The alert in question is gone.
Clear the remaining one in the same way.
isilon-1# date -j -f "%Y-%m-%d %H:%M:%S" "2022-03-15 19:23:19" +%s
1647339799
isilon-1# isi event events list --format=csv|grep 1647339799
9.268,1647339799,C,7,1535,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
isilon-1# isi event view --id 1535
ID: 1535
Started: 03/15 19:23
Causes Long: One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
Lnn: 7
Devid: 9
Last Event: 2022-03-15T19:23:19
Ignore: No
Ignore Time: Never
Resolved: No
Resolve Time: Never
Ended: --
Events: 1
Severity: critical
isilon-1# isi event modify --id=1535 --resolved=true
isilon-1# isi status
Cluster Name: isilon
Cluster Health: [ ATTN]
Data Reduction: 1.08 : 1
Storage Efficiency: 0.27 : 1
Cluster Storage: HDD SSD Storage
Size: 403.2G (548.1G Raw) 0 (0 Raw)
VHS Size: 144.9G
Used: 7.4G (2%) 0 (n/a)
Avail: 395.8G (98%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
---+---------------+-----+-----+-----+-----+-----------------+-----------------
1|172.17.44.85 | OK | 551k| 4.7M| 5.3M| 1.0G/57.6G( 2%)|(No Storage SSDs)
2|n/a | OK | 0|14.5k|14.5k| 795M/57.6G( 1%)|(No Storage SSDs)
3|172.17.44.87 | OK | 0|33.2k|33.2k| 1.2G/57.6G( 2%)|(No Storage SSDs)
4|172.17.44.88 | OK | 0|49.7k|49.7k| 1.0G/57.6G( 2%)|(No Storage SSDs)
5|172.17.44.89 | OK | 0|22.4k|22.4k| 924M/57.6G( 2%)|(No Storage SSDs)
6|172.17.44.86 | OK | 0| 230k| 230k| 1.3G/57.6G( 2%)|(No Storage SSDs)
7|n/a |-A-- | 9.7k| 96.0| 9.8k| 1.2G/57.6G( 2%)|(No Storage SSDs)
---+---------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals: | 561k| 5.1M| 5.6M| 7.4G/ 403G( 2%)|(No Storage SSDs)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Critical Events:
Time LNN Event
--------------- ---- -------------------------------------------------------
Cluster Job Status:
No running jobs.
No paused or waiting jobs.
No failed jobs.
Recent job results:
Time Job Event
--------------- -------------------------- ------------------------------
03/16 04:00:25 ShadowStoreProtect[63] Succeeded
03/16 02:00:13 WormQueue[62] Succeeded
03/16 00:01:00 ShadowStoreDelete[61] Succeeded
03/15 22:12:38 SnapshotDelete[60] Succeeded
03/15 22:02:43 FSAnalyze[59] Succeeded
03/15 22:01:12 SmartPools[58] Succeeded
03/15 20:00:26 ShadowStoreProtect[57] Succeeded
03/15 19:16:11 MultiScan[56] Succeeded
03/15 19:04:57 MultiScan[55] Succeeded
03/15 19:00:06 MultiScan[53] Succeeded
isilon-1#
Hm? "Critical Events" is empty, yet "Cluster Health" is still ATTN?
No problem. The cluster status simply took a little while to update.
isilon-1# isi status
Cluster Name: isilon
Cluster Health: [ OK ]
Data Reduction: 1.08 : 1
Storage Efficiency: 0.27 : 1
Cluster Storage: HDD SSD Storage
Size: 403.2G (548.1G Raw) 0 (0 Raw)
VHS Size: 144.9G
Used: 7.4G (2%) 0 (n/a)
Avail: 395.8G (98%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
---+---------------+-----+-----+-----+-----+-----------------+-----------------
1|172.17.44.85 | OK | 0| 200k| 200k| 1.0G/57.6G( 2%)|(No Storage SSDs)
2|n/a | OK | 0| 0| 0| 792M/57.6G( 1%)|(No Storage SSDs)
3|172.17.44.87 | OK | 0| 244k| 244k| 1.2G/57.6G( 2%)|(No Storage SSDs)
4|172.17.44.88 | OK | 0|24.9k|24.9k| 1.0G/57.6G( 2%)|(No Storage SSDs)
5|172.17.44.89 | OK | 0| 128k| 128k| 924M/57.6G( 2%)|(No Storage SSDs)
6|172.17.44.86 | OK | 0|33.3k|33.3k| 1.3G/57.6G( 2%)|(No Storage SSDs)
7|n/a | OK | 0| 175k| 175k| 1.2G/57.6G( 2%)|(No Storage SSDs)
---+---------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals: | 0| 805k| 805k| 7.4G/ 403G( 2%)|(No Storage SSDs)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Critical Events:
Time LNN Event
--------------- ---- -------------------------------------------------------
Cluster Job Status:
No running jobs.
No paused or waiting jobs.
No failed jobs.
Recent job results:
Time Job Event
--------------- -------------------------- ------------------------------
03/16 04:00:25 ShadowStoreProtect[63] Succeeded
03/16 02:00:13 WormQueue[62] Succeeded
03/16 00:01:00 ShadowStoreDelete[61] Succeeded
03/15 22:12:38 SnapshotDelete[60] Succeeded
03/15 22:02:43 FSAnalyze[59] Succeeded
03/15 22:01:12 SmartPools[58] Succeeded
03/15 20:00:26 ShadowStoreProtect[57] Succeeded
03/15 19:16:11 MultiScan[56] Succeeded
03/15 19:04:57 MultiScan[55] Succeeded
03/15 19:00:06 MultiScan[53] Succeeded
isilon-1#
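To check whether the cluster has returned to OK without reading the whole output, the health line and the per-node DASR column can be picked out of saved "isi status" text. This is only a sketch under assumptions: the function name is hypothetical, and the parsing relies on the layout seen in the outputs in this post (a "Cluster Health: [ ... ]" line and node rows separated by "|"), which may differ on other OneFS versions.

```python
import re

# Lines copied from the "isi status" output taken right after resolving both groups.
SAMPLE = """Cluster Name: isilon
Cluster Health: [ ATTN]
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
1|172.17.44.85 | OK | 551k| 4.7M| 5.3M| 1.0G/57.6G( 2%)|(No Storage SSDs)
2|n/a | OK | 0|14.5k|14.5k| 795M/57.6G( 1%)|(No Storage SSDs)
7|n/a |-A-- | 9.7k| 96.0| 9.8k| 1.2G/57.6G( 2%)|(No Storage SSDs)
Cluster Totals: | 561k| 5.1M| 5.6M| 7.4G/ 403G( 2%)|(No Storage SSDs)
"""


def parse_status(text):
    """Return (cluster_health, node_ids_whose_health_is_not_OK)."""
    match = re.search(r"Cluster Health:\s*\[\s*(\w+)\s*\]", text)
    health = match.group(1) if match else None

    flagged = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Node rows start with a numeric ID; the third column is the DASR flags.
        if len(fields) > 2 and fields[0].isdigit() and fields[2] != "OK":
            flagged.append(int(fields[0]))
    return health, flagged


print(parse_status(SAMPLE))  # ('ATTN', [7])
```

In the sample, node 7 still carries the attention flag even though no critical events remain, which is the intermediate state seen above before the cluster health returned to OK.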