I set up the PowerScale/Isilon OneFS simulator in a vSphere environment and left it running for a while. When I came back, alerts had been raised.

One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
Checking the slots of the affected nodes under [Drives] in [Cluster Management]-[Hardware Configuration] shows that they are indeed "Empty".

I logged in over ssh and captured the output of "isi status".
isilon-1# isi status
Cluster Name: isilon
Cluster Health: [ ATTN]
Data Reduction: 1.08 : 1
Storage Efficiency: 0.27 : 1
Cluster Storage: HDD SSD Storage
Size: 403.2G (548.1G Raw) 0 (0 Raw)
VHS Size: 144.9G
Used: 7.3G (2%) 0 (n/a)
Avail: 395.9G (98%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
1| | OK | 0| 1.4M| 1.4M| 1.0G/57.6G( 2%)|(No Storage SSDs)
2|n/a |-A-- | 0| 224k| 224k| 791M/57.6G( 1%)|(No Storage SSDs)
3| | OK | 0| 0| 0| 1.2G/57.6G( 2%)|(No Storage SSDs)
4| | OK | 0| 526k| 526k|1022M/57.6G( 2%)|(No Storage SSDs)
5| | OK | 0|49.7k|49.7k| 922M/57.6G( 2%)|(No Storage SSDs)
6| | OK | 0|33.3k|33.3k| 1.3G/57.6G( 2%)|(No Storage SSDs)
7|n/a |-A-- | 0|60.1k|60.1k| 1.2G/57.6G( 2%)|(No Storage SSDs)
Cluster Totals: | 0| 2.2M| 2.2M| 7.3G/ 403G( 2%)|(No Storage SSDs)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Critical Events:
Time LNN Event
--------------- ---- -------------------------------------------------------
03/15 19:12:38 2 One or more drives (location(s) Bay 7, Bay 8, Bay ...
03/15 19:23:19 7 One or more drives (location(s) Bay 7, Bay 8, Bay ...
Cluster Job Status:
No running jobs.
No paused or waiting jobs.
No failed jobs.
Recent job results:
Time Job Event
--------------- -------------------------- ------------------------------
03/16 04:00:25 ShadowStoreProtect[63] Succeeded
03/16 02:00:13 WormQueue[62] Succeeded
03/16 00:01:00 ShadowStoreDelete[61] Succeeded
03/15 22:12:38 SnapshotDelete[60] Succeeded
03/15 22:02:43 FSAnalyze[59] Succeeded
03/15 22:01:12 SmartPools[58] Succeeded
03/15 20:00:26 ShadowStoreProtect[57] Succeeded
03/15 19:16:11 MultiScan[56] Succeeded
03/15 19:04:57 MultiScan[55] Succeeded
03/15 19:00:06 MultiScan[53] Succeeded
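In the node table above, the DASR column is what marks nodes 2 and 7 as needing attention. As a rough sketch, the nodes whose flags are not "OK" can be pulled out of saved "isi status" output with awk. The function name attention_nodes is hypothetical, and the parsing assumes the pipe-separated layout shown above.

```shell
# attention_nodes: read `isi status` output on stdin and print
# "<node id> <DASR flags>" for every node whose flags are not "OK".
# Hypothetical helper; assumes the pipe-separated node table shown above.
attention_nodes() {
    awk -F'|' '
        # Node rows start with a numeric ID in the first column;
        # the header and "Cluster Totals" rows do not match.
        $1 ~ /^ *[0-9]+ *$/ {
            id = $1
            flags = $3
            gsub(/ /, "", id)      # strip padding around the ID
            gsub(/ /, "", flags)   # strip padding around the flags
            if (flags != "OK") print id, flags
        }
    '
}
```

Usage on the cluster would look like: isi status | attention_nodes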
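I decided to clear the event alerts for now. To find the eventgroup ID of each event, I first ran "isi event events list --format=csv".
(Plain "isi event events list" pads its output with a lot of whitespace, which makes it hard to search, so I use the more compact csv output.)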
isilon-1# isi event events list --format=csv
ID,Occurred,Sev,Lnn,"Eventgroup ID",Message
1.2,1646613393,W,1,1,"The SmartPools upgrade has not completed. Please contact PowerScale support and reference emc321047"
1.331,1646613875,I,-1,1024,"Resolving event group"
1.273,1646613695,W,-1,1024,"Node 2 is unprovisioned"
1.1120,1647338834,C,0,1051,"Resolved from PAPI"
1.356,1646614051,C,1,1051,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1119,1647338833,C,0,1052,"Resolved from PAPI"
2.314,1646614344,C,-1,1052,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1122,1647338838,C,0,1053,"Resolved from PAPI"
3.263,1646614413,C,-1,1053,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.802,1647310249,C,0,1054,"Resolved from PAPI"
4.254,1646614484,C,4,1054,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
4.269,1646665200,I,4,1060,"Heartbeat Event"
1.376,1646665200,I,1,1061,"Heartbeat Event"
3.304,1646751600,I,-1,1085,"Heartbeat Event"
4.296,1646751600,I,4,1086,"Heartbeat Event"
2.355,1646751600,I,-1,1087,"Heartbeat Event"
1.426,1646751600,I,1,1088,"Heartbeat Event"
2.384,1646838000,I,-1,1111,"Heartbeat Event"
4.325,1646838000,I,4,1112,"Heartbeat Event"
3.333,1646838000,I,-1,1113,"Heartbeat Event"
1.477,1646838000,I,1,1114,"Heartbeat Event"
2.412,1646924400,I,-1,1137,"Heartbeat Event"
1.527,1646924400,I,1,1138,"Heartbeat Event"
4.353,1646924400,I,4,1144,"Heartbeat Event"
3.361,1646924400,I,-1,1145,"Heartbeat Event"
2.441,1647010800,I,-1,1163,"Heartbeat Event"
4.382,1647010800,I,4,1164,"Heartbeat Event"
3.390,1647010800,I,-1,1165,"Heartbeat Event"
1.578,1647010800,I,1,1166,"Heartbeat Event"
1.635,1647097200,I,1,1194,"Heartbeat Event"
3.420,1647097200,I,-1,1200,"Heartbeat Event"
4.412,1647097200,I,4,1201,"Heartbeat Event"
2.471,1647097200,I,-1,1202,"Heartbeat Event"
2.500,1647183600,I,-1,1220,"Heartbeat Event"
1.686,1647183600,I,1,1221,"Heartbeat Event"
3.449,1647183600,I,-1,1227,"Heartbeat Event"
4.441,1647183600,I,4,1228,"Heartbeat Event"
2.529,1647270000,I,-1,1246,"Heartbeat Event"
3.478,1647270000,I,-1,1247,"Heartbeat Event"
4.470,1647270000,I,4,1248,"Heartbeat Event"
1.737,1647270000,I,1,1249,"Heartbeat Event"
1.1121,1647338837,C,0,1457,"Resolved from PAPI"
5.271,1647337391,C,5,1457,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1028,1647337630,I,3,1458,"Resolving event group"
1.1016,1647337569,W,3,1458,"Node 3 is unprovisioned"
1.1054,1647338168,I,6,1471,"Resolving event group"
1.1042,1647338049,W,6,1471,"Node 6 is unprovisioned"
1.1109,1647338661,C,0,1476,"Resolved from PAPI"
6.270,1647338209,C,3,1476,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1137,1647339264,I,7,1520,"Resolving event group"
1.1124,1647339143,C,7,1520,"Node 7 is offline"
8.270,1647339158,C,2,1523,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1138,1647339265,I,7,1525,"Node 7 is online (offline event 1.1124, Tue Mar 15 19:12:23 2022 to Tue Mar 15 19:14:24 2022)"
9.268,1647339799,C,7,1535,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
8.445,1647356400,I,2,1578,"Heartbeat Event"
7.297,1647356401,I,6,1579,"Heartbeat Event"
5.325,1647356400,I,5,1580,"Heartbeat Event"
6.306,1647356400,I,3,1581,"Heartbeat Event"
1.1201,1647356401,I,1,1582,"Heartbeat Event"
9.432,1647356400,I,7,1588,"Heartbeat Event"
4.589,1647356401,I,4,1589,"Heartbeat Event"
1.322,1646613815,I,1,4,"Resolving event group"
1.171,1646613514,W,1,4,"Node 1 is unprovisioned"
Total: 64
Looking at the "Lnn" column shows that this message was also raised on nodes other than 2 and 7, the two that appear in isi status. Those other events have already been resolved.
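To pick out the event groups that still need attention without reading the whole listing, the CSV can be filtered mechanically. The following is only a sketch, not an official method. The function name unresolved_critical_groups is hypothetical, and it assumes (based solely on the listing above) that a resolved group always contains a row whose message is "Resolved from PAPI" or "Resolving event group". Splitting on commas is safe for the first five columns because only the last column (Message) contains embedded commas.

```shell
# unresolved_critical_groups: read `isi event events list --format=csv`
# output on stdin and print the eventgroup IDs that contain a critical (C)
# event but no resolution row.
# Assumption: resolution rows carry the message "Resolved from PAPI" or
# "Resolving event group", as seen in the listing above.
unresolved_critical_groups() {
    awk -F, '
        $1 == "ID" || /^Total:/ { next }     # skip header and trailer lines
        {
            gid = $5                         # fifth column: Eventgroup ID
            if ($3 == "C") critical[gid] = 1
            if ($0 ~ /"Resolved from PAPI"$/ || $0 ~ /"Resolving event group"$/)
                resolved[gid] = 1
        }
        END {
            for (gid in critical)
                if (!(gid in resolved)) print gid
        }
    ' | sort -n
}
```

Usage on the cluster would look like: isi event events list --format=csv | unresolved_critical_groups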
To locate the two remaining events, I use the times shown by "isi status", starting with "03/15 19:12:38".
Critical Events:
Time LNN Event
--------------- ---- -------------------------------------------------------
03/15 19:12:38 2 One or more drives (location(s) Bay 7, Bay 8, Bay ...
03/15 19:23:19 7 One or more drives (location(s) Bay 7, Bay 8, Bay ...
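With --format=csv the time is given as unixtime, so I convert the timestamp with the date command. Note that the date command on OneFS is BSD date, so its options differ from GNU date. The converted value (1647339158 here) is then used to grep the event list.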
isilon-1# date -j -f "%Y-%m-%d %H:%M:%S" "2022-03-15 19:12:38" +%s
isilon-1# isi event events list --format=csv|grep 1647339158
8.270,1647339158,C,2,1523,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
The columns are in the order ID,Occurred,Sev,Lnn,"Eventgroup ID",Message, so the fifth column is the eventgroup ID:
Eventgroup ID: 1523
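The lookup above (convert the local time to unixtime, then search the CSV) can be sketched as two small shell functions. Both names, to_epoch and find_event_by_epoch, are hypothetical. to_epoch tries the BSD date syntax used on OneFS first and falls back to GNU date's -d option, so the same snippet can be tried on a Linux workstation. find_event_by_epoch compares only the Occurred column instead of grepping the whole line, which avoids accidental matches in other columns. The conversion uses the shell's local timezone, as the date command above does.

```shell
# to_epoch: convert "YYYY-MM-DD HH:MM:SS" (local time) to unixtime.
# Tries BSD date (OneFS, FreeBSD) first, then GNU date as a fallback.
to_epoch() {
    date -j -f "%Y-%m-%d %H:%M:%S" "$1" +%s 2>/dev/null ||
        date -d "$1" +%s
}

# find_event_by_epoch: read `isi event events list --format=csv` output on
# stdin and print "<event ID> <eventgroup ID>" for rows whose Occurred
# column (2nd) equals the given unixtime.
find_event_by_epoch() {
    awk -F, -v ts="$1" '$2 == ts { print $1, $5 }'
}
```

Usage on the cluster would look like: isi event events list --format=csv | find_event_by_epoch "$(to_epoch "2022-03-15 19:12:38")"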
To look at a single event, run "isi event events view <ID>".
isilon-1# isi event events view 8.270
ID: 8.270
Eventgroup ID: 1523
Event Type: 100010011
Message: One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
Devid: 8
Lnn: 2
Time: 2022-03-15T19:12:38
Severity: critical
Value: 9.0
To look it up by eventgroup ID instead, use "isi event view --id=<EventGroupID>".
isilon-1# isi event view --id=1523
ID: 1523
Started: 03/15 19:12
Causes Long: One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
Lnn: 2
Devid: 8
Last Event: 2022-03-15T19:12:38
Ignore: No
Ignore Time: Never
Resolved: No
Resolve Time: Never
Ended: --
Events: 1
Severity: critical
To resolve it, run "isi event modify --id=<EventGroupID> --resolved=true".
isilon-1# isi event modify --id=1523 --resolved=true
isilon-1# isi status
Cluster Name: isilon
Critical Events:
Time LNN Event
--------------- ---- -------------------------------------------------------
03/15 19:23:19 7 One or more drives (location(s) Bay 7, Bay 8, Bay ...
isilon-1# date -j -f "%Y-%m-%d %H:%M:%S" "2022-03-15 19:23:19" +%s
isilon-1# isi event events list --format=csv|grep 1647339799
9.268,1647339799,C,7,1535,"One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
isilon-1# isi event view --id 1535
ID: 1535
Started: 03/15 19:23
Causes Long: One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
Lnn: 7
Devid: 9
Last Event: 2022-03-15T19:23:19
Ignore: No
Ignore Time: Never
Resolved: No
Resolve Time: Never
Ended: --
Events: 1
Severity: critical
isilon-1# isi event modify --id=1535 --resolved=true
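Since the same modify command was run once per event group, the repetition can be wrapped in a loop. This is a minimal sketch: resolve_groups is a hypothetical function name, and ISI_CMD is a hypothetical variable introduced here only so the loop can be dry-run with echo before touching the cluster. The isi invocation itself is the same one used above.

```shell
# resolve_groups: mark each given eventgroup ID as resolved.
# ISI_CMD is a hypothetical override (e.g. ISI_CMD=echo) for dry runs;
# when unset, the real `isi` command is used.
resolve_groups() {
    for gid in "$@"; do
        case "$gid" in
            ''|*[!0-9]*)
                # Only plain numeric IDs are passed on to the CLI.
                echo "skipping non-numeric id: $gid" >&2
                continue
                ;;
        esac
        ${ISI_CMD:-isi} event modify --id="$gid" --resolved=true || return 1
    done
}
```

A dry run would look like: ISI_CMD=echo; resolve_groups 1523 1535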
isilon-1# isi status
Cluster Name: isilon
Cluster Health: [ ATTN]
Data Reduction: 1.08 : 1
Storage Efficiency: 0.27 : 1
Cluster Storage: HDD SSD Storage
Size: 403.2G (548.1G Raw) 0 (0 Raw)
VHS Size: 144.9G
Used: 7.4G (2%) 0 (n/a)
Avail: 395.8G (98%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
1| | OK | 551k| 4.7M| 5.3M| 1.0G/57.6G( 2%)|(No Storage SSDs)
2|n/a | OK | 0|14.5k|14.5k| 795M/57.6G( 1%)|(No Storage SSDs)
3| | OK | 0|33.2k|33.2k| 1.2G/57.6G( 2%)|(No Storage SSDs)
4| | OK | 0|49.7k|49.7k| 1.0G/57.6G( 2%)|(No Storage SSDs)
5| | OK | 0|22.4k|22.4k| 924M/57.6G( 2%)|(No Storage SSDs)
6| | OK | 0| 230k| 230k| 1.3G/57.6G( 2%)|(No Storage SSDs)
7|n/a |-A-- | 9.7k| 96.0| 9.8k| 1.2G/57.6G( 2%)|(No Storage SSDs)
Cluster Totals: | 561k| 5.1M| 5.6M| 7.4G/ 403G( 2%)|(No Storage SSDs)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Critical Events:
Time LNN Event
--------------- ---- -------------------------------------------------------
Cluster Job Status:
No running jobs.
No paused or waiting jobs.
No failed jobs.
Recent job results:
Time Job Event
--------------- -------------------------- ------------------------------
03/16 04:00:25 ShadowStoreProtect[63] Succeeded
03/16 02:00:13 WormQueue[62] Succeeded
03/16 00:01:00 ShadowStoreDelete[61] Succeeded
03/15 22:12:38 SnapshotDelete[60] Succeeded
03/15 22:02:43 FSAnalyze[59] Succeeded
03/15 22:01:12 SmartPools[58] Succeeded
03/15 20:00:26 ShadowStoreProtect[57] Succeeded
03/15 19:16:11 MultiScan[56] Succeeded
03/15 19:04:57 MultiScan[55] Succeeded
03/15 19:00:06 MultiScan[53] Succeeded
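Huh? There is nothing left under "Critical Events", yet "Cluster Health" is still ATTN and node 7 still shows "-A--"? Running "isi status" again afterwards, both are OK.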
isilon-1# isi status
Cluster Name: isilon
Cluster Health: [ OK ]
Data Reduction: 1.08 : 1
Storage Efficiency: 0.27 : 1
Cluster Storage: HDD SSD Storage
Size: 403.2G (548.1G Raw) 0 (0 Raw)
VHS Size: 144.9G
Used: 7.4G (2%) 0 (n/a)
Avail: 395.8G (98%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
1| | OK | 0| 200k| 200k| 1.0G/57.6G( 2%)|(No Storage SSDs)
2|n/a | OK | 0| 0| 0| 792M/57.6G( 1%)|(No Storage SSDs)
3| | OK | 0| 244k| 244k| 1.2G/57.6G( 2%)|(No Storage SSDs)
4| | OK | 0|24.9k|24.9k| 1.0G/57.6G( 2%)|(No Storage SSDs)
5| | OK | 0| 128k| 128k| 924M/57.6G( 2%)|(No Storage SSDs)
6| | OK | 0|33.3k|33.3k| 1.3G/57.6G( 2%)|(No Storage SSDs)
7|n/a | OK | 0| 175k| 175k| 1.2G/57.6G( 2%)|(No Storage SSDs)
Cluster Totals: | 0| 805k| 805k| 7.4G/ 403G( 2%)|(No Storage SSDs)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Critical Events:
Time LNN Event
--------------- ---- -------------------------------------------------------
Cluster Job Status:
No running jobs.
No paused or waiting jobs.
No failed jobs.
Recent job results:
Time Job Event
--------------- -------------------------- ------------------------------
03/16 04:00:25 ShadowStoreProtect[63] Succeeded
03/16 02:00:13 WormQueue[62] Succeeded
03/16 00:01:00 ShadowStoreDelete[61] Succeeded
03/15 22:12:38 SnapshotDelete[60] Succeeded
03/15 22:02:43 FSAnalyze[59] Succeeded
03/15 22:01:12 SmartPools[58] Succeeded
03/15 20:00:26 ShadowStoreProtect[57] Succeeded
03/15 19:16:11 MultiScan[56] Succeeded
03/15 19:04:57 MultiScan[55] Succeeded
03/15 19:00:06 MultiScan[53] Succeeded
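Because the health field did not change at the same moment the event groups were resolved, it can be handy to extract just that field when re-checking. A small sketch follows. cluster_health is a hypothetical helper name, and it assumes the "Cluster Health: [ ... ]" line format shown in the output above.

```shell
# cluster_health: read `isi status` output on stdin and print only the
# value inside the brackets of the "Cluster Health:" line (e.g. OK, ATTN).
# Assumes the line format shown in the transcripts above.
cluster_health() {
    sed -n 's/^Cluster Health: *\[ *\([A-Za-z]*\) *\].*/\1/p'
}
```

Usage on the cluster would look like: isi status | cluster_health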