PowerScale でアラートの警告をCLIで消す


PowerScale/Isilon/One FSシミュレータをvSphere環境上にたてて放置しておいたら、アラートが上がっている。

One or more drives (location(s) Bay 7, Bay 8, Bay 9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.

各ノードのBayについて出ている。

[Cluster Management]-[Hardware Configuration]の[Drives]で該当ノードをスロットを確認するとたしかに「Empty」となっている。

sshアクセスして「isi status」の結果をとってみる。

isilon-1# isi status
Cluster Name: isilon
Cluster Health:     [ ATTN]
Data Reduction:     1.08 : 1
Storage Efficiency: 0.27 : 1
Cluster Storage:  HDD                 SSD Storage
Size:             403.2G (548.1G Raw) 0 (0 Raw)
VHS Size:         144.9G
Used:             7.3G (2%)           0 (n/a)
Avail:            395.9G (98%)        0 (n/a)

                   Health  Throughput (bps)  HDD Storage      SSD Storage
ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size
---+---------------+-----+-----+-----+-----+-----------------+-----------------
  1|172.17.44.85   | OK  |    0| 1.4M| 1.4M| 1.0G/57.6G(  2%)|(No Storage SSDs)
  2|n/a            |-A-- |    0| 224k| 224k| 791M/57.6G(  1%)|(No Storage SSDs)
  3|172.17.44.87   | OK  |    0|    0|    0| 1.2G/57.6G(  2%)|(No Storage SSDs)
  4|172.17.44.88   | OK  |    0| 526k| 526k|1022M/57.6G(  2%)|(No Storage SSDs)
  5|172.17.44.89   | OK  |    0|49.7k|49.7k| 922M/57.6G(  2%)|(No Storage SSDs)
  6|172.17.44.86   | OK  |    0|33.3k|33.3k| 1.3G/57.6G(  2%)|(No Storage SSDs)
  7|n/a            |-A-- |    0|60.1k|60.1k| 1.2G/57.6G(  2%)|(No Storage SSDs)
---+---------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals:          |    0| 2.2M| 2.2M| 7.3G/ 403G(  2%)|(No Storage SSDs)

     Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

Critical Events:
Time            LNN  Event
--------------- ---- -------------------------------------------------------
03/15 19:12:38  2    One or more drives (location(s) Bay  7, Bay  8, Bay ...
03/15 19:23:19  7    One or more drives (location(s) Bay  7, Bay  8, Bay ...


Cluster Job Status:

No running jobs.

No paused or waiting jobs.

No failed jobs.

Recent job results:
Time            Job                        Event
--------------- -------------------------- ------------------------------
03/16 04:00:25  ShadowStoreProtect[63]     Succeeded
03/16 02:00:13  WormQueue[62]              Succeeded
03/16 00:01:00  ShadowStoreDelete[61]      Succeeded
03/15 22:12:38  SnapshotDelete[60]         Succeeded
03/15 22:02:43  FSAnalyze[59]              Succeeded
03/15 22:01:12  SmartPools[58]             Succeeded
03/15 20:00:26  ShadowStoreProtect[57]     Succeeded
03/15 19:16:11  MultiScan[56]              Succeeded
03/15 19:04:57  MultiScan[55]              Succeeded
03/15 19:00:06  MultiScan[53]              Succeeded

isilon-1#

イベントのアラートをとりあえず消すか、と、まずはイベントのeventgroup IDを確認するため「isi event events list –format=csv」を実行。
(「isi event events list」だと無駄なスペースが多くて探しにくいので、コンパクトなcsv出力にしています。)

isilon-1# isi event events list --format=csv
ID,Occurred,Sev,Lnn,"Eventgroup ID",Message
1.2,1646613393,W,1,1,"The SmartPools upgrade has not completed. Please contact PowerScale support and reference emc321047"
1.176,1646613393,U,1,1,
1.331,1646613875,I,-1,1024,"Resolving event group"
1.273,1646613695,W,-1,1024,"Node 2 is unprovisioned"
1.1120,1647338834,C,0,1051,"Resolved from PAPI"
1.356,1646614051,C,1,1051,"One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1119,1647338833,C,0,1052,"Resolved from PAPI"
2.314,1646614344,C,-1,1052,"One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1122,1647338838,C,0,1053,"Resolved from PAPI"
3.263,1646614413,C,-1,1053,"One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.802,1647310249,C,0,1054,"Resolved from PAPI"
4.254,1646614484,C,4,1054,"One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
4.269,1646665200,I,4,1060,"Heartbeat Event"
1.376,1646665200,I,1,1061,"Heartbeat Event"
3.304,1646751600,I,-1,1085,"Heartbeat Event"
4.296,1646751600,I,4,1086,"Heartbeat Event"
2.355,1646751600,I,-1,1087,"Heartbeat Event"
1.426,1646751600,I,1,1088,"Heartbeat Event"
2.384,1646838000,I,-1,1111,"Heartbeat Event"
4.325,1646838000,I,4,1112,"Heartbeat Event"
3.333,1646838000,I,-1,1113,"Heartbeat Event"
1.477,1646838000,I,1,1114,"Heartbeat Event"
2.412,1646924400,I,-1,1137,"Heartbeat Event"
1.527,1646924400,I,1,1138,"Heartbeat Event"
4.353,1646924400,I,4,1144,"Heartbeat Event"
3.361,1646924400,I,-1,1145,"Heartbeat Event"
2.441,1647010800,I,-1,1163,"Heartbeat Event"
4.382,1647010800,I,4,1164,"Heartbeat Event"
3.390,1647010800,I,-1,1165,"Heartbeat Event"
1.578,1647010800,I,1,1166,"Heartbeat Event"
1.635,1647097200,I,1,1194,"Heartbeat Event"
3.420,1647097200,I,-1,1200,"Heartbeat Event"
4.412,1647097200,I,4,1201,"Heartbeat Event"
2.471,1647097200,I,-1,1202,"Heartbeat Event"
2.500,1647183600,I,-1,1220,"Heartbeat Event"
1.686,1647183600,I,1,1221,"Heartbeat Event"
3.449,1647183600,I,-1,1227,"Heartbeat Event"
4.441,1647183600,I,4,1228,"Heartbeat Event"
2.529,1647270000,I,-1,1246,"Heartbeat Event"
3.478,1647270000,I,-1,1247,"Heartbeat Event"
4.470,1647270000,I,4,1248,"Heartbeat Event"
1.737,1647270000,I,1,1249,"Heartbeat Event"
1.1121,1647338837,C,0,1457,"Resolved from PAPI"
5.271,1647337391,C,5,1457,"One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1028,1647337630,I,3,1458,"Resolving event group"
1.1016,1647337569,W,3,1458,"Node 3 is unprovisioned"
1.1054,1647338168,I,6,1471,"Resolving event group"
1.1042,1647338049,W,6,1471,"Node 6 is unprovisioned"
1.1109,1647338661,C,0,1476,"Resolved from PAPI"
6.270,1647338209,C,3,1476,"One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1137,1647339264,I,7,1520,"Resolving event group"
1.1124,1647339143,C,7,1520,"Node 7 is offline"
8.270,1647339158,C,2,1523,"One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
1.1138,1647339265,I,7,1525,"Node 7 is online (offline event 1.1124, Tue Mar 15 19:12:23 2022 to Tue Mar 15 19:14:24 2022)"
9.268,1647339799,C,7,1535,"One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
8.445,1647356400,I,2,1578,"Heartbeat Event"
7.297,1647356401,I,6,1579,"Heartbeat Event"
5.325,1647356400,I,5,1580,"Heartbeat Event"
6.306,1647356400,I,3,1581,"Heartbeat Event"
1.1201,1647356401,I,1,1582,"Heartbeat Event"
9.432,1647356400,I,7,1588,"Heartbeat Event"
4.589,1647356401,I,4,1589,"Heartbeat Event"
1.322,1646613815,I,1,4,"Resolving event group"
1.171,1646613514,W,1,4,"Node 1 is unprovisioned"
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total: 64                                                                                                                                            
isilon-1#

出力の「Lnn」に注目してもらうと見えてくるのですが、isi statusで出ている2,7以外でもこのメッセージは出ていて、それはすでにResolveとしていたりします。

Lnn単位でフィルターするオプションはないようなので、grepします。

「isi status」で確認した時刻「03/15 19:23:19」を使います。

Critical Events:
Time            LNN  Event
--------------- ---- -------------------------------------------------------
03/15 19:12:38  2    One or more drives (location(s) Bay  7, Bay  8, Bay ...
03/15 19:23:19  7    One or more drives (location(s) Bay  7, Bay  8, Bay ...

format=csvの時の時刻はunixtimeとなっているのでdateコマンドを使って変換します。ただし、OneFSのdateコマンドはBSD dateなのでオプションの違いに注意する必要があります。

isilon-1# date -j -f "%Y-%m-%d %H:%M:%S" "2022-03-15 19:12:38" +%s
1647339158
isilon-1#

unixtimeが判明したので、grepします。

isilon-1# isi event events list --format=csv|grep 1647339158
8.270,1647339158,C,2,1523,"One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
isilon-1#

出力は「ID,Occurred,Sev,Lnn,”Eventgroup ID”,Message」という順番なので
 ID:8.270
 Lnn:2
 Eventgroup ID:1523
となります。

イベントの単品を確認する場合は「isi event events view <ID>」を実行します。

isilon-1# isi event events view 8.270
           ID: 8.270
Eventgroup ID: 1523
   Event Type: 100010011
      Message: One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
        Devid: 8
          Lnn: 2
         Time: 2022-03-15T19:12:38
     Severity: critical
        Value: 9.0
isilon-1#

Eventgroup IDベースで確認するのであれば「isi event view –id=<EventGroupID>」

isilon-1# isi event view --id=1523
          ID: 1523
     Started: 03/15 19:12
 Causes Long: One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
         Lnn: 2
       Devid: 8
  Last Event: 2022-03-15T19:12:38
      Ignore: No
 Ignore Time: Never
    Resolved: No
Resolve Time: Never
       Ended: --
      Events: 1
    Severity: critical
isilon-1#

解決するには「isi event modify –id=<EventGroupID> –resolved=true」を実行

isilon-1# isi event modify --id=1523 --resolved=true
isilon-1# isi status
Cluster Name: isilon
<略>
Critical Events:
Time            LNN  Event
--------------- ---- -------------------------------------------------------
03/15 19:23:19  7    One or more drives (location(s) Bay  7, Bay  8, Bay ...

<略>
isilon-1# 

該当するアラートが消えました。

同様に残っているもう1つも消します。

isilon-1# date -j -f "%Y-%m-%d %H:%M:%S" "2022-03-15 19:23:19" +%s
1647339799
isilon-1# isi event events list --format=csv|grep 1647339799
9.268,1647339799,C,7,1535,"One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy."
isilon-1# isi event view --id 1535
          ID: 1535
     Started: 03/15 19:23
 Causes Long: One or more drives (location(s) Bay  7, Bay  8, Bay  9, Bay 10, Bay 11, Bay 12, Bay 13, Bay 14, Bay 15 / type(s) HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD, HDD) are not healthy.
         Lnn: 7
       Devid: 9
  Last Event: 2022-03-15T19:23:19
      Ignore: No
 Ignore Time: Never
    Resolved: No
Resolve Time: Never
       Ended: --
      Events: 1
    Severity: critical
isilon-1# isi event modify --id=1535 --resolved=true
isilon-1# isi status
Cluster Name: isilon
Cluster Health:     [ ATTN]
Data Reduction:     1.08 : 1
Storage Efficiency: 0.27 : 1
Cluster Storage:  HDD                 SSD Storage
Size:             403.2G (548.1G Raw) 0 (0 Raw)
VHS Size:         144.9G
Used:             7.4G (2%)           0 (n/a)
Avail:            395.8G (98%)        0 (n/a)

                   Health  Throughput (bps)  HDD Storage      SSD Storage
ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size
---+---------------+-----+-----+-----+-----+-----------------+-----------------
  1|172.17.44.85   | OK  | 551k| 4.7M| 5.3M| 1.0G/57.6G(  2%)|(No Storage SSDs)
  2|n/a            | OK  |    0|14.5k|14.5k| 795M/57.6G(  1%)|(No Storage SSDs)
  3|172.17.44.87   | OK  |    0|33.2k|33.2k| 1.2G/57.6G(  2%)|(No Storage SSDs)
  4|172.17.44.88   | OK  |    0|49.7k|49.7k| 1.0G/57.6G(  2%)|(No Storage SSDs)
  5|172.17.44.89   | OK  |    0|22.4k|22.4k| 924M/57.6G(  2%)|(No Storage SSDs)
  6|172.17.44.86   | OK  |    0| 230k| 230k| 1.3G/57.6G(  2%)|(No Storage SSDs)
  7|n/a            |-A-- | 9.7k| 96.0| 9.8k| 1.2G/57.6G(  2%)|(No Storage SSDs)
---+---------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals:          | 561k| 5.1M| 5.6M| 7.4G/ 403G(  2%)|(No Storage SSDs)

     Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

Critical Events:
Time            LNN  Event
--------------- ---- -------------------------------------------------------


Cluster Job Status:

No running jobs.

No paused or waiting jobs.

No failed jobs.

Recent job results:
Time            Job                        Event
--------------- -------------------------- ------------------------------
03/16 04:00:25  ShadowStoreProtect[63]     Succeeded
03/16 02:00:13  WormQueue[62]              Succeeded
03/16 00:01:00  ShadowStoreDelete[61]      Succeeded
03/15 22:12:38  SnapshotDelete[60]         Succeeded
03/15 22:02:43  FSAnalyze[59]              Succeeded
03/15 22:01:12  SmartPools[58]             Succeeded
03/15 20:00:26  ShadowStoreProtect[57]     Succeeded
03/15 19:16:11  MultiScan[56]              Succeeded
03/15 19:04:57  MultiScan[55]              Succeeded
03/15 19:00:06  MultiScan[53]              Succeeded

isilon-1#

あれ?「Critical Events」に何もないのに、「Cluster Health:ATTN」のまま?

大丈夫です。クラスタステータスの更新は少し時間が掛かっているだけでした。

isilon-1# isi status
Cluster Name: isilon
Cluster Health:     [  OK ]
Data Reduction:     1.08 : 1
Storage Efficiency: 0.27 : 1
Cluster Storage:  HDD                 SSD Storage
Size:             403.2G (548.1G Raw) 0 (0 Raw)
VHS Size:         144.9G
Used:             7.4G (2%)           0 (n/a)
Avail:            395.8G (98%)        0 (n/a)

                   Health  Throughput (bps)  HDD Storage      SSD Storage
ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size
---+---------------+-----+-----+-----+-----+-----------------+-----------------
  1|172.17.44.85   | OK  |    0| 200k| 200k| 1.0G/57.6G(  2%)|(No Storage SSDs)
  2|n/a            | OK  |    0|    0|    0| 792M/57.6G(  1%)|(No Storage SSDs)
  3|172.17.44.87   | OK  |    0| 244k| 244k| 1.2G/57.6G(  2%)|(No Storage SSDs)
  4|172.17.44.88   | OK  |    0|24.9k|24.9k| 1.0G/57.6G(  2%)|(No Storage SSDs)
  5|172.17.44.89   | OK  |    0| 128k| 128k| 924M/57.6G(  2%)|(No Storage SSDs)
  6|172.17.44.86   | OK  |    0|33.3k|33.3k| 1.3G/57.6G(  2%)|(No Storage SSDs)
  7|n/a            | OK  |    0| 175k| 175k| 1.2G/57.6G(  2%)|(No Storage SSDs)
---+---------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals:          |    0| 805k| 805k| 7.4G/ 403G(  2%)|(No Storage SSDs)

     Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

Critical Events:
Time            LNN  Event
--------------- ---- -------------------------------------------------------


Cluster Job Status:

No running jobs.

No paused or waiting jobs.

No failed jobs.

Recent job results:
Time            Job                        Event
--------------- -------------------------- ------------------------------
03/16 04:00:25  ShadowStoreProtect[63]     Succeeded
03/16 02:00:13  WormQueue[62]              Succeeded
03/16 00:01:00  ShadowStoreDelete[61]      Succeeded
03/15 22:12:38  SnapshotDelete[60]         Succeeded
03/15 22:02:43  FSAnalyze[59]              Succeeded
03/15 22:01:12  SmartPools[58]             Succeeded
03/15 20:00:26  ShadowStoreProtect[57]     Succeeded
03/15 19:16:11  MultiScan[56]              Succeeded
03/15 19:04:57  MultiScan[55]              Succeeded
03/15 19:00:06  MultiScan[53]              Succeeded

isilon-1#

コメントを残す

メールアドレスが公開されることはありません。

このサイトはスパムを低減するために Akismet を使っています。コメントデータの処理方法の詳細はこちらをご覧ください