
Zhang Wenning: Services under an S5560X-EI at a site were suddenly interrupted for about 10 minutes


Network Topology and Description

/

Alarm Information

/

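Problem Description

In an AD-Campus network, the S5560X serves as a leaf device. At around 10:00 on June 29, terminals under the leaf that carries the city company's authentication service reported service faults. On-site engineers tried unplugging one member link of the leaf-to-spine interconnect agg1024 (1/0/26, 2/0/26), leaving a single link in place, but service did not recover. It recovered on its own after about 15 minutes: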

%@90496%Jun 29 08:35:33:208 2022 leaf_shigongsi DRVPLAT/4/SOFTCAR DROP:

PktType=UKNOWN_SMAC, SrcMAC=dc4a-3e59-0fd6, Dropped from interface=GigabitEthernet1/0/4 at Stage=61, StageCnt=5322, TotalCnt=23216, MaxRateInterface=GigabitEthernet1/0/14.

%@90497%Jun 29 10:06:00:779 2022 leaf_shigongsi LAGG/6/LAGG_INACTIVE_PHYSTATE: Member port XGE2/0/26 of aggregation group BAGG1024 changed to the inactive state, because the physical or line protocol state of the port was down.

%@90498%Jun 29 10:06:00:790 2022 leaf_shigongsi IFNET/3/PHY_UPDOWN: Physical state on the interface Ten-GigabitEthernet2/0/26 changed to down.

%@90499%Jun 29 10:06:00:842 2022 leaf_shigongsi IFNET/5/LINK_UPDOWN: Line protocol state on the interface Ten-GigabitEthernet2/0/26 changed to down.

%@90500%Jun 29 10:06:02:958 2022 leaf_shigongsi OPTMOD/4/MODULE_OUT: -Slot=2; Ten-GigabitEthernet2/0/26: Transceiver absent.

%@90501%Jun 29 10:06:05:657 2022 leaf_shigongsi LAGG/4/LACP_MAD_INTERFACE_CHANGE_STATE: LACP MAD function enabled on Bridge-Aggregation1024 changed to the faulty state.

%@90502%Jun 29 10:06:19:705 2022 leaf_shigongsi OPTMOD/4/MODULE_IN: -Slot=2; Ten-GigabitEthernet2/0/26: The transceiver is STACK_SFP_PLUS.

%@90503%Jun 29 10:06:32:439 2022 leaf_shigongsi LAGG/6/LAGG_INACTIVE_PHYSTATE: Member port XGE1/0/26 of aggregation group BAGG1024 changed to the inactive state, because the physical or line protocol state of the port was down.

%@90504%Jun 29 10:06:32:457 2022 leaf_shigongsi IFNET/3/PHY_UPDOWN: Physical state on the interface Ten-GigabitEthernet1/0/26 changed to down.

%@90505%Jun 29 10:06:32:490 2022 leaf_shigongsi IFNET/5/LINK_UPDOWN: Line protocol state on the interface Ten-GigabitEthernet1/0/26 changed to down.

%@90506%Jun 29 10:06:32:815 2022 leaf_shigongsi IFNET/3/PHY_UPDOWN: Physical state on the interface Bridge-Aggregation1024 changed to down.

%@90507%Jun 29 10:06:32:821 2022 leaf_shigongsi IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation1024 changed to down.

%@90508%Jun 29 10:06:41:578 2022 leaf_shigongsi IFNET/3/PHY_UPDOWN: Physical state on the interface Ten-GigabitEthernet2/0/26 changed to up.

%@90509%Jun 29 10:06:41:912 2022 leaf_shigongsi RADIUS/4/RADIUS_AUTH_SERVER_DOWN: RADIUS authentication server was blocked: server IP=192.168.7.220, port=1812, VPN instance=vpn-default.

%@90510%Jun 29 10:06:42:748 2022 leaf_shigongsi IFNET/3/PHY_UPDOWN: Physical state on the interface Ten-GigabitEthernet2/0/26 changed to down.

%@90511%Jun 29 10:06:42:993 2022 leaf_shigongsi RADIUS/4/RADIUS_ACCT_SERVER_DOWN: RADIUS accounting server was blocked: server IP=192.168.7.220, port=1813, VPN instance=vpn-default.

Process Analysis

1. The logs show that the device raised alarms indicating an ARP packet flood:

 %@90519%Jun 29 10:06:48:793 2022 leaf_shigongsi DRVPLAT/4/SOFTCAR DROP: -Slot=2;

PktType=UKNOWN_SMAC, SrcMAC=6c0b-846b-be0a, Dropped from interface=GigabitEthernet2/0/3 at Stage=1, StageCnt=1344, TotalCnt=14945, MaxRateInterface=GigabitEthernet2/0/4.

%@90520%Jun 29 10:06:49:536 2022 leaf_shigongsi OFP/5/OFP_DISCONNECT: Openflow instance 1,  controller 2 is disconnected.disconnected reason:Echo timeout.

%@90521%Jun 29 10:06:50:539 2022 leaf_shigongsi OFP/5/OFP_DISCONNECT: Openflow instance 1,  controller 1 is disconnected.disconnected reason:Echo timeout.

%@90522%Jun 29 10:06:50:848 2022 leaf_shigongsi OFP/5/OFP_FAIL_OPEN: Openflow instance 1 is in fail secure mode.

%@90523%Jun 29 10:06:51:122 2022 leaf_shigongsi DRVPLAT/4/DrvDebug: -Slot=2;

 Rx/Tx failure recovered between the CPU and switching chip on slot 2.

%@90524%Jun 29 10:06:55:186 2022 leaf_shigongsi DRVPLAT/4/DrvDebug: -Slot=2;

 Rx/Tx failure recovered between the CPU and switching chip on slot 2.

%@90525%Jun 29 10:06:59:237 2022 leaf_shigongsi OPTMOD/4/MODULE_OUT: Ten-GigabitEthernet1/0/26: Transceiver absent.

%@90526%Jun 29 10:07:06:630 2022 leaf_shigongsi ARP/6/ARP_PKTQUE_ALERT: The current size of the ARP_PKT queue has reached 4244. Please check the network environment.

 

 

%@90638%Jun 29 10:10:25:082 2022 leaf_shigongsi IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation1024 changed to down.

%@90639%Jun 29 10:10:25:543 2022 leaf_shigongsi LLDP/5/LLDP_NEIGHBOR_AGE_OUT: -Slot=2; Nearest bridge agent neighbor aged out on port Ten-GigabitEthernet2/0/26 (IfIndex 89), neighbor's chassis ID is 542b-de70-6a00, port ID is Ten-GigabitEthernet1/4/0/1.

%@90640%Jun 29 10:10:27:646 2022 leaf_shigongsi IFNET/3/PHY_UPDOWN: Physical state on the interface Ten-GigabitEthernet2/0/26 changed to up.

%@90641%Jun 29 10:10:27:387 2022 leaf_shigongsi LLDP/6/LLDP_CREATE_NEIGHBOR: -Slot=2; Nearest bridge agent neighbor created on port Ten-GigabitEthernet2/0/26 (IfIndex 89), neighbor's chassis ID is 542b-de70-6a00, port ID is Ten-GigabitEthernet2/4/0/1.

%@90642%Jun 29 10:10:27:858 2022 leaf_shigongsi LAGG/6/LAGG_ACTIVE: Member port XGE2/0/26 of aggregation group BAGG1024 changed to the active state.

%@90643%Jun 29 10:10:27:968 2022 leaf_shigongsi IFNET/3/PHY_UPDOWN: Physical state on the interface Bridge-Aggregation1024 changed to up.

%@90644%Jun 29 10:10:27:983 2022 leaf_shigongsi IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation1024 changed to up.

%@90645%Jun 29 10:10:28:087 2022 leaf_shigongsi IFNET/5/LINK_UPDOWN: Line protocol state on the interface Ten-GigabitEthernet2/0/26 changed to up.

%@90646%Jun 29 10:10:32:613 2022 leaf_shigongsi OPTMOD/4/MODULE_OUT: Ten-GigabitEthernet1/0/26: Transceiver absent.

%@90647%Jun 29 10:10:36:207 2022 leaf_shigongsi RADIUS/4/RADIUS_ACCT_SERVER_DOWN: RADIUS accounting server was blocked: server IP=192.168.7.220, port=1813, VPN instance=vpn-default.

%@90648%Jun 29 10:10:37:590 2022 leaf_shigongsi DRVPLAT/4/DrvDebug: -Slot=2;

 Rx/Tx failure recovered between the CPU and switching chip on slot 2.

%@90649%Jun 29 10:10:38:050 2022 leaf_shigongsi OPTMOD/4/MODULE_IN: Ten-GigabitEthernet1/0/26: The transceiver is STACK_SFP_PLUS.

%@90650%Jun 29 10:10:39:790 2022 leaf_shigongsi IFNET/3/PHY_UPDOWN: Physical state on the interface Ten-GigabitEthernet1/0/26 changed to up.

%@90651%Jun 29 10:10:39:892 2022 leaf_shigongsi LLDP/6/LLDP_CREATE_NEIGHBOR: Nearest bridge agent neighbor created on port Ten-GigabitEthernet1/0/26 (IfIndex 26), neighbor's chassis ID is 542b-de70-6a00, port ID is Ten-GigabitEthernet1/4/0/1.

%@90652%Jun 29 10:10:41:440 2022 leaf_shigongsi LAGG/6/LAGG_ACTIVE: Member port XGE1/0/26 of aggregation group BAGG1024 changed to the active state.

%@90653%Jun 29 10:10:42:254 2022 leaf_shigongsi IFNET/5/LINK_UPDOWN: Line protocol state on the interface Ten-GigabitEthernet1/0/26 changed to up.

%@90654%Jun 29 10:10:42:535 2022 leaf_shigongsi OFP/5/OFP_DISCONNECT: Openflow instance 1,  controller 2 is disconnected.disconnected reason:Echo timeout.

%@90655%Jun 29 10:10:43:806 2022 leaf_shigongsi DRVPLAT/4/DrvDebug: -Slot=2;

 Rx/Tx failure recovered between the CPU and switching chip on slot 2.

%@90656%Jun 29 10:10:44:541 2022 leaf_shigongsi OFP/5/OFP_DISCONNECT: Openflow instance 1,  controller 1 is disconnected.disconnected reason:Echo timeout.

%@90657%Jun 29 10:10:44:562 2022 leaf_shigongsi OFP/5/OFP_FAIL_OPEN: Openflow instance 1 is in fail secure mode.
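
The log excerpts above share a regular header layout (sequence number, timestamp, host name, MODULE/severity/MNEMONIC), so a short script can turn them into a timeline. The Python sketch below is only an illustration: the regular expression is inferred from the lines quoted in this case rather than from an official format specification, and the names LOG_RE, parse_line, and SAMPLE are made up for the example. It parses four lines copied from the excerpts, counts events by type, and measures the time between one Bridge-Aggregation1024 down event and a later up event.

```python
import re
from collections import Counter
from datetime import datetime

# Header pattern inferred from the excerpts in this case (an assumption,
# not an official log format definition).
LOG_RE = re.compile(
    r"^\s*%@(?P<seq>\d+)%"
    r"(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d:\d{3} \d{4}) "
    r"(?P<host>\S+) "
    r"(?P<module>\w+)/(?P<sev>\d)/(?P<mnemonic>[^:]+):"
    r"\s*(?P<text>.*)$"
)


def parse_line(line):
    """Return a dict for one log header line, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["seq"] = int(rec["seq"])
    rec["sev"] = int(rec["sev"])
    # The last time field is milliseconds; %f accepts 1 to 6 digits.
    rec["time"] = datetime.strptime(rec.pop("ts"), "%b %d %H:%M:%S:%f %Y")
    return rec


# Four lines copied verbatim from the excerpts in this case.
SAMPLE = """\
%@90497%Jun 29 10:06:00:779 2022 leaf_shigongsi LAGG/6/LAGG_INACTIVE_PHYSTATE: Member port XGE2/0/26 of aggregation group BAGG1024 changed to the inactive state, because the physical or line protocol state of the port was down.
%@90507%Jun 29 10:06:32:821 2022 leaf_shigongsi IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation1024 changed to down.
%@90526%Jun 29 10:07:06:630 2022 leaf_shigongsi ARP/6/ARP_PKTQUE_ALERT: The current size of the ARP_PKT queue has reached 4244. Please check the network environment.
%@90644%Jun 29 10:10:27:983 2022 leaf_shigongsi IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation1024 changed to up.
"""

records = [r for r in map(parse_line, SAMPLE.splitlines()) if r]

# Count events per MODULE/MNEMONIC pair.
by_event = Counter(f"{r['module']}/{r['mnemonic']}" for r in records)

# Line-protocol transitions of the uplink aggregation as (time, state).
agg = [
    (r["time"], r["text"].rsplit(" ", 1)[-1].rstrip("."))
    for r in records
    if r["mnemonic"] == "LINK_UPDOWN" and "Bridge-Aggregation1024" in r["text"]
]

gap_seconds = (agg[-1][0] - agg[0][0]).total_seconds()
print(by_event)
print(f"BAGG1024 {agg[0][1]} -> {agg[-1][1]} after {gap_seconds:.3f} s")
```

On these four sample lines, the gap between the two Bridge-Aggregation1024 events is about 235 seconds. Because the excerpts in this case are incomplete and the aggregation flapped more than once, that figure describes only the quoted lines, not the full outage.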

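Further inspection of the ARP module's packet receive statistics shows a rate that is mostly above 200 pps, with occasional bursts above 300 pps.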

[leaf_shigongsi-probe]debug rxtx sof sh sl 1 30

ID  Type                RcvPps Rcv_All    DisPkt_All Pps  Dyn Swi Hash Am APps

30  ARP                 220    615662430  110942     600  S   On  SMAC 8

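The per-port breakdown shows that the first few ports receive the most ARP packets; we recommend checking the access devices connected to them.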

[leaf_shigongsi-probe]debug rxtx soft 30 portdetail sl 1

 

Softcar Type  ARP  PortStatusFetchCnt=18418710

 

Port  Lvl  Atk  Packet/s  DisPkt/s  Pack_tol  DisP_tol  Pps  Prop  ENum/s  Eport

0     0    0    104       0         79584199  0         600  0  0   104    699

1     0    0    24        0         30194771  44        600  0  0   24     997

2     0    0    66        0         115171520  227       600  0  0   66     526

3     0    0    10        0         23947785  0         600  0  0   10     798

4     0    0    6         0         18762585  0         600  0  0   6      748

5     0    0    39        0         44471179  0         600  0  0   39     1084
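
To make "the first few ports receive the most" concrete, the per-port rows above can be ranked by their Packet/s column. The Python sketch below is a rough illustration that assumes the first field of each row is the port index and the fourth is Packet/s, as the column header suggests; the names ROWS and arp_rate_by_port are hypothetical. The figures are a single snapshot and will differ from one sample to the next.

```python
# Rows copied from the per-port softcar output above.
ROWS = """\
0     0    0    104       0         79584199  0         600  0  0   104    699
1     0    0    24        0         30194771  44        600  0  0   24     997
2     0    0    66        0         115171520  227       600  0  0   66     526
3     0    0    10        0         23947785  0         600  0  0   10     798
4     0    0    6         0         18762585  0         600  0  0   6      748
5     0    0    39        0         44471179  0         600  0  0   39     1084
"""


def arp_rate_by_port(text):
    """Map port index -> Packet/s (assumed: field 0 is Port, field 3 is Packet/s)."""
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip headers and blank lines
        rates[int(fields[0])] = int(fields[3])
    return rates


rates = arp_rate_by_port(ROWS)
total = sum(rates.values())

# Highest ARP receive rate first.
ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
for port, pps in ranked:
    print(f"port {port}: {pps} pps ({pps / total:.1%} of this snapshot)")
```

In this snapshot, ports 0, 2, and 5 together account for 209 of the 249 pps summed across the six listed ports, which is why the connected access devices are the first place to look.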

 

  ===============display lldp neighbor-information list=============== 

Chassis ID : * -- -- Nearest nontpmr bridge neighbor

             # -- -- Nearest customer bridge neighbor

             Default -- -- Nearest bridge neighbor

Local Interface Chassis ID      Port ID                    System Name         

GE1/0/1         f875-88e3-9e30  XGigabitEthernet1/0/1      2F-S5720

GE1/0/2         f875-88e3-9e30  XGigabitEthernet3/0/1      2F-S5720

GE1/0/3         f875-88e3-9e40  XGigabitEthernet3/0/1      9F-S5720

GE1/0/4         446a-2eae-b6a0  XGigabitEthernet1/0/1      13F-5720

GE1/0/5         446a-2eae-b6a0  XGigabitEthernet3/0/1      13F-5720

GE1/0/6         f875-8868-8390  XGigabitEthernet1/0/1      18F-4320
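
The LLDP neighbor list above can be used to name the access switch behind each busy port. The Python sketch below relies on an unverified assumption: that port index N in the slot 1 softcar output corresponds to interface GE1/0/(N+1). The original case does not state this mapping, so confirm it on the device before acting on the result. The names LLDP, neighbours, and local_interface are hypothetical, and the port indexes 0, 2, and 5 are the three highest Packet/s values in the per-port table.

```python
# Rows copied from the "display lldp neighbor-information list" output above.
LLDP = """\
GE1/0/1         f875-88e3-9e30  XGigabitEthernet1/0/1      2F-S5720
GE1/0/2         f875-88e3-9e30  XGigabitEthernet3/0/1      2F-S5720
GE1/0/3         f875-88e3-9e40  XGigabitEthernet3/0/1      9F-S5720
GE1/0/4         446a-2eae-b6a0  XGigabitEthernet1/0/1      13F-5720
GE1/0/5         446a-2eae-b6a0  XGigabitEthernet3/0/1      13F-5720
GE1/0/6         f875-8868-8390  XGigabitEthernet1/0/1      18F-4320
"""


def neighbours(text):
    """Map local interface -> (system name, chassis ID, remote port ID)."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].startswith("GE"):
            local, chassis, port_id, sysname = fields
            table[local] = (sysname, chassis, port_id)
    return table


def local_interface(slot, port_index):
    # ASSUMPTION: softcar port index N on a slot maps to GE<slot>/0/<N+1>.
    return f"GE{slot}/0/{port_index + 1}"


lldp = neighbours(LLDP)
busiest = [0, 2, 5]  # highest Packet/s values in the per-port ARP table
suspects = {idx: lldp[local_interface(1, idx)][0] for idx in busiest}
for idx, sysname in suspects.items():
    print(f"port {idx} -> {local_interface(1, idx)} -> {sysname}")
```

Under that assumption, the busiest ports point to the access switches 2F-S5720, 9F-S5720, and 18F-4320.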

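After investigation, it was largely confirmed that the recently installed 内网通 (Neiwangtong) LAN messaging software, which is ARP-based and similar to FeiQ (飞秋), generated a large volume of ARP packets that hit the device CPU and caused the fault described above.

Solution

1. Investigate the Neiwangtong software and reduce the number of ARP packets it sends.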

Source: Zhiliao Community (知了社区), licensed under the Creative Commons Attribution-ShareAlike 3.0 China Mainland license.