Su Yadong: Protocol flapping on an S6850 aggregate interface at a customer site

Problem Description

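Two S6850 switches form a DRNI system and connect to a Huawei device through DR aggregate interface BAGG401. On the 10th, BAGG401 flapped briefly on site. The logs for that time show that the member ports were deselected because the LACPDUs they received were abnormal. The peer device's logs show much the same errors, so each side blames the LACPDUs sent by the other. Because this was a single past flap that has not recurred to date, it cannot be investigated with debugging or a packet capture.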

Analysis

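1) The member ports did not go physically down; only the link aggregation protocol state went down. The two devices also logged different errors. The device logs give the following reasons for the protocol state going down:

DR1: this device reports a missing D (Synchronization) flag and a key mismatch.
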

%@6141%Oct 10 07:32:03:818 2022 DR1 LAGG/6/LAGG_INACTIVE_OPERSTATE: Member port HGE1/0/27 of aggregation group BAGG401 changed to the inactive state, because the peer port did not have the Synchronization flag.

%@6142%Oct 10 07:32:03:818 2022 DR1 LAGG/6/LAGG_INACTIVE_PARTNER_KEY_WRONG: Member port HGE1/0/28 of aggregation group BAGG401 changed to the inactive state, because the operational key of the peer port was different from that of the reference port.

%@6143%Oct 10 07:32:03:822 2022 DR1 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/27 changed to down.

%@6144%Oct 10 07:32:03:822 2022 DR1 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/28 changed to down.

 

%@6148%Oct 10 07:32:03:855 2022 DR1 IFNET/3/PHY_UPDOWN: Physical state on the interface Bridge-Aggregation401 changed to down.

%@6149%Oct 10 07:32:03:855 2022 DR1 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation401 changed to down.

 

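HGE1/0/27 went down because the LACPDUs sent by the peer did not carry the Synchronization flag; HGE1/0/28 went down because the operational key of the peer port differed from that of the reference port.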

 

DR2: this device reports the following:

 

%@13479%Oct 10 07:32:03:835 2022 DR2 LAGG/6/LAGG_INACTIVE_OTHER: Member port HGE1/0/27 of aggregation group BAGG401 changed to the inactive state, because other reason.

%@13480%Oct 10 07:32:03:835 2022 DR2 LAGG/6/LAGG_INACTIVE_OTHER: Member port HGE1/0/28 of aggregation group BAGG401 changed to the inactive state, because other reason.

%@13481%Oct 10 07:32:03:837 2022 DR2 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/27 changed to down.

%@13482%Oct 10 07:32:03:838 2022 DR2 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/28 changed to down.

%@13483%Oct 10 07:32:03:842 2022 DR2 DRNI/6/DRNI_IFEVENT_DR_NOSELECTED: Local DR interface Bridge-Aggregation401 in DR group 401 does not have Selected member ports because the aggregate interface went down. Please check the aggregate link status.

%@13484%Oct 10 07:32:03:842 2022 DR2 DRNI/6/DRNI_IFEVENT_DR_GLOBALDOWN: The state of DR group 401 changed to down.

%@13485%Oct 10 07:32:03:880 2022 DR2 IFNET/3/PHY_UPDOWN: Physical state on the interface Bridge-Aggregation401 changed to down.
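
To line up the events from DR1 and DR2 on one timeline, log lines like the ones above can be parsed mechanically. The following is a minimal Python sketch, not an H3C tool: the regular expression is inferred only from the sample lines in this post, and the names LOG_RE, parse_log_line, and build_timeline are made up for illustration.

```python
import re
from datetime import datetime

# Layout assumed from the sample lines in this post (not from a vendor spec):
# %@<seq>%<Mon DD HH:MM:SS:mmm YYYY> <host> <MODULE>/<severity>/<MNEMONIC>: <text>
LOG_RE = re.compile(
    r"%@(?P<seq>\d+)%"
    r"(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}:\d{3} \d{4}) "
    r"(?P<host>\S+) "
    r"(?P<module>\w+)/(?P<severity>\d)/(?P<mnemonic>\w+): "
    r"(?P<text>.*)"
)


def parse_log_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["seq"] = int(record["seq"])
    record["severity"] = int(record["severity"])
    # %f accepts the 3-digit millisecond field and stores it as microseconds.
    record["time"] = datetime.strptime(record.pop("ts"), "%b %d %H:%M:%S:%f %Y")
    return record


def build_timeline(lines):
    """Parse all matching lines and sort them by timestamp."""
    records = [parse_log_line(line) for line in lines]
    return sorted((r for r in records if r is not None), key=lambda r: r["time"])


# Two lines copied from the DR1 and DR2 logs above, deliberately out of order.
SAMPLE = [
    "%@13479%Oct 10 07:32:03:835 2022 DR2 LAGG/6/LAGG_INACTIVE_OTHER: Member port "
    "HGE1/0/27 of aggregation group BAGG401 changed to the inactive state, "
    "because other reason.",
    "%@6141%Oct 10 07:32:03:818 2022 DR1 LAGG/6/LAGG_INACTIVE_OPERSTATE: Member port "
    "HGE1/0/27 of aggregation group BAGG401 changed to the inactive state, "
    "because the peer port did not have the Synchronization flag.",
]

if __name__ == "__main__":
    for rec in build_timeline(SAMPLE):
        print(rec["time"].strftime("%H:%M:%S.%f")[:-3], rec["host"], rec["mnemonic"])
```

Sorting by the parsed timestamp rather than by sequence number matters here because each device keeps its own sequence counter. The comparison is only as good as the clock synchronization between the two devices.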

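2) Based on the information above, the LACPDUs sent by the peer device appear to be at fault. However, the logs on the peer Huawei device likewise point to abnormal LACPDUs sent by our devices:

The peer device logged the following: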

Oct 10 2022 07:32:03+08:00 HW %%01LACP/3/LAG_DOWN_REASON_PDU(l)[258]:The member of the LACP mode Eth-Trunk interface went down because the local device received changed LACP PDU from partner. (TrunkName=Eth-Trunk27, PortName=100GE1/0/3, Reason=PartnerSyncFalse, OldParam=b1Synchronization:1, NewParam=b1Synchronization:0)

Oct 10 2022 07:32:03+08:00 HW %%01LACP/3/LAG_DOWN_REASON_PDU(l)[259]:The member of the LACP mode Eth-Trunk interface went down because the local device received changed LACP PDU from partner. (TrunkName=Eth-Trunk27, PortName=100GE0/0/4, Reason=PartnerSyncFalse, OldParam=b1Synchronization:1, NewParam=b1Synchronization:0)

Oct 10 2022 07:32:03+08:00 HW %%01LACP/3/OPTICAL_FIBER_MISCONNECT(l)[260]:The member of the LACP mode Eth-Trunk interface received an abnormal LACPDU, which may be caused by optical fiber misconnection. (TrunkName=Eth-Trunk27, PortName=100GE0/0/3, LocalParam=ActorOperPortKey:6993, PDUParam=PartnerKey:1089)

3) The current aggregation information on the local side is as follows. The local system MAC is a8c9-8a36-c4e1 and the peer's is b008-7565-4900.

Aggregate Interface: Bridge-Aggregation401

Creation Mode: Manual

Aggregation Mode: Dynamic

Loadsharing Type: Shar

Management VLANs: None

System ID: 0xa, a8c9-8a36-c4e1

Local:

  Port          Status   Priority   Index    Oper-Key   Flag

  HGE1/0/27(R)  S        32768      16392    40401      {ACDEF}

  HGE1/0/28     S        32768      16393    40401      {ACDEF}

Remote:

  Actor          Priority   Index    Oper-Key  SystemID             Flag  

  HGE1/0/27     32768    2        6993     0x8000, b008-7565-4900  {ACDEF}

  HGE1/0/28     32768    40       6993     0x8000, b008-7565-4900  {ACDEF}

 

System ID

Device ID (composed of the system LACP priority and the system MAC address)
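
As a small illustration of how the two parts of a system ID combine, the Python sketch below parses the two System ID values shown in the display output and compares them. In LACP the numerically lower system ID has the higher priority, with the MAC address breaking ties when the priorities are equal. The function name parse_system_id is hypothetical.

```python
def parse_system_id(text):
    """Parse 'PRIORITY, xxxx-xxxx-xxxx' into a comparable (priority, mac) tuple."""
    priority_text, mac_text = [part.strip() for part in text.split(",")]
    priority = int(priority_text, 0)          # base 0 accepts '0xa' and '32768'
    mac = int(mac_text.replace("-", ""), 16)  # MAC as a 48-bit integer
    return (priority, mac)


# Values copied from the display output above.
local = parse_system_id("0xa, a8c9-8a36-c4e1")
peer = parse_system_id("0x8000, b008-7565-4900")

# Tuples compare priority first, then MAC; the lower tuple has higher priority.
higher_priority = "local" if local < peer else "peer"

if __name__ == "__main__":
    print(local[0], peer[0], higher_priority)
```

With these values the local DR system (priority 0xa, that is 10) has the higher-priority system ID than the peer (0x8000, that is 32768).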

 

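4) Check the LACPDUs received on the Selected ports. An abnormal packet was received when the aggregation flapped:

In probe view, the display system internal link-aggregation lacp packet interface te x/0/x count 20 command shows the LACPDUs the device has received. One of them is abnormal.

The abnormal packet decodes as follows:

SystemID: 32768 for the peer and 32768 for the local side, although the local device actually uses 10 (0xa)

SystemMAC: b008-7565-4900 for the peer and 5825-7570-a3c0 for the local side, which is not the local device's MAC (a8c9-8a36-c4e1)

Details: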

[ZJHZ-IXP22-NET-PE-H3C-S6850-49-probe]display system internal link-aggregation lacp packet interface h 1/0/27 count 20

Data and Time: 10/10 07:32:03.841

Packet description:

Local:  SystemID=32768 SystemMAC=b008-7565-4900 Key=6993 Index=2 Priority=32768 Flag=13

Remote: SystemID=10 SystemMAC=a8c9-8a36-c4e1 Key=40401 Index=16392 Priority=32768 Flag=5     //a normal LACPDU from the peer should look like this

Data and Time: 10/10 07:32:03.807

Packet description:

Local:  SystemID=32768 SystemMAC=b008-7565-4900 Key=1089 Index=27 Priority=32768 Flag=61

Remote: SystemID=32768 SystemMAC=5825-7570-a3c0 Key=7745 Index=54 Priority=32768 Flag=61           //when the port flapped, the peer device misdirected its LACPDUs: a packet meant for 5825-7570-a3c0 was sent to us
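
The Flag values in these packet records can be related to the {ACDEF} letters in the aggregation display. The Python sketch below decodes the LACP state octet using the bit layout defined in IEEE 802.1AX. It assumes that the probe output prints Flag in decimal, which this post does not state; the assumption is plausible only because 61 then decodes to ACDEF, matching the display. The names STATE_BITS and decode_flag are made up for illustration.

```python
# Bits of the LACP actor/partner state octet (IEEE 802.1AX), paired with
# the letters used in the {ACDEF} flag column of the aggregation display.
STATE_BITS = [
    (0x01, "A", "LACP_Activity"),
    (0x02, "B", "LACP_Timeout"),
    (0x04, "C", "Aggregation"),
    (0x08, "D", "Synchronization"),
    (0x10, "E", "Collecting"),
    (0x20, "F", "Distributing"),
    (0x40, "G", "Defaulted"),
    (0x80, "H", "Expired"),
]


def decode_flag(value):
    """Return (letters, names) for the bits set in a state octet.

    Assumption: `value` is the Flag field read as a decimal integer.
    """
    letters = "".join(letter for bit, letter, _ in STATE_BITS if value & bit)
    names = [name for bit, _, name in STATE_BITS if value & bit]
    return letters, names


if __name__ == "__main__":
    for flag in (61, 13, 5):
        print(flag, decode_flag(flag))
```

Under that assumption, Flag=61 reads as ACDEF, Flag=13 as ACD, and Flag=5 as AC. The 07:32:03.841 packet would then show the peer port as synchronized but not yet collecting or distributing, and the peer's record of the local port as lacking the Synchronization bit.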

 

 

display system internal link-aggregation lacp packet interface te 1/0/18 count 20

The packets at the corresponding time show no problem.

[ZJHZ-IXP22-NET-PE-H3C-S6850-50-probe]display system internal link-aggregation lacp packet interface h 1/0/27 count 20

Aggregate interface: Bridge-Aggregation401

Data and Time: 10/10 07:32:04.003

Packet description:

Local:  SystemID=32768 SystemMAC=b008-7565-4900 Key=6993 Index=39 Priority=32768 Flag=61

Remote: SystemID=10 SystemMAC=a8c9-8a36-c4e1 Key=40401 Index=32776 Priority=32768 Flag=13

 

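However, abnormal packets were also received at other times, and the aggregation member port flapped at those times as well: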

Data and Time: 09/28 09:06:20.939

Packet description:

Local:  SystemID=32768 SystemMAC=b008-7565-4900 Key=1089 Index=27 Priority=32768 Flag=61

Remote: SystemID=32768 SystemMAC=5825-7570-a3c0 Key=7745 Index=54 Priority=32768 Flag=61

%@13458%Sep 28 09:06:20:961 2022 ZJHZ-IXP22-NET-PE-H3C-S6850-50 LAGG/6/LAGG_INACTIVE_PARTNER_KEY_WRONG: Member port HGE1/0/27 of aggregation group BAGG401 changed to the inactive state, because the operational key of the peer port was different from that of the reference port.

%@13459%Sep 28 09:06:20:964 2022 ZJHZ-IXP22-NET-PE-H3C-S6850-50 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/27 changed to down.

 

Data and Time: 09/24 08:45:57.722

Packet description:

Local:  SystemID=32768 SystemMAC=b008-7565-4900 Key=1089 Index=31 Priority=32768 Flag=61

Remote: SystemID=32768 SystemMAC=5825-7570-a3c0 Key=7745 Index=58 Priority=32768 Flag=61

%@13330%Sep 24 08:45:57:730 2022 ZJHZ-IXP22-NET-PE-H3C-S6850-50 LAGG/6/LAGG_INACTIVE_PARTNER_KEY_WRONG: Member port HGE1/0/27 of aggregation group BAGG401 changed to the inactive state, because the operational key of the peer port was different from that of the reference port.

%@13331%Sep 24 08:45:57:732 2022 ZJHZ-IXP22-NET-PE-H3C-S6850-50 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/27 changed to down.
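
The manual check used in step 4 can be sketched in Python: read the partner information carried in a received LACPDU and compare it with the local system MAC and operational key. This is only a sketch under stated assumptions. It assumes that in the probe output the "Local" line is the sender's own (actor) information and the "Remote" line is what the sender believes about its partner, which is inferred from the records in this post. The helper names are hypothetical.

```python
import re

FIELD_RE = re.compile(r"(\w+)=(\S+)")

# Local values copied from the aggregation display in step 3.
LOCAL_MAC = "a8c9-8a36-c4e1"
LOCAL_KEY = 40401


def parse_fields(line):
    """Turn 'Remote: SystemID=... Key=...' into a dict of strings."""
    return dict(FIELD_RE.findall(line))


def partner_info_matches(remote_line, local_mac=LOCAL_MAC, local_key=LOCAL_KEY):
    """True if the partner info in the received LACPDU describes this device."""
    fields = parse_fields(remote_line)
    return (
        fields.get("SystemMAC") == local_mac
        and fields.get("Key") == str(local_key)
    )


# "Remote" lines copied from the packet records above.
NORMAL = ("Remote: SystemID=10 SystemMAC=a8c9-8a36-c4e1 Key=40401 "
          "Index=16392 Priority=32768 Flag=5")
ABNORMAL = ("Remote: SystemID=32768 SystemMAC=5825-7570-a3c0 Key=7745 "
            "Index=54 Priority=32768 Flag=61")

if __name__ == "__main__":
    print(partner_info_matches(NORMAL), partner_info_matches(ABNORMAL))
```

A record for which the function returns False carries partner information that belongs to some other system. That is the pattern seen at 07:32:03.807 on October 10 and again on September 28 and September 24.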

Solution

1. Investigate why the peer device sent abnormal LACPDUs.

Source: Zhiliao community (知了社区), under the Creative Commons Attribution-ShareAlike 3.0 China Mainland license, "Protocol flapping on an S6850 aggregate interface at a customer site".