
宣轩, 3PAR 8200 node 0 warm reboot


Problem description

Node 0 of a 3PAR 8200 performed a warm reboot.


Analysis

Release version 3.3.1 (MU5)

  Patches:  P126,P132,P135,P140,P146,P150,P151,P155,P156,P164,P170,P172,P173

 

  showeeprom shows that node 0 rebooted at 2022-08-25 11:13:47 CST:

  node 0

  --------

        Board revision: 0920-200048.B4

              Assembly: FXN 2019/17 Serial 438870

         System serial: CN792305WP

            System W19: 0x23EDE

          BIOS version: 5.5.7

            OS version: 3.3.1.648

          Reset reason: ALIVE_L

             Last boot: 2022-08-25 11:13:47 CST

     Last cluster join: 2022-08-25 11:14:07 CST

            Last panic: 2022-08-25 11:08:47 CST

    Last panic request: Never

     Error ignore code: 00

           SMI context: 00

         Last HBA mode: 2a000000

            BIOS state: 80 ff 24 27 28 29 2a 2c

             TPD state: ff ff ff ff ff ff ff ff

  Code 128 (BIOS update) - Subcode 0x2050507 (2050404)    2022-08-12 21:53:23 CST

  Code 128 (BIOS update) - Subcode 0x2050404 (2050236)    2020-08-04 11:03:01 CST

  Code 61 (AC Power Loss) - Subcode 0x0 (0)               2020-04-14 16:22:50 CST

  Code 61 (AC Power Loss) - Subcode 0x0 (0)               2019-06-09 21:01:54 CST
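As a quick cross-check of the timestamps above, the gap between "Last panic" and "Last boot" can be computed. This is only a sketch: it assumes GNU date (the -d option) is available on the machine where the output is analyzed, and seconds_between is a hypothetical helper name. The time zone is left out because both timestamps are in CST.

```shell
# Seconds between two "YYYY-MM-DD HH:MM:SS" timestamps.
# Assumption: GNU date is installed; both timestamps share one time zone.
seconds_between() {
    start=$(TZ=UTC date -d "$1" +%s) || return 1
    end=$(TZ=UTC date -d "$2" +%s) || return 1
    echo $((end - start))
}

# Last panic -> Last boot, taken from the showeeprom output
seconds_between "2022-08-25 11:08:47" "2022-08-25 11:13:47"
```

For these values the node was back up 300 seconds after the panic; the same helper gives 20 seconds from "Last boot" to "Last cluster join".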

 

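  The console log in \INSPLO~4.194\var\core\nemoe\NODE0-~1\N0_fa_2022-08-25_11_08_48\ shows that at the time of the failure CPU 5 was stuck in a soft lockup for 61 s. The stuck process was a kworker thread (kworker/5:1, PID 1871), and the work item it was running was the periodic rtc_timer task (rtc_timer_do_work):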

  3PAR(R) InForm(tm) OS 3.3.1.648 CN792305WP-0 ttyS0

  CN792305WP-0 login: [1084521.180755] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 61s! [kworker/5:1:1871]

  [1084521.201385] Kernel panic[5]: softlockup: hung tasks

  [1084521.211421] CPU: 5 PID: 1871 Comm: kworker/5:1 Tainted: P           O L ------------   3.10.0 #1

  [1084521.229231] Hardware name: HP Romley Platform, BIOS UDK_05.05.07 2019-11-01

  [1084521.243413] Workqueue: events rtc_timer_do_work

  [1084521.252765] Call Trace:

  [1084521.257961]  <IRQ>  [<ffffffff817688b6>] dump_stack+0x19/0x1b

  [1084521.269735]  [<ffffffff81767370>] panic+0x14f/0x27f

  [1084521.279770]  [<ffffffff811478c7>] watchdog_timer_fn+0x227/0x230

  [1084521.291879]  [<ffffffff811476a0>] ? watchdog_enable+0xa0/0xa0

  [1084521.303642]  [<ffffffff810ee18f>] __hrtimer_run_queues+0xaf/0x260

  [1084521.316098]  [<ffffffff8111bb9a>] ? ktime_get_update_offsets_now+0x5a/0x120

  [1084521.330280]  [<ffffffff810ee6f2>] hrtimer_interrupt+0xa2/0x1b0

  [1084521.342217]  [<ffffffff81096bbe>] local_apic_timer_interrupt+0x3e/0x70

  [1084521.355536]  [<ffffffff8177ce93>] smp_apic_timer_interrupt+0x43/0x60

  [1084521.368509]  [<ffffffff81779e42>] apic_timer_interrupt+0x162/0x170

  [1084521.381134]  <EOI>  [<ffffffff8176f3a5>] ? _raw_spin_unlock_irqrestore+0x15/0x20

  [1084521.396192]  [<ffffffff810f3ac4>] __wake_up+0x44/0x50

  [1084521.406573]  [<ffffffff8153f8ef>] rtc_handle_legacy_irq+0x9f/0xc0

  [1084521.419027]  [<ffffffff8153f948>] rtc_uie_update_irq+0x18/0x20

  [1084521.430962]  [<ffffffff8153fa87>] rtc_timer_do_work+0xd7/0x1d0

  [1084521.442897]  [<ffffffff810715ec>] ? __switch_to+0x12c/0x4f0

  [1084521.454314]  [<ffffffff8176d492>] ? __schedule+0x492/0xb00

  [1084521.465557]  [<ffffffff810e2c64>] process_one_work+0x1c4/0x4e0

  [1084521.477492]  [<ffffffff810e3d61>] worker_thread+0x121/0x430

  [1084521.488910]  [<ffffffff810e3c40>] ? manage_workers.isra.28+0x2b0/0x2b0

  [1084521.502228]  [<ffffffff810ea762>] kthread+0xc2/0xd0

  [1084521.512264]  [<ffffffff810ea6a0>] ? flush_kthread_worker+0x80/0x80

  [1084521.524890]  [<ffffffff81778e77>] ret_from_fork_nospec_begin+0x21/0x21

  [1084521.538210]  [<ffffffff810ea6a0>] ? flush_kthread_worker+0x80/0x80

  [1084521.550836] Kernel Offset: disabled
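The three lines that matter in a trace like this are the watchdog message, the panic line, and the Workqueue line. A minimal sketch for pulling them out of a saved console log follows; summarize_lockup is a hypothetical helper name and the log path is whatever file the console output was saved to.

```shell
# Print the key soft-lockup lines from a saved console log.
# $1: path to the log file (hypothetical; use the actual saved console log).
summarize_lockup() {
    grep -E 'soft lockup|Kernel panic|Workqueue:' "$1"
}

# Usage: summarize_lockup /path/to/console.log
```

Run against the log above, this would show the CPU number and stuck duration, the panic reason, and the work function (rtc_timer_do_work) without the full call trace.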


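Solution

The root cause is a problem in the Linux kernel function rtc_timer_do_work(): while processing certain periodic tasks it gets stuck in an endless loop, which the watchdog reports as a soft lockup and which then triggers the kernel panic.

A fix is expected in the next major release.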

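Temporary workaround: use the disable_soft_lockup.sh script so that a soft lockup no longer causes a kernel panic.

 Procedure:

 Log in to the 3PAR as root and run the following commands:

 cd /common/stbin

 ./disable_soft_lockup.sh --install    # install the workaround

 ./disable_soft_lockup.sh --verify     # verify it; kernel.softlockup_panic should now be 0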
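The --verify option is the documented check. As an independent sketch, the same kernel setting can be read through the standard Linux procfs entry /proc/sys/kernel/softlockup_panic. Whether this path is readable on the node is an assumption, and check_softlockup_panic is a hypothetical helper name.

```shell
# Report whether a soft lockup would trigger a kernel panic.
# Reads the standard Linux procfs entry by default; another file can be
# passed as $1 (assumption: the procfs path is readable on the node).
check_softlockup_panic() {
    file="${1:-/proc/sys/kernel/softlockup_panic}"
    value=$(cat "$file") || return 2
    if [ "$value" = "0" ]; then
        echo "kernel.softlockup_panic=0 (soft lockups are logged, no panic)"
    else
        echo "kernel.softlockup_panic=$value (soft lockups still cause a panic)"
        return 1
    fi
}

# Usage: check_softlockup_panic
```

A value of 0 means the watchdog still logs the soft lockup but the node does not panic and reboot because of it.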

 


Source: 知了社区, under the Creative Commons Attribution-ShareAlike 3.0 China Mainland license (知识共享署名-相同方式共享3.0中国大陆许可协议).