Linux series


2024-07-08 14:53:20 | Source: compiled from the web

Background: one of our company's HP servers recently rebooted on its own, so I tried to track down the cause.

Troubleshooting steps:

1. Log in to the machine and run last, uptime, or similar commands to find out when it rebooted

$ last | grep reboot
reboot system boot 3.10.0-1160.24.1 Mon Oct 11 19:19 - 10:49 (15:30)
reboot system boot 3.10.0-1160.24.1 Wed Oct 6 14:08 - 10:49 (5+20:41)
reboot system boot 3.10.0-1160.24.1 Mon Oct 4 13:03 - 10:49 (7+21:46)
reboot system boot 3.10.0-1160.24.1 Sun Oct 3 21:39 - 10:49 (8+13:10)
reboot system boot 3.10.0-1160.24.1 Sun Oct 3 09:12 - 10:49 (9+01:37)
reboot system boot 3.10.0-1160.24.1 Sat Sep 25 23:13 - 10:49 (16+11:36)
$ uptime
10:53:27 up 15:34, 1 user, load average: 2.43, 1.74, 1.43
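The reboot list alone does not say whether each restart was a clean shutdown or an abrupt reset. A minimal sketch of one way to tell them apart, assuming the output of `last -x` (which also prints `shutdown` records) is listed newest first: a `reboot` record that is not preceded in time by a `shutdown` record is flagged as unclean. The function name `classify_reboots` is hypothetical, and a rotated wtmp file can make the oldest entry inconclusive.

```shell
# Hypothetical helper: reads `last -x` style output (newest first) on stdin
# and labels every "reboot" record.
#   CLEAN   : the next older record of interest is a "shutdown"
#   UNCLEAN : another "reboot" comes first, so no shutdown was logged
#   UNKNOWN : the log ends before either record appears
classify_reboots() {
  awk '
    $1 == "reboot" {
      if (pending != "") print "UNCLEAN  " pending
      pending = $0
      next
    }
    $1 == "shutdown" {
      if (pending != "") { print "CLEAN    " pending; pending = "" }
      next
    }
    END { if (pending != "") print "UNKNOWN  " pending }
  '
}
```

On a real host this would be run as `last -x | classify_reboots`.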

2. Check the relevant system logs (dmesg, /var/log/messages, kdump, and so on)

dmesg: kernel messages since boot

$ dmesg | grep -Ei 'error|Fail'
[ 0.000000] tsc: Fast TSC calibration failed
[ 3.120763] pci 0000:12:00.1: BAR 6: failed to assign [mem size 0x00080000 pref]
[ 3.178571] pci 0000:5c:00.0: BAR 6: failed to assign [mem size 0x00200000 pref]
[ 3.223819] pci 0000:5d:00.1: BAR 6: failed to assign [mem size 0x00080000 pref]
[ 3.240238] pci 0000:5d:00.2: BAR 6: failed to assign [mem size 0x00080000 pref]
[ 3.256824] pci 0000:5d:00.3: BAR 6: failed to assign [mem size 0x00080000 pref]
[ 3.366034] pci 0000:00:14.0: xHCI BIOS handoff failed (BIOS bug ?) 00012201
[ 4.635351] ioapic: probe of 0000:00:05.4 failed with error -22
[ 4.642051] ioapic: probe of 0000:11:05.4 failed with error -22
[ 4.648757] ioapic: probe of 0000:36:05.4 failed with error -22
[ 4.655459] ioapic: probe of 0000:5b:05.4 failed with error -22
[ 4.662176] ioapic: probe of 0000:80:05.4 failed with error -22
[ 4.668874] ioapic: probe of 0000:85:05.4 failed with error -22
[ 4.675576] ioapic: probe of 0000:ae:05.4 failed with error -22
[ 4.682278] ioapic: probe of 0000:d7:05.4 failed with error -22
[ 4.716010] ERST: Error Record Serialization Table (ERST) support is initialized.
[ 6.058884] smartpqi: module verification failed: signature and/or required key missing - tainting kernel
[24726.679793] tsar[94262]: segfault at fffffffffffffff0 ip 00007fd5cddf5dd6 sp 00007fff9aa2c608 error 5 in libc-2.17.so[7fd5cdca0000+1c3000]
[24737.612788] tsar[95267]: segfault at fffffffffffffff0 ip 00007f9205c50dd6 sp 00007ffe55047368 error 5 in libc-2.17.so[7f9205afb000+1c3000]
[24740.345420] tsar[95426]: segfault at fffffffffffffff0 ip 00007f99f20efdd6 sp 00007ffe70032fa8 error 5 in libc-2.17.so[7f99f1f9a000+1c3000]

/var/log/messages: system log

$ grep -Ei 'error|Fail' /var/log/messages
Oct 11 19:19:35 kuyun.a01.host kernel: tsc: Fast TSC calibration failed
Oct 11 19:19:35 kuyun.a01.host kernel: pci 0000:12:00.1: BAR 6: failed to assign [mem size 0x00080000 pref]
Oct 11 19:19:35 kuyun.a01.host kernel: pci 0000:5c:00.0: BAR 6: failed to assign [mem size 0x00200000 pref]
Oct 11 19:19:35 kuyun.a01.host kernel: pci 0000:5d:00.1: BAR 6: failed to assign [mem size 0x00080000 pref]
Oct 11 19:19:35 kuyun.a01.host kernel: pci 0000:5d:00.2: BAR 6: failed to assign [mem size 0x00080000 pref]
Oct 11 19:19:35 kuyun.a01.host kernel: pci 0000:5d:00.3: BAR 6: failed to assign [mem size 0x00080000 pref]
Oct 11 19:19:35 kuyun.a01.host kernel: pci 0000:00:14.0: xHCI BIOS handoff failed (BIOS bug ?) 00012201
Oct 11 19:19:35 kuyun.a01.host kernel: ioapic: probe of 0000:00:05.4 failed with error -22
Oct 11 19:19:35 kuyun.a01.host kernel: ioapic: probe of 0000:11:05.4 failed with error -22
Oct 11 19:19:35 kuyun.a01.host kernel: ioapic: probe of 0000:36:05.4 failed with error -22
Oct 11 19:19:35 kuyun.a01.host kernel: ioapic: probe of 0000:5b:05.4 failed with error -22
Oct 11 19:19:35 kuyun.a01.host kernel: ioapic: probe of 0000:80:05.4 failed with error -22
Oct 11 19:19:35 kuyun.a01.host kernel: ioapic: probe of 0000:85:05.4 failed with error -22
Oct 11 19:19:35 kuyun.a01.host kernel: ioapic: probe of 0000:ae:05.4 failed with error -22
Oct 11 19:19:35 kuyun.a01.host kernel: ioapic: probe of 0000:d7:05.4 failed with error -22
Oct 11 19:19:35 kuyun.a01.host kernel: ERST: Error Record Serialization Table (ERST) support is initialized.
Oct 11 19:19:35 kuyun.a01.host kernel: smartpqi: module verification failed: signature and/or required key missing - tainting kernel
Oct 11 19:19:35 kuyun.a01.host systemd[1]: Failed to start Configure CPU turboboost.
Oct 11 19:19:35 kuyun.a01.host systemd[1]: Unit cpunoturbo.service entered failed state.
Oct 11 19:19:35 kuyun.a01.host systemd[1]: cpunoturbo.service failed.
Oct 11 19:19:35 kuyun.a01.host syslog-ng[1144]: [2021-10-11T19:19:35.010289] Error resolving hostname; host='syslog.tbsite.net'
Oct 11 19:19:35 kuyun.a01.host syslog-ng[1144]: [2021-10-11T19:19:35.010373] Initiating connection failed, reconnecting; time_reopen='10'
Oct 11 19:19:39 kuyun.a01.host systemd[1562]: Failed at step EXEC spawning /home/staragent/bin/agent.sh: No such file or directory
Oct 11 19:19:39 kuyun.a01.host systemd[1]: Failed to start StarAgent2.0.
Oct 11 19:19:39 kuyun.a01.host systemd[1]: Unit staragentctl.service entered failed state.
Oct 11 19:19:39 kuyun.a01.host systemd[1]: staragentctl.service failed.
Oct 11 19:21:22 kuyun.a01.host useradd[9397]: failed adding user 'terminal', exit code: 9
Oct 12 02:11:08 kuyun.a01.host kernel: tsar[94262]: segfault at fffffffffffffff0 ip 00007fd5cddf5dd6 sp 00007fff9aa2c608 error 5 in libc-2.17.so[7fd5cdca0000+1c3000]
Oct 12 02:11:19 kuyun.a01.host kernel: tsar[95267]: segfault at fffffffffffffff0 ip 00007f9205c50dd6 sp 00007ffe55047368 error 5 in libc-2.17.so[7f9205afb000+1c3000]
Oct 12 02:11:22 kuyun.a01.host kernel: tsar[95426]: segfault at fffffffffffffff0 ip 00007f99f20efdd6 sp 00007ffe70032fa8 error 5 in libc-2.17.so[7f99f1f9a000+1c3000]
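The 'error|Fail' pattern is broad and mostly returns boot-time noise. A narrower filter for hardware-related kernel messages might look like the sketch below. The function name `hw_error_lines` is hypothetical, and the keyword list (mce, machine check, [Hardware Error], EDAC) is an assumption about which strings are worth searching for, not an exhaustive list. A fatal hardware fault may also reset the machine before anything reaches the log, so an empty result does not rule hardware out.

```shell
# Hypothetical helper: keep only log lines on stdin that look like
# hardware error reports from the kernel (case-insensitive match).
hw_error_lines() {
  grep -Ei 'mce:|machine check|\[Hardware Error\]|EDAC'
}
```

Typical use would be `hw_error_lines < /var/log/messages` or `dmesg | hw_error_lines`.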

kdump: crash dump log

The kdump service writes its logs under /var/crash/, but no log had been generated there at the time.

$ grep -Ei 'fail|error' /var/crash//vmcore-dmesg.txt
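An empty /var/crash/ can mean either that the kernel never panicked or that kdump was not armed in the first place, so it is worth checking the latter. The following is a minimal sketch, assuming a kernel that exposes /sys/kernel/kexec_crash_loaded and that memory is reserved through a crashkernel= boot parameter. The function name `kdump_ready` and its optional path arguments are hypothetical; the arguments exist only so the check can be exercised against sample files.

```shell
# Hypothetical helper: report whether a crash kernel is reserved and loaded.
# Arguments default to the real kernel interfaces and can be overridden.
kdump_ready() {
  kd_loaded_file=${1:-/sys/kernel/kexec_crash_loaded}
  kd_cmdline_file=${2:-/proc/cmdline}
  # Memory for the crash kernel must be reserved at boot time.
  if ! grep -q 'crashkernel=' "$kd_cmdline_file" 2>/dev/null; then
    echo "no crashkernel= on the kernel command line"
    return 1
  fi
  # The file contains 1 once the crash kernel has been loaded via kexec.
  if [ "$(cat "$kd_loaded_file" 2>/dev/null)" != "1" ]; then
    echo "crash kernel not loaded"
    return 1
  fi
  echo "kdump is armed"
}
```

Even with kdump armed, a reset triggered by hardware or firmware may not leave a vmcore, because the kernel may never get the chance to panic.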

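The system log contains one kernel message that stood out: ERST: Error Record Serialization Table (ERST) support is initialized. Note that this line only reports that ERST support was initialized; it matches the grep because the table's name contains the word "Error".

For background on the ERST message, see: https://access.redhat.com/solutions/527433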

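3. Log in to the server's out-of-band management console and check its logs

This HP server has an out-of-band management page, so I logged in there directly. The console shows specific hardware error records, which is very convenient.

On the console's Integrated Management Log page there was indeed a CPU-type hardware error entry: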

Uncorrectable Machine Check Exception (Processor 2, APIC ID 0x00000038, Bank 0x00000003, Status 0xBE000000'00800400, Address 0xFFFFFFFF'81637323, Misc 0xFFFFFFFF'81637323).

The recommended action is:

Update the system firmware. If the issue persists, contact support.

Learn more: https://techlibrary.hpe.com/docs/enterprise/servers/gen10/ilo5/en/class0x0005code0x0003-gen10.html
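The Status value in the Integrated Management Log entry above can be unpacked a little further. The sketch below assumes the value is an x86 IA32_MCi_STATUS register in the architectural layout (bit 63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC, bits 31:16 model-specific error code, bits 15:0 MCA error code) and that the apostrophe separates the upper and lower 32 bits. The function name `decode_mci_status` is hypothetical, and the meaning of the error codes themselves is processor-specific and is not decoded here.

```shell
# Hypothetical helper: decode the flag bits of an MCi_STATUS value written
# in the notation HHHHHHHH'LLLLLLLL, with an optional 0x prefix.
decode_mci_status() {
  mc_s=${1#0x}
  mc_hi=$((0x${mc_s%%"'"*}))   # bits 63..32
  mc_lo=$((0x${mc_s##*"'"}))   # bits 31..0
  mc_bit() { echo $(( ($1 >> $2) & 1 )); }
  # Bit n of the full register is bit n-32 of the upper half.
  echo "VAL=$(mc_bit "$mc_hi" 31) OVER=$(mc_bit "$mc_hi" 30) UC=$(mc_bit "$mc_hi" 29) EN=$(mc_bit "$mc_hi" 28) MISCV=$(mc_bit "$mc_hi" 27) ADDRV=$(mc_bit "$mc_hi" 26) PCC=$(mc_bit "$mc_hi" 25)"
  printf 'MCA error code=0x%04X model-specific code=0x%04X\n' \
    $(( mc_lo & 0xFFFF )) $(( (mc_lo >> 16) & 0xFFFF ))
}
```

For the logged value, `decode_mci_status "0xBE000000'00800400"` prints VAL=1 OVER=0 UC=1 EN=1 MISCV=1 ADDRV=1 PCC=1 followed by MCA error code=0x0400 model-specific code=0x0080. UC=1 (uncorrected error) and PCC=1 (processor context corrupt) are consistent with the "Uncorrectable" wording in the log entry and with the machine resetting rather than recovering.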

Conclusion: this has to go to the server vendor's after-sales engineers, who can help diagnose and repair the fault.


