Diagnostics

Use the available diagnostic tools to help solve any problems that might occur in the compute node.

The first and most crucial component of a solid serviceability strategy is the ability to accurately and effectively detect errors when they occur. While not all errors are a threat to system availability, those that go undetected are dangerous because the system does not have the opportunity to evaluate and act if necessary. POWER7 processor-based systems are specifically designed with error-detection mechanisms that extend from processor cores and memory to power supplies and hard drives.

POWER7 processor-based systems contain specialized hardware detection circuitry for detecting erroneous hardware operations. Error checking hardware ranges from parity error detection coupled with processor instruction retry and bus retry, to ECC correction on caches and system buses.

IBM hardware error checkers have these distinct attributes:
- Continuous monitoring of system operations to detect potential calculation errors
- Attempted isolation of physical faults based on runtime detection of each unique failure
- Initiation of a wide variety of recovery mechanisms designed to correct a problem

POWER7 processor-based systems include extensive hardware and firmware recovery logic.

Machine check handling

Machine checks are handled by firmware. When a machine check occurs, the firmware analyzes the error to identify the failing device and creates an error log entry.

If the system degrades to the point that the service processor cannot reach standby state, the ability to analyze the error does not exist.
If the error occurs during hypervisor activities, the hypervisor initiates a system reboot.

In partitioned mode, an error that occurs during partition activity is reported to the operating system in the partition.

Diagnostic tools

Tools are available to help you diagnose and solve hardware-related problems.

- Power-on self-test (POST) progress codes (checkpoints), error codes, and isolation procedures

  The POST checks out the hardware at system initialization. IPL diagnostic functions test some system components and interconnections. The POST generates eight-digit checkpoints to mark the progress of powering up the compute node.

  Use the management module to view progress codes.

  The documentation of a progress code includes recovery actions for system hangs. See “POST progress codes (checkpoints)” on page 233 for more information.

  If the service processor detects a problem during POST, an error code is logged in the management module event log. Error codes are also logged in the Linux syslog or AIX diagnostic log, if possible. See “System reference codes (SRCs)” on page 116.

  The service processor can generate codes that point to specific isolation procedures. See “Service processor problems” on page 467.

- Light path diagnostics

  Use the light path diagnostic LEDs to identify failing hardware. If the enclosure fault LED on the front or rear of the IBM Flex System Enterprise Chassis is lit, one or more fault LEDs on the compute node will also be lit. Use the light path diagnostic LEDs on the compute node to help identify the failing item.

Chapter 8. Troubleshooting 107