Run-Time CPU Deconfiguration (CPU Gard)

L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2 cache correctable errors are monitored by the processor runtime diagnostics (PRD) firmware running on the service processor. When a predefined error threshold is met, an error log with warning severity and threshold exceeded status is returned to AIX. At the same time, PRD marks the CPU for deconfiguration at the next boot. AIX will attempt to migrate all resources associated with that processor to another processor and then stop the defective processor.

Service Processor System Monitoring - Surveillance

Surveillance is a function in which the service processor monitors the system, and the system monitors the service processor. This monitoring is accomplished by periodic samplings called heartbeats.

Surveillance is available during the following phases:
• System firmware bringup (automatic)
• Operating system runtime (optional)

System Firmware Surveillance

System firmware surveillance is automatically enabled during system power-on. It cannot be disabled by the user, and the surveillance interval and surveillance delay cannot be changed by the user.

If the service processor detects no heartbeats during system IPL (for a set period of time), it cycles the system power to attempt a reboot. The maximum number of retries is set from the service processor menus. If the fail condition persists, the service processor leaves the machine powered on, logs an error, and displays menus to the user. If Call-out is enabled, the service processor calls to report the failure and displays the operating-system surveillance failure code on the operator panel on the HMC.

Operating System Surveillance

Note: This function is not available on a partitioned system.

Operating system surveillance provides the service processor with a means to detect hang conditions, as well as hardware or software failures, while the operating system is running.
It also provides the operating system with a means to detect a service processor failure, as indicated by the lack of a return heartbeat.

Operating system surveillance is not enabled by default, allowing you to run operating systems that do not support this service processor option.

You can also use service processor menus and AIX service aids to enable or disable operating system surveillance.

For operating system surveillance to work correctly, you must set these parameters:
• Surveillance enable/disable
• Surveillance interval
  The maximum time the service processor should wait for a heartbeat from the operating system before timeout.
• Surveillance delay
  The length of time to wait from the time the operating system is started to when the first heartbeat is expected.

Surveillance does not take effect until the next time the operating system is started after the parameters have been set.
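The way the three parameters interact can be sketched as a small timing model. This is an illustration only, not the service processor's actual logic: the class names, field names, and sample values (a 120-second delay and a 60-second interval) are hypothetical, and the model assumes that the first heartbeat is due within the surveillance delay after operating system start and that each later heartbeat is due within the surveillance interval of the previous one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurveillanceConfig:
    """Hypothetical container for the three parameters described above."""
    enabled: bool       # surveillance enable/disable
    interval_s: float   # maximum wait for a heartbeat before timeout
    delay_s: float      # wait from OS start until the first heartbeat is expected


class SurveillanceMonitor:
    """Toy model of the monitoring side; times are plain numbers in seconds."""

    def __init__(self, config: SurveillanceConfig, os_start_time: float) -> None:
        # The configuration is fixed at OS start, mirroring the rule that
        # changed parameters apply only from the next operating system start.
        self.config = config
        self.os_start_time = os_start_time
        self.last_heartbeat: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received at time `now`."""
        self.last_heartbeat = now

    def timed_out(self, now: float) -> bool:
        """Return True if a heartbeat is overdue at time `now`."""
        if not self.config.enabled:
            return False
        if self.last_heartbeat is None:
            # Assumption: before the first heartbeat, only the delay applies.
            return now - self.os_start_time > self.config.delay_s
        return now - self.last_heartbeat > self.config.interval_s


# Sample values are illustrative, not defaults taken from the manual.
config = SurveillanceConfig(enabled=True, interval_s=60.0, delay_s=120.0)
monitor = SurveillanceMonitor(config, os_start_time=0.0)

print(monitor.timed_out(100.0))   # still inside the startup delay
monitor.heartbeat(110.0)
print(monitor.timed_out(160.0))   # 50 s since the last heartbeat
print(monitor.timed_out(171.0))   # 61 s since the last heartbeat
```

In this model, a monitor built with `enabled=False` never reports a timeout, which corresponds to running an operating system that does not support surveillance.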