SR870BH2 Machine Check Error Handling
Intel® Server Platform SR870BH2
Revision 1.18

5.3 Error Signaling

There are two classes of error events:

Machine Check Error Events: A processor machine check occurs when the processor detects a fatal or recoverable error during execution of instructions, or when the processor is signaled by the platform to enter machine check.

Machine Check Architecture (MCA): An MCA event can be either local or global. In the event of an MCA, the processor takes the exception at an instruction boundary with the highest priority. In the event of a local abort, only the affected processor enters MCA handling mode. If the event is global, all processors enter MCA handling mode.

Uncorrectable Error Events:
• Local MCA: A local MCA occurs when an individual processor enters machine check. The processor takes a local MCA when it reads data with uncorrectable errors or receives a hard fail response to a transaction. Examples of local machine checks include a Data Translation Lookaside Buffer (DTLB) data parity error, or the processor consuming data with an uncorrectable error.
• Global MCA: A machine check is global when all processors enter machine check. On the SR870BH2 platform, the BINIT# and BERR# signals are used to bring all processors into machine check: either the processor asserts BINIT#, or the processor or the platform asserts BERR#. The processor can assert BINIT# on a transaction time-out event. The platform asserts BERR# on platform-fatal errors, and can be programmed to assert BERR# when an uncorrectable error is detected on I/O read data.

Correctable Error Events:
• Corrected Machine Check (CMC): Corrected processor errors are signaled to system software as a Corrected Machine Check Interrupt (CMCI).
For example, L1 tag parity errors on shared lines, or thermal events, are corrected by the processor (logic or the PAL). System software must ensure that the interrupt handler for CMCI executes on the same processor that signaled the corrected error event.
• Corrected Platform Errors (CPE): These interrupts are signaled by the platform or the SAL. They cover errors that are corrected by the platform (such as a single-bit ECC error in memory) and errors that are not correctable by the platform. In either case, the error is contained (i.e., data poisoning), and the platform can still function reliably. One example of an uncorrected error is a 2XECC error detected on a write to memory.
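
The signaling rules in this section can be summarized as a small decision model. The Python sketch below is illustrative only and is not platform, PAL, or SAL code: the names `ErrorEvent`, `Signal`, `classify`, and `target_cpus`, and the field labels, are hypothetical. It also assumes that a platform error that is neither corrected nor contained is platform-fatal and therefore reaches all processors through BERR#.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Signal(Enum):
    LOCAL_MCA = "local MCA"
    GLOBAL_MCA = "global MCA"
    CMCI = "CMCI"
    CPE = "CPE"


@dataclass(frozen=True)
class ErrorEvent:
    source: str                       # "processor" or "platform" (hypothetical labels)
    corrected: bool                   # already corrected by hardware or firmware
    contained: bool = False           # uncorrected but contained, e.g. poisoned data
    bus_signal: Optional[str] = None  # "BINIT#", "BERR#", or None
    cpu: Optional[int] = None         # processor that detected or signaled the error


def classify(event: ErrorEvent) -> Signal:
    """Map an error event to the signaling path described in this section."""
    # BINIT# or BERR# brings every processor into machine check.
    if event.bus_signal in ("BINIT#", "BERR#"):
        return Signal.GLOBAL_MCA
    if event.source == "processor":
        # Corrected processor errors raise a CMCI; uncorrected ones a local MCA.
        return Signal.CMCI if event.corrected else Signal.LOCAL_MCA
    # Platform errors that are corrected, or uncorrected but contained, raise a CPE.
    if event.corrected or event.contained:
        return Signal.CPE
    # Assumption: anything else is platform-fatal, so the platform asserts BERR#.
    return Signal.GLOBAL_MCA


def target_cpus(signal: Signal, event: ErrorEvent, num_cpus: int) -> Optional[List[int]]:
    """Return the processors that enter handling, or None where the text is silent."""
    if signal is Signal.GLOBAL_MCA:
        return list(range(num_cpus))
    if signal in (Signal.LOCAL_MCA, Signal.CMCI):
        # A CMCI handler must run on the processor that signaled the event.
        return [event.cpu]
    return None  # this section does not say which processor services a CPE


if __name__ == "__main__":
    ev = ErrorEvent(source="processor", corrected=True, cpu=3)
    sig = classify(ev)
    print(sig.value, target_cpus(sig, ev, num_cpus=4))
```

Checking the bus signal first reflects the statement that BINIT# and BERR# are the means by which all processors are brought into machine check, regardless of which agent detected the error.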