IBM BladeCenter PS703 and PS704 Technical Overview and Introduction

4.1 Introduction

Each successive generation of IBM servers is designed to be more reliable than the previous server family. POWER7 processor-based servers have new features to support new levels of virtualization, ease administrative burden, and increase system use.

Reliability starts with components, devices, and subsystems designed to be fault-tolerant. POWER7 uses lower voltage technology, improving reliability with stacked latches to reduce soft error (SER) susceptibility. During the design and development process, subsystems go through rigorous verification and integration testing processes. During system manufacturing, systems go through a thorough testing process to ensure high product quality levels.

The processor and memory subsystem contain a number of features designed to avoid or correct environmentally induced, single-bit, intermittent failures as well as handle solid faults in components. This includes selective redundancy to tolerate certain faults without requiring an outage or parts replacement.

The PS703 and PS704 blades are used with a BladeCenter chassis and the various components that make up the BladeCenter infrastructure. In general, the BladeCenter infrastructure RAS is outside the scope of this chapter.
However, when appropriate, the BladeCenter features that enable, complement, or enhance RAS functionality on the PS703 and PS704 blades are discussed.

IBM is the only vendor that designs, manufactures, and integrates its most critical server components:

- POWER processors
- Caches
- Memory buffers
- Hub-controllers
- Clock cards
- Service processors

Design and manufacturing verification and integration, along with field support feedback, inform and motivate continued improvement on the final products.

This chapter includes a manageability section describing the means to successfully manage your systems.

Several software-based availability features exist that are based on the benefits available when using AIX and IBM i as the operating system. Support of these features when using Linux varies.

4.2 Reliability

Highly reliable systems are built with highly reliable components. On IBM POWER processor-based systems, this basic principle is expanded upon with a clear design for reliability architecture and methodology. A concentrated, systematic, architecture-based approach is designed to improve overall system reliability with each successive generation of system offerings.