dvcug · Delaware Valley Computer Users Group
An interesting article - LINUX on the Power5 chip from IBM
Message #148 of 259

http://www.devx.com/ibm/Article/22211

Linux on Power: Suddenly, a Natural Fit
New features in IBM's Power5 processor support recent advances in the
kernel – providing elegant enterprise-level scalability not
previously seen in Linux systems.

by Andrew Binstock October 20, 2004


In late September of this year, IBM made a shocking announcement:
The company's BlueGene/L supercomputer took over the top spot in
the computer performance ratings.

The vanquished titleholder—a system built by NEC for meteorological
research—had sat atop the worldwide rankings since its roll-out in
2002. Its performance was attributable to the use of custom-built
processors designed for that specific machine. At nearly 36 teraflops
(trillion floating-point operations per second), the NEC system held
a wide lead over all other systems. Meanwhile, the number-two system
in the world could only manage 20 teraflops—barely half the
performance of the NEC supercomputer.

So what lies behind IBM's unexpected surge to the top of the list? In
part, the sudden leap is due to technology found in the company's
line of Power and PowerPC processors.

Power5, the latest generation of IBM's high-performance 64-bit RISC
processor, was released earlier this year and quickly established
itself as a remarkable chip. It sports many architectural features
that deliver superior performance, especially when running Linux.
This article describes these features and how they work with Linux to
provide world-class performance—perhaps not in the form of
supercomputers, but certainly as servers and workstations capable of
handling demanding commercial and scientific applications.

The Power Architecture
Before getting into the details of the new features, let's quickly
recap the basic Power5 architecture. It's a RISC processor design--
meaning that unlike the common x86 architecture, it uses fixed-length
instructions and it favors numerous small, fast instructions in lieu
of fewer complex instructions. UNIX servers and workstations, as well
as recent Macintosh systems from Apple, are the primary users of RISC
processors. In addition, IBM's RISC architecture has incorporated two
distinguishing features: enormous caches and multicore modules. Let's
look at these aspects briefly.

Caches are the areas of the processor die that hold data and
instructions that are about to be used by the processor execution
pipelines. All data touched by the processor must be loaded from main
memory into cache before it can be used. If a processor needs a data
item that is not in cache (an event termed a cache miss), the
processor must pause while the data item is retrieved from main
memory. This pause is costly because RAM chips are very slow by
comparison with data caches—generally an order of magnitude slower.
So for a short while, the processor is doing no productive work.
These cache misses occur less frequently when processors are endowed
with larger caches that can be loaded with more data. The significant
performance benefits of large caches prompted Intel last year to
raise the on-board cache of its Pentium 4 and Xeon processors from half
a megabyte to a full megabyte. By contrast, the Power5 processor
boasts an extraordinary 36 megabytes of cache per processor core.
This is by far the largest cache in commercial production.
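
To make the link between cache size and cache misses concrete, here is a minimal sketch that replays one access pattern through a small and a large cache. It assumes a simple least-recently-used (LRU) replacement policy and a made-up random workload; the function name, cache sizes, and workload are illustrative and do not describe how the Power5 or any real processor manages its cache.

```python
from collections import OrderedDict
import random


def miss_rate(cache_lines, accesses):
    """Return the fraction of accesses that miss in an LRU cache of the given size."""
    cache = OrderedDict()
    misses = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1                    # miss: the processor would stall here
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(accesses)


rng = random.Random(0)
# Hypothetical workload: 20,000 accesses spread over 512 distinct memory lines.
workload = [rng.randrange(512) for _ in range(20_000)]

small = miss_rate(64, workload)
large = miss_rate(512, workload)
print(f"64-line cache:  {small:.1%} misses")
print(f"512-line cache: {large:.1%} misses")
```

Once the larger cache can hold the whole working set, the only misses left are the first touches of each line, which is the effect the big Power5 cache is after.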

These big-cached Power processors come in modules containing four
processors (and their caches) rather than as individual chips. This
multi-processor design foreshadows the expected 2006 release of what
Intel and AMD are calling "multicore" processors in which one module
will have two of the x86 processor cores. IBM's Power5 modules have a
built-in memory manager. Between the multiple cores, the caches, and
the memory manager, the Power5 design gives you a multiprocessing
server in a package roughly the size of a man's palm.

Power processors and PowerPC chips (the latter derive from the Power
architecture) are used in IBM's mainframes, servers, and
workstations. A chip based on the architecture runs Nintendo's game
systems. They have also been in the news periodically. Deep Blue,
IBM's computer that beat chess world champion Garry Kasparov in 1997,
ran Power processors. And now the world's most powerful computer runs
on PowerPC chips.

Power5
The new generation of Power processors adds several key features to
the storied architecture. The most important of these is simultaneous
multithreading (SMT). This feature will be familiar to users of Xeon
and Pentium 4 chips, on which it is known by Intel's trademarked
moniker, Hyper-Threading technology. SMT enables two threads to
execute simultaneously on the same processor. These threads can be
from different programs. SMT is possible because the processor's out-
of-order execution requires there to be multiple units for handling
floating-point calculations, integer math, load/store instructions,
and so on. These extra units are pressed into service by SMT.
However, IBM's implementation of SMT has a unique feature: When one
thread is waiting, its priority is dynamically lowered. This change
has special implications in the case of the Power5 processor: Fewer
resources are allocated to low-priority threads, and the resources
they yield up are given to the other thread running on the chip. This
arrangement enables dynamic distribution of resources and a near-
optimal use of the processor execution pipeline.

SMT is a particularly good fit with the new Linux kernel. Prior to
version 2.6 of the kernel, the Linux thread scheduler left a lot to be
desired. The process of determining which thread should be assigned
to an available execution resource could in some cases take a very
long time. The 2.6 kernel fixed this problem and greatly expanded the
number of processors on which the kernel could run. Since a Power5
sports four processors with SMT, it will appear as an 8-way system to
Linux. Hence, the operating system's new, robust threading mechanism
arrives in time to really exploit this underlying hardware.
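
The 8-way figure is just the arithmetic described above, and it can be checked against what an operating system reports. The sketch below assumes the article's numbers (four processors per module, two SMT threads each); the variable names are illustrative, and os.cpu_count() simply returns however many logical processors the machine running the script exposes.

```python
import os

processors_per_module = 4   # from the article: four processors per Power5 module
threads_per_processor = 2   # SMT lets two threads execute on one processor

logical_cpus = processors_per_module * threads_per_processor
print("Logical CPUs the scheduler would see:", logical_cpus)

# On any machine, this is the logical CPU count the operating system reports
# (it may be None if the count cannot be determined).
print("Logical CPUs on this machine:", os.cpu_count())
```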

Management of the large caches was another area of focus for IBM in
Power5, since large caches do have their tradeoffs. The processor has
new circuitry to reduce the time spent maintaining cache
coherency. This term refers to the need for copies of a data
item in cache to be identical. For example, suppose one processor
updates an integer. A copy of the updated integer will be placed in
that processor's cache prior to being written out to main memory. If
another processor has a copy of the same integer in its cache, the
two processors need to coordinate so that the copies always show the
latest value of the integer. This activity is known as cache
coherency and it results in substantial chatter between processors.
Typically, this chatter is handled by dedicated circuitry. On the
Power5 chips, this circuitry has been substantially improved so that
accessing in-cache variables can be done more quickly and
efficiently. This improvement helps Linux make the best use of multiple
processor execution pipelines.
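
The integer example above can be sketched as a toy simulation. It assumes a simple write-invalidate scheme in which a write tells every other cache to discard its stale copy, and it writes straight through to main memory to keep the code short. The class names are hypothetical, and this is not a description of the actual Power5 coherency circuitry.

```python
class Bus:
    """Toy stand-in for the coherency circuitry linking the caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []
        self.messages = 0   # how much coherency "chatter" has occurred

    def invalidate_others(self, writer, addr):
        for cache in self.caches:
            if cache is not writer and addr in cache.lines:
                del cache.lines[addr]   # discard the stale copy
                self.messages += 1


class Cache:
    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:      # miss: fetch from main memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.invalidate_others(self, addr)
        self.lines[addr] = value
        self.bus.memory[addr] = value   # write-through, for simplicity


bus = Bus()
cpu0, cpu1 = Cache(bus), Cache(bus)
bus.memory["counter"] = 1

cpu0.read("counter")
cpu1.read("counter")                    # both caches now hold a copy
cpu0.write("counter", 2)                # cpu1's copy is invalidated

print(cpu1.read("counter"))             # 2
print(bus.messages)                     # 1
```

Every shared write costs at least one message per other cache holding the line, which is why faster coherency hardware pays off as the processor count grows.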

Linux systems have been made more robust as well due to improved
memory monitoring with Chipkill technology. Memory errors are now
more often correctable and result in few, if any, system shutdowns or
kernel panics. Let me explain. In the old days, parity was used to
detect, but not repair, bit errors. Parity counted the 1s in a byte and
added a check bit so that the total number of 1s was always even (or,
in other schemes, always odd). If one bit was accidentally reversed,
the count would be, let us say, odd when it should be even. This event
resulted in a parity-check error that, once detected, generally froze
the system. ECC came along later
on and figured out how to correct 1-bit errors and detect 2-bit
errors. Today, if ECC can correct the error, it often does so
silently and system processing continues without interruption. What
happens though when multibit errors occur? Such errors are generally
the result of defective RAM chips. In the old days, if a memory SIMM
experienced a chip problem, only one bit in a byte would need to be
fixed and ECC handled this efficiently. Today, however, as memory
SIMMs become much denser, a defective SIMM results in corruption of 8
or 16 bits, which causes an immediate machine halt. Chipkill is an
IBM memory design that spreads the bits of any given byte across
multiple SIMMs, such that if any given RAM chip goes bad, it can only
corrupt a single bit of the byte. In this way, the bit (and hence the
byte) can be corrected via ECC. It's a cute and effective remedy.
Only high-end designs offer Chipkill technology, and the Power5
architecture is one of them. It makes servers nearly impervious to
defective RAM chips, and when defects occur, Chipkill enables the
system to keep running until the problem is corrected.
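
Two of the ideas above lend themselves to a short sketch: how a parity bit detects a single flipped bit, and why spreading the bits of a byte across chips limits the damage from one bad chip. The layouts below are simplified assumptions chosen for illustration (eight chips, one byte or one bit position per chip); they are not IBM's actual Chipkill design, and the function names are hypothetical.

```python
def even_parity_bit(byte):
    """Check bit that makes the total number of 1s (data plus check) even."""
    return bin(byte).count("1") % 2


def parity_ok(byte, check_bit):
    return (bin(byte).count("1") + check_bit) % 2 == 0


data = 0b10110010
check = even_parity_bit(data)
flipped = data ^ 0b00001000            # one bit reversed by a memory fault

print(parity_ok(data, check))          # True
print(parity_ok(flipped, check))       # False: detected, but not repairable


def whole_byte_layout(byte_index, bit):
    """Assumed conventional layout: each byte lives entirely on one chip."""
    return byte_index % 8


def spread_layout(byte_index, bit):
    """Assumed Chipkill-style layout: bit i of every byte lives on chip i."""
    return bit


def bits_lost_per_byte(num_bytes, chip_for_bit, failed_chip):
    """For each byte, count how many of its 8 bits sit on the failed chip."""
    return [
        sum(1 for bit in range(8) if chip_for_bit(b, bit) == failed_chip)
        for b in range(num_bytes)
    ]


print(bits_lost_per_byte(8, whole_byte_layout, failed_chip=3))
# [0, 0, 0, 8, 0, 0, 0, 0]  one byte is wiped out, beyond what ECC can fix
print(bits_lost_per_byte(8, spread_layout, failed_chip=3))
# [1, 1, 1, 1, 1, 1, 1, 1]  every byte loses one bit, which ECC can correct
```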

Mainframe Features
While Linux has run well on servers, until now it has not enjoyed
advanced mainframe capabilities. With IBM's OpenPower initiative,
features taken from mainframes are now available on Linux systems.
Primary among these is what IBM calls its Virtualization Engine—a
misleading term, because it comprises several technologies. The
engine enables systems to create dynamic execution partitions and
dynamically allocate I/O resources to them.

As old-timers will recall, mainframes offered partitions: you could
carve out a chunk of system RAM and a few processors and from them
create a single execution partition. This partition contained its own
instance of the operating system and it would be, for all intents, a
stand-alone—dare I say virtual?—machine. Mainframes could sport many
of these partitions; and the partitions could be resized dynamically.
This capability meant that if a partition suddenly became very busy
(let's say month-end processing starts up) while adjacent partitions
were humming along using only a fraction of their resources,
the busy partition could pick up some of the less-used resources, be
they processors or memory. In this way, the mainframe delivered
maximum performance to all the applications it ran. OpenPower brings
this dynamic partitioning to Linux. So think of the prospect of
adding processors on-the-fly to your Linux apps as they require them.
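
The month-end scenario can be sketched in a few lines. This is only a toy model of the idea: the Partition class, the load thresholds, and the one-processor-at-a-time policy are all assumptions made for illustration, and none of it corresponds to an actual IBM or Linux interface.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    cpus: int
    load: float   # fraction of the partition's processors that are busy


def rebalance(partitions, busy=0.9, idle=0.3, min_cpus=1):
    """Move one processor from each idle partition to the busiest one,
    but only if the busiest partition is actually overloaded."""
    moved = []
    hot = max(partitions, key=lambda p: p.load)
    if hot.load < busy:
        return moved
    for donor in partitions:
        if donor is not hot and donor.load <= idle and donor.cpus > min_cpus:
            donor.cpus -= 1
            hot.cpus += 1
            moved.append((donor.name, hot.name))
    return moved


parts = [
    Partition("month-end", cpus=2, load=0.98),
    Partition("web", cpus=4, load=0.20),
    Partition("mail", cpus=2, load=0.25),
]

moves = rebalance(parts)
print(moves)
print([(p.name, p.cpus) for p in parts])
```

Here the busy month-end partition grows from two processors to four while the two lightly loaded partitions each give one up, and no partition drops below its minimum.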

Virtual I/O—another component of the Virtualization Engine—adds I/O
channels dynamically to a Linux application in response to heavy data
loads. In this way, the maximum I/O resources are directed where
they're needed and the best sustainable system throughput is
achieved. It is tempting to view this Virtualization as functionality
of the operating system. However, this is not the case; hardware
support for the dynamic addition (and reduction) of resources is
built into the Power5 architecture, and Linux, as it runs on
OpenPower, capitalizes on this circuitry to attain top performance.

Conclusion
There are several other features in the Power5 processor that help
Linux run better. These include faster lock acquisition for threads,
and improved robustness features, but the bottom line is that this
generation of Power was custom designed to deliver features important
to enterprise Linux. Likewise, Linux itself advanced via enhancements
in the 2.6 kernel and in the two reference IBM implementations
(Red Hat Enterprise Linux and SuSE Linux). As a result, Linux can
exploit the processor hardware in ways it could not do before.

To be fair, I should point out that everything that I have said Linux
can do on the Power5 platform, IBM's own version of UNIX, called
AIX, can do as well. The difference is that this is the first time in
history that Linux can do them on large multiway systems. And this is
due to Linux's growth and the features found in the new Power5.

Resources
Developers who are interested in making use of these features should
explore the following pages.

The Linux on Power home page is
http://www-106.ibm.com/developerworks/linux/power/

The definitive developer's page, complete with docs, tools,
tutorials, and trial software, is
http://www-106.ibm.com/developerworks/linux/power/downloads.html

Andrew Binstock is the principal analyst at Pacific Data Works LLC.
Previously he was the director of PricewaterhouseCoopers's Global
Technology Forecasts. He writes the business integration column for
SD Times. His latest book, 'Programming with Hyper-Threading
Technology: How to Write Multithreaded Software for Intel IA-32
Processors,' is now available from Intel Press.







Thu Nov 4, 2004 12:57 am

johnvoris