All,
Congratulations to Werner Rams for the first solution to problem #17. His solution
compresses and decompresses 3 records, using CLCL to detect the end of duplicate
character strings. This solution executed 30,629 instructions. To view the source
code and the generated log file from running it on z390, visit:
http://z390.sourceforge.net/z390_Mainframe_Assemble_Coding_Contest.htm
I've added a new problem #18 to measure the performance gain from using the new
IBM z10 opcode CIJNE:
Write a benchmark program to calculate the percent performance improvement, to
2 decimal places, when replacing the following loop code:
LOOP     DS    0H
         BCTR  R1,0                Decrement R1 (R2=0, so no branch)
*** APPLICATION CODE COMMENTED OUT FOR TEST ***
         LTR   R1,R1               Set condition code from R1
         JNE   LOOP                Loop while R1 is not zero
with the following optimized loop code using the new z10 compare immediate and
branch relative opcode:
LOOP     DS    0H
         BCTR  R1,0                Decrement R1 (R2=0, so no branch)
*** APPLICATION CODE COMMENTED OUT FOR TEST ***
         CIJNE R1,0,LOOP           Loop while R1 is not zero
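The problem statement does not spell out how the percent improvement is defined. One reasonable reading is the reduction in elapsed time relative to the original LTR/JNE loop, with both loops started from the same value in R1. The symbols T_LTR/JNE and T_CIJNE are hypothetical names for the two measured intervals:

```latex
P = 100 \times \frac{T_{\mathrm{LTR/JNE}} - T_{\mathrm{CIJNE}}}{T_{\mathrm{LTR/JNE}}}
\qquad \text{(rounded to 2 decimal places)}
```

Dividing by T_CIJNE instead would report the gain relative to the new loop and give a larger figure, so a solution should state which definition it uses.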
The performance improvement in this case comes from replacing 2 instruction
cycles fetching a total of 6 bytes (the 2-byte RR instruction LTR plus the 4-byte
RI instruction JNE) with a single instruction cycle fetching 6 bytes (the RIE
instruction CIJNE). You can use whatever interval timing method is available on
your system, such as TIME BIN (which requires running standalone). The initial
value in R1 must be large enough to perform enough iterations to reduce the
timing error due to interval timer precision, etc. To code and unit test a
solution on z390, you will need version v1.4.01 with the new z10 opcodes,
scheduled for release by 03/14/08. To run the real test, you will need an IBM
z10 mainframe and an updated HLASM. My own initial test on a pre-release version
of z390 v1.4.01 indicates about a 15% improvement, but this measures the J2SE VM
emulation overhead of each instruction cycle on an Intel Dual-Core chip rather
than the IBM z10 hardware/Millicode. I'll be very interested to hear the real
results.
Don Higgins
mailto:don@...
http://don.higgins.net