I am pleased to announce new #1
solution for problem #5 submitted by David Wilkinson. Here is a way to translate
any number of binary bytes to EBCDIC hex using truncated 2 byte table for TROT.
P7EH1.MLC/LOG by John Erhman -
has been updated to remove work-around for AW since the latest z390 PTF
v1.4.01f now has support for AW and all the HFP unnormalized floating
point instructions.
An email quiz question was
posted about the most efficient way to test if the left most bit in any
mask is on. The solution is to shift the mask 1 bit right and
compare it to the selected bits AND'd with mask. If the selected
bits are high then the high bit must be on.
It’s been a while, but the
contest will resume once z390 priority 1 compatibility RPI’s are fixed
hopefully in the next day or two with RPI v1.4.01f.
I’m working on z390 RPI 844
to fix a problem with TMHH, TMHL, TMLH, and TMLL where z390 was not compatible
with z9 or z10 running the z390 regression test TESTINS2.MLC. The problem
was with setting CC1 or CC2 when the selected bits were mixed. The POP
says set CC1 if the left most selected bit is 1 else set CC2 when selected bits
are mixed. The quiz question is simply given register bits in half
word REG and mask bits in half work MASK, what is the most efficient way to
determine if high selected bit is on or not.
Congratulations to Werner Rams for coding and testing
published random number generator which ran for 5 hours on his PC without any
duplications. To see the source visit the site here:
As Martin Ward pointed out, the USA National Institute
of Standards and Technology has various statistical tests for random number
generators for those interested:
I’m too busy with development of a new z390 macro
assembler based tool at the moment to think of a good problem #21. I hope
to catch up in a few weeks. Suggestions welcome.
As with all the problems, it has helped me to find and
fix another bug in the z390 emulator. The time limit default of 15
seconds and the override limit with TIME(sec) was not working for the emulator.
It will be fixed in the next release. The project I’m working on
with lots of macros has also uncovered an obscure bug or too. Since HLASM
compatibility and bug fixing take first priority, some other RPI’s have
been pushed back a bit. See updated list of pending RPI’s here:
On Friday 28 Mar 2008 18:10, Don Higgins wrote:
> A new problem #20 has been posted to write integer random number generator
> and test program to determine the longest sequence without duplication that
> it produces for a given seed number. The longest sequence of non-repeating
> pseudo-random numbers wins.
Non-repetition is not a very good test for a random number generator.
Eg start with a 64 bit seed and increment by one on each call.
Your test program will be running for a long time waiting for a repetition.
(To make the output look a bit more more "random": increment the seed
by a large number which is relatively prime to 2^64. It will still take 2^64
iterations to get a repetition.)
The USA National Institute of Standards and Technology has various
statistical tests for random number generators:
http://csrc.nist.gov/groups/ST/toolkit/rng/index.html
--
Martin
martin@...http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
Congratulations to Werner Rams for
solution to #19 using TRTR. Honorable mention also goes to Steve R.K. for
sending email suggesting TRTR earlier the same day as when Werner’s complete
solution arrived. Visit the contest website to view code and output:
A new problem #20 has been posted
to write integer random
number generator and test program to determine the longest sequence without
duplication that it produces for a given seed number. The longest
sequence of non-repeating pseudo-random numbers wins.
Following the release of z390
v1.4.01a with new date and time callable routine named DAT that can be used for
benchmark testing, I’ve published a first solution to problem #18.
The solution show that the MIP rate dropped 15% when using the new z10 compare and
branch code, but the elapsed time in nano-seconds also reduced by 8%.
Visit the web site to see source code including copy of the new DAT routine and
the execution log showing results:
I’ve also posted a new
problem #18 to write code to find the last non-blank character in an 80 byte text
line using the least number of instructions. Hint there is a z10 instruction
that might be the fastest, but I could also be wrong.
Correction on the number of
instructions required for solution to problem 17 - it is 827 and not 30,619.
I had mistakenly published the instructions per second which appears at end of
log 672instead of the total which appears on the end of step message displayed
on console log, err, and optional sta files. Note that the instructions
per second number is also inaccurate for programs running less than a second
due to startup overhead.
I’m really glad Werner Rams
corrected my error, as I was discussing problems 17 and 18 with Melvyn Maltz
which he was visiting here on Friday. I was commenting that the initial number
of instructions seemed too high and it was!
Melvyn is taking a tour of the US
following our presentation at SHARE. Checkout new picture here:
Regarding problem 18, I’m
currently working on adding the last 30 new z10 instructions to assembler and
the new rt\test\TESTINS3.MLC regression test of each new instruction and hope
to have z390 v1.4.01 tested and published by Friday 03/14/08. Note I’ve
also installed J2SE 1.6.0_05 update and will be testing it this week.
Congratulations to Werner Rams for
the first solution to problem #17. His solution compresses and
decompresses 3 records using CLCL to detect end of duplicate character
strings. This solution executed 30,629 instructions. To view the
source code and the generated log file running on z390 visit:
I’ve
added a new problem # 18 to
measure the performance gain from using new IBM z10 opcode CIJNE:
Write a benchmark program to
calculate the percent performance improvement to 2 decimal places when
replacing the following loop code:
LOOP DS 0H
BCTR R1,0
*** APPLICATION CODE COMMENTED OUT FOR TEST ***
LTR R1,R1
JNE LOOP
with the following optimized loop code using the new z10 compare immediate and
branch relative opcode code:
LOOP DS 0H
BCTR R1,0
*** APPLICATION CODE COMMENTED OUT FOR TEST ***
CIJNE
R1,0,LOOP
The performance improvement in this case comes from replacing 2 instruction
cycles fetching a total of 6 bytes with a single instruction cycle fetching 6
bytes. You can use whatever interval timing method is available on your
system such as TIME BIN (requires running standalone). The initial values
in R1 must be set to perform enough iterations to reduce the timing error due
to interval timer precision etc. To code and unit test solution on z390
you will need version v1.4.01 with the new z10 opcodes scheduled for release by
03/14/08. To run the real test, you will need an IBM z10 mainframe and
updated HLASM. My own initial test on pre-release version of z390 v1.4.01
indicates about a 15% improvement but this is measuring J2SE VM
emulation overhead of each instruction cycle on Intel Dual-Core chip versus
measuring the IBM z10 hardware/Millicode. I'll be very interested to hear
the real results.
My intention in using
the work “transparent” was to indicate that all 256 EBCDIC
characters are allowed including any kind of operand characters and comments
plus continuation character and sequence characters up to 80 byte record
length. So yes the compression and decompression routines will need to
handle any special byte codes used in compression which can also appear as
valid byte codes in the original source. This is the sort of compression
and decompression routines required to handle any sort of data like ZIP does.
Don Higgins
mailto:don@...
http://don.higgins.net
From:
z390-assembler-contest@yahoogroups.com
[mailto:z390-assembler-contest@yahoogroups.com] On Behalf Of John P. Baker Sent: Saturday, February 23, 2008
12:17 AM To: z390 Assembler Contest Subject: [z390-assembler-contest]
Problem 17
Can we have some
clarification on Problem 17 in as far as any constraints on the operand field.
For example,
compressing:
L1
CLI F1,64
Is much easier
than compressing:
L1
CLI F1,C’ ‘
If quoted operands
are to be allowed, the recognition and compression algorithms become much more
difficult.
Also, can we
assume that neither comments or sequence numbers (73-80) are to be permitted?
Martin
Thanks, after SHARE I'll take a run at optimizing the SQXTR routine. The
emulator pz390.java code for the SQXBR and SQXR routines actually uses Math
library square root function to start with 64 bit square root guess which
then only requires a couple of iterations to get to 128 bit square root
solution. It sounds like I can improve these as well by skipping
unnecessary iteration and unrolling the code.
Don Higgins
mailto:don@...http://don.higgins.net
-----Original Message-----
From: z390-assembler-contest@yahoogroups.com
[mailto:z390-assembler-contest@yahoogroups.com] On Behalf Of Martin Ward
Sent: Saturday, February 23, 2008 4:47 AM
To: z390-assembler-contest@yahoogroups.com
Subject: Re: [z390-assembler-contest] ZMFACC z390 Mainframe Assembler Coding
Contest Update 02/22/08 - 5 new solutions and 1 new problem
On Saturday 23 Feb 2008 02:05, Don Higgins wrote:
> 4. I've posted 2 additional solutions for problem 12 calculation of
> standard deviation. Now there are 3 solutions; one using BFP, one using
> HFP, and one using just DFP instructions. The DFP solution proved to be
> more challenging than expected as the latest POP (at least for today) is
> missing the opcode SQXTR corresponding to the existing SQXBR and SQXR used
> in the other solutions. As a result I have coded a substitute SQXTR macro
> which calls standalone SQXTR CSECT which uses DFP instructions to perform
> the square root function using Newton Raphson method. For the current
> problem it converged to 34 digit solution in 7 iterations. Suggestions
for
> improving the first guess will help. Note all 3 solutions produce the
same
> result to 33+ decimal places.
This web page:
http://www.mactech.com/articles/mactech/Vol.14/14.01/FastSquareRootCalc/
describes how to use a table to obtain a more accurate first approximation.
Given that the first approximation has a known maximum error, it should be
possible to compute the maximum number of iterations required to
guarantee convergance. This saves an iteration, since the final iteration
is there just to prove convergance (the result does not improve on the final
iteration). Given that the number of iterations is fixed (and small!),
you can also fully unroll the loop.
--
Martin
martin@...http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
Yahoo! Groups Links
On Saturday 23 Feb 2008 02:05, Don Higgins wrote:
> 4. I've posted 2 additional solutions for problem 12 calculation of
> standard deviation. Now there are 3 solutions; one using BFP, one using
> HFP, and one using just DFP instructions. The DFP solution proved to be
> more challenging than expected as the latest POP (at least for today) is
> missing the opcode SQXTR corresponding to the existing SQXBR and SQXR used
> in the other solutions. As a result I have coded a substitute SQXTR macro
> which calls standalone SQXTR CSECT which uses DFP instructions to perform
> the square root function using Newton Raphson method. For the current
> problem it converged to 34 digit solution in 7 iterations. Suggestions for
> improving the first guess will help. Note all 3 solutions produce the same
> result to 33+ decimal places.
This web page:
http://www.mactech.com/articles/mactech/Vol.14/14.01/FastSquareRootCalc/
describes how to use a table to obtain a more accurate first approximation.
Given that the first approximation has a known maximum error, it should be
possible to compute the maximum number of iterations required to
guarantee convergance. This saves an iteration, since the final iteration
is there just to prove convergance (the result does not improve on the final
iteration). Given that the number of iterations is fixed (and small!),
you can also fully unroll the loop.
--
Martin
martin@...http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
1. Congratulations to Werner Rams for first
solution to problem 15 requiring rounding of DP result! Werner used SRP
to solve the problem.
2.I have also posted a second solution for problem 15 which
uses AP and CP to accomplish the rounding.
3.A new problem 17 has been posted requiring
transparent compression and decompression of spaces.
4.I’ve posted 2 additional solutions for problem
12 calculation of standard deviation. Now there are 3 solutions; one
using BFP, one using HFP, and one using just DFP instructions. The DFP
solution proved to be more challenging than expected as the latest POP (at
least for today) is missing the opcode SQXTR corresponding to the existing SQXBR
and SQXR used in the other solutions. As a result I have coded a
substitute SQXTR macro which calls standalone SQXTR CSECT which uses DFP
instructions to perform the square root function using Newton Raphson
method. For the current problem it converged to 34 digit solution in 7
iterations. Suggestions for improving the first guess will help.
Note all 3 solutions produce the same result to 33+ decimal places.
5.I posted a first solution to problem 16 using a
SETBIT macro and a TESTBIT macro to access bit table used to record prime numbers
from 3 to 97 and then retrieve and print them.
It demonstrates to me that just
when you think you know something about a subject, you find out you don’t
know much at all. For links to other interesting math subjects also visit
Bob’s website (Thanks Bob):
Now that I’ve gotten past
VSAM insert support using AVL trees, and I don’t want to start anything
too big before next week’s SHARE conference, I plan to submit solutions
to the ZMFACC problems with no solutions yet, and post the next problem on best
transparent compression/decompression routines.
Congratulations to Werner Hams for submitting
new #1 solution to problem #11 hash table lookup using linked list for
duplicates. This solution uses a few more instructions but uses
significantly less memory for tables (approximately 12 * the number entries in
table):
Congratulations to Martin Ward for
a very impressive solution to problem #14 using packed decimal to handle higher
value Ackerman functions with up to 15 digit solutions. I also
posted my recursive macro code only solution which only handles up to 32 bit
integer solutions.
A new problem #16 requiring
routines to store and fetch bits to/from bit array representing numbers
submitted by Jim Connelley has been posted. Thanks Jim. For
the old timers this may remind you of how cool it was when HASP (Alias JES2) came
out to speed up OS/PCP/MFT/MVT spooling using a bit array in memory to handle
allocation of spool tracks as opposed to updating every sysout file in VTOC.
A friend asked me about the
potential application of techniques used in the contest problems. In
response here is an initial list of applications by category with my first
thoughts:
1.Swapping fields (#1,#2) – used in sorting,
file buffering
2.Conversion to display characters (#3,#6,#9,#10) –
used in dump, error display, or report formatting
3.Sorting records (#4) – used in preparing for
sequential processing and reporting
4.Conversion of display characters to binary (#5) –
used in decoding input data for processing
5.Floating point calculations (#7,#8,#12) – used
to calculate statistics with required precision
6.Table lookup (#11) – used to access data
tables required for processes such as validating records
7.Decimal calculations (#13,#15) – used to
calculate currency precisely in base 10
8.Recursive functions (#14) – used in sorting
routines such as Quicksort, compiling languages
Going forward, I’d like to
build a list of categories for future problems. My initial list of
possible additions include:
1.Boolean logic
2.Branch logic
3.Comparisons
4.Compression and de-compression (someone just asked
me if z390 supports CMPSC yet. The answer was not yet but I did go read
the 9 page description in the latest POP which includes 3 page diagram)
5.Encryption and decryption
6.File access methods
7.Heuristics
8.Totally useless just for fun
Do you have some categories or
specific questions you would like to see added. All feedback welcome.
ZMFACC Assembler Coding Contest - A
new problem #14 - code the Ackerman function to calc a(4,1)=65533.
Code a macro assembler program to calculate
the value of the Ackerman function a(4,1) = 65533. The Ackerman function
a(m,n) is a recursively defined function as follows:
1.If m = 0, then a(m, n) = n+1
2.if m > 0 and n = 0, then a(m, n) =
a(m-1,1)
3.if m> 0 and n > 0, then a(m,n) =
a(m-1,a(m,n-1))
For
more information on the Ackerman function visit:
I’ve updated the problem
statement for #13 per posting on Mainframe Assembler List indicating it was
incorrect. I’ve also posted a new solution to #13 using 7 DFP
instructions to perform the required division and rounding to 2 decimal
places. Calculating the number of significant digits to request for
rounding to 2 decimal places was not exactly trivial since the division
produced an irrational fraction with repeating sequence of 270270… which
resulted in 16 significant digit result with corresponding exponent set to
required position versus preferred position. Visit the web site to see
source and log listings:
Note running this solution on z390
requires update to PTF level v1400a which can now be downloaded in
InstallShield format for easy installation from www.z390.org.
This PTF fixes trap in RRDTR and RRXTR under some conditions and includes
trailing zeros as significant digits in ESDTR and ESXTR instructions.
Also the traces for several DFP instructions have been corrected to show
correct GPR versus FPR registers.
Although figuring out how to do
this in DFP was fun, I don’t recommend changing existing PD routines like
Steve’s P13SC1.MLC unless there is a well defined business application
requirement such as the need for an industry standard method that can be used
on different hardware and software platforms producing exactly the same result.
Congratulations to Steve Comstock
with http://www.trainersfriend.com/
for posting first solution to problem #13 using 5 packed decimal instructions.
See the source and generated output here:
1. In response to the question about the purpose
of the contest and the trade-off between fewest instructions, fastest
execution, least memory, and ease of maintenance, I have updated the web site
to say the primary goals are to have some fun and learn more about mainframe
assembler. Hopefully in the process folks will also share what they think
makes for maintainable assembler code. Certainly I agree that when
writing assembler code for a project where the code will live on indefinitely
and probably be maintained by someone else, the first priority may well be
making the code as understandable and portable as possible including such
things as defining the derivation and limits of any constants, avoiding tricks
which may be machine dependant and may make future upgrades and/or porting to
other languages much more difficult.
2.In response to the statement that floating point
should not be used in the solution to this problem, I have a question.
What method of rounding is “correct” for this problem? In
Steve’s packed decimal solution he only used 1 extra decimal place to
determine rounding. Using the new packed decimal instructions, it is possible
to use any number of “extra” digits to perform rounding up to 34
digits using the 128 bit extended format. If additional digits are
included and there is a carry into the 2 decimal digits required, the result is
different. Perhaps one of the most common methods for performing this
function is using a COBOL statement like DIVIDE TOTAL-PRICE BY QUANTITY GIVING
UNIT-PRICE ROUNDED. What kind of code does this generate and does the
COBOL standard say exactly how many extra digits or bits will be used in
performing the rounding. I can remember in my distant past, having issues
with this sort of thing when simply upgrading from one COBOL compiler version
to another on the same machine let along switching between platforms with
different hardware and software. If I am reading this document correctly
it would appear that the COBOL “Standard” calls for using 128 bit
decimal floating point to handle ROUNDED in a way that is portable across
machines and compilers:http://www.cobolstandard.info/j4/files/05-0063.doc
I’ve posted a second solution
to the hash table problem which allows up to 2 duplicate hash key searches per
entry resulting in a reduction in the hash table size from 48011 down to 3473
at the cost of a 10% increase in instructions.
This solution executes 7
instructions if the first entry is the key, 12 instructions if the second entry
is key, and 17 instructions if the third entry is the key.
To see the source and execution log
listings for this and other solutions visit the ZMFACC site here:
Given a decimal number
with 2 decimal places representing the total cost of an item and another
decimal number representing the quantify, calculate the unit price with 2
decimal places rounded half up? Problem was derived from question posted
on IBM Mainframe
Assembler-List by Ludmila Koganer.
I think the most
straight forward solution using the least instructions may be done using the
new Decimal Floating Point (DFP) instructions. However the most efficient
may be done using basic packed decimal instructions and no floating point
instructions. For more information on ZMFACC problems and solutions visit
the web site here:
I’ve posted a solution to problem #11 which asks
for the fastest hash routine to retrieve address of table entry given 8
character key for a table of known opcodes:
The solution was developed using another program P11FIND.MLC
which I’ve also posted which searches for minimum hash table with no
duplicate keys for the given primary table using the given hash routine which
is a 64 bit divide and then using positive remainder as index. The
divisor it found was 48011 resulting in a 172k+ hash table which will return
address of primary table entry using 6 instructions including the BR return.
Surely there are faster routines using less storage and
I’d like to see a few.
This was posted on Hercules email
group today by Jay Maynard, and I thought I’d pass it on for those
interested in using it to extend older assemblers to include many of the z/Architecture
instructions which the z390 already supports:.
Stephen Powell's macro library for
z/Architecture instructions for use with z/VM ASMFX and other older assemblers:
With these macro extensions you may
be able to use MVS 3.8 IFX00, z/VM Assembler XF, ASMG etc. to run z390
Mainframe Assembler Coding Contest solutions that depend on some of the newer
instructions for 64 bit GPR’s, 16 64 bit FPR’s, extended translate
instructions such as TROT, etc. I don’t see the decimal floating
point instructions yet but perhaps they are coming too.
1.P10MB1.MLC/LOG by Mats Broberg
at SEB using fewer instr. and single ED (Note execution of this solution on
z390 requires latest z390 v1.3.08h PTF to fix overflow bug)
There is still no solution posted
yet for the hash table problem #12. I will post a solution soon if none
is forthcoming. How about a new problem that can best be solved using the
new decimal floating point? After spending a good deal of time on DFP recently,
one quiz problem that comes to mind is how many different ways can you
represent the exact value of 1 using DFP short, double, or extended formats?
The answer may surprise you/