(This is a question & answer from ICSI's new internal FAQ. Since
it's on a Wiki which is not accessible without a password, I decided
to post it here for the benefit of people outside ICSI.)
How do I ensure qnstrn will give consistent results on different machines?
A discussion of this question is provided below in the form of an
email thread. Note that performance differences like these could also
come from sources other than those discussed in the thread, such as
building QuickNet under different operating system versions or
different compiler versions, or using different values of the qnstrn
mlp3_threads option.
Date: Thu, 16 Nov 2006 21:06:10 -0800 (PST)
From: David Gelbart
Subject: MLP accuracy difference between hero6 and octopus4
Hello,
Today I ran identically configured MLP trainings on hero6 and octopus4
with MSG features. On hero6 the final train and CV accuracies were
87.24% and 83.37%, and on octopus4 they were 88.41% and 83.88%.
...
The train and CV accuracies in the log files are identical for the
first 2 epochs, and then start to diverge.
I also tried comparing trainings on both machine types for two other
feature types (MFCC and PLP), and for those I found I obtained
different MLP weights files but identical final CV and train accuracies.
Date: Fri, 17 Nov 2006 10:10:35 -0800
From: David Johnson
Subject: Re: MLP accuracy difference between hero6 and octopus4
The primary difference between the hero machines and the octopus
machines is that they have different CPUs and hence, for performance
reasons, use different matrix libraries with different computation
order for any given matrix op. This results in different rounding
(remember, single precision only has 24 bit mantissas, which makes it
easy to lose lower bits), and hence very slightly different results
for any given forward pass, depending on machine. If you then feed back
these results, as you do in MLP training, it's quite possible that the
two trainings will diverge (they may not, but with a non-linear system
and lots of feedback, assuming small perturbations won't change where
you finally end up is somewhat optimistic!).
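As an illustration of the rounding point, here is a minimal Python sketch. It is not QuickNet code: the helper names f32 and sum_f32 and the three values are made up for the example. It adds the same numbers in two different orders, rounding to single precision after every addition, and the two orders give different sums.

```python
import struct

def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate left to right, rounding to single precision after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

# Illustrative values only: near 1e8 adjacent single-precision numbers are 8 apart,
# so adding 1.0 to 1e8 is lost to rounding.
terms = [1e8, 1.0, -1e8]

order_a = sum_f32(terms)                             # (1e8 + 1) - 1e8
order_b = sum_f32([terms[0], terms[2], terms[1]])    # (1e8 - 1e8) + 1

print(order_a, order_b)
```

A matrix library that accumulates products in a different order can differ in the same way, usually only in the lowest bits of each result.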
And then you throw the regular "newbob" learning rate schedule on top
of that, which makes hard decisions (e.g. do another epoch if
improvement is better than 0.5%) and you're guaranteed a significant
range of results.
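A small sketch of why such a hard decision amplifies tiny differences. This is not the actual newbob implementation: the function name keep_training, the accuracies, and the exact form of the test are assumptions, with only the 0.5% threshold taken from the text above.

```python
def keep_training(prev_cv_acc, new_cv_acc, min_gain=0.5):
    """Hard decision: continue only if CV accuracy (in %) improved by more than min_gain."""
    return (new_cv_acc - prev_cv_acc) > min_gain

# Hypothetical CV accuracies from two machines whose results differ only slightly.
machine_a = keep_training(80.00, 80.51)   # improvement just above the threshold
machine_b = keep_training(80.00, 80.49)   # improvement just below the threshold

print(machine_a, machine_b)
```

Under these assumed numbers a 0.02% difference in CV accuracy puts the two runs on different sides of the threshold, so their learning rate schedules differ from that epoch on.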
...
Date: Fri, 17 Nov 2006 11:31:15 -0800 (PST)
From: David Gelbart
Subject: Re: MLP accuracy difference between hero6 and octopus4
Andreas wrote:
> are you sure that the same binaries are run in both cases (selected
> by the dispatcher script for qnsfwd) ?
The binaries were not the same. hero6 ran
/u/drspeech/i586-linux/bin/qnstrn-P4SSE2. octopus4 ran
/u/drspeech/i586-linux/bin/qnstrn-HAMMER32SSE2.
Date: Fri, 17 Nov 2006 11:33:42 PST
From: Andreas Stolcke
Subject: Re: MLP accuracy difference between hero6 and octopus4
So there is your answer (cf. David's explanation of why results are
expected to diverge).
...
If you had time you could redo all experiments using qnstrn-P4SSE2
(which should run on all machines).
--Andreas
Date: Fri, 17 Nov 2006 12:52:52 -0800 (PST)
From: David Gelbart
Subject: Re: MLP accuracy difference between hero6 and octopus4
In the experiments I described in my previous email, the hero6 version
of qnstrn was qnstrn-v3_11-P4SSE2, and the octopus4 version was
qnstrn-v3_11-HAMMER32SSE2.
I just tried running the trainings over again on octopus4 using
qnstrn-v3_11-P4SSE2. This produced MLP weights files identical to
those from running qnstrn on hero6. So in the future I would like to use
qnstrn-v3_11-P4SSE2 for all my trainings.
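One way to check that two trainings produced identical weights files is to compare checksums, which also works when the files sit on different machines. The following is a minimal Python sketch; the function names are made up for the example, and it treats the weights file as opaque bytes, assuming nothing about its format.

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_contents(path_a, path_b):
    """True if the two files are byte-for-byte identical (up to hash collisions)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Calling same_contents on the two weights files (the paths are whatever your trainings wrote) returns True only for byte-identical files. Equal final accuracies alone do not imply identical weights, as the MFCC and PLP runs above show.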