> > Be aware that there's a bug in v3_20 (and earlier) that causes
> > problems if there are a lot of rejected frames. Specifically, it
> > shows up if the fraction of frames rejected ever exceeds the fraction
> > of work being done by a given thread (e.g. 12.5% rejected with 8 threads). There's
> > been a fix around for a while but I haven't had time to release it (or
> > merge Arlo's recent improvements).
> Noted. Thank goodness I'm an accepting kind of guy, and don't tend
> to reject much. :-)
Rejecting frames is no joke! We've recently found that it's a great way
to speed up training if you reject them in such a way as to leave a
subset of data that has a uniform distribution over classes. This can
reduce training time by an order of magnitude.
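
To make the idea concrete, here is a minimal sketch in Python of choosing
which frames to keep so that every class ends up with the same count. This
is not Quicknet code: the function name, the in-memory label list, and the
boolean keep mask are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def balanced_keep_mask(labels, seed=0):
    """Return a list of booleans: True = keep the frame, False = reject it.

    Hypothetical helper: keeps the same number of frames from every class,
    namely the size of the rarest class, chosen at random.
    """
    rng = random.Random(seed)

    # Group frame indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # The rarest class decides how many frames each class may keep.
    n_keep = min(len(indices) for indices in by_class.values())

    keep = [False] * len(labels)
    for indices in by_class.values():
        for idx in rng.sample(indices, n_keep):
            keep[idx] = True
    return keep
```

With heavily skewed labels most frames are rejected, which is exactly the
situation where the threading issue shows up.
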
Unfortunately there are some issues with the multi-threading involved when
you're rejecting the majority of your data, due to the way in which
frames in a bunch are distributed to the threads. The current best hack
is to dupe Quicknet into thinking that you have a lot more threads than
you're actually going to use. For example, if you reject 90% of your
data and you want to train on a 4-CPU server, you should set
mlp_threads=40. This is probably something that should be fixed in a
less crude manner... but it works for now!
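
The arithmetic behind that example can be sketched as below. Note the
assumption: the rule "CPUs divided by the fraction of frames kept" is only
my reading of the single 90% / 4-CPU / 40-thread case, and the function and
its arguments are hypothetical, not part of Quicknet.

```python
def suggested_mlp_threads(n_cpus, n_frames_total, n_frames_kept):
    """Guess an mlp_threads value from the CPU count and the keep ratio.

    Assumed rule (extrapolated from one example): n_cpus / kept_fraction,
    rounded up. Integer arithmetic avoids floating-point rounding surprises.
    """
    if n_cpus <= 0 or n_frames_kept <= 0 or n_frames_kept > n_frames_total:
        raise ValueError("need n_cpus > 0 and 0 < n_frames_kept <= n_frames_total")
    # Ceiling division: ceil(n_cpus * total / kept).
    return -(-(n_cpus * n_frames_total) // n_frames_kept)

# 90% rejected means 100 of every 1000 frames are kept; with 4 CPUs:
print(suggested_mlp_threads(4, 1000, 100))
```
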
-arlo