>>>>> "Arlo" == Arlo Faria <arlo@...> writes:
>> > Be aware that there's a bug in v3_20 (and earlier) that causes
>> > problems if there are a lot of reject frames. Specifically, it
>> > shows up if the fraction of frames rejected ever passes the
>> > fraction of work being done by a given thread (e.g. 12.5% reject
>> > with 8 threads). There's been a fix around for a while but I
>> > haven't had time to release it (or merge Arlo's recent
>> > improvements).
>> Noted. Thank goodness I'm an accepting kind of guy, and don't tend
>> to reject much. :-)
Arlo> Rejecting frames is no joke! We've recently found that it's
Arlo> a great way to speed up training if you reject them in such
Arlo> a way as to leave a subset of data that has a uniform
Arlo> distribution over classes. This can reduce training time by
Arlo> an order of magnitude.
Arlo> Unfortunately there are some issues with the multi-threading
Arlo> involved when you're rejecting the majority of your data,
Arlo> due to the way in which frames in a bunch are distributed to
Arlo> the threads. The current best hack is to dupe Quicknet into
Arlo> thinking that you have a lot more threads than you're
Arlo> actually going to use. For example, if you reject 90% of
Arlo> your data and you want to train on a 4-CPU server, you
Arlo> should set mlp_threads=40. This is probably something that
Arlo> should be fixed in a less crude manner... but it works for
Arlo> now!
Note that there's a functionality bug (the net not training) _and_ a
performance (i.e. speed) bug. I can dig up a patch for the
functionality issue if you want. Arlo's suggestion above fixes the
performance issues _after_ the functionality issue is sorted out.
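
For anyone who wants to work out the numbers, here is a rough sketch of
the arithmetic behind the two figures quoted above. It is not Quicknet
code. The function names are made up for illustration, and the scaling
rule (CPUs divided by the fraction of frames kept) is only my reading of
Arlo's single example (90% reject, 4 CPUs, mlp_threads=40), so treat it
as an assumption rather than a documented formula.

```python
import math
from fractions import Fraction


def exceeds_per_thread_share(reject_fraction, threads):
    """True if the reject fraction is above one thread's share (1/threads).

    Hypothetical helper: this is the condition described in the thread
    for when the bug starts to bite (e.g. around 12.5% with 8 threads).
    """
    return Fraction(str(reject_fraction)) > Fraction(1, threads)


def inflated_thread_count(cpus, reject_fraction):
    """Thread count to request so roughly `cpus` threads get real work.

    Assumption: scale by 1 / (fraction of frames kept), which matches
    the one example given (90% reject, 4 CPUs -> 40).
    Fractions are used to avoid floating-point round-off before ceil().
    """
    keep = 1 - Fraction(str(reject_fraction))
    if keep <= 0:
        raise ValueError("reject_fraction must be below 1")
    return math.ceil(cpus / keep)


if __name__ == "__main__":
    print(exceeds_per_thread_share(0.2, 8))
    print(inflated_thread_count(4, 0.9))
```

The second function only addresses the speed side. It does nothing for
the functionality bug, which still needs the patch.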
David.
Arlo> -arlo