I am a grateful junkfilter user, but I am having to abandon it for
better methods. It is probably time for most of us to consider
awarding it some medals, retiring it, and moving on.

My email address has been around for so long and so publicly (lots of
newsgroup postings in the old days, before the spam troubles) that I am
on every spammer's list. On a bad day recently I got 300 spam messages!
They arrive all day, and all night, every few minutes or so.

While junkfilter served me well for several years, I have found that
it has been losing the battle against the volume and techniques of
recent spam. A year ago the error rate was acceptable. Lately I have
had to deal with false-positive and false-negative rates of about 10
percent. This meant that after junkfilter was done, my inbox still
had more spam than genuine email, and my spam box had to be carefully
examined for false positives.
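To see how a 10 percent error rate can still leave an inbox mostly
spam, here is a rough back-of-the-envelope calculation in Python. The
300 spam messages is the bad-day figure mentioned above; the 20 genuine
messages per day is purely a hypothetical number picked for
illustration, and the function name is made up as well.

```python
def inbox_mix(spam, genuine, false_negative_rate, false_positive_rate):
    """Return (spam left in inbox, genuine mail in inbox, genuine mail in spam box)."""
    # Spam the filter failed to catch (false negatives).
    spam_in_inbox = round(spam * false_negative_rate)
    # Good mail the filter wrongly flagged (false positives).
    genuine_in_spam_box = round(genuine * false_positive_rate)
    genuine_in_inbox = genuine - genuine_in_spam_box
    return spam_in_inbox, genuine_in_inbox, genuine_in_spam_box


# 300 spam (bad-day figure from the post), 20 genuine (hypothetical),
# and a 10 percent error rate in each direction.
print(inbox_mix(300, 20, 0.10, 0.10))
```

Under those assumptions, 30 spam messages still reach the inbox
against 18 genuine ones, and 2 genuine messages are hiding in the spam
box waiting to be rescued by hand.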
I felt so miserable about email, compared to 10 years ago, that I was
despairing that it was doomed as a tool, just as fax was a big deal
in the 1980s but is now largely obsolete (for different reasons). It
got so that the shell alert "You have mail" gave me a pit in my stomach.

After reading up on the recent publicity praising Bayesian spam
filtering, I turned off junkfilter and installed a trial of
bogofilter (http://bogofilter.sourceforge.net/). I trained it with
several years of accumulated good mail and the latest week's worth
of spam (1000+ pieces), and now I have a zero percent false-positive
rate and only about a 1 or 2 percent false-negative rate (note: using
a spam_cutoff of 0.38 with the "fisher" method). The false-negative
rate seems to keep improving as the training history is extended. I
am so relieved to have control of my inbox again.
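For anyone wondering what a cutoff like 0.38 is compared against, here
is a minimal Python sketch of Fisher-style (chi-square) combining of
per-word spam probabilities, in the form described by Gary Robinson.
It is an illustration only, not bogofilter's actual code: the function
names, the example probabilities, the clamping constant, and the
"score >= cutoff means spam" convention are all assumptions.

```python
import math


def chi2_survival(x2, dof):
    """P(chi-square variable with even `dof` degrees of freedom >= x2)."""
    assert dof > 0 and dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)  # underflows to 0 for very long messages; fine for a sketch
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(word_probs, eps=1e-6):
    """Combine per-word spam probabilities into one score between 0 and 1."""
    # Clamp so that log() never sees exactly 0 or 1 (eps is an arbitrary choice).
    probs = [min(max(p, eps), 1.0 - eps) for p in word_probs]
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    ln_prod_p = sum(math.log(p) for p in probs)
    ln_prod_q = sum(math.log(1.0 - p) for p in probs)
    # Near 1 when the words look spammy, near 0 when they look like good mail.
    spam_side = chi2_survival(-2.0 * ln_prod_p, 2 * n)
    # Near 1 when the words look like good mail, near 0 when they look spammy.
    ham_side = chi2_survival(-2.0 * ln_prod_q, 2 * n)
    return (1.0 + spam_side - ham_side) / 2.0


def is_spam(word_probs, spam_cutoff=0.38):
    """Hypothetical decision rule: at or above the cutoff counts as spam."""
    return fisher_score(word_probs) >= spam_cutoff


# Made-up per-word probabilities, for illustration only.
print(is_spam([0.99, 0.95, 0.90]))  # words seen mostly in spam
print(is_spam([0.01, 0.05, 0.10]))  # words seen mostly in good mail
```

The per-word probabilities would come from the training counts (how
often each word turned up in spam versus good mail). Lowering the
cutoff catches more spam at the risk of more false positives, so the
right value depends on the mail the filter was trained on.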
So I thank the junkfilter author(s) for their work. It was an
excellent rule-based implementation, as long as rules could be
effective, but rules just won't work anymore, and statistics seems
to be the only way out. To learn more about rule-based vs. statistical
methods, read the essay that started the current sensation:
http://www.paulgraham.com/spam.html
Or, search http://www.sourceforge.net for "Bayesian". I recommend
the bogofilter project, a procmail filter like junkfilter, as the
best current implementation.

Richard Kinch
http://www.truetex.com