> I have noticed that the bodychk function takes precedence over
> even the White List when flagging messages as spam. Is there any
> standard way to make any message coming from anyone in the White list
> simply pass, no questions asked so to speak?
>
> What I presently do is short-circuit the filter for mail
> coming from certain lists and cause those messages to be delivered to
> the inbox.
I do that, too. I set my whitelist for processing ahead of junkfilter.
Messages from whitelisted senders are not sent to junkfilter this way.
I found that whitelisting my own domain through junkfilter's whitelist
caused all spam to be delivered unfiltered, because my domain name appears
in the headers (not just the From: header) as part of the mail server's name.
So, unless that's been changed, the whitelist is useless.
I like the preprocessing idea because it gives me complete flexibility. My
.procmailrc file is nothing more than a list of INCLUDERC statements in
the order in which I want them called.
Ralph
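A minimal Python sketch of the pitfall Ralph describes, useful for experimenting outside procmail. The whitelist entries and sample messages are hypothetical. `naive_whitelisted` searches every header, so a Received: line naming your own mail server satisfies it; `sender_whitelisted` looks only at the address in From:.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical whitelist: full addresses and bare domain names.
WHITELIST = {"friend@example.org", "mydomain.example"}


def naive_whitelisted(raw):
    """Pitfall: look for a whitelist entry anywhere in the headers."""
    headers = raw.partition("\n\n")[0].lower()
    return any(entry in headers for entry in WHITELIST)


def sender_whitelisted(raw):
    """Look only at the address in From:, by exact address or by domain."""
    msg = message_from_string(raw)
    addr = parseaddr(msg.get("From", ""))[1].lower()
    domain = addr.rpartition("@")[2]
    return addr in WHITELIST or domain in WHITELIST
```

Checking the whitelist first and handing only the remaining mail to junkfilter, as Ralph does with ordered INCLUDERC files, gives the same short-circuit.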
I have noticed that the bodychk function takes precedence over
even the White List when flagging messages as spam. Is there any
standard way to make any message coming from anyone in the White list
simply pass, no questions asked so to speak?
What I presently do is short-circuit the filter for mail
coming from certain lists and cause those messages to be delivered to
the inbox.
Martin McCormick WB5AGZ Stillwater, OK
OSU Information Technology Division Network Operations Group
On Tue, 03 Jun 2003 16:34:38 -0000
"phillipremaker" <remaker@...> wrote:
> I use junkfilter and find it very useful.
>
> However, in order to reduce false positives on internal company
> email, I want a rule that says something like
>
> Whitelist any message that did NOT pass through
>
> proxy-[1234].companyname.com
>
> In the "received" headers.
>
> I know I can blacklist or whitelist any message with that in the
> received header, but I don't know the syntax to say "whitelist it
> if it does NOT contain this header information"
>
> I also was trying to figure out a way to blacklist anything with
>
> charset="Windows-1251"
>
I use this, and I don't know if I saw it on this list or somewhere
else:
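# Action assumed (none appears in the post); the JFMATCH text is illustrative.
:0
* 1^0 ^\/Subject:.*=\?(.*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|windows-1251|windows-1256)\?
* 1^0 ^\/Content-Type:.*charset="(.*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|windows-1251|windows-1252|windows-1256)
{ JFMATCH="$JFSEC: suspicious charset"
INCLUDERC=$JFDIR/junkfilter.match }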
Word-wrap messed it up a bit, but that is what I put in
junkfilter.user.
I seem to recall this recipe or a very similar one was posted to
this list recently.
--
Andrew
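The recipe scores 1 for each of two conditions, so a match on either the Subject: encoded-word or the Content-Type: charset is enough to trigger it. The Python sketch below restates that OR in order to test the patterns; it uses re.IGNORECASE because procmail matches case-insensitively by default. It also unfolds continuation lines first, since a Python regex with `.*` does not cross the line break that puts `charset=` on the second line of a folded header. Note that windows-1252, which appears only in the Content-Type list, is also the ordinary charset of much Western European mail from Windows clients, so that alternative may produce false positives.

```python
import re

# Charset alternatives copied from the two conditions of the recipe above.
SUBJECT_RE = re.compile(
    r"^Subject:.*=\?(.*big5|iso-2022-jp|iso-2022-kr|euc-kr|gb2312|"
    r"ks_c_5601-1987|windows-1251|windows-1256)\?",
    re.IGNORECASE | re.MULTILINE,
)
CTYPE_RE = re.compile(
    r'^Content-Type:.*charset="(.*big5|iso-2022-jp|iso-2022-kr|euc-kr|gb2312|'
    r"ks_c_5601-1987|windows-1251|windows-1252|windows-1256)",
    re.IGNORECASE | re.MULTILINE,
)


def unfold(headers):
    """Join folded header lines (a newline followed by a space or tab)."""
    return re.sub(r"\r?\n[ \t]+", " ", headers)


def foreign_charset(headers):
    """Either condition matching is enough, like the two 1^0 scores."""
    text = unfold(headers)
    return bool(SUBJECT_RE.search(text) or CTYPE_RE.search(text))
```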
I use junkfilter and find it very useful.
However, in order to reduce false positives on internal company
email, I want a rule that says something like
Whitelist any message that did NOT pass through
proxy-[1234].companyname.com
In the "received" headers.
I know I can blacklist or whitelist any message with that in the
received header, but I don't know the syntax to say "whitelist it if
it does NOT contain this header information."
I also was trying to figure out a way to blacklist anything with
charset="Windows-1251"
in the header. I added it to 'headers-user' but it seems not to
always catch them (the Content-Type header is usually line-wrapped,
with the charset appearing on the second line).
Thanks for any clue.
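In procmail the inversion is the `!` condition modifier, as in `* B ?? ! ^Content-type: text/plain` elsewhere in this thread. The Python sketch below only illustrates the logic of "whitelist it if no Received: header mentions the proxy": it collects every Received: header and treats the message as internal when none of them matches. companyname.com is the placeholder from the question, and the sample messages are made up.

```python
import re
from email import message_from_string

# Proxy host pattern from the question (companyname.com is its placeholder).
PROXY_RE = re.compile(r"proxy-[1234]\.companyname\.com", re.IGNORECASE)


def passed_through_proxy(raw):
    """True if any Received: header names one of the proxies."""
    msg = message_from_string(raw)
    return any(PROXY_RE.search(h) for h in msg.get_all("Received", []))


def whitelist_as_internal(raw):
    """Whitelist the message only if it did NOT pass through a proxy."""
    return not passed_through_proxy(raw)
```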
On Thu, 15 May 2003 12:08:37 -0700 (PDT)
Eric S <ejs@...> wrote:
> I'm using maildrop rather than procmail (still read this list
> though), so this may not be an exact match, but what I use
> (translated from maildrop semantics):
>
>
> /Subject:.*[^a-z0-9][a-z0-9][^a-z0-9]+[a-z0-9][^a-z0-9]+[a-z0-9][^a-z0-9]+/
Thanks, I think I can use something like that.
--
Andrew
On Thu, 15 May 2003, Pollywog wrote:
> Lately, spammers seem to be getting wise to filtering, and I don't
> know how to write a recipe that will take care of Subject: headers
> that contain for example the word "debt" but with spaces or other
> characters between the letters in the word: "d e b t" or "d-e-b-t".
>
> I could write one rule or several, but there is probably a way to
> write one short recipe to get all the possible variations. Any
> ideas?
I'm using maildrop rather than procmail (still read this list though), so
this may not be an exact match, but what I use (translated from maildrop
semantics):
/Subject:.*[^a-z0-9][a-z0-9][^a-z0-9]+[a-z0-9][^a-z0-9]+[a-z0-9][^a-z0-9]+/
which catches any time they use three or more individual characters
separated by non-alphanumerics. It isn't 100% accurate; every once in a
while it will catch non-spam.
I'm finding that my most effective rules lately are the ones that look for
standard obscuring behaviors rather than rules that try to examine the
message content (i.e., ignore the words; what currently works is looking at
how the words are presented).
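Eric's pattern can be tried out on sample subjects with Python's re module. This is a sketch: re.IGNORECASE stands in for procmail's default case-insensitive matching, and the sample subjects are made up. The last case in the checks is the kind of legitimate subject that Eric warns will occasionally match.

```python
import re

# Eric's pattern; IGNORECASE mimics procmail's default case-insensitive matching.
SPACED_OUT = re.compile(
    r"Subject:.*[^a-z0-9][a-z0-9][^a-z0-9]+[a-z0-9][^a-z0-9]+[a-z0-9][^a-z0-9]+",
    re.IGNORECASE,
)


def spaced_out_subject(line):
    """True if the line has three single characters separated by non-alphanumerics."""
    return SPACED_OUT.search(line) is not None
```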
Lately, spammers seem to be getting wise to filtering, and I don't
know how to write a recipe that will take care of Subject: headers
that contain for example the word "debt" but with spaces or other
characters between the letters in the word: "d e b t" or "d-e-b-t".
I could write one rule or several, but there is probably a way to
write one short recipe to get all the possible variations. Any
ideas?
thanks
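For one specific word, a narrower alternative to Eric's generic pattern (not something posted in this thread) is to allow optional non-alphanumeric filler between each letter of the word. A minimal Python sketch; the helper name obfuscated_word is made up for illustration:

```python
import re


def obfuscated_word(word):
    """Hypothetical helper: match word with optional non-alphanumeric filler
    (spaces, dots, dashes, ...) between its letters."""
    filler = r"[^a-z0-9]*"
    return re.compile(filler.join(re.escape(ch) for ch in word), re.IGNORECASE)


DEBT = obfuscated_word("debt")
```

The resulting pattern also matches the plain word and words containing it (for example "indebted"), so it works as a word filter rather than as a detector of obfuscation by itself.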
On Mon, 12 May 2003 10:11:59 -0700
Johann Schubert <jes_Ygroup@...> wrote:
>
> Works fine on both my domains. Keep in mind if you forward the
> message, you will likely get an empty plain text part. That will
> cause the message to pass through the filter/recipe unmatched...
I forgot about that. The recipe does work for me, though, and right
after that sample spam message, I received a spam that was trapped
by that recipe.
--
Andrew
Pollywog wrote:
>
> Martin McCormick <martin@...> wrote:
>
> > I installed the recipe in junkfilter.user that we have been
> > discussing.
> >
> > :0
> > * ^Content-type:(.*\<)?multipart
> > * B ?? ^Content-transfer-encoding: base64
> > * B ?? ! ^Content-type: text/plain
> > { JFMATCH="$JFSEC: base64 encoded multipart with no plain text"
> > INCLUDERC=$JFDIR/junkfilter.match
> > }
>
> I am using that recipe and it has caught a few spams.
> Send me that spam ( to croak at shadypond d o t com) and we will see
> if it gets past it.
Works fine on both my domains. Keep in mind if you forward the message, you
will likely get an empty plain text part. That will cause the message to pass
through the filter/recipe unmatched...
John
On Mon, 12 May 2003 10:01:05 -0500
Martin McCormick <martin@...> wrote:
> I installed the recipe in junkfilter.user that we have been
> discussing.
>
> :0
> * ^Content-type:(.*\<)?multipart
> * B ?? ^Content-transfer-encoding: base64
> * B ?? ! ^Content-type: text/plain
> { JFMATCH="$JFSEC: base64 encoded multipart with no plain text"
> INCLUDERC=$JFDIR/junkfilter.match
> }
I am using that recipe and it has caught a few spams.
Send me that spam ( to croak at shadypond d o t com) and we will see
if it gets past it.
--
Andrew
On [2003-May-12] Martin McCormick <martin@...> wrote:
> I installed the recipe in junkfilter.user that we have been
> discussing.
>
> :0
> * ^Content-type:(.*\<)?multipart
> * B ?? ^Content-transfer-encoding: base64
> * B ?? ! ^Content-type: text/plain
> { JFMATCH="$JFSEC: base64 encoded multipart with no plain text"
> INCLUDERC=$JFDIR/junkfilter.match
> }
>
> I have a base64-encoded spam to test with. They sort of grow
> like weeds. It still bypasses the filter but I think something is
> trying to happen.
>
> I set JF_USER=1 and I see the junkfilter.user file referenced
> in the log if I set VERBOSE to YES.
>
> My question is whether the base64 message gets unpacked and
> fed through the rest of junkfilter. What I see at that point in the
> debugging output looks like:
Martin,
This particular test just looks for the MIME'ish "headers". If there is a
multipart content header in the message's headers AND, in the body of the
message, there is a base64 encoding specifier AND no text/plain section then
the message is considered spam.
No decoding is done from the base64 section.
From your log, the check fails immediately because there is no "multipart"
designator in the header. If you have base64-encoded messages without the
"multipart" descriptor, then they have to be caught with a different recipe.
How that should be constructed depends on the messages you are getting.
Rich
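Rich's description of the three conditions can be restated in Python to test sample messages. In this sketch the header/body split is a simple blank-line partition, `\b` approximates procmail's `\<`, and nothing is decoded, which matches his point that the recipe only looks at the MIME headers. A single-part base64 message, like the one in Martin's log, fails the first test; adding a text/plain part, as a forwarded copy often does, also makes the recipe miss.

```python
import re

# ^Content-type:(.*\<)?multipart, checked in the headers (\b approximates \<).
MULTIPART_HEADER = re.compile(r"^Content-type:.*\bmultipart", re.I | re.M)
# B ?? ^Content-transfer-encoding: base64
BASE64_IN_BODY = re.compile(r"^Content-transfer-encoding: base64", re.I | re.M)
# B ?? ! ^Content-type: text/plain
PLAIN_IN_BODY = re.compile(r"^Content-type: text/plain", re.I | re.M)


def base64_multipart_without_text(raw):
    """All three conditions must hold; nothing is decoded."""
    headers, _sep, body = raw.partition("\n\n")
    return (MULTIPART_HEADER.search(headers) is not None
            and BASE64_IN_BODY.search(body) is not None
            and PLAIN_IN_BODY.search(body) is None)
```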
I installed the recipe in junkfilter.user that we have been
discussing.
:0
* ^Content-type:(.*\<)?multipart
* B ?? ^Content-transfer-encoding: base64
* B ?? ! ^Content-type: text/plain
{ JFMATCH="$JFSEC: base64 encoded multipart with no plain text"
INCLUDERC=$JFDIR/junkfilter.match
}
I have a base64-encoded spam to test with. They sort of grow
like weeds. It still bypasses the filter but I think something is
trying to happen.
I set JF_USER=1 and I see the junkfilter.user file referenced
in the log if I set VERBOSE to YES.
My question is whether the base64 message gets unpacked and
fed through the rest of junkfilter. What I see at that point in the
debugging output looks like:
procmail: Score: 1 1 ""
procmail: Assigning "INCLUDERC=/home/martin/.procmail/junkfilter/junkfilter.user"
procmail: Assigning "JFSEC=user"
procmail: No match on "^Content-type:(.*\<)?multipart"
procmail: Assigning "JFSEC"
procmail: Score: 1 1 ""
procmail: Assigning "INCLUDERC=/home/martin/.procmail/junkfilter/junkfilter.one"
procmail: Assigning "JFSEC=1"
procmail: No match on "^Received:.*\[\/[0-9\.]*([03-9][0-9][0-9]|2[6-9][0-9]|25[6-9])"
The line that reads
procmail: No match on "^Content-type:(.*\<)?multipart"
appears to be the first test in the recipe, but after that I don't see it
do anything except look for text strings.
I unpacked the base64 payload and it was an HTML message
containing a URL that used one of the usual current spammer tricks,
substituting a zero for an O; had the message been decoded,
bodychk would have dumped it. The decoded text is vintage
MS-DOS/Windows text with CR-LF sequences for newlines, but that
shouldn't hurt the decoding at all. Eventually we'll have those
spammers trained so well that we can spot this junk just by
looking for all the tricks, without worrying about what words they
use. :-)
If it's base64 that decodes into embedded HTML that splits
words, it's spam. I have no UCE for spam.
Martin McCormick WB5AGZ Stillwater, OK
OSU Center for Computing and Information Services Network Operations Group
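Decoding before matching, which the recipe above does not do, can be sketched with Python's email package: `get_payload(decode=True)` undoes base64 or quoted-printable transfer encoding, after which the HTML-comment pattern discussed in this thread can be applied to the decoded text. The sample message and the us-ascii fallback charset are assumptions for illustration.

```python
import base64
import re
from email import message_from_string

# The word-splitting HTML comment pattern discussed in this thread.
SPLIT_WORD = re.compile(r"[a-z]<!--[^>]+-->[a-z]", re.IGNORECASE)


def decoded_text_parts(raw):
    """Yield every text/* part with its transfer encoding (e.g. base64) undone."""
    msg = message_from_string(raw)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        data = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        try:
            text = data.decode(charset, errors="replace")
        except LookupError:          # charset name Python does not know
            text = data.decode("latin-1")
        # DOS/Windows CR-LF line ends do not affect decoding; normalize anyway.
        yield text.replace("\r\n", "\n")


def has_split_words(raw):
    return any(SPLIT_WORD.search(text) for text in decoded_text_parts(raw))


def sample_message(html):
    """Build a hypothetical single-part base64 HTML message for testing."""
    encoded = base64.b64encode(html.encode("ascii")).decode("ascii")
    return ("Content-Type: text/html; charset=\"us-ascii\"\n"
            "Content-Transfer-Encoding: base64\n\n" + encoded + "\n")
```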
On 2003-05-10 02:05 +0000, Pollywog <croak@...> wrote:
> On Fri, 09 May 2003 16:08:26 -0500
> Martin McCormick <martin@...> wrote:
>
> > I was about to add the RE to trap words with html directives
> > in them, but I am not sure enough of what I am doing yet.
> >
> > It looks like this belongs in bodychk.
> >
> > [a-z]<!--[^>]+-->[a-z]
> >
> > I know what you did, there, but what do I add after the
> > regular expression?
> >
> > If it doesn't go in bodychk, where does it go?
>
> in junkfilter.user
> There is a sample in the junkfilter.user-default file.
Or in bodychk-user in $JFUSERDIR. Read the README, section 4,
paragraph 4. :)
Greg
--
Gregory S. Sutter                  The best way to accelerate Windows
mailto:gsutter@...                 is at 9.8 m/s^2.
http://www.zer0.org/~gsutter/      hkp://wwwkeys.pgp.net/0x845DFEDD
On Fri, 09 May 2003 16:08:26 -0500
Martin McCormick <martin@...> wrote:
> I was about to add the RE to trap words with html directives
> in them, but I am not sure enough of what I am doing yet.
>
> It looks like this belongs in bodychk.
>
> [a-z]<!--[^>]+-->[a-z]
>
> I know what you did, there, but what do I add after the
> regular expression?
>
> If it doesn't go in bodychk, where does it go?
in junkfilter.user
There is a sample in the junkfilter.user-default file.
Jeff A. Earickson wrote:
> [...] I'm a big fan of MailScanner (www.mailscanner.info)
> which can use SpamAssassin (www.spamassassin.org) and other
> anti-spam tools to stop most of it.
[...]
Maybe also a useful hint for anybody fighting spam:
http://www.eleven.de/
They commercially offer a flexible, very configurable spam filtering
system, but private use is free.
Don't worry about the ".de" domain - they have English pages, too.
Ciao,
Martin (who doesn't have any relationship to this company)
"Jeff A. Earickson" wrote:
> While I still use junkfilter, it only traps 30-40 spams a day
> at best for me -- mostly because I have so much anti-spam stuff
> upstream. I'm a big fan of MailScanner (www.mailscanner.info)
> which can use SpamAssassin (www.spamassassin.org) and other
> anti-spam tools to stop most of it.
>
> I still keep junkfilter in use because I can quickly trap
> custom stuff without a lot of head scratching.
I wasn't sure whether it was OK to mention SpamAssassin here ;-), but since
you did... I run junkfilter and SpamAssassin on all incoming email. What
junkfilter misses, SpamAssassin usually catches. Additionally, I skip
SpamAssassin if the sender is in junkfilter's white list.
Martin McCormick wrote:
>
> I was about to add the RE to trap words with html directives
> in them, but I am not sure enough of what I am doing yet.
>
> It looks like this belongs in bodychk.
>
> [a-z]<!--[^>]+-->[a-z]
>
> I know what you did, there, but what do I add after the
> regular expression?
>
> If it doesn't go in bodychk, where does it go?
IMHO this definitely goes in bodychk. That's where I put it. Be aware that
unless you run another anti-spam tool, this won't catch all HTML spam trashed
with comments... HTML comments can also be inserted as <! comment>.
John
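John's point about `<! comment>` can be checked by comparing the posted pattern with a broader variant. The variant `[a-z]<![^>]*>[a-z]` is a hypothetical extension, not something posted to the list; the Python sketch below shows that it catches both comment forms between letters, while ordinary comments placed between tags match neither pattern.

```python
import re

# Pattern posted in this thread: an <!-- ... --> comment between two letters.
DASH_COMMENT = re.compile(r"[a-z]<!--[^>]+-->[a-z]", re.IGNORECASE)
# Hypothetical broader variant: any <! ... > construct between two letters,
# which also covers the "<! comment>" form.
ANY_BANG = re.compile(r"[a-z]<![^>]*>[a-z]", re.IGNORECASE)


def splits_words(text, pattern=ANY_BANG):
    return pattern.search(text) is not None
```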
I was about to add the RE to trap words with html directives
in them, but I am not sure enough of what I am doing yet.
It looks like this belongs in bodychk.
[a-z]<!--[^>]+-->[a-z]
I know what you did, there, but what do I add after the
regular expression?
If it doesn't go in bodychk, where does it go?
Thank you.
Martin McCormick WB5AGZ Stillwater, OK
OSU Center for Computing and Information Services Network Operations Group
Hi,
While I still use junkfilter, it only traps 30-40 spams a day
at best for me -- mostly because I have so much anti-spam stuff
upstream. I'm a big fan of MailScanner (www.mailscanner.info)
which can use SpamAssassin (www.spamassassin.org) and other
anti-spam tools to stop most of it.
I still keep junkfilter in use because I can quickly trap
custom stuff without a lot of head scratching.
--- Jeff Earickson
> After researching the publicity lately praising Bayesian spam
> filtering, I turned off junkfilter and installed a trial of
> bogofilter (http://bogofilter.sourceforge.net/).
I've been using spamprobe, another Bayesian filter, for about 6 months. You can
see my spam counts at http://www.phord.com/spam/
I don't list "false positives" there because there are practically none. I
_have_ had 6 false positives, and all were "spammy-looking" advertorials from
companies with whom I have prior relationships.
I loved junkfilter when I used it, but I haven't used it in a while.
Phil
Man, I have a hard time believing you've got a 10% failure rate using
junkfilter, at the same time that I've got almost a 100% success rate.
I don't get 300 spam messages a day; perhaps half that. But using
Rich's offered-up user recipes and turning on all four levels of JF,
I get spectacular results: perhaps 1 or 2 spam messages a WEEK
slipping through and 3-4 false positives (mostly because I get a lot of
mail from random users whom I converse with once or twice ever).
To each his own, I guess; I'll never abandon JF.
boss
>
> I am a grateful junkfilter user, but I am having to abandon it for
> better methods. It is probably time for most of us to consider
> awarding it some medals, retiring it, and moving on.
>
> My email address has been around for so long and so publicly (lots of
> newsgroup postings in old days before spam troubles) that I am on
> every spammer list. On a bad day recently I got 300 spam messages!
> They arrive all day, and all night, every few minutes or so.
>
> While junkfilter served me well for several years, I have found that
> it is losing the battle against the volume and techniques of spam
> lately. A year ago the error rate was acceptable. Lately I was
> having to deal with about a 10 percent false-positive and false-
> negative rate. This meant that after junkfilter was done, my inbox
> still had more spam than genuine email, and my spam box had to be
> carefully examined for false positives.
>
> I felt so miserable about email, compared to 10 years ago, that I was
> despairing that it was doomed as a tool, just like fax was a big deal
> in the 1980s but now is largely obsolete (for different reasons). It
> got so the shell alert "You have mail" gave me a pit in my stomach.
>
> After researching the publicity lately praising Bayesian spam
> filtering, I turned off junkfilter and installed a trial of
> bogofilter (http://bogofilter.sourceforge.net/). I trained it with
> several years of accumulated good mail, and the latest week's worth
> of spam (1000+ pieces), and now I have zero percent false-positive
> rate, and only about a 1 or 2 percent false negative rate
> (note: using spam_cutoff of 0.38 with the "fisher" method). The false
> negatives rate seems to continue to improve as the training history
> is extended. I am so relieved to have control of my inbox again.
>
> So I thank the junkfilter author(s) for their work. It was an
> excellent rule-based implementation, as long as rules could be
> effective, but rules just won't work anymore, and statistics seems
> the only way out. To learn more about rules vs statistical methods,
> read the essay that started the current sensation:
>
> http://www.paulgraham.com/spam.html
>
> Or, search http://www.sourceforge.net for "Bayesian". I recommend
> the bogofilter project, a procmail filter like junkfilter, as the
> best current implementation.
>
> Richard Kinch
> http://www.truetex.com
>
>
> junkfilter, http://junkfilter.zer0.org/ -- End spam, filter it
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
>
>
On [2003-May-08] tom sgouros <tomss@...> wrote:
>
> Rich:
>
> Your strategy sounds very much like what I'd like to do, but getting
> it all configured properly is stalling me. Would you consider sharing
> your procmailrc and JF configuration files?
Sure. In the .procmailrc I first sort out all the "known to be good" mail
(email lists, etc.) before using JF (no sense wasting cycles) then...
# store a backup msg (just in case we need to reprocess later) for 92 days
:0 c
$BACKUP
:0fw
| formail -I "X-Backup: $LASTFOLDER"
:0 ic
| cd $BACKUP && find . -ctime +92 -name msg.\* -exec rm {} \;
# run JunkFilter to get spam
....(standard JF invocation here)
# these are the rules I have in junkfilter.user ($ME is a variable holding
# a regexp form of my address)
# using these I don't even bother to decode the html since "good" html mail has
# been diverted via the whitelist
:0
* $ ! ^(To|Cc):.*$ME
* ^Content-type: text/html
{ JFMATCH="$JFSEC: html mail not addressed TO me"
INCLUDERC=$JFDIR/junkfilter.match }
# dump any non-multipart messages with base64 encoded text (html or plain)
:0
* ^Content-type:(.*\<)?text/(html|plain)
* ^Content-transfer-encoding:(.*\<)base64
{ JFMATCH="$JFSEC: mail in BASE64" INCLUDERC=$JFDIR/junkfilter.match }
# dump those that say they are html and have no plain-text version attached
:0
* ^Content-type: text/html
* B ?? ! ^Content-type: text/plain
{ JFMATCH="$JFSEC: html mail with no text" INCLUDERC=$JFDIR/junkfilter.match }
:0
* ^Content-type:(.*\<)?multipart
* B ?? ^Content-transfer-encoding: base64
* B ?? ! ^Content-type: text/plain
{ JFMATCH="$JFSEC: base64 encoded multipart with no plain text"
INCLUDERC=$JFDIR/junkfilter.match }
:0
* ^Content-type:(.*\<)?multipart
* B ?? ! ^Content-type: text/plain
{ JFMATCH="$JFSEC: multipart with no plain text"
INCLUDERC=$JFDIR/junkfilter.match }
:0 B
* http://
* 5^0
* -1^1 [A-z!-@]+$
{ JFMATCH="$JFSEC: fewer than 5 lines & one is a link #14"
INCLUDERC=$JFDIR/junkfilter.match }
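The last recipe relies on procmail's weighted scoring: `* 5^0` contributes 5 once, `* -1^1 [A-z!-@]+$` contributes -1 for every match (one per line that ends in a character from that class), and the recipe fires only if the total stays above zero, i.e. fewer than five such lines, while the unweighted `* http://` condition must still match. A Python restatement for testing, as a sketch with made-up sample bodies:

```python
import re

# One match per line that ends in a character from procmail's class [A-z!-@].
LINE_END_RUN = re.compile(r"[A-z!-@]+$", re.MULTILINE)


def short_message_with_link(body):
    """Python restatement of the 'fewer than 5 lines & one is a link' recipe."""
    if "http://" not in body:                  # * http://  (must match)
        return False
    score = 5                                  # * 5^0      (adds 5 once)
    score -= len(LINE_END_RUN.findall(body))   # * -1^1 ... (-1 per match)
    return score > 0                           # fires only on a positive total
```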
I am a grateful junkfilter user, but I am having to abandon it for
better methods. It is probably time for most of us to consider
awarding it some medals, retiring it, and moving on.
My email address has been around for so long and so publicly (lots of
newsgroup postings in old days before spam troubles) that I am on
every spammer list. On a bad day recently I got 300 spam messages!
They arrive all day, and all night, every few minutes or so.
While junkfilter served me well for several years, I have found that
it is losing the battle against the volume and techniques of spam
lately. A year ago the error rate was acceptable. Lately I was
having to deal with about a 10 percent false-positive and false-
negative rate. This meant that after junkfilter was done, my inbox
still had more spam than genuine email, and my spam box had to be
carefully examined for false positives.
I felt so miserable about email, compared to 10 years ago, that I was
despairing that it was doomed as a tool, just like fax was a big deal
in the 1980s but now is largely obsolete (for different reasons). It
got so the shell alert "You have mail" gave me a pit in my stomach.
After researching the publicity lately praising Bayesian spam
filtering, I turned off junkfilter and installed a trial of
bogofilter (http://bogofilter.sourceforge.net/). I trained it with
several years of accumulated good mail, and the latest week's worth
of spam (1000+ pieces), and now I have zero percent false-positive
rate, and only about a 1 or 2 percent false negative rate
(note: using spam_cutoff of 0.38 with the "fisher" method). The false
negatives rate seems to continue to improve as the training history
is extended. I am so relieved to have control of my inbox again.
So I thank the junkfilter author(s) for their work. It was an
excellent rule-based implementation, as long as rules could be
effective, but rules just won't work anymore, and statistics seems
the only way out. To learn more about rules vs statistical methods,
read the essay that started the current sensation:
http://www.paulgraham.com/spam.html
Or, search http://www.sourceforge.net for "Bayesian". I recommend
the bogofilter project, a procmail filter like junkfilter, as the
best current implementation.
Richard Kinch
http://www.truetex.com
Thanks for the tip Eric. I just added a rule to my procmailrc and it's
already caught garbage.
Here's what I did; maybe it'll help someone else:
:0 B:
* [a-z]<!--[^>]+-->[a-z]
| formail -A "X-SPAM-RULE: HTML Comments in words" \
>> $SPAM
Thanks again,
Kevin
-----Original Message-----
From: Eric S [mailto:ejs@...]
Sent: Thursday, May 08, 2003 2:08 PM
To: junkfilter-users@yahoogroups.com
On Thu, 8 May 2003, Kevin Ring wrote:
> Recently I've noticed a pattern in mails that are defeating the junkfilter.
> Spammers are embedding HTML comments in the middle of words that should be
> caught.
Yup. So I updated my filters so that an HTML comment in the middle of a
word flags the email as spam. I've seen HTML comments in emails composed
by MUAs that do HTML natively, but I've never seen a non-spam that wasn't
discussing spam that matched [a-z]<!--[^>]+-->[a-z]. Gotta love spammer
tricks that actually make spam easier to identify.
The nonsense tags are going to take more work, but even then, it won't be
too hard.
Great. So, can we get this particular feature implemented in a future
JF release?
Actually, a non-spam-specific question. Shortly after the demise of the
jf-miss list, I recall seeing a message from Greg Sutter (see
http://groups.yahoo.com/group/junkfilter-users/message/672) that said
something about a "quantum leap" in the program being needed.
I was confused by this; JF has been spectacularly successful for me,
and I'd like to continue to see new releases with new and better
recipes contributed by its users. Is this the general plan for the
product, or are we facing the end of "development" on the current
code line?
boss
>
> On Thu, 8 May 2003, Kevin Ring wrote:
>
> > Recently I've noticed a pattern in mails that are defeating the junkfilter.
> > Spammers are embedding HTML comments in the middle of words that should be
> > caught.
>
> Yup. So I updated my filters so that an HTML comment in the middle of a
> word flags the email as spam. I've seen HTML comments in emails composed
> by MUAs that do HTML natively, but I've never seen a non-spam that wasn't
> discussing spam that matched [a-z]<!--[^>]+-->[a-z]. Gotta love spammer
> tricks that actually make spam easier to identify.
>
> The nonsense tags are going to take more work, but even then, it won't be
> too hard.
On Thu, 8 May 2003, Kevin Ring wrote:
> Recently I've noticed a pattern in mails that are defeating the junkfilter.
> Spammers are embedding HTML comments in the middle of words that should be
> caught.
Yup. So I updated my filters so that an HTML comment in the middle of a
word flags the email as spam. I've seen HTML comments in emails composed
by MUAs that do HTML natively, but I've never seen a non-spam that wasn't
discussing spam that matched [a-z]<!--[^>]+-->[a-z]. Gotta love spammer
tricks that actually make spam easier to identify.
The nonsense tags are going to take more work, but even then, it won't be
too hard.
Rich:
Your strategy sounds very much like what I'd like to do, but getting
it all configured properly is stalling me. Would you consider sharing
your procmailrc and JF configuration files?
-Tom
---------------------------------
tomss@... 401-861-2831
http://sgouros.com
On [2003-May-08] Kevin Ring <kring@...> wrote:
> Recently I've noticed a pattern in mails that are defeating the junkfilter.
> Spammers are embedding HTML comments in the middle of words that should be
> caught.
>
> I've found solutions on the net for stripping HTML for every message (which
> is an attractive concept, but not what I need). What I'm looking for is a
> filter or program that will strip the HTML before it passes the message to
> the junkfilter.
>
> Has anyone implemented something like this or have any ideas?
I'd probably suggest that "stripping HTML for every message" is exactly what
you need to do. I accept HTML-email from people in my whitelist only; all
other mail is treated like this (via JF):
if it is html and not addressed to me (in To or Cc) then it is spam
if it doesn't have a plain-text section then it's spam
(plus a few others to deal with base64 encoded stuff and some variants).
But, I also make an untouched backup copy prior to any stripping/checking so I
can, if need be, read the original html in all its glory :-)
Rich
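Rich's two rules can also be expressed against the parsed MIME structure rather than with regular expressions. A minimal Python sketch: the address in ME is a hypothetical stand-in for his $ME, the HTML test looks only at the top-level Content-Type as his recipe does, and the plain-text test accepts a text/plain part anywhere in the message.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Hypothetical stand-in for Rich's $ME (a regexp form of his address).
ME = re.compile(r"rich@example\.org", re.IGNORECASE)


def addressed_to_me(msg):
    """True if any To: or Cc: address matches ME."""
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return any(ME.fullmatch(addr) for _name, addr in getaddresses(fields))


def spam_reason(raw):
    """Return the reason text of the first rule that fires, or None."""
    msg = message_from_string(raw)
    # Rule 1: top-level HTML mail not addressed (To/Cc) to me.
    if msg.get_content_type() == "text/html" and not addressed_to_me(msg):
        return "html mail not addressed TO me"
    # Rule 2: no text/plain section anywhere in the message.
    if not any(p.get_content_type() == "text/plain" for p in msg.walk()):
        return "no plain-text section"
    return None
```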
Lynx can act as a filter to display HTML mail on a terminal or
redirected into a file, but I am not sure how to harness this for
junkfilter. The ultimate solution is to decode all messages before
sending them to junkfilter, to defeat all the cute little tricks they
are trying. I just got the umpteenth-thousandth spam for the Iraq
playing card deck because the cyber-urchin who sent it placed a
period between the letters in IRAQ. A possible algorithm to combat
this ploy might be to run the text through a filter that removes
anything that is not strictly a digit or a letter. It could also have
a synonym list to equate I and 1, O and 0, and other dodges by the
riffraff Mafia.
The idea, here, is not to produce pleasant-looking text, but
to produce the same output pattern no matter how many spaces or other
fillers the pond scum try to insert to beat the filters.
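Martin's normalization idea can be sketched in a few lines of Python. The look-alike table holds only the two substitutions he names (1 for I, 0 for O) and is meant to be extended; the function names are made up for illustration. As he says, the output is not meant to be readable, only stable.

```python
import re

# Look-alike substitutions Martin mentions (1 for I, 0 for O); extend as needed.
LOOKALIKES = str.maketrans({"0": "o", "1": "i"})


def normalize(text):
    """Lower-case, map look-alike digits to letters, drop all non-alphanumerics."""
    text = text.lower().translate(LOOKALIKES)
    return re.sub(r"[^a-z0-9]", "", text)


def contains_word(text, word):
    return normalize(word) in normalize(text)
```

Because spaces are removed as well, a short target word can also turn up across word boundaries in innocent text, so longer targets are safer.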
I don't know that much about karma, but methinks some of
these fellows must have been gnats or flies in an earlier life and got
human status due to some cosmic hacking incident.
Martin McCormick
"Kevin Ring" writes:
>Recently I've noticed a pattern in mails that are defeating the junkfilter.
>Spammers are embedding HTML comments in the middle of words that should be
>caught.
>
>I've found solutions on the net for stripping HTML for every message (which
>is an attractive concept, but not what I need). What I'm looking for is a
>filter or program that will strip the HTML before it passes the message to
>the junkfilter.
>
>Has anyone implemented something like this or have any ideas?
>
>Thanks for the help,
>
>Kevin
One way could be to have procmail send each message first for processing
as you describe (which could be done by your external program which then
resubmits it) and then send it back through for delivery.
Perhaps the html stripper program could add an X-header to tell procmail
that the message has been through the stripper and is ready for spam
analysis prior to delivery.
In other words, there'd be a processing loop ahead of the JF stuff in
procmailrc.
Just an idea.
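A hypothetical version of the HTML stripper described above, sketched in Python: it decodes each text/html part, removes comments before removing the remaining tags (so a word split by a comment is rejoined), and adds a marker header so a second pass can recognize the message and leave it alone. The header name X-HTML-Stripped is an assumption, not a junkfilter convention. Rather than resubmitting, procmail could run such a program as a filter with the `:0fw` form that Rich uses with formail, and then let junkfilter examine the result.

```python
import re
import sys
from email import message_from_string

MARKER = "X-HTML-Stripped"  # hypothetical header name, not a junkfilter convention


def strip_html(html):
    """Remove comments first (so split words rejoin), then any remaining tags."""
    text = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"<[^>]*>", "", text)


def strip_message(raw):
    msg = message_from_string(raw)
    if msg[MARKER]:                      # already processed: leave it alone
        return raw
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        charset = part.get_content_charset() or "us-ascii"
        data = part.get_payload(decode=True) or b""
        try:
            html = data.decode(charset, errors="replace")
        except LookupError:              # unknown charset name
            html = data.decode("latin-1")
        # Store the stripped text unencoded so later regex checks can see it.
        del part["Content-Transfer-Encoding"]
        part["Content-Transfer-Encoding"] = "8bit"
        part.set_param("charset", "utf-8")
        part.set_payload(strip_html(html))
    msg[MARKER] = "yes"
    return msg.as_string()


def main():
    """Read a message on stdin and write the stripped version to stdout."""
    sys.stdout.write(strip_message(sys.stdin.read()))
```

A script would call main() so that the program reads the message on standard input and writes the rewritten message to standard output.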
On Thu, 8 May 2003, Kevin Ring wrote:
> Recently I've noticed a pattern in mails that are defeating the junkfilter.
> Spammers are embedding HTML comments in the middle of words that should be
> caught.
>
> I've found solutions on the net for stripping HTML for every message (which
> is an attractive concept, but not what I need). What I'm looking for is a
> filter or program that will strip the HTML before it passes the message to
> the junkfilter.
>
> Has anyone implemented something like this or have any ideas?
>
> Thanks for the help,
>
> Kevin