Posted by Toby A Inkster on 06/11/07 16:53
Pavel Kalinov wrote:
> BTW, I am not trying to make a spam filter, but to sort news articles in
> a number of categories (16 at present, as test). And I need
> milliseconds, not days :-(
Still, SpamAssassin might be what you're looking for.
Turn off all of SA's non-Bayes scoring, then feed SA a corpus of, say, 500
sports articles, telling it that they're "spam", followed by 500 non-sports
articles, telling it that they're "ham". After this preparation, your SA
configuration should be primed to detect sports articles.
Set up another 15 SA configurations the same way, one per remaining
category, and your setup should be complete.
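To illustrate the idea of one two-way Bayes filter per category, here is a rough Python sketch. It is not SpamAssassin and does not use SA's tokenizer or scoring; it is a toy naive Bayes with Laplace smoothing, and the names `BinaryBayes`, `train_one_vs_rest` and `classify` are made up for this example. Each category gets its own filter in which articles from that category are "spam" and everything else is "ham"; an article goes to the category whose filter scores it highest.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BinaryBayes:
    """Toy two-class naive Bayes: 'spam' = in category, 'ham' = not in it."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Record one training article under 'spam' or 'ham'."""
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def score(self, text):
        """Return the log-odds of 'spam'; positive means 'in category'."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        total_docs = self.docs["spam"] + self.docs["ham"]
        logp = {}
        for label in ("spam", "ham"):
            total = sum(self.counts[label].values())
            # Laplace-smoothed prior, then smoothed per-token likelihoods.
            lp = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in tokenize(text):
                lp += math.log((self.counts[label][tok] + 1) / (total + vocab))
            logp[label] = lp
        return logp["spam"] - logp["ham"]


def train_one_vs_rest(corpus):
    """corpus is a list of (text, category); returns {category: BinaryBayes}."""
    models = {cat: BinaryBayes() for _, cat in corpus}
    for text, cat in corpus:
        for name, model in models.items():
            # The same article is "spam" for its own category, "ham" elsewhere.
            model.learn(text, "spam" if name == cat else "ham")
    return models


def classify(models, text):
    """Pick the category whose filter gives the highest log-odds."""
    return max(models, key=lambda cat: models[cat].score(text))
```

Training happens once up front; classifying an article afterwards is just a handful of dictionary lookups per category, which is what makes this kind of setup plausible when you need answers in milliseconds.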
With SA, one user can have multiple configurations using the "--configpath"
command-line option.
--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.12-12mdksmp, up 108 days, 16 min.]
URLs in demiblog
http://tobyinkster.co.uk/blog/2007/05/31/demiblog-urls/