Posted by Pavel Kalinov on 06/11/07 01:39
Thanks, I didn't know this - will look into it.
BTW, I am not trying to make a spam filter, but to sort news articles in
a number of categories (16 at present, as a test). And I need
milliseconds, not days :-(
Best
Pavel
shimmyshack wrote:
> On Jun 8, 11:52 am, Pavel Kalinov <pavk...@gmail.com> wrote:
>> Hi all,
>>
>> I am trying to build an application to classify texts from a number of
>> sources. I am programming it in PHP and I go "by the book" - i.e.
>> calculating probabilities according to the formula etc.
>> It works, but it's very slow (due to slow PHP mathematical
>> implementation, I guess).
>> Is there some variation of the Naive Bayes classifier which is not so
>> demanding in the way of computing power used?
>>
>> Best
>> Pavel
>
> SpamAssassin's code is open source; have you checked that out?
> http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Bayes.pm?view=markup
> AFAIK PHP offloads its maths to C libraries, so the maths itself is
> probably not the bottleneck. Working strictly by the book, with no
> code optimisation techniques (hash tables and so on), can be much more
> computationally intensive.
> (A mathematician and C programmer I know got their code to run in 2 days
> rather than 2 weeks after some optimisation.)
>
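
For what it's worth, here is a rough sketch of the kind of optimisation hinted at above (hash tables, and doing the expensive maths once up front). It is written in Python rather than PHP purely for illustration, and it is not taken from SpamAssassin or from anyone's actual code in this thread. The function names `train` and `classify`, the add-one smoothing, and the toy data are all assumptions of mine. The idea: precompute log-probabilities per category at training time and store them in a hash table, so classifying a text is just dictionary lookups and additions, with no multiplication of tiny floats.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Build a model from (category, tokens) pairs.

    All logarithms are computed here, once, so that classification
    only needs hash-table lookups and additions.
    """
    word_counts = defaultdict(Counter)  # category -> token -> count
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for category, tokens in docs:
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)

    total_docs = sum(doc_counts.values())
    categories = {}
    for category, counts in word_counts.items():
        # Add-one (Laplace) smoothing: an assumption, other schemes exist.
        denom = sum(counts.values()) + len(vocab)
        categories[category] = {
            "prior": math.log(doc_counts[category] / total_docs),
            "unseen": math.log(1 / denom),
            "loglik": {w: math.log((c + 1) / denom) for w, c in counts.items()},
        }
    return {"vocab": vocab, "categories": categories}


def classify(model, tokens):
    """Return the category with the highest summed log-probability."""
    # Tokens never seen in training carry no information, so skip them.
    known = [t for t in tokens if t in model["vocab"]]
    best, best_score = None, -math.inf
    for category, m in model["categories"].items():
        loglik, unseen = m["loglik"], m["unseen"]
        score = m["prior"] + sum(loglik.get(t, unseen) for t in known)
        if score > best_score:
            best, best_score = category, score
    return best


# Toy usage with made-up training data.
training = [
    ("sport", ["goal", "match", "team"]),
    ("sport", ["team", "win", "goal"]),
    ("politics", ["vote", "election", "party"]),
    ("politics", ["party", "minister", "vote"]),
]
model = train(training)
print(classify(model, ["goal", "team"]))
```

With this layout the per-article cost is roughly (number of categories) x (number of tokens) hash lookups, which for 16 categories should be comfortably fast. The same structure maps directly onto PHP associative arrays.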