Posted by shimmyshack on 06/08/07 12:00
On Jun 8, 11:52 am, Pavel Kalinov <pavk...@gmail.com> wrote:
> Hi all,
>
> I am trying to build an application to classify texts from a number of
> sources. I am programming it in PHP and I go "by the book" - i.e.
> calculating probabilities according to the formula etc.
> It works, but it's very slow (due to slow PHP mathematical
> implementation, I guess).
> Is there some variation of the Naive Bayes classifier which is not so
> demanding in the way of computing power used?
>
> Best
> Pavel
SpamAssassin's code is open source, have you checked that out?
http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Bayes.pm?view=markup
AFAIK PHP offloads its maths to C libraries, so the arithmetic itself is
probably not the bottleneck. More likely the problem is that working
strictly by the book, with no code optimisation techniques (hash tables
and so on), is much more computationally intensive.
(A mathematician C programmer I know got their code to run in 2 days
rather than 2 weeks after some optimisation.)
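
To make the "hash tables and so on" point concrete, here is a rough sketch of one common optimisation: precompute the log of every word probability once at training time and keep it in a hash table, so classifying a text is just lookups and additions. It is in Python for brevity, but the same structure maps onto PHP associative arrays. The names (train, classify, the "prior"/"unseen"/"words" keys) and the add-one smoothing are my own assumptions for illustration; this is not taken from SpamAssassin's Bayes.pm or from your code.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: iterable of (label, list_of_tokens).
    Returns a dict of per-label tables holding precomputed log probabilities."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for label, tokens in docs:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(doc_counts.values())
    model = {}
    for label, counts in word_counts.items():
        # Add-one (Laplace) smoothing: an assumption, pick what suits your data.
        denom = sum(counts.values()) + len(vocab)
        model[label] = {
            "prior": math.log(doc_counts[label] / total_docs),
            # Log probability used for any word not seen with this label.
            "unseen": math.log(1 / denom),
            # All logs are computed once here, not once per classified text.
            "words": {w: math.log((c + 1) / denom) for w, c in counts.items()},
        }
    return model

def classify(model, tokens):
    """Return the label with the highest summed log probability."""
    best_label, best_score = None, -math.inf
    for label, m in model.items():
        words, unseen = m["words"], m["unseen"]
        # Adding logs replaces multiplying many tiny probabilities,
        # which is cheaper and avoids floating-point underflow.
        score = m["prior"] + sum(words.get(t, unseen) for t in tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train([
    ("spam", ["buy", "cheap", "pills"]),
    ("ham", ["meeting", "at", "noon"]),
])
print(classify(model, ["cheap", "pills"]))
```

If the by-the-book version recomputes counts or probabilities for every text it classifies, moving that work into the training step like this is usually where most of the time is saved.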