Re: Weighted Lists — PHP — IT news, forums, messages

Posted by Matthew Weier O'Phinney on 02/09/05 20:20

* W Luke <wtluke@gmail.com>:
> I've been fascinated by Flickr's, del.icio.us and other sites' usage
> of these Weighted Lists. It's simple but effective and I really want
> to use it for a project I'm doing.
>
> So I had a look at Nick Olejniczak's plugin for Wordpress (available
> here: www.nicholasjon.com) but am struggling to understand the logic
> behind it.
>
> What I need is to dump all words (taken from the DB) from just one
> column into an array. Filter out common words
> (the,a,it,at,you,me,he,she etc), then calculate most frequent words to
> provide the weighted list. Has anyone attempted this?

Funny you should mention this -- I'm working on something like this
right now for work.

Basically, you need to:

* define a list of common words to skip
* define weighting (I weight items in a title and in text differently,
  for instance -- usually you weight by which field you're using); store
  weighting in an associative array
* define a weights array (associative array of word => score)
* separate all text from the column into words (build a words array)
* loop over the words array
  * skip if the word is a common word
  * increment word element in weights array by the weight
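
The steps above can be sketched roughly as follows. This is only an illustration, not the code I actually use: the stop-word list, the field names (title, body), the weight values, and the buildWeights() function are all made up for the example, and it assumes PHP 7 or later. Word splitting is done with str_word_count() as a stand-in for a proper regexp.

```php
<?php
// Hypothetical stop-word list; a real one would be much longer.
$commonWords = ['the', 'a', 'it', 'at', 'you', 'me', 'he', 'she', 'and', 'of', 'to', 'in'];

// Hypothetical per-field weighting: a word in a title counts more than one in body text.
$fieldWeights = ['title' => 3, 'body' => 1];

/**
 * Build an associative array of word => score from rows shaped like
 * ['title' => '...', 'body' => '...'] (assumed column names).
 */
function buildWeights(array $rows, array $fieldWeights, array $commonWords): array
{
    $skip = array_flip($commonWords); // flip so common words can be checked with isset()
    $weights = [];

    foreach ($rows as $row) {
        foreach ($fieldWeights as $field => $weight) {
            if (!isset($row[$field])) {
                continue;
            }
            // Format 1 returns the words as an array; ASCII letters, ' and - only.
            $words = str_word_count(strtolower($row[$field]), 1);
            foreach ($words as $word) {
                if (isset($skip[$word])) {
                    continue; // common word, skip it
                }
                // Increment the word's score by the weight of the field it came from.
                $weights[$word] = ($weights[$word] ?? 0) + $weight;
            }
        }
    }

    arsort($weights); // highest score first
    return $weights;
}

// Sample rows standing in for what would come from the database.
$rows = [
    ['title' => 'Garden Tips', 'body' => 'The garden needs water at dawn'],
    ['title' => 'Water',       'body' => 'You water the garden'],
];

$weights = buildWeights($rows, $fieldWeights, $commonWords);
print_r($weights);
```

With only one column, as in the original question, $fieldWeights would have a single entry and every occurrence would simply add the same amount.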

The sticky issues are what counts as a word (you'll need to build a
regexp for that) and how to weight words (usually by field). Once you
have all this, you populate a database table with the word scores for
use as a reverse lookup.
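
As one possible answer to "what is a word", here is a minimal sketch of a regexp-based splitter. The definition it uses (a run of letters or digits, optionally joined by apostrophes or hyphens) is an assumption for the example, as is the extractWords() name; adjust the pattern to suit your own data. It assumes PHP 7 or later and UTF-8 input.

```php
<?php
/**
 * Split text into lowercase words. A "word" is assumed here to be a run of
 * letters or digits, optionally joined by apostrophes or hyphens, so that
 * "don't" and "over-water" stay whole.
 */
function extractWords(string $text): array
{
    // Use mb_strtolower() when the mbstring extension is loaded; otherwise
    // fall back to strtolower(), which only lowercases ASCII reliably.
    $lower = function_exists('mb_strtolower')
        ? mb_strtolower($text, 'UTF-8')
        : strtolower($text);

    // \p{L} matches any letter, \p{N} any digit; the /u flag treats input as UTF-8.
    $found = preg_match_all("/[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*/u", $lower, $matches);

    // preg_match_all() returns false on error (for example, invalid UTF-8).
    return $found ? $matches[0] : [];
}

print_r(extractWords("Don't over-water the garden, 2005!"));
```

Punctuation and whitespace are never matched, so they drop out without a separate cleanup pass. Whether digits and hyphenated compounds should count as words is a judgment call that depends on the content.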

For a good example of how to do this (in Perl), see:

http://www.perl.com/lpt/a/2003/09/25/searching.html

--
Matthew Weier O'Phinney | WEBSITES:
Webmaster and IT Specialist | http://www.garden.org
National Gardening Association | http://www.kidsgardening.com
802-863-5251 x156 | http://nationalgardenmonth.org
mailto:matthew@garden.org | http://vermontbotanical.org
