Posted by Steve Edberg on 06/20/05 21:08
At 6:00 PM +0200 6/20/05, Cilliè wrote:
>>out of interest what are you trying/going to do with
>>such a list?
>
>playing with categorizing stuff based on word frequency and
>relevance to other stuff with similar word frequency.
>"the" will give a lot of false positives :)
You might want to look at full-text indexing or text analysis/data
mining software, e.g.:
http://www.textanalysis.info/
You could also check the stop-word lists used by MySQL's full-text
indexing, or by search engines like htdig:
http://dev.mysql.com/doc/mysql/en/fulltext-search.html
http://www.htdig.org/
Googling for the phrase "stop word list" may also be useful.
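
For what it's worth, here is a rough sketch of how a stop-word list
plugs into word-frequency counting. It is in Python only for
illustration; the tiny STOP_WORDS set and the word_frequencies()
helper are made up for this example, and a real list (from MySQL,
htdig, or a search) would be much longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch); substitute a real list in practice.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercased words in text, skipping anything in stop_words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

freqs = word_frequencies("The cat sat on the mat and the cat slept.")
print(freqs.most_common(1))  # [('cat', 2)]
```

With "the" and friends filtered out first, the remaining counts are
much less likely to produce false matches between unrelated documents.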
steve
--
+--------------- my people are the people of the dessert, ---------------+
| Steve Edberg http://pgfsun.ucdavis.edu/ |
| UC Davis Genome Center sbedberg@ucdavis.edu |
| Bioinformatics programming/database/sysadmin (530)754-9127 |
+---------------- said t e lawrence, picking up his fork ----------------+