Re: Removing duplicate entries/stories from a RSS feed?

Posted by Paul Lutus on 12/06/06 18:25

gaikokujinkyofusho@gmail.com wrote:

> Hi, I have been enjoying being able to subscribe to RSS
> (http://kinja.com/user/thedigestibleaggie) for a while and have come up
> with a fairly nice list of feeds but I have run into an annoying
> (though not critical) problem, duplicate stories. Apparently there is
> overlap with some of the sites I subscribe to so I get duplicate
> stories. Does anyone know of some sort of filter (software or online
> service) that can remove duplicate stories? Any help or suggestions
> would really be appreciated!

Write a script in a language that supports associative arrays (Java, Perl,
Ruby, Python, and even JavaScript all do). For each RSS feed item, build a
unique key from elements of the item, such as its title and text, and store
the item in the associative array under that key. An item whose key is
already present is a duplicate and can be dropped.
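
As a rough illustration of the idea, here is a minimal Python sketch. It
assumes each feed item has already been parsed into a dict with "title",
"summary" and "link" fields; those field names, and the function names
item_key and remove_duplicates, are made up for this example and do not come
from any particular RSS library. The key is a hash of the lower-cased,
whitespace-normalized title and summary, which is only one of many possible
choices.

```python
import hashlib

def item_key(item):
    """Build a key from an item's title and summary.

    The field names "title" and "summary" are assumptions for this sketch.
    """
    raw = item.get("title", "") + " " + item.get("summary", "")
    # Lower-case and collapse runs of whitespace so trivial formatting
    # differences do not produce different keys.
    normalized = " ".join(raw.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def remove_duplicates(items):
    """Return the items with exact duplicates removed, keeping the first seen."""
    seen = {}  # the associative array: key -> first item with that key
    for item in items:
        seen.setdefault(item_key(item), item)
    return list(seen.values())
```

The link is deliberately left out of the key, since the same story usually
has a different URL on each site.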

Unfortunately, it is rare for two RSS feed items to be truly identical.
Often, they tell the same story with small differences in wording (to avoid
accusations of plagiarism) and of course the URL is normally different.

Without some complex coding to detect items that are almost the same, the
above method will remove only genuinely identical items from different RSS
feeds.
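
For what it is worth, a crude form of "almost the same" detection can be
sketched in a few lines of Python by comparing the sets of words in two
items (Jaccard similarity). This is only one possible approach, not a
recommendation of a specific algorithm. The "title" and "summary" field
names, the function names, and the 0.6 threshold are all assumptions chosen
for illustration, and comparing every item against every kept item becomes
slow on large feeds.

```python
import re

def words(text):
    """Return the set of lower-cased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    """Jaccard similarity of two texts: shared words / all distinct words."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def item_text(item):
    # "title" and "summary" are assumed field names for this sketch.
    return item.get("title", "") + " " + item.get("summary", "")

def remove_near_duplicates(items, threshold=0.6):
    """Keep an item only if it is not too similar to any item already kept."""
    kept = []
    for item in items:
        text = item_text(item)
        if all(similarity(text, item_text(k)) < threshold for k in kept):
            kept.append(item)
    return kept
```

A threshold that is too low will throw away distinct stories on the same
topic, and one that is too high will let reworded copies through, so it
would need tuning against real feeds.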

--
Paul Lutus
http://www.arachnoid.com

 
