Diff for words in string — PHP Programming Language

Posted by Csaba Gabor on 06/21/06 09:56

I'm comparing the text of (snippets of) web pages, which I expect to be
either quite different or quite similar. In the case where they are
similar, I would like to display the more recent one and report the
changes, with something like:

Word 2 added [before word 2 in original]: "Jack be nimble"

Words 10-11 changed to: "the quick brown fox"
[from words 9-11 in original]: "the brown fast quick fox"

Words [20-22 in original] before word 20 removed: "sat in a corner on"

One way to do this is to replace all whitespace characters with \n,
write the strings to two files, and then run diff (or FC, file compare,
on my Win XP Pro), collecting the output. Does anyone happen to have PHP
code for this where I don't have to write files? Note that the diff is
on the words of the strings and not on their characters.
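
A minimal sketch of the file-free idea, written in Python rather than PHP
purely for illustration. It splits both strings on whitespace and
compares the word lists in memory with the standard library's
difflib.SequenceMatcher. Note that SequenceMatcher uses its own matching
heuristic, not a strict longest common subsequence, and that the function
name word_diff and the tuple layout are assumptions made for this example.

```python
import difflib


def word_diff(old_text, new_text):
    """Return the non-equal word-level operations between two strings.

    Each item is (tag, old_index, old_words, new_index, new_words), where
    tag is 'insert', 'delete' or 'replace' and the indices are 0-based
    positions in the respective word lists.
    """
    old_words = old_text.split()  # split on any run of whitespace
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, i1, old_words[i1:i2], j1, new_words[j1:j2]))
    return changes


for change in word_diff("the cat sat", "the big cat sat"):
    print(change)
```

The indices are 0-based here, so a report in the "Word 2 added" style
would add 1 to each position before printing.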

In particular, the normal algorithm for this (longest common
subsequence), as usually presented, only produces a number (the length
of the subsequence) and doesn't note where the differences are.
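
For what it is worth, the table that the length computation fills in
already holds enough information to recover the differences by walking
it from the start. The following is a hedged sketch in Python (not PHP),
using the suffix-based table from the dynamic programming notes
referenced at the end of this post; the names lcs_table and lcs_diff and
the 'keep'/'remove'/'add' labels are hypothetical.

```python
def lcs_table(a, b):
    """table[i][j] = length of the LCS of the suffixes a[i:] and b[j:]."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def lcs_diff(a, b):
    """Walk the table to list each word as kept, removed from a, or added from b."""
    table = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(("keep", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            out.append(("remove", a[i]))  # dropping a[i] loses nothing
            i += 1
        else:
            out.append(("add", b[j]))  # skipping b[j] loses nothing
            j += 1
    # Whatever is left over on either side is a pure removal or addition.
    out.extend(("remove", w) for w in a[i:])
    out.extend(("add", w) for w in b[j:])
    return out


print(lcs_diff("the cat sat".split(), "the big cat sat".split()))
```

Grouping consecutive 'remove' and 'add' entries would give the "words
changed to" style of report; that grouping is left out to keep the
sketch short.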

Also, I wanted to give a threshold (about 10, say) and if the longest
common subsequence differs from the shorter string (strings are on the
order of 100 words) by at least this amount, then simply fail (since
the difference between the two strings would be deemed too great). This
should make the algorithm far more efficient. The corresponding
argument in FC would be /LB10.
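
One way the threshold could cut the work, sketched in Python (not PHP)
under stated assumptions: if fewer than `threshold` words of the shorter
list may go unmatched, every matched pair must lie within a narrow band
around the diagonal of the LCS table, so cells outside that band never
need to be compared. The function name similar_enough and the exact
pass/fail rule (fail when shorter length minus LCS length is at least
`threshold`) are my reading of the description above, not an established
API.

```python
def similar_enough(a, b, threshold):
    """True if len(shorter) - LCS(a, b) < threshold, using a banded table."""
    if len(a) > len(b):
        a, b = b, a  # make a the shorter word list
    n, m = len(a), len(b)
    slack = threshold - 1  # unmatched words of a that are still tolerated
    if slack < 0:
        return False
    extra = m - n  # words b has beyond the length of a
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        # A tighter implementation would store only the band itself.
        cur = [0] * (m + 1)
        lo = max(1, i - slack)
        hi = min(m, i + extra + slack)
        for j in range(lo, hi + 1):  # cells outside the band stay 0
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return n - prev[m] < threshold


old = "a b c d e".split()
new = "a b x d e".split()
print(similar_enough(old, new, 2), similar_enough(old, new, 1))
```

Values inside the band can underestimate the true LCS length when the
strings are too different, but in that case the check fails anyway, so
the pass/fail answer should be unaffected.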

Thanks,
Csaba Gabor from Vienna

Ref: http://www.ics.uci.edu/~dan/class/161/notes/6/Dynamic.html
http://en.wikipedia.org/wiki/Longest-common_subsequence_problem
Note that this problem is distinct from the longest common substring
problem.
