Posted by Carl Vondrick on 08/10/06 19:27
brock@gunter-smith.com wrote:
> I'd like to be able to take a string and search within it for all words
> (of the longest length possible) that are possibly contained within it
> (in sequence, we're not re-ordering the letters in the string).
> Obviously the brute force approach (which may be the only solution) is
> to iterate through a dictionary file searching for occurrences of each
> entry within the string.
It sounds like you are after an LCS (Longest Common Subsequence)
implementation. Search for "longest common subsequence" and you'll
find plenty of ways to do it. Wikipedia has one that seems to work
well: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
LCS is also used in diff algorithms.
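
For reference, here is a minimal sketch of the usual dynamic-programming
LCS, written in Python only for illustration. The function name lcs is my
own, not from any library. Keep in mind that a subsequence does not have
to be contiguous, so this finds letters in order, not necessarily a whole
word sitting unbroken inside the string.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            if ca == cb:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the letters.
    out = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Calling lcs("themeether", word) for each dictionary word and checking
whether the result equals the word tells you the word's letters appear in
order, at a cost of len(string) * len(word) steps per word.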
> If anyone has done anything similar to this, were there any other
> methods used to reduce the number of iterations required like using a
> list of common words that are not generally elements of other words
> that can be quickly broken out from the string? Or are there libraries
> that may be of use in efficiently processing this type of search?
>
> e.g. given the string "themeether", possible solutions might be
> {'the','meet','her'} or {'theme','ether'}
>
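
For the "themeether" example itself, a rough sketch of the dictionary
approach could look like the following. It is Python for illustration
only; WORDS is a tiny made-up word list standing in for a real dictionary
file, and segmentations is a hypothetical helper name. It tries every
prefix that is a dictionary word and recurses on the remainder, caching
results per start position so each suffix is only split once.

```python
from functools import lru_cache

# Hypothetical stand-in for a real dictionary file.
WORDS = frozenset({"the", "theme", "meet", "her", "ether"})


def segmentations(text, words=WORDS):
    """Return every way to split text into consecutive dictionary words."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the string means one complete (empty) split.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in words:
                # Prepend this word to every split of the remaining text.
                for rest in split_from(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(split) for split in split_from(0)]


print(segmentations("themeether"))
```

With the five-word list above this finds the two splits from the
question, the/meet/her and theme/ether. Looking up substrings in a set
avoids scanning the whole dictionary for every position, which is the
main saving over iterating through the file entry by entry.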