Posted by Chung Leong on 08/11/06 00:28
brock@gunter-smith.com wrote:
> I'd like to be able to take a string and search within it for all words
> (of the longest length possible) that are possibly contained within it
> (in sequence, we're not re-ordering the letters in the string).
> Obviously the brute force approach (which may be the only solution) is
> to iterate through a dictionary file searching for occurrences of each
> entry within the string.
>
> If anyone has done anything similar to this, were there any other
> methods used to reduce the number of iterations required like using a
> list of common words that are not generally elements of other words
> that can be quickly broken out from the string? Or are there libraries
> that may be of use in efficiently processing this type of search?
>
> e.g. given the string "themeether", possible solutions might be
> {'the','meet','her'} or {'theme','ether'}
That's very similar to the Thai word-breaking problem. Is that what
you're trying to do, in fact? There's a link listing some of the
approaches:
http://www.fi.muni.cz/~xantos/poster/#x1-3000
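For illustration, here is a minimal sketch of one common dictionary-based approach: memoized recursion (dynamic programming) over string positions. This is not necessarily one of the methods on the linked page. At each position it only tries substrings up to the length of the longest dictionary word, so it never has to scan the whole dictionary once per entry. The tiny word set and the function names `segmentations` and `best_segmentation` are hypothetical. "Fewest words" is used as one assumed interpretation of "longest length possible".

```python
from functools import lru_cache

# Hypothetical toy dictionary; a real one would be loaded from a word-list file.
WORDS = {"the", "theme", "them", "meet", "her", "ether", "he", "me"}


def segmentations(text, words):
    """Return every way to split `text` completely into entries of `words`."""
    max_len = max(map(len, words), default=0)

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the string completes one segmentation.
        if start == len(text):
            return [[]]
        results = []
        # Only try candidate lengths up to the longest dictionary word.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            piece = text[start:end]
            if piece in words:
                # Memoization means each suffix is solved only once.
                for rest in split_from(end):
                    results.append([piece] + rest)
        return results

    return split_from(0)


def best_segmentation(text, words):
    """Pick the split with the fewest (and hence, on average, longest) words."""
    candidates = segmentations(text, words)
    return min(candidates, key=len) if candidates else None


if __name__ == "__main__":
    print(segmentations("themeether", WORDS))
    print(best_segmentation("themeether", WORDS))
```

With this toy dictionary, "themeether" yields three complete splits, including both of the ones in the original question. The fewest-words rule picks `['theme', 'ether']`. Enumerating every split can grow exponentially on long inputs, so for real text you would keep only the best split per position.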