Posted by jim_geissman on 04/04/06 23:12
If you're willing to compare the input string with every entry in a
table, then Levenshtein distance or another form of edit distance will
work. But because that means scanning the whole table on every query,
it's not much use for a quick lookup. For that, you're better off with
a hash function whose values can be pre-computed and indexed in the
reference table. For example, discard "noise" tokens, compute the
Soundex value (or some similar function) of each remaining token, and
concatenate the results into a single key.
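A minimal sketch of the brute-force approach, assuming plain Python 3.9+ and a case-insensitive comparison. `levenshtein` is the standard dynamic-programming edit distance (insert, delete and substitute each cost 1), and `closest_match` is a hypothetical helper that scores every table entry. That per-query scan over the whole table is what makes this approach slow for large tables.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    # Keep only the previous row of the DP table: O(len(b)) memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def closest_match(query: str, table: list[str]) -> tuple[str, int]:
    """Hypothetical helper: linear scan, comparing the query with every entry."""
    q = query.lower()
    best = min(table, key=lambda entry: levenshtein(q, entry.lower()))
    return best, levenshtein(q, best.lower())
```

For example, `closest_match("Jon Smith", ["John Smith", "Jane Doe", "Mary Jones"])` returns `("John Smith", 1)`, but only after computing the distance to every entry.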
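A sketch of the pre-computed key idea, under some stated assumptions: it uses the common American Soundex variant (first letter kept, then three digits; vowels separate repeated codes, H and W do not). The `NOISE` set and the names `match_key`, `build_index` and `lookup` are hypothetical illustrations, not a fixed recipe. Each table entry's key is computed once and stored in a dictionary, so a query costs one key computation plus a hash lookup instead of a full table scan.

```python
import re
from collections import defaultdict

# American Soundex digit classes; vowels, y, h and w have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

# Hypothetical "noise" tokens to discard before building the key.
NOISE = {"the", "and", "of", "inc", "co", "corp", "ltd"}


def soundex(word: str) -> str:
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return result.ljust(4, "0")


def match_key(text: str) -> str:
    """Drop noise tokens, Soundex the rest, concatenate into one key."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return "".join(soundex(t) for t in tokens if t not in NOISE)


def build_index(table: list[str]) -> dict[str, list[str]]:
    """Pre-compute keys once for the whole reference table."""
    index = defaultdict(list)
    for entry in table:
        index[match_key(entry)].append(entry)
    return index


def lookup(index: dict[str, list[str]], query: str) -> list[str]:
    return index.get(match_key(query), [])
```

With `index = build_index(["Smith Hardware Inc", "Jones Bakery"])`, `lookup(index, "Smyth Hardware")` returns `["Smith Hardware Inc"]`, because both strings reduce to the key `"S530H636"`. The trade-off is that matching is all-or-nothing on the key: two spellings whose Soundex codes differ will not find each other, which the edit-distance scan would tolerate.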