Re: Looking for a search engine that search a mysql database

Posted by Rik on 05/10/06 06:21

Drakazz wrote:
> Full text search is mostly used. About the 200 characters I am not
> sure.

No idea, but two methods come to mind.
Assuming $text is the text returned from the database, and $string is the
search word:

Normal functions:

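$occurrence = stripos($text, $string);
if ($occurrence !== false) { // stripos() returns false when there is no match
$start = max(0, $occurrence - 100);
$display = substr($text, $start, ($occurrence - $start) + strlen($string) + 100);
}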

The advantage is that it's quick; the disadvantage is that it only finds the
first occurrence, and it will cut words in half.
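For reference, the normal-functions method can be wrapped in a small function. This is a minimal sketch: the name snippet_around() and its $chars parameter are hypothetical, and it assumes single-byte text, since stripos() and substr() count bytes rather than characters.

```php
<?php
// snippet_around() is a hypothetical helper, not a built-in.
// Returns up to $chars characters before and after the first
// case-insensitive occurrence of $needle, or null if it is absent.
function snippet_around(string $text, string $needle, int $chars = 100): ?string
{
    $pos = stripos($text, $needle);
    if ($pos === false) {
        return null; // stripos() returns false when nothing is found
    }
    $start = max(0, $pos - $chars);                       // clamp at the start of the text
    $length = ($pos - $start) + strlen($needle) + $chars; // before + needle + after
    return substr($text, $start, $length);                // substr() stops at the end of the text
}

// Prints "own fox jum": the words around the hit get cut up.
echo snippet_around('The quick brown fox jumps over the lazy dog', 'fox', 4), "\n";
```

The output shows exactly the word-cutting disadvantage mentioned above.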

A probably more versatile method is using regular expressions:

$chars = 100; // (the desired characters before and after)
$allowword = 20; //extra characters allowed to find a word boundary

$allow = $chars + $allowword;
$else = $chars-1; //pff, naming variables is a drag

$search = preg_quote($string, '/'); // escape all characters that could have special meaning

preg_match_all('/(^.{0,'.$else.'}|\b.{'.$chars.','.$allow.'})'
.'('.$search.')'
.'(.{'.$chars.','.$allow.'}\b|.{0,'.$else.'}$)/si',
$text, $matches, PREG_SET_ORDER);

Now you have an array $matches that contains the occurrences of the search
string, each with about $chars characters on either side. The expression
tries to keep words whole, using at most $allowword extra characters to reach
a word boundary. It's no problem when there are fewer characters before or
after the search string; in that case the match simply runs from the
beginning of the text, or to its end, respectively.
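Put together, the regex method might look like the sketch below. find_snippets() is a hypothetical wrapper name; the body uses the same pattern as above, with the redundant (?:.) groups written as a plain dot. Small values for $chars and $allowword make the behaviour easy to see.

```php
<?php
// find_snippets() is a hypothetical wrapper around the snippet regex.
// Returns preg_match_all() results in PREG_SET_ORDER:
// [0] whole snippet, [1] preceding text, [2] search string, [3] following text.
function find_snippets(string $text, string $string, int $chars = 100, int $allowword = 20): array
{
    $allow = $chars + $allowword; // longest context allowed while looking for a word boundary
    $else = $chars - 1;           // context used near the start or end of the text
    $search = preg_quote($string, '/');

    $pattern = '/(^.{0,' . $else . '}|\b.{' . $chars . ',' . $allow . '})'
        . '(' . $search . ')'
        . '(.{' . $chars . ',' . $allow . '}\b|.{0,' . $else . '}$)/si';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    return $matches;
}

$text = 'The quick brown fox jumps over the lazy dog';
foreach (find_snippets($text, 'fox', 5, 3) as $match) {
    var_export($match[0]); // ' brown fox jumps '
    echo "\n";
}
```

Because \b also matches at the end of a word, a snippet can begin or end with a space; wrapping $match[0] in trim() removes it if that matters.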

$matches is now an array containing:
$matches[index_of_match][0] = The entire match (the whole snippet).
$matches[index_of_match][1] = The preceding text.
$matches[index_of_match][2] = The search string.
$matches[index_of_match][3] = The following text.

The matches can be displayed like this:
foreach($matches as $match){
print $match[0];
}

But maybe you want to highlight your search string; no problem:

foreach($matches as $match){
print $match[1].'<span class="highlight">'.$match[2].'</span>'.$match[3];
}
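Since $text comes straight from the database, it may contain characters such as < or & that break the surrounding HTML. A sketch of the same highlighting that escapes each part with htmlspecialchars() first; highlight_match() is a hypothetical helper name, and $match is assumed to have the [full, before, term, after] layout described above.

```php
<?php
// highlight_match() is a hypothetical helper. $match is one entry of
// $matches from preg_match_all(..., PREG_SET_ORDER): [full, before, term, after].
function highlight_match(array $match): string
{
    // Escape every piece of database text before adding our own markup.
    return htmlspecialchars($match[1])
        . '<span class="highlight">' . htmlspecialchars($match[2]) . '</span>'
        . htmlspecialchars($match[3]);
}

// Prints: a &lt; b <span class="highlight">fox</span> &amp; c
echo highlight_match(['a < b fox & c', 'a < b ', 'fox', ' & c']), "\n";
```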

When looking for several words, you could even change the search string like
this:

$searcharray = array('searchstring', 'some other word', 'yet another');
// pass '/' as the delimiter so preg_quote() escapes it as well
$search = implode('|', array_map(function ($s) { return preg_quote($s, '/'); }, $searcharray));

Then just apply the same regex. Note that this will return a separate match
for each word. How to prevent those "double" matches is a whole other
ballgame. Writing this, I realize that even searching for a single term could
give you doubles.

Highlighting the other search terms can't be done using just the matches
array. If you don't mind the double entries, every search term can be
highlighted like this:

foreach($matches as $match){
print preg_replace('/('.$search.')/si', '<span class="highlight">\1</span>', $match[0]);
}
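A self-contained sketch of the multi-term highlighting, including the delimiter-safe quoting. highlight_terms() is a hypothetical name; it assumes $terms is non-empty, because an empty alternation would match the empty string at every position.

```php
<?php
// highlight_terms() is a hypothetical helper; $terms must be non-empty.
function highlight_terms(string $snippet, array $terms): string
{
    // Quote each term with '/' as the delimiter so e.g. "and/or" is safe.
    $quoted = array_map(function ($t) { return preg_quote($t, '/'); }, $terms);
    $search = implode('|', $quoted);
    // $1 re-inserts the matched text, keeping its original case.
    return preg_replace('/(' . $search . ')/si', '<span class="highlight">$1</span>', $snippet);
}

// Prints: Read <span class="highlight">and/or</span> write the <span class="highlight">Fox</span>
echo highlight_terms('Read and/or write the Fox', ['and/or', 'fox']), "\n";
```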

Doubles could be prevented by using PREG_OFFSET_CAPTURE with the following
regex:

$searcharray = array('searchstring', 'some other string', 'yet another');
$search = implode('|', array_map(function ($s) { return preg_quote($s, '/'); }, $searcharray));
preg_match_all('/'.$search.'/si', $text, $matches, PREG_OFFSET_CAPTURE);

Then loop through $matches[0], gathering the surrounding text with regex
matches on substrings (which makes it a lot quicker), and checking whether
the offset of the following match is "within reach".

For search terms that are close to each other, take one substring of the text
that spans all of them, with the maximum allowed number of characters +1 on
either side.

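// $before: exact number of preceding chars (if fewer than $chars are available)
// $span:   exact length from the first offset to the last offset, plus the
//          string length of the last search term
// $after:  exact number of following chars (if fewer than $chars are available)
preg_match_all('/(\b.{'.$chars.','.$allow.'}|^.{'.$before.'})'
.'.{'.$span.'}'
.'(.{'.$chars.','.$allow.'}\b|.{'.$after.'}$)/si',
$substring, $combinations, PREG_SET_ORDER);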

foreach($combinations as $final){
print preg_replace('/('.$search.')/si', '<span class="highlight">\1</span>', $final[0]);
}
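The duplicate-free idea can be sketched roughly as follows. merge_snippets() is a hypothetical name, and this is a simplification rather than the same method: it collects offsets with PREG_OFFSET_CAPTURE and merges hits whose context windows overlap, but it cuts the windows with plain substr() and does not snap to word boundaries like the regex above. It assumes $terms is non-empty.

```php
<?php
// merge_snippets() is a hypothetical helper: one snippet per group of
// nearby hits, so terms close together no longer produce "doubles".
function merge_snippets(string $text, array $terms, int $chars = 100): array
{
    $search = implode('|', array_map(function ($t) { return preg_quote($t, '/'); }, $terms));
    preg_match_all('/' . $search . '/si', $text, $matches, PREG_OFFSET_CAPTURE);

    $groups = []; // each group: [offset of first hit, end offset of last hit]
    foreach ($matches[0] as [$term, $offset]) {
        $end = $offset + strlen($term);
        $last = count($groups) - 1;
        // "Within reach": this hit's leading window ($chars before it)
        // overlaps or touches the previous group's trailing window.
        if ($last >= 0 && $offset - $groups[$last][1] <= 2 * $chars) {
            $groups[$last][1] = max($groups[$last][1], $end);
        } else {
            $groups[] = [$offset, $end];
        }
    }

    $snippets = [];
    foreach ($groups as [$first, $end]) {
        $start = max(0, $first - $chars);          // clamp at the start of the text
        $stop = min(strlen($text), $end + $chars); // clamp at the end of the text
        $snippets[] = substr($text, $start, $stop - $start);
    }
    return $snippets;
}

print_r(merge_snippets('aaa fox bbb cat ccc ddd eee fox zzz', ['fox', 'cat'], 4));
```

Here "fox" and "cat" are close enough to share one snippet ("aaa fox bbb cat ccc"), while the second "fox" gets its own ("eee fox zzz"). Each snippet can then go through the preg_replace() highlighting loop.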


Grtz,

--
Rik Wasmus

 
