Re: Pulling a synopsis from text

Posted by Chung Leong on 03/26/06 04:36

crucialmoment wrote:
> Greetings,
> I am trying to automatically pull a beginning section from submitted
> text and return it with a More.. link. The submitted text is in html
> created by FckEditor (http://www.fckeditor.net/).
> The trouble I am running into is the cutoff point is often inside of a
> tag - ie after an opening <div> but the closing div is cut.
> The only idea I have come up with is to build an array of all possible
> html tags and search for a close for each but I am hoping there is a
> cleaner method. Has anyone attempted such a feat previously?
>
> function getSynop($input="", $more_link="", $synop_size='750') {
> $tmp_str = substr($input, 0, $synop_size);
> $end_val = strrpos($tmp_str, ">") + 1;
> if($end_val < ($synop_size)) {
> $end_val = strrpos($tmp_str, ".") + 1;
> }
> if($end_val < ($synop_size)) {
> $end_val = strrpos($tmp_str, ">") + 1;
> }
> Return substr($input, 0, $end_val) ." <a
> href='$more_link'>more...</a>";
> }

The trick here is to ignore the tags and operate only on the text
between them. Say we have the following:

This is <div>a test</div> and this is only <div>a test.</div>

and we want 10 characters. We would look at "This is " and grab all 8
characters. Then we look at "a test" and retain only 2 characters. As
we now have what we need, we retain 0 characters from " and this is
only " and "a test.". The end result will be:

This is <div>a </div><div></div>

Once the empty tags are discarded we end up with

This is <div>a </div>

which is what we want.
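
The walk-through above can be sketched in a few lines. This is only an illustration of the counting idea, not the implementation that follows: the helper names keep_text_between_tags and drop_empty_tags are made up for this example, and it ignores HTML entities and word boundaries and counts bytes rather than characters.

```php
<?php
// Hypothetical helper, not from the post: keeps at most $limit characters
// of text in total and passes every tag through untouched.
function keep_text_between_tags($html, $limit) {
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // even indexes are text, odd indexes are the captured tags.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($tokens as $i => $token) {
        if ($i % 2 === 1) {
            $out .= $token;              // a tag: keep it, do not count it
        } elseif ($limit > 0 && $token !== '') {
            $keep = substr($token, 0, $limit);
            $limit -= strlen($keep);
            $out .= $keep;
        }
    }
    return $out;
}

// Hypothetical helper: removes element pairs left with nothing inside,
// repeating until the string stops changing so nested empties go too.
function drop_empty_tags($html) {
    do {
        $before = $html;
        $html = preg_replace('/<(\w+)\b[^>]*>\s*<\/\1>/', '', $before);
    } while ($html !== $before);
    return $html;
}

$input = 'This is <div>a test</div> and this is only <div>a test.</div>';
$cut = keep_text_between_tags($input, 10);
echo $cut, "\n";                   // This is <div>a </div><div></div>
echo drop_empty_tags($cut), "\n";  // This is <div>a </div>
```

Because tags are never counted and never dropped during the first pass, every opening tag that survives still has its closing tag, which is the property the original substr() approach lacked.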

Here's an implementation of the technique:

<?php

$s = 'This is some <strong>sample text</strong>. You are using <a href="http://www.fckeditor.net/">FCKeditor</a>.';

function synop_callback($m) {
    global $synop_char_to_fetch;
    // $m[2] is missing when the last chunk of text has no tag after it
    $tag = isset($m[2]) ? $m[2] : '';

    // got enough characters already, return just the tag
    if($synop_char_to_fetch <= 0) {
        return $tag;
    }

    // decode HTML entities so that each one counts as a single character
    $inner_html = $m[1];
    $inner_text = html_entity_decode($inner_html);

    if(strlen($inner_text) > $synop_char_to_fetch) {
        // retain up to $synop_char_to_fetch characters, ending
        // at a word boundary; the s modifier lets . match newlines
        $r = preg_replace("/^(.{0,$synop_char_to_fetch}\b)?.*/s", '\1', $inner_text);
        $inner_html = htmlspecialchars(rtrim($r));
    }

    // subtract the length of this chunk; the counter drops to zero
    // or below once the limit has been reached
    $synop_char_to_fetch -= strlen($inner_text);
    return "$inner_html$tag";
}

function synop_chop($s, $num) {
    // chop off extra text beyond $num characters
    global $synop_char_to_fetch;
    $synop_char_to_fetch = $num;
    $s = preg_replace_callback('/([^<]*)(<.*?>)?/s', 'synop_callback', $s);

    // collapse empty tags
    do {
        $r = $s;
        $s = preg_replace('/<(\S*?)[^>]*?>\s*<\/\1>/i', '', $r);
    } while($r != $s);

    // add ellipsis
    $s = preg_replace('/\.?$/', '...', trim($s), 1);
    return $s;
}

echo synop_chop($s, 20);
// prints: This is some <strong>sample</strong>...

?>
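
The callback decodes entities before counting and re-encodes what it keeps. A small check of why that matters, using only standard PHP functions; the sample string is made up for this illustration, and strlen() counts bytes, so multibyte text would probably call for the mb_* functions instead.

```php
<?php
$html = 'Fish &amp; chips';
$text = html_entity_decode($html);

echo strlen($html), "\n";   // 16: the entity counts as five characters
echo strlen($text), "\n";   // 12: the decoded ampersand counts as one

// Truncate the decoded text, then re-encode it for output. Cutting the
// raw HTML at the same offset could split the entity in half.
$cut = substr($text, 0, 6);          // "Fish &"
echo htmlspecialchars($cut), "\n";   // Fish &amp;
```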

 
