Posted by Chung Leong on 03/26/06 04:36
crucialmoment wrote:
> Greetings,
> I am trying to automatically pull a beginning section from submitted
> text and return it with a More.. link. The submitted text is in html
> created by FckEditor (http://www.fckeditor.net/).
> The trouble I am running into is the cutoff point is often inside of a
> tag - ie after an opening <div> but the closing div is cut.
> The only idea I have come up with is to build an array of all possible
> html tags and search for a close for each but I am hoping there is a
> cleaner method. Has anyone attempted such a feat previously?
>
> function getSynop($input="", $more_link="", $synop_size='750') {
> $tmp_str = substr($input, 0, $synop_size);
> $end_val = strrpos($tmp_str, ">") + 1;
> if($end_val < ($synop_size)) {
> $end_val = strrpos($tmp_str, ".") + 1;
> }
> if($end_val < ($synop_size)) {
> $end_val = strrpos($tmp_str, ">") + 1;
> }
> Return substr($input, 0, $end_val) ." <a
> href='$more_link'>more...</a>";
> }
The trick here is to ignore the tags and operate only on the text
between them. Say we have the following:
This is <div>a test</div> and this is only <div>a test.</div>
and we want 10 characters. We first look at "This is " and take all 8
characters. Then we look at "a test" and retain only 2 characters. As
we now have what we need, we retain 0 characters from " and this is
only " and from "a test.". The end result is:
This is <div>a </div><div></div>
Once the empty tags are discarded we end up with
This is <div>a </div>
which is what we want.
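To make the counting in the walk-through concrete, here is a minimal sketch of just the budget step. The function name synop_trace() is made up for illustration and is not part of the implementation in this post. It assumes reasonably well-formed markup, counts bytes with strlen(), and does not handle word boundaries or entities.

```php
<?php
// Hypothetical helper (illustration only): keep every tag, and keep text
// only while the character budget lasts.
function synop_trace($html, $budget) {
    // Split on tags, keeping the tags themselves as separate pieces.
    $pieces = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $out = '';
    foreach ($pieces as $piece) {
        if ($piece[0] === '<') {
            // Tags cost nothing and are always kept.
            $out .= $piece;
            continue;
        }
        // Text: keep only as much as the remaining budget allows.
        $keep = max(0, min(strlen($piece), $budget));
        $out .= substr($piece, 0, $keep);
        // Charge the full run, so the budget goes negative once we truncate.
        $budget -= strlen($piece);
    }
    return $out;
}

echo synop_trace('This is <div>a test</div> and this is only <div>a test.</div>', 10);
// prints: This is <div>a </div><div></div>
```

The output still contains the empty <div></div> pair, which is why a separate clean-up pass is needed afterwards.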
Here's an implementation of the technique:
<?php
$s = 'This is some <strong>sample text</strong>. You are using <a href="http://www.fckeditor.net/">FCKeditor</a>.';
function synop_callback($m) {
    global $synop_char_to_fetch;
    // $m[1] is the text before the tag; $m[2] is the tag itself, which
    // is absent when the match runs to the end of the string
    $tag = isset($m[2]) ? $m[2] : '';
    // got enough characters already, return just the tag
    if($synop_char_to_fetch <= 0) {
        return $tag;
    }
    // decode HTML entities so that an entity such as &amp; counts as
    // one character
    $inner_html = $m[1];
    $inner_text = html_entity_decode($inner_html);
    if(strlen($inner_text) > $synop_char_to_fetch) {
        // retain up to $synop_char_to_fetch characters, ending
        // at a word boundary (the s modifier lets . match newlines)
        $r = preg_replace("/^(.{0,$synop_char_to_fetch}\b)?.*/s", '\1', $inner_text);
        $inner_html = htmlspecialchars(rtrim($r));
    }
    // subtract the full length of this text run; the counter drops
    // below zero once a run has been truncated
    $synop_char_to_fetch -= strlen($inner_text);
    return "$inner_html$tag";
}
function synop_chop($s, $num) {
    // chop off extra text beyond $num characters
    global $synop_char_to_fetch;
    $synop_char_to_fetch = $num;
    $s = preg_replace_callback('/([^<]*)(<.*?>)?/s', 'synop_callback', $s);
    // collapse empty tags; repeat because removing an inner pair
    // can leave its parent empty
    do {
        $r = $s;
        $s = preg_replace('/<(\S*?)[^>]*?>\s*<\/\1>/i', '', $r);
    } while($r != $s);
    // add ellipsis, replacing a trailing period if there is one
    $s = preg_replace('/\.?$/', '...', trim($s), 1);
    return $s;
}
echo synop_chop($s, 20);
?>
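The "collapse empty tags" loop is the least obvious part, so here is a minimal sketch of it in isolation. The name collapse_empty_tags() is made up for illustration, and the pattern is a slightly stricter variant of the one above (it requires a word-like tag name), which is an assumption about what counts as a tag rather than the code from this post.

```php
<?php
// Hypothetical helper (illustration only): repeatedly remove tag pairs
// that contain nothing but whitespace.
function collapse_empty_tags($html) {
    do {
        $before = $html;
        // <name ...>, optional whitespace, </name>; case-insensitive
        $html = preg_replace('/<(\w+)\b[^>]*>\s*<\/\1>/i', '', $before);
    } while ($html !== $before);
    return $html;
}

echo collapse_empty_tags('This is <div>a </div><p><em></em></p><div> </div>');
// prints: This is <div>a </div>
```

A single pass would remove <em></em> but leave <p></p> behind, since the <p> pair only becomes empty after its child is gone. That is why the loop runs until the string stops changing.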