Posted by Peter Fox on 06/09/05 11:21
Following on from Hans Gruber's message. . .
>My problem has to do with HTML tags. If for example an entry contains a
><BLOCKQUOTE> with a large quote, my function would break off somewhere
>halfway in the quote. The end result of course won't have the
></BLOCKQUOTE>, rendering the resulting page horribly bad.
>
>I would like to build a function that breaks a string up to max X
>characters long, but plays it safe when it encounters any HTML tag: it
>does not matter if the end result is a string of say 670 characters long,
>it only matters that it approximates the max character setting and doesn't
>mess up the HTML tags.
A simple way would be to decide roughly where your end point is going
to be (not inside <...>), then keep all the tags that follow it but
remove the text between them.
The reason for keeping all the following tags is that you can have
complex nested structures, and working out which tags are still open
at the cut would take lots of complicated parsing - just not worth the
effort. Also the entry could start with, say, <center> and end with
</center> many pages apart.
e.g.
1 - split the string to get the 1st X chars and work with the remainder
of the string
2 - explode the remainder by '<' so that every element _except possibly
array[0]_ starts with a tag and therefore looks like "ATAG>some text"
(or "/ATAG>some text")
3 - if array[0] contains a '>' the cut fell inside a tag and array[0]
starts with the tail of that tag; if it doesn't, array[0] is just text
(NB two edge cases - there may be no more tags at all, so array[0] is
the only element, and the cut tag may be followed immediately by another,
in which case '>' is the last character of array[0])
4 - now strip the bits after '>' from every element of the array (an
element with no '>' is dropped to nothing), implode with '<' and add to
the end of the text.
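
The steps above could be sketched roughly as follows. This is only an
illustration, written in Python with str.split and str.join standing in
for explode and implode; the function name truncate_keep_tags is made up
for the example, and it assumes the text itself never contains a raw '<'
or '>' outside of tags.

```python
def truncate_keep_tags(html: str, limit: int) -> str:
    """Cut html after roughly `limit` characters but keep every later tag.

    Hypothetical helper; assumes '<' and '>' only ever appear as tag
    delimiters, never as literal characters in the text.
    """
    # Step 1: take the first `limit` characters, work with the remainder.
    head, rest = html[:limit], html[limit:]

    # Step 2: split the remainder on '<'. Every piece after the first
    # looks like "TAG>some text" or "/TAG>some text".
    pieces = rest.split("<")

    # Steps 3 and 4: keep each piece up to and including its '>' and drop
    # the text after it. If the first piece contains a '>', the cut fell
    # inside a tag and this keeps the tail of that tag; a piece with no
    # '>' is plain text and is dropped entirely.
    kept = [p[: p.index(">") + 1] if ">" in p else "" for p in pieces]

    # Re-join with '<' and add to the end of the kept text.
    return head + "<".join(kept)


if __name__ == "__main__":
    entry = "<BLOCKQUOTE>a long quote</BLOCKQUOTE>"
    print(truncate_keep_tags(entry, 18))
```

The result can be a little longer than X characters and may contain
empty tag pairs such as <b></b>, but every tag that was opened is still
closed.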
--
PETER FOX Not the same since the pancake business flopped
peterfox@eminent.demon.co.uk.not.this.bit.no.html
2 Tees Close, Witham, Essex.
Gravity beer in Essex <http://www.eminent.demon.co.uk>