Posted by p.lepin on 10/20/06 08:03
ChianHsieh@gmail.com wrote:
> I have a problem: I want to filter out all the words
> in HTML.
>
> Before Filter:
> <div id="pp"> hello man <br/> Thank's for your answer.
> </div>
>
> After Filter:
> <div id="pp"> <br/> </div>
Forget regexes. As the saying goes, 'You cannot parse HTML
with regexes.' There's also no reason to write your own
HTML parser; there are already more than enough of those.
XSLT was designed for exactly this kind of tree
transformation, and PHP's XSLTProcessor doesn't really care
where its input came from, as long as it's a DOMDocument.
That means HTML loaded with loadHTML() works just as well
as XML.
Using PHP5's DOM and XSL modules:
<?php
// Sample input from the question (no stray <p>).
$xml_str =
'<div id="pp"> hello man <br/> Thank\'s for your ' .
'answer. </div>' ;
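// Stylesheet: copy every element and attribute, drop every text node.
$xsl_str =
'<xsl:stylesheet ' .
' xmlns:xsl="http://www.w3.org/1999/XSL/Transform" ' .
' version="1.0">' .
// Identity template: copy each node and attribute, then recurse.
' <xsl:template match="node()|@*">' .
' <xsl:copy>' .
' <xsl:apply-templates select="node()|@*"/>' .
' </xsl:copy>' .
' </xsl:template>' .
// loadHTML() wraps a fragment in <html><body>; unwrap <html>
' <xsl:template match="html">' .
' <xsl:apply-templates/>' .
' </xsl:template>' .
// and replace <body> with a single <result> root element.
' <xsl:template match="body">' .
' <result>' .
' <xsl:apply-templates/>' .
' </result>' .
' </xsl:template>' .
// Empty template: text nodes produce no output. It ties with node()
// on default priority, so it relies on coming last in the stylesheet.
' <xsl:template match="text()"/>' .
' </xsl:stylesheet>' ;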
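// loadHTML() and loadXML() are instance methods; calling them
// statically is not supported in current PHP versions.
$xml = new DOMDocument ( ) ;
$xml -> loadHTML ( $xml_str ) ;
$xsl = new DOMDocument ( ) ;
$xsl -> loadXML ( $xsl_str ) ;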
$xform = new XSLTProcessor ( ) ;
$xform -> importStylesheet ( $xsl ) ;
$result = $xform -> transformToDoc ( $xml ) ;
header ( 'Content-type: text/xml' ) ;
print ( $result -> saveXML ( ) ) ;
?>
If you're using real XHTML (as opposed to mumbo jumbo tag
soup pretending to be XHTML), it's even better, because you
don't have to pretend you're processing XML. XHTML *is*
XML.
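With real XHTML, a minimal sketch of the same filtering needs only the DOM extension: load the markup with loadXML() and delete every text node. Note that this swaps the stylesheet for a DOMXPath query plus removeChild(); the helper name strip_text_nodes() is hypothetical, and the sketch assumes the input is well-formed.

```php
<?php
// Sketch only: assumes well-formed XHTML input and the DOM extension
// (no XSL). strip_text_nodes() is a hypothetical helper name.
function strip_text_nodes(string $xhtml): string
{
    $doc = new DOMDocument();
    // XHTML *is* XML, so loadXML() parses it directly.
    if (!$doc->loadXML($xhtml)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    $xpath = new DOMXPath($doc);
    // Copy the matched text nodes into an array before changing the tree.
    $texts = iterator_to_array($xpath->query('//text()'));
    foreach ($texts as $text) {
        $text->parentNode->removeChild($text);
    }
    // Serialize just the root element, without an XML declaration.
    return $doc->saveXML($doc->documentElement);
}

$in = '<div xmlns="http://www.w3.org/1999/xhtml" id="pp"> hello man <br/>'
    . ' Thanks for your answer. </div>';
echo strip_text_nodes($in), "\n";
```

Text nodes carry no namespace, so `//text()` matches them whether or not the XHTML namespace is declared. The trade-off is that tag soup will be rejected by loadXML() instead of being repaired, which is why the loadHTML() route is still the one to use for non-XHTML input.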
--
Pavel Lepin