Posted by shimmyshack on 05/14/07 20:58
On May 14, 7:47 pm, loretta <lorb...@optonline.net> wrote:
> On May 14, 2:16 pm, shimmyshack <matt.fa...@gmail.com> wrote:
>
>
>
> > On May 14, 6:08 pm, loretta <lorb...@optonline.net> wrote:
>
> > > This code just reads HTML and prints it; eventually I want to
> > > modify the HTML. However, the original HTML contains JavaScript, and
> > > the output HTML contains tags that are not in the original.
>
> > > $url = "http://www.something.com";
> > > $doc = new DOMDocument();
> > > $doc->loadHTMLFile($url);
> > > print $doc->saveHTML();
>
> > > Original html snippet:
> > > function exampleFunction() {
> > > var doc = '<html><head>';
> > > doc += '<title>Title</title>';
> > > doc += '</head>';
> > > doc += '<body onload="self.focus();">';
> > > doc += '</body></html>';
> > > }
>
> > > Html after saveHTML:
> > > function exampleFunction() {
> > > ('about:blank','imagemanagerpopup',settings);
> > > var doc = '<html><head>';
> > > doc += '<title>Title</title>';
> > > doc += '</script>
> > > </head>
> > > <body>
> > > <p>';
> > > doc += '</body>
> > > </html><html><body>
> > > <p>';
>
> > > }
>
> > > Extra tags that end the script and head and begin a new body are
> > > being added before the </body> tag and after the
> > > <body onload="self.focus();"> tag in the JS variable. Is there a way
> > > for the DOM to leave the JavaScript as is, without trying to 'fix'
> > > the HTML? The changes being made are causing a JavaScript error.
> > > Thanks
>
> > Start off with XHTML, so it can be loaded with no errors. See Google
> > for how to add JavaScript in a way that is compliant with XML standards.
>
> The HTML I am retrieving has an XHTML doctype. I also have no control
> over the original web page. The original web page loads with no errors
> in both IE and FF.
This is what I find on Google:
http://developer.mozilla.org/en/docs/Properly_Using_CSS_and_JavaScript_in_XHTML_Documents
Wrap the script in <![CDATA[ ... ]]>, or the "XHTML" document is no such
thing. By the way, it should not just claim to be XHTML but should
validate as such, and be served with the content type
application/xhtml+xml (for example as a .xhtml file).
Once you have obtained the web page and parsed it, adding the right
instructions for the XML parser, all should work, provided the rest
of the document is valid XML.
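
A minimal sketch of the idea in PHP, assuming the page really is well-formed XML and its script body is wrapped in a CDATA section. The sample markup in $xhtml is made up for illustration, not the real page. It uses loadXML()/saveXML() instead of loadHTMLFile()/saveHTML(), so the XML parser treats the script as character data rather than markup:

```php
<?php
// Hypothetical sample page (an assumption, not the poster's real markup).
// The script body sits inside a CDATA section, hidden from JavaScript
// by the leading // comment markers.
$xhtml = <<<'XHTML'
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Example</title>
<script type="text/javascript">
//<![CDATA[
function exampleFunction() {
    var doc = '<html><head>';
    doc += '<title>Title</title>';
    doc += '</head>';
    doc += '<body onload="self.focus();">';
    doc += '</body></html>';
}
//]]>
</script>
</head>
<body><p>Hello</p></body>
</html>
XHTML;

// Collect parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
if (!$doc->loadXML($xhtml)) {
    // Not well-formed XML: report what the parser complained about.
    foreach (libxml_get_errors() as $error) {
        error_log(trim($error->message));
    }
    exit(1);
}

// Serialize as XML; the CDATA section is written back out unchanged.
$out = $doc->saveXML();
print $out;
```

If loadXML() fails on the fetched page, then the page is not well-formed XML despite its doctype, and the script would have to be wrapped or escaped before parsing.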