Posted by SDG on 09/19/07 12:30
Hi, I'm writing a web scraper to extract text from a web page, and I
need to know what characters can be present inside an attribute of a
tag.
So far, my program assumes that attributes can contain these
characters: '!=@/ \[]#.:_()-&;?
Did I forget something? I've looked for an official specification
(like a regular expression for HTML, or even just for attributes),
but so far I haven't found anything.
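To make the question concrete, the character list above can be sketched roughly like this in Python. This is only an illustration, not my actual code: the names ALLOWED_PUNCTUATION, ATTRIBUTE_VALUE and is_allowed_value are made up for the example, and I'm assuming letters and digits are allowed in addition to the punctuation I listed.

```python
import re

# Punctuation from the list in this post (it includes the space character).
ALLOWED_PUNCTUATION = "'!=@/ \\[]#.:_()-&;?"

# Assumption: ASCII letters and digits are allowed too; the post lists
# only the punctuation. re.escape() makes each character safe to place
# inside a regex character class.
ATTRIBUTE_VALUE = re.compile(
    "[A-Za-z0-9" + re.escape(ALLOWED_PUNCTUATION) + "]*"
)


def is_allowed_value(value: str) -> bool:
    """Return True if every character of value is in the whitelist."""
    return ATTRIBUTE_VALUE.fullmatch(value) is not None
```

With this sketch, a value such as "style.css?v=2" is accepted, while "50%" and "a,b" are rejected because % and , are not in my list, which is the kind of gap I'm worried about.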
Thanks a lot