Posted by Andy Dingley on 02/16/07 10:36
On 16 Feb, 03:09, dorayme <doraymeRidT...@optusnet.com.au> wrote:
> This is very easy technically. This is what I do: search for
> instances of ids and classes in the html files by using Search
> and Replace functions that come with any decent text editor.
That sounds like hard work!
I do it in Python, as there are a couple of decent HTML parsers in
existence for it: the event-driven HTMLParser is in the box and would
find class or id attributes very easily. BeautifulSoup is a separate
install, but rather friendlier to use for screen-scraping in general.
Trivial use of dictionaries (Parseltongue for associative arrays or
hashes) counts how many duplicate ids you have in each file.
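
A minimal sketch of the HTMLParser approach, assuming a current Python 3 where the module is html.parser (in 2007 it was the HTMLParser module). The names IdCounter and duplicate_ids are made up for illustration, and a Counter stands in for the plain dictionary:

```python
from collections import Counter
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Count how often each id attribute value appears in a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def duplicate_ids(html_text):
    """Return {id: count} for every id used more than once."""
    parser = IdCounter()
    parser.feed(html_text)
    parser.close()
    return {k: n for k, n in parser.ids.items() if n > 1}


sample = '<div id="nav"><p id="intro">Hi</p><span id="nav">x</span></div>'
print(duplicate_ids(sample))  # {'nav': 2}
```

To check a whole site you would read each file and call duplicate_ids on its text. The same handler could collect class values by testing for "class" instead of "id".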
I wouldn't like to do it in an editor, even though I have powerful
ones to hand. Are Emacs and Lisp the favoured choice on Mars?
Without Python I'd do it in a shell easily enough, but I'd use grep to
match things and it might be confused when parsing HTML tutorials
where the string "class=foo" occurs in the content, but not as an
attribute.
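
To illustrate that false positive, here is a rough sketch comparing a naive text match (a regex standing in for grep) with a real parser. The sample page, the pattern and the ClassCollector name are assumptions made up for the example:

```python
import re
from html.parser import HTMLParser

# One real class attribute, plus the string "class=foo" in the text content.
page = '<p class="note">Write class=foo on the element.</p>'

# Naive text match, roughly what grep would do: it also hits the prose.
naive_hits = re.findall(r'class="?([\w-]+)', page)


class ClassCollector(HTMLParser):
    """Collect class names from real attributes only."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.classes.extend(value.split())


collector = ClassCollector()
collector.feed(page)
collector.close()

print(naive_hits)         # ['note', 'foo']
print(collector.classes)  # ['note']
```

The text match reports foo as a class even though it only appears in the content, while the parser sees just the attribute.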