 Posted by Andy Dingley on 02/16/07 10:36 
On 16 Feb, 03:09, dorayme <doraymeRidT...@optusnet.com.au> wrote: 
 
> This is very easy technically. This is what I do: search for 
> instances of ids and classes in the html files by using Search 
> and Replace functions that come with any decent text editor. 
 
That sounds like hard work! 
 
I do it in Python, as there are a couple of decent HTML parsers for 
it: the event-driven HTMLParser ships in the standard library and would 
find class or id attributes very easily. BeautifulSoup is a separate 
install, but rather friendlier to use for screen-scraping in general. 
A trivial use of dictionaries (Parseltongue for associative arrays or 
hashes) then counts how many duplicate ids you have in each file. 
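
A rough sketch of that idea, assuming Python 3 (where the module is spelled html.parser rather than HTMLParser). The names IdCounter and duplicate_ids are made up for illustration, and Counter is just a dictionary subclass that saves a line of bookkeeping:

```python
from collections import Counter
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Count how often each id attribute value appears in one document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        # Self-closing tags also end up here via the default handle_startendtag.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids[value] += 1


def duplicate_ids(html_text):
    """Return {id: count} for every id used more than once."""
    parser = IdCounter()
    parser.feed(html_text)
    parser.close()
    return {value: n for value, n in parser.ids.items() if n > 1}


sample = '<div id="nav"><p id="intro">Hi</p><p id="intro">Again</p></div>'
print(duplicate_ids(sample))  # {'intro': 2}
```

Reading each file in turn and passing its text to duplicate_ids gives the per-file count. Collecting class names works the same way, except that a class attribute can hold several space-separated names.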
 
I wouldn't like to do it in an editor, even though I have powerful 
ones to hand. Are Emacs and Lisp the favoured choice on Mars? 
 
Without Python I'd do it in a shell easily enough, but I'd be using 
grep to match things, and grep can be fooled by pages such as HTML 
tutorials, where the string "class=foo" occurs in the content but not 
as an attribute.
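
To make the false positive concrete, here is a small comparison, again assuming Python 3. The regular expression is only a stand-in for whatever pattern you might hand to grep, and ClassCollector and the tutorial string are invented for the example:

```python
import re
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect class names that appear as real attributes on start tags."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # One class attribute may carry several space-separated names.
                self.classes.extend(value.split())


# A line from a hypothetical HTML tutorial: one real attribute,
# plus the same syntax mentioned in the running text.
tutorial = '<p class="note">Write class=foo on the element.</p>'

# Text matching, roughly what a grep pattern would see.
regex_hits = re.findall(r'class="?([\w-]+)', tutorial)

collector = ClassCollector()
collector.feed(tutorial)
collector.close()

print(regex_hits)         # ['note', 'foo']
print(collector.classes)  # ['note']
```

The pattern match reports foo even though no element carries that class; the parser only ever sees it as text content and ignores it.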
 
  