Posted by Chris F.A. Johnson on 07/20/07 20:09
On 2007-07-20, M wrote:
 > I thank all for some of your suggestions but most of them deal with CSS and
 > not the bigger issue of scripts, ads, irrelevant sidebars (tables or divs),
 > etc. Maybe I'm coming at this the wrong way.
 >
 > As I mentioned, Notetab's script language does most stuff for me. In order
 > to strip out CSS though I need to strip out phrases like:
 > id="something"
 > class="something"
 > style="bunch of css attributes"
 >
 > I've been playing around with Notetab's (v4.95) regular expression search
 > and replace but I can't seem to find a combination that finds the above
 > expressions.
 >
 > Is there a regular expression program that will break this down for me? For
 > example, the program RegEx Coach lets you enter your text, then test various
 > regular expressions. The results are highlighted in real time in the text
 > you entered.
 
 I have no idea how standard NoteTab's regular expression syntax is,
 but this would match an embedded style attribute in *nix utilities:
 
 style="[^"]*"
 
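 For example, with sed, this will remove every style attribute, so long
 as there are no double quotes within the attribute values themselves
 (the trailing g is needed to catch more than the first match on a line):
 
 sed 's/style="[^"]*"//g' index.html > newindex.html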
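 
 The same idea can be stretched to cover the id and class attributes
 from the question as well. What follows is only a rough sketch: the
 function name strip_attrs is made up for illustration, it relies on
 sed -E (extended regular expressions, available in GNU and BSD sed),
 and it assumes double-quoted values with no embedded double quotes,
 with each attribute written on a single line.

```shell
# Hypothetical helper: read HTML on stdin, write it to stdout with
# id="...", class="..." and style="..." attributes removed.
# The leading [[:space:]]+ also removes the blank before the attribute
# and keeps names such as data-id from being matched.
strip_attrs() {
    sed -E 's/[[:space:]]+(id|class|style)="[^"]*"//g'
}

printf '%s\n' '<p id="x" class="note" style="color: red">Hi</p>' | strip_attrs
```

 To process a file: strip_attrs < index.html > newindex.html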
 
 > I need something that works IN REVERSE. i.e. I enter text, highlight the
 > expression I want removed, then it tells me the regular expression needed to
 > achieve that.
 >
 > Anything like that out there?
 
 No. If you wanted to match the 12345 in abc12345def, the regex
 could be any of:
 
 abc\(123[0-9]5*\)def
 abc\(1234[0-9]*\)def
 abc\([0-9]*\)def
 [a-z][a-z][a-z]\([0-9]*\)[a-z][a-z][a-z]
 [a-z]bc\([0-9]*\)de[a-z]
 
 ... and an infinite number of other expressions.
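 
 As a quick check of the point above, the following sketch runs each of
 the five expressions against the same string and prints whatever the
 \( ... \) group captured. The function name capture_all is invented
 for this example; the expressions are used as sed basic regular
 expressions.

```shell
# Hypothetical helper: apply each expression to the string given as $1
# and print the text captured by the \( ... \) group, one line per
# expression that matches.
capture_all() {
    for re in \
        'abc\(123[0-9]5*\)def' \
        'abc\(1234[0-9]*\)def' \
        'abc\([0-9]*\)def' \
        '[a-z][a-z][a-z]\([0-9]*\)[a-z][a-z][a-z]' \
        '[a-z]bc\([0-9]*\)de[a-z]'
    do
        printf '%s\n' "$1" | sed -n "s/$re/\1/p"
    done
}

capture_all abc12345def
```

 All five capture 12345 from that input, yet they are not equivalent:
 given abc12399def, only the last three match. A tool looking at one
 highlighted sample has no way to know which behaviour you intended.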
 
 
 --
 Chris F.A. Johnson                      <http://cfaj.freeshell.org>
 ===================================================================
 Author:
 Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)