Posted by Chris F.A. Johnson on 07/20/07 20:09
On 2007-07-20, M wrote:
> I thank all for some of your suggestions but most of them deal with CSS and
> not the bigger issue of scripts, ads, irrelevant sidebars (tables or divs),
> etc. Maybe I'm coming at this the wrong way.
>
> As I mentioned, Notetab's script language does most stuff for me. In order
> to strip out CSS though I need to strip out phrases like:
> id="something"
> class="something"
> style="bunch of css attributes"
>
> I've been playing around with Notetab's (v4.95) regular expression search
> and replace but I can't seem to find a combination that finds the above
> expressions.
>
> Is there a regular expression program that will break this down for me? For
> example, the program RegEx Coach lets you enter your text, then test various
> regular expressions. The results are highlighted in real time in the text
> you entered.
I have no idea how standard Notetab's regular expression syntax is,
but this would match an embedded style attribute in *nix utilities:
style="[^"]*"
For example, with sed, this will remove every style attribute, so long
as there are no double quotes within the attribute value itself:
sed 's/style="[^"]*"//g' index.html > newindex.html
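The same idea extends to the id and class attributes you listed. This is only a sketch: it assumes double-quoted values with no embedded quotes, each attribute sitting on a single line, and a sed that accepts -E (GNU and BSD sed both do). The function name strip_attrs is made up for illustration.

```shell
# strip_attrs: hypothetical helper; reads HTML on stdin, writes to stdout.
# Removes id="...", class="..." and style="..." together with the
# whitespace in front of them, so data-id="..." is left alone.
strip_attrs() {
    sed -E 's/[[:space:]]+(id|class|style)="[^"]*"//g'
}

# Example: prints <p>Hi</p>
printf '%s\n' '<p id="x" class="note" style="color:red">Hi</p>' | strip_attrs
```

To process a file: strip_attrs < index.html > newindex.html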
> I need something that works IN REVERSE. i.e. I enter text, highlight the
> expression I want removed, then it tells me the regular expression needed to
> achieve that.
>
> Anything like that out there?
No. If you wanted to match the 12345 in abc12345def, the regex
could be any of:
abc\(123[0-9]5*\)def
abc\(1234[0-9]*\)def
abc\([0-9]*\)def
[a-z][a-z][a-z]\([0-9]*\)[a-z][a-z][a-z]
[a-z]bc\([0-9]*\)de[a-z]
... and an infinite number of other expressions.
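You can check this yourself with sed. A small sketch that runs three of the expressions above against the same input; the helper name capture is made up for illustration, and each pattern is assumed to contain one \( \) group and no slashes.

```shell
# capture: hypothetical helper; prints what the \( \) group in the
# basic regular expression $1 matched in the string $2, or nothing
# if the expression does not match at all.
capture() {
    printf '%s\n' "$2" | sed -n "s/$1/\1/p"
}

# Three different expressions, one sample: each prints 12345.
for re in 'abc\(1234[0-9]*\)def' 'abc\([0-9]*\)def' '[a-z]bc\([0-9]*\)de[a-z]'; do
    capture "$re" 'abc12345def'
done
```

All three give the same answer on this sample, yet they behave differently on other input, which is why a tool cannot work backwards from one highlighted example to the expression you meant.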
--
Chris F.A. Johnson <http://cfaj.freeshell.org>
===================================================================
Author:
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)