Posted by Ben C on 07/20/07 16:35
On 2007-07-20, M <nowhereman@twilightzone.net> wrote:
> I thank all for some of your suggestions but most of them deal with CSS and
> not the bigger issue of scripts, ads, irrelevant sidebars (tables or divs),
> etc. Maybe I'm coming at this the wrong way.
>
> As I mentioned, Notetab's script language does most stuff for me. In order
> to strip out CSS though I need to strip out phrases like:
> id="something"
> class="something"
> style="bunch of css attributes"
> I've been playing around with Notetab's (v4.95) regular expression search
> and replace but I can't seem to find a combination that finds the above
> expressions.
(style|id|class)=".*?"
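is your basic regexp for that in PCRE, which I think is what Notetab
uses. Not too difficult.
It reads: 'style' or 'id' or 'class', followed by =" and then everything up
to the next ". The ? after .* makes the match non-greedy, so it stops at the
first closing quote rather than the last one on the line.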
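If you want to try the pattern outside Notetab first, here is a minimal sketch in Python, whose re module uses much the same Perl-style syntax. The leading \s+ and the (?:...) group are my additions so that the space before each attribute goes too; strip_attrs and sample are made-up names for illustration. It only handles double-quoted values that sit on one line.

```python
import re

# One or more whitespace chars, then style/id/class, then ="...".
# .*? is non-greedy, so it stops at the first closing quote.
ATTR_RE = re.compile(r'\s+(?:style|id|class)=".*?"')

def strip_attrs(html: str) -> str:
    """Remove style, id and class attributes with double-quoted values."""
    return ATTR_RE.sub("", html)

sample = '<div id="main" class="wide"><p style="color: red">Hi</p></div>'
print(strip_attrs(sample))  # prints <div><p>Hi</p></div>
```

Other attributes such as href are left alone, since only the three listed names are in the alternation.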
> Is there a regular expression program that will break this down for me? For
> example, the program RegEx Coach lets you enter your text, then test various
> regular expressions. The results are highlighted in real time in the text
> you entered.
>
> I need something that works IN REVERSE. i.e. I enter text, highlight the
> expression I want removed, then it tells me the regular expression needed to
> achieve that.
That's very hard for a program to work out. There are a vast number of
ways to match any given bit of highlighted text, so how is it supposed
to know which of them you want?
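To make that concrete, here is a small Python sketch (the sample strings and the three patterns are just made up for illustration). All three patterns match the same highlighted text exactly, yet they pick out very different things in another tag, and a tool can't tell which one you meant.

```python
import re

highlighted = 'class="nav"'

# Three of the many patterns that match the highlighted text exactly.
candidates = [
    r'class="nav"',    # only this literal string
    r'class=".*?"',    # any class attribute
    r'\w+="[^"]*"',    # any double-quoted attribute at all
]

other = '<a href="x.html" class="big" id="top">'

for pat in candidates:
    # Every candidate matches the whole highlighted string...
    assert re.fullmatch(pat, highlighted)
    # ...but they disagree about what else should be removed.
    print(pat, "->", re.findall(pat, other))
```

On the second tag the first pattern finds nothing, the second finds one attribute, and the third finds all three.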
> Anything like that out there?
Honestly it's easier just to read the manual. The Python docs have a
very clear explanation of the syntax. Python's re module isn't PCRE
itself, but the Perl-style syntax it uses is nearly the same.
http://docs.python.org/lib/re-syntax.html