Re: Regex problem

Posted by Alan on 03/28/07 13:42

"Razvan" <defconhaya@gmail.com> wrote in message
news:1174814928.079661.94330@n76g2000hsh.googlegroups.com...
> On Mar 24, 1:45 pm, "Alan" <a...@spamless.net> wrote:
>> "Razvan" <defconh...@gmail.com> wrote in message
>> news:1174729037.379978.230910@o5g2000hsb.googlegroups.com...
>>
>> > Hello there,
>>
>> > I have the following problem:
>> > I have a big HTML file and I want to remove everything between certain
>> > tags and keep the rest, preferably using regex, but any solution
>> > would be great.
>> > The number and type of tags may vary. Here is an example:
>>
>> > <body>
>> > text text text text text text text
>> > text text text
>> > text text text text
>>
>> > <remove1>
>> > text text text text text text
>> > text text
>> > text
>> > text text text
>> > </remove1>
>>
>> > text text text
>> > text text
>>
>> > <remove1>
>> > text text text text
>> > </remove1>
>>
>> > text text
>> > text text
>> > text text text
>>
>> > <remove2>
>> > text text text text text
>> > text text text
>> > text text
>> > </remove2>
>>
>> > text text text text text
>> > text text text text
>> > </body>
>>
>> > Any suggestions would be appreciated!
>> > Thanks.
>>
>> A regex search and replace of <(/?[^\>]+)> with "" leaves just your
>> text text text, etc.
>>
>> Some flavours may need escaping: \<(/?[^\>]+)\>
>> hth
>>
>> Alan
>
> I don't understand what you are trying to say. I want to remove
> everything between <removeX> and </removeX>, including the tags.
>
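
For reference, the earlier pattern <(/?[^\>]+)> deletes only the tags themselves and leaves the text between them in place, which is why it did not answer the question. A minimal PHP sketch on a short, hypothetical input string ('#' is used as the delimiter so the '/' inside the pattern needs no escaping):

```php
<?php
// The tag-stripping pattern from the first reply: it deletes every tag
// (anything from "<" up to the next ">") but keeps the text between tags.
// The input string is a hypothetical, shortened example.
$input = "<body>\ntext text\n<remove1>\nmore text\n</remove1>\n</body>\n";

$textOnly = preg_replace('#<(/?[^\>]+)>#', '', $input);
echo $textOnly; // "more text" is still there; only the tags are gone
```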

Sorry, I didn't read your post carefully enough. As there has been no other
response, perhaps this may help.

Take some input similar to your original:

<body>
text text text text text text text
text text text
text text text text

<remove1>
text text text text text text
text text
text
text text text
</remove1>

text text text
text text

<anotherremove1>
text text text text
</anotherremove1>

text text
text text
text text text

<remove2>
text text text text text
text text text
text text
</remove2>

text text text text text
text text text text
</body>

Processing this with, basically, the following pattern:

(?<=<[ra])(.+\s)+|<[ra]

For example, in PHP, with $TstStr containing the file contents, the following
does what (I think!) you want:

$RegStr = '/(?<=<[ra])(.+\s)+|<[ra]/mi';
$OutStr = preg_replace($RegStr, "", $TstStr);

It outputs:

<body>
text text text text text text text
text text text
text text text text


text text text
text text


text text
text text
text text text


text text text text text
text text text text
</body>
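
A self-contained sketch of the same preg_replace call, run on a shortened, hypothetical version of the sample input. It also shows an assumption built into the pattern: (.+\s)+ stops only at an empty line, because . does not match a newline and .+ needs at least one character, so each block to be removed has to be followed by a blank line.

```php
<?php
// Sketch of the lookbehind approach above, on a shortened, hypothetical input.
$input = "<body>\n"
       . "keep 1\n"
       . "\n"
       . "<remove1>\n"
       . "drop 1\n"
       . "</remove1>\n"
       . "\n"
       . "keep 2\n"
       . "</body>\n";

// First alternative: (?<=<[ra]) starts a match just after "<r" or "<a";
// (.+\s)+ then consumes whole lines, stopping at the first empty line.
// Second alternative: <[ra] deletes the "<r" or "<a" the lookbehind skipped.
$pattern = '/(?<=<[ra])(.+\s)+|<[ra]/mi';

$output = preg_replace($pattern, '', $input);
echo $output;
```

The removed block leaves two consecutive empty lines behind, as in the output shown above.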


You will need to make the contents of the [ ] specific enough to identify
only the tags (and contents) you want to remove. I don't know whether this
is the best (or simplest) way to achieve what you want.
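
To illustrate why the [ ] matters: with [ra], any tag starting with "a" or "r" (an ordinary <a> link, say) also triggers deletion up to the next blank line. The sketch below compares that with a hypothetical narrower pattern keyed on "<remove" plus one digit; both the input and the narrower pattern are assumptions for illustration only.

```php
<?php
// Hypothetical input containing an ordinary <a> link as well as a block to remove.
$input = "<body>\n"
       . "see <a href=\"x.html\">this</a>\n"
       . "keep me\n"
       . "\n"
       . "<remove1>\n"
       . "drop\n"
       . "</remove1>\n"
       . "\n"
       . "</body>\n";

// Original pattern: [ra] also fires on the "<a" of the link, so everything
// from there up to the next blank line is deleted as well.
$broad = '/(?<=<[ra])(.+\s)+|<[ra]/mi';

// Hypothetical narrower pattern: only "<remove" followed by one digit.
// The lookbehind stays fixed-length (8 characters), as PCRE requires.
$narrow = '/(?<=<remove\d)(.+\s)+|<remove\d/i';

$broadOut  = preg_replace($broad, '', $input);   // the link and "keep me" are lost
$narrowOut = preg_replace($narrow, '', $input);  // only the <remove1> block goes

echo $broadOut, "---\n", $narrowOut;
```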

Whatever regex search-and-replace tool you use to process the file, it will
need to support positive lookbehind assertions.
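
If your tool lacks lookbehind, one alternative (a different technique from the one above, not a drop-in equivalent) is a lazy match with a backreference, so that each closing tag must repeat the name of its opening tag. It does not depend on blank lines, but it does need lazy quantifiers and backreferences instead. A minimal sketch, assuming tag names of the form "remove" plus digits and no nesting:

```php
<?php
// Alternative without lookbehind: remove each <removeN>...</removeN> block,
// tags included. Assumes names are "remove" plus digits and no nesting.
$input = "<body>\n"
       . "keep 1\n"
       . "<remove1>\ndrop 1\n</remove1>\n"
       . "keep 2\n"
       . "<remove2>drop 2</remove2>\n"
       . "</body>\n";

// <(remove\d+)>  opening tag, name captured as group 1
// .*?            shortest run of anything (the s flag lets . match newlines)
// </\1>          closing tag whose name matches the opening one
$pattern = '#<(remove\d+)>.*?</\1>#s';

$output = preg_replace($pattern, '', $input);
echo $output;
```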

hth
Alan
