Posted by Jerry Stuckle on 10/15/07 14:36
Steve wrote:
> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
> news:K-qdnTSkY4NaoI7anZ2dnUVZ_j6dnZ2d@comcast.com...
>> Steve wrote:
>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>> news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@comcast.com...
>>>> OK, I give up here. I am DEFINITELY not a Regex expert, and have been
>>>> working on this for hours with no luck.
>>>>
>>>> Basically I need to parse a page for certain information which will be
>>>> fed back into CURL to post to a site. I need to find four types of tags
>>>> on the page:
>>>>
>>>> <input type=hidden name=a1 value=b1>
>>>> <input type=text name=a2>
>>>> <input type=submit name=a3 value=b3>
>>>> <select name=a4>
>>>>
>>>> I don't need any other tags.
>>>>
>>>> From the hidden and submit types, I need name and value. From the text
>>>> and select types, I just need the name.
>>>>
>>>> I can assume the attributes will always show up in this order, but there
>>>> may be other things between the < and > delimiters. Additionally, the
>>>> actual type and name may have single or double quotes around them, or
>>>> neither.
>>>>
>>>> Does anyone have some code for this? It doesn't have to be all one
>>>> regex.
>>> alright, jer. let's see what we can do...
>>>
>>> here's an eyeballed attempt:
>>>
>>> <(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>
>>>
>>> to keep it easier, i'd think about using that to get your general
>>> matches. iterating through those, i'd apply another regex to break out
>>> the name, type, and value. you could very well catch it all in the above,
>>> however, it's not as straightforward and hence, not easily maintained. if
>>> you need additional help on writing this, let me know. i'll pseudo-code
>>> the whole enchilada if you want. this should be sufficient in getting
>>> only those tags you listed above...which is a good start.
>>>
>>> btw, make the search caseINsensitive.
>> Hi, Steve,
>>
>> Yep, it's a start. Some problems (output below), but I think it will get
>> me a little farther.
>>
>> And you're right, I already gave up on getting everything in one pass. I
>> was thinking of trying to just get everything for a single element type
>> (i.e. all <input type=text ...> elements), but this gives me another idea,
>> also.
>>
>> And the output from the first try:
>>
>> Array
>> (
>> [0] => Array
>> (
>> [0] => <select n
>> [1] => <select n
>> [2] => <select n
>> )
>>
>> [1] => Array
>> (
>> [0] => select n
>> [1] => select n
>> [2] => select n
>> )
>>
>> [2] => Array
>> (
>> [0] =>
>> [1] =>
>> [2] =>
>> )
>>
>> [3] => Array
>> (
>> [0] =>
>> [1] =>
>> [2] =>
>> )
>>
>> [4] => Array
>> (
>> [0] =>
>> [1] =>
>> [2] =>
>> )
>>
>> )
>
> well, that's not so good a start! i'll break out the old regex ide and fix
> that...if you want.
>
>
>
If you have the time, I would appreciate it. Otherwise I can struggle
through this myself :-)
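
For reference, the `<select n` matches in the output above are consistent with two properties of the quoted pattern. The top-level `|` splits it into `<(select...)` and `(input...)>`, so the `<` and `>` are not shared by both branches. The lazy `.*?` at the end of the select branch is free to match nothing, so that branch stops after `select`, one optional space, and one more character. The input branch also requires a quote or whitespace character right after `type=`, so an unquoted `type=hidden` would not match it.

Below is a minimal sketch of the two-pass approach suggested in the thread: first grab whole tags, then pick attributes out of each one. It is written in Python purely for illustration; the names `TAG_RE`, `ATTR_RE`, and `extract_fields` are hypothetical and do not come from the thread. It assumes no attribute value contains a literal `>`, and it does not rely on attribute order.

```python
import re

# Pass 1: capture the tag name and everything up to the closing '>'.
# Assumption: attribute values never contain a literal '>'.
TAG_RE = re.compile(r'<(input|select)\b([^>]*)>', re.IGNORECASE)

# Pass 2: attribute=value pairs, where the value is double-quoted,
# single-quoted, or bare (no quotes at all).
ATTR_RE = re.compile(
    r'''([a-z_:][-\w:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))''',
    re.IGNORECASE,
)

WANTED_INPUT_TYPES = {"hidden", "text", "submit"}


def extract_fields(html):
    """Return (kind, name, value) tuples for the four tag types of interest.

    value is None for text inputs and selects, where only the name is needed.
    """
    fields = []
    for tag, body in TAG_RE.findall(html):
        attrs = {}
        for m in ATTR_RE.finditer(body):
            # Exactly one of the three value groups took part in the match.
            value = next(g for g in m.groups()[1:] if g is not None)
            attrs.setdefault(m.group(1).lower(), value)

        if "name" not in attrs:
            continue
        if tag.lower() == "select":
            fields.append(("select", attrs["name"], None))
            continue

        kind = attrs.get("type", "").lower()
        if kind in WANTED_INPUT_TYPES:
            keep_value = kind in ("hidden", "submit")
            fields.append((kind, attrs["name"],
                           attrs.get("value") if keep_value else None))
    return fields


if __name__ == "__main__":
    sample = """
    <input type=hidden name=a1 value=b1>
    <input type="text" name='a2' size=10>
    <input type=submit name=a3 value="b3">
    <select name=a4>
    <input type=checkbox name=a5>
    """
    for field in extract_fields(sample):
        print(field)
```

Splitting the work this way keeps each pattern short: the first only has to find tag boundaries, and the second only has to cope with the three quoting styles. The resulting name/value pairs can then be assembled into the POST data handed to CURL.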
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================