Posted by Steve on 10/15/07 21:58
"Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
news:bbudnS3Qb-29T47anZ2dnUVZ_uzinZ2d@comcast.com...
> Steve wrote:
>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>> news:u9WdnU2yhZ2Q5o7anZ2dnUVZ_trinZ2d@comcast.com...
>>> Steve wrote:
>>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>>> news:K-qdnTSkY4NaoI7anZ2dnUVZ_j6dnZ2d@comcast.com...
>>>>> Steve wrote:
>>>>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>>>>> news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@comcast.com...
>>>>>>> OK, I give up here. I am DEFINITELY not a Regex expert, and have
>>>>>>> been working on this for hours with no luck.
>>>>>>>
>>>>>>> Basically I need to parse a page for certain information which will
>>>>>>> be fed back into CURL to post to a site. I need to find four types
>>>>>>> of tags on the page:
>>>>>>>
>>>>>>> <input type=hidden name=a1 value=b1>
>>>>>>> <input type=text name=a2>
>>>>>>> <input type=submit name=a3 value=b3>
>>>>>>> <select name=a4>
>>>>>>>
>>>>>>> I don't need any other tags.
>>>>>>>
>>>>>>> From the hidden and submit types, I need name and value. From the
>>>>>>> text and select types, I just need the name.
>>>>>>>
>>>>>>> I can assume the attributes will always show up in this order, but
>>>>>>> there may be other things between the < and > delimiters.
>>>>>>> Additionally, the actual type and name may have single or double
>>>>>>> quotes around them, or neither.
>>>>>>>
>>>>>>> Does anyone have some code for this? It doesn't have to be all one
>>>>>>> regex.
>>>>>> alright, jer. let's see what we can do...
>>>>>>
>>>>>> here's an eyeballed attempt:
>>>>>>
>>>>>> <(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>
>>>>>>
>>>>>> to keep it easier, i'd think about using that to get your general
>>>>>> matches. iterating through those, i'd apply another regex to break
>>>>>> out the name, type, and value. you could very well catch it all in
>>>>>> the above, however, it's not as straightforward and hence, not easily
>>>>>> maintained. if you need additional help on writing this, let me know.
>>>>>> i'll pseudo-code the whole enchilada if you want. this should be
>>>>>> sufficient in getting only those tags you listed above...which is a
>>>>>> good start.
>>>>>>
>>>>>> btw, make the search caseINsensitive.
>>>>> Hi, Steve,
>>>>>
>>>>> Yep, it's a start. Some problems (output below), but I think it will
>>>>> get me a little farther.
>>>>>
>>>>> And you're right, I already gave up on getting everything in one pass.
>>>>> I was thinking of trying to just get everything for a single element
>>>>> type (i.e. all <input type=text ...> elements), but this gives me
>>>>> another idea, also.
>>>>>
>>>>> And the output from the first try:
>>>>>
>>>>> Array
>>>>> (
>>>>> [0] => Array
>>>>> (
>>>>> [0] => <select n
>>>>> [1] => <select n
>>>>> [2] => <select n
>>>>> )
>>>>>
>>>>> [1] => Array
>>>>> (
>>>>> [0] => select n
>>>>> [1] => select n
>>>>> [2] => select n
>>>>> )
>>>>>
>>>>> [2] => Array
>>>>> (
>>>>> [0] =>
>>>>> [1] =>
>>>>> [2] =>
>>>>> )
>>>>>
>>>>> [3] => Array
>>>>> (
>>>>> [0] =>
>>>>> [1] =>
>>>>> [2] =>
>>>>> )
>>>>>
>>>>> [4] => Array
>>>>> (
>>>>> [0] =>
>>>>> [1] =>
>>>>> [2] =>
>>>>> )
>>>>>
>>>>> )
>>>> well, that's not so good a start! i'll break out the old regex ide and
>>>> fix that...if you want.
>>> If you have the time, I would appreciate it. Otherwise I can struggle
>>> through this myself :-)
>>
>> ok, here's the one to get the select:
>>
>> (select)\s*?[^n].*?(name)\s*?=\s*?(?:\'|")?([^\3>]*)?\3?\s*?[^>]
>>
>> here's the one to break out the inputs and capture each type, name, and
>> value:
>>
>> (input)\s*?[^n].*?(?:(name|type|value)
hey...did you notice this above? the [^n] in the input regex should be [^ntv].
that may account for some of the weirdness. ;^)
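
For anyone following along, here is a minimal sketch of the two-pass idea discussed above: grab the whole tag first, then pull the attributes out of it. It is written in Python rather than PHP, and the names TAG_RE, ATTR_RE, WANTED_TYPES and extract_fields are made up for illustration; none of them come from the thread. Two things it tries to avoid: in the first attempt the alternation splits the pattern into `<(select...)` and `(input...)>`, and the lazy `.*?` at the end of the select branch matches nothing, which is why the output stopped at `<select n`; and inside a character class such as `[^\3>]` the `\3` is not treated as a backreference, so each quoting style gets its own alternative here. It assumes no `>` appears inside an attribute value.

```python
import re

# Pass 1: grab whole <input ...> and <select ...> tags, case-insensitively.
# Assumption: attribute values never contain a literal '>'.
TAG_RE = re.compile(r'<(input|select)\b([^>]*)>', re.IGNORECASE)

# Pass 2: pull attr=value pairs out of one tag. The value may be
# double-quoted, single-quoted, or bare, so each style has its own branch.
ATTR_RE = re.compile(r'''([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))''')

# Only these input types are of interest; anything else is skipped.
WANTED_TYPES = {"hidden", "text", "submit"}


def extract_fields(html):
    """Return a list of (kind, name, value) tuples for the wanted tags.

    value is None for text inputs and selects, where only the name is needed.
    """
    fields = []
    for tag, body in TAG_RE.findall(html):
        attrs = {}
        for name, dq, sq, bare in ATTR_RE.findall(body):
            # Exactly one of the three value groups is non-empty (or all
            # are empty for value=""), so 'or' picks the right one.
            attrs[name.lower()] = dq or sq or bare

        if tag.lower() == "select":
            fields.append(("select", attrs.get("name"), None))
            continue

        kind = attrs.get("type", "").lower()
        if kind not in WANTED_TYPES:
            continue
        value = attrs.get("value") if kind in ("hidden", "submit") else None
        fields.append((kind, attrs.get("name"), value))
    return fields
```

Because the attributes are collected into a dict, this does not depend on type, name and value showing up in any particular order, and other attributes between them are simply ignored. For anything messier than this, a real HTML parser is probably the safer route.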