Posted by Steve on 10/16/07 16:41
"Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
news:NZqdncA6nNuKL4nanZ2dnUVZ_h_inZ2d@comcast.com...
> Steve wrote:
>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>> news:bbudnS3Qb-29T47anZ2dnUVZ_uzinZ2d@comcast.com...
>>> Steve wrote:
>>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>>> news:u9WdnU2yhZ2Q5o7anZ2dnUVZ_trinZ2d@comcast.com...
>>>>> Steve wrote:
>>>>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>>>>> news:K-qdnTSkY4NaoI7anZ2dnUVZ_j6dnZ2d@comcast.com...
>>>>>>> Steve wrote:
>>>>>>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>>>>>>> news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@comcast.com...
>>>>>>>>> OK, I give up here. I am DEFINITELY not a Regex expert, and have
>>>>>>>>> been working on this for hours with no luck.
>>>>>>>>>
>>>>>>>>> Basically I need to parse a page for certain information which
>>>>>>>>> will be fed back into CURL to post to a site. I need to find four
>>>>>>>>> types of tags on the page:
>>>>>>>>>
>>>>>>>>> <input type=hidden name=a1 value=b1>
>>>>>>>>> <input type=text name=a2>
>>>>>>>>> <input type=submit name=a3 value=b3>
>>>>>>>>> <select name=a4>
>>>>>>>>>
>>>>>>>>> I don't need any other tags.
>>>>>>>>>
>>>>>>>>> From the hidden and submit types, I need name and value. From the
>>>>>>>>> text and select types, I just need the name.
>>>>>>>>>
>>>>>>>>> I can assume the attributes will always show up in this order, but
>>>>>>>>> there may be other things between the < and > delimiters.
>>>>>>>>> Additionally, the actual type and name may have single or double
>>>>>>>>> quotes around them, or neither.
>>>>>>>>>
>>>>>>>>> Does anyone have some code for this? It doesn't have to be all
>>>>>>>>> one regex.
>>>>>>>> alright, jer. let's see what we can do...
>>>>>>>>
>>>>>>>> here's an eyeballed attempt:
>>>>>>>>
>>>>>>>> <(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>
>>>>>>>>
>>>>>>>> to keep it easier, i'd think about using that to get your general
>>>>>>>> matches. iterating through those, i'd apply another regex to break
>>>>>>>> out the name, type, and value. you could very well catch it all in
>>>>>>>> the above; however, it's not as straightforward and hence not
>>>>>>>> easily maintained. if you need additional help on writing this, let
>>>>>>>> me know. i'll pseudo-code the whole enchilada if you want. this
>>>>>>>> should be sufficient for getting only those tags you listed
>>>>>>>> above...which is a good start.
>>>>>>>>
>>>>>>>> btw, make the search case-INsensitive.
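A minimal sketch of the first pass described above (collect the candidate tags, break each one apart later), written with Python's `re` module rather than PHP. `TAG_RE`, `TYPE_RE` and `wanted_tags` are hypothetical names, and the sketch assumes no `>` ever appears inside an attribute value.

```python
import re

# First pass (hypothetical): every <select ...> and <input ...> tag.
# re.IGNORECASE also catches <SELECT>, <Input>, TYPE=..., and so on.
TAG_RE = re.compile(r"<(select|input)\b[^>]*>", re.IGNORECASE)

# Keep only inputs whose type is hidden, text or submit, quoted or not.
# (?![\w-]) stops "text" from matching the start of a longer word.
TYPE_RE = re.compile(
    r"""(?<![\w-])type\s*=\s*["']?(hidden|text|submit)(?![\w-])""",
    re.IGNORECASE,
)


def wanted_tags(html):
    """Return the raw tags the second pass should break apart, in page order."""
    found = []
    for m in TAG_RE.finditer(html):
        tag = m.group(0)
        if m.group(1).lower() == "select" or TYPE_RE.search(tag):
            found.append(tag)
    return found
```

Closing tags such as `</select>` are skipped because the tag name must follow `<` directly. An `<input>` with no `type` attribute at all (which browsers treat as text) is also skipped here, which may or may not be what the page needs.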
>>>>>>> Hi, Steve,
>>>>>>>
>>>>>>> Yep, it's a start. Some problems (output below), but I think it
>>>>>>> will get me a little farther.
>>>>>>>
>>>>>>> And you're right, I already gave up on getting everything in one
>>>>>>> pass. I was thinking of trying to just get everything for a single
>>>>>>> element type (e.g. all <input type=text ...> elements), but this
>>>>>>> gives me another idea, also.
>>>>>>>
>>>>>>> And the output from the first try:
>>>>>>>
>>>>>>> Array
>>>>>>> (
>>>>>>>     [0] => Array
>>>>>>>         (
>>>>>>>             [0] => <select n
>>>>>>>             [1] => <select n
>>>>>>>             [2] => <select n
>>>>>>>         )
>>>>>>>
>>>>>>>     [1] => Array
>>>>>>>         (
>>>>>>>             [0] => select n
>>>>>>>             [1] => select n
>>>>>>>             [2] => select n
>>>>>>>         )
>>>>>>>
>>>>>>>     [2] => Array
>>>>>>>         (
>>>>>>>             [0] =>
>>>>>>>             [1] =>
>>>>>>>             [2] =>
>>>>>>>         )
>>>>>>>
>>>>>>>     [3] => Array
>>>>>>>         (
>>>>>>>             [0] =>
>>>>>>>             [1] =>
>>>>>>>             [2] =>
>>>>>>>         )
>>>>>>>
>>>>>>>     [4] => Array
>>>>>>>         (
>>>>>>>             [0] =>
>>>>>>>             [1] =>
>>>>>>>             [2] =>
>>>>>>>         )
>>>>>>>
>>>>>>> )
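The output above is consistent with how the first regex parses. `|` has the lowest precedence, so `<` belongs only to the select branch and `>` only to the input branch, and the lazy `.*?` after `[^>]` stops as early as it can, leaving just `<select n`. The input branch also needs a quote or whitespace right after `=` (the `('|"|\s)` group), so an unquoted `type=hidden` cannot match. A Python sketch comparing the pattern with one possible regrouping; `fixed` is an illustration, not the fix the thread settled on.

```python
import re

# The first attempt, ported verbatim. Because | binds loosest, '<' only
# applies to the select branch and '>' only to the input branch.
original = re.compile(
    r"""<(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>""",
    re.IGNORECASE,
)

# Hypothetical regrouping: a non-capturing group makes '<' and '>' apply to
# both branches, [^>]* runs to the closing bracket, and the quote (group 3)
# is optional so unquoted types match too. Group 4 is the input type.
fixed = re.compile(
    r"""<(?:(select\b[^>]*)|(input\b[^>]*?type\s*=\s*(['"]?)(hidden|text|submit)(?![\w-])\3[^>]*))>""",
    re.IGNORECASE,
)

print(original.search("<select name=a4>").group(0))  # the truncated match
print(fixed.search("<select name=a4>").group(1))     # the whole select tag body
```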
>>>>>> well, that's not so good a start! i'll break out the old regex ide and
>>>>>> fix that...if you want.
>>>>> If you have the time, I would appreciate it. Otherwise I can struggle
>>>>> through this myself :-)
>>>> ok, here's the one to get the select:
>>>>
>>>> (select)\s*?[^n].*?(name)\s*?=\s*?(?:\'|")?([^\3>]*)?\3?\s*?[^>]
>>>>
>>>> here's the one to break out the inputs and capture each type, name, and
>>>> value:
>>>>
>>>> (input)\s*?[^n].*?(?:(name|type|value)
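Two details in the select regex are worth checking. Inside a character class, `\3` is not a backreference (PCRE and Python both read it as an octal escape, the character `\x03`), and outside the class `\3` refers to group 3, the captured name, because the quote sits in the non-capturing `(?:\'|")`. The trailing `[^>]` must also consume one more character, so an unquoted name loses its last letter. A Python sketch, assuming the pattern behaves the same once ported (the `\'` escape became a plain `'`); `fixed_select` is a hypothetical repair that captures the quote in its own group so the backreference can require the matching closing quote.

```python
import re

# The select regex from the post, ported to Python.
original_select = re.compile(
    r"""(select)\s*?[^n].*?(name)\s*?=\s*?(?:'|")?([^\3>]*)?\3?\s*?[^>]""",
    re.IGNORECASE,
)

# Hypothetical repair: group 1 is the optional opening quote, group 2 the name.
# \1 demands the same quote again, and the lookahead requires whitespace or
# '>' right after it, so an unquoted name stops at the first space or '>'.
fixed_select = re.compile(
    r"""<select\b[^>]*?(?<![\w-])name\s*=\s*(['"]?)([^>]*?)\1(?=[\s>])""",
    re.IGNORECASE,
)

print(original_select.search("<select name=a4>").group(3))  # last letter lost
print(fixed_select.search("<select name=a4>").group(2))
```

The input regex above is cut off in the post, so it is not reproduced here.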
>>
>> hey...did you notice the [^n] in the input regex above? it should be [^ntv].
>>
>> that may account for some of the weirdness. ;^)
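For the second pass over the input tags, one option is to pull every `name`, `type` and `value` attribute from a tag and then apply the rules from the original question (hidden and submit give name and value, text gives just the name). A Python sketch under those assumptions; `ATTR_RE` and `input_fields` are hypothetical, and this is not a reconstruction of the truncated input regex in the post.

```python
import re

# Hypothetical second-pass pattern: each name/type/value attribute, with the
# value in double quotes (group 2), single quotes (group 3) or bare (group 4).
ATTR_RE = re.compile(
    r"""(?<![\w-])(name|type|value)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def input_fields(tag):
    """Return (type, name, value) for one <input> tag, or None if unwanted.

    hidden/submit inputs report name and value; text inputs report the name only.
    """
    attrs = {}
    for m in ATTR_RE.finditer(tag):
        key = m.group(1).lower()
        # Exactly one of groups 2-4 took part in the match.
        val = next(g for g in m.groups()[1:] if g is not None)
        attrs.setdefault(key, val)  # keep the first occurrence of each attribute
    kind = attrs.get("type", "").lower()
    if kind in ("hidden", "submit"):
        return kind, attrs.get("name"), attrs.get("value")
    if kind == "text":
        return kind, attrs.get("name"), None
    return None
```

Matching the attributes one at a time with `finditer` means their order inside the tag no longer matters, which is less brittle than a single regex that expects type, name and value in sequence.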
>
> Yep, and I got it working. Thanks again!
awesome! any time. and i'm sure i'll have plenty of questions i'll need help
answering in the future.
l8r.