Posted by Jerry Stuckle on 10/16/07 12:43
Steve wrote:
> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
> news:bbudnS3Qb-29T47anZ2dnUVZ_uzinZ2d@comcast.com...
>> Steve wrote:
>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>> news:u9WdnU2yhZ2Q5o7anZ2dnUVZ_trinZ2d@comcast.com...
>>>> Steve wrote:
>>>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>>>> news:K-qdnTSkY4NaoI7anZ2dnUVZ_j6dnZ2d@comcast.com...
>>>>>> Steve wrote:
>>>>>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>>>>>> news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@comcast.com...
>>>>>>>> OK, I give up here. I am DEFINITELY not a Regex expert, and have
>>>>>>>> been working on this for hours with no luck.
>>>>>>>>
>>>>>>>> Basically I need to parse a page for certain information which will
>>>>>>>> be fed back into CURL to post to a site. I need to find four types
>>>>>>>> of tags on the page:
>>>>>>>>
>>>>>>>> <input type=hidden name=a1 value=b1>
>>>>>>>> <input type=text name=a2>
>>>>>>>> <input type=submit name=a3 value=b3>
>>>>>>>> <select name=a4>
>>>>>>>>
>>>>>>>> I don't need any other tags.
>>>>>>>>
>>>>>>>> From the hidden and submit types, I need name and value. From the
>>>>>>>> text and select types, I just need the name.
>>>>>>>>
>>>>>>>> I can assume the attributes will always show up in this order, but
>>>>>>>> there may be other things between the < and > delimiters.
>>>>>>>> Additionally, the actual type and name may have single or double
>>>>>>>> quotes around them, or neither.
>>>>>>>>
>>>>>>>> Does anyone have some code for this? It doesn't have to be all one
>>>>>>>> regex.
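The fiddly part of the request is that an attribute value may be double-quoted, single-quoted or bare. Below is a minimal sketch in Python's `re`, which is assumed here as a stand-in for PHP's PCRE functions. The helper name `attr_value` is hypothetical. It captures the opening quote, if there is one, and requires the same quote to close the value:

```python
import re


def attr_value(tag, name):
    """Return the value of attribute `name` in one tag string, or None.

    The value may be "double-quoted", 'single-quoted' or bare.
    """
    pattern = re.compile(
        r"(?<![\w-])" + re.escape(name)
        # Group q holds the opening quote; (?P=q) demands the same quote to close.
        + r"""\s*=\s*(?:(?P<q>["'])(?P<quoted>.*?)(?P=q)|(?P<bare>[^\s"'>]+))""",
        re.IGNORECASE,
    )
    m = pattern.search(tag)
    if m is None:
        return None
    return m.group("quoted") if m.group("q") else m.group("bare")
```

For example, `attr_value("<input type=hidden name=a1 value=b1>", "value")` returns `"b1"`. The sketch assumes values contain no unbalanced quotes, so it is not a general HTML parser.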
>>>>>>> alright, jer. let's see what we can do...
>>>>>>>
>>>>>>> here's an eyeballed attempt:
>>>>>>>
>>>>>>> <(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>
>>>>>>>
>>>>>>> to keep it easier, i'd think about using that to get your general
>>>>>>> matches. iterating through those, i'd apply another regex to break
>>>>>>> out the name, type, and value. you could very well catch it all in
>>>>>>> the above; however, it's not as straightforward and hence not easily
>>>>>>> maintained. if you need additional help on writing this, let me know.
>>>>>>> i'll pseudo-code the whole enchilada if you want. this should be
>>>>>>> sufficient for getting only those tags you listed above...which is a
>>>>>>> good start.
>>>>>>>
>>>>>>> btw, make the search case-insensitive.
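Steve's two-pass suggestion is to grab the candidate tags first and then pull the attributes out of each tag. The first pass might look like the Python sketch below. The pattern is not Steve's regex. It assumes no `>` appears inside an attribute value, and `find_tags` is a hypothetical name:

```python
import re

# First pass: whole <select ...> tags, plus <input ...> tags whose type is
# hidden, text or submit (with or without quotes). Matching is case-insensitive.
# Assumes no '>' appears inside an attribute value.
TAG_RE = re.compile(
    r"""<(?:select\b[^>]*"""
    r"""|input\b[^>]*?\btype\s*=\s*["']?(?:hidden|text|submit)\b[^>]*)>""",
    re.IGNORECASE,
)


def find_tags(html):
    """Return the full text of each candidate tag, in document order."""
    # TAG_RE has no capturing groups, so findall returns whole matches.
    return TAG_RE.findall(html)
```

In the second pass, each returned tag string is handed to a separate, simpler attribute regex. Keeping the two passes apart is the maintainability point made above.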
>>>>>> Hi, Steve,
>>>>>>
>>>>>> Yep, it's a start. Some problems (output below), but I think it will
>>>>>> get me a little farther.
>>>>>>
>>>>>> And you're right, I already gave up on getting everything in one pass.
>>>>>> I was thinking of trying to just get everything for a single element
>>>>>> type (e.g. all <input type=text ...> elements), but this gives me
>>>>>> another idea, also.
>>>>>>
>>>>>> And the output from the first try:
>>>>>>
>>>>>> Array
>>>>>> (
>>>>>>     [0] => Array
>>>>>>         (
>>>>>>             [0] => <select n
>>>>>>             [1] => <select n
>>>>>>             [2] => <select n
>>>>>>         )
>>>>>>
>>>>>>     [1] => Array
>>>>>>         (
>>>>>>             [0] => select n
>>>>>>             [1] => select n
>>>>>>             [2] => select n
>>>>>>         )
>>>>>>
>>>>>>     [2] => Array
>>>>>>         (
>>>>>>             [0] =>
>>>>>>             [1] =>
>>>>>>             [2] =>
>>>>>>         )
>>>>>>
>>>>>>     [3] => Array
>>>>>>         (
>>>>>>             [0] =>
>>>>>>             [1] =>
>>>>>>             [2] =>
>>>>>>         )
>>>>>>
>>>>>>     [4] => Array
>>>>>>         (
>>>>>>             [0] =>
>>>>>>             [1] =>
>>>>>>             [2] =>
>>>>>>         )
>>>>>>
>>>>>> )
>>>>> well, that's not so good a start! i'll break out the old regex IDE and
>>>>> fix that...if you want.
>>>> If you have the time, I would appreciate it. Otherwise I can struggle
>>>> through this myself :-)
>>> ok, here's the one to get the select:
>>>
>>> (select)\s*?[^n].*?(name)\s*?=\s*?(?:\'|")?([^\3>]*)?\3?\s*?[^>]
>>>
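The select pattern above has one catch. Its quote sits in a non-capturing group `(?:\'|")`, so `\3` refers to the name group itself. In addition, inside a character class such as `[^\3>]`, a numeric escape is read as an octal character code rather than as a backreference. Both PCRE and Python's `re` behave this way. The sketch below captures the quote in its own group and back-references it outside any class. The helper `select_names` is hypothetical, and the sketch is written in Python as a stand-in for PHP:

```python
import re

# Group 1 is the opening quote (if any); \1 is used OUTSIDE a character class,
# where it is a real backreference. Group 2 is a quoted name, group 3 a bare one.
SELECT_NAME_RE = re.compile(
    r"""<select\b[^>]*?\bname\s*=\s*(?:(["'])(.*?)\1|([^\s"'>]+))""",
    re.IGNORECASE,
)


def select_names(html):
    """Return the name attribute of every <select> tag, quotes stripped."""
    # findall yields (quote, quoted_name, bare_name); unmatched groups are "".
    return [quoted if quote else bare
            for quote, quoted, bare in SELECT_NAME_RE.findall(html)]
```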
>>> here's the one to break out the inputs and capture each type, name, and
>>> value:
>>>
>>> (input)\s*?[^n].*?(?:(name|type|value)
>
> hey...did you notice this above? it should be [^ntv]
>
> that may account for some of the weirdness. ;^)
>
>
>
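Steve's `[^ntv]` fix keeps the single-pattern alternation over `(name|type|value)`. A different approach, not the one used in the thread, is to match each `<input ...>` tag and then loop over its `type`/`name`/`value` attributes with a repeated match. Below is a minimal Python sketch that assumes no `>` appears inside attribute values. The name `parse_inputs` is hypothetical:

```python
import re

# One <input ...> tag at a time (assumes no '>' inside attribute values).
INPUT_RE = re.compile(r"<input\b[^>]*>", re.IGNORECASE)

# One type/name/value attribute at a time; value may be quoted or bare.
# Group 2 is the opening quote, and \2 requires the same quote to close.
ATTR_RE = re.compile(
    r"""(?<![\w-])(type|name|value)\s*=\s*(?:(["'])(.*?)\2|([^\s"'>]+))""",
    re.IGNORECASE,
)


def parse_inputs(html):
    """Return (type, name, value) for hidden, text and submit inputs.

    Text inputs get value None, since only their name is needed.
    """
    fields = []
    for tag in INPUT_RE.findall(html):
        attrs = {}
        for key, quote, quoted, bare in ATTR_RE.findall(tag):
            # Keep the first occurrence of each attribute.
            attrs.setdefault(key.lower(), quoted if quote else bare)
        kind = attrs.get("type", "").lower()
        if kind in ("hidden", "submit"):
            fields.append((kind, attrs.get("name"), attrs.get("value")))
        elif kind == "text":
            fields.append((kind, attrs.get("name"), None))
    return fields
```

Because the attributes are looked up by key, this version does not depend on the attributes appearing in a fixed order.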
Yep, and I got it working. Thanks again!
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================