Posted by Jerry Stuckle on 10/15/07 20:47
Steve wrote:
> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
> news:u9WdnU2yhZ2Q5o7anZ2dnUVZ_trinZ2d@comcast.com...
>> Steve wrote:
>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>> news:K-qdnTSkY4NaoI7anZ2dnUVZ_j6dnZ2d@comcast.com...
>>>> Steve wrote:
>>>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>>>> news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@comcast.com...
>>>>>> OK, I give up here. I am DEFINITELY not a Regex expert, and have been
>>>>>> working on this for hours with no luck.
>>>>>>
>>>>>> Basically I need to parse a page for certain information which will be
>>>>>> fed back into CURL to post to a site. I need to find four types of
>>>>>> tags on the page:
>>>>>>
>>>>>> <input type=hidden name=a1 value=b1>
>>>>>> <input type=text name=a2>
>>>>>> <input type=submit name=a3 value=b3>
>>>>>> <select name=a4>
>>>>>>
>>>>>> I don't need any other tags.
>>>>>>
>>>>>> From the hidden and submit types, I need name and value. From the
>>>>>> text and select types, I just need the name.
>>>>>>
>>>>>> I can assume the attributes will always show up in this order, but
>>>>>> there may be other things between the < and > delimiters.
>>>>>> Additionally, the actual type and name may have single or double
>>>>>> quotes around them, or neither.
>>>>>>
>>>>>> Does anyone have some code for this? It doesn't have to be all one
>>>>>> regex.
>>>>> alright, jer. let's see what we can do...
>>>>>
>>>>> here's an eyeballed attempt:
>>>>>
>>>>> <(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>
>>>>>
>>>>> to keep it easier, i'd think about using that to get your general
>>>>> matches. iterating through those, i'd apply another regex to break out
>>>>> the name, type, and value. you could very well catch it all in the
>>>>> above, however, it's not as straightforward and hence, not easily
>>>>> maintained. if you need additional help on writing this, let me know.
>>>>> i'll pseudo-code the whole enchilada if you want. this should be
>>>>> sufficient in getting only those tags you listed above...which is a
>>>>> good start.
>>>>>
>>>>> btw, make the search caseINsensitive.
>>>> Hi, Steve,
>>>>
>>>> Yep, it's a start. Some problems (output below), but I think it will
>>>> get me a little farther.
>>>>
>>>> And you're right, I already gave up on getting everything in one pass. I
>>>> was thinking of trying to just get everything for a single element type
>>>> (i.e. all <input type=text ...> elements), but this gives me another
>>>> idea, also.
>>>>
>>>> And the output from the first try:
>>>>
>>>> Array
>>>> (
>>>>     [0] => Array
>>>>         (
>>>>             [0] => <select n
>>>>             [1] => <select n
>>>>             [2] => <select n
>>>>         )
>>>>
>>>>     [1] => Array
>>>>         (
>>>>             [0] => select n
>>>>             [1] => select n
>>>>             [2] => select n
>>>>         )
>>>>
>>>>     [2] => Array
>>>>         (
>>>>             [0] =>
>>>>             [1] =>
>>>>             [2] =>
>>>>         )
>>>>
>>>>     [3] => Array
>>>>         (
>>>>             [0] =>
>>>>             [1] =>
>>>>             [2] =>
>>>>         )
>>>>
>>>>     [4] => Array
>>>>         (
>>>>             [0] =>
>>>>             [1] =>
>>>>             [2] =>
>>>>         )
>>>>
>>>> )
>>> well, that's not so good a start! i'll break out the old regex ide and fix
>>> that...if you want.
>> If you have the time, I would appreciate it. Otherwise I can struggle
>> through this myself :-)
>
> ok, here's the one to get the select:
>
> (select)\s*?[^n].*?(name)\s*?=\s*?(?:\'|")?([^\3>]*)?\3?\s*?[^>]
>
> here's the one to break out the inputs and capture each type, name, and
> value:
>
> (input)\s*?[^n].*?(?:(name|type|value)\s*?=\s*?(?:'|")?([^\2>]*?)\2?(?:\s)?)*?>
>
> the problem with this one though, is that it debugs fine in 'the regulator'
> regex ide. however, some of the captures are being overwritten under
> preg_match_all.
>
> the implementation would have been an array of these two patterns. preg
> should return the type (select or input)...from that point, you'd know where
> in the matches to find the type, name, and value regardless of the order in
> which it came. as it is, you can use $matches[0][...n] on the input pattern
> matches to iterate the full input match.
>
> hope that helps.
>
>
>
Thanks much, Steve! I think I can make it from here.
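
For anyone finding this thread later, here is a rough sketch of the two-pass idea discussed above: first grab each whole <input ...> or <select ...> tag, then pull the individual attributes out of that tag's text. It is not Steve's pattern. It swaps in a simpler "[^>]*" tag match plus one small regex per attribute, so attribute order stops mattering and the captures no longer overwrite each other. The function names attr_value() and form_fields() are made up for illustration, and the sketch assumes no attribute value contains a literal ">".

```php
<?php
// Hypothetical helper: return one attribute's value from a single tag,
// or null if the attribute is absent. Accepts name="x", name='x', name=x.
function attr_value(string $tag, string $attr): ?string
{
    // (?|...) is a branch-reset group: whichever quoting style matches,
    // the value always lands in capture group 1.
    $pattern = '/(?<![\w-])' . preg_quote($attr, '/')
             . '\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i';
    return preg_match($pattern, $tag, $m) ? $m[1] : null;
}

// Hypothetical helper: list the hidden/text/submit inputs and the selects.
function form_fields(string $html): array
{
    $fields = [];
    // Pass 1: every complete <input ...> or <select ...> tag, case-insensitive.
    preg_match_all('/<(input|select)\b[^>]*>/i', $html, $tags, PREG_SET_ORDER);

    foreach ($tags as $t) {
        $tag     = $t[0];
        $element = strtolower($t[1]);

        // Pass 2: break the attributes out of this one tag.
        $name = attr_value($tag, 'name');
        if ($name === null) {
            continue; // nothing to post back without a name
        }
        if ($element === 'select') {
            $fields[] = ['type' => 'select', 'name' => $name];
            continue;
        }
        $type = strtolower(attr_value($tag, 'type') ?? '');
        if ($type === 'text') {
            $fields[] = ['type' => $type, 'name' => $name];
        } elseif ($type === 'hidden' || $type === 'submit') {
            $fields[] = ['type' => $type, 'name' => $name,
                         'value' => attr_value($tag, 'value') ?? ''];
        }
        // any other input type (checkbox, radio, ...) is ignored
    }
    return $fields;
}
```

Calling print_r(form_fields($page)) on the fetched page gives one small array per field, which can then be turned into the POST data for CURL. For markup messier than this, PHP's DOM extension is probably a safer bet than any regex.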
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================