Posted by Jerry Stuckle on 10/15/07 10:15
Steve wrote:
> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
> news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@comcast.com...
>> OK, I give up here. I am DEFINITELY not a Regex expert, and have been
>> working on this for hours with no luck.
>>
>> Basically I need to parse a page for certain information which will be fed
>> back into CURL to post to a site. I need to find four types of tags on
>> the page:
>>
>> <input type=hidden name=a1 value=b1>
>> <input type=text name=a2>
>> <input type=submit name=a3 value=b3>
>> <select name=a4>
>>
>> I don't need any other tags.
>>
>> From the hidden and submit types, I need name and value. From the text
>> and select types, I just need the name.
>>
>> I can assume the attributes will always show up in this order, but there
>> may be other things between the < and > delimiters. Additionally, the
>> actual type and name may have single or double quotes around them, or
>> neither.
>>
>> Does anyone have some code for this? It doesn't have to be all one regex.
>
> alright, jer. let's see what we can do...
>
> here's an eyeballed attempt:
>
> <(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>
>
> to keep it easier, i'd think about using that to get your general matches.
> iterating through those, i'd apply another regex to break out the name,
> type, and value. you could very well catch it all in the above; however,
> it's not as straightforward and hence not easily maintained. if you need
> additional help on writing this, let me know. i'll pseudo-code the whole
> enchilada if you want. this should be sufficient for getting only those tags
> you listed above...which is a good start.
>
> btw, make the search case-INsensitive.

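Steve's two-pass suggestion can be sketched roughly as follows. This is a sketch in Python's re module rather than PHP's preg_* functions, and the names TAG_RE, ATTR_RE, parse_attrs and extract_form_fields are hypothetical. The first pattern grabs each whole <input ...> or <select ...> tag case-insensitively. The second pulls out attribute=value pairs whether the value is double-quoted, single-quoted or bare. It assumes no quoted attribute value contains a literal '>'.

```python
import re

# Pass 1: each whole <input ...> or <select ...> tag, case-insensitively.
# Assumption: no attribute value contains a literal '>'.
TAG_RE = re.compile(r"<(input|select)\b[^>]*>", re.IGNORECASE)

# Pass 2: attribute=value pairs inside one tag. The value may be
# double-quoted, single-quoted, or bare (bare runs to whitespace or '>').
ATTR_RE = re.compile(
    r"""([a-z][\w-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""",
    re.IGNORECASE,
)


def parse_attrs(tag):
    """Map lower-cased attribute names to values; the first occurrence wins."""
    attrs = {}
    for m in ATTR_RE.finditer(tag):
        # Exactly one of groups 2-4 took part in the match.
        value = next(g for g in m.group(2, 3, 4) if g is not None)
        attrs.setdefault(m.group(1).lower(), value)
    return attrs


def extract_form_fields(html):
    """Return (kind, name, value) tuples; value is None for text and select."""
    fields = []
    for m in TAG_RE.finditer(html):
        attrs = parse_attrs(m.group(0))
        if "name" not in attrs:
            continue
        if m.group(1).lower() == "select":
            fields.append(("select", attrs["name"], None))
            continue
        kind = attrs.get("type", "").lower()
        if kind in ("hidden", "submit"):
            fields.append((kind, attrs["name"], attrs.get("value", "")))
        elif kind == "text":
            fields.append((kind, attrs["name"], None))
        # any other input type (password, checkbox, ...) is ignored
    return fields


page = """<input type=hidden name=a1 value=b1>
<input type="text" name='a2'>
<INPUT TYPE='submit' NAME="a3" VALUE="b3">
<select name=a4>"""

print(extract_form_fields(page))
# [('hidden', 'a1', 'b1'), ('text', 'a2', None), ('submit', 'a3', 'b3'), ('select', 'a4', None)]
```

Keeping the attribute parsing in its own small pattern is what makes the first pattern simple; the order of attributes inside a tag then no longer matters.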
Hi, Steve,

Yep, it's a start. There are some problems (output below), but I think it
will get me a little farther.

And you're right, I already gave up on getting everything in one pass.
I was thinking of trying to get everything for a single element type
(e.g. all <input type=text ...> elements), but this gives me another
idea as well.

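The one-pattern-per-element-type idea might look like the sketch below (again Python's re standing in for PHP, with the hypothetical helpers names_for_input_type and select_names). The optional opening quote is captured and then required again after the value through a backreference, so type=text, type="text" and type='text' all match. Like the question, it assumes type comes before name inside the tag.

```python
import re


def names_for_input_type(html, input_type):
    """Return the name of every <input> whose type attribute equals input_type.

    Assumes type= appears before name= inside the tag, as in the question.
    """
    pattern = re.compile(
        r"<input\b[^>]*?\btype\s*=\s*([\"']?)"  # group 1: optional opening quote
        + re.escape(input_type)
        + r"(?![\w-])\1"                         # whole word, then the same quote (or none)
        + r"[^>]*?\bname\s*=\s*([\"']?)"         # group 2: optional quote around the name
        + r"([^\"'\s>]+)\2",                     # group 3: the name itself
        re.IGNORECASE,
    )
    return [m.group(3) for m in pattern.finditer(html)]


def select_names(html):
    """Return the name of every <select> element."""
    pattern = re.compile(r"<select\b[^>]*?\bname\s*=\s*([\"']?)([^\"'\s>]+)\1",
                         re.IGNORECASE)
    return [m.group(2) for m in pattern.finditer(html)]


page = """<input type=hidden name=a1 value=b1>
<input type="text" name='a2'>
<input type='text' id=x name="a5" size=20>
<select name=a4>"""

print(names_for_input_type(page, "text"))    # ['a2', 'a5']
print(names_for_input_type(page, "hidden"))  # ['a1']
print(select_names(page))                    # ['a4']
```

The (?![\w-]) lookahead keeps type=text from also matching type=textarea.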
And the output from the first try:

Array
(
    [0] => Array
        (
            [0] => <select n
            [1] => <select n
            [2] => <select n
        )

    [1] => Array
        (
            [0] => select n
            [1] => select n
            [2] => select n
        )

    [2] => Array
        (
            [0] =>
            [1] =>
            [2] =>
        )

    [3] => Array
        (
            [0] =>
            [1] =>
            [2] =>
        )

    [4] => Array
        (
            [0] =>
            [1] =>
            [2] =>
        )

)
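A likely reading of that output: alternation (|) has the lowest precedence in a regex, so the leading < belongs only to the select branch and the trailing > only to the input branch. In the select branch nothing follows the lazy .*?, so it matches zero characters and the match stops as soon as [^>] has taken one character, which gives "<select n". With preg_match_all's default PREG_PATTERN_ORDER, groups 2 to 4 belong to the other branch and come back as empty strings. Also, ('|"|\s) demands a quote or whitespace character right before the type value, so an unquoted type=hidden can never match the input branch; that may be why no <input> tags show up at all. The sketch below (Python's re standing in for PHP's preg_* functions; BROKEN and FIXED are hypothetical names) reproduces the behaviour and tries one possible fix: wrap the alternation in a non-capturing (?:...|...) so both branches share < and >, use [^>]* instead of a trailing .*?, and make the quote optional with (["']?). The group numbers stay the same, so \3 still refers to the quote.

```python
import re

# Steve's first attempt, unchanged (Python's re accepts the same syntax).
BROKEN = re.compile(
    r"""<(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>""",
    re.IGNORECASE,
)

# One possible fix: both branches inside (?:...) so they share the < and >,
# [^>]* instead of a trailing lazy .*?, and an optional quote captured in group 3.
FIXED = re.compile(
    r"""<(?:(select\b[^>]*)|(input\b[^>]*?\btype\s*=\s*(["']?)(hidden|text|submit)(?![\w-])\3[^>]*))>""",
    re.IGNORECASE,
)

page = """<input type="hidden" name=a1 value=b1>
<input type=text name=a2>
<select name=a4>"""

print([m.group(0) for m in BROKEN.finditer(page)])
# ['input type="hidden" name=a1 value=b1>', '<select n']
print([m.group(0) for m in FIXED.finditer(page)])
# ['<input type="hidden" name=a1 value=b1>', '<input type=text name=a2>', '<select name=a4>']
```

In PHP the same pattern would go between delimiters with the i modifier for preg_match_all. The fixed version still assumes no attribute value contains a literal >.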
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================