Posted by Robin Vickery on 10/25/05 15:23
On 10/24/05, Manuel Lemos <mlemos@acm.org> wrote:
> on 10/23/2005 07:21 PM Robin Vickery said the following:
> >>
> >>> ... would it not make sense for there to be a BUILT-IN PHP function of
> >>> a TRUE email syntactic validation?
> >> I don't see that being much better than passing a good regular
> >> expression to preg_match.
> >
> > 1. Technically you can't write a regular expression that matches
> > *all* valid email addresses as part of the address specification is
> > recursive.
> >
> > ccontent = ctext / quoted-pair / comment
> > comment = "(" *([FWS] ccontent) [FWS] ")"
> >
> > Admittedly 99.99% of people don't even know you *can* comment email
> > addresses so it's not a huge problem...
>
> If I am not mistaken, PCRE supports recursive regular expressions.
I'm afraid not. You can hack recursion in Perl with the (??{ })
postponed expression construct. But PCRE doesn't support it.
Without recursion, the best you can do is decide on a reasonable depth
of nested comments and hardcode that.
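
As a rough illustration of the hardcoded-depth approach, here is a sketch in Python, whose `re` module has no recursive patterns. The function name `comment_pattern` is made up for this example, and the grammar is simplified: folding whitespace is treated as ordinary comment text, and any character other than a parenthesis or backslash counts as ctext.

```python
import re


def comment_pattern(depth: int) -> str:
    """Build regex source for a comment nested at most `depth` levels deep.

    Simplified from the RFC 2822 grammar quoted above (assumption:
    whitespace is treated as ordinary comment text).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # ctext or quoted-pair: any char except parens/backslash, or backslash + char
    atom = r"[^()\\]|\\."
    # Depth 1: a comment containing no nested comments.
    pattern = r"\((?:" + atom + r")*\)"
    # Each extra level wraps the previous pattern as one more alternative.
    for _ in range(depth - 1):
        pattern = r"\((?:" + atom + "|" + pattern + r")*\)"
    return pattern


# Three levels is an arbitrary choice of "reasonable depth".
COMMENT = re.compile(comment_pattern(3))


def is_comment(text: str) -> bool:
    """True if `text` is a single comment within the supported depth."""
    return COMMENT.fullmatch(text) is not None
```

With a depth of 3, `is_comment("(a (b (c)))")` succeeds, while a fourth level of nesting is rejected even though the grammar allows it. That is the trade-off of hardcoding the depth.
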
> Anyway, the way I got the RFC that is not quite the form of an address
> but the way it may be presented in message header. Meaning, you can add
> comments in To: or other e-mail header but in reality the comments are
> not part of the address.
I'm not sure exactly what you mean here. It's true that comments don't
affect how mail gets delivered, but they're very definitely part of
the address and may well have a meaning to the recipient that you
can't predict. They could be using it for anti-spam or to distinguish
between users of a mailbox or... well, anything really.
That is why RFC 2821 recommends that they be passed to the recipient
unchanged.
> > 2. Very few people seem to be capable of recognising a *good* regular
> > expression, let alone writing one. It seems clear that validating an
> > address is a task that many people want to do, but few can do
> > properly. I'd say that's a good reason for making it a built-in
> > function.
>
> Yes. What I meant is that just copying a good enough regular expression
> would be sufficient to use it. There is no need to understand it.
I had a quick look through my email last night and found 14 different
email validation regular expressions posted to this list in the last
few months. All of them would falsely reject valid addresses even
without taking comments into account. 6 of them wouldn't even allow
"judy.o'grady@example.com" and another 3 would reject mail from the
entire .museum TLD.
What that indicates to me is that many people can't even recognise
what a "good enough" regular expression looks like.
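
To make the two failure cases concrete, here is a small Python sketch. `NAIVE` is a hypothetical pattern written for this example in the style often seen, not one of the expressions actually posted to the list. `DOT_ATOM` is a looser alternative built from the RFC 2822 `atext` character set. It is still a simplification: it ignores quoted strings, comments and domain literals.

```python
import re

# Hypothetical "typical" validator: narrow local part, TLD capped at 4 letters.
NAIVE = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}")

# atext characters from RFC 2822 (letters, digits and these symbols).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# A domain label: alphanumeric at both ends, hyphens allowed inside.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# dot-atom local part, then at least two dot-separated labels, no TLD length cap.
DOT_ATOM = re.compile(
    ATEXT + r"+(?:\." + ATEXT + r"+)*@" + LABEL + r"(?:\." + LABEL + r")+"
)


def accepts(pattern: re.Pattern, address: str) -> bool:
    """True if the whole address matches the pattern."""
    return pattern.fullmatch(address) is not None


if __name__ == "__main__":
    for addr in ("judy.o'grady@example.com", "curator@example.museum"):
        print(addr, accepts(NAIVE, addr), accepts(DOT_ATOM, addr))
```

The naive pattern rejects both addresses: the apostrophe is missing from its local-part character class, and `.museum` is longer than its four-letter TLD limit. The looser pattern accepts both.
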
> What I meant is that despite I use that regular expression for many
> years without complaints, it could be improved to reject only invalid
> characters, but of course that is not what that expression does.
Possibly because those whose email addresses it rejected couldn't
contact you to complain? :-)
Actually, I have very little problem with your regexp - I'd like it to
handle domain literals, as they can be useful in communicating with
people with broken DNS. But that's about it.
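
For readers unfamiliar with domain literals, they look like `user@[192.0.2.1]`. The sketch below checks the bracketed part with Python's standard `ipaddress` module. The function name `is_address_literal` is invented for this example, and the `IPv6:` tag handling follows the address-literal form in RFC 2821; other tagged literal forms are not handled.

```python
import ipaddress


def is_address_literal(domain: str) -> bool:
    """True if `domain` is a bracketed IPv4 or tagged IPv6 address literal."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return False
    inner = domain[1:-1]
    try:
        if inner[:5].lower() == "ipv6:":
            # IPv6 literals carry an "IPv6:" tag inside the brackets.
            ipaddress.IPv6Address(inner[5:])
        else:
            ipaddress.IPv4Address(inner)
    except ValueError:
        # Raised by ipaddress for anything that is not a valid address.
        return False
    return True
```

A validator could try this check on the part after the `@` before falling back to an ordinary hostname pattern.
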
-robin