Posted by Steve on 01/25/07 04:59
"Taras_96" <taras.di@gmail.com> wrote in message
news:1169698495.332218.308200@v45g2000cwv.googlegroups.com...
| >
| > | This to me implies that the function need not know what the bytes
| > | represent, it operates on the data as a raw byte stream.
| >
| > that's correct.
| >
|
| Thus, by this definition, wouldn't strpos NOT be binary safe, since it
| needs to know something about what is represented by the raw byte
| stream? In particular, that the byte 0x00 represents the end of a
| string.
not really. strpos does not rely on any character encoding to interpret the
bytes it is given. look at strcoll...it does (the result depends on the
locale), and because the data needs interpreting it is not considered 'safe'.
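a minimal sketch of that point, assuming a stock php build (the variable names are only illustrative). php strings carry their own length, so a 0x00 byte in the middle of one does not end it, and strpos keeps searching past it:

```php
<?php
// double quotes so that \0 is turned into a real NUL byte
$haystack = "ab\0cd";

// strlen counts every byte, including the NUL
$length = strlen($haystack);        // 5

// the search carries on past the NUL byte and finds "cd" at offset 3
$pos = strpos($haystack, "cd");     // 3

// the NUL byte can itself be searched for, like any other byte
$nulPos = strpos($haystack, "\0");  // 2

var_dump($length, $pos, $nulPos);
```

nothing here depends on what the bytes are supposed to mean, which is the sense of 'binary safe' being discussed.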
| Going back to my example, say we pass in strpos('a','cat') with the
| strings encoded in UCS-2.
| So, in terms of bytes, strpos would be passed in 0x00 0x16 as the first
| parameter. Because the function imposes some meaning on specific bytes,
| in particular 0x00, the function would conclude that the first
| parameter was an empty string. Strpos can't blindly operate on the
| bytes it receives, it must interpret them to find the end of strings.
no...'00 16' (the letter 'a' in ucs-2) would be seen as ascii character 48
followed by another 48, then the ascii character for a space (32), then the
ascii character for 1 (49), etc. those are the literal string contents of
'00 16'. if you searched that literal string for 'a', you would find nothing.
if you converted the value '00 16' from ucs-2 first, then you'd have the
letter 'a'...and completely different search results. as for blindly
searching for \0, that's just not what is happening.
what is it that you're trying to do? perhaps i can give an example that will
work and clear up your questions at the same time.
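here is a rough sketch of the difference between the literal text and the raw bytes. one assumption to flag: the code uses 0x00 0x61 for the raw bytes, since 'a' is U+0061 (the thread writes 16 for that byte), and it assumes big-endian ucs-2. variable names are made up for the example.

```php
<?php
// the literal text '00 16' is five ordinary characters
$literal = '00 16';
$codes = array_map('ord', str_split($literal)); // 48, 48, 32, 49, 54

// searching that literal text for 'a' finds nothing
$inLiteral = strpos($literal, 'a');             // false

// the raw big-endian ucs-2 bytes for 'a' (U+0061)
$ucs2 = "\x00\x61";

// the leading 0x00 byte does not make this an empty string
$ucs2Length = strlen($ucs2);                    // 2

// a byte-wise search finds the byte 0x61 at offset 1
$inBytes = strpos($ucs2, 'a');                  // 1

var_dump($codes, $inLiteral, $ucs2Length, $inBytes);
```

the match at offset 1 is a match on a byte, not on a ucs-2 character. strpos has no idea the data is ucs-2.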
| Compare this with, say, an array_join function, where the two parameters
| just need to be melded together - no interpretation of the input byte
| sequences is needed whatsoever, you just need to join the two
| together! This to me seems a more correct view of operating on the byte
| stream.
yes...if that were what php were doing behind the scenes with strings.
first, there is no byte 'stream'. there is an array in memory in both cases.
concatenating a string is done the same way as joining any other array.
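as a small illustration (again assuming big-endian ucs-2 bytes and made-up variable names), concatenation just places one run of bytes after the other, whatever they encode:

```php
<?php
$a = "\x00\x61"; // 'a' as big-endian ucs-2 bytes
$b = "\x00\x62"; // 'b' as big-endian ucs-2 bytes

// the . operator and implode both copy the bytes as they are
$joined = $a . $b;
$viaImplode = implode('', array($a, $b));

$joinedLength = strlen($joined); // 4
$joinedHex = bin2hex($joined);   // "00610062"

var_dump($joinedLength, $joinedHex, $joined === $viaImplode);
```

the embedded 0x00 bytes survive the join untouched.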
| > strpos would recognize '00' as two characters of a string...not as one
| > individual byte equal to \0. this is where your en/decoding comes into
| > play.
|
| I would have thought if you passed in the byte sequence '0x001600' (the
| null terminated string 'a' encoded in UCS-2) strpos would *not*
| recognise the first byte as the characters 00 - this would be encoded
| as (well, in ASCII anyway) '0x3030'.
your first example and this one, as far as strings go, are completely
different. both, however, are interpreted one character at a time. in
this case: a 0 followed by an x, two more zeros, a 1, a 6, then two more
zeros. the string has no particular meaning. php does not know that it is an
encoded representation of data (such as ucs-2). you could likewise
represent 0x001600 in octal and php would be equally unaware of the
string's meaning.
this is why you must somehow tell php that a string is to be interpreted a
certain way...so that the value then becomes (or is seen as) the
letter 'a'. make sense?
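one hedged way to picture 'telling php' is with the core pack and unpack functions. this is only a sketch: it assumes big-endian ucs-2, it only decodes code units in the ascii range, and it uses 0x00 0x61 (the real code unit for 'a') for the decoding step. a real conversion would normally go through an encoding library instead.

```php
<?php
// the text '0x001600' is eight ordinary characters
$text = '0x001600';
$textLength = strlen($text);       // 8

// pack('H*', ...) turns hex digits into the bytes they describe
$bytes = pack('H*', '001600');
$byteLength = strlen($bytes);      // 3

// unpack('n*', ...) reads the bytes as unsigned 16-bit big-endian units;
// the resulting array is indexed from 1
$units = unpack('n*', "\x00\x61"); // array(1 => 97)

// only safe for units below 128 (plain ascii)
$char = chr($units[1]);            // 'a'

var_dump($textLength, $byteLength, $units, $char);
```

until that explicit step happens, the string is just characters (or bytes) with no meaning attached.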
| > | This post:
| > | http://groups.google.com/group/php.general/browse_thread/thread/c401d...
| > | offers a different definition, which doesn't make much sense to me.
| >
| > not different at all. it doesn't seem that either source has made much
| > sense to you (not trying to be rude).
| >
|
| No offense taken - if they made much sense I wouldn't be posting
i know, but given this medium of communication and my rather abrupt style of
q&a, i often misrepresent my intentions. ;^)