Posted by Vladimir Ghetau on 01/19/08 08:18
Hi guys,
This is a weird problem, and I'm not sure I've understood it correctly.
Here is a practical example that describes my problem:
I'm connecting to the google.com host on port 80 using fsockopen(), and
I send a plain GET request without any extra HTTP headers (nothing about
accepted encodings, cookies, accepted charsets, conditional requests,
etc.).
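For reference, a bare request of the kind described might be built as in the sketch below. It is written in Python purely for illustration (the post uses PHP's fsockopen), and the helper name `build_get_request`, the default path `/` and the `Connection: close` header are assumptions, not details from the post. An HTTP/1.1 request line requires a `Host` header. A server must not answer an HTTP/1.0 request with a transfer coding such as chunked, so the version string matters for what comes back.

```python
def build_get_request(host: str, path: str = "/", version: str = "1.1") -> bytes:
    """Build a bare GET request with no optional headers (hypothetical helper)."""
    lines = [f"GET {path} HTTP/{version}"]
    # HTTP/1.1 requires a Host header; sending it with 1.0 is harmless.
    lines.append(f"Host: {host}")
    # Assumption: ask the server to close the connection after the response,
    # so that reading until EOF marks the end of the message.
    lines.append("Connection: close")
    # Every header line ends in CRLF, and an empty line ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

The resulting bytes would be written to the socket as-is, e.g. `sock.sendall(build_get_request("google.com"))` on a connection to port 80.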
After sending the request to the stream opened with fsockopen, I start
reading the response headers, and then the body of the web page
arrives. Everything seems logical up to this point.
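Reading the response as described amounts to splitting the raw bytes at the first blank line (CRLF CRLF): everything before it is the status line plus headers, and everything after it is the body exactly as it came off the wire. A minimal Python sketch, assuming the whole response has already been read into memory; `split_response` is a hypothetical name, and repeated headers simply overwrite earlier ones.

```python
from __future__ import annotations


def split_response(raw: bytes) -> tuple[dict[str, str], bytes]:
    """Split a raw HTTP response into a header dict and the undecoded body."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line found between headers and body")
    # Header bytes are treated as ISO-8859-1 so that decoding never fails.
    lines = head.decode("iso-8859-1").split("\r\n")
    headers: dict[str, str] = {}
    for line in lines[1:]:  # lines[0] is the status line, e.g. "HTTP/1.1 200 OK"
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return headers, body
```

Because the returned body is untouched, any framing the server applied to it (such as a transfer coding) is still present and has to be handled separately, based on the headers.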
The problem is that right after the headers, the body of the page starts
with a few odd alphanumeric characters, about 4 in length, which look
like a hexadecimal value, e.g. 2A, or maybe two values: 8c9d... Then
comes the regular HTML code of the page, if any.
At the end of the grabbed content there is also one of these
alphanumeric groups, or a "0" (zero).
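The pattern described (a short hex number before the HTML and a lone "0" at the end) matches HTTP/1.1 chunked transfer encoding, which a server signals with a `Transfer-Encoding: chunked` response header. Each chunk is a line holding the chunk size in hexadecimal, terminated by CRLF, followed by that many bytes of data and another CRLF; a chunk of size 0 ends the body. On this reading, 2A would be a 42-byte chunk, and the numbers are byte counts rather than charset markers (the character encoding is normally declared in the `Content-Type` header, e.g. `charset=UTF-8`). Whether Google's reply actually used chunking is an assumption here, since the post doesn't show the headers. A minimal decoder sketch in Python; `decode_chunked` is a hypothetical name, chunk extensions after `;` are dropped, and trailer fields are ignored.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body into the original payload."""
    out = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF.
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("truncated chunk-size line")
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)  # raises ValueError if not hex
        pos = line_end + 2
        if size == 0:
            # Last chunk: optional trailer fields and a final blank line
            # may follow; this sketch ignores them.
            break
        chunk = body[pos:pos + size]
        if len(chunk) != size or body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk data")
        out += chunk
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)
```

The decoder should only be applied when the response headers actually say `Transfer-Encoding: chunked`; otherwise the body is already the page. Sending the request as HTTP/1.0 instead should avoid chunked replies altogether, since a server must not use a transfer coding in a response to an HTTP/1.0 request.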
For some reason, I tend to believe that the characters right after the
headers are used by browsers to identify the encoding of the stream,
e.g. bytes that signal that my page is coming as UTF-8. Is that the
case?
Anyway, my questions are: how do I make sure I get the page right, and
why doesn't file_get_contents(url_goes_here) return those alphanumeric
characters, given that it already strips the response headers?
I still suspect it's some sort of "first byte of the stream" that tells
the application the encoding of the content, but I'd like to hear your
input and any solution you can suggest.
Thank you,
Vladimir Ghetau
http://www.Vladimirated.com/