Posted by Bent Stigsen on 03/14/06 04:21
greg wrote:
>>AFAIK it depends on what kind of file it is. Not sure, but ascii are
>>txt, csv, html etc, binary are images, mp3's etc.
>>Correct me if i'm wrong.
>>
>
> surely, but this means I must think of all the possible file extension
> decide whether it's ascii or binary.
> it seems to be limited, but thx anyway.
In a sense he is right: it is not really straightforward to make the
distinction if you strictly mean the ASCII character set.
Binary just means that the file consists of binary patterns, or sequences
of bits, varied in length and meaning. The content of a binary file only
makes sense to an application which knows what the sequence of bits
means. When a file is viewed in a text editor, the data is
(possibly mistakenly) chopped up into 8-bit chunks (or whatever), and the
symbol corresponding to each value is displayed, which may or may not
make any sense at all. Strictly speaking, the only difference between
ASCII and non-ASCII would be whether or not each chunk of bits is
*intended* to correspond to a specific symbol in the ASCII character
table.
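To make the point above concrete, here is a minimal sketch in Python (chosen only for brevity; the same logic ports directly to PHP). The helper names is_ascii and view_as_text are made up for this illustration and do not come from any library. The sketch shows a strict 7-bit ASCII check and what a text editor effectively does when it maps every byte to a symbol.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte falls in the 7-bit ASCII range (0-127)."""
    return all(byte < 0x80 for byte in data)


def view_as_text(data: bytes) -> str:
    """Mimic a naive text editor: map each 8-bit chunk to one symbol.

    Latin-1 is used here because it assigns a code point to all 256
    byte values, so decoding never fails, whether or not the result
    makes any sense.
    """
    return data.decode("latin-1")


if __name__ == "__main__":
    csv_bytes = b"name,price\n"
    # First eight bytes of a PNG file (the PNG signature).
    png_header = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

    print(is_ascii(csv_bytes))    # every byte is below 128
    print(is_ascii(png_header))   # 0x89 is outside the ASCII range
    print(repr(view_as_text(png_header)))
```

Note that passing the check only shows that the bytes fit in the ASCII range. It cannot tell you whether they were *intended* as text, which is exactly why the distinction is not straightforward.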
If by ASCII you generally mean plain readable/printable text, not
necessarily limited to ASCII, then there are tools that could help you.
http://dk2.php.net/mime_content_type
http://pecl.php.net/package/fileinfo
If you are on Linux/Unix, check the file command:
http://www.freebsd.org/cgi/man.cgi?query=file
You could just ignore the subtype (the part after the slash in e.g.
text/html) and only distinguish by media type between text and
everything else.
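As a rough sketch of that last suggestion, assuming you already have a MIME type string from one of the tools above (the function names media_type and is_text are hypothetical, and Python is used only for illustration):

```python
def media_type(mime: str) -> str:
    """Return the top-level media type, e.g. 'text' for 'text/html; charset=utf-8'."""
    # Drop any parameters after ';', then keep only the part before '/'.
    essence = mime.split(";", 1)[0].strip().lower()
    return essence.split("/", 1)[0]


def is_text(mime: str) -> bool:
    """Treat anything whose media type is 'text' as text, everything else as binary."""
    return media_type(mime) == "text"


if __name__ == "__main__":
    for mime in ("text/plain", "text/csv; charset=utf-8", "image/png", "audio/mpeg"):
        print(mime, "->", "text" if is_text(mime) else "binary")
```

One caveat with this approach: some readable formats are reported under another media type (application/xml or application/json, for example), so you may want to add a short list of exceptions.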
/Bent