Posted by Toby A Inkster on 07/22/07 16:06
Jerry Stuckle wrote:
> Toby A Inkster wrote:
>
>> As it happens, it's well-established that Word documents are basically
>> just a memory dump of what Word holds in its memory.
>
> It has? By whom? Certainly not from Microsoft. Where did you find that?
It's been raised in various interviews I've read with OpenOffice.org
developers. I can't find many detailed references to it right now, but the
following links back me up to an extent:
http://www.openoffice.org/servlets/ReadMsg?list=users&msgNo=54027
http://www.oooforum.org/forum/viewtopic.phtml?t=38703
Chris Pratley (a fairly senior MS Office developer) wrote a few articles
in April 2004 that seemed to nod in that direction too. You should be able
to Google for his blog.
>> That said, most other word processing document formats are different from
>> the way the document is held in memory.
>
> And why would they be so much different than Word? Maybe Word doesn't
> do it that way, either?
Recent word processors (MS products excepted) tend to default to saving in
open, interchangeable formats such as RTF, ODF and so forth. Given that
recent word processors are not all based on the same codebase (they're
not), it follows that their internal structures cannot all match up to one
of these file formats.
--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.12-12mdksmp, up 31 days, 18:22.]
Parsing an HTML Table with PEAR's XML_HTMLSax3
http://tobyinkster.co.uk/blog/2007/07/20/html-table-parsing/