Posted by SpaceGirl on 12/25/61 11:26
Roy Schestowitz wrote:
> __/ [Alan J. Flavell] on Sunday 11 September 2005 11:19 \__
>
>
>>On Sun, 11 Sep 2005, SpaceGirl wrote:
>>
>>
>>>Alan J. Flavell wrote:
>>
>>[comprehensive quote of my posting, without apparently having anything
>>relevant to say about it.]
>>
>>
>>>Word XP and upwards stores its documents in XML format doesn't it?
>>
>>So what? XML is only a format for defining markup. If the markup
>>doesn't do anything meaningful (specifically - if it only creates a
>>visual result on a printed page, without having any significant
>>structure) then it's not going to turn into effective HTML: it'd just
>>be the usual garbage in / garbage out that we're accustomed to with
>>Word conversions to soi-disant "web" format.
Word documents, being style-based, are easy to convert. Use XSLT to
strip out all the crap so that all you end up with is basic HTML - <p>'s
and <h>'s. I wasn't suggesting that anything more complicated than that
should be attempted - but I HAVE seen it done pretty successfully with
Word 2003 files. In the case of that client (although I wasn't part of
the team who wrote those tools), their customers would submit Word
documents and the XSLT would convert them into both HTML and PDFs, and
the reproduction was almost perfect (styling and colours anyway).
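
To make the "style-based" point concrete, here is a rough sketch of the idea. It is in Python rather than XSLT, and it is not the tool that team wrote (I never saw their code). It assumes the Word 2003 XML (WordprocessingML) namespace and that headings use the built-in style IDs "Heading1", "Heading2" and so on; the function name and the sample document are made up for illustration. Every paragraph becomes an <h1>..<h6> or a <p>, and all other formatting is dropped.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace used by Word 2003 XML documents (WordprocessingML).
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}


def wordml_to_html(xml_text):
    """Reduce a Word 2003 XML document to bare <h1>..<h6> and <p> elements."""
    root = ET.fromstring(xml_text)
    out = []
    for para in root.iter(f"{{{W}}}p"):
        # The paragraph style, if any, lives at w:pPr/w:pStyle/@w:val.
        style = para.find("w:pPr/w:pStyle", NS)
        name = style.get(f"{{{W}}}val", "") if style is not None else ""
        # Join the text of every run; run-level formatting is discarded.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t")).strip()
        if not text:
            continue  # skip empty paragraphs used only for spacing
        tag = "p"
        # Assumption: headings use style IDs of the form "Heading<N>".
        if name.startswith("Heading") and name[7:].isdigit():
            tag = f"h{max(1, min(int(name[7:]), 6))}"
        out.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(out)


# Hypothetical two-paragraph document: one styled heading, one bold run.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}"><w:body>
<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Title</w:t></w:r></w:p>
<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Bold &amp; plain</w:t></w:r></w:p>
</w:body></w:wordDocument>"""

print(wordml_to_html(SAMPLE))
```

Note that this only works as well as the styles in the source file: a paragraph that was merely made big and bold by hand, with no heading style applied, comes out as a plain <p>.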
>>>You could probably write your own XSLT to turn it into HTML fairly
>>>easily.
>>
>>There seems to be some kind of conceptual disconnect here. Most Word
>>documents (in my experience) simply don't contain the necessary
>>structure for useful conversion to HTML: they've been created as a
>>purely visual construction for printing onto paper. It's irrelevant
>>what underlying technology you use (RTF, XML, SGML, whatever) - the
>>problem is that the source material simply does not represent the
>>needed structures, *because the document authors do not put it there*.
That wasn't what I saw, but like I said I wasn't on that team. As far as
I could tell they wrote a simple parser.
>>You might as well try to convert cheese into fresh cream: both are
>>fine milk products, it's true, but instead of trying to convert the
>>one into the other, you'd do better to produce them both starting from
>>fresh milk. And the kind of "fresh milk" that's needed here is
>>logically structured text markup. Not visual formatting. Until the
>>authors of Word documents can grasp that, the prospects for conversion
>>of Word to web formats are poor, IMHO.
Strange, as I've never had a problem. Generally I have to do it in a
sort of round-robin of programs: first save the Word document as PDF,
then save the PDF as a web page. It works just fine.
<snip stuff I can't be bothered to read, seeing as everyone else is being
so fucking rude>
--
x theSpaceGirl (miranda)
# lead designer @ http://www.dhnewmedia.com #
# remove NO SPAM to email, or use form on website #
# this post (c) Miranda Thomas 2005
# explicitly no permission given to Forum4Designers
# to duplicate this post.