Posted by Bryan Oakley on 11/25/06 19:04
comp.lang.tcl wrote:
> ... I do not understand XML parsing at all, DOM, SAX, xQuery,
> X[whatever else], none of it makes sense to me and I cannot understand
> the books, tutorials and online guides (not to mention the tips from
> far smarter people than I) that has barraged me; all of it makes
> absolutely no sense to me.
>
> At this point the only way I can parse XML is using PHP, because that's
> literally the ONLY way I can do it, period!
>
> But if you want to see a sample of the XML I'm working with, here it
> is:
>
> <?xml version="1.0" encoding="utf-8" ?><trivia><entry id="1101"
> triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?"
> answerID="1" correctAnswerID="1" answer="Believer"
> expDate="1139634000"></entry><entry id="1102" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="2"
> correctAnswerID="1" answer="Saviour Machine"
> expDate="1139634000"></entry><entry id="1103" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="3"
> correctAnswerID="1" answer="Seventh Avenue"
> expDate="1139634000"></entry><entry id="1104" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="4"
> correctAnswerID="1" answer="Inevitable End"
> expDate="1139634000"></entry><entry id="1105" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="5"
> correctAnswerID="1" answer="No such song existed"
> expDate="1139634000"></entry>
That data is malformed XML. For example, you are missing the closing
</trivia> tag.
Here's a solution that works with the above data. I've mentioned the
"xml2list" proc a couple of times, but now that I've seen the sample
data, it's clear it will need a little extra pre-processing.
First, copy the proc "xml2list" from this page: http://mini.net/tcl/3919
Second, enter the following, which takes the above data verbatim and
stores it in a variable:
set data {<?xml version="1.0" encoding="utf-8" ?><trivia><entry id="1101"
triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?"
answerID="1" correctAnswerID="1" answer="Believer"
expDate="1139634000"></entry><entry id="1102" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="2"
correctAnswerID="1" answer="Saviour Machine"
expDate="1139634000"></entry><entry id="1103" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="3"
correctAnswerID="1" answer="Seventh Avenue"
expDate="1139634000"></entry><entry id="1104" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="4"
correctAnswerID="1" answer="Inevitable End"
expDate="1139634000"></entry><entry id="1105" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="5"
correctAnswerID="1" answer="No such song existed"
expDate="1139634000"></entry>}
Your data is missing an ending </trivia> tag, so we have to add it for
this specific example. I don't know if this is a problem you'll have to
solve with your full dataset. Also, the xml2list proc doesn't like the
leading <?xml ...?> declaration. So, let's modify your data:
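# remove the leading <?xml...?> declaration; the whole match goes
# into a throwaway variable named "--", and everything after the
# declaration is stored back into data
regexp {<\?.*?\?>(.*$)} $data -- data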
# add a trailing </trivia> which is missing from
# the sample data
set data "$data</trivia>"
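If you don't know in advance whether the full dataset has the same two problems, the fix-up above can be made conditional. This is only a rough sketch: "prepareXml" is a name I made up for illustration (it is not part of xml2list), and it assumes you know the root tag name and that the declaration, if present, is the first thing in the data.

```tcl
# Hypothetical helper (not part of xml2list): strip a leading XML
# declaration and close the root element only if its end tag is absent.
proc prepareXml {xml rootTag} {
    set xml [string trim $xml]

    # Drop a leading <?xml ...?> declaration, if there is one.
    # [^>]* is enough here because a declaration contains no ">".
    regsub {^<\?xml[^>]*\?>} $xml {} xml

    # Append the closing tag only when the data does not already end with it.
    set closing "</$rootTag>"
    set tail [string range $xml end-[expr {[string length $closing] - 1}] end]
    if {$tail ne $closing} {
        append xml $closing
    }
    return $xml
}

# A shortened stand-in for the sample data.
set sample {<?xml version="1.0" encoding="utf-8" ?><trivia><entry id="1101" answer="Believer"></entry>}
set cleaned [prepareXml $sample trivia]
puts $cleaned
```

Running the result through prepareXml a second time leaves it unchanged, so it should be safe to call on data that is already well-formed.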
And now, convert it to a list and print it out:
set result [xml2list $data]
puts $result
If you didn't introduce any typos, you'll get the following output:
trivia {} {{entry {id 1101 triviaID 233 question {Who wrote
&quot;Trilogy of Knowledge&quot;?} answerID 1 correctAnswerID 1 answer
Believer expDate 1139634000} {}} {entry {id 1102 triviaID 233 question
{Who wrote &quot;Trilogy of Knowledge&quot;?} answerID 2 correctAnswerID
1 answer {Saviour Machine} expDate 1139634000} {}} {entry {id 1103
triviaID 233 question {Who wrote &quot;Trilogy of Knowledge&quot;?}
answerID 3 correctAnswerID 1 answer {Seventh Avenue} expDate 1139634000}
{}} {entry {id 1104 triviaID 233 question {Who wrote &quot;Trilogy of
Knowledge&quot;?} answerID 4 correctAnswerID 1 answer {Inevitable End}
expDate 1139634000} {}} {entry {id 1105 triviaID 233 question {Who wrote
&quot;Trilogy of Knowledge&quot;?} answerID 5 correctAnswerID 1 answer
{No such song existed} expDate 1139634000} {}}}
The above is a valid Tcl list that you can now process with normal Tcl
list-handling commands. Do *not* process this list with string
transformations (such as converting &quot; to a quote). If you do, you
run the risk of breaking its list-ness. Instead, loop over the data and
do the conversion as a final step on an element-by-element basis.
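To make the element-by-element idea concrete, here is a minimal sketch. It assumes the list has the shape {tag attributes children} and that attribute values still carry XML entities such as &quot;. The proc name "decodeEntities" and the variable "answers" are made up for this example, and the data is a two-entry excerpt rather than the full result.

```tcl
# Two-entry excerpt of an xml2list-style result: {tag attributes children}
set result {trivia {} {
    {entry {id 1101 triviaID 233
            question {Who wrote &quot;Trilogy of Knowledge&quot;?}
            answerID 1 correctAnswerID 1 answer Believer
            expDate 1139634000} {}}
    {entry {id 1102 triviaID 233
            question {Who wrote &quot;Trilogy of Knowledge&quot;?}
            answerID 2 correctAnswerID 1 answer {Saviour Machine}
            expDate 1139634000} {}}
}}

# Hypothetical helper: decode the five predefined XML entities in ONE value.
# string map does not rescan replaced text, so &amp;quot; becomes &quot;
# rather than being decoded twice.
proc decodeEntities {value} {
    return [string map [list "&quot;" "\"" "&apos;" "'" \
                             "&lt;" "<" "&gt;" ">" "&amp;" "&"] $value]
}

set answers {}
foreach entry [lindex $result 2] {
    # Each child is itself {tag attributes children}; the attributes are a
    # flat name/value list, so they load straight into an array.
    array unset attr
    array set attr [lindex $entry 1]

    # Decode individual values only after list commands have pulled them out.
    lappend answers [list $attr(answerID) \
                          [decodeEntities $attr(question)] \
                          [decodeEntities $attr(answer)]]
}

foreach row $answers {
    puts [join $row " | "]
}
```

The point of the ordering is that lindex and array set see the list exactly as it was built; the quote characters only appear in the individual strings that come out the other end.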
Does this help? It's not robust: xml2list assumes you have well-formed
XML with a balanced set of tags, which is why the missing </trivia> tag
had to be patched in by hand for this message. Hopefully, though, it
will at least get you started.
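If you want a cheap sanity check for balanced tags before handing data to a converter, something along these lines might do. It is a rough sketch, not a validator: "tagsBalanced" is a made-up name, and it assumes attribute values never contain a literal ">" and that the data has no comments or CDATA sections.

```tcl
# Hypothetical helper: return 1 if every opening tag has a matching,
# properly nested closing tag, 0 otherwise.
proc tagsBalanced {xml} {
    set stack {}
    # Each match yields four items: whole tag, optional leading "/",
    # tag name, and the rest of the tag (attributes, maybe a trailing "/").
    # <?xml ...?> is skipped because "?" cannot start a tag name.
    foreach {whole slash name rest} \
            [regexp -all -inline {<(/?)([A-Za-z_][A-Za-z0-9_.:-]*)([^>]*)>} $xml] {
        if {$slash eq "/"} {
            # A closing tag must match the most recently opened tag.
            if {[lindex $stack end] ne $name} {
                return 0
            }
            set stack [lrange $stack 0 end-1]
        } elseif {[string index $rest end] ne "/"} {
            # An opening tag that is not self-closing: remember it.
            lappend stack $name
        }
    }
    # Balanced only if nothing is left open.
    return [expr {[llength $stack] == 0}]
}

set broken {<?xml version="1.0" encoding="utf-8" ?><trivia><entry id="1101"></entry>}
puts [tagsBalanced $broken]
puts [tagsBalanced "$broken</trivia>"]
```

With the sample shape from this thread, the first call reports the missing </trivia> and the second passes once the tag has been appended.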