Qt-interest Archive, January 2008
Parse a huge XML file

Message 16 in thread

Why? If I build an index of element ids versus character offsets in a separate
file, I should be able to randomly access an element given its id, no?
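
A minimal sketch of such an index, assuming for illustration that the document
is held in a std::string (an index builder for a huge file would scan it in
chunks instead) and that ids are written as id="..." with double quotes. The
helper name buildIdIndex is hypothetical. The offsets it records are byte
offsets, which equal character offsets only for single-byte encodings.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: maps each id="..." value to the byte offset of the '<'
// that opens the element carrying it.
// Assumptions: ids use double quotes, and no '<' appears inside attribute values.
std::map<std::string, std::size_t> buildIdIndex(const std::string& xml)
{
    std::map<std::string, std::size_t> index;
    const std::string key = " id=\"";
    std::size_t pos = 0;
    while ((pos = xml.find(key, pos)) != std::string::npos) {
        const std::size_t valueStart = pos + key.size();
        const std::size_t valueEnd = xml.find('"', valueStart);
        if (valueEnd == std::string::npos)
            break;  // unterminated attribute value: stop indexing
        // The nearest '<' before the attribute is the start of its element.
        const std::size_t tagStart = xml.rfind('<', pos);
        if (tagStart != std::string::npos)
            index.emplace(xml.substr(valueStart, valueEnd - valueStart), tagStart);
        pos = valueEnd + 1;
    }
    return index;
}
```

Looking up an id in the returned map gives the position to seek to. The map
could be written to the separate index file as plain "id offset" lines.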

Another question: is there a class for uncompressing gzip data in Qt?

Ideally, I would like a QIODevice subclass that takes as a constructor
parameter a QIODevice* delivering gzipped data. This class would then deliver
ungzipped data on read calls.

I did not find that either in the Qt docs or on the web. Does this exist
somewhere?

The objective is to receive gzipped data over HTTP, uncompress it and feed it
into an XML stream parser, all on the fly without having the whole stream
(either compressed or uncompressed) stored in memory.

Thanks,

Etienne

Message 17 in thread

On 1/30/08, Etienne Sandré <etienne.sandre@xxxxxxxxxxxxxxxxx> wrote:
> The objective is to receive gzipped data over HTTP, uncompress it and feed it
> into an XML stream parser, all on the fly without having the whole stream
> (either compressed or uncompressed) stored in memory.

...and your character offsets will refer to the uncompressed data...
Random access into a gzipped file?! It's impossible...

Message 18 in thread

Yes, I know I will not be able to combine both. But I would like at least
one of the two features, which would improve performance over the current
solution (no indexing or random access, no on-the-fly decompression).

By the way, does QXmlStreamReader work with sequential (non-random-access)
QIODevices? This is not well documented.
 - I suppose it does for "one shot" parsing (no PrematureEndOfDocumentError)
 - I am not sure it is possible for retry-parsing (with
PrematureEndOfDocumentError, for instance when reading partial data such as
data coming from a network connection)

Thanks,

Etienne

Message 19 in thread

On Jan 30, 2008 10:39 AM, Etienne Sandré <etienne.sandre@xxxxxxxxxxxxxxxxx>
wrote:

> Yes, I know I will not be able to combine both. But I would like at least
> one of the two features, which would improve performance over the current
> solution (no indexing or random access, no on-the-fly decompression).
>
> By the way, does QXmlStreamReader work with sequential (non-random-access)
> QIODevices? This is not well documented.
>  - I suppose it does for "one shot" parsing (no PrematureEndOfDocumentError)
>  - I am not sure it is possible for retry-parsing (with
> PrematureEndOfDocumentError, for instance when reading partial data such as
> data coming from a network connection)
>

Actually, this is the example used in the documentation. See the
"Incremental Parsing" section of the QXmlStreamReader's summary. Also, the
documentation for readNext() says that if you call it after getting
PrematureEndOfDocumentError, then it will try reading again.

Tom

Message 20 in thread

Etienne Sandré wrote:
>By the way, does QXmlStreamReader work with sequential (non-random-access)
>QIODevices? This is not well documented.

It should work now in Qt 4.4. Before that, there was a problem with QIODevices.
We have now documented the behaviour of the return value of the read()
calls.


Message 21 in thread

Hi Etienne, 
 
You can certainly reposition the stream to the saved offset, but how do you tell the parser that the context has changed? For example: 
 
<top_element>
   <first_element/>
   <second_element/>
   <third_element/>
</top_element>
 
If you reposition to the second element, and start the parser, it will think this is the top element, and should fail when it sees the third element. 
 
Unless the parser has been specifically written and documented to handle this, it may work, but you are then relying on 'undefined behaviour' which could easily break when you later upgrade to a different version. 
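
The context problem can be made concrete with a small sketch. It assumes the
child elements in the example are empty elements written as <name/>, and it
only tracks nesting depth; the names FragmentScan and scanFragment are
hypothetical, and this is not how any particular parser is implemented. It
shows that the text starting at the second element contains two root-level
elements and an end tag with no matching start tag, so it is not a well-formed
document on its own.

```cpp
#include <cstddef>
#include <string>

// Result of scanning a slice of XML text that begins at a saved offset.
struct FragmentScan {
    int rootElements = 0;      // elements opened at nesting depth 0
    bool strayEndTag = false;  // an end tag with no matching start tag in the slice
};

// Hypothetical helper: walks the tags in `xml` and tracks nesting depth only.
// Assumptions: no comments, CDATA sections, processing instructions, or '>'
// inside attribute values. A real parser has to handle all of those.
FragmentScan scanFragment(const std::string& xml)
{
    FragmentScan result;
    int depth = 0;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        const std::size_t close = xml.find('>', pos);
        if (close == std::string::npos)
            break;  // truncated tag: stop scanning
        const bool isEndTag = xml[pos + 1] == '/';
        const bool isEmptyTag = xml[close - 1] == '/';
        if (isEndTag) {
            if (depth == 0)
                result.strayEndTag = true;  // closes something opened before the slice
            else
                --depth;
        } else {
            if (depth == 0)
                ++result.rootElements;
            if (!isEmptyTag)
                ++depth;
        }
        pos = close + 1;
    }
    return result;
}
```

Scanning the whole example reports one root element and no stray end tag.
Scanning from the second element onward reports two root elements and a stray
</top_element>. One way around this, under the same assumptions, is to use
the depth count to find where the wanted element ends and hand only that
slice to the parser.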
 
Tony Rietwyk. 
 

-----Original Message-----
From: etienne.sandre.chardonnal@xxxxxxxxx [mailto:etienne.sandre.chardonnal@xxxxxxxxx] On Behalf Of Etienne Sandré
Sent: Wednesday, 30 January 2008 22:51
To: Tony Rietwyk
Subject: Re: Parse a huge XML file


Why? If I build an index of element ids versus character offsets in a separate file, I should be able to randomly access an element given its id, no?

Another question: is there a class for uncompressing gzip data in Qt?

Ideally, I would like a QIODevice subclass that takes as a constructor parameter a QIODevice* delivering gzipped data. This class would then deliver ungzipped data on read calls.

I did not find that either in the Qt docs or on the web. Does this exist somewhere?

The objective is to receive gzipped data over HTTP, uncompress it and feed it into an XML stream parser, all on the fly without having the whole stream (either compressed or uncompressed) stored in memory.

Thanks,

Etienne