
Qt-interest Archive, February 2008
Re: Parse a huge XML file


Message 1 in thread

OK, I think I will forget this tag index idea.

However, I am still very interested in a way to decompress a gz stream "on
the fly" to feed QXmlStreamReader. Currently I receive the 5GB over HTTP, store
it in a file, decompress it to another file (60GB) and parse that. An all-in-one
process would be much appreciated... Maybe I can pipe it through a QProcess
calling gunzip, but this is not a satisfactory solution.

Thanks,

Etienne

Message 2 in thread

On Fri, February 1, 2008 11:03, Etienne Sandré wrote:
> OK, I think I will forget this tag index idea.
>
> However, I am still very interested in a way to decompress a gz stream "on
> the fly" to feed QXmlStreamReader. Currently I receive the 5GB over HTTP, store
> it in a file, decompress it to another file (60GB) and parse that. An all-in-one
> process would be much appreciated... Maybe I can pipe it through a QProcess
> calling gunzip, but this is not a satisfactory solution.
Why not, that's a _good_ solution!

Another possibility would be to use "minizip" for example. I use it in one
of my Qt projects (http://www.mameworld.net/mamecat) and it's fast and good
enough even for these interactive purposes (which means it should be fine
for you as well :):

http://www.winimage.com/zLibDll/minizip.html

HTH, René

Message 3 in thread

On Fri, February 1, 2008 11:23, R. Reucher wrote:
> On Fri, February 1, 2008 11:03, Etienne Sandré wrote:
>> OK, I think I will forget this tag index idea.
>>
>> However, I am still very interested in a way to decompress a gz stream
>> "on the fly" to feed QXmlStreamReader. Currently I receive the 5GB over
>> HTTP, store it in a file, decompress it to another file (60GB) and parse
>> that. An all-in-one process would be much appreciated... Maybe I can pipe
>> it through a QProcess calling gunzip, but this is not a satisfactory
>> solution.
> Why not, that's a _good_ solution!
Sorry, that was not clear enough... what I meant was to use a "pipe" through QProcess!

http://doc.trolltech.com/4.3/qprocess.html#readyReadStandardOutput

René
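
A minimal sketch of the pipe idea, written in Python for brevity rather than as Qt code: the compressed bytes go into the child's standard input while the decompressed bytes are read back from its standard output. The function name `gunzip_through_pipe`, its parameters and the default `gzip -dc` command are assumptions made for this illustration. With QProcess the same shape would use write() on the process and the readyReadStandardOutput signal linked above.

```python
import subprocess
import threading


def gunzip_through_pipe(chunks, command=("gzip", "-dc"), read_size=65536):
    """Yield decompressed blocks while feeding compressed chunks to a child.

    `command` is a hypothetical parameter: any program that reads a gzip
    stream on stdin and writes the plain data to stdout will do.
    """
    proc = subprocess.Popen(command, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)

    def feed():
        # Write from a separate thread so that a full pipe buffer on either
        # side cannot block the reader below.
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
        except OSError:
            pass  # child exited early; the exit code check reports it
        finally:
            try:
                proc.stdin.close()  # signals end of input to the child
            except OSError:
                pass

    writer = threading.Thread(target=feed, daemon=True)
    writer.start()

    while True:
        block = proc.stdout.read(read_size)
        if not block:  # empty read means the child closed its stdout
            break
        yield block

    proc.stdout.close()
    writer.join()
    if proc.wait() != 0:
        raise RuntimeError("decompressor exited with an error")
```

Only one block of decompressed data is held at a time, so neither the 5GB input nor the 60GB output has to be stored on disk first.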

Message 4 in thread

That's a good solution for Unix apps, but I would like Windows and Mac users
to be able to use it as well. This would require shipping an executable with
the application package and checking for the executable name (gzip, gzip.exe,
etc.).

Regards,

Etienne


Message 5 in thread

Hi,

Etienne Sandré wrote:
> That's a good solution for Unix apps, but I would like Windows and Mac
> users to be able to use it as well. This would require shipping an
> executable with the application package and checking for the executable
> name (gzip, gzip.exe, etc.).

I'm not sure if anyone mentioned this yet:

http://trolltech.com/products/qt/addon/solutions/catalog/4/Utilities/qtiocompressor/

Tim
----------------------------------------------------------------------
dr. t. dewhirst                                [t] +44 (0)1738 450 465
director                                       [w] www.bugless.co.uk
bugless software development ltd.

[a] algo business centre, glenearn road, perth, PH2 0NJ


Message 6 in thread

On Fri, February 1, 2008 13:41, Etienne Sandré wrote:
> That's a good solution for Unix apps, but I would like Windows and Mac
> users to be able to use it as well. This would require shipping an
> executable with the application package and checking for the executable
> name (gzip, gzip.exe, etc.).
I don't see your point, but anyway, then use my other suggestion. Minizip
works under Windows as well. You can integrate and freely redistribute it
with your app - as long as it's an open source project, that is.

Regards, René

Message 7 in thread

Maybe you can use QtIOCompressor:
 The class works on top of a QIODevice subclass, compressing data before it
 is written and decompressing it when it is read. Since QtIOCompressor works
 on streams, it does not have to see the entire data set before compressing
 or decompressing it. This can reduce the memory requirements when working
 on large data sets.
I plan to use it on networked XML streams...

Julien. 
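
The stream-oriented behaviour described above, where the decompressor never sees the whole data set, can be illustrated with zlib alone. This is a minimal sketch in Python, not QtIOCompressor's implementation: the function name `gunzip_chunks` is an assumption made for the example, and it handles a single gzip member only. In a Qt program each decompressed block could then be handed to QXmlStreamReader::addData().

```python
import zlib


def gunzip_chunks(chunks):
    """Yield decompressed bytes for an iterable of gzip-compressed chunks.

    `chunks` stands for whatever delivers the data piece by piece, for
    example blocks arriving from an HTTP download (an assumption here).
    """
    # MAX_WBITS | 16 makes zlib expect a gzip header and trailer instead of
    # a raw zlib stream.
    inflater = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in chunks:
        out = inflater.decompress(chunk)
        if out:
            yield out
    tail = inflater.flush()
    if tail:
        yield tail
    # eof is set once the gzip trailer has been seen; if it is still unset,
    # the input was cut short.
    if not inflater.eof:
        raise ValueError("gzip stream ended before its trailer")
```

Memory use is bounded by the size of one input chunk times its compression ratio, independent of the total size of the stream.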


Message 8 in thread

Unfortunately:
"Only available for Qt Solutions license holder with a valid Support and 
Maintenance agreement (authentication required)."

Etienne

Julien MONAT wrote:
> Maybe you can use QtIOCompressor :
>  The class works on top of a QIODevice subclass, compressing data 
> before it is written and decompressing it when it is read. Since 
> QtIOCompressor works on streams, it does not have to see the entire 
> data set before compressing or decompressing it. This can reduce the 
> memory requirements when working on large data sets.
> I plan to use it on networked XML streams...
>  
> Julien.


Message 9 in thread

Hi Dimitri,
why don't you suggest the third way,
http://doc.trolltech.com/4.3/qxmlstreamreader.html ?

From reading the docs for this class, I understood that it is meant to
replace DOM and is faster than SAX.
Is that right?

Veronique.

Dimitri wrote:
> Hi,
>
>> Since this is too big to fit in memory, I want to build an index on 
>> it. Is there a way with Qt classes to parse XML data without having 
>> the whole document in memory, for instance with a non-blocking parser 
>> that sends events at each XML tag it encounters, but without storing 
>> data incrementally in a QDomDocument?
>
> Use the SAX parser instead of DOM:
>     http://doc.trolltech.com/4.3/qtxml.html#the-qt-sax2-classes
>
> -- 
> Dimitri
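
The incremental style of parsing discussed in this message, where data is fed in pieces and the application pulls events as they become available, can be sketched as follows. This is a Python illustration using the standard library's XMLPullParser, not Qt code, and it says nothing about the relative speed of SAX and QXmlStreamReader. The function name `count_elements` is an assumption made for the example. The Qt analogue would add data with QXmlStreamReader::addData() and step through tokens with readNext().

```python
import xml.etree.ElementTree as ET


def count_elements(chunks, tag):
    """Count closed elements named `tag` while feeding the parser piecewise."""
    parser = ET.XMLPullParser(events=("end",))
    count = 0
    for chunk in chunks:
        parser.feed(chunk)  # a chunk may end in the middle of a tag
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                count += 1
            # Drop the element's text and children once it has been seen.
            # The emptied element stays attached to its parent, so a truly
            # huge document would need further pruning.
            elem.clear()
    parser.close()  # raises ParseError if the document is incomplete
    return count
```

No QDomDocument-style tree of the full document is built up front; events are handled as soon as enough input has arrived to produce them.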