
Qt-interest Archive, August 2006
Qt SAX Parser and large files - bad omen?


Message 1 in thread

Hello :-),
I am just trying to migrate from Xerces-C to the Qt SAX parser.
It seemed to be pretty straightforward until I tried to parse
some larger files:

- 170 MB can be parsed as fast as by Xerces-C (handling is great
  with QString, loving it :-)
- on 850 MB the parser seems to hang before it even reaches
  "startDocument"
- on 3.5 GB the parser aborts with an "unexpected eof"

I am (hopefully) using the SAX parser and the "incremental" feature
to parse (tried both with the same effect).

Having seen the nice string handling in Qt, I'd hate to go back to
Xerces. Is there a convenient way to parse an XML file
irrespective of its size? I am a bit surprised that size *does*
matter when using SAX.

Best regards,

Peter

PS: Below is the code I use to configure the parser. "this" points to an instance of a class derived from QXmlDefaultHandler.


QXmlSimpleReader reader;
reader.setContentHandler(this);
reader.setErrorHandler(this);

QFile file(xces_file);
QXmlInputSource xmlInputSource(&file);

// Second argument: true requests incremental parsing.
if (!reader.parse(&xmlInputSource, true)) {
  cerr << "Parse failed" << endl;
  cerr << reader.errorHandler()->errorString().toLatin1().data() << endl;
}


Message 2 in thread

Hi,

Rüdiger Gleim wrote:
> I am just trying to migrate from Xerces-C to the Qt SAX parser.
> It seemed to be pretty straightforward until I tried to parse
> some larger files:

Is this Qt3? I also came across a similar problem; basically you need to 
write a sub-class of QXmlInputSource that will actually load the input 
data as required.

Hope that helps,

Tim

// us
#include "chunkedXMLInputSource.h"

// qt
#include <qcstring.h>
#include <qdatastream.h>

static const unsigned int MAX_CHUNK_LENGTH = 524288;

ChunkedXMLInputSource::ChunkedXMLInputSource( QIODevice* dev )
    : m_io( dev )
{
    fetchData();
}

ChunkedXMLInputSource::~ChunkedXMLInputSource()
{
}

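void ChunkedXMLInputSource::fetchData()
{
    // An empty array tells the reader that there is no more input.
    if ( m_io->atEnd() )
    {
        setData( QByteArray() );
        return;
    }

    // Read at most one chunk. readBlock() reports how many bytes it
    // actually read, so a short final chunk is trimmed to that size
    // instead of being passed on with stale bytes at the end.
    QByteArray data( MAX_CHUNK_LENGTH );
    Q_LONG bytesRead = m_io->readBlock( data.data(), MAX_CHUNK_LENGTH );
    data.resize( bytesRead > 0 ? bytesRead : 0 );
    setData( data );
}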

// chunkedXMLInputSource.h
#ifndef CHUNKED_XML_INPUT_SOURCE_H
#define CHUNKED_XML_INPUT_SOURCE_H

// qt
#include <qxml.h>
#include <qiodevice.h>

class QFile;
class QTextStream;

class ChunkedXMLInputSource : public QXmlInputSource
{
public:
    ChunkedXMLInputSource( QIODevice* dev );
    ~ChunkedXMLInputSource();

    virtual void fetchData();

private:
    ChunkedXMLInputSource();
    ChunkedXMLInputSource( QFile& );
    ChunkedXMLInputSource( QTextStream& );

    QIODevice* m_io;
};

#endif // CHUNKED_XML_INPUT_SOURCE_H
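
For readers who want to try the chunking idea without a Qt build, here is a minimal Qt-free sketch in standard C++. It is not Tim's class and uses no Qt API; `readChunk`, `readAllInChunks` and `demoChunks` are hypothetical names chosen for illustration. It shows the one detail that matters for the final chunk: the buffer is sized to the number of bytes actually read, not to the maximum chunk length.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Qt): read at most maxLen bytes from `in`
// and return exactly the bytes that were read. An empty result means EOF.
std::string readChunk(std::istream& in, std::size_t maxLen)
{
    std::string buf(maxLen, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(maxLen));
    // gcount() is the number of bytes the last read() delivered; shrinking
    // to it keeps a short final chunk free of padding bytes.
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Reassemble a stream chunk by chunk. The output string exists only so the
// result can be checked; a real consumer would process and drop each chunk.
std::string readAllInChunks(std::istream& in, std::size_t chunkLen)
{
    std::string out;
    for (std::string c = readChunk(in, chunkLen); !c.empty();
         c = readChunk(in, chunkLen)) {
        out += c;
    }
    return out;
}

// Usage check: an 11-byte document split into 4-byte chunks.
void demoChunks()
{
    std::istringstream in("<a><b/></a>");
    assert(readChunk(in, 4) == "<a><");
    assert(readChunk(in, 4) == "b/><");
    assert(readChunk(in, 4) == "/a>");   // short final chunk, 3 bytes
    assert(readChunk(in, 4).empty());    // end of input
}
```

The same rule applies to any fetchData()-style override: whatever read call is used, only the bytes it reports as read should be handed on to the parser.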

Message 3 in thread

Hello Tim,
Thanks for your quick reply. I am using the current stable version, 4.1.4. Hm... I think I get it: using the QXmlInputSource as I did:

QFile file(xces_file);
QXmlInputSource xmlInputSource(&file);

results in an attempt to load the file in one piece...? So the subclass I have to implement has the job of reading the input data and feeding it to the reader manually? Hm, being more in control even seems to have some advantages; one could even manipulate the input data ad hoc. However, I am a bit confused that there is no standard implementation for that %).

I'll try that. Thank you for the advice!

Best regards,

Rüdiger
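
The "read the data and feed it to the reader manually" idea can be sketched without Qt as a loop that hands fixed-size pieces to a consumer which keeps its own state between pieces. This is only an illustration under stated assumptions: `TagCounter`, `countStartTags` and `sameCountForAnyChunking` are hypothetical names, and the toy consumer merely counts start tags; it is not an XML parser and not the Qt reader.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a streaming consumer. It remembers whether the
// previous character was '<', so a tag split across two pieces still counts.
class TagCounter
{
public:
    TagCounter() : m_pendingLt(false), m_count(0) {}

    void feed(const std::string& piece)
    {
        for (char c : piece) {
            // '<' followed by '/', '?' or '!' is not a start tag.
            if (m_pendingLt && c != '/' && c != '?' && c != '!')
                ++m_count;
            m_pendingLt = (c == '<');
        }
    }

    int count() const { return m_count; }

private:
    bool m_pendingLt;
    int m_count;
};

// Feed the stream to the consumer piece by piece; only one piece of at
// most chunkLen bytes is held in memory at a time.
int countStartTags(std::istream& in, std::size_t chunkLen)
{
    assert(chunkLen > 0);  // a zero-length read would never reach EOF
    TagCounter counter;
    std::string piece(chunkLen, '\0');
    while (in.read(&piece[0], static_cast<std::streamsize>(chunkLen))
           || in.gcount() > 0) {
        counter.feed(piece.substr(0, static_cast<std::size_t>(in.gcount())));
    }
    return counter.count();
}

// Usage check: the count must not depend on where the piece boundaries fall.
bool sameCountForAnyChunking(const std::string& doc)
{
    std::istringstream small(doc), large(doc);
    return countStartTags(small, 4) == countStartTags(large, 1024);
}
```

The point of the design is that the memory in use is bounded by the piece size, not by the file size, as long as the consumer carries whatever state it needs across piece boundaries.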


Message 4 in thread

Hi,

Rüdiger Gleim wrote:
> Thanks for your quick reply. I am using the current stable version,
> 4.1.4. Hm... I think I get it: using the QXmlInputSource as I did:
> 
> QFile file(xces_file); QXmlInputSource xmlInputSource(&file);
> 
> results in the attempt to load the file in one piece...? 

It certainly used to be the case in Qt 3; the entire document would be 
loaded into a single QString, and would usually fall over with a 
bad_alloc exception around 80 MB (at least it did for me at the time...).

Tim
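
A rough back-of-the-envelope check shows why loading everything into one QString scales badly. QString stores UTF-16, so the sketch below assumes, as a simplification, one 2-byte code unit per input byte (true for mostly-ASCII XML) and a 32-bit process. Neither assumption is confirmed in the thread, and the function names are hypothetical.

```cpp
#include <cstdint>

// Estimated bytes needed to hold a whole file as UTF-16 text, assuming one
// 2-byte code unit per input byte (a simplification for ASCII-heavy XML).
std::uint64_t utf16BytesFor(std::uint64_t fileBytes)
{
    return fileBytes * 2;
}

// True if that estimate alone exceeds the 4 GiB a 32-bit process can address.
// Assumption: a 32-bit build; the thread does not state the platform.
bool exceeds32BitAddressSpace(std::uint64_t fileBytes)
{
    const std::uint64_t fourGiB = 4ull * 1024 * 1024 * 1024;
    return utf16BytesFor(fileBytes) > fourGiB;
}
```

Under these assumptions the 170 MB file needs about 340 MB as text and the 850 MB file about 1.7 GB, while the 3.5 GB file cannot fit in a 32-bit address space at all. Real limits are lower, since the raw bytes and other allocations compete for the same space.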
