Qt-interest Archive, August 2006
Qt SAX Parser and large files - bad omen?
Message 1 in thread
Hello :-),
I am just trying to migrate from Xerces-C to the Qt SAX parser.
It seemed pretty straightforward until I tried to parse some
larger files:
- 170 MB is parsed as fast as with Xerces-C (string handling with
QString is great, loving it :-) )
- at 850 MB the parser seems to hang before it even reaches
"startDocument"
- at 3.5 GB the parser aborts with an "unexpected eof"
I am (hopefully) using the SAX parser correctly; I tried parsing
both with and without the "incremental" feature, with the same effect.
Having seen the nice string handling in Qt, I'd hate to go back to
Xerces. Is there a convenient way to parse an XML file
irrespective of its size? I am a bit surprised that size *does*
matter when using SAX.
Best regards,
Peter
PS: Below is the code I use to configure the parser. "this" refers to an instance of a subclass of QXmlDefaultHandler.
    QXmlSimpleReader reader;
    reader.setContentHandler(this);
    reader.setErrorHandler(this);
    QFile file(xces_file);
    QXmlInputSource xmlInputSource(&file);
    // The second argument (true) requests incremental parsing.
    if (!reader.parse(&xmlInputSource, true)) {
        cerr << "Parse failed" << endl;
        cerr << reader.errorHandler()->errorString().toLatin1().data() << endl;
    }
--
[ signature omitted ]
Message 2 in thread
Hi,
Rüdiger Gleim wrote:
> I am just trying to migrate from Xerces-C to the Qt SAX parser.
> It seemed pretty straightforward until I tried to parse some
> larger files:
Is this Qt 3? I came across a similar problem; basically you need to
write a subclass of QXmlInputSource that loads the input data in
chunks, as required.
Hope that helps,
Tim
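// us
#include "chunkedXMLInputSource.h"

// qt
#include <qcstring.h>

// Read the input in 512 KB chunks rather than all at once.
static const unsigned int MAX_CHUNK_LENGTH = 524288;

ChunkedXMLInputSource::ChunkedXMLInputSource( QIODevice* dev )
    : m_io( dev )
{
    fetchData();
}

ChunkedXMLInputSource::~ChunkedXMLInputSource()
{
}

void ChunkedXMLInputSource::fetchData()
{
    if ( m_io->atEnd() )
    {
        // Nothing left to read: hand over an empty buffer.
        setData( QByteArray() );
        return;
    }

    QByteArray data( MAX_CHUNK_LENGTH );
    // readBlock() returns the number of bytes actually read; the last
    // chunk is usually shorter than the buffer, so trim it to size.
    const Q_LONG bytesRead = m_io->readBlock( data.data(), MAX_CHUNK_LENGTH );
    if ( bytesRead <= 0 )
    {
        setData( QByteArray() );
        return;
    }
    data.resize( bytesRead );
    setData( data );
}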

// chunkedXMLInputSource.h
#ifndef CHUNKED_XML_INPUT_SOURCE_H
#define CHUNKED_XML_INPUT_SOURCE_H

// qt
#include <qxml.h>
#include <qiodevice.h>

class QFile;
class QTextStream;

class ChunkedXMLInputSource : public QXmlInputSource
{
public:
    ChunkedXMLInputSource( QIODevice* dev );
    ~ChunkedXMLInputSource();

    virtual void fetchData();

private:
    // Declared private so that these constructors cannot be used.
    ChunkedXMLInputSource();
    ChunkedXMLInputSource( QFile& );
    ChunkedXMLInputSource( QTextStream& );

    QIODevice* m_io;
};

#endif // CHUNKED_XML_INPUT_SOURCE_H
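
A chunked source should hand the parser only the bytes that were actually read, since the final chunk is usually shorter than the buffer. The following is a minimal, Qt-free sketch of that pattern in standard C++, not the Qt API itself: readChunk, totalBytes and exampleTotal are hypothetical names chosen for illustration, and a std::istream stands in for the QIODevice.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read at most maxChunk bytes from `in` and return
// exactly the bytes that were read (an empty string at end of input).
std::string readChunk(std::istream& in, std::size_t maxChunk)
{
    assert(maxChunk > 0);
    std::string buffer(maxChunk, '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(maxChunk));
    // gcount() is the number of bytes the last read() delivered; trimming
    // to it keeps stale buffer contents out of the final chunk.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}

// Hypothetical consumer: pull chunks until the source is exhausted.
// Only one chunk is held in memory at any time.
std::size_t totalBytes(std::istream& in, std::size_t maxChunk)
{
    std::size_t total = 0;
    for (;;) {
        const std::string chunk = readChunk(in, maxChunk);
        if (chunk.empty())
            break;
        total += chunk.size();
    }
    return total;
}

// Usage example on an in-memory stream with a deliberately tiny chunk size.
std::size_t exampleTotal()
{
    std::istringstream input("<doc><item/></doc>");
    return totalBytes(input, 8);
}
```

With a chunk size of 8, the 18-byte example input is delivered as chunks of 8, 8 and 2 bytes; peak memory depends on the chunk size, not on the size of the input.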
Message 3 in thread
Hello Tim,
thanks for your quick reply. I am using the current stable version, 4.1.4. Hm... I think I get it. Using the QXmlInputSource as I did:
    QFile file(xces_file);
    QXmlInputSource xmlInputSource(&file);
results in an attempt to load the file in one piece...? So the subclass I have to implement has the job of reading the input data and feeding it to the reader manually? Hm, being more in control even seems to have some advantages: one could manipulate the input data ad hoc. However, I am a bit confused that there is no standard implementation for that %).
I'll try that. Thank you for the advice!
Best regards,
Rüdiger
--
[ signature omitted ]
Message 4 in thread
Hi,
Rüdiger Gleim wrote:
> thanks for your quick reply. I am using the current stable version,
> 4.1.4. Hm... I think I get it. Using the QXmlInputSource as I did:
>
> QFile file(xces_file); QXmlInputSource xmlInputSource(&file);
>
> results in an attempt to load the file in one piece...?
It certainly used to be the case in Qt 3; the entire document would be
loaded into a single QString, and it would usually fall over with a
bad_alloc exception at around 80 MB (at least it did for me at the time...).
Tim
--
[ signature omitted ]
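
To see why file size matters once the whole document is held in a single QString, here is a rough back-of-the-envelope sketch. QString stores text as 16-bit code units, so a file in a single-byte encoding roughly doubles in size when decoded. The helper names are hypothetical, and the sketch assumes one input byte per character and a 32-bit process (neither is stated in the thread); it ignores the raw input buffer and allocator overhead, so real limits are lower.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t MiB = 1024ULL * 1024ULL;

// Hypothetical helper: bytes needed for the decoded text, assuming every
// input byte becomes one 16-bit code unit (single-byte encoding).
std::uint64_t decodedBytes(std::uint64_t fileBytes)
{
    return fileBytes * 2;
}

// Hypothetical helper: does an allocation of this size fit at all into
// the 4 GiB address space of a 32-bit process? (The usable part of that
// space is smaller in practice.)
bool fitsIn32BitAddressSpace(std::uint64_t bytes)
{
    const std::uint64_t limit = 4096ULL * MiB;
    return bytes < limit;
}

// The three file sizes from the first message, taking 3.5 GB as 3584 MiB.
void checkThreadSizes()
{
    assert(decodedBytes(170 * MiB) == 340 * MiB);    // comfortable
    assert(decodedBytes(850 * MiB) == 1700 * MiB);   // close to practical limits
    assert(decodedBytes(3584 * MiB) == 7168 * MiB);  // exceeds 4 GiB
    assert(!fitsIn32BitAddressSpace(decodedBytes(3584 * MiB)));
}
```

Under these assumptions the 170 MB file needs about 340 MB, the 850 MB file about 1.7 GB, and the 3.5 GB file about 7 GB, which cannot fit into a 32-bit address space regardless of how much RAM is installed.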