
Qt-interest Archive, December 2006
QT XML with DOM needs huge amounts of memory


Message 1 in thread

My problem: I want to parse a large XML file (50 MB) with Qt's XML classes.
I want to do it with DOM because of its simplicity. I DO know that this is
more memory-intensive than SAX2. However, after parsing, the memory should
be freed, right? That does not happen. Did I overlook something? I use only
local variables, no new operator at all.

Arne

-- 
 [ signature omitted ] 

Message 2 in thread

Minimal example:

=============== SNIP =========================
#include <iostream>

#include <QDomDocument>
#include <QFile>
#include <QString>

int
main()
{
    {
        QString in_fname("big.xml");

        // Read in document:
        QDomDocument doc("mySceneDocument");
        QFile file(in_fname);
        if (!file.open(QIODevice::ReadOnly)) {
            std::cout << "Error: could not open file "<< std::endl;
            return -1;
        }
        if (!doc.setContent(&file)) {
            std::cout << "Error: could not read file " << std::endl;
            file.close();
            return -1;
        }
        file.close();
    }

    // Pause here so memory usage can be inspected after doc is destroyed.
    std::cout << "Press return to continue" << std::endl;
    std::cin.get();
    
    return 0;
}
=============== SNIP =========================

With a 50 MB big.xml, my Qt program eats up 2 GB of RAM! And the worst part
is: outside of the inner {} scope, this memory is never released! Why is
that?

Arne

-- 
 [ signature omitted ] 

Message 3 in thread

Arne Schmitz schrieb:
> Minimal example:
> ...
> When using a 50 MB big.xml, my Qt eats up 2 GB of RAM! And worst part is:
> outside of the inner {} scope, this memory is never released! Why is this
> so?


Maybe your compiler "optimizes away" the inner scope and hence the
memory is only released at program end?

Try explicitly allocating the QDomDocument with new and delete it
explicitly before program end. Maybe that makes a difference.

Cheers, Oliver

--
 [ signature omitted ] 
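
On the question of whether the inner scope can be "optimized away": C++
guarantees that an automatic object's destructor runs when its enclosing
block ends, whatever the optimization level. A minimal sketch in plain C++,
where Tracked is a made-up stand-in for the document object (it is not a Qt
class) and liveAfterInnerScope is a hypothetical helper:

```cpp
#include <cassert>

// Hypothetical stand-in for a large document object; counts live instances.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
};
int Tracked::live = 0;

// Returns the number of live instances observed right after the inner block.
int liveAfterInnerScope() {
    {
        Tracked doc;                 // automatic object, like the QDomDocument
        assert(Tracked::live == 1);  // alive inside the block
    }                                // destructor runs here, at the closing brace
    return Tracked::live;
}
```

If liveAfterInnerScope() returns 0, the object was destroyed at the closing
brace. Whether the process then hands the freed memory back to the operating
system is a separate matter.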

Message 4 in thread

Till Oliver Knoll wrote:

> Arne Schmitz schrieb:
>> Minimal example:
>> ...
>> When using a 50 MB big.xml, my Qt eats up 2 GB of RAM! And worst part is:
>> outside of the inner {} scope, this memory is never released! Why is this
>> so?
> 
> 
> Maybe your compiler "optimizes away" the inner scope and hence the
> memory is only released at program end?
> 
> Try explicitly allocating the QDomDocument with new and delete it
> explicitly before program end. Maybe that makes a difference.

I will try that, BUT that happens even in debug mode with optimizations off!
Very peculiar. I doubt that an explicit delete will help.

Arne

-- 
 [ signature omitted ] 

Message 5 in thread

> And worst part is: outside of the inner {} scope,
> this memory is never released! Why is this so?

I imagine it is never released because your malloc()/free()
implementation doesn't return memory to the operating system.
And there will be a lot of malloc() calls inside QDom even if you
allocate the nodes on the stack.

I hope this email wasn't redundant, but it seemed to me nobody had
answered that part of your question.

Max

--
 [ signature omitted ] 
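
The allocator behaviour described above can be probed with a small
experiment. The sketch below assumes Linux with glibc: malloc_trim() is a
glibc extension, and the helper names churnSmallBlocks and
releaseFreeHeapToOs are made up for this illustration. Whether the resident
size of the process actually drops depends on the allocator and on how
fragmented the heap is, so treat this as something to try, not as a
guaranteed fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>
#if defined(__GLIBC__)
#include <malloc.h>  // malloc_trim() is a glibc extension
#endif

// Allocates up to `count` small blocks, frees them all again and returns
// how many blocks were successfully allocated (and therefore freed).
std::size_t churnSmallBlocks(std::size_t count, std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<void*> blocks;
    blocks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        void* p = std::malloc(blockSize);
        if (p == nullptr) break;  // out of memory: stop early
        blocks.push_back(p);
    }
    for (void* p : blocks) std::free(p);
    return blocks.size();
}

// Asks the allocator to return free heap memory to the operating system.
// Returns true only if glibc reports that some memory was released.
bool releaseFreeHeapToOs() {
#if defined(__GLIBC__)
    return malloc_trim(0) == 1;
#else
    return false;  // no portable equivalent; assumption: nothing to do here
#endif
}
```

Watching the process in top before and after a call to
releaseFreeHeapToOs() shows whether the freed blocks were merely being kept
for reuse by the allocator.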

Message 6 in thread

Max Howell wrote:

>> And worst part is: outside of the inner {} scope,
>> this memory is never released! Why is this so?
> 
> I imagine it is never released because your malloc/free()
> implementation doesn't release memory back to the operating system.
> And there will be a lot of malloc inside QDom even if you allocate the
> nodes on the stack.
> 
> I hope this email wasn't redundant, but it seemed to me nobody
> answered that part of your question..

That is a good hint. I have to check if g++ / glibc behaves that way. I
would be surprised if it did, but maybe...

Arne

-- 
 [ signature omitted ] 

Message 7 in thread

On 12.12.06 16:36:29, Arne Schmitz wrote:
> My problem: I want to parse a large XML file (50 MB) with Qt's XML classes.
> I want to do it with DOM, because of the simplicity. I DO know that this is
> more memory exhaustive compared to sax2. But however, after parsing the
> memory should be freed, right?

No, only after you have destroyed the document. I really suggest using SAX;
it's not that hard, and you really only need DOM if you want to change
something in the document.

Andreas

-- 
 [ signature omitted ] 

Message 8 in thread

Andreas Pakulat wrote:

> On 12.12.06 16:36:29, Arne Schmitz wrote:
>> My problem: I want to parse a large XML file (50 MB) with Qt's XML
>> classes. I want to do it with DOM, because of the simplicity. I DO know
>> that this is more memory exhaustive compared to sax2. But however, after
>> parsing the memory should be freed, right?
> 
> No, only after you destroyed the document. I really suggest to use SAX,
> its not that hard and you really only need DOM if you want to change
> sth. in the document.

Yes, I have already written some SAX parsers. But SAX is not as elegant as
DOM. I just cannot imagine that Qt needs 40x the memory the XML file takes!
After all, I only need a tree representation of the XML file. That
shouldn't take a factor of 40.

Arne

-- 
 [ signature omitted ] 

Message 9 in thread

On 12.12.06 19:28:52, Arne Schmitz wrote:
> Andreas Pakulat wrote:
> 
> > On 12.12.06 16:36:29, Arne Schmitz wrote:
> >> My problem: I want to parse a large XML file (50 MB) with Qt's XML
> >> classes. I want to do it with DOM, because of the simplicity. I DO know
> >> that this is more memory exhaustive compared to sax2. But however, after
> >> parsing the memory should be freed, right?
> > 
> > No, only after you destroyed the document. I really suggest to use SAX,
> > its not that hard and you really only need DOM if you want to change
> > sth. in the document.
> 
> Yes, I have already written some SAX parsers. But it is not as elegant as
> DOM. I just cannot imagine that Qt need 40x the memory the XML file takes!
> After all, I only need a tree representation of the XML file. This
> shouldn't take a factor of 40.

Yes it does, as does any other DOM implementation. For example, an
element <foobar> with a child element <barfoo> already needs 12 bytes
for each element name in Unicode (6 characters at 2 bytes each, not
counting any housekeeping variables for QString), plus a parent node
pointer, an attribute list and a child node list, even when they are empty.

That's why one normally doesn't use DOM if the input data is large. If
you only need a tree of elements, that's really easy to do with SAX.

Andreas

-- 
 [ signature omitted ] 
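
The per-element arithmetic above can be written down as a rough cost model.
This is only a sketch: NodeCostModel and both functions are hypothetical,
the three pointer-sized fields stand for the parent pointer, attribute list
and child list mentioned above, and the allocator overhead is an assumed
figure, not a measurement of the real QDom classes.

```cpp
#include <cassert>
#include <cstddef>

// Assumed platform figures; adjust for the system being examined.
struct NodeCostModel {
    std::size_t pointerBytes;        // size of one pointer
    std::size_t allocOverheadBytes;  // assumed bookkeeping per heap block
};

// Estimated in-memory bytes for an element with no attributes and no text.
std::size_t estimateElementBytes(std::size_t nameChars, NodeCostModel m) {
    assert(nameChars > 0);
    return nameChars * 2               // name stored as 16-bit characters
         + 3 * m.pointerBytes          // parent, attribute list, child list
         + 2 * m.allocOverheadBytes;   // one heap block each for node and name
}

// Bytes the same empty element occupies in the file as <name></name>,
// assuming one byte per character.
std::size_t elementTextBytes(std::size_t nameChars) {
    assert(nameChars > 0);
    return 2 * nameChars + 5;          // the five characters < > < / >
}
```

With 4-byte pointers and an assumed 8 bytes of overhead per heap block,
<foobar></foobar> comes to 40 bytes in memory against 17 bytes in the file.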

Message 10 in thread

At 12:42 PM 12/12/2006, Andreas Pakulat wrote:
>Yes it does, as does any other DOM implementation. For example an
>element <foobar> with a child element <barfoo> needs 12 bytes already
>for the unicode-element-name (not counting any house-keeping variables
>for QString), an empty parent node-pointer, an empty attribute list, an
>empty child node list.

Why does it need 12 bytes for the name, instead of a 4-byte index into a
table of tag names? And even with all that, that's less than a factor of 10,
which is a lot less than a factor of 40.

--
 [ signature omitted ] 
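
The table of tag names suggested above is usually called string interning.
A minimal sketch in standard C++ follows; NameTable is a hypothetical class
written for this illustration and is not part of Qt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stores each distinct tag name once and hands out a 4-byte index for it.
class NameTable {
public:
    // Returns the index for `name`, adding it to the table on first use.
    std::uint32_t intern(const std::string& name) {
        auto it = index_.find(name);
        if (it != index_.end()) return it->second;
        const auto id = static_cast<std::uint32_t>(names_.size());
        names_.push_back(name);
        index_.emplace(name, id);
        return id;
    }

    // Looks the name up again from its index.
    const std::string& name(std::uint32_t id) const {
        assert(id < names_.size());
        return names_[id];
    }

    std::size_t size() const { return names_.size(); }

private:
    std::vector<std::string> names_;                        // index -> name
    std::unordered_map<std::string, std::uint32_t> index_;  // name -> index
};
```

Each node then stores only the 4-byte index, so a document with millions of
elements but a few dozen distinct tag names keeps a few dozen name strings.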

Message 11 in thread

On 12.12.06 15:24:14, Alan M. Carroll wrote:
> At 12:42 PM 12/12/2006, Andreas Pakulat wrote:
> >Yes it does, as does any other DOM implementation. For example an
> >element <foobar> with a child element <barfoo> needs 12 bytes already
> >for the unicode-element-name (not counting any house-keeping variables
> >for QString), an empty parent node-pointer, an empty attribute list, an
> >empty child node list.
> 
> Why does it need 12 bytes for the name, instead of 4 bytes of index in to a table of tag names? And even with all that, that's less than a factor of 10, which is a lot less than a factor of 40. 

That was just an example why it takes much more memory than the "string"
form. If you want the real numbers look up the source code of QDom*.

Andreas

-- 
 [ signature omitted ] 

Message 12 in thread

Do not be surprised. A few years ago I opened a 3 MB XML file with IE5 and
it used about 90 MB of memory! At the time I was astonished and thought,
what a piece of s**t. It still is.

Lots of small QString objects consume a lot of memory. On top of that,
QString stores text as 16-bit Unicode, so mostly-ASCII text takes twice the
space it does in the file. If you really need the whole thing in memory, you
have to look at customized memory pools (check out boost::pool). It is well
known that lots of small allocations with the new operator waste a huge
amount of space.

Did you try reading the whole file into a QString and then parsing that? You
might be surprised, but it might save memory because the QString objects
could point to chunks of the original string instead of allocating their own
copy of each substring.

If that doesn't work, then my suggestion is to read the whole file into
memory, use SAX and create your objects using a memory pool (check out Loki
as well).

I hope this helps.

B.C.


--
 [ signature omitted ] 
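
The memory pool idea mentioned above can be illustrated with a minimal bump
("arena") allocator. This is a sketch in standard C++ and not the API of
boost::pool or Loki; the Arena class and its 64 KB default chunk size are
assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hands out slices of large chunks and frees everything at once when the
// arena is destroyed. No destructors are run for objects placed in it, so
// it only suits trivially destructible data.
class Arena {
public:
    explicit Arena(std::size_t chunkBytes = 64 * 1024)
        : chunkBytes_(chunkBytes) {}

    // Returns `bytes` bytes aligned to `align` (a power of two).
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));
        // Round the current position up to the requested alignment.
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (chunks_.empty() || offset + bytes > capacity_) {
            // Start a new chunk; oversized requests get a chunk of their own.
            capacity_ = bytes > chunkBytes_ ? bytes : chunkBytes_;
            std::unique_ptr<unsigned char[]> chunk(new unsigned char[capacity_]);
            chunks_.push_back(std::move(chunk));
            offset = 0;
        }
        used_ = offset + bytes;
        return chunks_.back().get() + offset;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t chunkBytes_;    // preferred size of each chunk
    std::size_t capacity_ = 0;  // size of the current (last) chunk
    std::size_t used_ = 0;      // bytes used in the current chunk
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```

Consecutive small requests come out of the same chunk back to back, so the
per-allocation bookkeeping of a general-purpose new is paid once per chunk
instead of once per node.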

Message 13 in thread

Dude,

I tested your program in debug mode with a 7,751 KB file and the memory does
get deallocated. I am using Windows 2000 and Qt 4.2.2 for Windows. The
program consumes about 180 MB, and it takes a good few seconds (over 10
seconds on a 1.6 GHz P4) until the memory is released (which happens when the
program leaves the inner block that you have in the main function).

It would probably take a lot longer when you deallocate 2GB!

Forget my idea of reading the whole file into memory. I tried it and it is
worse.

B.C.


--
 [ signature omitted ] 

Message 14 in thread

B.C. wrote:

> Forget my idea of reading the whole file in the memory. I tried it and it
> is worse.

Oh, OK. That's too bad. I think I will go and write my own lightweight
XML parser with a simple API. :) Since my XML schema is very simple, I do not
really need a full-featured DOM parser with Unicode support.

Cheers,

Arne

-- 
 [ signature omitted ] 

Message 15 in thread

Arne Schmitz wrote:

> My problem: I want to parse a large XML file (50 MB) with Qt's XML
> classes. I want to do it with DOM, because of the simplicity. I DO know
> that this is more memory exhaustive compared to sax2. But however, after
> parsing the memory should be freed, right? This does not happen, however.
> Did I overlook something? I use only local variables, no new() operator at
> all.

Some time ago, there was a blog posting on Planet KDE about this issue. It
was about reading documents in KOffice, which also resulted in huge memory
usage because a DOM approach was used (ODF documents are XML as well). The
poster created a lightweight version that, as far as I remember, saved about
an order of magnitude of memory and also gave quite significant time
savings.
It might be interesting to look up this posting and see if you can re-use
the code, or at least the technique. He made something that was less
efficient when changing the data (who cares, if you're only parsing a
document), but which was far more memory efficient.

André
-- 
 [ signature omitted ] 
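
One way to get a tree that is cheap to read but awkward to modify is to
keep all nodes in a single array and link them by index. The sketch below is
an assumption about what such a structure could look like, not a
reconstruction of the code from the blog posting. FlatNode and
FlatTreeBuilder are hypothetical names; startElement() and endElement() are
meant to be called from the matching SAX callbacks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One element; links are indices into the node array, -1 meaning "none".
struct FlatNode {
    std::string name;
    std::int32_t parent;
    std::int32_t firstChild;
    std::int32_t nextSibling;
};

// Builds the node array from start/end element events in document order.
class FlatTreeBuilder {
public:
    void startElement(const std::string& name) {
        const auto id = static_cast<std::int32_t>(nodes_.size());
        const std::int32_t parent = open_.empty() ? -1 : open_.back().node;
        nodes_.push_back(FlatNode{name, parent, -1, -1});
        if (!open_.empty()) {
            // Append the new node to its parent's child chain.
            Open& top = open_.back();
            if (top.lastChild < 0) at(top.node).firstChild = id;
            else at(top.lastChild).nextSibling = id;
            top.lastChild = id;
        }
        open_.push_back(Open{id, -1});
    }

    void endElement() {
        assert(!open_.empty());
        open_.pop_back();
    }

    // Counts the direct children of node `id` by walking the sibling chain.
    std::size_t childCount(std::int32_t id) const {
        std::size_t n = 0;
        for (std::int32_t c = nodes_[static_cast<std::size_t>(id)].firstChild;
             c >= 0; c = nodes_[static_cast<std::size_t>(c)].nextSibling) {
            ++n;
        }
        return n;
    }

    const std::vector<FlatNode>& nodes() const { return nodes_; }

private:
    struct Open { std::int32_t node; std::int32_t lastChild; };
    FlatNode& at(std::int32_t id) { return nodes_[static_cast<std::size_t>(id)]; }

    std::vector<FlatNode> nodes_;  // all elements, in document order
    std::vector<Open> open_;       // stack of currently open elements
};
```

Appending is cheap and there is one allocation for the array instead of
several per node, but inserting or removing a node in the middle would mean
renumbering the links. That is the trade-off described above.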
