Qt-interest Archive, February 2008
[qt4-interest] QXmlStreamWriter puts strange bytes inside xml directive


Message 1 in thread

Hi,
I wrote a single-source program that uses QXmlStreamWriter to write a
simple XML file.

// main.cpp
#include <QCoreApplication>
#include <QtXml>
#include <QFile>

#include <iostream>
using namespace std;

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

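    // Output file that receives the generated XML
    QFile file("test.txt");
    if( !file.open(QIODevice::WriteOnly) ) {
        cerr << "Unable to open output file" << endl;
        return 1;
    }

    QXmlStreamWriter out(&file);
    // Explicitly request UTF-8; without this call the stray bytes do not appear
    out.setCodec("UTF-8");

    // Writes the <?xml version="1.0" encoding="UTF-8"?> declaration
    out.writeStartDocument();

    // Writes <tag></tag>
    out.writeStartElement("tag");
    out.writeEndElement();

    out.writeEndDocument();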


    return 0;
}

# test.pro
TEMPLATE = app
TARGET =
DEPENDPATH += .
INCLUDEPATH += .

QT += xml
QT -= gui
CONFIG += console

# Input
SOURCES += main.cpp

///////////////////////////////////////////////////////////////////////

The output of g++ -v is:
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk
--disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile
--enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic
--host=i386-redhat-linux
Thread model: posix
gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)

The output of uname -a is:
Linux localhost.localdomain 2.6.23.8-63.fc8 #1 SMP Wed Nov 21 18:51:08 EST 2007 i686 i686 i386 GNU/Linux

I am using Qt 4.3.3.

Here is a hex dump of the XML declaration:

0000:0000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 ef <?xml version="ï
0000:0010 bb bf 31 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d »¿1.0" encoding=
0000:0020 22 55 54 46 2d 38 22 3f 3e 3c 74 61 67 3e 3c 2f "UTF-8"?><tag></

I don't know why there are extra bytes between the opening quote and the
value 1.0 of the version attribute.
Also, if I remove the statement
out.setCodec("UTF-8")
those bytes disappear.
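The three bytes ef bb bf at offset 0x0f are the UTF-8 encoding of U+FEFF, the byte order mark (the ï»¿ in the text column is simply those bytes displayed as Latin-1 characters). As a rough diagnostic, here is a minimal standard C++ sketch, not a Qt API (the helper name findUtf8Bom is hypothetical), that reports every offset at which that sequence occurs in a buffer:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical diagnostic helper (not part of Qt): returns every offset at
// which the UTF-8 byte order mark EF BB BF (U+FEFF encoded as UTF-8)
// occurs in `data`.
std::vector<std::size_t> findUtf8Bom(const std::string &data)
{
    static const std::string bom("\xEF\xBB\xBF");
    std::vector<std::size_t> offsets;
    for (std::string::size_type pos = data.find(bom);
         pos != std::string::npos;
         pos = data.find(bom, pos + bom.size())) {
        offsets.push_back(pos);
    }
    return offsets;
}
```

Fed the header bytes from the dump, it finds a single occurrence at offset 15 (0x0f), immediately after <?xml version=", instead of at offset 0 where a BOM would normally sit.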

Message 2 in thread

> The same problem occurs under Windows (I use Qt 4.3.2).
> As far as I understand, there is a problem in QXmlStreamWriter.
> The method writeStartDocument() looks like this:
> 
>  void QXmlStreamWriter::writeStartDocument()
>  {
>   writeStartDocument(QLatin1String("1.0"));
>  }
> 
>  where "1.0" is declared as a QLatin1String.
> 
>  Later on in writeStartDocument this string is treated as a Unicode
> string.
>  What you see in the output file is the Unicode prefix for 1.0!
>  
>  Regards
>  karl-heinz
>  www.techdrivers.de

Message 3 in thread

Thank you for the quick answer, although I don't understand what you mean
by "Unicode prefix". If you remove the setCodec("UTF-8") call, do those bytes
disappear?

Message 4 in thread

As far as I can see, you can omit the setCodec call since (at least under
Windows) QXmlStreamWriter writes UTF-8 anyway.

By "Unicode prefix" I mean the so-called BOM (byte order mark);
see Wikipedia: http://en.wikipedia.org/wiki/Byte-order_mark

best regards
karl-heinz
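To see why the BOM shows up as exactly those bytes: UTF-8 encodes a code point in the range U+0800 to U+FFFF with the three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx, and applying it to U+FEFF yields EF BB BF. A minimal encoder sketch in standard C++ (encodeUtf8 is a hypothetical illustration helper; it does not reject surrogates or values above U+10FFFF):

```cpp
#include <cstdint>
#include <string>

// Hypothetical illustration helper: encodes one Unicode code point as UTF-8.
// No validation is done (surrogates and out-of-range values are not rejected).
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

encodeUtf8(0xFEFF) returns the bytes EF BB BF, the sequence in the dump, while encodeUtf8('1') is the single byte 0x31. Unlike the UTF-16 marks FE FF and FF FE, the UTF-8 BOM carries no byte-order information; it only signals that the data is UTF-8.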



> -------- Original Message --------
> Date: Thu, 28 Feb 2008 12:45:21 +0100
> From: "Manuel Fiorelli" <manuel.fiorelli@xxxxxxxxx>
> To: qt-interest@xxxxxxxxxxxxx
> Subject: Re: [qt4-interest] QXmlStreamWriter puts strange bytes inside xml directive

Message 5 in thread

---------- Forwarded message ----------
From: Manuel Fiorelli <manuel.fiorelli@xxxxxxxxx>
Date: 28 Feb 2008 13:58
Subject: Re: [qt4-interest] QXmlStreamWriter puts strange bytes inside xml directive
To: Karl-Heinz Reichel <khReichel@xxxxxx>

Now I understand perfectly what you meant. Shouldn't a BOM occur only at the
beginning of a Unicode file? Can we consider this behavior a bug, or at
least undesirable?
I know that the default encoding for QXmlStreamWriter is UTF-8, but I saw
that if one removes the call to setCodec, the BOM disappears. Can
anyone confirm that?
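If dropping the setCodec call is not an option, one possible stopgap (a hypothetical post-processing step, not something proposed in this thread or provided by Qt) is to remove every EF BB BF sequence from the written bytes and, if a BOM is wanted at all, put a single one back at offset 0. A minimal sketch in standard C++, assuming the whole output fits in memory as a std::string (normalizeUtf8Bom is a hypothetical name):

```cpp
#include <string>

// Hypothetical post-processing step (not part of Qt): removes every UTF-8
// BOM sequence (EF BB BF) from `xml` and, if `leadingBom` is true, puts a
// single BOM back at the very start of the output.
// Caveat: this also strips any intentional U+FEFF characters in the content.
std::string normalizeUtf8Bom(const std::string &xml, bool leadingBom)
{
    static const std::string bom("\xEF\xBB\xBF");
    std::string out;
    out.reserve(xml.size());
    std::string::size_type start = 0;
    for (std::string::size_type pos = xml.find(bom);
         pos != std::string::npos;
         pos = xml.find(bom, start)) {
        out.append(xml, start, pos - start);  // copy the bytes before this BOM
        start = pos + bom.size();             // skip over the BOM itself
    }
    out.append(xml, start, std::string::npos);  // copy the remaining tail
    return leadingBom ? bom + out : out;
}
```

Because it strips U+FEFF wherever it appears, this only suits output in which no such character is expected inside the document content.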