Qt-interest Archive, March 2002
Fwd: QRegExp captured text.
Message 1 in thread
I'm trying to use QRegExp to process a typical unix style command line.
Suppose there's a command comm with possible options -a -c -t and -f. I tried
the QRegExp as shown in the following test program.
#include <qregexp.h>
#include <qstring.h>
int main(int argc, char *argv[])
{
    QRegExp rx("comm ((?:\\s*)-([acft]))*");
    rx.search("comm -a -f -t");
    qDebug(QString("cap(0) = %1").arg(rx.cap(0)));
    qDebug(QString("cap(1) = %1").arg(rx.cap(1)));
    qDebug(QString("cap(2) = %1").arg(rx.cap(2)));
    qDebug(QString("cap(3) = %1").arg(rx.cap(3)));
    return 0;
}
This produces the following output
cap(0) = comm -a -f -t
cap(1) = -t
cap(2) = t
cap(3) =
cap(0) produces the result I expected: the ((?:\\s*)-([acft])) part of the
regular expression has been repeated for each option because of the * that
follows it, so cap(0) shows the complete command text.
However, the remaining results returned by cap() contain only the last
option on the command line (-t). The -a and -f have been lost. I
cannot find anything in the documentation that says whether using * to repeat a
subexpression that captures text is legal, or what the results should be. Can
anyone comment on what the correct behaviour should be, and whether this is a
bug? Personally, I think it would be incorrect to include text in cap(0) that
is not captured in any of the other strings returned by cap().
Cheers
Bob S.
-------------------------------------------------------
Message 2 in thread
On Thursday 21 March 2002 06:47 am, you wrote:
> I'm trying to use QRegExp to process a typical unix style command line.
> Suppose there's a command comm with possible options -a -c -t and -f. I
> tried the QRegExp as shown in the following test program.
>
> #include <qregexp.h>
> #include <qstring.h>
>
> int main(int argc, char *argv[])
> {
>     QRegExp rx("comm ((?:\\s*)-([acft]))*");
>
>     rx.search("comm -a -f -t");
>     qDebug(QString("cap(0) = %1").arg(rx.cap(0)));
>     qDebug(QString("cap(1) = %1").arg(rx.cap(1)));
>     qDebug(QString("cap(2) = %1").arg(rx.cap(2)));
>     qDebug(QString("cap(3) = %1").arg(rx.cap(3)));
>     return 0;
> }
>
> This produces the following output
>
> cap(0) = comm -a -f -t
> cap(1) = -t
> cap(2) = t
> cap(3) =
>
> cap(0) produces the result I expected: the ((?:\\s*)-([acft])) part of the
> regular expression has been repeated for each option because of the * that
> follows it, so cap(0) shows the complete command text.
The output you're getting looks correct to me. cap(0) is the whole
thing, cap(1) is your first parenthesised expression
"((?:\\s*)-([acft]))", and cap(2) is your second parenthesised
expression "([acft])".
> However, the remaining results returned by cap() contain only the last
> option on the command line (-t). The -a and -f have been lost.
> I cannot find anything in the documentation that says whether using * to
> repeat a subexpression that captures text is legal, or what the results
> should be. Can anyone comment on what the correct behaviour should be, and
> whether this is a bug? Personally, I think it would be incorrect to include
> text in cap(0) that is not captured in any of the other strings returned by
> cap().
The problem you're encountering is that your captured text (cap(1)
and cap(2)) is overwritten each time the repeated subexpression matches.
The subexpression first matches "-a" (the space before it is consumed by
the literal "comm "), but because of the "*" it keeps going, so it then
matches " -f" and finally " -t". The parentheses are always cap(1) and
cap(2), so what happens is this:
    cap(1)    cap(2)
    -a        a
    -f        f
    -t        t
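
The overwriting can be reproduced outside Qt. The following is a minimal sketch that uses std::regex (ECMAScript grammar) as a stand-in for QRegExp; that substitution and the function name capturesOfRepeatedGroup are assumptions made for illustration, not part of the original thread.

```cpp
#include <regex>
#include <string>
#include <vector>

// Runs the original pattern and returns every capture, index 0 being the
// whole match. std::regex is used here as a stand-in for QRegExp.
std::vector<std::string> capturesOfRepeatedGroup(const std::string &input)
{
    static const std::regex rx("comm ((?:\\s*)-([acft]))*");
    std::smatch m;
    std::vector<std::string> caps;
    if (std::regex_search(input, m, rx)) {
        // A repeated group has one capture slot, so only the text of its
        // final iteration is still there when matching ends.
        for (std::size_t i = 0; i < m.size(); ++i)
            caps.push_back(m[i].str());
    }
    return caps;
}
```

For "comm -a -f -t" this yields three entries: the whole match, then " -t" and "t" from the last iteration only.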
One solution you could try is this:
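QRegExp rx( "\\s+-([acft])" );
int pos = 0;
while ( pos >= 0 ) {
    pos = rx.search( "comm -a -f -t", pos );
    if ( pos >= 0 ) {
        qDebug( rx.cap(1) );
        // Skip the whole match so that runs of several spaces are not
        // matched a second time.
        pos += rx.matchedLength();
    }
}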
which outputs 'a', 'f' and 't'.
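
The same match-and-advance idea can be sketched with the standard library, where std::sregex_iterator does the position bookkeeping. Again std::regex stands in for QRegExp, and collectOptions is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the letter of every "-x" option, one match at a time.
// Each match has its own capture(1), so nothing is overwritten.
std::vector<std::string> collectOptions(const std::string &commandLine)
{
    static const std::regex rx("\\s+-([acft])");
    std::vector<std::string> options;
    for (std::sregex_iterator it(commandLine.begin(), commandLine.end(), rx), end;
         it != end; ++it) {
        options.push_back((*it)[1].str());
    }
    return options;
}
```

Calling collectOptions("comm -a -f -t") returns the three letters in order, and a command line without options returns an empty list.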
If you want to capture all matching parts in one go, it is possible, but
you *must* know the maximum number of possible matches and the solution
is ugly. This approach is therefore fragile compared with the previous
solution.
QRegExp rx( "\\s+-([acft])(?:\\s+-([acft]))?(?:\\s+-([acft]))?" );
rx.search("comm -a -f -t");
QStringList list = rx.capturedTexts();
list.pop_front(); // Get rid of entire match in cap(0)
qDebug( list.join( ":" ) );
This outputs 'a:f:t'.
Note that after the initial "\\s+-([acft])", you need to repeat
"(?:\\s+-([acft]))?" (maximum possible matches - 1) times.
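
If the maximum is known only at run time, the pattern can be assembled instead of typed out. This is a rough sketch under the same assumption as before (std::regex in place of QRegExp); buildOptionPattern and captureUpTo are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Builds one mandatory option group followed by (maxOptions - 1)
// optional ones, as described above.
std::string buildOptionPattern(int maxOptions)
{
    assert(maxOptions >= 1);
    std::string pattern = "\\s+-([acft])";
    for (int i = 1; i < maxOptions; ++i)
        pattern += "(?:\\s+-([acft]))?";
    return pattern;
}

// Returns the options captured in a single search, skipping optional
// groups that did not take part in the match.
std::vector<std::string> captureUpTo(const std::string &commandLine, int maxOptions)
{
    const std::regex rx(buildOptionPattern(maxOptions));
    std::smatch m;
    std::vector<std::string> options;
    if (std::regex_search(commandLine, m, rx)) {
        for (std::size_t i = 1; i < m.size(); ++i) {
            if (m[i].matched)
                options.push_back(m[i].str());
        }
    }
    return options;
}
```

The fragility is easy to see: captureUpTo("comm -a -f -t", 2) silently drops the third option because the pattern has no group left to hold it.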