
Qt-interest Archive, April 2007
Best way to find a word in a dictionary?


Message 1 in thread

Hi. Well, not exactly: I can find regular words with no problem.
The tool I am building takes a plain word and tells me whether that
word has any accents (tildes).
Thanks to someone's help I was able to fix a small encoding issue with
the dictionary; that part already works and is behind me.
What I am trying to do now is match the word I enter as closely as
possible against a word in the dictionary. Obviously, if the dictionary
word does have a tilde/accent, a simple == comparison will not find it,
so I am thinking of two or three approaches and wondering whether there
is something much easier.

First, if the word is not an exact match, the loop continues, and if the
dictionary word has the same number of characters, there is a chance that
it is the word I want. When that is true, I am (for testing) checking
whether the dictionary word has a tilde/accent, to test whether my "if"
with QChar works, because that is what I am trying to debug at the moment.
But it always falls through, as if no word in the entire dictionary had a
tilde (which is not true), so I know I am doing it wrong. Any ideas? Any
good algorithm I could use?


void Ventana::aBuscar() {
    // Palabra = word typed by the user (Ventana = window, Casilla = field)
    Palabra = CasillaPalabra->text();
    Palabra = Palabra.toLower();
    QString Letra;                       // current letter

    int letras = Palabra.size();         // number of letters in the typed word

    QFile fichero("es_ES.dic");          // Spanish dictionary file
    if (!fichero.open(QIODevice::ReadOnly | QIODevice::Text))
        return;

    QTextStream leer(&fichero);
    leer.setCodec("ISO 8859-1");
    while (!leer.atEnd()) {
        Linea = leer.readLine();         // one dictionary word per line

        if (Palabra == Linea) {
            Acento = "NO se acentua";    // "takes no accent"
            break;
        } else {
            Acento = "buscando";         // "searching"
            int letras2 = Linea.size();
            if (letras == letras2) {
                numero = 0;
                for (int a = 0; a < letras2; a++) {
                    Letra = Linea.at(a);
                    nuevo.append(Linea.at(a));

                    // Both branches below end with break, so only
                    // Linea.at(0) is ever examined for each candidate.
                    if (Letra == QChar(0xE1) || Letra == "é" || Letra == "í" ||
                        Letra == "ó" || Letra == "ú") {
                        Acento = "SI TIENE ACENTO";  // "it HAS an accent"
                        break;
                    } else {
                        Acento = "NO TIENE";         // "it does not"
                        break;  // just for testing, but it never gets here..
                    }
                }
            }
        }
    }

    CasillaResultado->setText(Acento);
    CasillaDefinicion->setText(Linea);
}
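The same-length, letter-by-letter comparison described in the question can be written as "fold, then compare": map each accented vowel to its plain vowel and compare the folded letters. The sketch below is plain standard C++ (char32_t code points rather than Qt types) so it compiles on its own; the helper names foldAccent, hasAccent and equalIgnoringAccents are hypothetical. It spells the accented vowels as numeric code points, as the QChar(0xE1) test above already does for á, instead of "é"-style char literals, whose meaning may depend on how the source file is encoded and how Qt converts char strings to QString. In the Qt program the same switch could be applied to QChar::unicode() values.

```cpp
#include <cstddef>
#include <string>

// Map a lowercase Spanish accented vowel to its plain vowel.
// Assumption: input is already lowercased (aBuscar() calls toLower()).
// U+00F1 (n with tilde) is deliberately left alone: in Spanish it is a
// separate letter, not an accented n.
char32_t foldAccent(char32_t c) {
    switch (c) {
        case U'\u00E1': return U'a';  // a with acute
        case U'\u00E9': return U'e';  // e with acute
        case U'\u00ED': return U'i';  // i with acute
        case U'\u00F3': return U'o';  // o with acute
        case U'\u00FA': return U'u';  // u with acute
        case U'\u00FC': return U'u';  // u with diaeresis
        default:        return c;
    }
}

// True if any letter of the word changes when folded, i.e. it carries an accent.
bool hasAccent(const std::u32string& word) {
    for (char32_t c : word)
        if (foldAccent(c) != c) return true;
    return false;
}

// True when both words are identical once accents are removed.
bool equalIgnoringAccents(const std::u32string& a, const std::u32string& b) {
    if (a.size() != b.size()) return false;   // same cheap length test as aBuscar()
    for (std::size_t i = 0; i < a.size(); ++i)
        if (foldAccent(a[i]) != foldAccent(b[i])) return false;
    return true;
}
```

For example, equalIgnoringAccents(U"cancion", U"canci\u00F3n") is true and hasAccent(U"canci\u00F3n") is true, while hasAccent(U"cancion") is false; unlike the loop above, every letter is examined before a verdict is reached.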

--
 [ signature omitted ] 

Message 2 in thread

Please check your emails for proper punctuation, spelling, grammar, etc.
before sending them. I had to read this message three times before I felt I
really understood what you were trying to say. Leaving the first letter of a
sentence uncapitalized makes it difficult to spot the beginning and ending of
the sentence. Having "change" instead of "chance" in the second paragraph
really threw me off.

As for your actual question, here are some suggestions:
(1) If you don't mind preprocessing your dictionary and doubling its size,
you can strip all of the accents off of each dictionary word and use that as
the key in a hash to point to the original word. Then you just need to do a
hash lookup of your simple word to find the accented version. Note that if
you have collisions (i.e., several dictionary words reduce to the same
simple word when their accents are removed), you will have to store a list
of matches for each key in the hash.
(2) After some quick Googling, I found this article:
http://www.codeproject.com/cs/algorithms/bmsearch.asp?df=100&forumid=189737&exp=0&select=1158653
which discusses a particular algorithm and how to use it to do
case-insensitive searching. In the comments, someone points out that you
could use the same technique to do accent-insensitive searching.
(3) Notice that the algorithm in (2) was invented in the 1970s. Many
brilliant people have worked on string matching problems for decades (read:
many people smarter than you or me have worked on this problem far longer
than you would even dream of spending on it). I would seriously recommend
that instead of spending your time trying to come up with your own
algorithm, you spend that time researching algorithms invented by others.
Google (www.google.com) and Google Scholar (scholar.google.com) should be
good starting places. You may need to get access to journal papers, in which
case hopefully there is a large university near you that will let you into
their library.
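Suggestion (1) can be sketched as an index built once from the dictionary: each word's accent-free spelling is the key, and the value is the list of dictionary words that reduce to it, which covers the collision case (for example, both esta and está reduce to esta). This is a minimal sketch in standard C++ using std::unordered_map, assuming the dictionary has already been decoded into lowercase std::u32string words; stripAccents, buildIndex and lookup are hypothetical names.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: remove Spanish acute accents and the diaeresis from a
// lowercase word. U+00F1 (n with tilde) is kept; it is a separate letter.
std::u32string stripAccents(std::u32string w) {
    for (char32_t& c : w) {
        switch (c) {
            case U'\u00E1': c = U'a'; break;
            case U'\u00E9': c = U'e'; break;
            case U'\u00ED': c = U'i'; break;
            case U'\u00F3': c = U'o'; break;
            case U'\u00FA': case U'\u00FC': c = U'u'; break;
            default: break;
        }
    }
    return w;
}

// Key: accent-free spelling. Value: every dictionary word reducing to it.
using AccentIndex = std::unordered_map<std::u32string, std::vector<std::u32string>>;

AccentIndex buildIndex(const std::vector<std::u32string>& dictionary) {
    AccentIndex index;
    for (const auto& word : dictionary)
        index[stripAccents(word)].push_back(word);   // collisions share one list
    return index;
}

// All dictionary spellings (accented or not) of the plain word the user typed.
std::vector<std::u32string> lookup(const AccentIndex& index, const std::u32string& plain) {
    auto it = index.find(stripAccents(plain));
    return it == index.end() ? std::vector<std::u32string>{} : it->second;
}
```

Building the index costs one pass over the file; after that, each query is a single hash lookup instead of a scan of the whole dictionary.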
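Judging by its URL, the article in suggestion (2) is about Boyer-Moore searching. As a hedged illustration of the accent-insensitive idea from its comments, the sketch below uses the simpler Boyer-Moore-Horspool variant (a swapped-in technique, not necessarily the article's code) and folds accents on both pattern and text before comparing characters. The names fold and findIgnoringAccents are hypothetical, and input is assumed to be lowercase std::u32string. Substring search like this matters when looking for a word inside longer text; for whole dictionary lines, a hash lookup is simpler.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical accent fold for lowercase Spanish vowels (n with tilde kept).
char32_t fold(char32_t c) {
    switch (c) {
        case U'\u00E1': return U'a';
        case U'\u00E9': return U'e';
        case U'\u00ED': return U'i';
        case U'\u00F3': return U'o';
        case U'\u00FA': case U'\u00FC': return U'u';
        default: return c;
    }
}

// Boyer-Moore-Horspool search on accent-folded characters.
// Returns the index of the first match, or std::u32string::npos.
std::size_t findIgnoringAccents(const std::u32string& text, const std::u32string& pattern) {
    const std::size_t m = pattern.size(), n = text.size();
    if (m == 0) return 0;
    if (m > n) return std::u32string::npos;

    std::u32string p(pattern);
    for (char32_t& c : p) c = fold(c);

    // Shift table: distance from each character's last occurrence
    // (excluding the final position) to the end of the pattern.
    std::unordered_map<char32_t, std::size_t> shift;
    for (std::size_t i = 0; i + 1 < m; ++i) shift[p[i]] = m - 1 - i;

    std::size_t pos = 0;
    while (pos + m <= n) {
        std::size_t j = m;
        while (j > 0 && fold(text[pos + j - 1]) == p[j - 1]) --j;  // compare right to left
        if (j == 0) return pos;
        auto it = shift.find(fold(text[pos + m - 1]));  // text char under the pattern's end
        pos += (it == shift.end()) ? m : it->second;
    }
    return std::u32string::npos;
}
```

Because the shift table is built from folded characters and every text character is folded before use, "cancion" matches "canción" without the text ever being copied or rewritten.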

Happy hunting,
Tom

On 4/26/07, rek2GNU/Linux <rek2@xxxxxxxxxxxxxxxxxx> wrote:
> [quoted copy of Message 1 omitted]

Message 3 in thread

rek2GNU/Linux said the following on 26.04.2007 06:58:
> [quoted text of Message 1 omitted]

Hi rek2GNU(?),

Google's Research Director, Peter Norvig, recently posted an 
interesting article about a spelling corrector (a stripped down 
version of the Google spell corrector), which you should take a look 
at. You'll find it here:
     http://norvig.com/spell-correct.html
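Norvig's corrector generates every string one edit away from the input (deletions, transpositions, replacements and insertions), keeps the candidates that are known words, and prefers the most frequent. Restricting the replacements to "put an accent on one vowel" gives a small candidate generator for this problem. A minimal sketch in standard C++, assuming lowercase input and a dictionary already loaded into a std::unordered_set; accentedForms and accentCandidates are hypothetical names. It makes only single substitutions, so a word with two diacritics, such as lingüístico, would need a second round, analogous to the two-edit step in Norvig's article.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: accented forms a plain lowercase vowel may take in Spanish.
std::u32string accentedForms(char32_t c) {
    switch (c) {
        case U'a': return U"\u00E1";
        case U'e': return U"\u00E9";
        case U'i': return U"\u00ED";
        case U'o': return U"\u00F3";
        case U'u': return U"\u00FA\u00FC";   // u with acute, u with diaeresis
        default:   return U"";
    }
}

// Candidate generation restricted to accent "replace" edits: the word itself
// plus every string that accents exactly one vowel, kept only if it is in
// the dictionary.
std::vector<std::u32string> accentCandidates(const std::u32string& word,
                                             const std::unordered_set<std::u32string>& dictionary) {
    std::vector<std::u32string> found;
    if (dictionary.count(word)) found.push_back(word);   // may need no accent at all
    for (std::size_t i = 0; i < word.size(); ++i) {
        for (char32_t accented : accentedForms(word[i])) {
            std::u32string candidate = word;
            candidate[i] = accented;
            if (dictionary.count(candidate)) found.push_back(candidate);
        }
    }
    return found;
}
```

Typing "esta" returns both esta and está, so the tool can report that the word may or may not take an accent depending on its meaning.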

-- 
 [ signature omitted ] 
