Solsoft

Forum

[php-gni] Inverted terms


Author Message
on: 31. 05. 2005 [14:57]
israel@FEE.TCHE.BR
Israel Jose Cefrin da Silva
Thread author
Member since: 31.12.1969
Posts: 0
Hi all

I've got a site whose search engine is based on OpenIsis/GNI.

When I search for this term: 'economia*informal'
it retrieves 5 records.

But if I run the same search like this: 'informal*economia'
it retrieves 57 records.

What could be wrong with my search? Is it something in my FST index?


You can test my search at this URL:
http://www.bibvirtual.rs.gov.br:8080/pg_pesquisa.php
!! select the 'FEE' database

- Search with "economia*informal":
http://www.bibvirtual.rs.gov.br:8080/pg_pesquisa_resultado.php?termo=economia*informal&tipodedocumento=%24&ano=&operador=*&campo=&from=0&pagina=1&base%5B%5D=FEE&from=0&enviar=Pesquisar

- Search with "informal*economia":
http://www.bibvirtual.rs.gov.br:8080/pg_pesquisa_resultado.php?termo=informal*economia&tipodedocumento=%24&ano=&operador=*&campo=&from=0&pagina=1&base%5B%5D=FEE&from=0&enviar=Pesquisar


regards
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Israel Cefrin
 técnico webdesigner
    israel @ fee.tche[dot]br
    msn:isra_rs@hotmail.com
    icq:74378983
    +55 51 3216 9084 - work
    +55 51 8421 7888 - cel
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

------------------------------------------
Posted to Phorum via PhorumMail
on: 31. 05. 2005 [17:57]
paul@malete.org
Klaus Ripke
Member since: 31.12.1969
Posts: 0
On Tue, May 31, 2005 at 03:57:28PM -0300, Israel Cefrin wrote:
> Hi all
>
> I've got a site with OpenIsis/GNI search engine based.
>
> When i do a search with this term: 'economia*informal'
> It retrieves 5 records
>
> But, if I do the same search like this: 'informal*economia'
> It retrieves 57 records
>
> What could be wrong with my search ? Is something on my FST index ?
nah

probably this is expected (and even documented!) behaviour.

It's an old story, the infamous OPENISIS_SETLEN.
Mea culpa, I should have provided a default with 1 or 2 zeros more.

For the record, the story is:
The SETLEN determines the number of *hits*, not records,
that will be kept in the internal query buffer.
So, if every record containing economia contains it five times
(which is likely for such a word, since those powerpoint freaks
from the economics department bet their wages on repetition),
the buffer of 1000 hits will hold only the first 200 records
(1000 / 5). The second term works completely differently:
it filters the buffer (checking informal in those roughly 200 records).

The other way round, checking informal first, you will find
many more records within the 1000 hits, from which
economia is then filtered.
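
To make the asymmetry concrete, here is a toy model of the behaviour described above. It is only a sketch, not OpenIsis code: the function name, the data and the exact truncation rule are assumptions for illustration. Only the buffer size of 1000 hits and the "five hits per record" figure come from the explanation above.

```python
SETLEN = 1000  # buffer size in hits, as in the explanation above (assumed value)

def search_and(first_hits, second_hits, setlen=SETLEN):
    """Toy model of 'first*second' with a bounded hit buffer.

    first_hits / second_hits are lists of record ids, one entry per hit,
    so a record that contains a term five times appears five times.
    """
    buffer = first_hits[:setlen]           # only the first `setlen` hits are kept
    candidates = sorted(set(buffer))       # distinct records that survived truncation
    second_records = set(second_hits)      # the second term only filters the buffer
    return [rec for rec in candidates if rec in second_records]

# Hypothetical data: 'economia' occurs 5 times in each of records 1..1000,
# 'informal' occurs once in every 10th record (100 records in total).
economia_hits = [rec for rec in range(1, 1001) for _ in range(5)]
informal_hits = list(range(10, 1001, 10))

forward = search_and(economia_hits, informal_hits)   # economia*informal
reverse = search_and(informal_hits, economia_hits)   # informal*economia

print(len(forward))  # 20  (only records 1..200 fit into the buffer)
print(len(reverse))  # 100 (all 'informal' hits fit, nothing is lost)
```

In this model both queries have the same true answer (100 records), but the one starting with the frequent term loses most of it to truncation before the filter even runs.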

Yes, it is documented.

So:
a) recompile using some larger value for OPENISIS_SETLEN
b) educate your users to ask interesting questions first,
so there is less spam to filter out -- it helps a lot, anyway!
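
If educating users is not an option, suggestion b) could in principle be applied on the application side by putting the rarest term first before the query is sent. The following is a hedged sketch only: `rewrite_query` and the `hit_counts` table are hypothetical, and how to obtain per-term hit counts from the index is not covered here.

```python
def rewrite_query(query, hit_counts):
    """Reorder the terms of an 'a*b*c' query so the rarest term comes first.

    hit_counts maps a term to its (assumed known) number of hits;
    terms missing from the table are treated as very frequent and go last.
    """
    terms = [t.strip() for t in query.split("*") if t.strip()]
    ordered = sorted(terms, key=lambda t: hit_counts.get(t, float("inf")))
    return "*".join(ordered)

# Hypothetical counts, for illustration only.
hit_counts = {"economia": 5000, "informal": 100}

print(rewrite_query("economia*informal", hit_counts))  # informal*economia
```

This only handles plain `*` chains; queries with other operators or parentheses would need a real parser.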


regards

------------------------------------------
Posted to Phorum via PhorumMail





Copyright © 2003-2009, Solsoft de Costa Rica S.A.