TSearch 2 by Khang Nguyen based on the
TSearch 2 (by Khang Nguyen based on the paper of Kerry and Susan Douglas) A Document is a string to catalog and search. n. A Pattern is a logical collection of words. n
The @@ Operator n n n n Is similar in concept to LIKE, ILIKE, or the regular-expression operators(~, ~*, and so on). Compares a pattern against a string. Movies=# SELECT tape_id, title FROM tapes WHERE Movies=# to_tsvector( title ) @@ to_tsquery ( ‘Godfather’ ); Tape_id | title --------------------AB-12345 | The Godfather AB-67472 | The Godfather
Search for all titles that include n Rear and Window: n n Rear or Window: n n to_tsquery( ‘Rear | Window’) Rear but not Window: n n to_tsquery( ‘Rear & Window’) to_tsquery( ‘Rear & ! Window’) Island (Earth or Gilligan): n to_tsquery( ‘Island & (Earth | Gilligan)’ )
Stop words, Stemming n n Stop words (the, a, and, not, some, who, etc. ) are words that tsearch 2 ignores when searching and indexing. n “the” is a very common word and was not included in your search. Stemming is the process of identifying word variations and replacing each word variation with its stem. n to_tsquery() replaces each word in the pattern (donate, donation, donating, donates and donated) with its stem: “donat”.
Stop Words Configurations n To add a new word to the stop word list, edit this file n n n /usr/local/pgsql/share/english. stop To make tsearch 2 faster, keep a low number of word stems. To decrease the number of word stems, use stop words.
Stem Configuration Example n cd PGSQL_SRC/contrib/tsearch 2/gendict n n Obtain stem. {c, h} files. Create template files for Portuguese n n wget http: //snowball. tartarus. org/portuguese/stem. c wget http: //snowball. tartarus. org/portuguese/stem. h . /config. sh -n pt -s -p portuguese -v C'Snowball stemmer for Portuguese' Compile and install dictionary. n n n cd. . /dict_pt make install
Searching Multiple Columns n n n n Perf=# SELECT COUNT(*) FROM recalls WHERE Perf =# to_tsvector( desc_defect ) @@ to_tsquery ( ‘Hydraulic’ ) n OR Perf =# to_tsvector( con_defect ) @@ to_tsquery ( ‘Hydraulic’ ) n OR Perf =# to_tsvector( cor_defect ) @@ to_tsquery ( ‘Hydraulic’ ) Same as (string all three columns together and search through the concatination). Won’t produce correct results if your documents contain any NULL values. Perf=# SELECT COUNT(*) FROM recalls WHERE Perf =# to_tsvector( desc_defect || con_defect || cor_action ) @@ to_tsquery ( ‘Hydraulic’ )
COALESCE n n n n Fixing the problem by using the COALESCE function to map NULL values into some values like empty string. Perf=# SELECT COUNT(*) FROM recalls WHERE Perf =# to_tsvector( Perf =# COALESCE( desc_defect, ‘’ ) || ‘ ‘ || Perf =# COALESCE( con_defect, ‘’ ) || ‘ ‘ || Perf =# COALESCE( cor_action, ‘’ ) || ‘ ‘ || Perf =# @@ to_tsquery ( ‘Hydraulic’ )
Simplifying tsearch 2 with customized functions n n n n n Create a function named documents() that will return the (properly colasced) concatination of desc_defect, con_defect, and cor_action Perf=# CREATE FUNCTION documents( recalls ) RETURNS TSVECTOR AS Perf=# $$ Perf=# SELECT COUNT(*) FROM recalls WHERE Perf =# to_tsvector( Perf =# COALESCE( desc_defect, ‘’ ) || ‘ ‘ || Perf =# COALESCE( con_defect, ‘’ ) || ‘ ‘ || Perf =# COALESCE( cor_action, ‘’ ) || ‘ ‘ || Perf =# @@ to_tsquery ( ‘Hydraulic’ ); Perf=# $$ LANGUAGE ‘SQL’ IMMUTABLE;
Simplifying tsearch 2 with customized functions (Cont. ) n n n n Define a function named document() that converts documents() into a tsvector: Perf=# CREATE FUNCTION documents( recalls ) RETURNS TSVECTOR AS Perf=# $$ Perf=# SELECT to_tsvector( documents( $1 )); Perf=# $$ LANGUAGE ‘SQL’ IMMUTABLE; Now, use the document() function with the @@ operator: Perf=# SELECT COUNT(*) FROM recalls r Perf=# WHERE document ( r ) @@ to_tsquery( ‘hydraulic’ );
- Slides: 10