WAIS - Wide Area Information Server
WAIS is one of the original search facilities developed to index and search a web site. For something more current, see the YoLinux.com tutorial on htDig to provide an index and search capability for your web site.
WAIS was developed by Thinking Machines Corporation in 1988 for indexing and searching document indexes. It employs a client/server architecture. It was an advance made necessary by the large number of documents residing on web sites. Free text searches such as "grep" were too slow to be applied against large numbers of documents. WAIS speeds up the process by doing the expensive work up front: the documents are indexed once, and each query is run against the index. A WAIS search returns the titles of the documents that best match the search.
Indexing a site creates databases (or sources) from the documents. This is done by the program waisindex. The sources generated are used by the waisserver. The program waisq is the interface to the WAIS server.
WAIS incorporates relevance ranking, which assigns a factor to every indexed word. Words appearing in a title are assigned a higher relevance. Words which are used less often get a higher ranking. The number of times a word is used in a document and the size of the document also influence the weighting of the word in the index.
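The ranking factors described above can be illustrated with a small sketch. This is not the WAIS algorithm itself: the formula, the TITLE_BOOST value and the function names are assumptions chosen only to show how a title boost, word rarity, word count and document size could combine into one weight per word.

```python
import math
from collections import Counter

# Assumption: a word in the title counts as this many extra occurrences.
TITLE_BOOST = 2.0


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def word_weights(docs):
    """docs maps a document name to a (title, body) pair.

    Returns {name: {word: weight}}. The weight grows with the number of
    occurrences (title words are boosted), shrinks with document size,
    and grows for words that appear in fewer documents.
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents containing each word
    tokens = {}
    for name, (title, body) in docs.items():
        t, b = tokenize(title), tokenize(body)
        tokens[name] = (t, b)
        doc_freq.update(set(t) | set(b))

    weights = {}
    for name, (t, b) in tokens.items():
        counts = Counter(b)
        for word in t:
            counts[word] += TITLE_BOOST
        size = len(t) + len(b)
        weights[name] = {
            word: (count / size) * math.log(1 + n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return weights


def search(weights, query):
    """Return the names of matching documents, best match first."""
    terms = tokenize(query)
    scores = {
        name: sum(w.get(term, 0.0) for term in terms)
        for name, w in weights.items()
    }
    hits = [name for name, score in scores.items() if score > 0]
    return sorted(hits, key=lambda name: -scores[name])


if __name__ == "__main__":
    docs = {
        "a": ("WAIS indexing", "how to index documents with wais"),
        "b": ("Cooking", "how to cook pasta with sauce"),
    }
    print(search(word_weights(docs), "wais"))
```

The weights are computed once, when the documents are "indexed", so a query only has to add up stored numbers. That is the same trade-off WAIS makes in place of a grep-style scan of every file.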
ftp://sunsite.unc.edu/pub/packages/infosystems/wais/servers/freeWAIS/
Get binaries: freeWAIS-0.5-<UNIX type>.tar.gz where "<UNIX type>" is SunOS, Linux, AIX...
or get source code: freeWAIS-0.5.tar.gz
Note: the word "source" in the WAIS world does not always mean source code. It often means the source of a search index (as in origin).
Man pages: ftp://sunsite.unc.edu/pub/packages/infosystems/wais/documentation/man-pages/*.1
Technical explanation of file structure (not needed): ftp://sunsite.unc.edu/pub/packages/infosystems/wais/documentation/protspec.txt
Note: metalab.unc.edu = sunsite.unc.edu.
Unzip-tar the binaries:
The essential elements you need from this are: waisq, waissearch, waisserver and waisindex. (You can use swais.sh, which calls swais, to run WAIS without the network if you wish.) Place the WAIS binaries in "/usr/local/bin", "/opt/bin" or another accessible bin directory.
Indexing a collection of documents generates a "sources" database comprised of the following files:
My index script: (Indexes for use on the web)
Create synonym file if required:
/usr/local/http/wais/sources/abc_index.syn
Words to be ignored are hard coded for you in waisindex.
Start script:
waisindex flags:
-d : Directory including file name prefix for source files. /usr/local/http/wais/sources/abc_index = File name without suffix for index.
-t : Type of index created. URL = Returned result from search will be in the form of a URL.
-r : Recurse through subdirectories. /usr/local/docs/HTML = Path of the HTML documents you will be indexing.
I could never get waisserver working from inetd. Use a start script instead. Start script used: (this statement was placed in /etc/rc.local, terminated with &)
-p = Port number. The ANSI standard Z39.50 says to use port 210.
-d = Directory of index files.
inetd setup: (DID NOT WORK!!)
File: /etc/inetd.conf (single line)
File: /etc/services
wais 210/tcp # wais server for web indexing
AIX start script: --start from cgi-bin by server
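Whichever way the server is started, it is worth confirming that something is actually accepting connections on the chosen port before wiring up the CGI scripts. The following is a minimal sketch of such a check. The function name is_listening and the use of localhost are assumptions for illustration, and the check only shows that a TCP listener exists, not that it is a working waisserver.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port 210 is the WAIS port used in the /etc/services entry.
    if is_listening("localhost", 210):
        print("A server is accepting connections on port 210")
    else:
        print("Nothing is listening on port 210")
```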
Perl script to invoke the WAIS client "waisq". Download the scripts kidofwais.pl, print_hit_bold.pl and cgi-lib.pl and place them in your /cgi-bin/.
The cgi Perl script to execute can be found at:
Edit script:
Download script: http://ljordal.cso.uiuc.edu/print_hit_bold.pl
Edit script variables $serverURL and $maintainer.
This requires the Perl script cgi-lib.pl:
- Previous setup is for one index -
Searching multiple indexes with one query: (OPTIONAL) - Useful for multiple servers
Set variable: $use_Source_table = 1;
Create file: /usr/local/http/wais/sources/Source_table
Sample: abc_index~ABC Developer Web Site~1~ABC:~~abc_index,abc_index_2,abc_index_3
See: http://www.cso.uiuc.edu/grady.html/Source_table.txt
Note: The first line references itself and the lines which follow. Use "1" on the first line to allow it to reference other lines. Those lines use "0" and do not reference anything further.
Format: Table of WAIS sources and how to process them. Columns are separated by tildes, all on a single line:
wais_source_name~title_to_use~search_multiple_indices?~short_name
This table contains the following info:
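As an illustration of the tilde-separated layout, here is a minimal sketch that splits one Source_table line into its columns. Only the first four column names come from the format line above. The SourceEntry type, the parse_source_line function and the "extra" field that collects any remaining columns are assumptions made for this example, and they are not part of kidofwais.pl.

```python
from typing import List, NamedTuple


class SourceEntry(NamedTuple):
    source_name: str        # wais_source_name
    title: str              # title_to_use
    search_multiple: bool   # search_multiple_indices? ("1" = yes, "0" = no)
    short_name: str         # short_name
    extra: List[str]        # any further columns, kept as-is (assumption)


def parse_source_line(line):
    """Split one tilde-separated Source_table line into a SourceEntry."""
    fields = line.rstrip("\n").split("~")
    if len(fields) < 4:
        raise ValueError("expected at least 4 tilde-separated columns")
    name, title, multi, short = fields[:4]
    return SourceEntry(name, title, multi == "1", short, fields[4:])


if __name__ == "__main__":
    sample = ("abc_index~ABC Developer Web Site~1~ABC:~~"
              "abc_index,abc_index_2,abc_index_3")
    print(parse_source_line(sample))
```

Keeping the unnamed trailing columns in a plain list avoids guessing at their meaning while still letting the sample line round-trip.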
WAIS-SF is one of the original search facilities developed to index and search a web site. This version of WAIS for "Structured Fields" was developed in 1993 to extend query functionality. Added functionality includes wild card searches, boolean searches, numeric searches with operators such as less than and greater than, and searches based on defined fields in a document (e.g. author). The fields must be described using a WAIS-sf format description for the layout of the document.
freeWAIS-sf and SFgate:
ftp://ls6-ftp.cs.uni-dortmund.de/pub/src/SFgate/
freeWAIS-sf:
wais.pm (PERL module)
Copyright © 1999 by Greg Ippolito