YoLinux: Search Engine Review

Add web page search and web page indexing capability to your web site.



Search Engines:

There is a plethora of search options available for your web site:

  • Use a search service to spider and index your site. You place the search box on your web pages; the service runs the search engine and returns results which point back to your site. The search index and the spidering of the web site are handled entirely by the search service. Google and others can provide this service. It is the equivalent of a Google search with:
    search-word site:your-domain.com
    The HTML form which calls the Google search engine can also embed the domain to achieve the same effect. Google and other search firms provide both free and for-fee services.

  • Index your site and provide your own search capabilities. Commercial and open source solutions exist. This can be a CGI program which performs a grep-style search on the site contents each time it is called, or it can use a previously generated index of the web site contents for faster results.

  • A separate search "appliance" or search server can spider your web site and provide the search facility for your site. This works best for sites with multiple web servers or for intranets with multiple file and web servers.
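The site-restricted query described in the first option can be sketched in a few lines of Python. This is a minimal illustration only: the function name site_search_url is hypothetical, and the Google search URL with its "q" parameter is assumed here as the target; a search service may require a different endpoint or a registered form.

```python
from urllib.parse import urlencode


def site_search_url(words, domain, engine="https://www.google.com/search"):
    """Build a search URL restricted to one domain using the site: operator.

    The engine URL and the "q" parameter name are assumptions for this sketch.
    """
    query = f"{words} site:{domain}"
    # urlencode escapes spaces and the colon so the query is safe in a URL.
    return engine + "?" + urlencode({"q": query})


print(site_search_url("linux tutorial", "your-domain.com"))
```

An HTML search form achieves the same effect by appending the site:your-domain.com term to whatever the visitor types before the query is submitted.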


Commercial Search Services:

Product               Vendor      Web Site
Google site search    Google      http://www.google.com/help/features.html#sitesearch
                                  or Google APIs
                                  or AdSense for search
Alta Vista Searchbox  Altavista   http://www.altavista.com/help/free/free_searchbox
Bloodhound            Bloodhound  http://www.bloodhound.com/


Commercial Search Engine Software Vendors:

Vendor                   Product
Focuseek                 Searchbox2: Indexes HTML, PDF, MS Word, RTF and plain text documents
Folio                    Folio Site Director
Google                   Google enterprise solutions (based on Stanford research)
SLI Systems              Learning search
Lycos                    Inmagic
Maxum Development Corp.  Phantom
Netscape                 Compass Server
Quadralay Corp           Web Works Search
HotBot                   www.hotbot.com
Opentext                 Livelink
Verity                   Ultraseek overview

Multi-Media

Product                                   Vendor  Web Site                                Use
UKMax                                     -       -                                       -
ICQ                                       -       -                                       -
Bomis                                     -       www.bomis.com                           -
Copernic Technologies Inc. (Quebec City)  -       www.copernic.com                        Queries multiple search engines.
Clever                                    IBM     www.almaden.ibm.com/cs/k53/clever.html  Ranks search results, most authoritative first.
Thunderstone                              -       www.thunderstone.com                    -
Direct Hit                                -       www.directhit.com                       Uses personal info to modify search. Incorporates relevance ranking.
Islip                                     -       www.islip.com                           Indexes closed-captioned text from video.
Enfish                                    -       www.enfish.com                          Information assistant.
Network Wizards                           -       www.nw.com                              -


List of Open Source options:

Search Engine            Description
perl_site_search         Simplest search to implement.
SWISH                    Version 1.1: Use on a small number of local pages only.
SWISH++                  The fastest SWISH. Written in C++.
Lucene                   From the Apache group. Written in Java and runs on Tomcat.
WebGlimpse/Glimpse       Original U of Arizona and commercial versions. Written in Perl and C. Indexes HTML, PDF, Word and other formats.
freeWais                 Can perform "And", "Or" and "Not" type searches.
freeWais-sf              One of the first available content indexing/search engines. The SF is for "Structured Fields". These fields are used for information types such as author, title, date... Can perform "And", "Or" and "Not" type searches.
DataParkSearch           Indexes HTML, plain text, MP3 audio and GIF images. Supports synonyms and fuzzy search. Multi-character support. Index and CGI. GPL.

Spider/Robot Index Engine
ht/Dig                   Searches/indexes a single site resident on the server or spiders remote WWW servers. Supports robots.txt exclusions. HTML and plain text documents. GPL. (San Diego State U.)
Harvest (Robot Indexer)  Supported formats include HTML, TeX, DVI, PS, full text, mail, man pages, news, troff, WordPerfect, RTF, Microsoft Word/Excel, SGML, C sources and PDF (using Xpdf). Modular. Written in Perl.
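
Several of the packages above search a previously generated index and accept "And", "Or" and "Not" type queries. The following Python sketch illustrates that general idea with a tiny in-memory inverted index. It is not taken from any of the listed packages; the names build_index and search, and the sample pages, are hypothetical.

```python
import re


def build_index(pages):
    """Map each lowercase word to the set of page names that contain it."""
    index = {}
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(name)
    return index


def search(index, all_of=(), any_of=(), none_of=()):
    """Return sorted page names that contain every word in all_of ("And"),
    at least one word in any_of if given ("Or"), and no word in none_of ("Not")."""
    # Start from every indexed page, then narrow the result set.
    result = set().union(*index.values())
    for word in all_of:
        result &= index.get(word.lower(), set())
    if any_of:
        hits = set()
        for word in any_of:
            hits |= index.get(word.lower(), set())
        result &= hits
    for word in none_of:
        result -= index.get(word.lower(), set())
    return sorted(result)


# Hypothetical site contents, for illustration only.
pages = {
    "index.html": "Linux tutorials and search engine reviews",
    "network.html": "Linux networking tutorials",
    "security.html": "Internet security tools",
}
index = build_index(pages)
print(search(index, all_of=["linux"], none_of=["networking"]))
```

Because the index is built once and each query is reduced to set operations, lookups stay fast as the site grows, whereas a grep-style CGI search rereads every page on each request.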


Indexing (general):


Spiders and Robots:

Online reviews.


Search Portal Lists:

Comprehensive list of search sites. See:


Online Reviews of Search Engines:


Links:


Books:

  • "Web Publishing Unleashed, HTML, JAVA, CGI, VRML, SGML"
    ISBN #1-57521-051-7, SAMS
    This book dedicates an entire chapter to WAIS and search engines.


Copyright © 1999, 2006 by Greg Ippolito