Adding search to your website:
Many search options are available for your web site.
- Outsourced Search: Use a search service to spider and index your site. The service provides
the search box for your web pages, runs the search engine and returns search results which
point back to your site.
The actual search index and spidering of the web site is handled by the
search service. Google and others can provide this service.
It is the equivalent of a Google search with:
search-word site:your-domain.com
The HTML form which calls the Google search engine can also embed the
domain to achieve the same effect. Google and other search firms
provide free and for-fee services.
- Your own Search: Index your site and provide your own search capabilities.
Commercial and open source solutions exist. This can be a CGI program
which performs a grep/search on the site contents when called, or it can
use a previously generated index of the contents of the web site for faster results.
- Search Appliance: A separate search "appliance" or search server can spider your web site
and provide the search facility for your site.
This works best for sites with multiple web servers or for intranets with
multiple file and web servers.
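The site-restricted query described under Outsourced Search can be sketched in a few lines of Python. This is only an illustration of how the `site:` operator is combined with the search words and URL-encoded; the function name `site_search_url` is hypothetical, and `your-domain.com` is the placeholder domain used above. A search service's own form or API may differ.

```python
from urllib.parse import urlencode


def site_search_url(terms: str, domain: str) -> str:
    """Build a Google search URL whose results are limited to one domain."""
    # Equivalent to typing "search-word site:your-domain.com" in the search box.
    query = f"{terms} site:{domain}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("search-word", "your-domain.com"))
# prints https://www.google.com/search?q=search-word+site%3Ayour-domain.com
```

An HTML form on your page can achieve the same result by submitting the user's words together with the fixed `site:` restriction as the `q` parameter.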
| Commercial Search Services: |
-
| Commercial Search Engine Software Vendors: |
| Vendor | Product |
| Focuseek | Searchbox2: Indexes HTML, PDF, MS Word, RTF and plain text documents |
| Folio | Folio Site Director |
| Google | Google enterprise solutions (based on Stanford research) |
| SLI Systems | Learning search |
| Lycos | Inmagic |
| Maxum Development Corp. | Phantom |
| Netscape | Compass Server |
| Quadralay Corp | Web Works Search |
| HotBot | www.hotbot.com |
| Opentext | Livelink |
| Verity | Ultraseek overview |
Multi-Media
-
| List of Open Source options: |
| Search Engine | Description |
| perl_site_search | Simplest search to implement. |
| SWISH | Version 1.1: Use on a low number of local pages only. |
| SWISH++ | The fastest SWISH. Written in C++. |
| Lucene | From the Apache group. Written in Java; commonly deployed in web applications running on Tomcat. |
| WebGlimpse/Glimpse | Original U of Arizona and commercial versions. Written in Perl and C. HTML, PDF, Word and other formats. |
| freeWais | Can perform "And", "Or" and "Not" type searches. |
| freeWais-sf | One of the first available content indexing/search engines. The SF is for "Structured Fields". These fields are used for information types such as author, title, date... Can perform "And", "Or" and "Not" type searches. |
| DataParkSearch | HTML, plain text, MP3 audio and GIF images. Supports synonyms and fuzzy search. Multi-character support. Index and CGI. GPL. |
| Spider/Robot Index and Engine | |
| ht/Dig | Search/index a single site resident on the server or spider remote WWW servers. Supports robots.txt exclusions. HTML and plain text documents. GPL. (San Diego State U.) See the YoLinux htDig Web Site Search installation and configuration tutorial (default Red Hat/Fedora/CentOS web site search). |
| Harvest (Robot Indexer) | Supported formats include HTML, TeX, DVI, PS, full text, mail, man pages, news, troff, WordPerfect, RTF, Microsoft Word/Excel, SGML, C sources and PDF (using Xpdf). Modular. Written in Perl. |
| Solr and Lucene | From the Apache group. The two work together to provide an enterprise search solution. Solr: full-text search, HTML administration interface, distributed search, hit highlighting, ... Lucene: available in Java, C++, PHP, Python, ... Indexes text from PDF, HTML, Microsoft Word and OpenDocument documents. Indexing and search. |
| Adding Search to your web site: |
Search Recommendations for your web site:
- The simplest solution is to use outsourced search.
Google and others can provide a search box for your web page and the service to index your site and provide this search capability. Let them handle it.
- The next simplest solution is only for small, simple web sites with static web pages: perl_site_search. It can index the local pages on your hard drive and provide a simple search CGI. It cannot be used for dynamic content or with server side includes. The entire web page must reside in a single HTML file.
- If your site is more complex and produces dynamic content, a spider must make HTTP requests to your web server to gather and index the content.
I have found ht/Dig to be easy to employ as it is provided with most Linux distributions and just requires configuration.
See the YoLinux htDig Web Site Search installation and configuration tutorial (default Red Hat/Fedora/CentOS web site search)
- For a fully featured, high performance and very sophisticated enterprise search, look at Solr/Lucene. This will require the installation of a Java application server such as Tomcat and a fair bit of configuration and system administration.
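To make the "index local static pages, then search the index" approach concrete, here is a minimal Python sketch of a word index with an "And" search. It is not how perl_site_search, ht/Dig or any other listed tool is implemented; all function names, the sample pages and the `/var/www/html` path are assumptions chosen for illustration. It strips tags naively, so script and style text would be indexed too.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_words(html):
    """Return the set of lowercase words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


def build_index(pages):
    """Map each word to the set of page names that contain it.

    `pages` is a dict of {page name: HTML source}.
    """
    index = defaultdict(set)
    for name, html in pages.items():
        for word in page_words(html):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted pages containing every word of the query ("And" search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


def load_pages(docroot):
    """Read every *.html file under a document root (hypothetical path)."""
    return {
        str(path.relative_to(docroot)): path.read_text(errors="ignore")
        for path in Path(docroot).rglob("*.html")
    }


# Small in-memory demonstration; a real site might call load_pages("/var/www/html").
pages = {
    "index.html": "<html><body><h1>Linux networking</h1><p>Routing tips</p></body></html>",
    "search.html": "<html><body><p>Adding search to a Linux web site</p></body></html>",
}
index = build_index(pages)
print(search(index, "linux search"))  # prints ['search.html']
print(search(index, "linux"))         # prints ['index.html', 'search.html']
```

Because the index is built from files on disk, it only sees static HTML. Dynamic pages or server side includes would need a spider that fetches the rendered pages over HTTP, as noted above.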
YoLinux.com Search Tutorials:
- ht://Dig - Comes with most Linux distributions
- WAIS - One of the originals - Wide Area Information Server