CHANGES - List of revisions


Revision History

This document contains a list of bug fixes and feature additions to Swish-e.


Version 2.4.3 December 9, 2004

Improved error messages when using incremental indexing

There was some confusion about how to use incremental indexing (still experimental), so better logic was added for the error messages.

Also fixed a logic error when setting the incremental update mode. Caught by Paul Loner.


Version 2.4.3-pr1 - Wed Dec 1 09:52:50 PST 2004

"Fixed" libxml2's change in UTF8Toisolat1() return value

Bernhard Weisshuhn supplied a patch to parser.c for checking the return value of UTF8Toisolat1(). It seems that libxml2 now returns the number of characters converted instead of zero on success.

 
   http://bugzilla.gnome.org/show_bug.cgi?id=153937

Added swish-config and pkg-config

Swish now provides a swish-config script and config file for the pkg-config utility. These tools help when building programs that link with the swish-e library.

The SWISH::API Makefile.PL program uses swish-config to locate the installation directory of swish-e. This should make building SWISH::API easier when swish-e is installed in a non-standard location.

Fixed rank bias in merge

Peter van Dijk noticed that MetaNamesRank settings were not being copied to the output index when merging.

Added SwishFuzzy function

The SwishFuzzy function (SWISH::API::Fuzzy) lets you stem a word without first searching. This might be helpful for experimenting with queries prior to the search.

Fixed translate character table

Michael Levy found an error in the table used to translate 8859-1 to ascii7. Luckily, it was an upper case translation and the table is only used on lower case characters.

MetaNamesRank documentation

Changed the 'not yet implemented' caveat to 'implemented but experimental'.

Added Continuation option to config processing

You can now use continuation lines in the config file:

 
    IgnoreWords \
        the \
        am \
        is \
        are \
        was

There may not be any characters following the backslash.
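
The rule above can be illustrated with a small sketch. This is not Swish-e's config parser (which is written in C); it is a Python illustration of how continuation lines might be joined, and the function name join_continuation_lines is hypothetical. A line counts as continued only when the backslash is its very last character.

```python
def join_continuation_lines(lines):
    """Merge each line ending in a backslash with the line that follows it."""
    joined = []
    pending = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.endswith("\\"):
            # The backslash is the last character: drop it and keep reading.
            pending += line[:-1]
        else:
            # Anything after a backslash (even a space) ends the directive here.
            joined.append(pending + line)
            pending = ""
    if pending:
        joined.append(pending)
    return joined


config = [
    "IgnoreWords \\\n",
    "    the \\\n",
    "    am\n",
    "IndexFile out.index\n",
]
directives = [line.split() for line in join_continuation_lines(config)]
```

Here the three IgnoreWords lines collapse into one directive with two words, and the IndexFile line is left alone.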

Fixed Buzzwords (and other word lists entered in the config)

Words entered in the config were not converted to lower case before being stored in the index.

Fixed metaname mapping problem in Merge

Peter Karman found an error when merging indexes where the source indexes had the same metanames, but listed in a different order in their config files. Words would then be indexed under the wrong metaID number in the output index.
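
A rough sketch of the idea behind this kind of fix: metaIDs from each source index are mapped to the output index by name rather than by position. This is a Python illustration only, not the actual merge code; build_id_map and the assumption that metaIDs are assigned 1, 2, 3... in config order are hypothetical.

```python
def build_id_map(source_names, output_ids):
    """Map each source metaID to the output metaID that has the same name.

    source_names: metanames in the order they appear in one source config
                  (assumed here to receive IDs 1, 2, 3, ...).
    output_ids:   shared dict of name -> metaID for the output index.
    """
    id_map = {}
    for source_id, name in enumerate(source_names, start=1):
        if name not in output_ids:
            # First time this name is seen: give it the next output ID.
            output_ids[name] = len(output_ids) + 1
        id_map[source_id] = output_ids[name]
    return id_map


output_ids = {}
map_a = build_id_map(["title", "author"], output_ids)
map_b = build_id_map(["author", "title"], output_ids)
```

With this mapping, metaID 1 in the second index ("author") is translated to metaID 2 in the output instead of being copied as 1.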

SWISH::Filter and spider.pl updates

The web spider spider.pl was updated to work better with SWISH::Filter by default and also make it easier to use the spider default along with a spider config file. See spider.pl for details.

SWISH::Filter was updated. The way filters are created has changed. If you created your own filters you will need to update them. Take a look at SWISH::Filter and the filters included in the distribution.

Updates to Documentation

Richard Morin submitted formatting and punctuation updates to the README and INSTALL docs.

Added -R option to support IDF word weighting in ranking. (karman)

Added Inverse Document Frequency calculation to the getrank() routine. This will allow the relative frequency of a word in relationship to other words in the query to impact the ranking of documents.

Example: if 'foo' is present twice as often as 'bar' in the collection as a whole, a search for 'foo bar' will weight documents with 'bar' more heavily (i.e., higher rank) than those with 'foo'.

The impact is greatest when OR'ing words in a query rather than AND'ing them (which is the default).
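
To make the 'foo bar' example concrete, here is a small sketch using the textbook IDF formula, log(N / df). The exact calculation inside getrank() is not described here, so the formula, the function name idf, and the document counts are all assumptions for illustration.

```python
import math


def idf(total_docs, docs_with_word):
    """Textbook inverse document frequency: rarer words get larger weights."""
    return math.log(total_docs / docs_with_word)


# Hypothetical collection: 'foo' appears in twice as many documents as 'bar'.
total_docs = 1000
doc_freq = {"foo": 200, "bar": 100}
weights = {word: idf(total_docs, df) for word, df in doc_freq.items()}
```

The rarer word 'bar' ends up with the larger weight, so documents containing it would rank higher under this scheme.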

Also added Rank discussion to the FAQ.

Updates to the example scripts

Updated PhraseHighlight.pm as suggested by Bill Schell for an optimization when all words in a document are highlighted.

Updated search.cgi and PhraseHighlight.pm to use the internal stemmers via the SWISH::API module as suggested by Jonas Wolf.

Leak when using C library

David Windmueller found a memory leak when calling multiple searches on a swish handle. The problem was swish loading the pre-sorted property index on every search, even after the table had been loaded into memory.

Swish.cgi now kills swish-e on time out

The example script swish.cgi uses an alarm (on platforms that support alarm) to abort processing after some number of seconds, but it was not killing the child process, swish-e. Bill Schell submitted a patch to kill the child when the alarm triggers.
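
The general pattern (time out, then kill the child) can be sketched as follows. swish.cgi is a Perl script and uses alarm; this Python sketch swaps in the subprocess timeout mechanism instead, and the child command is a stand-in that sleeps, not swish-e. The name run_with_timeout is hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd; kill it if it runs longer than `seconds`.

    Returns (timed_out, output_bytes).
    """
    child = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        output, _ = child.communicate(timeout=seconds)
        return False, output
    except subprocess.TimeoutExpired:
        child.kill()         # without this the child would keep running
        child.communicate()  # reap the killed process
        return True, b""


# Stand-in for a search that hangs.
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
timed_out, _ = run_with_timeout(slow, 0.5)
```

The important step is the explicit kill in the timeout branch; giving up on waiting is not enough to stop the child.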

The template search.tt was renamed to swish.tt

The template was renamed because it's used by swish.cgi, not by search.cgi, which was confusing.

Updates to the search.cgi

The example script search.cgi was updated to work better with mod_perl and to use external template files and style sheets.

New MS Word Filter

James Job provided the SWISH::Filter::Doc2html filter that uses the wvWare (http://wvware.sourceforge.net/) program for filtering MS Word documents. If both catdoc and wvWare are installed then wvWare will be used.

wvWare is reported to do a good job at converting MS Word docs to HTML. In a few tests it did work well, but in other cases it failed to generate correct output. It was also much, much slower than catdoc. I tested with wvWare 0.7.3 on Debian Linux. Testing with both is recommended.

Change in way symbolic links are followed

John-Marc Chandonia pointed out that if a symlink is skipped by FileRules, then the actual file/directory is marked as "already seen" and cannot be indexed by other links or directly.

Now, files and directories are not marked "already seen" until after passing FileRules (i.e., after a file is actually indexed or a directory is processed).
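
The ordering change can be sketched like this. It is a Python illustration of the logic only, not the Swish-e file system code; index_paths, the path names, and the rule function are hypothetical, and real paths are supplied in a dict rather than resolved from disk.

```python
def index_paths(paths, real_path_of, passes_file_rules):
    """Return the paths that get indexed, visiting each real file at most once."""
    seen = set()
    indexed = []
    for path in paths:
        real = real_path_of[path]
        if real in seen:
            continue  # already indexed through another name
        if not passes_file_rules(path):
            continue  # skipped by the rules, but NOT marked as seen
        seen.add(real)  # mark only after the rules have passed
        indexed.append(path)
    return indexed


# A skipped symlink and a direct path that both point at the same file.
real_path_of = {
    "skip/link": "/data/doc.html",
    "docs/doc.html": "/data/doc.html",
}
result = index_paths(
    ["skip/link", "docs/doc.html"],
    real_path_of,
    lambda p: not p.startswith("skip/"),
)
```

Because the skipped link no longer marks the target as seen, the file is still indexed when it is reached by its direct path.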

Could not set SwishSetSort() more than once

David Windmueller found a problem when trying to set the sort order more than once on an existing search object. Memory was not correctly reset after clearing the previous sort values.

Access MetaNames and PropertyNames from API

Jamie Herre provided a patch to access the MetaNames and PropertyNames via the C API and to test them via the testlib program. SWISH::API was also updated to access this data.

SwishResultPropertyULong() bug fixed

David Windmueller reported that SwishResultPropertyULong() was returning ULONG_MAX on all calls. This was fixed.

Null written to wrong location in file.c

Bill Schell with the help of valgrind found a null written past the end of a buffer in file.c in the code that supports the old parsers. This resulted in a segfault while indexing a large set of XML documents.

Fixed problem when indexing very large files

Steve Harris reported a problem when indexing a very large document that caused an integer overflow. José Ruiz updated the code to use unsigned integers.

Bump word position on block tags with HTML2 parser

Peter Karman pointed out that the libxml2 HTML parser was allowing phrase matches across block-level HTML elements. Swish now bumps the word position on these elements.
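
Why a position bump blocks cross-element phrase matches can be shown with a small sketch. This is a Python illustration, not the parser code; the function names and the bump size of one position are assumptions.

```python
def assign_positions(blocks):
    """blocks: one list of words per block-level element.

    Returns (word, position) pairs, skipping one position between blocks.
    """
    positions = []
    pos = 0
    for words in blocks:
        for word in words:
            pos += 1
            positions.append((word, pos))
        pos += 1  # bump: leave a gap at the end of each block element
    return positions


def phrase_match(positions, first, second):
    """True if `second` occurs at the position right after `first`."""
    lookup = {}
    for word, pos in positions:
        lookup.setdefault(word, set()).add(pos)
    return any(p + 1 in lookup.get(second, ()) for p in lookup.get(first, ()))


# Two adjacent block elements, e.g. two paragraphs.
positions = assign_positions([["red", "apple"], ["pie", "recipe"]])
```

With the gap in place, "apple pie" no longer matches as a phrase because the two words sit in different blocks, while "red apple" and "pie recipe" still do.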


Version 2.4.2 - March 09, 2004


Version 2.4.1 - December 17, 2003


Version 2.4.0 - October 27, 2003


Version 2.4.0 (Release Candidate 4) September 26, 2003


Version 2.4.0 (Release Candidate 3) September 11, 2003


Version 2.4.0 (Release Candidate 2) September 10, 2003


Version 2.4.0 (Release Candidate 1) May 21, 2003


Version 2.2.3 - December 11, 2002

Multiple -L options were ORing instead of ANDing. Caught by Patrick Mouret. [moseley]
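
The difference is easy to see in a sketch. This assumes each -L option limits one property to a low/high range; the property names, the sample results, and the function within_limits are hypothetical, and this is not the Swish-e implementation.

```python
def within_limits(result, limits):
    """A result passes only if it satisfies every limit (AND, not OR)."""
    return all(low <= result[prop] <= high for prop, low, high in limits)


results = [
    {"path": "a.html", "size": 500, "year": 2001},
    {"path": "b.html", "size": 5000, "year": 2002},
    {"path": "c.html", "size": 800, "year": 1999},
]
# Two limits, as if two -L options had been given.
limits = [("size", 0, 1000), ("year", 2000, 2003)]
matches = [r["path"] for r in results if within_limits(r, limits)]
```

Only a.html satisfies both limits. Replacing all() with any() reproduces the old ORing behavior, where all three results would pass.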


Version 2.2.2 - November 14, 2002

Pass non-text/* files on to the indexing code IF there is a FileFilter associated with the *extension* of the URL. This fixes the problem of not being able to index, say, PDF files by using the FileFilter configuration option.
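
The decision described above might look roughly like this. It is a Python sketch of the rule only; should_index, the example URLs, and the set of filter extensions are hypothetical.

```python
import os
from urllib.parse import urlparse


def should_index(url, content_type, filter_extensions):
    """Decide whether a fetched document is passed on to the indexing code."""
    if content_type.startswith("text/"):
        return True
    # Non-text content is passed on only when a filter is registered
    # for the extension found in the URL path.
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    return extension in filter_extensions


filter_extensions = {".pdf"}  # as if a FileFilter were configured for .pdf
pdf_ok = should_index("http://example.com/a.pdf", "application/pdf", filter_extensions)
zip_ok = should_index("http://example.com/a.zip", "application/zip", filter_extensions)
```

Note that the check uses the extension in the URL, not the content type, so a PDF served from a URL without a .pdf extension would still be skipped under this rule.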

Fixed bug where nulls were stripped when using FileFilter with -S prog. Caught by Greg Fenton. [moseley]


Version 2.2.1 - September 26, 2002


Version 2.2 - September 18, 2002


Version 2.2rc1 - August 29, 2002

Many large changes were made internally in the code, some for performance reasons, some for feature changes and additions, and some to prepare for new features in later versions of Swish-e.

Changes to Configuration File Directives. Please see SWISH-CONFIG for more info.

Changes to command line arguments. See SWISH-RUN for documentation on these switches.
