| Class | Description |
|---|---|
| AccentFoldingFilter | Improves query results by removing diacritics, converting accented characters to their unaccented base characters. |
| CrimsonBugWorkaround | Works around a very nasty bug in the Apache Crimson XML parser. |
| CrimsonBugWorkaround.BlockEnum | Presents the input stream as a series of blocks of data. |
| DocSelCache | This class represents the contents of the Document Selector Cache maintained by the indexer. |
| DocSelCache.Entry | One entry in the Document Selector Cache. |
| FacetTokenizer | Performs special tokenization for facet fields. |
| HTMLIndexSource | Transforms an HTML file to a single-record XML file. |
| HTMLToString | This class provides a single static convert() method that converts an HTML file into an XML string that can be pre-filtered and added to a Lucene database by the XMLTextProcessor class. |
| IdxTreeCleaner | This class purges "incomplete" documents from a Lucene index. |
| IdxTreeCuller | This class provides a simple mechanism for removing documents from an index when the source text no longer exists in the document library. |
| IdxTreeDictMaker | This class provides a simple mechanism for generating a spelling correction dictionary after new documents have been added or updated. |
| IdxTreeOptimizer | This class provides a simple mechanism for optimizing Lucene indices after new documents have been added, updated, or removed. |
| IndexDump | This class dumps the contents of user-selected fields from an XTF text index. |
| IndexerConfig | This class records configuration information about the current state of the TextIndexer application. |
| IndexInfo | This class maintains configuration information about the current index that the TextIndexer program is processing. |
| IndexMerge | This class merges the contents of two or more XTF indexes, with certain caveats. |
| IndexMerge.DirInfo | |
| IndexRecord | A single record within an IndexSource. |
| IndexSource | Represents a single source of data for an XTF index. |
| IndexStats | This class calculates and prints out some useful statistics about an existing index, such as number of documents, size, etc. |
| IndexSync | Copies the differences between a source index and a destination index so that the two become exactly equal. |
| MARCIndexSource | Supplies MARC data to an XTF index, breaking it up into individual MARCXML records. |
| MSWordIndexSource | Transforms a Microsoft Word file to a single-record XML file. |
| PDFIndexSource | Transforms a PDF file to a single-record XML file. |
| PDFToString | This class provides a single static convert() method that converts the text in a PDF file into an XML string that can be pre-filtered and added to a Lucene database by the XMLTextProcessor class. |
| PluralFoldingFilter | Improves query results by converting plural words to their singular forms. |
| SectionInfo | This class maintains information about the current section in a text document that the TextIndexer program is processing. |
| SectionInfoStack | This class maintains information about the current nesting of sections in a text document that the TextIndexer program is processing. |
| SpellWritingFilter | Adds words from the token stream to a SpellWriter. |
| SrcTreeProcessor | This class is the main processing shell for files in the source text tree. |
| StartEndFilter | Ensures that the tokens at the start and end of the stream are indexed both with and without the special start-of-field/end-of-field markers. |
| StructuredFileProxy | Used to put off actually creating a structured store until it is needed. |
| TagFilter | Spots XML elements in a token stream and marks them specially. |
| TextIndexer | This class is the main class for the TextIndexer program. |
| TextIndexSource | Transforms an HTML file to a single-record XML file. |
| UnicodeNormalizingFilter | Applies Unicode normalization to the tokens. |
| XMLConfigParser | This class parses TextIndexer configuration XML files. |
| XMLIndexSource | Supplies a single file containing a single record to the XMLTextProcessor. |
| XMLTextProcessor | This class performs the actual parsing of the XML source text files and generates index information for them. |
| XtfSpecialTokensFilter | The XtfSpecialTokensFilter class is used by the XTFTextAnalyzer class to convert special "bump" count values in text chunks to actual position increments for words prior to adding them to a Lucene index. |
| XTFTextAnalyzer | The XTFTextAnalyzer class performs the task of breaking up a contiguous chunk of text into a list of separate words (tokens, in Lucene parlance). |

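
The "bump" conversion performed by XtfSpecialTokensFilter can be sketched in plain Java. This is a minimal illustration under stated assumptions, not XTF's code: the marker syntax `~bump:N` and the names `BumpDemo`, `Token` and `applyBumps` are hypothetical, and the sketch works on a plain list of strings rather than a Lucene token stream. Each word normally advances one position; a bump adds its count to the next word's position increment, leaving a gap that phrase queries (and proximity queries with a smaller slop) will not match across.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: turns "bump" marker tokens into Lucene-style
 * position increments. The "~bump:N" syntax is an assumption for
 * illustration only; it is not XTF's actual encoding.
 */
public class BumpDemo {

    /** A word plus the position increment it should be indexed with. */
    public static final class Token {
        public final String text;
        public final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Hypothetical marker prefix (not taken from XTF).
    static final String BUMP_PREFIX = "~bump:";

    public static List<Token> applyBumps(List<String> rawTokens) {
        List<Token> out = new ArrayList<>();
        int pendingBump = 0; // extra positions to skip before the next word
        for (String raw : rawTokens) {
            if (raw.startsWith(BUMP_PREFIX)) {
                // Accumulate the bump; the marker itself is never indexed.
                pendingBump += Integer.parseInt(raw.substring(BUMP_PREFIX.length()));
            } else {
                // A normal word advances one position, plus any pending bump.
                out.add(new Token(raw, 1 + pendingBump));
                pendingBump = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = applyBumps(List.of("end", "of", "chapter", "~bump:100", "next"));
        for (Token t : tokens) {
            // prints "end +1", "of +1", "chapter +1", "next +101"
            System.out.println(t.text + " +" + t.positionIncrement);
        }
    }
}
```
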
| Exception | Description |
|---|---|
| TextIndexerException | This exception is thrown by classes related to the textIndexer tool. |

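
The diacritic removal described for AccentFoldingFilter can be approximated with the JDK alone. The sketch below assumes Unicode canonical decomposition (NFD) followed by deletion of combining marks; the class name `AccentFoldDemo` is hypothetical, and XTF's own filter may use a different mapping. With this approach, letters that have no canonical decomposition, such as "ø" or "ß", pass through unchanged.

```java
import java.text.Normalizer;

/**
 * Hypothetical sketch of diacritic folding in the spirit of
 * AccentFoldingFilter, using java.text.Normalizer rather than
 * whatever mapping XTF itself uses.
 */
public class AccentFoldDemo {

    /** Decomposes accented letters, then drops the detached combining marks. */
    public static String fold(String word) {
        // NFD splits a precomposed letter into base letter + combining mark(s).
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        // \p{M} matches any Unicode mark character.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("r\u00e9sum\u00e9"));   // prints "resume"
        System.out.println(fold("\u00c5ngstr\u00f6m")); // prints "Angstrom"
    }
}
```
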
Contains all the classes that make up the textIndexer tool.
The TextIndexer class is the main command-line interface, while XMLTextProcessor does most of the heavy lifting (scanning documents, breaking them into chunks, and passing the chunks to Lucene).
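
The chunking step can be pictured as slicing a document's word sequence into fixed-size, overlapping windows. With overlap `o`, any phrase of up to `o + 1` words that straddles a chunk boundary still appears whole in at least one chunk. This is a hypothetical sketch: `ChunkDemo`, `chunk`, and the `chunkSize`/`overlap` parameters are illustrative names, not XMLTextProcessor's actual interface or configuration, and the use of overlap here is an assumption rather than a description of XTF's exact scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of splitting a word sequence into overlapping chunks. */
public class ChunkDemo {

    /**
     * Splits words into chunks of at most chunkSize words; each chunk
     * starts (chunkSize - overlap) words after the previous one.
     */
    public static List<List<String>> chunk(List<String> words, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need chunkSize > overlap >= 0");
        }
        int step = chunkSize - overlap;
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < words.size(); start += step) {
            int end = Math.min(start + chunkSize, words.size());
            chunks.add(new ArrayList<>(words.subList(start, end)));
            if (end == words.size()) {
                break; // this chunk already reaches the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a b c d e f g".split(" "));
        // prints [[a, b, c, d], [c, d, e, f], [e, f, g]]
        System.out.println(chunk(words, 4, 2));
    }
}
```
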