Explained: Indexing and searching content
Cuyahoga uses DotLucene to index the content that has to be searchable. DotLucene is a .NET port of the famous Jakarta Lucene engine. It is not a complete search engine but rather a powerful API that can be integrated into almost any application that needs full-text search capabilities.
All content in DotLucene is a Document. To index content, you feed Document instances to DotLucene, and after a query it returns Document instances (indirectly, via Hits).
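To make the Document/Hits model concrete, here is a minimal in-memory analogue in Python. It is not DotLucene's API: `Document`, `Hit` and `TinyIndex` (with its `add`, `search` and `doc` methods) are hypothetical stand-ins, and tokenization is a naive lowercase whitespace split, much simpler than a Lucene analyzer. It only shows the shape of the contract: documents made of named fields go in, and a query returns hits that lead back to the stored documents.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a DotLucene Document: a set of named fields."""
    fields: dict = field(default_factory=dict)


@dataclass
class Hit:
    """Hypothetical search result that points back to a stored document."""
    doc_id: int
    score: int


class TinyIndex:
    """Hypothetical inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self._docs = {}
        self._postings = defaultdict(dict)

    def add(self, doc):
        doc_id = len(self._docs)
        self._docs[doc_id] = doc
        # Naive tokenizer: lowercase and split on whitespace.
        for value in doc.fields.values():
            for term in str(value).lower().split():
                self._postings[term][doc_id] = self._postings[term].get(doc_id, 0) + 1
        return doc_id

    def search(self, term):
        # Rank by term frequency; ties are broken by insertion order.
        postings = self._postings.get(term.lower(), {})
        hits = (Hit(doc_id, count) for doc_id, count in postings.items())
        return sorted(hits, key=lambda h: (-h.score, h.doc_id))

    def doc(self, hit):
        # As with Lucene's Hits, the caller goes from a hit to its Document.
        return self._docs[hit.doc_id]


index = TinyIndex()
index.add(Document({"title": "Cuyahoga search", "body": "search with DotLucene"}))
index.add(Document({"title": "Release notes", "body": "no match here"}))
hits = index.search("search")
print([index.doc(h).fields["title"] for h in hits])  # ['Cuyahoga search']
```

The indirection through hits mirrors the text: a query does not hand back documents directly, but results from which the documents can be fetched.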
Modules in Cuyahoga whose content should be indexed must implement the ISearchable interface. This interface consists of one method, GetAllSearchableContent(), which retrieves all documents for that module, and three events, ContentCreated, ContentUpdated and ContentDeleted, which the module must fire after changing its content. The event data contains the document that has to be indexed or removed from the index.
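The contract can be sketched in Python as follows. The method and event names mirror the ones in the text (`GetAllSearchableContent`, `ContentCreated`, `ContentUpdated`, `ContentDeleted`) in snake_case; the `Event` helper, the `IndexEventArgs` payload and the example `ArticleModule` are hypothetical. In C# these would be a method and three `event` members on an interface; here an event is modelled as a list of handlers.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class Event:
    """Minimal stand-in for a C# event: a list of subscribed handlers."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def fire(self, sender, args):
        for handler in list(self._handlers):
            handler(sender, args)


@dataclass
class IndexEventArgs:
    """Hypothetical event payload carrying the affected document."""
    document: dict


class ISearchable(ABC):
    """Sketch of the ISearchable contract: one method plus three events."""

    def __init__(self):
        self.content_created = Event()
        self.content_updated = Event()
        self.content_deleted = Event()

    @abstractmethod
    def get_all_searchable_content(self):
        """Return every searchable document owned by this module."""


class ArticleModule(ISearchable):
    """Hypothetical module that fires the events after changing its content."""

    def __init__(self):
        super().__init__()
        self._articles = {}

    def get_all_searchable_content(self):
        return [{"id": k, "body": v} for k, v in self._articles.items()]

    def save(self, article_id, body):
        is_new = article_id not in self._articles
        self._articles[article_id] = body
        event = self.content_created if is_new else self.content_updated
        event.fire(self, IndexEventArgs({"id": article_id, "body": body}))

    def delete(self, article_id):
        body = self._articles.pop(article_id)
        self.content_deleted.fire(self, IndexEventArgs({"id": article_id, "body": body}))
```

Anything that subscribes to the three events receives the affected document through `IndexEventArgs.document`, which is all an indexer needs to keep the index in sync.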
Implementing this was straightforward because all module administration pages inherit from the same base class, which handles the events and calls the appropriate methods of the index builder.
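A sketch of that base-class arrangement in Python, assuming a hypothetical `IndexBuilder` with `add`, `update` and `delete` methods (Cuyahoga's real method names may differ) and a hypothetical `ModuleAdminBasePage`. The point is that the wiring lives in one place: the base class subscribes to the module's three events, so a concrete admin page only edits content and the module only fires events.

```python
class Event:
    """Minimal stand-in for a C# event: a list of subscribed handlers."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def fire(self, sender, document):
        for handler in list(self._handlers):
            handler(sender, document)


class SearchableModule:
    """Hypothetical module exposing the three ISearchable events."""

    def __init__(self):
        self.content_created = Event()
        self.content_updated = Event()
        self.content_deleted = Event()
        self._items = {}

    def save(self, item_id, text):
        # The module fires the event itself after changing its content.
        event = self.content_updated if item_id in self._items else self.content_created
        self._items[item_id] = text
        event.fire(self, {"id": item_id, "text": text})

    def delete(self, item_id):
        text = self._items.pop(item_id)
        self.content_deleted.fire(self, {"id": item_id, "text": text})


class IndexBuilder:
    """Hypothetical index builder; a dict keyed by document id stands in for the index."""

    def __init__(self):
        self.index = {}

    def add(self, document):
        self.index[document["id"]] = document

    def update(self, document):
        # Lucene-style update: remove the old entry, then add the new one.
        self.delete(document)
        self.add(document)

    def delete(self, document):
        self.index.pop(document["id"], None)


class ModuleAdminBasePage:
    """Hypothetical shared base class: wires module events to the index builder once."""

    def __init__(self, module, index_builder):
        self.module = module
        if isinstance(module, SearchableModule):
            module.content_created.subscribe(lambda sender, doc: index_builder.add(doc))
            module.content_updated.subscribe(lambda sender, doc: index_builder.update(doc))
            module.content_deleted.subscribe(lambda sender, doc: index_builder.delete(doc))


class EditArticlePage(ModuleAdminBasePage):
    """A concrete admin page: it edits content and never touches the index directly."""

    def publish(self, item_id, text):
        self.module.save(item_id, text)

    def remove(self, item_id):
        self.module.delete(item_id)
```

Because the subscription code sits in the base class, adding a new searchable module does not require any index-related code in its administration pages.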
The advantage of event-based indexing is that all content is indexed immediately. There is no need for a scheduler that crawls the site to update the search index. If anything goes wrong, you can always rebuild the search index manually.
When rebuilding the entire search index from scratch, Cuyahoga traverses all sites, nodes and sections. Whenever the module connected to a section implements ISearchable, Cuyahoga calls that module's GetAllSearchableContent() to retrieve the documents that have to be indexed.
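The rebuild walk can be sketched in Python like this. The `Site`, `Node` and `Section` classes, the `rebuild_index` function and both example modules are hypothetical simplifications of Cuyahoga's object model, and the index is just a list; `ISearchable` is reduced to the single method the rebuild needs. It also assumes that nodes form a tree, so the traversal recurses into child nodes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class ISearchable(ABC):
    """Reduced sketch of ISearchable: only the method used by the rebuild."""

    @abstractmethod
    def get_all_searchable_content(self):
        ...


@dataclass
class Section:
    module: object  # the module connected to this section


@dataclass
class Node:
    sections: list = field(default_factory=list)
    children: list = field(default_factory=list)  # assumption: nodes form a tree


@dataclass
class Site:
    root_nodes: list = field(default_factory=list)


def iter_nodes(nodes):
    """Depth-first walk over a list of nodes and all their descendants."""
    for node in nodes:
        yield node
        yield from iter_nodes(node.children)


def rebuild_index(sites):
    """Build a fresh index by visiting every section on every site."""
    index = []  # stands in for a newly created DotLucene index
    for site in sites:
        for node in iter_nodes(site.root_nodes):
            for section in node.sections:
                # Only modules implementing ISearchable contribute documents.
                if isinstance(section.module, ISearchable):
                    index.extend(section.module.get_all_searchable_content())
    return index


class NewsModule(ISearchable):
    """Hypothetical searchable module."""

    def __init__(self, titles):
        self.titles = titles

    def get_all_searchable_content(self):
        return [{"title": t} for t in self.titles]


class NonSearchableModule:
    """Hypothetical module without ISearchable: skipped during the rebuild."""
```

Since the rebuild only depends on GetAllSearchableContent(), it can recover from a corrupted or out-of-date index regardless of which events were missed.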
Published by Martijn Boland on 2/19/2005 5:31:00 PM
Category: Developers