The DNN search engine is made up of three parts: the DataStoreProvider, the Search Indexer, and the core search engine, which is a thin layer of glue that coordinates the two. In a previous MSDN webcast, I showed how to create a new DataStoreProvider using Lucene.net. I have not tackled the indexer yet because several issues make it a much more difficult task.
1. Lucene does not have any concept of DotNetNuke roles or permissions. A standard spider approach does not work for sites where much of the content is hidden depending on the user's role. What we tried to do in the standard indexer was create a mechanism for tagging content with the permissions required to view it, so that search results are displayed only to people with the correct permission for the associated content. Also, it is nearly impossible for an app to index content that relies on "context" identifiers like querystrings, httpcontext, viewstate, etc. Our solution in the first iteration was not to guess at how to solve this, but instead to let module developers provide the content in a format that could be stored and searched. There are some problems with this approach, but it can provide much better results than trying to spider partial content or showing search results that the user cannot access.
2. Right now there are some issues in the glue code and in the data formats used for indexing and storing the search content. Before we can fix the indexing issues, we first have to identify a better core API that allows modules to index content on the fly rather than relying on a batch process. We also have to improve the method we use for gathering the search content so that we have a much richer set of metadata to work with.
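To make the two points above concrete, here is a minimal sketch in Python of permission-tagged content combined with on-the-fly indexing. It is an illustration only, not the DNN API and not Lucene.net: `SearchItem`, `InMemoryIndex`, `add_or_update`, `view_roles`, and the role names are all hypothetical names invented for this example. The assumption is that a module hands the indexer its content together with the roles allowed to view it, and that the search filters results against the current user's roles.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SearchItem:
    """Content a module hands to the indexer (hypothetical shape)."""
    key: str
    title: str
    body: str
    view_roles: frozenset  # roles allowed to view this content


@dataclass
class InMemoryIndex:
    """Toy stand-in for a data store provider; not Lucene.net."""
    items: dict = field(default_factory=dict)

    def add_or_update(self, item: SearchItem) -> None:
        # A module calls this when its content changes, so the index
        # stays current without waiting for a batch run.
        self.items[item.key] = item

    def remove(self, key: str) -> None:
        # Removing an unknown key is a no-op.
        self.items.pop(key, None)

    def search(self, term: str, user_roles: set) -> list:
        # Match on text, then keep only items the user is allowed to view.
        term = term.lower()
        hits = []
        for item in self.items.values():
            text = f"{item.title} {item.body}".lower()
            if term in text and item.view_roles & user_roles:
                hits.append(item.key)
        return sorted(hits)


index = InMemoryIndex()
index.add_or_update(SearchItem(
    "news-1", "Release notes", "Public release details",
    frozenset({"All Users"})))
index.add_or_update(SearchItem(
    "hr-7", "Salary release", "Internal only",
    frozenset({"Administrators"})))

# A regular user only gets results they can actually open.
public_hits = index.search("release", {"All Users"})
admin_hits = index.search("release", {"All Users", "Administrators"})
```

The point of the design is that the permission check happens inside the search, so a user never sees a result they cannot open. A real implementation would also need to handle role changes after indexing, which this sketch ignores.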
In my opinion, relying on an engine like Lucene.net will only replace the existing issues with new problems. Instead, I think we need to work on correcting the existing infrastructure so that we can better integrate any indexing or storage solution. We also need to make it easier for module developers to index their content correctly. One of our biggest problems is that the current API is too difficult for module developers to implement correctly, and that the core engine is too flaky and non-performant to make it worth the module developers' time investment.