I am on DNN 5.01 and find that over half of my database space is being used by the tables that support the search function. There are about 2255 items in my DNN database being searched as evidenced by the count of records in the SearchItem table, and there are abou 12,200 records in the SearchWord table. When I examined the SearchWord table, I found about half the records have what I consider to be "bad" words, or at least words that don't need to be supported by a search. For example, there is a record where the word consists of 4 spaces, and another record for 5 spaces, another for 6 spaces, and so on up to the maximum length of the search word in the search settings (50 in my case). I suppose the 4 spaces derives from the minimum word length specified in search settings (4), which is the default. There are also entries where the word begins or ends with a single or double quote, a hyphen, or a slash, or one or more spaces before the word. My thought was that these SearchWord records that have these "special" characters are not needed and in fact might confuse the code that later creates the records in the SearchItemPosition table, since that table is the one that takes up half of my database storage space.
Rather than mess with the code that is burried inside the core search functionality, I decided to put a bandage on my database by making changes to the code that actually puts the records into the database. There are two stored procedures that add and update records in the SearchWord table and they are very simple as I found them. My site only uses the english language, so I only coded a fix that seems to work for me using the english language and character set. The fix adds some checks to the stored procedures to ensure that characters that I don't want to see in search results don't make it into the SearchWord table. For example, simply aborting the insert or update if a disallowed character is found in the word being added, and using a replace() function on the @Word parameter in the stored procedure to replace these special characters with nothing before the words are added to (or updated in) the SearchWord table. This has reduced the size of the SearchWord table by about 50%
I admit that this is not the best way to fix this issue, but it works for me and is easy for me to track exactly what is going into my database. So far, my database has gone down in size by almost 30% just due to the diet-sized search word tables.
If anyone has a better way to fix this, I am very interested in learning what it is.
So,... here is the fix that seems to be working for me: alter the two stored procedures to check for and remove the special characters that I don't want included in the SearchWord table. Like so:
/****** Object: StoredProcedure [dbo].[AddSearchWord] Script Date: 05/09/2009 08:47:57 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[AddSearchWord]
@Word nvarchar(100)
AS
DECLARE @id int
DECLARE @CommonWordID int
-- check to see if the word contains any character that disqualifies it
IF CHARINDEX('.',@Word) > 0
RETURN
IF CHARINDEX(':',@Word) > 0
RETURN
IF CHARINDEX('@',@Word) > 0
RETURN
IF CHARINDEX('_',@Word) > 0
RETURN
IF CHARINDEX('0',@Word) > 0
RETURN
IF CHARINDEX('1',@Word) > 0
RETURN
IF CHARINDEX('2',@Word) > 0
RETURN
IF CHARINDEX('3',@Word) > 0
RETURN
IF CHARINDEX('4',@Word) > 0
RETURN
IF CHARINDEX('5',@Word) > 0
RETURN
IF CHARINDEX('6',@Word) > 0
RETURN
IF CHARINDEX('7',@Word) > 0
RETURN
IF CHARINDEX('8',@Word) > 0
RETURN
IF CHARINDEX('9',@Word) > 0
RETURN
-- remove special characters like leading or trailing spaces, “, ”, ', ", ‘, ’, -. /, \
SET @Word = Ltrim(Rtrim(@Word))
SET @Word = REPLACE(@Word,'“','')
SET @Word = REPLACE(@Word,'”','')
SET @Word = REPLACE(@Word,'''','')
SET @Word = REPLACE(@Word,'"','')
SET @Word = REPLACE(@Word,'‘','')
SET @Word = REPLACE(@Word,'’','')
SET @Word = REPLACE(@Word,' ','')
SET @Word = REPLACE(@Word,'-','')
SET @Word = REPLACE(@Word,'/','')
SET @Word = REPLACE(@Word,'\','')
-- check to make sure resulting word is still long enough to be included
IF LEN(@Word) < 4
RETURN
-- check to make sure resulting word is not a common word
SELECT @CommonWordID
FROM SearchCommonWords
WHERE CommonWord = @Word
IF @CommonWordID IS NOT NULL
RETURN
SELECT @id = SearchWordsID
FROM SearchWord
WHERE Word = @Word
IF @id IS NULL
BEGIN
INSERT INTO SearchWord (
[Word],
[IsCommon],
[HitCount]
)
VALUES (
@Word,
0,
1
)
SELECT SCOPE_IDENTITY()
END
ELSE
SELECT @id
GO
/****** Object: StoredProcedure [dbo].[UpdateSearchWord] Script Date: 05/09/2009 09:06:00 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[UpdateSearchWord]
@SearchWordsID int,
@Word nvarchar(100),
@IsCommon bit,
@HitCount int
AS
DECLARE @CommonWordID int
-- check to see if the word contains any character that disqualifies it
IF CHARINDEX('.',@Word) > 0
RETURN
IF CHARINDEX(':',@Word) > 0
RETURN
IF CHARINDEX('@',@Word) > 0
RETURN
IF CHARINDEX('_',@Word) > 0
RETURN
IF CHARINDEX('0',@Word) > 0
RETURN
IF CHARINDEX('1',@Word) > 0
RETURN
IF CHARINDEX('2',@Word) > 0
RETURN
IF CHARINDEX('3',@Word) > 0
RETURN
IF CHARINDEX('4',@Word) > 0
RETURN
IF CHARINDEX('5',@Word) > 0
RETURN
IF CHARINDEX('6',@Word) > 0
RETURN
IF CHARINDEX('7',@Word) > 0
RETURN
IF CHARINDEX('8',@Word) > 0
RETURN
IF CHARINDEX('9',@Word) > 0
RETURN
-- remove special characters like spaces, “, ”, ', ", ‘, ’, -. /, \
SET @Word = Ltrim(Rtrim(@Word))
SET @Word = REPLACE(@Word,'“','')
SET @Word = REPLACE(@Word,'”','')
SET @Word = REPLACE(@Word,'''','')
SET @Word = REPLACE(@Word,'"','')
SET @Word = REPLACE(@Word,'‘','')
SET @Word = REPLACE(@Word,'’','')
SET @Word = REPLACE(@Word,' ','')
SET @Word = REPLACE(@Word,'-','')
SET @Word = REPLACE(@Word,'/','')
SET @Word = REPLACE(@Word,'\','')
-- check to make sure resulting word is still long enough to be included
IF LEN(@Word) < 4
RETURN
-- check to make sure resulting word is not a common word
SELECT @CommonWordID
FROM SearchCommonWords
WHERE CommonWord = @Word
IF @CommonWordID IS NOT NULL
RETURN
UPDATE dbo.SearchWord SET
[Word] = @Word,
[IsCommon] = @IsCommon,
[HitCount] = @HitCount
WHERE
[SearchWordsID] = @SearchWordsID
GO