Products

Solutions

Resources

Partners

Community

Blog

About

QA

Ideas Test

New Community Website

Ordinarily, you'd be at the right spot, but we've recently launched a brand new community website... For the community, by the community.

Yay... Take Me to the Community!

Welcome to the DNN Community Forums, your preferred source of online community support for all things related to DNN.
In order to participate you must be a registered DNNizen

HomeHomeOur CommunityOur CommunityGeneral Discuss...General Discuss...Search tables are taking up half the database spaceSearch tables are taking up half the database space
Previous
 
Next
New Post
5/10/2009 2:39 AM
 
 
I am on DNN 5.01 and find that over half of my database space is being used by the tables that support the search function.  There are about 2255 items in my DNN database being searched as evidenced by the count of records in the SearchItem table, and there are abou 12,200 records in the SearchWord table.  When I examined the SearchWord table, I found about half the records have what I consider to be "bad" words, or at least words that don't need to be supported by a search.  For example, there is a record  where the word consists of 4 spaces, and another record for 5 spaces, another for 6 spaces, and so on up to the maximum length of the search word in the search settings (50 in my case).  I suppose the 4 spaces derives from the minimum word length specified  in search settings (4), which is the default.  There are also entries where the word begins or ends with a single or double quote, a hyphen, or a slash, or one or more spaces before the word.  My thought was that these SearchWord records that have these "special" characters are not needed and in fact might confuse the code that later creates the records in the SearchItemPosition table, since that table is the one that takes up half of my database storage space.
 
Rather than mess with the code that is burried inside the core search functionality, I decided to put a bandage on my database by making changes to the code that actually puts the records into the database.  There are two stored procedures that add and update records in the SearchWord table and they are very simple as I found them.  My site only uses the english language, so I only coded a fix that seems to work for me using the english language and character set.  The fix adds some checks to the stored procedures to ensure that characters that I don't want to see in search results don't make it into the SearchWord table.  For example, simply aborting the insert or update if a disallowed character is found in the word being added, and using a replace() function on the @Word parameter in the stored procedure to replace these special characters with nothing before the words are added to (or updated in) the SearchWord table.  This has reduced the size of the SearchWord table by about 50% 
 
I admit that this is not the best way to fix this issue, but it works for me and is easy for me to track exactly what is going into my database.  So far, my database has gone down in size by almost 30% just due to the diet-sized search word tables.
 
If anyone has a better way to fix this, I am very interested in learning what it is. 
 
So,... here is the fix that seems to be working for me:  alter the two stored procedures to check for and remove the special characters that I don't want included in the SearchWord table.  Like so:

/****** Object:  StoredProcedure [dbo].[AddSearchWord]    Script Date: 05/09/2009 08:47:57 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[AddSearchWord]
	@Word nvarchar(100)
 
AS

DECLARE @id int
DECLARE @CommonWordID int
-- check to see if the word contains any character that disqualifies it
IF CHARINDEX('.',@Word) > 0
	RETURN
IF CHARINDEX(':',@Word) > 0
	RETURN
IF CHARINDEX('@',@Word) > 0
	RETURN
IF CHARINDEX('_',@Word) > 0
	RETURN
IF CHARINDEX('0',@Word) > 0
	RETURN
IF CHARINDEX('1',@Word) > 0
	RETURN
IF CHARINDEX('2',@Word) > 0
	RETURN
IF CHARINDEX('3',@Word) > 0
	RETURN
IF CHARINDEX('4',@Word) > 0
	RETURN
IF CHARINDEX('5',@Word) > 0
	RETURN
IF CHARINDEX('6',@Word) > 0
	RETURN
IF CHARINDEX('7',@Word) > 0
	RETURN
IF CHARINDEX('8',@Word) > 0
	RETURN
IF CHARINDEX('9',@Word) > 0
	RETURN

-- remove special characters like leading or trailing spaces, “, ”, ', ", ‘, ’, -. /, \
SET @Word = Ltrim(Rtrim(@Word))
SET @Word = REPLACE(@Word,'“','')
SET @Word = REPLACE(@Word,'”','')
SET @Word = REPLACE(@Word,'''','')
SET @Word = REPLACE(@Word,'"','')
SET @Word = REPLACE(@Word,'‘','')
SET @Word = REPLACE(@Word,'’','')
SET @Word = REPLACE(@Word,' ','')
SET @Word = REPLACE(@Word,'-','')
SET @Word = REPLACE(@Word,'/','')
SET @Word = REPLACE(@Word,'\','')

-- check to make sure resulting word is still long enough to be included
IF LEN(@Word) < 4
	RETURN

-- check to make sure resulting word is not a common word
SELECT @CommonWordID
  FROM SearchCommonWords 
	WHERE CommonWord = @Word
IF @CommonWordID IS NOT NULL
	RETURN

SELECT @id = SearchWordsID 
	FROM SearchWord
	WHERE Word = @Word

IF @id IS NULL
	BEGIN
		INSERT INTO SearchWord (
			[Word],
			[IsCommon],
			[HitCount]
			) 
		VALUES (
			@Word,
			0,
			1
		)

		SELECT SCOPE_IDENTITY()
	END
ELSE
	SELECT @id

GO
/****** Object:  StoredProcedure [dbo].[UpdateSearchWord]    Script Date: 05/09/2009 09:06:00 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[UpdateSearchWord]
	@SearchWordsID int, 
	@Word nvarchar(100), 
	@IsCommon bit, 
	@HitCount int 
AS
DECLARE @CommonWordID int
-- check to see if the word contains any character that disqualifies it
IF CHARINDEX('.',@Word) > 0
	RETURN
IF CHARINDEX(':',@Word) > 0
	RETURN
IF CHARINDEX('@',@Word) > 0
	RETURN
IF CHARINDEX('_',@Word) > 0
	RETURN
IF CHARINDEX('0',@Word) > 0
	RETURN
IF CHARINDEX('1',@Word) > 0
	RETURN
IF CHARINDEX('2',@Word) > 0
	RETURN
IF CHARINDEX('3',@Word) > 0
	RETURN
IF CHARINDEX('4',@Word) > 0
	RETURN
IF CHARINDEX('5',@Word) > 0
	RETURN
IF CHARINDEX('6',@Word) > 0
	RETURN
IF CHARINDEX('7',@Word) > 0
	RETURN
IF CHARINDEX('8',@Word) > 0
	RETURN
IF CHARINDEX('9',@Word) > 0
	RETURN

-- remove special characters like spaces, “, ”, ', ", ‘, ’, -. /, \
SET @Word = Ltrim(Rtrim(@Word))
SET @Word = REPLACE(@Word,'“','')
SET @Word = REPLACE(@Word,'”','')
SET @Word = REPLACE(@Word,'''','')
SET @Word = REPLACE(@Word,'"','')
SET @Word = REPLACE(@Word,'‘','')
SET @Word = REPLACE(@Word,'’','')
SET @Word = REPLACE(@Word,' ','')
SET @Word = REPLACE(@Word,'-','')
SET @Word = REPLACE(@Word,'/','')
SET @Word = REPLACE(@Word,'\','')

-- check to make sure resulting word is still long enough to be included
IF LEN(@Word) < 4
	RETURN

-- check to make sure resulting word is not a common word
SELECT @CommonWordID
  FROM SearchCommonWords 
	WHERE CommonWord = @Word
IF @CommonWordID IS NOT NULL
	RETURN

UPDATE dbo.SearchWord SET
	[Word] = @Word,
	[IsCommon] = @IsCommon,
	[HitCount] = @HitCount
WHERE
	[SearchWordsID] = @SearchWordsID


GO

 


Gene W
 
New Post
5/10/2009 7:16 AM
 

The search table used to remove all special characters, but elimited all non-ascii characters used in non-english languages as well. I haven't checked yet the current behaviour, but I am sure, there is still room for improvement. Please log this issue into the public issue tracker (2nd from top) at support.dotnetnuke.com.


Cheers from Germany,
Sebastian Leupold

dnnWerk - The DotNetNuke Experts   German Spoken DotNetNuke User Group

Speed up your DNN Websites with TurboDNN
 
Previous
 
Next
HomeHomeOur CommunityOur CommunityGeneral Discuss...General Discuss...Search tables are taking up half the database spaceSearch tables are taking up half the database space


These Forums are dedicated to discussion of DNN Platform and Evoq Solutions.

For the benefit of the community and to protect the integrity of the ecosystem, please observe the following posting guidelines:

  1. No Advertising. This includes promotion of commercial and non-commercial products or services which are not directly related to DNN.
  2. No vendor trolling / poaching. If someone posts about a vendor issue, allow the vendor or other customers to respond. Any post that looks like trolling / poaching will be removed.
  3. Discussion or promotion of DNN Platform product releases under a different brand name are strictly prohibited.
  4. No Flaming or Trolling.
  5. No Profanity, Racism, or Prejudice.
  6. Site Moderators have the final word on approving / removing a thread or post or comment.
  7. English language posting only, please.
What is Liquid Content?
Find Out
What is Liquid Content?
Find Out
What is Liquid Content?
Find Out