We came across an issue where the bad word filter replaces text that it should not. For example, the word "passage" comes back as "pfannyage". While working on this I ran into two problems that call for regular expressions: 1. surreptitious bad words disguised with numbers and special characters, and 2. replacing the text more precisely. For example, a replace should not swallow the surrounding white space, which would concatenate two words: "This is a badword to see" might become "This is afannyto see".
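To make the "passage" problem concrete, here is a minimal sketch. It assumes the filter entry is "ass" with the replacement "fanny" (inferred from the "pfannyage" result above), and the function names NaiveReplace and WholeWordReplace are made up for illustration; they are not part of the forum module.

```vb
Imports System
Imports System.Text.RegularExpressions

Module NaiveVersusBounded
    ' Hypothetical helper: plain substring replacement, which also
    ' rewrites the bad word when it sits inside a longer word.
    Function NaiveReplace(ByVal source As String, ByVal badWord As String, ByVal replacement As String) As String
        Return source.Replace(badWord, replacement)
    End Function

    ' Hypothetical helper: \b anchors limit the match to whole words.
    ' Assumes badWord contains only letters, so no escaping is needed.
    Function WholeWordReplace(ByVal source As String, ByVal badWord As String, ByVal replacement As String) As String
        Return Regex.Replace(source, "\b" & badWord & "\b", replacement, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim sample As String = "A passage about an ass"
        Console.WriteLine(NaiveReplace(sample, "ass", "fanny"))     ' A pfannyage about an fanny
        Console.WriteLine(WholeWordReplace(sample, "ass", "fanny")) ' A passage about an fanny
    End Sub
End Module
```

The second call leaves "passage" alone because there is no word boundary between the "p" and the "a".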
I do not know the status of the new bad word filter, but the following is a start (I had no time to perfect it). Any improvements to it would be really appreciated. Here are the changes I made to the ForumWordFilterController class; only the relevant code is shown:
This is only a start! But we need a "really good" bad word filter, and maybe contributions to this thread could help us all:
At the top of the page:
Private Const RegexMetaCharacters As String = "(\[|\\|\^|\$|\.|\||\?|\*|\+|\(|\))"
In the functions where the work is done:
'
'***************************************************************
'09-17-07 improvement to bad word filter via regular expressions
'for more precision - jfs.
Dim filter As StringBuilder
Dim exp As String = String.Empty
Dim wordCapture As String = String.Empty
Dim rx As Regex = Nothing
'Escape the bad word only if it contains regex metacharacters
If (Regex.IsMatch(objFilterWord.BadWord, RegexMetaCharacters)) Then
    wordCapture = Regex.Escape(objFilterWord.BadWord)
Else
    wordCapture = objFilterWord.BadWord
End If
'Build the pattern; for the bad word a$$ it comes out as:
'(?i)(\b(\d?|_?)*a\$\$(\d?|_?)*)
filter = New StringBuilder(30)
filter.Append("(?i)(\b(\d?|_?)*")
filter.Append(wordCapture)
filter.Append("(\d?|_?)*)")
exp = filter.ToString().Trim()
'
rx = New Regex(exp)
'
'Replace every match in the text with the configured replacement
If (rx.IsMatch(Text)) Then
    Text = rx.Replace(Text, objFilterWord.ReplacedWord)
End If
I have not "thoroughly" tested this, but it will capture
a$$ A$$ 59a$$777 "a$$" "A$$" 'a$$' _a$$, and a$$ followed by "_" (but not the "_" itself)
but not pa$$age.
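For anyone who wants to try the pattern outside the forum module, here is a small standalone sketch. FilterWord is a made-up helper name and "***" is a placeholder replacement; neither comes from ForumWordFilterController. As a simplification it always calls Regex.Escape, which leaves plain letters and digits unchanged, instead of checking for metacharacters first.

```vb
Imports System
Imports System.Text.RegularExpressions

Module BadWordFilterSketch
    ' Hypothetical helper that builds the same pattern as the code above:
    ' (?i)(\b(\d?|_?)*<escaped bad word>(\d?|_?)*)
    Function FilterWord(ByVal source As String, ByVal badWord As String, ByVal replacement As String) As String
        Dim pattern As String = "(?i)(\b(\d?|_?)*" & Regex.Escape(badWord) & "(\d?|_?)*)"
        Return Regex.Replace(source, pattern, replacement)
    End Function

    Sub Main()
        ' Sample inputs taken from the list above, plus one full sentence.
        Dim samples As String() = {"a$$", "A$$", "59a$$777", "_a$$", "pa$$age", "This is a$$ here"}
        For Each sample As String In samples
            Console.WriteLine(sample & " -> " & FilterWord(sample, "a$$", "***"))
        Next
    End Sub
End Module
```

"pa$$age" should print unchanged, since there is no word boundary between the "p" and the "a", and the sentence should keep the spaces on both sides of the replaced word.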