Search Crawler Duplicate URLs
5/13/2011 2:48 PM
 
Hi, I have a search crawler that is pointed at an intranet site and is showing dozens or even hundreds of duplicate links. For example, when I search for the word "customer", a specific document that contains the word 30 times appears. The document is:

http://sca.intranet.bell.ca/bebn/process/assurance/schedule-g020912.doc

However, this document also appears 350 more times, with one more forward slash added after bebn for each additional hit, something like this:

http://sca.intranet.bell.ca/bebn/////////////////////////////////////...

When I search for the word "customer", I only need to see this document once. How can I get the search crawler to index this document only once? Is there a regular expression that I can use to stop this duplication, perhaps by filtering out runs of multiple '/' characters after the http:// prefix?
Does anyone know why this would be happening?
Any help would be greatly appreciated.
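
To make the kind of regular expression I have in mind concrete, here is a minimal sketch in Python. It only illustrates the pattern itself; it is not a search crawler setting. I am assuming the crawler (or a step before or after it) can apply a pattern like this, which I have not confirmed. The function names and the sample URLs with extra slashes are made up for illustration. The pattern matches two or more consecutive slashes that do not directly follow a colon, so the `//` in `http://` is left alone.

```python
import re

# Two or more consecutive slashes NOT immediately preceded by ':'.
# The negative lookbehind keeps the "//" of "http://" intact.
DUPLICATE_SLASHES = re.compile(r"(?<!:)/{2,}")


def has_duplicate_slashes(url: str) -> bool:
    """Return True if the URL contains a run of slashes outside the scheme."""
    return DUPLICATE_SLASHES.search(url) is not None


def normalize_url(url: str) -> str:
    """Collapse every run of duplicate slashes into a single slash."""
    return DUPLICATE_SLASHES.sub("/", url)


def dedupe(urls):
    """Keep the first occurrence of each URL after normalization."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    # Hypothetical variants of the real document URL, for illustration only.
    crawled = [
        "http://sca.intranet.bell.ca/bebn/process/assurance/schedule-g020912.doc",
        "http://sca.intranet.bell.ca/bebn//process/assurance/schedule-g020912.doc",
        "http://sca.intranet.bell.ca/bebn////process/assurance/schedule-g020912.doc",
    ]
    for url in dedupe(crawled):
        print(url)
```

If the crawler accepts exclusion patterns, the same expression could in principle be used to skip any URL for which `has_duplicate_slashes` is true, rather than rewriting it. Note that this simple pattern also collapses a `//` that appears inside a query string, which may or may not be wanted.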
