Hi all,
I am looking at developing a search/indexing module for DNN, but I am noticing an issue on several DNN sites: it seems to be possible for multiple, different-looking URLs to point to essentially the same page (or to slightly different pages with the same content).
For example, on this website: http://www.alkihomes.com there are some common links on every page, namely the "Register" and "Login" links at the top of the page, and the "Terms Of Use" and "Privacy Statement" links at the bottom of the page.
If we look at the "Terms Of Use" link on the front page, it points to: http://www.alkihomes.com/Home/tabid/36/ctl/Terms/Default.aspx
However, if we click on the "Neighborhoods" tab in the sidebar and then look at the "Terms Of Use" link, it instead points to: http://www.alkihomes.com/Neighborhoods/Alki/tabid/344/ctl/Terms/Default.aspx
Now, conceptually these two URLs go to the same page (that is, we want to treat them as the same page from a search engine's point of view). Technically, however, they are different pages, because the second URL renders with the "Neighborhoods" tab selected in the sidebar, and I understand that this is why the URLs differ. Unfortunately, this is bad for search engines. A search engine has no way to tell that these two pages are meant to be the same; it may treat them as duplicate pages with 99% similar content, in which case it may penalize your website.
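To make the "99% similar" concern concrete, here is a rough sketch of how an indexer might flag two pages as near-duplicates, using word shingles and Jaccard similarity. This is only an illustration: the function names, the shingle size and the 0.9 threshold are my own assumptions, not anything DNN or a particular search engine is known to use.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 1.0  # two empty pages are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


def is_near_duplicate(text_a, text_b, threshold=0.9):
    """Hypothetical rule: flag pages whose similarity reaches the threshold."""
    return jaccard(text_a, text_b) >= threshold
```

Two renderings of the same "Terms Of Use" text that differ only in which sidebar tab is selected would score very close to 1.0 under a measure like this.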
The question is: is there a way of turning this behavior off? What settings or modules actually trigger this behavior, and are these settings enabled by default?
I would also be interested in hearing from other people who are indexing DNN websites, and in any tips or advice in this area. For example, is there a list of URLs or parameters that should commonly be ignored by the indexer?
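To illustrate the kind of ignore list I have in mind, here is a minimal sketch of how an indexer could collapse such URLs onto one key. It assumes the control name always follows a "ctl" path segment (as in the two URLs above) or appears as a "ctl" query parameter. Only "Terms" comes from the example site; the other entries in SITE_WIDE_CONTROLS and all function names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed list of controls that render the same content on every tab.
# Only "terms" is taken from the URLs above; the rest are guesses.
SITE_WIDE_CONTROLS = {"terms", "privacy", "login", "register"}


def control_name(url):
    """Return the lower-cased ctl value of a URL, or None if it has none."""
    parts = urlsplit(url)
    segments = [s.lower() for s in parts.path.split("/") if s]
    # Friendly form: .../tabid/36/ctl/Terms/Default.aspx
    if "ctl" in segments:
        i = segments.index("ctl")
        if i + 1 < len(segments):
            return segments[i + 1]
    # Assumed query-string form: Default.aspx?tabid=36&ctl=Terms
    for key, values in parse_qs(parts.query).items():
        if key.lower() == "ctl" and values:
            return values[0].lower()
    return None


def canonical_key(url):
    """Map site-wide control URLs to one key per host; leave others as-is."""
    ctl = control_name(url)
    if ctl in SITE_WIDE_CONTROLS:
        host = urlsplit(url).netloc.lower()
        return f"{host}/ctl/{ctl}"
    return url


if __name__ == "__main__":
    a = "http://www.alkihomes.com/Home/tabid/36/ctl/Terms/Default.aspx"
    b = "http://www.alkihomes.com/Neighborhoods/Alki/tabid/344/ctl/Terms/Default.aspx"
    print(canonical_key(a) == canonical_key(b))
```

With something like this, the indexer would store a page only the first time it sees a given key, so both "Terms Of Use" URLs above would be indexed once.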
John