how to spider/index protected content?
New Post
5/25/2006 2:24 AM
 
We've got a Google Mini search appliance and would like to use it to provide search capabilities for our DNN based site with 300+ pages of content which is pulled from a legacy database and displayed via custom modules.

Trouble is, most of this content is supposed to be available only to registered users, yet we'd like the search results to be available to all users.

While the Mini does provide a way to access content that is protected with HTTP Basic authentication or NTLM authentication, it does not provide a way to get past the ASP.NET forms-based login page that DNN uses.

Any suggestions?
 
New Post
7/22/2006 1:15 PM
 

We are looking to implement Google Mini for a number of our DNN portals. Would you be willing to share your experience with the Mini and DNN, and how you integrated the two?

Thank you,

Soft

 
New Post
7/22/2006 9:54 PM
 

One approach would be to set up the spider to index a virtual website that's a mirror of the DNN site. On this mirror, enable Basic Authentication or Windows Authentication so the spider can get authenticated without cookies. You don't have to physically copy any files to the mirror site other than the files in the root app folder. All the sub-folders can be virtual folders pointing to the original location. Since the spider will never "edit" anything, the virtual folders will not be an issue.

Caveat: There may be URL issues if the spider records absolute URLs instead of relative ones. However, this can be addressed with a URL rewrite rule based on the user-agent.
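To illustrate the caveat, here is a minimal sketch in Python of the kind of mapping such a rewrite rule would have to perform. It is not an actual DNN or IIS rule. The host names `mirror.example.com` and `www.example.com` and both function names are made up for the example, and the `gsa-crawler` check assumes the crawler announces itself with that token in its user-agent string.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host names, for illustration only.
MIRROR_HOST = "mirror.example.com"   # mirror site the spider crawls
PUBLIC_HOST = "www.example.com"      # real DNN site that visitors use


def is_crawler(user_agent: str) -> bool:
    """Return True if the user-agent looks like the search appliance.

    Assumes the crawler's agent name contains "gsa-crawler".
    """
    return "gsa-crawler" in user_agent.lower()


def to_public_url(url: str) -> str:
    """Map an absolute URL recorded on the mirror back to the public host.

    URLs on any other host are returned unchanged. Any port on the
    mirror URL is dropped, which is a simplification.
    """
    parts = urlsplit(url)
    if parts.hostname != MIRROR_HOST:
        return url
    return urlunsplit(
        (parts.scheme, PUBLIC_HOST, parts.path, parts.query, parts.fragment)
    )
```

The idea is that requests from the crawler are served from the mirror, while any mirror URL that ends up in a search result is sent back to the same path and query string on the public site.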

Nik

 


Nik Kalyani
Co-founder
DotNetNuke Corporation
 
New Post
3/29/2007 12:00 PM
 
My colleague is a genius and got it to work with cookies.
 
One of my big mistakes was trying to crawl a page like www.site.com/default.aspx?tabid=123. The cookie never gets sent that way.
 
start crawl:
 
Follow and crawl with these patterns:
 
On the HTTP Headers page, be sure to change the agent name so it includes your e-mail address (gsa-crawler (Enterprise; GID01065; e-mail here))
 
In the headers:
 
Cookie: DotNetNukeAnonymous=somelongstring;.SITENAME=Somelongstring;portalaliasid=somelongstring;portalroles=somelongstring
 
where somelongstring is all that encrypted hoo-hah.
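Before pasting the header into the appliance, it may help to check from a script that the cookie string really gets past the login page. The following is a rough Python sketch, not part of the Mini or DNN. The function names are made up, and the cookie values are placeholders you would copy from a browser session that is already logged in.

```python
import urllib.request


def build_cookie_header(cookies: dict) -> str:
    """Join name/value pairs into a single Cookie header value.

    Pairs are separated by ";" with no space, matching the header
    shown in the post above.
    """
    return ";".join(f"{name}={value}" for name, value in cookies.items())


def make_crawl_request(url: str, cookies: dict, agent: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the cookies and agent name."""
    headers = {
        "Cookie": build_cookie_header(cookies),
        "User-Agent": agent,
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Placeholder values: copy the real ones from a logged-in browser session.
    cookies = {
        "DotNetNukeAnonymous": "somelongstring",
        ".SITENAME": "somelongstring",
        "portalaliasid": "somelongstring",
        "portalroles": "somelongstring",
    }
    req = make_crawl_request(
        "http://www.site.com/",
        cookies,
        "gsa-crawler (Enterprise; GID01065; e-mail here)",
    )
    print(req.get_header("Cookie"))
    # To actually fetch the page: urllib.request.urlopen(req).read()
```

If the fetched page is the protected content rather than the login form, the same cookie string should be usable in the appliance's headers, for as long as those cookies stay valid.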
 
He said he had to wait about 30 minutes after something was successfully crawled to see it in the search results.
 