Custom module request - Wikipedia parser
New Post
1/25/2007 5:28 PM
 

Email me at dylan.barber@earthlink.net and I'll send you my code. It's primitive and buggy, but it gets the content.

 

PS: it's for DNN 4.4 so far.


Dylan Barber http://www.braindice.com - Dotnetnuke development classes - skins and modules
 
New Post
1/25/2007 5:34 PM
 
Steve White wrote

That would be great. I have a working version in classic ASP and a PHP script I got from someone. The problem is that the generated XML contains a lot of wikicode and would need a lot of regex, which I'm not good at.

Anyone else who's good at formatting with regex want to take this on?

Isn't the purpose of an Extensible Stylesheet (XSL file) to display an XML file in a user-viewable manner? It seems to me the correct way to go about this project would be to have the user specify a stylesheet to use. There are probably already stylesheets out there that will parse the wiki XML file...

regex... <shudder>.

 

 
New Post
1/25/2007 5:42 PM
 

I wanted the Wikipedia-generated XML to be cleaned into HTML and stored in the DNN HTML/Text table. The idea was to build a module that could retrieve the XML for any Wikipedia article, cleanse it, and then insert it into an existing HTML module. In theory this could also be scheduled to sync the HTML module(s) with Wikipedia updates.

Yes, regex blows my head off, but the XML definitely needs cleaning so that the content becomes searchable in DNN.


Steve White
 
New Post
1/25/2007 5:50 PM
 
Steve White wrote

I wanted the Wikipedia-generated XML to be cleaned into HTML and stored in the DNN HTML/Text table. The idea was to build a module that could retrieve the XML for any Wikipedia article, cleanse it, and then insert it into an existing HTML module. In theory this could also be scheduled to sync the HTML module(s) with Wikipedia updates.

Yes, regex blows my head off, but the XML definitely needs cleaning so that the content becomes searchable in DNN.

 

I could be incorrect here. I haven't actually used XSLT files in practice, only read about them, and that was years ago. But from what I understand:

XML + XSLT = HTML

XML is the data.

XSLT is how to display the data.

HTML is the output of an XML file having been processed with an XSLT file.

An XSLT processor does that difficult regex work for you, making the project a whole lot easier. Plus, if you later decide that the output HTML should look a little different, all you need to do is edit the XSLT.

...

Again, I could be way off base here. I'm working off old memory; this is all stored in the SDRAM area of my brain, and I've been operating off DDR for some time now :-p

 
New Post
1/25/2007 6:19 PM
 

Maybe you're right, Matt. I'm not an expert in XSLT.

Have you ever seen the XML output from a Wikipedia article?

It is well-formed XML, but the article text is all in one node. The article data within that node is what needs formatting, not the whole XML response. It also contains a lot of wikicode, usually characterised by [text], [[text]], or other markup. Those are the things that need parsing by regex: not the whole XML file, just the contents of one node.


Steve White
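
To make the point about "just the contents of one node" concrete, here is a minimal sketch in Python (not DNN code, and not anyone's actual module from this thread). It assumes a simplified, namespace-free export where the article body sits in a single <text> node; the sample XML, the function names, and the WIKI_BASE URL are all made up for illustration, and it only handles bold, italics, [[internal links]] and [url label] links. Real wikicode has far more constructs (templates, tables, references), so treat this as a starting point only.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a simplified stand-in for a Wikipedia XML export.
# A real export also carries an XML namespace, which this sketch ignores.
SAMPLE_XML = """<mediawiki>
  <page>
    <title>Example</title>
    <revision>
      <text>'''DotNetNuke''' is a [[content management system|CMS]] for [[ASP.NET]]. See [http://www.dotnetnuke.com the project site].</text>
    </revision>
  </page>
</mediawiki>"""

# Assumed base URL for turning [[internal links]] into hyperlinks.
WIKI_BASE = "https://en.wikipedia.org/wiki/"


def extract_wikitext(xml_string):
    """Pull the raw wikicode out of the single <text> node."""
    root = ET.fromstring(xml_string)
    node = root.find(".//text")
    if node is None or node.text is None:
        return ""
    return node.text


def _internal_link(match):
    """Turn [[Target]] or [[Target|Label]] into an anchor tag."""
    target = match.group(1).strip()
    label = match.group(2) or target
    href = WIKI_BASE + target.replace(" ", "_")
    return '<a href="{}">{}</a>'.format(href, label)


def wikitext_to_html(wikitext):
    """Convert a small subset of wikicode to HTML with regex."""
    # Escape &, < and > first; quotes are left alone so '' and ''' survive.
    out = html.escape(wikitext, quote=False)
    out = re.sub(r"'''(.+?)'''", r"<b>\1</b>", out)   # bold before italics
    out = re.sub(r"''(.+?)''", r"<i>\1</i>", out)
    out = re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", _internal_link, out)
    out = re.sub(r"\[(https?://\S+) ([^\]]+)\]", r'<a href="\1">\2</a>', out)
    return out


if __name__ == "__main__":
    print(wikitext_to_html(extract_wikitext(SAMPLE_XML)))
```

The XML parser is only used to find the node; the regex pass runs on that node's text alone. The cleaned HTML string is what would then be saved into the HTML/Text module, which this sketch does not attempt.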
 