Hi guys,
Let me try to give you better insights on what's possible and what the challenges are.
Sebastian, I know you know most of this, yet I've got a feeling you believe there are more restrictions on what's possible than there really are.
I'm about to write a detailed summary of the main principles I figured out. Please give it a go: I promise you'll learn something.
Url Rewriting implies 2 mirrored actions:
- Turning regular "real" urls into friendly counterparts.
- In DNN, this is normally done during the controls' life cycle, through the method DotNetNuke.Common.Globals.NavigateUrl(...), which all developers have learnt to use systematically. That method in turn calls whatever Friendly Url Provider was configured to output a friendly url.
- In other systems and in a few 3rd party DNN providers, you can also make a pass at the end of the rendering phase to batch-update the html links directly, yet the result is really the same.
- Turning a friendly url into its real counterpart.
This relies mainly on the method HttpContext.RewritePath(...), which lets you update an incoming request so that all controls and later code in the life cycle "see" the current request with an updated url (the real one, for that matter). This is needed because a lot of code relies deeply on the many primary keys you usually find in real urls (page id, module id, item id, category id, sub item id, etc.). Accordingly, this is precisely the first operation DNN performs when handling an incoming request, since it is needed to create the current Portal Settings (target portal, target page, etc.)
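To make that reversing action concrete, here is a minimal sketch (in Python for brevity, though DNN itself is .NET; the paths and key names are invented): conceptually, the very first step of the pipeline maps the friendly path back to its real url so that all downstream code finds the primary keys it expects.

```python
# Hypothetical sketch of the reversing step: the friendly path is mapped
# back to its real url before anything else in the request life cycle runs.
FRIENDLY_TO_REAL = {
    "/products/blue-widget": "/default.aspx?tabid=42&mid=387&itemid=7",
}

def rewrite_incoming(path: str) -> str:
    """Return the real url for a known friendly path, or the path unchanged."""
    return FRIENDLY_TO_REAL.get(path, path)
```

In the real pipeline this lookup result is what gets handed to HttpContext.RewritePath(...), so the rest of the life cycle is none the wiser.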
Achieving Url Rewriting is then mainly about dealing with these 3 issues:
- What should I do with the real url to make it "friendly"?
- The main idea is generally to
- turn primary keys into their corresponding name/label/description (human readable/SEO counterparts)
- Inject the resulting parts within the path/page name rather than keeping them as query string parameters
- What are the constraints if I want to be sure I can reverse the real url from the friendly one?
- What are the constraints if I want to keep the process automated?
These last 2 questions are actually related to each other:
- If I allow myself to make the process manual, i.e. explicitly define the real/friendly mappings, then there is literally no constraint on the rewriting, apart from making sure only one real url corresponds to a given friendly url. The reversing Http module would only have to look up the corresponding dictionary, provided the mapping is known. For that matter, DNN's native rewriting rules actually allow you to define such arbitrary mappings, although they do not perform the replacement at writing time, and the regex nature of the engine would not scale so well. Still, you can check for yourself: define an arbitrary mapping and insert the corresponding friendly url in a text/html module: as long as your url reaches IIS and fires up the asp.net worker process, DNN will find its way to the real counterpart defined in your rule.
- Now the problem is that manual enumeration is generally not an option (think of the millions of urls to deal with on dnn.com). So if you want both the writing phase and the reading phase to be automated, you'd need:
- a writing engine
- a reversing engine
- to make sure the first one never produces a url the second one cannot reverse
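To make the manual-mapping option above concrete, here is a rough Python sketch in the spirit of DNN's regex rewriting rules (the LookFor/SendTo pairs); the patterns and targets are invented for illustration. Note it only handles the reading direction, which is exactly the limitation mentioned above, and each request scans the rules linearly, which is why a regex engine does not scale to large rule sets.

```python
import re

# Hypothetical LookFor/SendTo style rules (patterns made up for this example).
RULES = [
    (r"^/promo$", "/default.aspx?tabid=58"),
    (r"^/forum/(\d+)$", r"/default.aspx?tabid=21&threadid=\1"),
]

def apply_rules(path: str) -> str:
    """Reverse a friendly path using the first matching rule, else pass through."""
    for look_for, send_to in RULES:
        if re.match(look_for, path):
            return re.sub(look_for, send_to, path)
    return path
```

The writing side (emitting the friendly url in the first place) is entirely up to you with this approach, which is why it only suits a restricted, hand-maintained set of urls.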
The main approach is then usually the following:
- Base the writing engine on transformations that can be strictly reversed
- For a restricted set of well-known urls, add a capability for extra semi-automated/manual mappings to be consulted at reversing time
To my knowledge, this is what most 3rd party components do.
We at Aricie decided to choose another strategy more than 2 years ago, and although our product has not reached much of the ecosystem yet, I'm quite confident this strategy is about the best you can get.
The idea is the following:
- Get full control over the writing process: the engine relies on rules with friendly url patterns. Those patterns are built with what we call "groups", representing the human-readable bits mentioned earlier, whose computation is performed by dedicated providers: in your example for instance, our native dnn forum provider supplies, amongst others, the group "ThreadName", which it computes from the corresponding thread id.
- Discard reversibility by explicitly storing the resulting dictionary of generated mappings. This may sound a bit dumb, but there is an optimized way of doing it without killing performance, and it offers as a bonus the capability to automatically redirect obsolete friendly urls to the most recent ones with a 301 status code (when a page name has changed, for instance, as detailed in that recent blog entry from Will).
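The stored-dictionary idea, including the 301 bonus, can be sketched in a few lines of Python (a simplified illustration under assumed data structures, not our actual storage layer): each real url keeps only its latest friendly form, while stale friendly urls stay in the table and trigger a permanent redirect to the current one.

```python
# Hypothetical sketch of stored mappings with automatic 301 redirects.
friendly_to_real = {}   # every friendly path ever generated -> real url
real_to_friendly = {}   # real url -> its current (latest) friendly path

def record(real: str, friendly: str) -> None:
    friendly_to_real[friendly] = real
    real_to_friendly[real] = friendly

def resolve(friendly: str):
    """Return (status, target) for an incoming friendly path."""
    real = friendly_to_real.get(friendly)
    if real is None:
        return (404, None)
    current = real_to_friendly[real]
    if current != friendly:
        return (301, current)   # obsolete slug: permanent redirect
    return (200, real)          # up to date: rewrite to the real url

record("/default.aspx?tabid=42", "/about-us")
record("/default.aspx?tabid=42", "/who-we-are")  # the page was renamed
```

Because the old "/about-us" entry is never deleted, a bookmarked link keeps working and search engines get told where the page went.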
Now there are several ways to push this further which I won't get into, but detecting duplicates would be one of them. For now our engine can log an event when 2 different real urls are detected to resolve to the same friendly url. That technically means that 2 different sets of primary keys map to the exact same labels in the selected pattern rule. Forum-wise, that could mean a thread with the same title was posted in the same forum on the same page, which indeed happens, but in most other cases this is rare because you keep control over those labels (for instance, you wouldn't have 2 articles with the same title, or 2 pages with the same name underneath the same parent page).
We could treat this differently by automatically adding an extra differentiating bit/number when the conflict is detected, and I'll make sure this is actually an available option in the next release.
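That suffix-on-conflict behavior could look like the following Python sketch (an assumed design for the option described above, not the shipping engine): when a candidate friendly url is already owned by a different real url, append an incrementing number until a free one is found.

```python
# Hypothetical conflict resolution by appending a differentiating number.
taken = {}  # friendly path -> real url that owns it

def claim(real: str, friendly: str) -> str:
    """Reserve a friendly path for a real url, suffixing -2, -3, ... on conflict."""
    candidate, n = friendly, 2
    while candidate in taken and taken[candidate] != real:
        candidate = f"{friendly}-{n}"   # e.g. /forum/my-thread-2
        n += 1
    taken[candidate] = real
    return candidate
```

Note that re-claiming the same friendly path for the same real url is idempotent, so regenerating a page's url never bumps its own suffix.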
Well, that's it really; the rest you can find out by yourself, and we'll be happy to answer questions.
Thanks for reading so far.