Created by Stefan Kip on 29 Nov 2012, 14:06:35. Updated by Stefan Kip on 23 Jun 2013, 19:30:44.
Today I dug into why the 301 URL Tracker is unable to process .php/.html requests. Using Failed Request Tracing, I found that the UmbracoModule (an HttpModule) is reached when a *.php URL is requested (at least on IIS 8), but the module bounces the request because it is neither a directory URL nor an *.aspx URL. For the 301 URL Tracker to work with .php/.html, the allowed URL extensions need to be configurable.
So my proposal is: add a config item to umbracoSettings.config where you can configure the allowed request extensions, with ".aspx" as the default. The source would then have to read this configuration entry; the only change needed would be to this condition in UmbracoModule.cs, line 210: if (maybeDoc && lpath.Contains('.') && !lpath.EndsWith(".aspx"))
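A minimal sketch of the proposed check, assuming the allowed extensions have already been read from the new umbracoSettings.config entry into a string array. The array contents and the helper name IsDocumentRequest are hypothetical, not part of Umbraco; the helper mirrors the condition quoted above but tests against the configured list instead of the hard-coded ".aspx":

```csharp
using System;
using System.Linq;

// Hypothetical: in the proposal these values would come from a new
// umbracoSettings.config entry, with ".aspx" as the default.
string[] allowedExtensions = { ".aspx", ".php", ".html" };

// Hypothetical helper mirroring the UmbracoModule check: a path that contains
// a dot but does not end with an allowed extension is not a document request.
bool IsDocumentRequest(string lpath, bool maybeDoc, string[] allowed)
{
    if (maybeDoc && lpath.Contains('.')
        && !allowed.Any(ext => lpath.EndsWith(ext, StringComparison.OrdinalIgnoreCase)))
    {
        return false;
    }
    return maybeDoc;
}

Console.WriteLine(IsDocumentRequest("/old-page.php", true, allowedExtensions)); // True
Console.WriteLine(IsDocumentRequest("/css/site.css", true, allowedExtensions)); // False
Console.WriteLine(IsDocumentRequest("/about-us/", true, allowedExtensions));    // True
```

With only ".aspx" in the list, the check behaves exactly like the current hard-coded version, so nothing changes unless the setting is edited.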
If I understand correctly, the 301 URL Tracker hooks into the Not Found handlers to find out which document to redirect the request to. Since we consider that *.php or *.html requests are not for Umbraco, we just ignore them, so the Not Found handlers do not trigger... and the 301 URL Tracker never sees the request. Right?
And using a rewrite rule to map *.php to *.aspx is not ideal, because users would then have to know that they must enter "foo.aspx" in the 301 URL Tracker for "foo.php" to be redirected, which is not intuitive?
But... if Umbraco starts to handle *.html URLs, then no actual static HTML file can be served anymore?
Another way is to create an HTTP module that handles 404 requests and put some logic in there. I'd rather not have Umbraco handle *.php, *.html, etc. extensions, since that would introduce canonical URL issues.
Yes, that is a problem. One will have to choose between letting the 301 URL Tracker handle *.html requests, or being able to use static HTML files and creating a rule that redirects certain *.html requests to *.aspx.
Still, most people have problems with *.php URLs, and for them this will really help. It's an addition and won't replace anything (like static HTML handlers) by default; enabling it is a choice.
@Richard: First, I don't see the canonical issue here. Umbraco/the 301 URL Tracker will only redirect *.php / *.html requests to the new Umbraco node URL. Second, I have some concerns about using an HTTP module:
If Umbraco handles the .html extension, it handles it not only for broken links but also for normal pages, so any extension can be used and you will have canonical issues.
From what I've seen, the HTTP module is called after the 404 handlers.
And yes, you need to rewrite some of your code, but you can still use most of the Umbraco API since you have an HttpContext.
Various points: if the UmbracoModule decides that the request is not an Umbraco request, various things (think contexts) might not be initialized and your module may run into issues. I understand you want to run before the final 404 handler, so that if the 301 tracker finds no match, the user still gets the 404 page.
Use an IIS rewrite rule to map *.php to *.aspx and keep the original, pre-rewrite URL somewhere. I know the IIS Rewrite Module can store it in a server variable of some sort. Or add it to the query string, e.g. foo.php becomes foo.aspx?ext=php. Then, in your Not Found handler, check that server variable or the extension, and rewrite the URL back to what it was originally before scanning your table.
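A rough sketch of the query-string variant, assuming paths arrive without an existing query string. RewriteToAspx stands in for what the IIS rewrite rule would do, and RestoreOriginal for the first step of the Not Found handler; both names are hypothetical:

```csharp
using System;

// Hypothetical stand-in for the rewrite rule: foo.php -> foo.aspx?ext=php.
// Assumes the input is a path without a query string.
string RewriteToAspx(string path)
{
    int dot = path.LastIndexOf('.');
    int slash = path.LastIndexOf('/');
    if (dot <= slash)
        return path; // the last segment has no extension (e.g. a directory URL)

    string ext = path.Substring(dot + 1);
    if (ext.Equals("aspx", StringComparison.OrdinalIgnoreCase))
        return path; // already a request Umbraco handles

    return path.Substring(0, dot) + ".aspx?ext=" + Uri.EscapeDataString(ext);
}

// Hypothetical first step of the Not Found handler: rebuild the original URL
// from the rewritten path and the (already decoded) "ext" query-string value.
string RestoreOriginal(string path, string ext)
{
    const string aspx = ".aspx";
    if (string.IsNullOrEmpty(ext) || !path.EndsWith(aspx, StringComparison.OrdinalIgnoreCase))
        return path;
    return path.Substring(0, path.Length - aspx.Length) + "." + ext;
}

Console.WriteLine(RewriteToAspx("/old/foo.php"));           // /old/foo.aspx?ext=php
Console.WriteLine(RestoreOriginal("/old/foo.aspx", "php")); // /old/foo.php
```

The restored URL ("foo.php") is what the tracker would then look up, so editors can keep entering the URL visitors actually used.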
Yes, it does. I really like the clean and elegant solution Richard suggests, i.e. using an HttpModule, but I'll have to dig into whether it gives me the right order of actions:
1. Umbraco checks for node/template/etc.
2. The 301 URL Tracker looks for a mapped URL.
3. The 404 handler is executed.
Maybe the HttpModule can override the 404 handler's action. However, in the custom NotFoundHandler I've built (which uses the web.config CustomErrorsSection), I call HttpContext.Current.Response.End(); at the end, which I'm afraid will end the request before the HttpModule gets a chance to act...
The way the pipeline works (as of the 4.x versions), you won't be able to insert code between the check for nodes & templates and the 404 handlers. In addition, for foo.php Umbraco will not check for nodes/templates/etc. at all: you have to map your request to a request that Umbraco believes it should handle, hence my rewriting proposal. The IIS Rewrite Module preserves the original URL in the server variables HTTP_X_ORIGINAL_URL and UNENCODED_URL, so you could use that original URL, if present, in your 301 tracker. If you don't want people to have to install the IIS Rewrite Module plus rules, just implement a module that rewrites the request early enough and saves the original URL in the HttpContext.
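Sketched as a lookup, assuming the request's server variables are available as a plain dictionary (the OriginalUrl helper and the dictionary standing in for the real server-variables collection are hypothetical), the tracker would prefer the preserved URL and fall back to the current one:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pick the URL the visitor originally requested.
// serverVariables stands in for the request's server-variables collection.
string OriginalUrl(IDictionary<string, string> serverVariables, string currentUrl)
{
    // Variable names as mentioned in the thread; the first non-empty one wins.
    foreach (string key in new[] { "HTTP_X_ORIGINAL_URL", "UNENCODED_URL" })
    {
        if (serverVariables.TryGetValue(key, out string value) && !string.IsNullOrEmpty(value))
            return value;
    }
    return currentUrl; // no rewrite happened, or the variables are unavailable
}

var afterRewrite = new Dictionary<string, string> { ["HTTP_X_ORIGINAL_URL"] = "/old/foo.php" };
Console.WriteLine(OriginalUrl(afterRewrite, "/old/foo.aspx"));                      // /old/foo.php
Console.WriteLine(OriginalUrl(new Dictionary<string, string>(), "/old/foo.aspx")); // /old/foo.aspx
```

A custom module that does the rewrite itself could store the original URL in HttpContext.Items instead, and the tracker would read it from there in the same fallback fashion.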
My fear is that I really don't know to what extent Umbraco will be ready, in terms of contexts, to answer you once it has decided it should not handle the request... Thinking about it, that's also how the entire back-end works, so maybe it can work in a module. But again, I fail to see how you could plug into the right order of actions.
I agree this shouldn't be a reason to change the way Umbraco works, so I'll look into it when I rebuild the 301 URL Tracker from scratch. For now, I guess this issue can be closed; no change required.
Type: Feature (request)
Backwards Compatible: True
Fix Submitted: Inline code
Affected versions: 4.10.0, 4.11.0, 4.11.1