
URL rewrite - should I write a fake file suffix (.html) or something more realistic to the platform (like .asp or .cfm)?

@Ogunnowo487

Posted in: #Filenames #Seo #UrlRewriting

First, I'd like to preface this question by stating that I insist upon publishing a file name suffix, but only on the outermost entity of our site, the detail pages. I do realize the suffix is not necessary, and as such, some people just don't use it in their URL rewrite rules.

Our basic logic is as follows:


Top Level: http://no-host-name.just-the-domain.tld/
Next Level Deeper, append a "directory" of: /most-general-groups-of-entities/
Next Level: /additional-specificity-like-location/
Next Level: /more-specificity-making-a-smaller-group/


Each level is one click deeper, making all site content no more than four clicks deep -- good for getting spidered. ALSO -- and I like this "feature" -- you can actually remove the right-most piece of each rewritten "directory" and it will serve a page that is a list of links to all of the groups of data belonging to the classes depicted in the directories comprising the rewritten URL.

The fourth click down results in a detail page, such as: name-of-entity.html
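
The "remove the right-most piece" behavior described above can be sketched in a few lines of Python. This is only an illustration, not the poster's actual rewrite rules: the function name `parent_listings` and the example path segments are hypothetical.

```python
from posixpath import dirname


def parent_listings(path: str) -> list[str]:
    """Return every listing URL above `path`, nearest parent first."""
    parents = []
    # Strip a trailing slash so a directory such as /a/b/ steps up to /a/.
    current = path.rstrip("/")
    while current:
        current = dirname(current)  # drop the right-most piece
        # Listing pages always end in a slash; the root is already "/".
        parents.append(current if current == "/" else current + "/")
        if current == "/":
            break
    return parents


# Hypothetical detail page four levels down.
for url in parent_listings("/groups/location/subgroup/name-of-entity.html"):
    print(url)
```

Each URL printed is one that the rewrite rules would have to answer with a list of links, ending at the site root.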

So the question: what should I write as a suffix to the file name?

It seems misleading to rewrite the URI with a suffix of .html. I do believe the consensus suggests (still?) that .html is most favored. However, our technology platform would, more accurately, publish content using a page suffix such as .php or .asp. I do recognize there is a slight security benefit to masking the suffix that tells the world what your platform is.

However, isn't it kind of black-hat-like to use a suffix of .html? Paranoia causes me to believe that Google may detect the URL rewrite and potentially trigger the so-called over-optimization penalty.

Supporting use of the HTML suffix is the fact that we are, indeed, serving HTML content to a browser. It would make less sense to arbitrarily pick .pdf or .doc -- which sometimes scare away clicks when seen among search results.

Also, to reiterate my earlier insistence that I prefer to use a suffix, it's because it completes our depiction of a logical hierarchy:


the site
rewrite directory one - the general silos of information
rewrite directory two - a folder containing more folders
rewrite directory three - the folder that has the documents
rewrite document name


All directories end in a forward slash, and in contrast, documents, at least typically, have a suffix and do not have the trailing slash.
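
As a rough sketch of that convention, the two URL shapes can be told apart with a pair of patterns. Everything here is an assumption made for illustration: the slug format, the fixed depth of three directories, the `.html` suffix, and the names `SLUG`, `DIRECTORY`, `DOCUMENT` and `classify`.

```python
import re

# Assumed slug format: lowercase words separated by single hyphens.
SLUG = r"[a-z0-9]+(?:-[a-z0-9]+)*"
# Assumed suffix; swap in whichever one the site settles on.
SUFFIX = re.escape(".html")

# A listing is the root or up to three "directories", each ending in a slash.
DIRECTORY = re.compile(rf"/(?:{SLUG}/){{0,3}}")
# A detail page sits under exactly three directories and carries the suffix.
DOCUMENT = re.compile(rf"/(?:{SLUG}/){{3}}{SLUG}{SUFFIX}")


def classify(path: str) -> str:
    """Label a request path as a listing page, a detail page, or neither."""
    if DIRECTORY.fullmatch(path):
        return "listing"
    if DOCUMENT.fullmatch(path):
        return "detail"
    return "not-found"
```

Under these assumptions, a path with a trailing slash is only ever a listing and a path with the suffix is only ever a document, so the two can never collide.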

I recognize there are certainly other dragons to slay in the course of a workday, but at the moment I am trying to finish up our URL rewrite, which makes this top of mind for me.

Can anyone cite examples to encourage or discourage the rewriting of a suffix of any particular type? And, please, if you see errors in my logic or directory hierarchies, I want to hear what you have to say. Thank you.


3 Comments


@Welton855

I would go with .html. The W3C actually recommends this practice in its CHIPs (Common HTTP Implementation Problems) Note, and for two good reasons:


To disguise the technology you are using today
To keep your options open on the technology you will use in the future


Even Tim Berners-Lee himself recommends not using technology-specific file extensions in his famous article “Cool URIs don't change”. In one example, he points to how old-fashioned a Perl .pl extension looks – and Tim wrote this article back in 1998!


@XinRu657

"However, isn't it kind of black-hat-like to use a suffix of .html?"


Web servers have been returning URIs ending in .html since the inception of HTTP - nothing wrong with using it today.


"Paranoia causes me to believe that Google may detect URL rewrite and potentially trigger the so-called over optimization penalty."


Unless Googlebot has access to your server configuration or log files, I think you really are being paranoid (i.e. suspicious to the extent that productive behavior is affected).

The idea of an "over-optimization penalty" seems pretty far-fetched ... after all, if you're being penalized for it, it must not be optimization, n'est-ce pas?

Use a canonical URL for each document to avoid any potential indexing problems, and return a 404 if the URI is called without the extension you have made canon. (Google can't very well detect a rewrite for which no redirect header is returned.)
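
A minimal sketch of that advice, assuming the canonical form is the one ending in .html. The set `CANONICAL_DOCUMENTS`, the helper names, and the `example.com` origin are hypothetical stand-ins for whatever lookup and host the real site uses, and the `<link rel="canonical">` element is one possible reading of "use a canonical URL".

```python
# Hypothetical set of canonical document URLs; a real site would query its data store.
CANONICAL_DOCUMENTS = {"/groups/location/subgroup/name-of-entity.html"}


def status_for(path: str) -> int:
    """Return 200 only for the exact canonical URL, otherwise 404."""
    # No redirect is issued for near misses such as a missing extension.
    return 200 if path in CANONICAL_DOCUMENTS else 404


def canonical_link(path: str, origin: str = "http://example.com") -> str:
    """Build the <link rel="canonical"> element for a document's <head>."""
    return f'<link rel="canonical" href="{origin}{path}">'
```

Because the extensionless variant simply returns 404 rather than redirecting, each document is reachable at exactly one URL.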


@Jamie184

Google is not going to care what you use for the suffix.

Personally I would suggest you use .html simply because it says nothing about your underlying technology platform. Or, be deliberately deceptive and use something like .php on an ASP.NET site.

From a security perspective this gives as little information as possible to anyone who might want to hack your site. I know it's "security through obscurity", but that doesn't mean it isn't effective in discouraging a casual hacker (or an automated bot).
