Should I use a file extension or not?
I've always wondered about this and never found a good solution.
But this question reminded me of it.
When I have a URL on my website it can be displayed and accessed any of the following ways:
www.somesite.com/subdirectory
http://www.somesite.com/subdirectory/
www.somesite.com/subdirectory/index.htm
http://www.somesite.com/subdirectory/index.html
www.somesite.com/subdirectory/index.php
http://www.somesite.com/subdirectory/index.asp
www.somesite.com/subdirectory/some-relevant-keywords
http://www.somesite.com/subdirectory/some-relevant-keywords.htm
www.somesite.com/subdirectory/index.php?page=some-relevant-keywords
http://www.somesite.com/subdirectory/?page=some-relevant-keywords
www.somesite.com/subdirectory/?page=some-relevant-keywords&even=more-keywords
etc...
Now, I can understand the merits of adding keywords in the URL. Even the most basic SEO guide recommends doing just that. But for the sake of sanity, clarity, ease of reading, ease of use, and web compliance:
Is it preferred to have a file-extension or not?
Really, deep down, my logic tells me: yes, it should. My reasoning goes back to the days when the internet was mostly USENET, FIDONET, FTP and GOPHER.
See, if a URL has no filename, then it normally is considered a directory. This is where index.htm came about: by default, the server lists the directory if no index file is found. Soon enough, however, web programmers started overriding this and using index.htm to serve the content of that web directory as a page. The main difference was that markup language was added, and this was parsed in the browser. With this markup language, the Content-Type: text/html field in the response header became the indicator of the file type for any file. HTML seems to be the only "filetype" that just doesn't have consistently named extensions, except when the files are saved.
Unfortunately, once web pages became the main thing, displaying the directory contents became a security risk, so everything stayed hidden, with only the actual URL content being displayed.
Not to mention the cross-platform file-naming wars: Windows-based systems require an extension of three characters or fewer, while Unix and Mac allow more. So should it be .HTM or .HTML or NONE and let the platform decide?
So in essence, I guess what I am trying to figure out goes beyond SEO and deals more with aesthetics and web compliance.
More posts by @Lengel546
You should only add a file extension if the content behind the URI is actually a file. But even then you could drop it if there's only one representation of it (JPG, PDF, ...).
If there are multiple representations, the HTTP way would be to have the format negotiated through the Accept header. But if you want your users to have a say in it, you would probably want an extension so they can choose which representation they want (JPG, PNG, ...) by requesting one URI or the other.
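A minimal sketch of those two options in Python, not a complete implementation of HTTP content negotiation. The path /photos/cat, the REPRESENTATIONS table and the function names are assumptions made up for illustration, and the Accept parsing is simplified to q-values and wildcards.

```python
import posixpath

# Hypothetical representations of one resource, keyed by URL extension.
REPRESENTATIONS = {
    ".jpg": "image/jpeg",
    ".png": "image/png",
}
DEFAULT_TYPE = "image/jpeg"  # assumption: served when the client accepts anything


def parse_accept(header):
    """Return the media types of an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        entries.append((-quality, position, media_type))
    # Sort by quality (descending), then by original position; drop q=0 entries.
    return [media for neg_q, _, media in sorted(entries) if neg_q < 0]


def choose_type(path, accept="*/*"):
    """An explicit extension wins; without one, negotiate via Accept."""
    extension = posixpath.splitext(path)[1].lower()
    if extension:
        # None means this resource has no such representation.
        return REPRESENTATIONS.get(extension)
    for media_type in parse_accept(accept):
        if media_type in REPRESENTATIONS.values():
            return media_type
        if media_type == "*/*":
            return DEFAULT_TYPE
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "image/*" becomes "image/"
            for candidate in REPRESENTATIONS.values():
                if candidate.startswith(prefix):
                    return candidate
    return None
```

With this sketch, choose_type("/photos/cat.png") ignores the Accept header entirely, while choose_type("/photos/cat", "image/png,image/jpeg;q=0.8") lets the client's stated preference decide.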
No, you shouldn't use a file extension for normal page types unless you absolutely need it for a technical reason. How does it improve the user experience? It's more to type, yet it tells them nothing useful. What will they be able to do knowing that your site is PHP, ASP, etc? A URL is simpler, cleaner, more usable, and more memorable without a file extension.
"See, if a URL has no filename, then it normally is considered a directory."
I don't think I agree. Generally, a URL is a directory only when it has a trailing slash. Without a trailing slash, it's considered a file.
I've done a little informal experimentation, and what I discovered surprised me but makes some sense.
From a content-being-delivered-to-the-user standpoint, as well as screen scraping, the Content-Type rules the day.
However, the presence or absence of an extension, as well as what that extension is, seems to sway search engine visits.
When I omitted any extension at all, I got relatively few hits -- as if the URL were a location or dynamic content and therefore not worth indexing very much.
When I changed the same links to use an .xml extension, because the pages were actually generated by XSLT (on the server side), the indexing actually dropped further -- perhaps because it thought it was merely data or the result of some programmatic request.
When I changed the same links to use .html, search engines went wild with the site.
At the moment, my site handles all three transparently, but when it provides a clickable link, I return the .html version of the URL.
I'd like to think search engines were a little smarter, or a little less biased, but that's what I've observed happen with my pages.
Is it preferred to have a file-extension or not?
There is nothing in the RFCs to mandate having file-extensions, neither is there anything requiring you to leave them out. It's a choice you make.
Conformant HTTP URIs don't need file extensions for anything. There is a rich set of HTTP headers (especially the MIME type) to handle everything that file extensions are otherwise used for.
That said, most browsers today do in fact rely on a combination of MIME type, extension, and binary 'fingerprint' of the first bytes to determine content type. This can sometimes give surprising results, and so it's important that we webmasters set the right headers (and possibly disable content type sniffing if we are 101% sure our headers are correct).
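As a rough illustration of setting the right headers, here is a small WSGI application in Python that serves an extensionless HTML page. The app name and the page body are invented for the example. X-Content-Type-Options: nosniff is the response header commonly used to ask browsers not to sniff the content type.

```python
def app(environ, start_response):
    """Serve an HTML page for any path, with or without an extension."""
    body = b"<!doctype html><title>Products</title><p>No extension needed.</p>"
    headers = [
        # The declared type, not the URL, tells the client what this is.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Ask the browser to trust the declared type instead of sniffing.
        ("X-Content-Type-Options", "nosniff"),
    ]
    start_response("200 OK", headers)
    return [body]
```

Any WSGI server can host this, for example make_server from the standard library's wsgiref.simple_server module.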
There is one situation where file extensions are useful: If the end user saves content from your site to his local computer for later use. Theoretically a 'smart' browser should ensure that saved content works for the local computer type; but in practice you can help everyone by serving up content with industry-standard extensions like .jpg , .mp4 , .css etc. In my experience all browsers handle the HTML type properly. You don't need to add a .htm / .html extension on HTML yourself, the browser will handle this specific content type correctly.
Security: One could argue that there is a security benefit in hiding which platform you're using (.php / .asp etc). That's true. In practice I think any good hacker will discover this right away, so I don't think hiding these extensions for security alone is worth the trouble.
Special consideration: If you plan to use a CDN in the future, and your CDN is of the "push" type (content is uploaded to the CDN beforehand, e.g. via SFTP), then you might want to keep file extensions. Most third-party systems look at file extensions to discover which MIME type to serve up the content with.
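Python's standard mimetypes module works the same way and makes the dependency easy to see. This is only an analogy for how such systems typically behave, not the logic of any particular CDN. The guess_content_type helper and the fallback value are assumptions for the example.

```python
import mimetypes


def guess_content_type(filename, fallback="application/octet-stream"):
    """Guess a MIME type from the file extension alone, as many tools do."""
    content_type, _encoding = mimetypes.guess_type(filename)
    # With no recognisable extension there is nothing to go on.
    return content_type or fallback


for name in ("video.mp4", "style.css", "photo.jpg", "some-relevant-keywords"):
    print(name, "->", guess_content_type(name))
```

The extensionless name falls through to the generic fallback type, which is the failure mode to expect when content without extensions is pushed to a system that guesses this way.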
My personal choice has become:
When HTML is generated dynamically by my webapp, I do not add a 'fake' .html extension to mimic a directory and file structure that isn't actually there. I normalize URLs and I standardize the URL format used for reasons of SEO. I personally prefer having a trailing slash on the last leaf of the URL, i.e. example.org/first/second/ , but that's a matter of taste.
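One possible way to sketch that kind of normalization in Python, assuming the trailing-slash style preferred above. The canonical_url function, the INDEX_NAMES and PAGE_EXTENSIONS sets, and the rule that other extensions are left alone are all illustrative assumptions, not a description of any specific webapp.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

INDEX_NAMES = {"index.htm", "index.html"}  # assumption: treated as the directory
PAGE_EXTENSIONS = {".htm", ".html"}        # assumption: 'fake' page extensions


def canonical_url(url):
    """Map the variants of a page URL onto one form ending in a slash."""
    parts = urlsplit(url)
    path = parts.path or "/"
    directory, name = posixpath.split(path)
    stem, extension = posixpath.splitext(name)
    if name.lower() in INDEX_NAMES:
        name = ""  # /first/second/index.html -> /first/second/
    elif extension.lower() in PAGE_EXTENSIONS:
        name = stem  # /first/second.html -> /first/second/
    elif extension:
        # Looks like a real file (.css, .js, .mp4, ...): keep it unchanged.
        return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))
    path = posixpath.join(directory, name)
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))
```

A site could redirect to, or declare as canonical, the URL this returns, so that example.org/first/second, example.org/first/second.html and example.org/first/second/index.html all count as the same page.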
When we are in fact talking about actual files that are uploaded to a hard disk somewhere, then I keep the 'normal' file extension for the type. So .css / .js / .exe / .mp4 etc. are in use for these kinds of content.
Cool URIs don't change. (Go to the section titled "So what should I do? Designing URIs")
Use a .extension where there is more than one representation or where the client software is absolutely stupid and refuses to accept the Content-Type alone (QuickTime, RealPlayer, Outlook, etc I am looking at you):
www.somesite.com/subdirectory - this can be your auto-negotiation version that uses canonical LINK tags to point to the actual representation
www.somesite.com/subdirectory/ - it is always worth supporting trailing slashes on any URL, but use canonical LINK tags (not redirects, as these are an unnecessary slowdown) to point to the correct URL
www.somesite.com/subdirectory/index.htm and www.somesite.com/subdirectory/some-relevant-keywords.htm - the three-character extension limit doesn't apply to HTTP (only to the underlying file system/OS), so the client can save this as index.html or a.a if they want to, whilst still being able to access it
www.somesite.com/subdirectory/index.html - if you serve a .atom, a .xml or similar version then it makes sense to also honour the .html version (and canonically link to it via LINK tags on the auto-negotiated version); use HTTP Content-Location headers to point to the auto-negotiation version though. Remember you can also go multi-lingual (.en, .es, etc.) or multi-charset (.utf8, .utf16, etc.)
www.somesite.com/subdirectory/index.php and www.somesite.com/subdirectory/index.asp - unless you are serving the source code, these make no sense to support
www.somesite.com/subdirectory/some-relevant-keywords - SEO is a constantly changing art, and if this works for you then great
www.somesite.com/subdirectory/index.php?page=some-relevant-keywords, www.somesite.com/subdirectory/?page=some-relevant-keywords and www.somesite.com/subdirectory/?page=some-relevant-keywords&even=more-keywords - if there are an infinite number of ways to manipulate the content then this is great, but usually pages deserve their own URL, not a query string, and these types of URLs are to be avoided (try getting someone computer illiterate to type one of those in)
I would say don't include the file extension if the software you're using allows you to omit it. So from your list of examples, my preference would be:
www.somesite.com/subdirectory/some-relevant-keywords
Browsers don't care whether something is a directory or not on the site, or whether it's an HTML file, an .asp file or whatever. They simply make an HTTP request and get an HTTP response. So if the extension is superfluous, drop it.
This also has the added benefit of making your URLs more concise (and easier to read out on the phone - "example dot com slash products" is much nicer sounding than "example dot com slash products dot h t m l"), and making it easier to switch technology in the future (as no URL change would be required).