Linking from one document to another is basically the foundation of the internet. The URL of a document is the address you point your browser at to retrieve some content. Content on the internet is meant to be read, and you drive readers to your content by publishing the URL. Ideally you want your content to be available to readers not only now, but for years to come.
The problem is permanence. Let's say you have a small company that sells products over the internet. You put your return and exchange policies at
example.com/help/returns.htm. You buy a large batch of paper invoices, with your return policy URL proudly displayed at the bottom of each one. A few months down the road, the website is redesigned, and the new return policy is now at
example.com/returns. Nobody remembered that your invoices point to a resource that no longer exists. This happens more often than you might think, and it breaks printed links as well as digital ones.
Common Reasons For Removal
- The content of the page becomes outdated, and is removed entirely.
- The website is redesigned, and old information is moved to an entirely new location.
- The type of information changes (a PDF becomes a web page).
- Static content becomes dynamic.
Coping With The Problem
There are a few basic ways to cope with out-of-date URLs. You can display a custom 404 error page, perhaps with a search box to help users find the page they were looking for. You can redirect users back to the root of your website, and hope they can find their way from there. Ideally, you can configure your web server to redirect all requests from the old URL to the new URL. Unfortunately, that last approach can become quite unwieldy, and is often just ignored, especially on larger websites and over long time periods.
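To make the options above concrete, here is a minimal sketch of a redirect table in Python. It is not tied to any particular web server: the `REDIRECTS` and `LIVE_PATHS` tables, the `resolve` function, and the `/search` fallback are all hypothetical names chosen for illustration. Only the two return policy paths come from the example earlier in this post.

```python
from typing import Tuple

# Hypothetical table of moved pages: old path -> new path.
REDIRECTS = {
    "/help/returns.htm": "/returns",
}

# Paths the redesigned site actually serves (assumed for this example).
LIVE_PATHS = {"/", "/returns"}


def resolve(path: str) -> Tuple[int, str]:
    """Return an (HTTP status, path) pair for a requested path."""
    if path in LIVE_PATHS:
        # The page exists, so serve it as-is.
        return 200, path
    if path in REDIRECTS:
        # 301 tells browsers and crawlers the move is permanent.
        return 301, REDIRECTS[path]
    # Unknown path: fall back to a custom 404 page with a search box.
    return 404, "/search"


print(resolve("/help/returns.htm"))
```

The unwieldy part is the table itself: every page that moves needs an entry, and somebody has to remember to add it.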
The best solution is to plan ahead when first launching your website. With proper planning, your URLs can stay the same even when the underlying design and technology change.
If a piece of content is going to be advertised or linked to, its URL should be a folder.
example.com/resume is far preferable to
If your website is highly dynamic, generating content on the fly, then your URL structure is even more critical.
example.com/home.php?page=intro&lang=en&spash=yes is confusing, and painful to type by hand.
example.com/intro is simple and easy to type.
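One way to get the simple URL without giving up the dynamic page is to translate the clean path into the query string internally, so the reader never sees it. The sketch below assumes a hypothetical `internal_target` helper and a hypothetical `DEFAULTS` table. The `home.php` script and the `page` and `lang` parameters are taken from the example above.

```python
from urllib.parse import urlencode

# Assumed default parameters applied to every request.
DEFAULTS = {"lang": "en"}


def internal_target(clean_path: str) -> str:
    """Map a public path such as /intro onto the dynamic script behind it."""
    # Assumption: an empty path falls back to the intro page.
    page = clean_path.strip("/") or "intro"
    params = {"page": page, **DEFAULTS}
    return "/home.php?" + urlencode(params)


print(internal_target("/intro"))
```

If the site later moves away from `home.php`, only this mapping changes. The published URL stays the same.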
How I Handle It
Everybody seems to have their own opinion on how to set up a blog with permanent links. The main concern seems to be SEO, which isn’t terribly important to me, though I took it into account. I’ve never had a lot of publicly accessible links on my website, so there weren’t many links to preserve. I did choose to maintain most of the hosted files and private content, as I’d planned for that ages ago.
I settled on this structure:
http://curtistasker.com/blog/%category%/%post_id%/%posttitle%. I wanted the word ‘blog’ in the URL somewhere, to indicate at a glance (and to search engines) that this was a blog post. I plan on using a small number of categories with little overlap, so having the category in the URL is reasonable. It provides an additional keyword for search engines, and offers users a bit more information to go along with the post title. The post ID is there mostly to make search engines happy, as quite a few people seemed to think a number of three or more digits in the URL was beneficial. Finally, the URL closes with the post title.
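As an illustration of how that pattern expands, here is a small Python sketch that builds a link from a category, a post ID, and a title. This is not how WordPress does it internally. The `slugify` and `permalink` helpers and the sample post are hypothetical, and the slug rules (lowercase, hyphens for anything that is not a letter or digit) are an assumption.

```python
import re


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def permalink(category: str, post_id: int, title: str) -> str:
    """Fill in the /blog/%category%/%post_id%/%posttitle% pattern."""
    return (
        "http://curtistasker.com/blog/"
        f"{slugify(category)}/{post_id}/{slugify(title)}"
    )


# Hypothetical post, for illustration only.
print(permalink("Web Design", 123, "Permanent URLs, Please!"))
```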
Tweaking WordPress to handle this structure wasn’t terribly bad, though it’s since gotten easier. I’ve been able to get rid of a custom plugin I wrote to handle category links, as newer versions of WordPress can handle what I want by default. I also use a lot of URL rewriting to help prevent duplicate links, by forcing
example.com/files/file1 to all redirect to the same URL. I also discourage duplicate indexing of content (category and tag pages with summaries).
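The idea behind that kind of rewriting can be sketched in Python: reduce each requested path to one canonical form, and redirect whenever the request differs from it. The specific variants handled here (trailing slashes and doubled slashes) are assumptions for illustration, not my actual rewrite rules, and `canonical_path` and `redirect_for` are hypothetical names.

```python
import re
from typing import Optional, Tuple


def canonical_path(path: str) -> str:
    """Collapse repeated slashes and drop any trailing slash."""
    path = re.sub(r"/{2,}", "/", path)
    if len(path) > 1:
        path = path.rstrip("/")
    return path or "/"


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return a permanent redirect if the path is not already canonical."""
    target = canonical_path(path)
    if target == path:
        return None  # Already canonical, so serve the content directly.
    return 301, target


print(redirect_for("/files/file1/"))
```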