Absolute and Relative URLs
I have this ongoing discussion with the development team of one of my clients. They insist on using relative URLs on their numerous development servers. Naturally, I tell them that those can lead to trouble when the pages go live and, of course, they do.
What is the difference between an absolute URL and a relative URL? For newbies out there, a relative URL points to a page on the same server without naming the domain, like this: <a href="contact.html">. You just point to the page as though you were right there working on the server.
An absolute URL includes the actual domain name, like this: <a href="http://www.sampledomain.com/contact.html">, as though the page is being pointed to from elsewhere.
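If you want to check links programmatically, here is a minimal Python sketch, using the standard library's urllib.parse, that tells the two kinds apart. The function name is_absolute is hypothetical; it simply checks whether a link carries both a scheme (http: or https:) and a domain. Note that a link starting with "/" is still relative, because it leaves the scheme and domain to be filled in.

```python
from urllib.parse import urlparse


def is_absolute(href: str) -> bool:
    """Hypothetical helper: True if href names its own scheme and domain."""
    parts = urlparse(href)
    return bool(parts.scheme and parts.netloc)


print(is_absolute("contact.html"))                              # False
print(is_absolute("http://www.sampledomain.com/contact.html"))  # True
print(is_absolute("/contact.html"))  # False: root-relative is still relative
```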
The problem is most pronounced on e-commerce sites, where some pages begin with https: rather than http: because they are secure. Every link on those secure pages must be absolute; otherwise, here's what can happen. Let's say you click a link to a secure https: page but then decide to go back to where you were. If the link back is a relative URL (it doesn't include the domain information), the browser fills in the missing parts from the secure page you're on. You'll wind up on what appears to be the same page you came from, but if you look at the URL, it will now begin with https:.
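Python's urllib.parse.urljoin follows the standard rules for resolving a relative link against the page it appears on, so it can sketch what the browser does here. The checkout.html page is a hypothetical example.

```python
from urllib.parse import urljoin

# Hypothetical secure page on the sample domain
secure_page = "https://www.sampledomain.com/checkout.html"

# A relative link on the secure page inherits the page's https: scheme...
relative_target = urljoin(secure_page, "contact.html")
print(relative_target)  # https://www.sampledomain.com/contact.html

# ...while an absolute link sends the visitor back to the http: version.
absolute_target = urljoin(secure_page, "http://www.sampledomain.com/contact.html")
print(absolute_target)  # http://www.sampledomain.com/contact.html
```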
So, what's the problem? Google and the other search engines can see that single page as two identical pages, one at https: and one at http:, and consider them duplicate content. Worse, if the spiders reach that https: page through even a single misplaced link, they can follow its relative links and potentially index a duplicate https: version of your entire web site.
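To check whether this has happened, you could take a list of your crawled or indexed URLs and look for pages that show up under both schemes. This Python sketch assumes you already have such a list (for example, exported from a crawler); the function name scheme_duplicates and the sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def scheme_duplicates(urls):
    """Return (host, path, query) keys that appear under both http: and https:."""
    schemes_seen = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Key on everything except the scheme, so http/https twins collide
        key = (parts.netloc.lower(), parts.path or "/", parts.query)
        schemes_seen[key].add(parts.scheme.lower())
    return sorted(key for key, schemes in schemes_seen.items()
                  if {"http", "https"} <= schemes)


# Hypothetical list of URLs exported from a crawl or index report
indexed = [
    "http://www.sampledomain.com/contact.html",
    "https://www.sampledomain.com/contact.html",
    "http://www.sampledomain.com/about.html",
]

for host, path, query in scheme_duplicates(indexed):
    print(host + path)  # www.sampledomain.com/contact.html
```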
Here's an example. When I ran a Google search for "https", the result for SourceForge.net came up with an https URL: https://sourceforge.net.
A Google search for "sourceforge", on the other hand, brings up their site without the https.
As I write this article, they've got two URLs showing up as separate pages with identical content. In fact, if you click around on the https version, it appears the entire site has been spidered as https, too. That means that, in the eyes of most search engines, they've got two identical web sites indexed.
Most of us don’t want that to happen.
If the spiders follow the relative links on a secure (https) page, you’re off to do damage control.
My client has had this problem more than once. All it takes is one secure page created with relative links on it to go live.
Absolute links are the best solution to the problem. Make sure that each and every link on your secure (https) pages is absolute. That way, when visitors leave the secure pages, the links take them out of the secure area and back onto the non-secure http: pages.
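Before a secure page goes live, a quick automated check can catch relative links that slipped in. This is a minimal Python sketch built on the standard library's html.parser; the class name RelativeLinkFinder and the sample markup are hypothetical. It only inspects <a href> links, since those are the navigation links that send visitors and spiders onward.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RelativeLinkFinder(HTMLParser):
    """Hypothetical helper: collect <a href> values that are not absolute URLs."""

    def __init__(self):
        super().__init__()
        self.relative_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlparse(href)
        # mailto:, tel: and javascript: links don't lead to another page
        if parts.scheme in ("mailto", "tel", "javascript"):
            return
        # Absolute links carry both a scheme and a domain
        if not (parts.scheme and parts.netloc):
            self.relative_links.append(href)


# Hypothetical markup from a secure (https) page
secure_html = """
<a href="https://www.sampledomain.com/cart.html">Cart</a>
<a href="contact.html">Contact us</a>
<a href="http://www.sampledomain.com/">Home</a>
"""

finder = RelativeLinkFinder()
finder.feed(secure_html)
finder.close()
print(finder.relative_links)  # ['contact.html']
```

Anything the finder reports is a link that would inherit https: from the secure page.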
Naturally, you should also block the secure areas of your site from the spiders through robots.txt. Adding a robots meta tag with noindex, nofollow (<meta name="robots" content="noindex, nofollow">) to your secure pages is extra insurance that those pages won't be indexed and their links won't be followed.
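You can sanity-check a robots.txt rule before deploying it with Python's standard urllib.robotparser. This sketch assumes, hypothetically, that the secure pages live under a /secure/ directory; substitute your own paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps spiders out of a /secure/ directory
robots_txt = """\
User-agent: *
Disallow: /secure/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for url in ("http://www.sampledomain.com/secure/checkout.html",
            "http://www.sampledomain.com/contact.html"):
    print(url, parser.can_fetch("Googlebot", url))
# http://www.sampledomain.com/secure/checkout.html False
# http://www.sampledomain.com/contact.html True
```

Keep in mind that spiders request robots.txt separately for each protocol and host, so the file served at the https: address is the one that governs your https: URLs.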
What about the multiple development servers I mentioned? Using absolute URLs could become a nightmare with those.
One possible solution is the base href tag. The development team could add the following to the head section of each page, along with their standard meta tags:
<base href="http://www.sampledomain.com">
The code above would be for their live server. If they have a development server, instead of http://www.sampledomain.com as the base, it might be something like http://sampledomain.dev.
Now, as long as all relative links include the appropriate paths, every page with this tag in its head section will treat http://www.sampledomain.com as the "base" against which relative links are resolved (in the case of the live server).
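The appeal is that the links in the page markup never change between servers; only the base value does. This Python sketch uses urljoin to mimic how a browser would resolve the same link against each server's base. The "/contact.html" link is a hypothetical example, and the two base values are the ones mentioned above.

```python
from urllib.parse import urljoin

# Base values from the base href tag on each server
bases = {
    "live": "http://www.sampledomain.com",
    "dev": "http://sampledomain.dev",
}

for server, base in bases.items():
    # The same root-relative link in the markup resolves per server
    print(server, urljoin(base, "/contact.html"))
# live http://www.sampledomain.com/contact.html
# dev http://sampledomain.dev/contact.html
```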
One downside to using the base href tag is that your team will have to come up with linking conventions and stick to them. For instance, using the base tag above, you’ll have to make sure that your internal links to pages, files, images, stylesheets and so forth all start with a "/" and that the URLs specify the full path to the file.
So, using the base tag above, you'll need your relative links to look like this: <a href="/contact.html">
If you add the "/" to the end of the domain in the base tag (http://www.sampledomain.com/), then you can link like this: <a href="contact.html">
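Part of the reason for the leading "/" convention is that a base tag changes where a link without one points. On a page deep in the site, such a link normally resolves relative to the page's own folder; with a base tag, it resolves relative to the base instead. A link starting with "/" lands in the same place either way. This Python sketch shows the difference; the page path and image file are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical page deep in the site, and a base href with a trailing slash
page_url = "http://www.sampledomain.com/products/widgets/blue.html"
base_href = "http://www.sampledomain.com/"

for link in ("/images/logo.png", "images/logo.png"):
    print(link)
    # Without a base tag, the browser resolves against the page's own URL
    print("  without base:", urljoin(page_url, link))
    # With a base tag, the browser resolves against the base href instead
    print("  with base:", urljoin(base_href, link))
```

The root-relative "/images/logo.png" resolves to http://www.sampledomain.com/images/logo.png in both cases, while "images/logo.png" points to /products/widgets/images/logo.png without the base tag and to /images/logo.png with it.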
Just take care while setting it up, and run tests with a link checker such as Xenu's Link Sleuth to catch problems. Of course, links to and from secure pages should still always be absolute.
In a nutshell, absolute links can keep your secure pages secure and your other pages, well, not secure. The base href tag can help you avoid broken links from relative URLs and, as you know, broken links are bad news for SEO.
Now, if I can just get the development team on board…