
Canonicalization: is it killing your website?


WARNING: this is super geeky, but you do need to know this, so here goes. Canonicalization is a URL-naming issue that exposes a flaw in the modern search engine and the way it indexes websites. If you learn to work with the flaw, page rank and traffic for both your website and blog can soar. If not, your website can flounder. This first part explores the flaw and some of its impacts. The second part will go into the gory detail of how the wrong canonicalization can kill a website and how to prevent that.

What is Canonicalization (C14N)?

So, beyond a serious point scorer in Scrabble, what exactly is it and why should you care? Canonicalization, according to Wikipedia, is the process of converting data that has more than one possible representation into a "standard" canonical representation. A more concise description of how it relates to the web is Matt Cutts's explanation: canonicalization is the process of picking the best URL when there are several choices, and it usually refers to home pages.
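To make "picking the best URL" concrete, here is a minimal Python sketch. It assumes a placeholder site, example.com, and assumes the www form is the preferred one. The function name canonicalize and its preferred_host parameter are made up for illustration. This is not how any search engine actually does it.

```python
from urllib.parse import urlsplit, urlunsplit


def _strip_www(host):
    """Return the host without a leading "www." label."""
    return host[4:] if host.startswith("www.") else host


def canonicalize(url, preferred_host="www.example.com"):
    """Map the www and non-www variants of a URL to one preferred form.

    Assumption: preferred_host is the form the site owner has chosen.
    Ports and user info are ignored to keep the sketch short.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Treat "example.com" and "www.example.com" as the same site.
    if _strip_www(host) == _strip_www(preferred_host):
        host = preferred_host
    # An empty path and "/" both mean the home page.
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


# Four spellings of the same (hypothetical) home page.
variants = [
    "http://example.com",
    "http://example.com/",
    "http://www.example.com/",
    "HTTP://WWW.EXAMPLE.COM/",
]
canonical = {canonicalize(u) for u in variants}
print(canonical)
```

All four spellings collapse to a single URL, which is the whole point: several representations, one canonical form.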

 

Let’s jump right into an example. Search engines read the following URLs as if they belong to totally different websites:

http://digg.com/

http://www.digg.com/

You see the exact same thing when you go to these different urls, right? 

Now run a “site:” query against each of those in Google. Here are my results:

site:http://digg.com/

When I ran it, I got 1,510,000 page results.

site:http://www.digg.com/

When I ran it, I got 2,340,000 page results.

As you can see, Google was clearly confused by the small difference between those two URLs. The two queries should have returned the same number of results, but they didn’t. Many of the pages in the two result sets had exactly the same content, yet Google saw them as different pages on different sites. This shows that what we as users know to be one website, Google believes is two.

What is the impact of inconsistent canonicalization?

Because you have two different sets of indexed pages for the same site, some traffic goes to the www address while other traffic goes to the non-www address. The lesson is that subdomains matter. Keep your blog and website on a common subdomain so that all pages and traffic sit in one place where Google and Alexa can index and measure them.

Duplicate content filters and canonicalization

Think about it. If Google thinks these results come from two different sites, how do you think the duplicate content filters will respond? Exactly: when you splinter your subdomains and serve the same content on each of them (it is inadvertent; it just happens), it can trip Google’s duplicate content filters. Penalized for your own content on your own site. That stings, doesn’t it?
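Here is a toy Python illustration of why identical content under two hostnames looks like duplication. It is only a sketch: the URLs and page bodies are invented, and comparing hashes of the page body is an assumption made for the example, not a description of Google's actual filter.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs whose page bodies are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


# Invented pages: the same post is reachable under both hostnames.
pages = {
    "http://example.com/post": "<h1>The next big thing</h1>",
    "http://www.example.com/post": "<h1>The next big thing</h1>",
    "http://www.example.com/about": "<h1>About us</h1>",
}
duplicates = find_duplicates(pages)
print(duplicates)
```

To anything that treats each URL as its own document, the www and non-www copies of the post are two pages with the same body.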

Non-uniform URLs mean you have two sites with different traffic and page rank stats fighting one another in the SERPs

I think no one will dispute that page rank and SERP position are related to inbound and outbound links. So what happens when you have a www domain and a non-www domain? You get some links to one and some links to the other, but it is still the same website. Because the links are split unevenly, you get different page ranks for the same site.

Consider the cost: a user searches for “the next big thing” and Google’s index has your page listed in two different places, as the site: queries demonstrated. You are now essentially fighting yourself for a search result. Wouldn’t it have been better if both were always in the one index?

BTW: as nice as it may be, your web server is not terribly smart. It typically reports the hits to www and non-www as two separate domains, so now you have to weed through the logs to find your true hits.
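If your stats are already split this way, a few lines of Python can add them back together. This is a sketch under assumptions: the hit counts are made up, and it assumes you already have per-hostname totals from your log tool. The helper names strip_www and merge_hits are hypothetical.

```python
from collections import Counter


def strip_www(host):
    """Return the hostname in lowercase without a leading "www." label."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def merge_hits(hits_by_host):
    """Add www and non-www hit counts together under one hostname."""
    merged = Counter()
    for host, hits in hits_by_host.items():
        merged[strip_www(host)] += hits
    return dict(merged)


# Made-up numbers for illustration only.
hits_by_host = {
    "www.example.com": 1200,
    "example.com": 300,
    "blog.example.com": 50,
}
merged = merge_hits(hits_by_host)
print(merged)
```

Note that a blog on its own subdomain (blog.example.com here) stays separate, which is exactly the kind of split the post is warning about.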

So search engines treat subdomains differently, in a way that can cause uneven traffic and a lack of visitors to the primary site, AKA www. A search engine sees your one site as two different ones. So what? They are both still you, right? Hold up: it means you have two different page ranks and two different sets of traffic statistics.

OK, let’s stop there. For all of you that actually got down to this part, I am holding up my Secret Decoder Dork Ring and saying Wonder Geeks Activate!

In the next part, I’ll go through what all this really means, how it can kill your website if you have a blog on another subdomain, and how to fix it easily.

http://www.rsspieces.com/0003AF
http://www.rsspieces.com/canonicalization-part-1-what-s-killing-your-website
Posted on January 28, 2007 15:13:22
Comment from: Jonathan Dalton [Visitor] Email · http://myblog.daltonsazhomes.com
Form of a waterspout!

Simple question, at the risk of soliciting advice. My SERP results for "Phoenix Arizona Real Estate" indicate I have the new blog set up correctly ... a subdomain on my primary server.

Am I reading the indications correctly?
Permalink January 28, 2007 17:09:57
Comment from: Steve Leung [Visitor] Email · http://www.1siliconvalley.com
Great post, would a redirect help solve this issue? How would that affect your PageRank?
Permalink January 28, 2007 17:30:19
Comment from: admin [Member] Email
Jonathan, if the number of pages indexed for the www. address and the non-www. address are the same, you have likely set up your blog correctly.

Steve, the simple answer is "eventually." If it is an older blog you are doing that to, the search engines will have to catch up to the change. If it is a new blog, yes.
Permalink January 29, 2007 11:46:56
Comment from: Condo Blog [Visitor] Email · http://condodomain.com/blog
OK...HELP!!!

How do we shut off the other URLs? For example, we have www.condoDomain.com and http://condodomain.com and same for all of our sub-domains and blog...which probably makes us worse than EVERYONE else....do we do re-directs, help RSS help...
Permalink January 29, 2007 14:31:38
Comment from: Dave Smith [Visitor] Email · http://www.realestatebloglab.com
It isn't fair to the semi geek bloggers to keep us hanging on for part 2.

While we laugh at the sexcupaides of porcupines we really want to know more about our subdomains and mixed URL allocations.

Please a kindness to the geeks. After all we did read to the very bottom.
Permalink January 29, 2007 23:54:18
Comment from: Mariana Wagner [Visitor] Email · http://realtyscoop.wordpress.com
How do you KNOW these things? Huh? I read the whole thing and was delighted to know that I am not the only one w/ a Secret Decoder Dork Ring...
Permalink January 31, 2007 21:23:46
Comment from: admin [Member] Email
Holy copper cow, Batman, it's Mariana Wagner... You can see our site... how exciting! We missed you.
Permalink January 31, 2007 23:56:16
Comment from: Ian Brown [Visitor] Email · http://www.search-and-submit.net/
We need to be told!
Permalink February 05, 2007 23:15:24
Comment from: Claudia [Visitor] Email · http://www.revealrealestate.com/blog
Waiting for part two. I thought maybe, just maybe, I was a geek, so I looked up canonicalization but didn't understand a word that I read, so I need help. I get big differences when I search for links to http://www.coldwellbankerbelize.com whether I include the www or not. Huge differences.
Permalink February 06, 2007 18:33:21
Comment from: klein [Visitor] Email · http://www.designersyard.com
hmmm very interesting article, i will test my site in google the http://www.designersyard.com to see results.
Permalink April 01, 2007 01:08:03
Comment from: Jezebelus [Visitor] Email · http://www.divxtitles.com

Sorry, but I can't understand what the big deal is. If Google indexes both sites, they are both yours. I mean, it's the same thing as registering more domains and redirecting them to one location. What's wrong with that?

Permalink June 18, 2007 09:36:51
Comment from: Joe Boylan [Visitor] Email · http://www.springshomes.com

Thanks for posting about this.


I've been concerned with this since I came across the "Preferred Domain" setting in the Google Webmaster Tools. Kind of had an ah-ha, oh-no moment.


Does setting that particular preference to the www version help?

Permalink July 03, 2007 22:00:19
Comment from: Jakov [Visitor] Email · http://www.diseasesarchive.com

What happens if Google deletes your site from its index???

Permalink May 12, 2008 10:35:53