written by Danny Baggs, 6. May 2011
I recently made a foolish mistake, and I thought I would share it to help others avoid it in the future. It arose during my quest to get certain pages of the Solution Exchange Community platform indexed by Google, Bing, Yahoo and other search engines: specifically, the valuable forum threads.
First of all, it is worth mentioning how these threads are delivered. The forum itself is an object of the OpenText Social Communities (OTSC) product, which interacts with the Delivery Server through the OTSC XML API.
Therefore, the forum thread pages are delivered dynamically: the shell is the same physical page for every thread, and the content is determined by parameters. In this case, I’ve chosen sensible URL structures that carry the parameters, both for simplicity and for SEO. I say more about this in this forum post. Using rewrite rules in this way for SEO is one of the key values of a Front Controlling Web Server.
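To make the idea concrete, here is a minimal sketch in Python of what such a rewrite rule does. It is only an illustration, not the rule actually used on Solution Exchange: the real mapping lives in the web server configuration, and both the parameter name threadId and the assumption that thread IDs are numeric are hypothetical.

```python
import re
from typing import Optional

# Hypothetical: the "threadId" parameter name and the numeric-ID pattern are
# assumptions for illustration; the real rewrite rule is not shown in this post.
THREAD_URL = re.compile(r"^/forum/thread/(\d+)/?$")


def rewrite(path: str) -> Optional[str]:
    """Map a friendly thread URL onto the shared shell page plus a parameter."""
    match = THREAD_URL.match(path)
    if match is None:
        return None  # not a thread URL, so this rule does not apply
    return f"/forum.htm?threadId={match.group(1)}"


if __name__ == "__main__":
    print(rewrite("/forum/thread/123"))
```

Visitors and crawlers only ever see the friendly URL; the shell page and its parameters stay internal.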
As the shell of the page is the same, I initially had the same <title> tag for all threads and thought that this was why they were not being indexed. After changing the <title> value to the title of each forum thread (and waiting for re-indexing to happen), there was no change.
Finally, while checking the index of Solution Exchange on Bing with a “site:” search, I noticed to my surprise that one of the threads was indexed, but associated with the URL http://www.solutionexchange.info/forum.htm. This was strange because, externally, a forum thread is only accessible through a URL like http://www.solutionexchange.info/forum/thread/{ID}, meaning that I must have been explicitly telling the search engines the wrong URL.
This was the clue I needed to realise that my problem was due to something I had implemented many months before.
To address the potential SEO penalty arising from the home page of the community being reachable through both http://www.solutionexchange.info/ and http://www.solutionexchange.info/index.htm, I had introduced the following HTML link tag in the page header. The example below is the home page value, but I included the tag across the whole site:
<link rel="canonical" href="http://www.solutionexchange.info/index.htm" />
You can read more about this on the Official Google Webmaster Central Blog. In summary, the tag tells search engines that the page should be associated with the given URL, so that page ranking (or “Google juice”) accrues to that URL rather than to whichever entry URL the crawler used. This avoids page ranking for the same page being split across two or more URLs, and avoids being penalised for duplicating content across multiple URLs.
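The effect can be sketched in a few lines of Python. This is a deliberately simplified model, not how any search engine is actually implemented: in practice the canonical tag is treated as a strong hint rather than a guarantee. The function and class names are made up for the illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def indexed_url(entry_url: str, html: str) -> str:
    """Simplified model: prefer the declared canonical URL over the entry URL."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or entry_url


HOME_PAGE = (
    "<html><head>"
    '<link rel="canonical" href="http://www.solutionexchange.info/index.htm" />'
    "</head><body>Home</body></html>"
)

if __name__ == "__main__":
    # The crawler arrives via the bare domain, but the page names index.htm.
    print(indexed_url("http://www.solutionexchange.info/", HOME_PAGE))
```

Whatever URL the crawler entered through, the page is filed under the URL that the page itself declares, which is exactly why a wrong value in the tag is so damaging.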
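Because every thread is served from the same shell page, each one was declaring http://www.solutionexchange.info/forum.htm as its canonical URL, so the search engines treated all threads as a single page. With this knowledge, I was able to update the page template that houses this dynamic content to form the correct URL within the canonical link. Now it’s back to the waiting game to see if the indexes will pick up the content and forgive me for positioning different pages as one.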
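As a rough sketch of the fix, the template logic amounts to something like the Python below. This is not the actual Delivery Server template code; the function name and its parameters are hypothetical, and only the URL patterns are taken from this post.

```python
from html import escape
from typing import Optional
from urllib.parse import quote

SITE = "http://www.solutionexchange.info"


def canonical_link(page: str, thread_id: Optional[str] = None) -> str:
    """Build the canonical tag; thread pages get their own external URL."""
    if thread_id is not None:
        # A thread must point at its friendly URL, not at the shared shell page.
        url = f"{SITE}/forum/thread/{quote(thread_id, safe='')}"
    else:
        url = f"{SITE}/{page}"
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


if __name__ == "__main__":
    print(canonical_link("index.htm"))
    print(canonical_link("forum.htm", thread_id="123"))  # 123 is a made-up ID
```

The important point is that the canonical value is derived from the same parameters that select the content, so each thread declares a URL that is unique to it.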
Although this is a small detail, the potential gain is huge, as it opens up the rich and growing content within the forum for discovery via the big search engines. This in turn can only help those in the wider community who are not yet aware of Solution Exchange to discover the content, which may help them resolve an issue or encourage them to take part in the community platform going forward.
Source: Canonical URLs and SEO
© copyright 2011 by Danny Baggs