rdb: Duplicate content publishing – SEO and Open Text Web Solutions

When pages are connected to multiple links all over the project – for example by using keywords – you will end up with one page published multiple times in several folders. This produces duplicate content, and search engines such as Google, Yahoo or Bing (basically any engine crawling your site) may assume that you are trying to get a higher ranking by duplicating your content and keyword density. In the worst case this can lead to a full exclusion from Google’s search index, so that none of the pages within your domain would be found. Disturbing, isn’t it?

Why does it happen?

So why does a single page get published multiple times?
Let’s have a look at the architecture of link elements within the Web Solutions Management Server (RedDot CMS):
The PageBuilder is the core piece: it follows the link structure of your Content Management project and verifies every link and page within it. Besides that, the PageBuilder has recently been rewritten in .NET and now performs better than before when configured properly.

Different link types

There are two, or strictly speaking three, types of link elements:

  1. References
    which just reference (point to) the original place where a page “lives” within the project,
  2. Connected pages
    which can be expanded in the SmartTree and are ‘truly connected’ to the link element, and
  3. MainLink
    The exception is the MainLink: this is usually the place where the page was first created, and it defines where the page really lives in the CMS project.

What happens during publishing?

During the publishing process in WSMS (RedDot CMS), the PageBuilder does not follow references, so a reference never produces a published page of its own.
Pages connected to a link, on the other hand, are followed by the PageBuilder: it picks up the publication package assigned to the link and publishes the page according to the settings in this package. If a page is connected in several places in a project, not just referenced, this structure creates multiple published pages with the same content.
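The following Python sketch is only a toy model of the rule described above, not RedDot code: the `LinkKind`, `Link` and `published_paths` names and the folder paths are hypothetical. It assumes that every followed link (the MainLink and each connected link) publishes the page once into the folder of its publication package, while references are skipped, which is enough to show how one page ends up published twice.

```python
from dataclasses import dataclass
from enum import Enum


class LinkKind(Enum):
    """Toy model of the three link kinds described in the text."""
    MAINLINK = "mainlink"    # where the page really lives
    CONNECTED = "connected"  # truly connected, expandable in the SmartTree
    REFERENCE = "reference"  # only points to the page's original place


@dataclass
class Link:
    folder: str      # hypothetical publication folder of the link's package
    kind: LinkKind
    target: str      # file name of the page the link points to


def published_paths(links: list[Link]) -> list[str]:
    """One output path per link the PageBuilder follows.

    References are skipped; the MainLink and every connected link
    publish the page again into their own folder.
    """
    return [
        f"{link.folder}/{link.target}.html"
        for link in links
        if link.kind is not LinkKind.REFERENCE
    ]


links = [
    Link("/products", LinkKind.MAINLINK, "widget"),
    Link("/news", LinkKind.CONNECTED, "widget"),    # e.g. a keyword-based list
    Link("/offers", LinkKind.REFERENCE, "widget"),  # reference: not followed
]
print(published_paths(links))  # ['/products/widget.html', '/news/widget.html']
```

Switching the `/news` link to `LinkKind.REFERENCE` in this model leaves a single published path, which is what the workaround in the next section aims for.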

How do we solve multiple publishing?

We will convince the CMS PageBuilder that it is looking at a reference instead of a connected page link. The code for this is easy:

	<% =Replace("<%list_teaser%>","islink=2","islink=10") %>
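For readers less familiar with VBScript, this is what the `Replace` call does to the placeholder output, sketched in Python. The sample markup is hypothetical: the text only tells us that the rendered link contains `islink=2`, not what the rest of it looks like.

```python
def as_reference(rendered_link: str) -> str:
    """Python equivalent of VBScript Replace(s, "islink=2", "islink=10").

    Like VBScript's default binary comparison, str.replace is
    case-sensitive, and it replaces every occurrence.
    """
    return rendered_link.replace("islink=2", "islink=10")


# Hypothetical placeholder output; the real markup the PageBuilder sees may differ.
sample = '<a href="/news/widget.html" islink=2>'
print(as_reference(sample))  # <a href="/news/widget.html" islink=10>
```

Because `Replace` rewrites every occurrence by default, all links produced by the list element are switched in one go.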

The only limitations here are:

  • The link element needs to be set to ‘insert path- and filename only’.
  • The code block must run as pre-execute code, so that it is executed and its result is returned as HTML when the page is published.

The code replaces the link type value:

  • 2 = link: follow the link and publish the page based on the publication package attached to this link element
  • 10 = reference, don’t follow this page and don’t publish it, just use the MainLink the page is connected to
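A plain substring replacement also matches values that merely start with 2: if a marker such as `islink=20` ever appeared in the output (the text only names the values 2 and 10, so this is a hypothetical), `Replace` would turn it into `islink=100`. The Python sketch below is a more defensive variant of the same swap, not the author's code: it rewrites `islink=2` only when no further digit follows. The helper names are illustrative.

```python
import re

# islink values named in the text:
#   2  = link, followed and published via the link's publication package
#   10 = reference, not followed; the page's MainLink is used instead
_ISLINK_2 = re.compile(r"islink=2(?!\d)")  # 2 not followed by another digit


def to_reference(markup: str) -> str:
    """Rewrite every islink=2 marker to islink=10, leaving other values alone."""
    return _ISLINK_2.sub("islink=10", markup)


def islink_values(markup: str) -> list[int]:
    """Return the islink values found in a piece of markup, in order."""
    return [int(v) for v in re.findall(r"islink=(\d+)", markup)]


markup = "a islink=2 b islink=20"
print(to_reference(markup))                 # a islink=10 b islink=20
print(islink_values(to_reference(markup)))  # [10, 20]
```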

The (almost) ready to go template code

 RenderTags & VBScript 
	<ul id="following-pages">
			<h3><a href="<%!! Context:Pages.GetPage(Guid:<%info_PageGuid%>).GetUrl() !!%>"><%hdl_pagetitle%></a></h3>
			<% 'treat the link as reference not as link to avoid duplicate publishing %>
			<% =Replace("<%list_teaser%>","islink=2","islink=10") %>
	</ul>

EXPLANATION: So what does this all do?

	<% 'treat the link as reference not as link to avoid duplicate publishing %>
	<% =Replace("<%list_teaser%>","islink=2","islink=10") %>

Simply hiding the link from the HTML output is not enough: even a link that does not appear in the HTML is still followed by the PageBuilder, which then publishes the page. That is why the code swaps the two link type values as discussed above, so that the PageBuilder treats the link as a reference.

Render Tag Code? What do we do here with the Render Tag?

	<%!! Context:Pages.GetPage(Guid:<%info_PageGuid%>).GetUrl() !!%>

This Render Tag sets the link URL to the URL of the page’s MainLink. If it is used everywhere, all links point to the same place, which makes sure that you won’t have duplicate content or unwanted pages published in the wrong place.
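The effect of the Render Tag can be illustrated with a small Python model. It is an assumption-laden sketch, not the Render Tag API: `Page`, `get_url`, `render_teaser_link` and the GUID value are hypothetical, and the URL is simply built from the folder of the page's MainLink. Whichever list renders the teaser, the generated link is the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    guid: str
    filename: str
    mainlink_folder: str  # hypothetical: folder of the page's MainLink


def get_url(pages: dict[str, Page], guid: str) -> str:
    """Return the page URL derived from its MainLink (illustrative model only)."""
    page = pages[guid]
    return f"{page.mainlink_folder}/{page.filename}"


def render_teaser_link(pages: dict[str, Page], guid: str, listing_folder: str) -> str:
    """Render a teaser link; listing_folder (where the list itself is published)
    is deliberately ignored, so every list links to the same MainLink URL."""
    return f'<a href="{get_url(pages, guid)}">'


pages = {"4711": Page("4711", "widget.html", "/products")}
print(render_teaser_link(pages, "4711", "/news"))    # <a href="/products/widget.html">
print(render_teaser_link(pages, "4711", "/offers"))  # <a href="/products/widget.html">
```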