rdb: How to get a tidy website

Whenever a page or an entire website is published, there is the problem of producing valid XHTML code. Let’s face it: Open Text Websolutions is not (and RedDot CMS was not) the most eager CMS in the world to achieve this goal.
Some open source solutions (e.g. WordPress ;-) ) do a better job.

So what can we do? Since nearly everyone can easily create a standards-compliant web page these days, Open Text and we partners face a big challenge in explaining why an expensive system is not as good as a free one.

So there are two goals for us:

  • Give the users the freedom of formatting they expect
  • Show your fellow keep-to-the-code developers that you don’t treat the standards as mere guidelines.

Are you ready? Let’s go…

Just ASCII for the editors?

First of all, let’s have a look at where the problems come from:
Assuming that we developers are able to write tidy, standards-compliant templates, there is only one source causing all of our problems: the text editor!

Uppercase tags, mso-styles and incorrect empty tags like <BR> instead of <br /> litter our code that was once so beautiful.

So the simplest way to avoid all of that would be to keep the editors away from anything that lets them produce a single tag outside our control. In practice this means not letting them use the text editor at all, at least not outside its ASCII mode.
But I have never seen a project like this…

Use tidy

Another good way to publish standards-compliant web pages is to use tidy. Simply activate it in your project variant settings and all problems are gone!

Well, almost all:

  • As tidy will not only touch code from the text editor but the whole page, you will sooner or later run into the situation that it also changes some of your template code while publishing.
  • If you publish just static HTML, tidy may help you a lot.
  • If you publish server-side scripts, too, it might interfere a lot and make very harmful changes!
  • Tidy checks and corrects the published code that is transferred to the web server. But this code does not need to be tidy yet; only the code the web server finally sends to the client has to be!
  • Never use it in a LiveServer (Delivery Server) project! It will strip out all of your nice Dynaments as it does not recognize them (when in XHTML mode – I did not try out one of the other modes). If you start to declare these tags (you can of course configure tidy), it will keep them in the BODY, but not in the HEAD.

At this point, I did not investigate any further.

Use another text editor

Version 9 introduced Telerik RadEditor as a new approach. This seems to be a good and well integrated tool that produces really tidy code. Good for brand new projects. Changing a running project can be a bit tricky, though: the main problem is that you would have to open and close every single text element to see the effect.

And there are some open issues, which have already been discussed in other posts here.

Use a conversion table

Have a look at the /cms/asp folder of your server. There you should find a file named HtmlConvertTable.txt.

It’s a simple plain text document containing tab-separated strings. Matches on the left will be replaced by the text on the right. It was used in ancient times (before Unicode) to convert strange characters like our German umlauts to their corresponding HTML entity (e.g. ä became &auml;).

The great advantage of this solution is that it only changes the content of elements, but not one single character of the template code!

In your project variant settings, you can choose between three options (section Conversion of RedDot content):

  • Do not convert characters (I think this means in fact: use the standard file HtmlConvertTable.txt if the element is not set to "Do not convert characters to HTML")
  • Convert characters to XML (changes just these five characters: &, ", <, > and ' – but all of them, everywhere)
  • Convert characters to the following file format (followed by a text entry field)

So let’s create a conversion table for XHTML code, save it to the /cms/asp folder of your server, name it e.g. HtmlConvertTableXhtml.txt and enter this file name into the text entry field.

Into this file, write down all uppercase tags to the left and the corresponding lowercase tags to the right, separated by a tabulator.

Here’s a little example (when you copy it ensure that you get the tabulators right):

<BR> <br />
<IMG  <img
<P  <p
<P> <p>
</P> </p>
<A  <a
</A> </a>
<STRONG> <strong>
</STRONG> </strong>
<EM> <em>
</EM> </em>

Very simple, but powerful. What do we see in this example?

  • You must list both the start tags and the end tags.
  • Some start tags appear twice, because they exist both with and without attributes.
    Attention: Write the variant with attributes with a trailing space, so that other tags starting with the same character or string are not converted by accident.
  • You can also list any attribute to be converted.
    Attention: Don’t list the href attribute, because this confuses the pagebuilder – internal links will no longer be published!
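To see why the trailing space matters, here is a minimal Python sketch that imitates the table as I understand it: plain literal search and replace, applied entry by entry. This is only a model under that assumption; I don't know how the CMS implements the conversion internally, and the names parse_table and apply_table are made up for illustration.

```python
def parse_table(text):
    """Parse tab-separated 'search<TAB>replace' lines into a list of pairs."""
    pairs = []
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        search, replace = line.split("\t", 1)
        pairs.append((search, replace))
    return pairs


def apply_table(content, pairs):
    """Apply each literal replacement in table order (assumed behaviour)."""
    for search, replace in pairs:
        content = content.replace(search, replace)
    return content


# A small table written with explicit tab characters.
TABLE = "<BR>\t<br />\n<P \t<p \n<P>\t<p>\n</P>\t</p>\n"

sample = '<P class="intro">Line one<BR>Line two</P><PRE>code</PRE>'

# With the trailing space in "<P ", the <PRE> tag is left alone.
careful = apply_table(sample, parse_table(TABLE))
print(careful)

# Without the trailing space, "<P" also matches the start of "<PRE>".
careless = apply_table(sample, [("<P", "<p")])
print(careless)
```

In this model the careless entry turns <PRE> into <pRE>, which is exactly the kind of unwanted conversion the trailing space prevents.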

You can list the ampersand, too, of course, if you do it as follows:

 &     &amp; 

Both the left and the right one must have a leading and a trailing space to ensure that only the ampersands standing alone will be converted and not those which are already part of an entity.
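The effect of the surrounding spaces can be checked with a few lines of Python. Again this assumes the table does nothing more than literal replacement; the function names and the sample text are hypothetical.

```python
def convert_standalone_ampersands(content):
    """Model of the table entry ' & ' -> ' &amp; ' (spaces on both sides)."""
    return content.replace(" & ", " &amp; ")


def convert_every_ampersand(content):
    """Model of a naive entry '&' -> '&amp;' without the spaces."""
    return content.replace("&", "&amp;")


text = "Fish & chips, M&uuml;ller &amp; Sons"

spaced = convert_standalone_ampersands(text)
naive = convert_every_ampersand(text)

print(spaced)  # only the lone ampersand is changed
print(naive)   # existing entities are escaped a second time
```

The price of this safety is that an ampersand without spaces around it (as in "R&D") is not converted either.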

Last but not least, you can convert deprecated tags into a standards-compliant equivalent:

<NOBR> <span class="nowrap">
</NOBR> </span>

The behaviour of the “nowrap” class is then defined using CSS.

In my experience so far, this solution delivers the best results for projects using the built-in RedDot text editor (I’ve tried it with version 7.5), although it’s not possible to convert all tags. For example, you cannot convert empty tags that must have attributes (e.g. img) into their XHTML variant.
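The img limitation follows from the literal matching. A short Python sketch, under the same assumption that the table only does plain search and replace (apply_pairs and the sample markup are invented for illustration):

```python
def apply_pairs(content, pairs):
    """Apply literal replacements in order (assumed table behaviour)."""
    for search, replace in pairs:
        content = content.replace(search, replace)
    return content


sample = '<P>Logo: <IMG src="logo.gif" alt="Logo"></P>'
pairs = [("<IMG ", "<img "), ("<P>", "<p>"), ("</P>", "</p>")]

# The tag name is lowercased, but the tag still ends with ">" instead of " />".
lowercased = apply_pairs(sample, pairs)
print(lowercased)

# Trying to fix the ending with a '>' -> ' />' entry hits every tag on the page.
broken = apply_pairs(sample, pairs + [(">", " />")])
print(broken)
```

Because the attributes differ from image to image, there is no fixed string that identifies just the end of an img tag, so a literal table cannot close it.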

That’s it. Now I’d like to hear about your experience with this.