Previously I’ve looked at a couple of ways of entering content into a Publishing Page. Well, it turns out there are really three ways of putting content in:
- Writing content directly using the Content Editor control (the RichHTMLField control)
- Writing content in Word 2007 and using the Document Conversion service
- Writing content in Word 2007 and cutting and pasting(!)
I decided to test and compare these techniques. For options 2 and 3, I used this test document:
Let’s look at the results of this and the code that is behind each resulting page…
Authoring Directly with the Content Editor
Obviously, the styles available in my Content Editor control didn’t match (out of the box) those in my Word 2007 document. You can configure them, but I didn’t bother, so the output for this test was similar to, but not the same as, the test document:
So, what does this code show us? Well, our Content Editor control’s styles appear in the code as CSS classes (e.g. ms-rteCustom-ArticleHeadLine), while other formatting changes appear as inline markup (e.g. <font color="#ff0000">).
These inline styles are a problem. You can use a stylesheet to control the look of the styles available in the Content Editor control, but not the look of formatting that appears inline. Fortunately, you can disable the other styling options in the Content Editor control. (There’s a lot of information in that article!)
So, we can lock down the styles to stop users from creating inline formatting, and give them a set of styles to use instead. That enforces a consistent set of styles, which is always great if you might later want to change the look of those styles throughout your site.
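Because the problem is formatting that a stylesheet doesn’t reach, it can help to check a page’s HTML for it. Here’s a minimal Python sketch, not a SharePoint feature: it assumes you already have the page content as an HTML string (for example, copied from the page source), and the names `InlineStyleAuditor` and `audit` are hypothetical. It uses the standard library’s `html.parser` to list the CSS classes a stylesheet can target and to flag `<font>` tags and `style` attributes.

```python
from html.parser import HTMLParser


class InlineStyleAuditor(HTMLParser):
    """Collects CSS classes and flags inline formatting in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.classes = set()  # classes a site stylesheet can target
        self.inline = []      # (tag, detail) pairs the stylesheet doesn't control

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Attributes without a value come through as None, hence the `or ""`.
        self.classes.update((attrs.get("class") or "").split())
        if tag == "font":
            detail = " ".join(f'{name}="{value}"' for name, value in attrs.items())
            self.inline.append((tag, detail))
        if attrs.get("style"):
            self.inline.append((tag, attrs["style"]))


def audit(html):
    """Return (sorted CSS classes, inline formatting found) for an HTML string."""
    auditor = InlineStyleAuditor()
    auditor.feed(html)
    auditor.close()
    return sorted(auditor.classes), auditor.inline


# Made-up fragment modelled on the description above, not real SharePoint output.
sample = (
    '<div class="ms-rteCustom-ArticleHeadLine">My heading</div>'
    '<p>Some <font color="#ff0000">red</font> text.</p>'
)
classes, inline = audit(sample)
print(classes)  # ['ms-rteCustom-ArticleHeadLine']
print(inline)   # [('font', 'color="#ff0000"')]
```

Anything that shows up in the second list is formatting that locking down the Content Editor control’s styling options is meant to prevent.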
The downside of this approach is, well, that it’s not a familiar authoring environment. A lot of content authors won’t want to learn a new system.
Author in Word and use the Document Conversion Service
Microsoft Content Management Server 2002 (MCMS) had an Authoring Connector, a plug-in that let you write your document in Word and publish it as a web page. SharePoint 2007 has similar functionality in a feature called either ‘Rich Client Authoring’ or ‘Smart Client Authoring’.
Essentially, this lets you convert certain files (.docx files, XML files, InfoPath forms) into a Publishing Page. There are some limitations, such as what happens to metadata and what happens to embedded content. The latter is particularly relevant, as many Word documents contain images such as diagrams. We’ll come back to this later.
Anyway, what is the output of the document conversion service when run on our test file?
This code contains a series of <p> and <span> tags with classes for the various styles, e.g. Normal-P and Normal-H. The Document Conversion Service puts the document’s definitions of these styles into a separate field in the head of the converted HTML page, and defaults are also defined by a standard stylesheet called RCA.css. (Note that in the screenshot above I’ve told the conversion service not to save the style definitions from the document, though it still records where in the document each style is used.)
Again, formatting text directly in Word produces inline styles, but we can restrict users to the predefined styles. These predefined styles are translated into CSS classes, so again we can force the styles to conform to a particular set and then define them in a CSS stylesheet.
I guess the other point of note is that the structure of the resulting HTML is quite different from what you’d get by authoring through the Content Editor control.
Author in Word then Cut and Paste into the Content Editor Control
This option took me by surprise a bit, but it makes sense – the authoring is in a familiar environment, and, well, cut and paste isn’t scary. I cut and pasted the content of my test document into a new page…
Again, the custom style in our .docx file has been translated into a CSS class. The Heading 1 style has simply become an <h1> tag, and the normal paragraphs are <p> tags with a class of MsoNormal.
Again, we’ve got our inline style, but again, we can prevent this by restricting the Word document template to predefined styles.
And again, the structure of the HTML differs from that of both previous methods. My own feeling is that this code is not as neat; I don’t like the way the Heading 1 and Heading 1 Red styles are translated differently, with Heading 1 becoming an <h1> tag and Heading 1 Red becoming a paragraph with a CSS class. But you could define CSS to style this content too.
One issue is images. The Document Conversion Service (bizarrely) doesn’t handle embedded images. You can work around that by linking images into the Word document rather than embedding them; just don’t try to work with them offline!
Similarly, cutting and pasting loses any embedded images (though SharePoint is kind enough to warn you, kudos). And I’ve not been able to cut and paste with linked images rather than embedded ones, so in that regard this method is actually a little worse.
I’ve gotta say, I’d love the time to have a go at making the Document Conversion Service store embedded images in a library and map the images in the document to those copies. Perhaps that’d make a good CodePlex project?
So, online authoring through the web browser is the simplest option and has no real pitfalls, but it is the most likely to scare or annoy your users. Conversely, authoring in Word 2007 will be more familiar (if they aren’t bewildered by its user interface changes), but it has a number of issues. Users might well also get annoyed at having the styles they can use restricted.
One final thought: frustratingly, as we’ve found, all three methods produce differently structured HTML and require different CSS to style it. Consequently, I strongly recommend that you pick a suitable method and stick to it.
I’ve got to say, I think I’ve come full circle now, and I’m back to the idea that it’s just better to author these pages through the web browser. The Document Conversion Service not supporting embedded images is a killer.
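Since the conversion workaround depends on whether a document’s images are embedded or linked, a quick pre-flight check on the .docx itself can tell you which you’ve got. The following is a minimal Python sketch (the function names are hypothetical) that reads the Open XML package directly: a .docx file is a zip archive, and images referenced from the main document body appear as image relationships in word/_rels/document.xml.rels, with linked images marked TargetMode="External". It only inspects the main body part, so images in headers and footers aren’t counted, and it says nothing about how SharePoint itself processes the file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Relationship namespace and image relationship type from the Office Open XML / OPC specifications.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IMAGE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"


def classify_images(docx):
    """Return (embedded, linked) image targets referenced by the main document body.

    `docx` can be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as pkg:
        rels = ET.fromstring(pkg.read("word/_rels/document.xml.rels"))
    embedded, linked = [], []
    for rel in rels.iter(f"{REL_NS}Relationship"):
        if rel.get("Type") != IMAGE_REL:
            continue  # skip styles, numbering, hyperlinks, etc.
        # Linked images point outside the package; embedded ones live inside it (e.g. media/image1.png).
        target_list = linked if rel.get("TargetMode") == "External" else embedded
        target_list.append(rel.get("Target"))
    return embedded, linked


def demo_package():
    """Build a tiny in-memory package with one embedded and one linked image (illustrative only)."""
    rels = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        f'<Relationship Id="rId1" Type="{IMAGE_REL}" Target="media/image1.png"/>'
        f'<Relationship Id="rId2" Type="{IMAGE_REL}" '
        'Target="file:///C:/Diagrams/flow.png" TargetMode="External"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("word/_rels/document.xml.rels", rels)
    buf.seek(0)
    return buf


embedded, linked = classify_images(demo_package())
print(embedded)  # ['media/image1.png']
print(linked)    # ['file:///C:/Diagrams/flow.png']
```

On a real file you’d pass its path instead, e.g. `classify_images("test.docx")`; a non-empty embedded list is a hint that those images won’t make it through conversion or cut and paste.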