Document Conversion Service doesn't map column data – Part II

(For the purposes of this post, I’ll use Pages with a capital P to mean items in SharePoint of a Page content type, or a child content type of Page. I’ll also refer to all content types in italics)

Previously, I found that the document conversion service doesn’t map site column data from the Document type to the Page type. So, what are our options?

  1. Get users to fill in the metadata for the converted document
  2. Put the metadata into the Word document
  3. Bespoke coding
  4. Don’t use the conversion service

Let’s look at each in turn.

Get users to fill in the metadata for the converted document

Well, the first option is pretty obvious – get the users to fill in the Site Columns for the converted document’s Page. In my case, this would mean filling in the AWBText column on the ConvertableDocumentPage type. This will work! Unfortunately, it means that the page’s and the document’s data are not linked – a change in the AWBText field won’t be replicated between the two items, or even just pushed from the ConvertableDocument the next time it’s converted. That sucks a bit, but this might be a valid option.

Put the metadata into the Word document

The second option is quite neat – Word documents can have ‘Quick Parts’, some of which are document properties, and these can be connected to the columns of the content type:

Insert a Quick part menu

You can put these into the document itself. They’re like document ‘Fields’ in Word pre-2007, but these are much, much better. For a start, you can actually type into the Quick Part and it’ll update the document properties – and when you save the document to SharePoint it’ll update the columns of the library! Very cool. Anyway, I updated my Word template from my previous example…

Word document with Quickpart

I then created a new document. Note that the AWBText field in the Document Information Panel and the Quick Part are linked – I typed the value into the document and it was reflected in the Document Information Panel.

Example document for Document conversion with QuickPart

I then converted this document. This resulted in:

Converted Document - AWBText value in Page content

Okay, so I’ve scribbled on this a bit. The area outlined in the purple-pink colour is the content of our document that we converted. You can see that this includes the value of the ConvertableDocument‘s AWBText column. Hurrah! However, above this is the value of the AWBText column on the ConvertableDocumentPage – and it is still empty. In other words, the original document’s metadata is now in the page content – but it still isn’t stored against the Page as metadata. This isn’t really suitable for our customer – they need that column data against the Pages for their navigation. Bah!

Bespoke Coding

Okay, I started to wonder if I could fix this via custom code (i.e. some sort of Feature). I dug through some of the hidden properties of my source ConvertableDocument and destination ConvertableDocumentPage using SharePoint Manager. I knew that there must be some sort of connection because, if you edit the ConvertableDocumentPage, it shows you that it has a source document, and lets you edit that document instead. Therefore, they must know where they came from.

In SharePoint Manager, I found some interesting fields. The ConvertableDocument content type I’d created had a property RcaPageID, which was a GUID. ‘Rca’ stands for ‘Rich Client Authoring’, which is what they seemed to call this page authoring technique until someone decided to call it ‘Smart Client Authoring’ instead. Certainly, internally it’s normally referred to as Rich Client Authoring, or ‘Rca’.

I then checked the ConvertableDocumentPage type, which had a property RcaSourceDocID. This was a GUID, and it matched the RcaPageID of the document we used to create the page. Thus, and I’m pretty sure about this, it’s the connection between the source Document and destination Page.

Therefore, I could build an event handler that (when a page is updated or created) gets the Page’s source document, sees what columns they share, and copies across the values of those shared columns. Actually, it’d probably have to exclude some (like title), but you get the idea. Also, it’d have to run a query across all the documents in the site collection, but I’m pretty sure that this is possible.
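The matching-and-copying logic such a handler would need can be sketched in a few lines. This is only a toy model in Python, not SharePoint code: items are plain dictionaries, and `find_source_document`, `sync_shared_columns`, the `EXCLUDED` set and the sample values are all made up for illustration. The only things taken from the findings above are the RcaPageID / RcaSourceDocID link and the idea of skipping columns like Title.

```python
# Toy model: each item is a dict of column name -> value. Not the SharePoint API.

# Columns that should never be overwritten on the page. This set is an
# assumption; Title is the only exclusion actually named in the post.
EXCLUDED = {"Title", "RcaPageID", "RcaSourceDocID"}


def find_source_document(page, documents):
    """Return the document whose RcaPageID matches the page's RcaSourceDocID."""
    source_id = page.get("RcaSourceDocID")
    if source_id is None:
        return None
    for doc in documents:
        if doc.get("RcaPageID") == source_id:
            return doc
    return None


def sync_shared_columns(page, documents):
    """Copy the values of columns present on both items from document to page.

    Returns the sorted list of column names that were copied.
    """
    doc = find_source_document(page, documents)
    if doc is None:
        return []
    shared = (set(page) & set(doc)) - EXCLUDED
    for name in shared:
        page[name] = doc[name]
    return sorted(shared)


# Hypothetical sample data.
documents = [
    {"Title": "Other.docx", "RcaPageID": "1111", "AWBText": "wrong one"},
    {"Title": "Example.docx", "RcaPageID": "2222", "AWBText": "Hello"},
]
page = {"Title": "Example.aspx", "RcaSourceDocID": "2222", "AWBText": ""}

copied = sync_shared_columns(page, documents)
print(copied, page["AWBText"])
```

In a real event handler, the loop over `documents` would be the site-collection-wide query mentioned above, which is the expensive part.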

I like this solution, and think it’s a fairly straightforward, generic candidate for a feature, but unfortunately our customer is unable to make server configuration changes – like installing new features. So that rules that idea out… damn.

Don’t use the Document Conversion Service

I know, this seems a bit crazy – but you could author your content in Word and just copy and paste the content into your pages. This is what our customer was doing. I know, it seems a little crazy to me too, but if you lock down the styles available in a Word template, then the code you’ll copy will have consistent CSS styles in it, and you can prevent any inline CSS through that restriction too. It has no server footprint and no duplication of metadata – but you still have to store the documents (which might require column data too).

So those were the options I was able to come up with. I like the coding option – an event handler could be a very elegant way of dealing with this.


Take care when adding or removing columns from Site Content Types

As mentioned before the content types on a list are actually children of the site content types. I’ve also looked at adding columns to list content types, which naturally enough doesn’t affect their parent site content types. Anyway, there are issues to consider when dealing with adding and removing list content types – I suggest you refer to this post for more information.

So what about adding and removing columns from Site Content Types – are there issues with this? Well, yes, there are (unsurprisingly). If you add a new column to a Site Content Type, you have the option to ‘Update all content types inheriting from this type’:

Update Child Content Types

If you select ‘no’, then the change only applies to that Site Content Type. The next time you add that site content type to a list, the new List Content Type that is created will have the new column, but pre-existing list content types that inherit from the site content type will be unchanged.

If you select ‘yes’, then the List Content Types (or other Site Content Types) that inherit from this content type will have the new column. For the List content types, this means that there will be a new column on the list. Carrying on from an earlier example, here I’ve added a new column (‘Job Title’) to the Example Travel Expenses site content type, and updated all content types inheriting from that. If we then go and look at the List Settings page, we can see our List Content Type has a new column:

Extra List Columns 2

Great! Now what happens if I remove that column from our Site Content Type? Well, again, I get the option to ‘Update all content types inheriting from this type’. If I choose no, then the existing List Content Types derived from this Site Content Type remain as they are. If I choose yes, though, I get a fairly large warning saying:

This column will be removed from all content types that are based on this type. If you are sure you want to remove this column from all content types based on this type, click OK. To remove this column from this content type only, click Cancel to close this dialog box, click No in the Update Lists and Columns section, and then click Remove.

Snappy message that:

Silly Warning Dialog

Anyway, if you click OK, that column is removed from child content types. However, the column is not deleted from lists that were using those child content types. I removed the ‘Job Title’ column from my ‘Example Travel Expenses’ site content type. If we return to our list settings page, we can see that the column still exists, although it isn’t used in any content types:

Extra List Columns 3

This makes sense, as the column could actually contain data, and it could be used in multiple places throughout our sites (potentially hundreds!). However, maybe you do want to remove that column from that list, or potentially those hundreds of lists. In that case (and this is why this is important) you have to delete the ‘orphan’ column on a list-by-list basis. Therefore, if your content type was used in hundreds of lists, you will have to delete this extra column hundreds of times, once for each list.
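If you did have to hunt down those leftovers, the check itself is simple: a list column is a candidate orphan when no content type on that list refers to it. Here is a minimal sketch of that check in Python over a toy model, not the SharePoint object model. The `orphan_columns` and `find_orphans` functions, the list names and the ‘Amount’ column are all invented for illustration.

```python
# Toy model of lists and their content types; not the SharePoint API.


def orphan_columns(list_columns, list_content_types):
    """Return the list columns that no content type on the list refers to."""
    used = set()
    for columns in list_content_types.values():
        used.update(columns)
    return sorted(set(list_columns) - used)


def find_orphans(lists):
    """Map each list name to its orphan columns, skipping lists with none."""
    report = {}
    for name, lst in lists.items():
        orphans = orphan_columns(lst["columns"], lst["content_types"])
        if orphans:
            report[name] = orphans
    return report


# Hypothetical data: 'Job Title' was removed from the content type but is
# still a column on the first list.
lists = {
    "Expenses 2008": {
        "columns": ["Title", "Amount", "Job Title"],
        "content_types": {"Example Travel Expenses": ["Title", "Amount"]},
    },
    "Expenses 2009": {
        "columns": ["Title", "Amount"],
        "content_types": {"Example Travel Expenses": ["Title", "Amount"]},
    },
}

report = find_orphans(lists)
print(report)
```

A column that was added straight to the list and never attached to a content type looks exactly the same in this model, so a report like this can only suggest candidates. Someone still has to decide whether each one is safe to delete.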

Therefore, be very careful when adding or removing columns from a Site Content Type – make sure that you really want to add it (as removing it might be hard), and be aware that removing the column is not the same as deleting it in the lists that use it already.


Content Types – Who's your daddy?

Content Types are great, but can cause a little confusion. Because you normally define a content type at a site level, that’s pretty much how we think of them – as centrally defined types of item. Often, we actually create these content types on the root site of a site collection, because all subsites will be able to use them then.

However, this isn’t really the case. We do have site content types – but we also have list content types. These are the content types that are actually used on the lists themselves, and they are children of those site content types. This can be most easily seen by clicking on a content type on the List Settings page.

List Content Type

Notice that our ‘Example Travel Expenses’ content type says it has a parent of… ‘Example Travel Expenses’! This is our List Content Type telling us that its parent is the Site Content Type of the same name. Click on it, and it’ll take you to the Site content type description, and you can work on up the chain of content types until you reach Item.

Site Content type
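The same chain can also be read from a content type’s ID, without any clicking. As far as I understand the ID format, each child ID is its parent’s ID plus either two hex digits (anything except 00) or 00 followed by a 32-digit GUID, with Item being 0x01 and Document 0x0101. The Python sketch below walks an ID back up to Item on that assumption. The function names and the GUIDs are made up, and the claim that a list content type is its site content type’s ID plus 00 and a GUID is my understanding rather than something shown above.

```python
import string

HEX = set(string.hexdigits)


def split_content_type_id(ct_id):
    """Split an ID such as 0x0101 into one segment per level of inheritance."""
    if not ct_id.lower().startswith("0x"):
        raise ValueError("content type IDs start with 0x")
    body = ct_id[2:]
    if len(body) % 2 or not set(body) <= HEX:
        raise ValueError("expected an even number of hex digits")
    segments = []
    pos = 0
    while pos < len(body):
        # "00" introduces a GUID-style segment: 00 + 32 hex digits.
        length = 34 if body[pos:pos + 2] == "00" else 2
        segment = body[pos:pos + length]
        if len(segment) != length:
            raise ValueError("truncated segment")
        segments.append(segment)
        pos += length
    return segments


def ancestors(ct_id):
    """Return the ID itself followed by each parent ID, ending at 0x01 (Item)."""
    segments = split_content_type_id(ct_id)
    chain = []
    for count in range(len(segments), 0, -1):
        chain.append("0x" + "".join(segments[:count]))
    return chain


# Made-up GUIDs: a site content type derived from Document (0x0101), and a
# list content type created as its child.
site_ct = "0x0101" + "00" + "A" * 32
list_ct = site_ct + "00" + "B" * 32

for ct in ancestors(list_ct):
    print(ct)
```

Reading the output from top to bottom gives the list content type, its site content type, Document, and finally Item.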

A consequence of this is that, as our content types are actually used by lists, I can’t think of a way to use a Site Content Type directly (though I may be wrong about that).

There are also issues related to this in terms of modifying content types, but that’s the subject for another post…


What happens to content types when you add a column to a list in SharePoint?

This is sort of relevant to an earlier post on the Document Information Panel, and showing fields in it.

The behaviour depends on if you’ve enabled ‘Allow management of content types’ on the Document Library Settings > Advanced Settings page.

If you’ve not allowed management of content types, well, you just add the column and it’ll appear in the document information panel. That’s great! The new column will not show as belonging to any content types, as the ‘Test’ column is in the screenshot below:

Extra List Columns

However, if you have allowed management of content types for the library, things get a little more complicated. When you add a new column, in the ‘Additional Column Settings’, there will be an option for ‘Add to all content types’. If you check this, well, it’ll add that column to all the content types currently on that list. This will make it appear on the document information panel. This is what I did with the ‘Test2’ column above (but before I added the Picture content type to the library).

(As a side note, the content types on the list are actually ‘children’ of your Site content type rather than instances of it. This means that if you update the content type on that list, it won’t update the parent content type, or other lists that use that content type. Similarly, it means that if you update the parent content type – say, you edit the site content type – you need to update child content types with those changes to affect lists that are using them already. But that really should be another post, sometime.)

If you don’t check that ‘Add to all content types’ option, well, it doesn’t add it to the document information panel. Finally, what if you have added a column to all content types, and you then add a new content type to the list? Well, your new content type will not have that column applied to it – and the only way I can see of applying it to the new content type is to delete and recreate the column. Of course, that means deleting a column that contains data, so that is less than ideal. This is what happened with the Picture content type – I added it afterwards, and you can see that it doesn’t use the ‘Test2’ column.
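The ordering problem is easier to see in a toy model. The Python sketch below is not SharePoint code: the `Library` class and its methods are invented, and it only mimics the behaviour described here, where ‘Add to all content types’ touches the content types present at that moment and nothing added later.

```python
class Library:
    """Toy library: a list of columns plus content types that reference them."""

    def __init__(self):
        self.columns = []
        self.content_types = {}

    def add_content_type(self, name, columns=("Title",)):
        # A content type added later only brings its own columns.
        self.content_types[name] = list(columns)

    def add_column(self, name, add_to_all_content_types):
        self.columns.append(name)
        if add_to_all_content_types:
            # Applies only to the content types on the list right now.
            for columns in self.content_types.values():
                columns.append(name)


library = Library()
library.add_content_type("Document")
library.add_column("Test2", add_to_all_content_types=True)
library.add_content_type("Picture")  # added after Test2

for name, columns in library.content_types.items():
    print(name, columns)
```

Document ends up with Test2 and Picture does not, which matches what the screenshot showed.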

For that reason, be very careful when adding columns to a library that allows management of content types. If possible, keep the columns in the Site Content Type (i.e. the parent).


Missing Content Type fields in the Document Information Panel

The Document Information Panel is great – it allows you to surface metadata to be filled in about a Word 2007 document in the client.

Document Information Panel Correct

This is great, but I had a bit of a puzzling problem. I’ve set libraries up to use this feature many times now, and it’s pretty straightforward – I’ve added columns to the library, and then the template document for the library has included those columns. Thus, you just go into your document library, click New, and you get a blank Word document with the correct document information panel thing. Sometimes I’ve modified that template, but that’s pretty straightforward through the Library Settings pages (Document Library Settings > Advanced Settings > Edit Template).

This time, though, I was using content types (i.e. setting up the library properly), rather than just adding columns directly to a list. Content Types encapsulate (amongst other things) their own set of metadata to be captured – so in other words, they define columns to be added to a list. That’s fine (and very useful).

However, when I went to my document library, clicked ‘New’ and selected my Content Type, I got a blank word document with only one field in the document information panel – title. The blankness was expected (I’d not defined my own template) but none of the other bits of metadata I’d defined for my content type were there. This was a bit of a puzzle. What was different?

Well, after much thinking, I realised something – Content Types ‘inherit’ from each other. My Content Type derived from the Document content type, which specified just one field of metadata – Title. Then it hit me – content types themselves have document templates. My new content type was inheriting from Document, and it was still using the Document content type’s template document. I specified my own template document for my content type and suddenly I had all of my fields available in the document information panel.

It is interesting that there is this difference: the document information panel fields are defined by the library when you’re just using the default ‘Document’ content type and no others, but by the content type you’ve created if you’re using other types (i.e. you’ve enabled ‘Allow management of content types’ on the Document Library Settings > Advanced Settings page).

Related to this, then, is the question of what happens if you add a column to a list. However, I’ll cover this in another post.


The circular logic of the WSS Lists Webservice GetListContentTypes call

So, I need to get a list of the content types applied to a library. The Lists webservice has a call for this:

<GetListContentTypes xmlns="http://schemas.microsoft.com/sharepoint/soap/">
  <listName>string</listName>
  <contentTypeId>string</contentTypeId>
</GetListContentTypes>

So, I’m connected to a site, I supply a list name to identify the list… …but what do I give as a contentTypeId? I don’t have a f$%king clue – I was calling this function to try to find out what content types were valid!

Well, good news. It doesn’t seem to matter what you put in, you always get a list of all the content types on the list. I used the following code…

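// WS_Lists is the web reference added for the site's Lists web service
WS_Lists.Lists lsts = new WS_TestApp.WS_Lists.Lists();
lsts.Credentials = System.Net.CredentialCache.DefaultCredentials;
// libraryName identifies the list; contentTypeID can apparently be anything
XmlNode node = lsts.GetListContentTypes(libraryName, contentTypeID);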

I figured that maybe they wanted the base content type for what I wanted back, so I tried setting contentTypeID to documents (0x0101) and got back a list of all the content types. I then set contentTypeID to “fish”, and still got back a list of all the content types. As far as I can tell, the second parameter doesn’t do anything.
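If you’d rather poke at the call without a generated proxy, the raw request is easy to build by hand. The Python sketch below only constructs the SOAP 1.1 envelope and headers and parses a reply; it doesn’t send anything. `build_request`, `content_type_names`, the ‘Shared Documents’ list name and the sample response are all hypothetical. In particular, the response shape (a ContentTypes element with ContentType children carrying a Name attribute) is my assumption about what comes back, so check it against a real reply.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SP_NS = "http://schemas.microsoft.com/sharepoint/soap/"


def build_request(list_name, content_type_id):
    """Return (headers, body) for a SOAP 1.1 GetListContentTypes call."""
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}GetListContentTypes" % SP_NS)
    ET.SubElement(call, "{%s}listName" % SP_NS).text = list_name
    ET.SubElement(call, "{%s}contentTypeId" % SP_NS).text = content_type_id
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": '"%sGetListContentTypes"' % SP_NS,
    }
    return headers, ET.tostring(envelope, encoding="utf-8")


def content_type_names(response_xml):
    """List the Name attribute of each ContentType element, in document order."""
    root = ET.fromstring(response_xml)
    return [el.get("Name") for el in root.iter("{%s}ContentType" % SP_NS)]


# Hand-written stand-in for a reply; the real one carries more attributes.
SAMPLE_RESPONSE = """\
<ContentTypes xmlns="http://schemas.microsoft.com/sharepoint/soap/">
  <ContentType Name="Document" />
  <ContentType Name="Picture" />
</ContentTypes>"""

# The second argument is arbitrary on purpose, as in the experiment above.
headers, body = build_request("Shared Documents", "fish")
print(headers["SOAPAction"])
print(body.decode("utf-8"))
print(content_type_names(SAMPLE_RESPONSE))
```

The body would be POSTed to the site’s `_vti_bin/Lists.asmx` endpoint with those headers.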

Side note: The first content type returned is your default content type.

Side note 2: There doesn’t seem to be an easy way of identifying non-visible content types…


Why Geeks shouldn't write Documentation…

Just came across this paragraph in some of the documentation for Windows Workflow Foundation. The first sentence is okay, but it goes downhill from there…

Workflow Task Content Types

By default, all SharePoint task types are assigned content types. If you do not specifically assign a content type to a task type, the task type uses the Task base content type. All task-type content types must be based on the Task base content type.

WTF?
