Document Conversion Service doesn't map column data – Part II

(For the purposes of this post, I’ll use Pages with a capital P to mean items in SharePoint of a Page content type, or a child content type of Page. I’ll also refer to all content types in italics.)

Previously, I found that the document conversion service doesn’t map site column data from the Document type to the Page type. So, what are our options?

  1. Get users to fill in the metadata for the converted document
  2. Put the metadata into the Word document
  3. Bespoke coding
  4. Don’t use the conversion service

Let’s look at each in turn.

Get users to fill in the metadata for the converted document

Well, the first option is pretty obvious – get the users to fill in the Site Columns for the converted document’s Page. In my case, this would mean filling in the AWBText column on the ConvertableDocumentPage type. This will work! Unfortunately, it means that the Page’s and the document’s data are not linked – a change in the AWBText field won’t be replicated between the two items, or even just pushed from the ConvertableDocument the next time it’s converted. That sucks a bit, but this might be a valid option.

Put the metadata into the Word document

The second option is quite neat – Word documents can have ‘Quick Parts’, some of which are document properties, and these can be connected to the columns of the content type:

Insert a Quick part menu

You can put these into the document itself. They’re like document ‘Fields’ in Word pre-2007, but these are much, much better. For a start, you can actually type into the Quick Part and it’ll update the document properties – and when you save the document to SharePoint it’ll update the columns of the library! Very cool. Anyway, I updated my Word template from my previous example…

Word document with Quickpart

I then created a new document. Note that the AWBText field in the Document Information Panel and the Quick Part are linked – I typed the value into the document and it was reflected in the Document Information Panel.

Example document for Document conversion with QuickPart

I then converted this document. This resulted in:

Converted Document - AWBText value in Page content

Okay, so I’ve scribbled on this a bit. The area outlined in the purple-pink colour is the content of our document that we converted. You can see that this includes the value of the ConvertableDocument‘s AWBText column. Hurrah! However, above this is the value of the AWBText column on the ConvertableDocumentPage – and it is still empty. In other words, the original document’s metadata is now in the page content – but it still isn’t stored against the Page as metadata. This isn’t really suitable for our customer – they need that column data against the Pages for their navigation. Bah!

Bespoke Coding

Okay, I started to wonder if I could fix this via custom code (i.e. some sort of Feature). I dug through some of the hidden properties of my source ConvertableDocument and destination ConvertableDocumentPage using SharePoint Manager. I knew that there must be some sort of connection, because if you edit the ConvertableDocumentPage it shows you that it has a source document, and lets you edit that document instead. Therefore, the Page must know where it came from.

In SharePoint Manager, I found some interesting fields. The ConvertableDocument content type I’d created had a property RcaPageID, which was a GUID. ‘Rca’ stands for ‘Rich Client Authoring’, which is what they seemed to call this page authoring technique until someone decided to call it ‘Smart Client Authoring’ instead. Certainly, internally it’s normally referred to as Rich Client Authoring, or ‘Rca’.

I then checked the ConvertableDocumentPage type, which had a property RcaSourceDocID. This was a GUID, and this ID matched the RcaPageID of the document we used to create the page. Thus, and I’m pretty sure about this, it’s the connection between the source Document and destination Page.

Therefore, I could build an event handler that (when a page is updated or created) gets the Page’s source document, sees what columns they share, and copies across the values of those shared columns. Actually, it’d probably have to exclude some (like title), but you get the idea. Also, it’d have to run a query across all the documents in the site collection, but I’m pretty sure that this is possible.
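To make the idea concrete, here is a minimal sketch of the matching and copying logic in plain Python. It is not SharePoint code: items are modelled as ordinary dictionaries of column name to value, and the function names and the exclusion list are hypothetical. The only things taken from the investigation above are the RcaPageID and RcaSourceDocID property names and the observation that their values matched.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical model: each document or Page is a dict of column name -> value.
Item = Dict[str, Any]

# Assumption: columns that should never be copied. Title is the example from
# the post; the two Rca properties are the link itself, not shared metadata.
EXCLUDED_COLUMNS = frozenset({"Title", "RcaPageID", "RcaSourceDocID"})


def find_source_document(page: Item, documents: Iterable[Item]) -> Optional[Item]:
    """Return the document whose RcaPageID equals the Page's RcaSourceDocID."""
    source_id = page.get("RcaSourceDocID")
    if source_id is None:
        return None
    for document in documents:
        if document.get("RcaPageID") == source_id:
            return document
    return None


def copy_shared_columns(
    page: Item,
    documents: Iterable[Item],
    excluded: frozenset = EXCLUDED_COLUMNS,
) -> Dict[str, Any]:
    """Copy values of columns present on both items from the source to the Page.

    Updates the page dict in place and returns only the columns that changed.
    """
    source = find_source_document(page, documents)
    if source is None:
        return {}
    changed: Dict[str, Any] = {}
    for column in page.keys() & source.keys():
        if column in excluded:
            continue
        if page[column] != source[column]:
            page[column] = source[column]
            changed[column] = source[column]
    return changed


documents = [{"RcaPageID": "guid-1", "Title": "My document", "AWBText": "Some value"}]
page = {"RcaSourceDocID": "guid-1", "Title": "My page", "AWBText": ""}
changed = copy_shared_columns(page, documents)
```

In a real event handler the dictionaries would be replaced by the Page and document list items, and the linear search by whatever site-collection-wide query turns out to be practical. Returning only the changed columns would let the handler skip the update when nothing differs.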

I like this solution, and think it’s a fairly straightforward, generic candidate for a feature, but unfortunately our customer is unable to make server configuration changes – like installing new features. So that rules that idea out… damn.

Don’t use the Document Conversion Service

I know, this seems a bit crazy – but you could author your content in Word and just copy and paste the content into your pages. This is what our customer was doing. I know, it seems a little crazy to me too, but if you lock down the styles available in a Word template, then the code you’ll copy will have consistent CSS styles in it, and you can prevent any inline CSS through that restriction too. It has no server footprint and no duplication of metadata – but you still have to store the documents (which might require column data too).

So those were the options I was able to come up with. I like the coding option – an event handler could be a very elegant way of dealing with this.

8 thoughts on “Document Conversion Service doesn't map column data – Part II”

  1. Joe says:

    Hi,

    What if the source document is deleted?
    is there a way of relinking the converted document to the source doc?
    When editing the aspx page I’m getting this error:
    System.IO.FileNotFoundException: The site with the id “GUID in the RcaSourceDocID property” could not be found.

  2. Hmm. I thought that deleting the source removed that linking, but I might be wrong. Maybe if you delete the whole site the source document sits in?

    I don’t know any way of converting back, from web page to document.

    There is a way of relinking – but it’d require coding. As the error implies, the ‘RCASourceDocID’ property contains the ID for the source document (I don’t remember, but it must be a combination of Site/List/ListItem IDs). Through code you should be able to update that property to point at your ‘new’ source document.

    However, as you can’t ‘convert back’, you’re probably better off just creating a new document with your content, converting it, and deleting the currently unlinked one. No code, and probably less effort.

  3. Joe says:

    Hi,
    thanks for your reply. the converted aspx page contains this link:
    b61840ef-ae09-4ebf-9330-f89ab9fbfe56|a49c3f3d-2d7d-463c-a3a8-aff9ae7daddd|cd4a96dd-2e5e-417d-9a1e-375ffeaed757

    this site has been deleted. no way of finding out where it was exactly. and that wouldn’t help because the GUID will not be the same if the site or list are re-created. the solution I found is either to delete this line in the aspx file or reconvert again from the source file which is sitting in another location.
    The problem is that we have a lot of converted files showing the same problem and going through all those files is a pain 🙂
    is it safe to delete this line?

    Cheers !

  4. Joe,

    That link contains 3 GUIDs – probably for Site, List and Document.

    Deleting that line is unlikely to make any difference, I’d’ve thought. Did that work for you? ‘Cos the GUIDs are almost certainly also stored in the hidden metadata for the page inside SharePoint.

    To be honest, reconverting the documents is far, far safer.
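For anyone wanting to inspect these values in bulk, here is a small sketch that splits a link like Joe’s into its three GUIDs. The field names (site, list, document) follow the guess above and are an assumption, not something confirmed by Microsoft; the class and function names are hypothetical too.

```python
import uuid
from typing import NamedTuple


class SourceLink(NamedTuple):
    # Assumed order, based on the guess in the reply above.
    site_id: uuid.UUID
    list_id: uuid.UUID
    document_id: uuid.UUID


def parse_source_link(value: str) -> SourceLink:
    """Split a 'guid|guid|guid' string into three validated GUIDs."""
    parts = [part.strip() for part in value.split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 pipe-separated GUIDs, got {len(parts)}")
    # uuid.UUID raises ValueError if a part is not a well-formed GUID.
    return SourceLink(*(uuid.UUID(part) for part in parts))


link = parse_source_link(
    "b61840ef-ae09-4ebf-9330-f89ab9fbfe56|"
    "a49c3f3d-2d7d-463c-a3a8-aff9ae7daddd|"
    "cd4a96dd-2e5e-417d-9a1e-375ffeaed757"
)
```

Grouping pages by the first GUID would at least show how many converted pages point at the same deleted site.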

  5. Joe says:

    I did delete the link in the converted aspx file, just deleted the whole line, and the edit page link works. I haven’t edited the page to see if I will have any errors later. I will do that and let you know what happens.

  6. Oooo – cool. That’s worth knowing. I hadn’t realised that the values were being read from the ASPX page itself. On second thoughts, though, it makes sense – after all, that’s what happens with .DOCX document properties, InfoPath promoted fields, etc.

    Well spotted!

    Hey, there’s probably a feature in that – a ‘disconnect from source document’ feature to delete the line for you! I might look into that (eventually, if I ever get time!)

  7. Joe says:

    That feature would be welcome 🙂
    but we have to make sure deleting the reference to the source file doesn’t cause issues later. Might be worthwhile asking MS. I will ask our MS TAM to get me some info and will let you know.

    Cheers,
    joe
