SharePoint’s Taxonomy (or “Managed Metadata”) fields are a bit strange in how they work. In a lot of ways they’re actually like a lookup field, and part of this lookup field is that site collections that use them have a hidden list – called TaxonomyHiddenList – in the root of the site collection. You can find it at /lists/TaxonomyHiddenList/.
Each term that is used in a Taxonomy field has it’s label (actually, labels) and other details kind of ‘cached’ in this list, and there is a timer job that will push changes to the terms out into these lists – and therefore to items that refer to the entries in the list.
Thus, the TaxonomyHiddenList is like a list that Managed Metadata fields use similarly to a lookup column. Here’s some more of the details that one shows:
You’ll notice that it has the term’s ID and termset ID in the taxonomy service (or ‘Managed Metadata service’). These are important if you want to get information about a term from the taxonomy service.
What I needed to do in some Client Side Object Model (CSOM) code I was writing recently was, given a Term‘s details, I needed to query for that term being referred to in a large list (5000+ items). I could have tried something like:
CamlQuery qry = new CamlQuery(); qry.ViewXml = string.Format("<View><Query><Where><Eq><FieldRef Name='{0}'/><Value Type='Text'>{1}</Value></Eq></Where></Query></View>", "MyField", "My Value");
…but this would run into problems with the query throttle. Even if the column is indexed, you’re not using the LookupId of the field, so you’re not using the index. A working query would be:
CamlQuery qry = new CamlQuery(); qry.ViewXml = string.Format("<View><Query><Where><Eq><FieldRef Name='{0}' LookupId='true' /><Value Type='Lookup'>{1}</Value></Eq></Where></Query></View>", "MyField", 4);
… but to do this you need the LookupId of the item you’re looking for – so you’ll have to use the TaxonomyHiddenList to resolve that Term‘s GUID to the TaxonomyHiddenList‘s integer ID.
“Fine”, I thought, “But what if the TaxonomyHiddenList itself has more than 5000 items?”. I went and checked it, assuming there’d be an index on the important fields, such as IdForTerm and IdForTermSet. I was wrong:
What! No index? So if there are more than 5000 terms used in a Site collection there would be an exception? That can’t be right!
Okay, let’s test it. I put 5001 terms into my TermSet, and created 5001 ListItem‘s that used those terms (1 each), and checked what happened.
Everything still worked. I checked the TaxonomyHiddenList again:
The event receivers that populate the TaxonomyHiddenList (the one that’s added to a list if you add a ‘Managed Metadata’ field to a list) clearly also create the indexes when the 5000 item count is reached.
Okay, so there are indexes (indices?) – how would I query them? Well, I started by trying the following query on IdForTermSet and IdForTerm. This was a mistake.
There are more than 5000 items in the term set, so querying against IdForTermSet caused a query throttling exception. Instead, I queried for just the term ID:
And that worked, despite the over 5000 terms, and I could use that ListItem‘s integer ID to query my other list using the LookupId, and therefore using that list’s index.