Dealing with Large Lists Part 2 – Coping with large queries

In Part 1 we looked at what throttling was, why queries could be throttled, the settings that control throttling, and some of the references to other useful information. In this post we’ll look at ways of trying to get your query to run anyway, depsite the throttle. “Writing Efficient Code In SharePoint Server” is a good introduction to the general principles.

Option 1 – Use Indexed Fields.
SharePoint supports indices on Lists, and these make querying a list much more efficient. It’s a lot like querying an indexed column on a database.

Let’s take an example. Imagine we’ve got a list. That list contains columns for:

  • Colour (Red, Green, etc.)
  • Vehicle (Car, Train, Plane, etc.)

The list contains 5100 items. The ‘Colour’ column is Indexed, but the ‘Vehicle’ Column isn’t.

Note: Setting a field as being ‘indexed’ is something you’d do through the “List Settings” page > “Columns” Section > “Indexed Columns”.

If we ran:

SPQuery qry = new SPQuery();
qry.Query = "<Where><Eq><FieldRef Name='Vehicle'/><Value Type='Text'>Train</Value></Eq></Where>";
SPListItemCollection items = list.GetItems(qry);

… then we would receive an SPQueryThrottledException. This is because the ‘Vehicle’ field isn’t indexed, and so our query has to touch all 5100 items on the list. This is above our 5000 item threshold, and so we receive an exception.

However, if we ran…

SPQuery qry = new SPQuery();
qry.Query = "<Where><Eq><FieldRef Name='Colour'/><Value Type='Text'>Pink</Value></Eq></Where>";
SPListItemCollection items = list.GetItems(qry);

…then this query would probably work. This is because our process doesn’t have to touch all 5100 records. Rather, it can get the items matching our colour from the index. However, if there are still more than 5000 items returned – that is, there were more than 5000 matching items – we can still get an SPQueryThrottledException.

So this might go some way to mitigating the problems of large lists. But what if I want to query by a combination of fields? If I ran the query:

qry.Query = "<Where><And><Eq><FieldRef Name='Vehicle'/><Value Type='Text'>Train</Value></Eq><Eq><FieldRef Name='Colour'/><Value Type='Text'>Pink</Value></Eq></And></Where>";

… this would probably still work. Our process would get all ‘Pink’ items, and then it would scan those items for ‘Trains’.

But let’s return to the point above – say I had more than 5000 items that were Pink, and more that 5000 that were Trains. How could I get all ‘Pink Train’ items?

Well, I could make both columns indexed, but that doesn’t actually help. The indices aren’t matched in any way, so whichever one our process looks at first will generate an exception.

We could try setting up a compound index. SharePoint does allow this, although the options for the secondary field on the index do seem a bit limited. Not all field types are allowed as secondary columns.

In my case, I couldn’t set up an index for both of my fields. So, what else can you do? Well, if you’re comfortable duplicating data, you could have a third column that contains the values from the other two. I’ll call it ‘Description’, and it can have values like ‘Yellow Submarine’. I’ll index it, too. Then I can run queries like:

qry.Query = "<Where><Eq><FieldRef Name='Description'/><Value Type='Text'>Pink Submarine</Value></Eq></Where>";

… and they work (again, provided you get back less than 5000 items!). Obviously, you’ve got the overhead of populating this ‘Description’ column; you could do it with an event handler. I did try using a calculated column to generate those values, but unsurprisingly you can’t index a calculated column.

All of this continues to go some way to overcoming the issues of large lists – but what if indexing isn’t enough? What if you’re bound to have more than 5000 results?

Option 2 – Run your query during the Daily Time Window for Large Queries

The problem with large queries is that they can cause your system to be slow for other users. So why not set a time window to allow large queries when there is little else using the system? Well, it’s a good idea, so Microsoft built this into SharePoint.

You can set this in Central Administration:

This is quite a cheap and easy way of avoiding throttling issues – but only for the queries you can afford to run at odd times of the day. It’s ideally suited to timer jobs, though.

Option 3 – Page your results

The article “Handling Large Lists and Folders” does suggest that if you’re going to retrieve lots of items, you should paginate the list, and shows an example of doing that:

SPQuery qry = new SPQuery();
qry.Query = "<Where><Eq><FieldRef Name='Colour'/><Value Type='Text'>Pink</Value></Eq></Where>";
qry.Query = qry.Query + "<OrderBy UseIndexForOrderBy='TRUE' Override='TRUE' />";
qry.RowLimit = 200;
int page = 1;
int count = 0;
do {
    SPListItemCollection collListItems = list.GetItems(qry);
    foreach (SPListItem oListItem in collListItems)  {
        count++; //Do stuff with the item
    }
    qry.ListItemCollectionPosition = collListItems.ListItemCollectionPosition;
    page++;
} while (qry.ListItemCollectionPosition != null);

This example will work for situations where there are more than 5000 items that would be returned by the query. Instead of getting them all at once, we retrieve 200 at a time, and keep getting batches of 200 while there are more items to get!

Note the curious clause – this is the same as the value in the ContentIterator.ItemEnumerationOrderByNVPField constant. Also note that the Colour column is indexed. Without both these things being true, the query doesn’t work.

Option 4 – The Content Iterator

The Content Iterator is another way of getting items either in batches (as shown above) or as individual items, and it is worth a blog post of it’s own. See Part 3.

Advertisements
Dealing with Large Lists Part 2 – Coping with large queries

2 thoughts on “Dealing with Large Lists Part 2 – Coping with large queries

  1. clinton says:

    Hi Andrew,

    Thanks for your wonderful post on list throttling. I was getting a hard time from my company on the same issue and this helped me understand a lot.

    My problem is also the same but i am using Javascript Object Model JSOM to retrieve data from a large list and display in a user friendly manner. I have done all the index and the display results are fine if total result is less than 5000.

    I am unable to create compound index.

    I am trying to follow your [Option 3 – Page your results]. But unfortunately it is c# code. Is there any JS version of paging method where i can handle this scenario ?

    Your great suggestion is very much appreciated.

    Thanks,
    Clinton

    1. Hi Clinton,

      Sadly, no, I don’t think that there is. It’s a problem I’ve not been able to overcome, either. The best I can suggest is to actually use SharePoint’s Search service – but then you’ve latency to worry about (i.e. that changes to list items might not appear on the results that your web part gets for a while). But that’s what I’ve done in a recent project, and it does work pretty well – search latency accepted.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.