Rob's Blog Coding, cooking and, well, not much else

6Oct/098

Chunking Large Queries with Iterators in PHP

When executing large queries it's usually best not to load the whole result set in one go. Memory isn't infinite and PHP isn't renowned for handling it very well. So the obvious answer is to chunk the large query in to lots of smaller queries. This is a simple method I use for hiding the fact the query is being chunked behind an iterator.

We want something to which we can just provide a PDO object, an SQL query and the chunk size. We should then be able to iterate over the resulting object as though it were a single result set. Check out the code for ChunkedQueryIterator.

$sql    = "SELECT * FROM test LIMIT :offset, :limit";
$limit  = 1000;
$result = new ChunkedQueryIterator($pdo, $sql, $limit);
foreach ($result as $row) {
  // do stuff
}

This works well but the interface is a little clunky. What would be really nice is if we could execute chunked queries directly from our PDO object, like in the example below. We can achieve this with another common design pattern, the decorator. We only need to add the one method used above, here's the source for ChunkedQueryDecorator.

$pdo = new ChunkedQueryDecorator($pdo);
foreach ($pdo->chunkedQuery($sql, $limit) as $row) {
  // do stuff
}

How do you handle large database queries? Let me know in the comments...

Comments (8) Trackbacks (2)
  1. should the offset not be a modifiable parameter? and what if you don’t want to set a limit and just retrieve all?

  2. Very good point, you may well want to set the offset. I can think of one use case off the bat; when resuming a stopped execution. With regard to not setting the limit, surely you just wouldn’t use the chunked query. If you’re suggesting adding limit and offset to as parameters of PDO::query(), then I disagree. I think in this case it’s much better to be explicit.

  3. Good idea, this is what SPL is for :)

    I’d probably add a little factory (to get rid of the new in a separate line) and add a default value for $limit.

  4. The Decorator and the Iterator are a nice example of good object-oriented programming in php (although adding some docblocks can help). Good work.

  5. I never use “SELECT * FROM”. I always specify exactly which fields I am going to need. This way I don’t have unused fields in my query, which depending on the table could account for a lot of wasted space.

  6. @Pawel Good point, I almost always work in chunks of 1000 so why not make that a default. With regard to the factory, the PDO decorator hides the instantiation. If you found yourself using a series of decorators on the PDO object it might be useful to wrap that up in a factory.

    @Giorgio I know, I’m being a bit lazy but it is just an example.

    @derby Yes, very true. I think it’s OK for an example though.

  7. Two things I have to say:
    1. Why do you prefer making multiple(costly) database calls and not use the result set from one query? The memory problem? You’re apache or sql server will fall before memory is even half full if you do something like that.

    2. And where in the hell are you using 1000 results in a page?? Ok you have a big table but you show that data in a paginated page with limits.. so no need of those expensive iterators..

    Sorry but I just don’t get where or in what situation will you extract and show 1000 results.. and believe me I worked on a lot of projects from ecommerce, vod and online communities and never encountered such a situation.

  8. Sorin, you’re absolutely right, doing this sort of query in a web page would be just plain silly. However, I quite often have to do this kind of thing in offline scripts; migrating data from one system to another, taking items from the index queue for our search server or processing log data. There are a lot times that data has to be accessed and processed outside of a web page.


Leave a comment