Chunking Large Queries with Iterators in PHP
When executing large queries it's usually best not to load the whole result set in one go. Memory isn't infinite and PHP isn't renowned for handling it very well. So the obvious answer is to chunk the large query in to lots of smaller queries. This is a simple method I use for hiding the fact the query is being chunked behind an iterator.
We want something to which we can just provide a PDO object, an SQL query and the chunk size. We should then be able to iterate over the resulting object as though it were a single result set. Check out the code for ChunkedQueryIterator.
$sql = "SELECT * FROM test LIMIT :offset, :limit";
$limit = 1000;
$result = new ChunkedQueryIterator($pdo, $sql, $limit);
foreach ($result as $row) {
// do stuff
}
This works well but the interface is a little clunky. What would be really nice is if we could execute chunked queries directly from our PDO object, like in the example below. We can achieve this with another common design pattern, the decorator. We only need to add the one method used above, here's the source for ChunkedQueryDecorator.
$pdo = new ChunkedQueryDecorator($pdo);
foreach ($pdo->chunkedQuery($sql, $limit) as $row) {
// do stuff
}
How do you handle large database queries? Let me know in the comments...
Extending Lucene’s Scoring
Lucene's tf-idf scoring algorithm is fast and effective and is undeniably one of the features that has made Lucene the most popular text search library around today. Not only does it provide really effective text ranking but it also allows us to provide boosts to different parts of the process. We can boost documents, fields and even query components. This is great when we know that particular documents or fields are more important than others at index time, with premium results or a title field for example. And boosting query components can be even more powerful. However, sometimes we need even more.
Sometimes we need to affect a document's score with an external variable. Take blog or news search for example, we want the documents to be scored by relevancy, obviously, however we would also like a document's age have an effect. It isn't really possible to achieve this with just boosts, so what can we do?
As of Lucene 2.2 there's been a little documented package that can be really helpful here. org.apache.lucene.search.function allows us to build queries that affect the score of a document in ways that we define. By the end of this post we're going to have a simple score modifier that takes a document's age into account.
Jiles van Gurp wrote a great post a few months back about using FieldScoreQuery and CustomScoreQuery to bring a documents age into play. FieldScoreQuery is used to interpret a field as a float and use it to derive a score. We can create a simple 'agescore' field of the form "0.yyyyMMddhhmm" and use the FieldScoreQuery to give newer documents a higher score.
TermQuery termQuery = new TermQuery(new Term("title", "foo"));
FieldScoreQuery scoreQuery = new FieldScoreQuery("agescore", FieldScoreQuery.Type.FLOAT);
Query query = new CustomScoreQuery(termQuery, scoreQuery);
In the example above we combine a simple TermQuery with our FieldScoreQuery using the CustomScoreQuery.
Although this model works well it's not perfect. The difference between a document that's a week old and one that's two weeks old should not be the same as the difference between a document that's a year old and one that's a year and a week old. We could improve on this model by basing the derived score on the log of the document's age. To achieve this we need to dive a little deeper into the function package. FieldScoreQuery is based on the class ValueSourceQuery, this class gets values for documents using a ValueSource. In the example below we create a custom ValueSource similar to the ones used by FieldScoreQuery, however, rather than just returning the value we return a weight based on the reciprocal of the log of the documents age.
public class AgeFieldSource extends FieldCacheSource {
private int now;
public AgeFieldSource(String field) {
super(field);
now = (int)(System.currentTimeMillis() / 1000);
}
@Override
public boolean cachedFieldSourceEquals(FieldCacheSource other) {
return other.getClass() == MyFieldSource.class;
}
@Override
public int cachedFieldSourceHashCode() {
return Integer.class.hashCode();
}
@Override
public DocValues getCachedFieldValues(FieldCache cache, String field, IndexReader reader) throws IOException {
int[] times = cache.getInts(reader, field);
float[] weights = new float[times.length];
for (int i=0; i<times.length; i++) {
// Here be the nuts and bolts
weights[i] = new Double(1/Math.log((now - times[i]) / 3600)).floatValue();
}
final float[] arr = weights;
return new DocValues() {
public float floatVal(int doc) {
return (float) arr[doc];
}
public int intVal(int doc) {
return (int) arr[doc];
}
public String toString(int doc) {
return description() + '=' + intVal(doc);
}
};
}
}
And the example below shows how we can use this class.
TermQuery termQuery = new TermQuery(new Term("title", "foo"));
ValueSourceQuery valueQuery = new ValueSourceQuery(new AgeFieldSource("created"));
CustomScoreQuery query = new CustomScoreQuery(termQuery, valueQuery);
This model seems to work pretty well but your mileage may vary. I'd love to head other peoples experiences with this package and thoughts on this solution in the comments.
Effective Method Naming
It sounds like a no brainer doesn’t it? I don’t need lessons on how to name methods, I’ve much more important stuff to think about, like actually coding! I know how you feel, I’m guilty of it myself. However I’ve found that the names given to methods in an application do have a profound effect on it’s readability, and ultimately it’s maintainability.
In an effort to keep my method naming standards up to scratch I’ve come up with three rules. They aren’t the be all and end all of method naming and there are exceptions but as general rules they seem to work pretty well.
A method should do what it says it does and nothing else
The aim of this rule is twofold, to avoid confusion and to avoid surprise. Both of those come back to one point, the goal of every lazy developer, think as little as possible.
To avoid confusion the method name should be explicit, it should say exactly what it’s method is going to do, not a vague impression. It should also not contain abbreviations unless they’re universally understood; qFetch() is much less obvious than fetchQueueItem()
To avoid surprise the method should not do things that aren’t stated or implied by the method name. This applies even to larger methods that do a lot, you don’t need to name the method after every little thing that’s done but name it after the whole process. If you’re pulling together lots of smaller methods don’t just name the method doStuff(), give it a name that describes the reason they’ve been pulled together. If you can’t think of one, maybe they should be pulled together in a different way.
A method name should not be longer than it has to be
This is the flip side of the previous rule. We want the method name to be as clear and explicit as possible but at the same time, we don’t want to be inundated with unmanageably long method names. A method name should describe its function but not define it. I think the old database design couplet describes the relationship quite well. Normalize ‘till it hurts, de-normalize ‘till it works.
Method names should follow an application wide standard
This is a rule I’ve seen broken a lot. Never by myself though, because my convention is the clearest, the most universally understood and of course, the most bestest. Yes, conventions are always a tricky (religious) area but they are important. It’s not important whether you fetch stuff from the database or retrieve it, so long as you do the same everywhere. And when I say everywhere I don’t just mean throughout this class, or package, or component, I mean throughout the entire application.
I’ve found these three rules really helpful when naming methods, especially those exposed in external interfaces. Before you make any changes though, remember Martin Fowler’s work on refactoring.
Playing with Carrot2 Clustering in PHP
The Carrot2 clustering engine has been on my radar for a couple of months now. It calls itself a 'search results clustering engine' which means that provided with a set of search results (titles and snippets) it will give back that same set grouped into clusters. In this post I'm going to show you how you can use Carrot2 and PHP to cluster your search results.

You can see a demo of Carrot2 running on their site which clusters results from a variety of popular search engines. The online demo, however, isn't what interests me about this project, there are other, better, clustering search engines out there like Clusty. What I'm interested in is the fact that Carrot2 is an open source project which can be downloaded and run locally. For integrating with PHP it is probably easiest to use the Document Clustering Server available from the download page. The DCS exposes the interface over REST or XML-RPC. Although there is an example of using RPC with PHP, for this post I'm going to integrate using the REST interface.
Setting Up The Document Clustering Server
Before I get any further I should point out that you need to have a recent version of Java, I'm using Java 1.5. The first step to getting the DCS up and running is downloading it from the download page and unpacking it. The zip file doesn't contain a top level directory so you'll need to unpack it into one. Once you have it unpacked start up the server as directed in the readme.txt and test with the index.html found in the root directory.

I've knocked together a quick and dirty class for interacting with the REST interface to show you how easy Carrot2 is to work with. The example below is much like the Forage faceting example. A bunch of feeds are downloaded and fed into Carrot2, then we show a summary of the first 5 clusters.
<?php
require 'Carrot2.class.php';
require 'Zend/Feed.php';
$urls = array(
'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/business/rss.xml',
'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/entertainment/rss.xml',
'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/education/rss.xml'
);
$carrot = Carrot2::createDefault();
foreach ($urls as $url) {
echo "Loading field: " . $url . "\n";
$feed = Zend_Feed::import($url);
foreach ($feed as $item) {
$carrot->addDocument(
(string)$item->link(),
(string)$item->title(),
(string)$item->description()
);
}
}
$i=0;
foreach ($carrot->clusterQuery() as $cluster) {
if ($i++>5) break;
echo $cluster . "\n";
}
?>
Gives:
Loading field: http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/business/rss.xml
Loading field: http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/entertainment/rss.xml
Loading field: http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/education/rss.xml
Schools (9)
Head Teachers (6)
Funds (3)
Language (2)
Gold Hits (2)
Savings Scheme (2)
It seems to work really well for these feeds. I have found that it doesn't always work so well. If you're dealing with small result sets or sets with too much diversity Carrot2 won't be able to extract clusters and you'll end up with a few clusters with only one item in and loads of documents in the '(other)' group.
Forage 0.1 Released – Offline Wikipedia Example
Today the first release of Forage is out. It comes with core support for Solr, Xapian and ZSL and faceting support for Solr. Faceting support for Xapian will hopefully be arriving with the release of Xapian 1.1 in the near future. You can download Forage here. In this post I'm going to walk through the main example found in the Forage source code, offline Wikipedia.
This example is a bit more involved than the previous examples as it shows the need for using backends other than ZSL when dealing with large sets of data. To run these examples you're going to need between 15Gb and 30Gb of disk space, the uncompressed wikipedia download alone is 14Gb. You're also going to need a computer which you can leave chugging away for a few hours at a time, we're dealing with some pretty serious amounts of data here.
The Wikipedia Example
The example I'm going to walk through today is to produce an offline, vertical Wikipedia search engine for a particular field. You could do a full Wikipedia search engine but that would take quite a bit more time and space. As you may know, I'm also quite interested in food and cooking so my vertical is going to be food related, but we'll come to that later.
Step 1. Getting Wikipedia
The first step in indexing Wikipedia is to download it's entire content. Go to the Wikipedia data dump download page and have a look at what's there, the one we'll be using is enwiki-latest-pages-articles.xml.bz2. This file contains all Wikipedia article and redirect pages in an easy to parse XML format. This is a big file (the one I downloaded was 3.2Gb) so don't even try to download it through your browser, use something like wget, also make sure the tool you use has the ability to restart downloads, you don't want to get 80% through the download and then have to restart. With wget it's as easy shown below.
rob@home:~/wikipedia# wget -c http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
This will take a while, even if you have a fast connection Wikipedia won't serve it up at lightning speeds, I left it over night and it was done in the morning.
Once you have the XML dump of Wikipedia you need to unpack it and prepare it. It is compressed with bzip2 so you will need to do unpack it, which again, is as easy as calling bunzip2.
rob@home:~/wikipedia# bunzip enwiki-latest-pages-articles.xml.bz2
This will take a while and hammer the CPU so go make a cup of tea and read the paper for a few minutes...
Once it has unpacked you will have one enormous XML file (about 14Gb). Unless you have configured PHP specifically it will not be able to handle files this big (if you have then you probably know about it so you can skip to the next section). The easiest solution to this problem is to just chop the file up into lots of smaller files, UNIX provides a very good tool for this, split. We need files less than 1Gb so let's split it into 900Mb chunks. The following command will give us 15-16 files named enwiki-chunks-aa, enwiki-chunks-ab, etc.
rob@home:~/wikipedia# split --bytes=900m enwiki-latest-pages-articles.xml enwiki-chunks-
Step 2. Preparing The Data
Now that we have our Wikipedia data in a form that PHP can read we can move to the Forage install directory and start playing with the example. Change directory to the examples/wikipedia directory of your Forage install. First we want to filter down the full Wikipedia dump to our vertical, this is done with the writer.php script. If you want to index Wikipedia in it's entirety you can skip straight to the Indexing section This script will parse the split Wikipedia dump extracting articles by their category and inserting those which match your patterns into a stripped down xml file. You can get a brief manual of this script with the following command.
rob@home:~/forage/examples/wikipedia# php writer.php --help
For this script to work you need to have a categories.lst file containing PCRE regular expressions, one per line. The categories.lst I'm using is in the Forage release and looks like this.
#food#i #cook#i #recipe#i #meat#i #vegetable#i #poultry#i
So, for me to run this script with verbose output and write the output to the same path as the wikipedia dump I would use the following command. Which will give you a much more manageable XML file containing just the data we're going to index on a subset of the articles in the main dump. This will take a seriously long time, it has to parse over 4 million documents, extract their categories and match them against the set of regular expressions before deciding whether or not it should go into the vertical search. When I ran it on my laptop it took almost 9 hours!
rob@home:~/forage/examples/wikipedia# php writer.php --verbose --categories=./categories.lst --target=~/wikipedia/vertical-food.xml ~/wikipedia/enwiki-chunks*
Step 4: Indexing
At this point we have a stripped down XML document containing Wikipedia articles for our chosen vertical. Next step is to get them into an index so that we can search over them. For this example I'm going to use Solr because it is the only backend to support faceting at the moment. I will not discuss the installation and configuration of Solr beyond providing the required schema.xml. To run the indexer we need to use the provided parser which works the same way as writer.php but rather than writing to another XML file it writes to an index through Forage. The following command will index our newly created food vertical with Solr (we have to use source='mini' because writer.php uses a stripped XML format, if we were indexing straight from the raw Wikipedia dump we would use source='full').
rob@home:~/forage/examples/wikipedia# php parser.php --verbose --index='solr:127.0.0.1:8080' --source='mini' ~/wikipedia/vertical-food.xml
This is pretty damned quick actually, on my little laptop again, it managed to index almost 8000 documents just under 4 minutes, that works out at about 35 documents per second.
Step 5: Searching
This tutorial wouldn't be much if it didn't get on to searching so here we are, we have extracted almost 8000 articles from Wikipedia which we are interested in and we have indexed them, via Forage, in Solr, now it's time to search over them. Let's start off with a simple query for biscuits.
rob@home:~/forage/examples# php searcher.php --index='solr:127.0.0.1:8080' --limit=5 biscuit Broken biscuits (2.7441695) Bourbon biscuit (2.553938) Empire biscuit (2.4821944) Malted milk (biscuit) (2.3644874) Category:United Biscuits brands (2.3402355)
Comes back in no time at all! There are a couple of other features shown in the examples, let's take a look at them now. We can sort the results by something other than the score, in the Solr schema.xml we've got a special field to help sorting by title called sort_title.
rob@home:~/forage/examples# php searcher.php --index='solr:127.0.0.1:8080' --limit=10 --sortasc=sort_title biscuit 2007 pet food recalls (0.24132447) 5-in-1 ration (0.5515988) AFC Enterprises (0.6894985) ANZAC biscuit (2.1496434) Alfajor (0.8359725)
Finally we're going to take a look at the faceting example. With the Wikipedia data we're faceting on the category that appears at the bottom of all Wikipedia pages using the example script faceting.php. Let's look at a quick default example with our search term, biscuit. Note that this example doesn't output the actual document titles because it would clutter the output but they are still available in the ForageResponse object as in all the other examples.
rob@home:~/forage/examples# php faceting --index='solr:127.0.0.1:8080' biscuit Querying for: biscuit Total: 262 Cookies (38) Brand name cookies (27) British snack foods (23) Food manufacturers of the United Kingdom (14) Filtering on 'Cookies' Total: 38 Christmas food (4) Scottish cuisine (3) Australian snack foods (2) Brand name cookies (2) Filtering on 'Christmas food' and 'Cookies' Total: 4 Austrian cuisine (1) Danish cuisine (1) Dutch cuisine (1) French desserts (1)
You can see here that the first search for 'biscuit' returned a total of 262 documents, and the top four facets are shown. We then take the first category and filter on that which reduces the result set to 38 documents. We then take the top category again and add it to our filter and we now only have 4 documents. A perfect example of how faceting can allow us to drill down through large number of results really quickly.
I hope you enjoyed this little walk through a real-world example of using Forage and it's features. This example is bundled with the release so, with a few minutes work and a few hours waiting for your computer you could have your own offline vertical Wikipedia search engine, and have learnt how to implement search on your site to boot!
Faceted Search With Forage
Today I added support for faceted search to Forage. Faceted search is a way of drilling down into search results by filtering on particular fields or categories. The example below is relatively simple in that it only has one facet, category, but there is no reason why you can't have multiple different 'facets'. This is, in fact, done quite often and very successfully in product searches.
And on to the example
This example is an extension of the previous example, only this time we'll be using one of the BBC's news feeds. The reason we're going to use BBC's new feeds this time is that they have an extra element in the feed, category. We're going to use this category as a facet field and then list the most popular categories in the feed.
<?php
require_once dirname(__FILE__) . '/../lib/Forage.php';
require_once 'Zend/Feed.php';
$feed = Zend_Feed::import('http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml');
// we're now using Solr as the engine, more about this later...
$forage = Forage::create('solr:127.0.0.1:8080');
foreach ($feed as $item) {
$document = new ForageDocument();
$document->add('title', (string)$item->title())
->add('link', (string)$item->link(), array('indexed'=>false))
->add('description', (string)$item->content(), array('stored'=>false))
// we mark the facet field as such at index time
->add('category', (string)$item->category(), array('facet'=>true));
$forage->add($document);
}
$forage->flush();
// create an empty query and tell it that we want to
// facet on 'category' before searching
$query = $forage->getQuery();
$query->setFacetFields(array('category'));
$response = $forage->search($query);
echo "Total Results: " . $response->getTotal() . "\n";
// get the category facet from the response
$facet = $response->getFacet('category');
$i=0;
// loop over the category values showing the top three
// along with the number of documents it apears in.
foreach ($facet->values as $value) {
if ($i++>3) {
break;
}
echo $value->value . " (" . $value->count . ")\n";
}
At the moment it's only supported by the Solr engine but Xapian will be adding faceting support in version 1.1. You can check the code out of subversion and more detailed documentation is in the wiki.
Introducing Forage – Search Abstraction for PHP
Recently I've been working on a search abstraction library for PHP called Forage. The idea is
to bring to search what we've had for relational databases for quite a while, abstraction. On Friday I put up a preview release with three
backends; Solr, Xapian and Zend Search Lucene. At the moment it has the bare minimum of features but there will be more soon. In this post
I'm going to talk a little about the motivation for the project and then walk through a short example.
So why do we need search abstraction?
The reasons for wanting an abstraction library for search are pretty much the same as for databases. Ease of integration and resilience to change.
Ease of integration
If you have one interface which provides access to multiple backends then a framework (or other application) can use this interface and then allow
the user to choose which backend to use depending on their needs and abilities. It also allows the users of the framework to scale their solutions
as they grow, this is really the second point though.
Resilience to change
If you have one interface which provides access to multiple backends then once you've implemented your solution you can change the backend if you
need to. With relational databases this is rarely done but with search, certainly in PHP at the moment, there is a bit more of a need for it. Let's
say you have a small site which does something cool. You need a search solution up and running very quickly without rocking the boat too much so
you use ZSL and it works very well. However, your site starts to get more popular (as sites which do cool things do) and it starts to creak,
you decide you need to scale up to a more capable solution such as Solr. If you're not using an abstraction layer, at this point you have to
re-implement your search module. With Forage you just need to set up your Solr server and change the DSN from 'zsl:/path/to/index' to 'solr:host:port/path'
and re-index. Job done!
Enough talk, let's play!
To show you how easy it is implementing search with Forage let's run through a little example. For this example I'm going to index some data
out of an RSS feed. I'll be using Zend_Feed from the Zend Framework and for the backend
to Forage I'm going to use Xapian. I'm just going to index all the items and then run a search over the index.
<?php
require_once 'Zend/Feed.php';
require_once 'Forage/Forage.php';
// import the feed
$feed = Zend_Feed::import('http://rss.slashdot.org/Slashdot/slashdot');
// initialise forage
$forage = Forage::create('xapian:/var/xapian/slashdot');
// iterate over the feed items
foreach ($feed as $item) {
// create a new document
$document = new ForageDocument();
// add some fields to it
$document->add('title', (string)$item->title()) // will be both indexed and stored
// won't be indexed but will be stored
->add('link', (string)$item->link(), array('indexed'=>false))
// will be indexed but won't be stored
->add('description', (string)$item->content(), array('stored'=>false);
// add the document to the index
$forage->add($document);
}
// flush the changes to the index
$forage->flush();
// search over the index
$results = $forage->search('yahoo microsoft');
foreach ($results as $document) {
echo $document['title'] . "\n";
}
That's not bad is it? A feed indexing program in under 70 lines of code. If you're interested then get over to the
Forage download page and give it a whirl, and if you can, get involved.
Playing with Project Euler and PHP – Part Two
This is the second instalment of me playing with Project Euler and PHP, you can check out problem one
here. In this post I'll be using the same format as before, three different
solutions to the same problem, a simple loop oriented one, a loopless one and an over engineered, object oriented one.
As you can see from the Problem link above, in the Fibonacci sequence, the next number is the sum of previous two numbers. This is very easy to do with
a loop, we just provide it with the first two numbers and continue with the loop until the maximum is reached, keeping track of the sum of even terms as
we go. One piece of code which may not be familiar here is the variable swapping, to avoid using a temporary variable I have chosen to use
list and array.
<?php
$target = 1000000;
$sum = 0;
for ($one=1, $two=2; $two<$target;) {
list($one, $two) = array($two, $one+$two);
if ($one%2==0) {
$sum += $one;
}
}
echo $sum . "\n";
?>
The second solution to this problem is the loopless one. This is actually one that a lot of you will be familiar with as it's one of those
sequences which is often used as an example of recursion; exactly what I'm going to use it for here. It's essentialy the same process as in the first
solution. I have used the same variable names to make it easier to see what it's doing.
<?php
function fibonacci($one, $two, $target, $sum=0)
{
if ($two>=$target) {
return $sum;
}
if ($two%2==0) {
$sum += $target;
}
return fibonacci($two, $one+$two, $target, $sum);
}
echo fibonacci(1, 2, 1000000) . "\n";
?>
The third solution is the over engineered one, for this solution I'm going to use some of the same classes and interfaces used for the first problem.
Specifically I'll be using the FactorCondition and SumAction classes and the
Action and Condition interfaces. This problem doesn't lend itself to an
OO solution as well as problem one, so I'm going to have to handle this slightly differently. The executor
is going to accept two actions, one to sum values (as in the first problem) and one to encapsulate the Fibonacci sequence; and two conditions, one
to control when to stop and one to control the Fibonacci action and two conditions, one to control the fibonacci action. Let's get down to
implementing the components. I will not show the code for previously implemented classes or interfaces.
<?php
class FibonacciAction implements Action
{
private $one = 1;
private $two = 2;
public function execute($value)
{
list($this->one, $this->two) = array($this->two, $this->one+$this->two);
}
public function result()
{
return $this->two;
}
}
class LimitCondition implements Condition
{
private $limit;
public function __construct($limit)
{
$this->limit = $limit;
}
public function test($value)
{
return $this->limit>$value;
}
}
?>
Once we have all the components we need to develop the executor. This executor is a little more flexible than the one used for project one as it
seperates the stop condition from the action condition and seperates the two actions out. Hopefully this model will be more useful with later problems.
<?php
class FibonacciAction implements Action
{
private $one = 1;
private $two = 2;
public function execute($value)
{
list($this->one, $this->two) = array($this->two, $this->one+$this->two);
}
public function result()
{
return $this->two;
}
}
class LimitCondition extends Condition
{
private $limit;
public function __condition($limit)
{
$this->limit = $limit;
}
public function test($value)
{
return $this->limit>$value;
}
}
?>
Finally all there is to do is tie everything together and execute it.
<?php $executor = new Executor(); $executor->setStopper(new LimitCondition(1000000)); $executor->setCondition(new FactorCondition(2)); $executor->setAction(new SumAction()); $executor->setAlgorithm(new FibonacciAction()); echo $executor->execute() . "\n"; ?>
So there we have it, three more solutions to one of the Euler Project's maths problems.
Playing with Project Euler and PHP
I recently discovered Project Euler and fell in love. It is absolutely brilliant, concise, interesting and often fiendishly difficult mathematical problems which can be solved programatically. A little later I heard about Jorge Ortiz's post using the Project Euler problems to introduce functional programming techniques in Scala, which got me wondering how many different ways I could solve some of these problems with PHP.
So far I've tackled the problems in three different ways, a traditional PHPesque way, probably the most familiar to most PHP developers; a loopless way, using functional programming paradigms; and finally an absurdly over-engineered solution.
Problem 1: Add all the natural numbers below 1000 that are multiples of 3 or 5.
This is a pretty easy problem all round, in a traditional 'loopy' way it might look like this. We just loop through all the numbers adding those which are factors of 3 or five to the sum.
<?php
$sum = 0;
for ($i=1; $i<100000; $i++) {
if ($i%3==0 || $i%5==0) {
$sum+=$i;
}
}
echo $sum . "\n";
Doing this without loops, although not immediately obvious is also quite easy to understand. We create a range, fitler out any elements which aren't factors of 3 or 5 and sum those which are left.
<?php
$sum = array_sum(
array_filter(
range(1, 99999),
create_function('$i', 'return $i%3==0 || $i%5==0;')
)
);
echo $sum . "\n";
[php]
<p>Here's another loopless solution using array reduce and a function which keeps track of the sum.</p>
[php]<?php
$sum = array_reduce(
range(1, 99999),
create_function('$sum, $i', 'return ($i%3==0 || $i%5==0) ? $sum+$i : $sum;')
);
echo $sum . "\n";
OK, now that we've done two sensible (and hopefully interesting ones) I think it's time for something a little tounge-in-cheek. Yes, it's time for the over-engineered solution. This solution is based around three interfaces, Iterator (from the SPL), Condition and Action.
<?php
interface Condition
{
public function test($value);
}
interface Action
{
public function apply($value);
public function result();
}
So, first we need a basic Iterator to generate our numbers (from 1 to 100000 in increments of 1).
<?php
class LinearGenerator implements Iterator
{
private $start, $end, $increment;
private $current;
public function __construct($start, $end, $increment)
{
$this->start = $start;
$this->end = $end;
$this->increment = $increment;
}
public function current()
{
return $this->current;
}
public function key()
{
return $this->current;
}
public function next()
{
$this->current += $this->increment;
}
public function rewind()
{
$this->current = $this->start;
}
public function valid()
{
return isset($this->current) && $this->current <= $this->end;
}
}
Next we need to sort out our conditions. This is a little more complex as we need two different types of condition, one for testing whether a number is a multiple of of another number and one for tying together two other conditions in an 'OR'.
<?php
class FactorCondition implements Condition
{
private $factor;
public function __construct($factor)
{
$this->factor = $factor;
}
public function test($value)
{
return $value % $this->factor == 0;
}
}
class OrCondition implements Condition
{
private $conditions = array();
public function add(Condition $condition)
{
$this->conditions[] = $condition;
}
public function test($value)
{
foreach ($this->conditions as $condition) {
if ($condition->test($value)) {
return true;
}
}
return false;
}
}
Almost there now, next we need the component which will keep track of the sum so far, the Action.
<?php
class SumAction implements Action
{
private $sum = 0;
public function apply($value)
{
$this->sum += $value;
}
public function result()
{
return $this->sum;
}
}
Finally we need something to pull it all together, the Executor.
<?php
class Executor
{
private $generator, $condition, $action;
public function setGenerator(Iterator $generator)
{
$this->generator = $generator;
}
public function setCondition(Condition $condition)
{
$this->condition = $condition;
}
public function setAction(Action $action)
{
$this->action = $action;
}
public function execute()
{
foreach ($this->generator as $value) {
if ($this->condition->test($value)) {
$this->action->apply($value);
}
}
return $this->action->result();
}
}
Now that we've put together our enterprise-ready, future-proof, dynamic solution; let's use it.
<?php $executor = new Executor(); $executor->setGenerator(new LinearGenerator(1, 1000, 1)); $condition = new OrCondition(); $condition->add(new FactorCondition(3)); $condition->add(new FactorCondition(5)); $executor->setCondition($condition); $executor->setAction(new SumAction()); echo $executor->execute();
Have you got any other solutions to the problem?










