Introducing Forage - Search Abstraction for PHP
06th February 2008
Recently I've been working on a search abstraction library for PHP called Forage. The idea is to bring to search what we've had for relational databases for quite a while: abstraction. On Friday I put up a preview release with three backends: Solr, Xapian and Zend Search Lucene. At the moment it has the bare minimum of features, but there will be more soon. In this post I'm going to talk a little about the motivation for the project and then walk through a short example.
So why do we need search abstraction?
The reasons for wanting an abstraction library for search are pretty much the same as for databases: ease of integration and resilience to change.
Ease of integration
If you have one interface which provides access to multiple backends, then a framework (or other application) can use this interface and let the user choose which backend to use depending on their needs and abilities. It also allows the users of the framework to scale their solutions as they grow, although this is really the second point.
Resilience to change
If you have one interface which provides access to multiple backends, then once you've implemented your solution you can change the backend if you need to. With relational databases this is rarely done, but with search, certainly in PHP at the moment, there is a bit more of a need for it. Let's say you have a small site which does something cool. You need a search solution up and running very quickly without rocking the boat too much, so you use Zend Search Lucene (ZSL) and it works very well. However, your site starts to get more popular (as sites which do cool things do) and the search starts to creak, so you decide you need to scale up to a more capable solution such as Solr. If you're not using an abstraction layer, at this point you have to re-implement your search module. With Forage you just need to set up your Solr server, change the DSN from 'zsl:/path/to/index' to 'solr:host:port/path' and re-index. Job done!
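To make the DSN idea a bit more concrete, here is a rough, stand-alone sketch of how a DSN string can carry the choice of backend. This is not Forage's actual code: splitDsn() is a made-up helper for illustration, and I'm assuming that everything before the first colon names the backend and everything after it is the backend-specific location.

```php
<?php
// Hypothetical helper, NOT part of Forage: split a DSN into the backend name
// (text before the first colon) and the backend-specific location (the rest).
function splitDsn($dsn)
{
    // Limit of 2 so that later colons (e.g. in host:port) stay in the location.
    $parts = explode(':', $dsn, 2);
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
        throw new InvalidArgumentException("Malformed DSN: $dsn");
    }
    return array('backend' => $parts[0], 'location' => $parts[1]);
}

// The two DSNs from the scenario above.
$before = splitDsn('zsl:/path/to/index');
$after  = splitDsn('solr:host:port/path');

printf("%s at %s\n", $before['backend'], $before['location']);
printf("%s at %s\n", $after['backend'], $after['location']);
```

The point is that the application only ever deals with one string, so swapping the backend is a configuration change rather than a code change.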
Enough talk, let's play!
To show you how easy it is to implement search with Forage, let's run through a little example. For this example I'm going to index some data out of an RSS feed. I'll be using Zend_Feed from the Zend Framework, and for the backend to Forage I'm going to use Xapian. I'm just going to index all the items and then run a search over the index.
<?php
require_once 'Zend/Feed.php';
require_once 'Forage/Forage.php';

// import the feed
$feed = Zend_Feed::import('http://rss.slashdot.org/Slashdot/slashdot');

// initialise forage
$forage = Forage::create('xapian:/var/xapian/slashdot');

// iterate over the feed items
foreach ($feed as $item) {
    // create a new document
    $document = new ForageDocument();

    // add some fields to it
    $document
        // will be both indexed and stored
        ->add('title', (string)$item->title())
        // won't be indexed but will be stored
        ->add('link', (string)$item->link(), array('indexed' => false))
        // will be indexed but won't be stored
        ->add('description', (string)$item->content(), array('stored' => false));

    // add the document to the index
    $forage->add($document);
}

// flush the changes to the index
$forage->flush();

// search over the index
$results = $forage->search('yahoo microsoft');
foreach ($results as $document) {
    echo $document['title'] . "\n";
}
?>
That's not bad, is it? A feed indexing program in under 70 lines of code. If you're interested, get over to the Forage download page and give it a whirl, and if you can, get involved.
