Extending Lucene's Scoring

21st November 2008

Lucene's tf-idf scoring algorithm is fast and effective, and it is undeniably one of the features that have made Lucene the most popular text search library around today. Not only does it provide effective text ranking, it also lets us apply boosts at different stages of the process: we can boost documents, fields and even query components. Index-time boosts are great when we know that particular documents or fields are more important than others, such as premium results or a title field. Boosting query components can be even more powerful. However, sometimes we need even more.


Effective Method Naming

30th October 2008

It sounds like a no-brainer, doesn't it? I don't need lessons on how to name methods; I've much more important stuff to think about, like actually coding! I know how you feel, and I'm guilty of it myself. However, I've found that the names given to methods in an application have a profound effect on its readability and, ultimately, its maintainability.


I've just made a Yahoo! Pipe that pulls together a bunch of search-related feeds that I read. Check it out at planet search pipe.


The Carrot2 clustering engine has been on my radar for a couple of months now. It calls itself a 'search results clustering engine', which means that, given a set of search results (titles and snippets), it will give back that same set grouped into clusters. In this post I'm going to show you how you can use Carrot2 and PHP to cluster your search results.


Today the first release of Forage is out. It comes with core support for Solr, Xapian and ZSL, and faceting support for Solr. Faceting support for Xapian will hopefully arrive with the release of Xapian 1.1 in the near future. You can download Forage here. In this post I'm going to walk through the main example found in the Forage source code: offline Wikipedia.

