<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Captain Codeman&#187; search</title>
	<atom:link href="http://www.captaincodeman.com/tag/search/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.captaincodeman.com</link>
	<description>Software Developer</description>
	<lastBuildDate>Fri, 15 Jul 2011 22:50:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<item>
		<title>Running ElasticSearch as a Service on Windows 2008 x64</title>
		<link>http://www.captaincodeman.com/2011/05/20/elasticsearch-windows-service-2008-x64/</link>
		<comments>http://www.captaincodeman.com/2011/05/20/elasticsearch-windows-service-2008-x64/#comments</comments>
		<pubDate>Fri, 20 May 2011 15:57:14 +0000</pubDate>
		<dc:creator>Captain Codeman</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[elasticsearch]]></category>
		<category><![CDATA[full-text]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[procrun]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[windows]]></category>
		<category><![CDATA[x64]]></category>

		<guid isPermaLink="false">http://www.captaincodeman.com/2011/05/20/elasticsearch-windows-service-2008-x64/</guid>
		<description><![CDATA[How to run ElasticSearch as a Windows Service on Windows 2008 x64


]]></description>
			<content:encoded><![CDATA[<p>I think I first started using <a href="http://lucene.apache.org/java/docs/index.html">Apache Lucene</a> for full-text indexing as part of NHibernate Search. At some point I decided I needed more control and did my own indexing using Lucene directly. Now it seems the easiest approach is to use a packaged-up search service, so I’ve been looking at <a href="http://www.elasticsearch.org/">ElasticSearch</a>. So far, I’m very happy with it – it does everything it says on the box and lets me offload all the full-text indexing and search functionality.</p>
<p>The only issue I’ve come across is trying to run it as a service on 64-bit Windows 7 or Windows 2008. While there is a <a href="https://github.com/elasticsearch/elasticsearch-servicewrapper">service-wrapper</a> available, it just wasn’t working for me. I think the x64 platform may be part of the problem, as there was only an elasticsearch-windows-x86-<strong><span style="text-decoration: underline;">32</span></strong>.exe included and no elasticsearch-windows-x86-<strong><span style="text-decoration: underline;">64</span></strong>.exe. This service wrapper seems to be based on a product that doesn’t appear to have a free community edition for 64-bit Windows.</p>
<p>So, I had a hunt around for ‘how to run a Java app as a Windows Service’ and came across the <a href="http://commons.apache.org/daemon/procrun.html">Apache Commons Daemon</a>, or ‘<a href="http://commons.apache.org/daemon/procrun.html">procrun</a>’. This worked, so I thought I’d share it here in case anyone else is trying to do the same thing.</p>
<p>First of all, the prerequisites: it’s a Java app, so you need to have the Sun Java SDK installed and the JAVA_HOME environment variable set.</p>
<p><a href="http://www.elasticsearch.org/download">Download ElasticSearch</a> and extract it to a folder. I’m using 0.16.0 and put it into D:\elasticsearch (because Program Files and UAC caused too many issues for me).</p>
<p>Before trying to set it to run as a service it’s best to make sure it runs as a regular app first. To start ElasticSearch on Windows there is a “bin\elasticsearch.bat” file to launch it, which should show it running. As an extra check, there is a handy little web-based admin tool you can get called <a href="https://github.com/mobz/elasticsearch-head">elasticsearch-head</a> which will show the running status and provides a neat little browser / search interface. I extract this to D:\elasticsearch\tools. When you open the index.html file it lets you connect to your elasticsearch instance and shows its status:</p>
<p><a href="http://www.captaincodeman.com/wp-content/uploads/2011/05/created.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="created" src="http://www.captaincodeman.com/wp-content/uploads/2011/05/created_thumb.png" border="0" alt="created" width="640" height="409" /></a></p>
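<p>The same check can be made from code. The following is only a rough sketch, not part of the original setup: it assumes the node is listening on the default HTTP port 9200 on localhost, and the EsPing class name is made up for the example. A running node should answer the request with a short JSON document describing itself.</p>

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: asks a local ElasticSearch node for its root document.
public static class EsPing
{
    // Assumption: default HTTP port 9200 on the local machine.
    public const string DefaultUrl = "http://localhost:9200/";

    // Returns the response body, or null when the node cannot be reached in time.
    public static string TryGetStatus(string url)
    {
        using (var client = new HttpClient { Timeout = TimeSpan.FromSeconds(5) })
        {
            try
            {
                return client.GetStringAsync(url).GetAwaiter().GetResult();
            }
            catch (HttpRequestException)
            {
                return null; // connection refused, DNS failure, non-success status ...
            }
            catch (OperationCanceledException)
            {
                return null; // the request timed out
            }
        }
    }

    public static void Main()
    {
        string body = TryGetStatus(DefaultUrl);
        Console.WriteLine(body ?? "ElasticSearch did not respond on " + DefaultUrl);
    }
}
```

<p>If nothing comes back, check that the batch file is still running and that nothing else has taken the port.</p>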
<p>Downloading the Apache Commons Daemon or procrun is a little harder because it isn’t in the links on the <a href="http://commons.apache.org/daemon/download_daemon.cgi">download page</a>. Instead you need to follow the ‘browse native binaries download area…’ link, then look in the windows folder for the zip file. The file I used was: <a title="http://apache.skazkaforyou.com//commons/daemon/binaries/1.0.5/windows/commons-daemon-1.0.5-bin-windows.zip" href="http://apache.skazkaforyou.com//commons/daemon/binaries/1.0.5/windows/commons-daemon-1.0.5-bin-windows.zip">commons-daemon-1.0.5-bin-windows.zip</a></p>
<p>Extract this to D:\elasticsearch\service and then copy the amd64\prunsrv.exe to the D:\elasticsearch\service folder to replace the x86 version (or skip this step if you are actually running on a 32-bit OS).</p>
<p>Although we can set everything up with the exe files as they are, we’re going to rename them because it makes it clearer what is running in Windows Task Manager if you have other processes using this service runner. The convention is to use the service name and append a ‘w’ to the GUI manager exe so they become:</p>
<p>prunsrv.exe =&gt; ElasticSearch.exe<br />
prunmgr.exe =&gt; ElasticSearchw.exe</p>
<p>Because we’ll be running things as a service it will be running under a different account than the regular process does when we run it interactively. I used the ‘NETWORK SERVICE’ account which is able to handle network traffic and gave this account full permissions to the D:\elasticsearch folder so it will also be able to create data and log files.</p>
<p>Figuring out the command line to actually run the service is what took the longest. With a bit of trial and error and looking at the output from the batch file to launch elasticsearch I ended up with this which ‘works on my machine’. If it doesn’t work on yours try enabling the echo output from the batch file and checking the parameters are the same.</p>
<p>It’s easiest to put this into a create.cmd file so it can be edited and re-run:</p>
<pre>ElasticSearch.exe //IS//ElasticSearch --DisplayName="ElasticSearch" --Description="Distributed RESTful Full-Text Search Engine based on Lucene (http://www.elasticsearch.org/)" --Install=D:\elasticsearch\service\ElasticSearch.exe --Classpath="D:\elasticsearch\lib\elasticsearch-0.16.0.jar;D:\elasticsearch\lib\*;D:\elasticsearch\lib\sigar\*" --Jvm="C:\Program Files\Java\jre6\bin\server\jvm.dll" --JvmMx=512 --JvmOptions="-Xms256m;-Xmx1g;-XX:+UseCompressedOops;-Xss128k;-XX:+UseParNewGC;-XX:+UseConcMarkSweepGC;-XX:+CMSParallelRemarkEnabled;-XX:SurvivorRatio=8;-XX:MaxTenuringThreshold=1;-XX:CMSInitiatingOccupancyFraction=75;-XX:+UseCMSInitiatingOccupancyOnly;-XX:+HeapDumpOnOutOfMemoryError;-Djline.enabled=false;-Delasticsearch;-Des-foreground=yes;-Des.path.home=D:\elasticsearch" --StartMode=jvm --StartClass=org.elasticsearch.bootstrap.Bootstrap --StartMethod=main --StartParams="" --StopMode=jvm --StopClass=org.elasticsearch.bootstrap.Bootstrap --StopMethod=main --StdOutput=auto --StdError=auto --LogLevel=Debug --LogPath="D:\elasticsearch\logs" --LogPrefix=service --ServiceUser="NT AUTHORITY\NetworkService" --Startup=auto</pre>
<p>Phew !</p>
<p>Running that should create the service, and running ElasticSearchw.exe should now pop up a GUI that lets us view and edit all the settings. The various tabs are shown below and should correspond to the settings defined above:</p>
<p><a href="http://www.captaincodeman.com/wp-content/uploads/2011/05/1-general.png"><img style="background-image: none; margin: 10px; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="1-general" src="http://www.captaincodeman.com/wp-content/uploads/2011/05/1-general_thumb.png" border="0" alt="1-general" width="240" height="230" /></a> <a href="http://www.captaincodeman.com/wp-content/uploads/2011/05/2-logon.png"><img style="background-image: none; margin: 10px; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="2-logon" src="http://www.captaincodeman.com/wp-content/uploads/2011/05/2-logon_thumb.png" border="0" alt="2-logon" width="240" height="230" /></a></p>
<p><a href="http://www.captaincodeman.com/wp-content/uploads/2011/05/3-logging.png"><img style="background-image: none; margin: 10px; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="3-logging" src="http://www.captaincodeman.com/wp-content/uploads/2011/05/3-logging_thumb.png" border="0" alt="3-logging" width="240" height="230" /></a> <a href="http://www.captaincodeman.com/wp-content/uploads/2011/05/4-java.png"><img style="background-image: none; margin: 10px; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="4-java" src="http://www.captaincodeman.com/wp-content/uploads/2011/05/4-java_thumb.png" border="0" alt="4-java" width="240" height="230" /></a></p>
<p><a href="http://www.captaincodeman.com/wp-content/uploads/2011/05/5-startup.png"><img style="background-image: none; margin: 10px; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="5-startup" src="http://www.captaincodeman.com/wp-content/uploads/2011/05/5-startup_thumb.png" border="0" alt="5-startup" width="240" height="230" /></a> <a href="http://www.captaincodeman.com/wp-content/uploads/2011/05/6-shutdown.png"><img style="background-image: none; margin: 10px; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="6-shutdown" src="http://www.captaincodeman.com/wp-content/uploads/2011/05/6-shutdown_thumb.png" border="0" alt="6-shutdown" width="240" height="230" /></a></p>
<p>You can also run the GUI as a tray icon, which gives you a handy way to start and stop the service while you’re developing. To do this, create a monitor.cmd file with the following command:</p>
<pre>start ElasticSearchw.exe //MS</pre>
<p>You should be able to right-click on the new tray icon and start the service:</p>
<p><a href="http://www.captaincodeman.com/wp-content/uploads/2011/05/starting.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="starting" src="http://www.captaincodeman.com/wp-content/uploads/2011/05/starting_thumb.png" border="0" alt="starting" width="499" height="188" /></a></p>
<p>This isn’t mandatory though – the service should appear in the normal Windows Service Manager where it can be stopped and started as usual:</p>
<p><a href="http://www.captaincodeman.com/wp-content/uploads/2011/05/windows-services.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="windows-services" src="http://www.captaincodeman.com/wp-content/uploads/2011/05/windows-services_thumb.png" border="0" alt="windows-services" width="640" height="445" /></a></p>
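<p>Since it is a normal Windows service it can also be controlled from .NET code, which can be handy in integration tests. This is a minimal sketch under a few assumptions: the service was installed under the name ElasticSearch (as in the create.cmd above), the process has the rights needed to start services, and the project references System.ServiceProcess. The EsService class is a hypothetical helper, not something shipped with procrun or ElasticSearch.</p>

```csharp
using System;
using System.ServiceProcess;

// Hypothetical helper for making sure the Windows service is running.
public static class EsService
{
    // The service name registered by the //IS//ElasticSearch install command.
    public const string ServiceName = "ElasticSearch";

    // Starts the service if it is stopped, then waits until it reports Running.
    public static void EnsureRunning(string serviceName, TimeSpan timeout)
    {
        using (var controller = new ServiceController(serviceName))
        {
            // Reading Status throws InvalidOperationException if the service is not installed.
            if (controller.Status == ServiceControllerStatus.Stopped)
            {
                controller.Start();
            }

            // Throws System.ServiceProcess.TimeoutException if Running is not reached in time.
            controller.WaitForStatus(ServiceControllerStatus.Running, timeout);
        }
    }

    public static void Main()
    {
        EnsureRunning(ServiceName, TimeSpan.FromSeconds(30));
        Console.WriteLine(ServiceName + " is running.");
    }
}
```

<p>Stopping works the same way with Stop() and ServiceControllerStatus.Stopped.</p>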
<p>Whether everything starts or not, you should get some useful information written to the log files. Here’s how mine looked after the service was started successfully.</p>
<p>service.2011-05-19.log:</p>
<pre style="font-size: 9px;">[2011-05-19 10:21:30] [debug] ( prunsrv.c:1494) Commons Daemon procrun log initialized
[2011-05-19 10:21:30] [info]  (          :0   ) Commons Daemon procrun (1.0.5.0 64-bit) started
[2011-05-19 10:21:30] [info]  (          :0   ) Running 'ElasticSearch' Service...
[2011-05-19 10:21:30] [debug] ( prunsrv.c:1246) Inside ServiceMain...
[2011-05-19 10:21:30] [info]  (          :0   ) Starting service...
[2011-05-19 10:21:30] [debug] ( javajni.c:206 ) loading jvm 'C:\Program Files\Java\jre6\bin\server\jvm.dll'
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[0] -Xms256m
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[1] -Xmx1g
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[2] -XX:+UseCompressedOops
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[3] -Xss128k
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[4] -XX:+UseParNewGC
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[5] -XX:+UseConcMarkSweepGC
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[6] -XX:+CMSParallelRemarkEnabled
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[7] -XX:SurvivorRatio=8
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[8] -XX:MaxTenuringThreshold=1
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[9] -XX:CMSInitiatingOccupancyFraction=75
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[10] -XX:+UseCMSInitiatingOccupancyOnly
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[11] -XX:+HeapDumpOnOutOfMemoryError
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[12] -Djline.enabled=false
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[13] -Delasticsearch
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[14] -Des-foreground=yes
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[15] -Des.path.home=D:\elasticsearch
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[16] -Djava.class.path=C:\Program Files (x86)\Java\jre6\lib\ext\QTJava.zip;D:\elasticsearch\lib\elasticsearch-0.16.1.jar;D:\elasticsearch\lib\elasticsearch-0.16.1.jar;D:\elasticsearch\lib\jline-0.9.94.jar;D:\elasticsearch\lib\jna-3.2.7.jar;D:\elasticsearch\lib\log4j-1.2.15.jar;D:\elasticsearch\lib\lucene-analyzers-3.1.0.jar;D:\elasticsearch\lib\lucene-core-3.1.0.jar;D:\elasticsearch\lib\lucene-highlighter-3.1.0.jar;D:\elasticsearch\lib\lucene-memory-3.1.0.jar;D:\elasticsearch\lib\lucene-queries-3.1.0.jar;D:\elasticsearch\lib\sigar\sigar-1.6.4.jar
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[17] -Xmx512m
[2011-05-19 10:21:31] [debug] ( javajni.c:891 ) Java Worker thread started org/elasticsearch/bootstrap/Bootstrap:main
[2011-05-19 10:21:32] [debug] ( prunsrv.c:1058) Java started org/elasticsearch/bootstrap/Bootstrap
[2011-05-19 10:21:32] [info]  (          :0   ) Service started in 2066 ms.
[2011-05-19 10:21:32] [debug] ( prunsrv.c:1369) Waiting for worker to finish...
[2011-05-19 10:21:39] [debug] ( javajni.c:907 ) Java Worker thread finished org/elasticsearch/bootstrap/Bootstrap:main with status=0
[2011-05-19 10:21:39] [debug] ( prunsrv.c:1374) Worker finished.
[2011-05-19 10:21:39] [debug] ( prunsrv.c:1397) Waiting for all threads to exit</pre>
<p>elasticsearch.log:</p>
<pre style="font-size: 9px;">[2011-05-19 10:21:32,709][INFO ][node                     ] [Hack] {elasticsearch/0.16.1}[2344]: initializing ...
[2011-05-19 10:21:32,711][INFO ][plugins                  ] [Hack] loaded []
[2011-05-19 10:21:36,149][INFO ][node                     ] [Hack] {elasticsearch/0.16.1}[2344]: initialized
[2011-05-19 10:21:36,150][INFO ][node                     ] [Hack] {elasticsearch/0.16.1}[2344]: starting ...
[2011-05-19 10:21:36,268][INFO ][transport                ] [Hack] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/10.0.1.8:9300]}
[2011-05-19 10:21:39,311][INFO ][cluster.service          ] [Hack] new_master [Hack][Gkn9PLFTR0KdX2X__ybpIQ][inet[/10.0.1.8:9300]], reason: zen-disco-join (elected_as_master)
[2011-05-19 10:21:39,337][INFO ][discovery                ] [Hack] elasticsearch/Gkn9PLFTR0KdX2X__ybpIQ
[2011-05-19 10:21:39,351][INFO ][gateway                  ] [Hack] recovered [0] indices into cluster_state
[2011-05-19 10:21:39,366][INFO ][http                     ] [Hack] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/10.0.1.8:9200]}
[2011-05-19 10:21:39,366][INFO ][node                     ] [Hack] {elasticsearch/0.16.1}[2344]: started</pre>
<p>Hopefully, this helps you get ElasticSearch up and running as a service on Windows x64. It’s a great app and really worth looking at. I’m hoping to make good use of it on a couple of projects, particularly the faceted search feature.</p>


]]></content:encoded>
			<wfw:commentRss>http://www.captaincodeman.com/2011/05/20/elasticsearch-windows-service-2008-x64/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>NHibernate.Search using Lucene.NET Full Text Index (3)</title>
		<link>http://www.captaincodeman.com/2008/04/26/nhibernatesearch-using-lucene-net-full-text-index-part3/</link>
		<comments>http://www.captaincodeman.com/2008/04/26/nhibernatesearch-using-lucene-net-full-text-index-part3/#comments</comments>
		<pubDate>Sun, 27 Apr 2008 01:07:00 +0000</pubDate>
		<dc:creator>Captain Codeman</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[full text indexing]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[MVC]]></category>
		<category><![CDATA[nhibernate]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blogs.intesoft.net/post.aspx?id=98ef581c-c67c-4d0a-bb7a-c63b4be74c83</guid>
		<description><![CDATA[In Part 1 we looked at how to create a full-text index of NHibernate persisted domain objects using the Lucene.NET project. Part 2 then looked at how to query the index complete with query-parsing and hit-highlighting of the results. Now that we have a full-text index there are other things that we can use it


]]></description>
			<content:encoded><![CDATA[<p>In <a title="Creating a full-text index of NHibernate persisted objects" href="http://blogs.intesoft.net/post/2008/03/NHibernateSearch-using-Lucene-NET-Full-Text-Index-Part1.aspx">Part 1</a> we looked at how to create a full-text index of <a title="NHibernate" href="http://www.nhibernate.org/" target="_blank">NHibernate</a> persisted domain objects using the <a title="Lucene.NET" href="http://incubator.apache.org/lucene.net/" target="_blank">Lucene.NET</a> project. <a title="Querying the Lucene.NET index and displaying results" href="http://blogs.intesoft.net/post/2008/03/NHibernateSearch-using-Lucene-NET-Full-Text-Index-Part2.aspx">Part 2</a> then looked at how to query the index complete with query-parsing and hit-highlighting of the results.</p>
<p>Now that we have a full-text index there are other things that we can use it for. The easiest and most useful is probably adding a &#8216;similar items&#8217; feature where the system automatically displays related items based on the text they have in common. While it isn&#8217;t exact, the results are often surprisingly good. A human editor could probably pick out some links with more finesse, but that quickly becomes an impossible task as the number of items grows &#8211; the human will typically resort to searching for similar items using the index anyway, so why not automate it?!</p>
<p>This feature can be used to display related web pages or blog entries or, in this case, related books. It probably isn&#8217;t too far off from the system that Amazon uses. The benefit is that as new content is being added, the top related items can constantly be updated &#8211; even for existing items in the system. So, for example, if a new Harry Potter book is released then the existing books can immediately start linking to it and vice-versa or if a company starts offering a new training course or product then any related pages will immediately start to link together.</p>
<p>While it sounds complicated, it is actually quite easy thanks to the contrib assemblies provided with <a title="Lucene.NET" href="http://incubator.apache.org/lucene.net/" target="_blank">Lucene.NET</a>. In fact, it&#8217;s so simple it&#8217;s almost trivial so this won&#8217;t be a long post!</p>
<p>First, we need to add a new reference to the SimilarityNet.dll assembly (part of Lucene.NET contrib). This provides a SimilarityQueries class which contains a FormSimilarQuery method. Calling this with a piece of text (from an existing field), an analyzer and the field name will produce a boolean query using every unique word, where all words are optional. If we repeat this for each field, boosting the relevance of the most important ones (such as title), then we end up with a query that will look for every word in each field of the original item.</p>
<p>To quote the Lucene documentation:</p>
<blockquote><p>The philosophy behind this method is &#8220;two documents are similar if they share lots of words&#8221;. Note that behind the scenes, Lucene&#8217;s scoring algorithm will tend to give two documents a higher similarity score if the share more uncommon words.</p>
</blockquote>
<p>What this means in practice is that the rarer a word is, the more weight it carries when ranking the similar items. So, if our original book has &#8216;Agile&#8217; in the title and words such as &#8216;scrum&#8217; and &#8216;backlog&#8217; in the summary then chances are we will find other books that also have these less common words &#8230; and it&#8217;s very likely that they will be related to our original book.</p>
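<p>To make the idea concrete, here is a toy sketch of &#8216;sharing uncommon words&#8217; as a ranking rule. It is a simplified illustration and not Lucene&#8217;s actual scoring formula: each shared word simply contributes the logarithm of (number of documents / number of documents containing that word), so words found in fewer documents count for more. The class, the method names and the sample titles are all made up for the example.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy illustration of "documents are similar if they share uncommon words".
// NOT Lucene's scoring formula, just a summed inverse-document-frequency weight.
public static class SimilaritySketch
{
    private static readonly char[] Separators = { ' ', ',', '.', ';', ':', '!', '?' };

    // Splits text into a set of unique lower-case words.
    public static HashSet<string> UniqueWords(string text)
    {
        return new HashSet<string>(
            text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries));
    }

    // Ranks every other document by the summed rarity of the words it shares with the source.
    public static List<KeyValuePair<int, double>> FindSimilar(
        IDictionary<int, string> docs, int sourceId, int max)
    {
        var words = docs.ToDictionary(d => d.Key, d => UniqueWords(d.Value));

        // Rarity weight: log(total docs / docs containing the word); 0 if every document has it.
        Func<string, double> rarity = w =>
            Math.Log((double)docs.Count / words.Values.Count(set => set.Contains(w)));

        return words
            .Where(d => d.Key != sourceId) // leave out the source document itself
            .Select(d => new KeyValuePair<int, double>(
                d.Key, d.Value.Intersect(words[sourceId]).Sum(rarity)))
            .Where(r => r.Value > 0)       // drop documents with nothing useful in common
            .OrderByDescending(r => r.Value)
            .Take(max)
            .ToList();
    }

    public static void Main()
    {
        var books = new Dictionary<int, string>
        {
            { 1, "Agile estimating and planning with scrum" },
            { 2, "The scrum backlog: an agile guide" },
            { 3, "Cooking with fresh herbs" },
            { 4, "Gardening with kids" },
        };

        foreach (var match in FindSimilar(books, 1, 4))
        {
            Console.WriteLine("Book {0} scored {1:F2}", match.Key, match.Value);
        }
    }
}
```

<p>With these sample titles, book 2 ranks first for book 1: it shares &#8216;agile&#8217; and &#8216;scrum&#8217;, which appear in only two of the four titles, while books 3 and 4 share only the more common &#8216;with&#8217;.</p>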
<p>Of course, when we search our index for books with all these words there is going to be one obvious match &#8211; the original book! In fact, this should be the first result returned, so we could either skip it when creating the result-set (checking for the same unique Id, to be safe, rather than just dropping the first result) or, as in the example below, use a boolean search and specifically exclude the Id of the source item from the query. I haven&#8217;t experimented to see which one is quicker but I prefer to let Lucene do all the work &#8211; I trust it, and it saves me writing more code or fetching results that I am just going to discard, which feels wrong.</p>
<p>Here is the code to find the best 4 similar matches to any book passed in. Note that I include the Authors and Publisher fields when doing the comparison so it will tend to favour books by the same author or publisher &#8211; you will need to experiment to see what makes most sense for your application and usage.</p>
<pre class="csharpcode"><span class="rem">/// &lt;summary&gt;</span>
<span class="rem">/// Gets similar books.</span>
<span class="rem">/// &lt;/summary&gt;</span>
<span class="rem">/// &lt;param name="book"&gt;The book.&lt;/param&gt;</span>
<span class="rem">/// &lt;returns&gt;&lt;/returns&gt;</span>
<span class="kwrd">public</span> <span class="kwrd">override</span> IList&lt;IBook&gt; GetSimilarBooks(IBook book)
{
    IFullTextSession session = (IFullTextSession)NHibernateHelper.GetCurrentSession();
    Analyzer analyzer = <span class="kwrd">new</span> StandardAnalyzer();
    BooleanQuery query = <span class="kwrd">new</span> BooleanQuery();

    Query title = Similarity.Net.SimilarityQueries.FormSimilarQuery(book.Title, analyzer, <span class="str">"Title"</span>, <span class="kwrd">null</span>);
    title.SetBoost(10);
    query.Add(title, BooleanClause.Occur.SHOULD);

    <span class="kwrd">if</span> (book.Summary != <span class="kwrd">null</span>) {
        Query summary =
            Similarity.Net.SimilarityQueries.FormSimilarQuery(book.Summary, analyzer, <span class="str">"Summary"</span>, <span class="kwrd">null</span>);
        summary.SetBoost(5);
        query.Add(summary, BooleanClause.Occur.SHOULD);
    }

    <span class="kwrd">if</span> (book.Authors != <span class="kwrd">null</span>) {
        Query authors =
            Similarity.Net.SimilarityQueries.FormSimilarQuery(book.Authors, analyzer, <span class="str">"Authors"</span>, <span class="kwrd">null</span>);
        query.Add(authors, BooleanClause.Occur.SHOULD);
    }

    <span class="kwrd">if</span> (book.Publisher != <span class="kwrd">null</span>) {
        Query publisher =
            Similarity.Net.SimilarityQueries.FormSimilarQuery(book.Publisher, analyzer, <span class="str">"Publisher"</span>, <span class="kwrd">null</span>);
        query.Add(publisher, BooleanClause.Occur.SHOULD);
    }

    <span class="rem">// avoid the book being similar to itself!</span>
    query.Add(<span class="kwrd">new</span> TermQuery(<span class="kwrd">new</span> Term(<span class="str">"Id"</span>, book.Id.ToString())), BooleanClause.Occur.MUST_NOT);

    IQuery nhQuery = session.CreateFullTextQuery(query, <span class="kwrd">new</span> Type[] { <span class="kwrd">typeof</span>(Book) })
                            .SetMaxResults(4);

    IList&lt;IBook&gt; books = nhQuery.List&lt;IBook&gt;();
    <span class="kwrd">return</span> books;
}
</pre>
<style type="text/css">.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }
</style>
<p>That about wraps it up for using NHibernate and Lucene. I&#8217;m expecting things to change when the new NHibernate version 2.0 is released, so I&#8217;ll probably post an update when it is. Also, there are a few other features available in Lucene which I may blog about, such as using synonyms for &#8216;did you mean &#8230;&#8217; type suggestions.</p>
<p>Please let me know if there is anything that I haven&#8217;t explained particularly well or you would like to see more about.</p>


]]></content:encoded>
			<wfw:commentRss>http://www.captaincodeman.com/2008/04/26/nhibernatesearch-using-lucene-net-full-text-index-part3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NHibernate.Search using Lucene.NET Full Text Index (2)</title>
		<link>http://www.captaincodeman.com/2008/03/30/nhibernatesearch-using-lucene-net-full-text-index-part2/</link>
		<comments>http://www.captaincodeman.com/2008/03/30/nhibernatesearch-using-lucene-net-full-text-index-part2/#comments</comments>
		<pubDate>Sun, 30 Mar 2008 22:16:36 +0000</pubDate>
		<dc:creator>Captain Codeman</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[full text indexing]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[MVC]]></category>
		<category><![CDATA[nhibernate]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blogs.intesoft.net/post.aspx?id=fcaee060-fe1d-45eb-9867-5a051d29657b</guid>
		<description><![CDATA[In NHibernate.Search using Lucene.NET Full Text Index (Part 1) we looked at setting up the NHibernate.Search extension to add full-text searching of NHibernate-persisted objects. Next, we&#8217;ll look at how we can perform Google-like searches using the Lucene.NET index and some tips on displaying the results including highlighting the search-terms. Our Book class has the Title,


]]></description>
			<content:encoded><![CDATA[
<p>In <a href="http://blogs.intesoft.net/post/2008/03/NHibernateSearch-using-Lucene-NET-Full-Text-Index-Part1.aspx">NHibernate.Search using Lucene.NET Full Text Index (Part 1)</a> we looked at setting up the NHibernate.Search extension to add full-text searching of <a title="NHibernate" href="http://www.nhibernate.org/" target="_blank" rel="tag">NHibernate</a>-persisted objects.</p>
<p>Next, we&#8217;ll look at how we can perform Google-like searches using the <a title="Lucene.NET" href="http://incubator.apache.org/lucene.net/" target="_blank" rel="tag">Lucene.NET</a> index and some tips on displaying the results including highlighting the search-terms.</p>
<p>Our Book class has the Title, Summary, Authors and Publisher fields indexed so we&#8217;ll allow searching in any of these fields. However, if a search-term exists in the title it is probably more relevant than if it just exists in the summary, so we want to give more priority to certain fields than to others. Likewise, we probably want to be able to specify which fields to search on; otherwise we would get books that merely mention &quot;Martin Fowler&quot; in the summary when we may only want books that have &quot;Martin Fowler&quot; as an author.</p>
<p>Also worth mentioning is the Summary field. In the Book class there is a SummaryHtml field which (you&#8217;ll never guess) contains the HTML summary retrieved from Amazon and also a Summary field which is the one that is actually indexed. In the full app this text field is generated from the HTML content using the <a title="HtmlAgility library" href="http://www.codeplex.com/htmlagilitypack" target="_blank" rel="tag">HtmlAgility library</a>. The reason we want a plain-text version of the summary is to make indexing easier and more accurate (no HTML tags) and also to allow result fragments to be created: imagine if a section of the SummaryHtml was output &#8211; it could be cut in the middle of an HTML element or attribute (producing invalid markup) or include an opening tag but not the matching closing one (producing runaway bold text, for instance).</p>
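<p>The conversion code from the full app isn&#8217;t shown here, so the following is only a crude stand-in to show the idea. It deliberately swaps the HTML parsing library for a regular expression from the base class library, which is fine for a quick illustration but much less robust than a real parser. The SummaryText class and its method are hypothetical names.</p>

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: a crude HTML-to-text conversion using only the base class library.
public static class SummaryText
{
    // Replaces tags with spaces, decodes entities and collapses runs of whitespace.
    public static string FromHtml(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        string withoutTags = Regex.Replace(html, "<[^>]+>", " ");
        string decoded = WebUtility.HtmlDecode(withoutTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(FromHtml("<p>A <b>practical</b> guide to Scrum &amp; XP</p>"));
    }
}
```

<p>Because the result contains no markup, any 300-character slice of it can be shown safely as a fragment.</p>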
<p>Back to our example though. To be able to show the highlighted search terms in the results I found it easier to create a special BookSearchResult class that I can return from the data provider &#8211; the highlighting is something Lucene.NET can do for us and avoids us having to write our own presentation code to handle it. Here is the class: </p>
<pre class="csharpcode"><span class="rem">/// &lt;summary&gt;</span>
<span class="rem">/// A wrapper for a book object returned from a full text index query</span>
<span class="rem">/// with additional properties for highlighted segments</span>
<span class="rem">/// &lt;/summary&gt;</span>
<span class="kwrd">public</span> <span class="kwrd">class</span> BookSearchResult : IBookSearchResult
{
    <span class="kwrd">private</span> <span class="kwrd">readonly</span> IBook _book;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _highlightedTitle;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _highlightedSummary;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _highlightedAuthors;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _highlightedPublisher;

    <span class="rem">/// &lt;summary&gt;</span>
    <span class="rem">/// Initializes a new instance of the &lt;see cref=&quot;BookSearchResult&quot;/&gt; class.</span>
    <span class="rem">/// &lt;/summary&gt;</span>
    <span class="rem">/// &lt;param name=&quot;book&quot;&gt;The book.&lt;/param&gt;</span>
    <span class="kwrd">public</span> BookSearchResult(IBook book)
    {
        _book = book;
    }

    <span class="rem">/// &lt;summary&gt;</span>
    <span class="rem">/// Gets the book.</span>
    <span class="rem">/// &lt;/summary&gt;</span>
    <span class="rem">/// &lt;value&gt;The book.&lt;/value&gt;</span>
    <span class="kwrd">public</span> IBook Book
    {
        get { <span class="kwrd">return</span> _book; }
    }

    <span class="rem">/// &lt;summary&gt;</span>
    <span class="rem">/// Gets or sets the highlighted title.</span>
    <span class="rem">/// &lt;/summary&gt;</span>
    <span class="rem">/// &lt;value&gt;The highlighted title.&lt;/value&gt;</span>
    <span class="kwrd">public</span> <span class="kwrd">string</span> HighlightedTitle
    {
        get
        {
            <span class="kwrd">if</span> (_highlightedTitle == <span class="kwrd">null</span> || _highlightedTitle.Length == 0)
            {
                <span class="kwrd">return</span> _book.Title;
            }
            <span class="kwrd">return</span> _highlightedTitle;
        }
        set { _highlightedTitle = <span class="kwrd">value</span>; }
    }

    <span class="rem">/// &lt;summary&gt;</span>
    <span class="rem">/// Gets or sets the highlighted summary.</span>
    <span class="rem">/// &lt;/summary&gt;</span>
    <span class="rem">/// &lt;value&gt;The highlighted summary.&lt;/value&gt;</span>
    <span class="kwrd">public</span> <span class="kwrd">string</span> HighlightedSummary
    {
        get
        {
            <span class="kwrd">if</span> (_highlightedSummary == <span class="kwrd">null</span> || _highlightedSummary.Length == 0)
            {
                <span class="kwrd">if</span> (_book.Summary == <span class="kwrd">null</span> || _book.Summary.Length &lt;= 300)
                {
                    <span class="kwrd">return</span> _book.Summary;
                }
                <span class="kwrd">else</span>
                {
                    <span class="kwrd">return</span> _book.Summary.Substring(0,300) + <span class="str">&quot; ...&quot;</span>;
                }
            }
            <span class="kwrd">return</span> _highlightedSummary;
        }
        set { _highlightedSummary = <span class="kwrd">value</span>; }
    }

    <span class="rem">/// &lt;summary&gt;</span>
    <span class="rem">/// Gets or sets the highlighted authors.</span>
    <span class="rem">/// &lt;/summary&gt;</span>
    <span class="rem">/// &lt;value&gt;The highlighted authors.&lt;/value&gt;</span>
    <span class="kwrd">public</span> <span class="kwrd">string</span> HighlightedAuthors
    {
        get
        {
            <span class="kwrd">if</span> (_highlightedAuthors == <span class="kwrd">null</span> || _highlightedAuthors.Length == 0)
            {
                <span class="kwrd">return</span> _book.Authors;
            }
            <span class="kwrd">return</span> _highlightedAuthors;
        }
        set { _highlightedAuthors = <span class="kwrd">value</span>; }
    }

    <span class="rem">/// &lt;summary&gt;</span>
    <span class="rem">/// Gets or sets the highlighted publisher.</span>
    <span class="rem">/// &lt;/summary&gt;</span>
    <span class="rem">/// &lt;value&gt;The highlighted publisher.&lt;/value&gt;</span>
    <span class="kwrd">public</span> <span class="kwrd">string</span> HighlightedPublisher
    {
        get
        {
            <span class="kwrd">if</span> (_highlightedPublisher == <span class="kwrd">null</span> || _highlightedPublisher.Length == 0)
            {
                <span class="kwrd">return</span> _book.Publisher;
            }
            <span class="kwrd">return</span> _highlightedPublisher;
        }
        set { _highlightedPublisher = <span class="kwrd">value</span>; }
    }
}</pre>
<style type="text/css">
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
<p>&#160;</p>
<p>You&#8217;ll notice that the Highlighted&#8230; fields return the equivalent book field if the highlighted field does not exist. This just saves us having to check whether there is a highlighted term in each field when we&#8217;re building the search result list.</p>
<p>Our data provider will accept a single string consisting of the entered search-terms and return a list of BookSearchResult objects that match. Here is the code and I&#8217;ll then try and explain what it&#8217;s doing:</p>
<pre class="csharpcode"><span class="rem">/// &lt;summary&gt;</span>
<span class="rem">/// Finds the books.</span>
<span class="rem">/// &lt;/summary&gt;</span>
<span class="rem">/// &lt;param name=&quot;query&quot;&gt;The query.&lt;/param&gt;</span>
<span class="rem">/// &lt;returns&gt;&lt;/returns&gt;</span>
<span class="kwrd">public</span> <span class="kwrd">override</span> IList&lt;IBookSearchResult&gt; FindBooks(<span class="kwrd">string</span> query)
{
    IList&lt;IBookSearchResult&gt; results = <span class="kwrd">new</span> List&lt;IBookSearchResult&gt;();

    Analyzer analyzer = <span class="kwrd">new</span> SimpleAnalyzer();
    MultiFieldQueryParser parser = <span class="kwrd">new</span> MultiFieldQueryParser(
        <span class="kwrd">new</span> <span class="kwrd">string</span>[] { <span class="str">&quot;Title&quot;</span>, <span class="str">&quot;Summary&quot;</span>, <span class="str">&quot;Authors&quot;</span>, <span class="str">&quot;Publisher&quot;</span> },
        analyzer);
    Query queryObj;

    <span class="kwrd">try</span>
    {
        queryObj = parser.Parse(query);
    }
    <span class="kwrd">catch</span> (ParseException)
    {
        <span class="rem">// TODO: provide feedback to user on failed search expressions</span>
        <span class="kwrd">return</span> results;
    }

    IFullTextSession session = (IFullTextSession) NHibernateHelper.GetCurrentSession();
    IQuery nhQuery = session.CreateFullTextQuery(queryObj, <span class="kwrd">new</span> Type[] {<span class="kwrd">typeof</span> (Book) } );

    IList&lt;IBook&gt; books = nhQuery.List&lt;IBook&gt;();

    IndexReader indexReader = IndexReader.Open(
        SearchFactory.GetSearchFactory(session)
            .GetDirectoryProvider(<span class="kwrd">typeof</span> (Book)).Directory);
    Query simplifiedQuery = queryObj.Rewrite(indexReader);

    SimpleHTMLFormatter formatter = <span class="kwrd">new</span> SimpleHTMLFormatter(<span class="str">&quot;&lt;b class='term'&gt;&quot;</span>, <span class="str">&quot;&lt;/b&gt;&quot;</span>);

    Highlighter hTitle = GetHighlighter(simplifiedQuery, formatter, <span class="str">&quot;Title&quot;</span>, 100);
    Highlighter hSummary = GetHighlighter(simplifiedQuery, formatter, <span class="str">&quot;Summary&quot;</span>, 200);
    Highlighter hAuthors = GetHighlighter(simplifiedQuery, formatter, <span class="str">&quot;Authors&quot;</span>, 100);
    Highlighter hPublisher = GetHighlighter(simplifiedQuery, formatter, <span class="str">&quot;Publisher&quot;</span>, 100);

    <span class="kwrd">foreach</span>(IBook book <span class="kwrd">in</span> books)
    {
        IBookSearchResult result = <span class="kwrd">new</span> BookSearchResult(book);

        <span class="rem">// use an empty string for missing values so the highlighter is never passed null</span>
        <span class="kwrd">string</span> title = book.Title ?? <span class="kwrd">string</span>.Empty;
        TokenStream tsTitle = analyzer.TokenStream(<span class="str">&quot;Title&quot;</span>, <span class="kwrd">new</span> System.IO.StringReader(title));
        result.HighlightedTitle = hTitle.GetBestFragment(tsTitle, title);

        <span class="kwrd">string</span> authors = book.Authors ?? <span class="kwrd">string</span>.Empty;
        TokenStream tsAuthors = analyzer.TokenStream(<span class="str">&quot;Authors&quot;</span>, <span class="kwrd">new</span> System.IO.StringReader(authors));
        result.HighlightedAuthors = hAuthors.GetBestFragment(tsAuthors, authors);

        <span class="kwrd">string</span> publisher = book.Publisher ?? <span class="kwrd">string</span>.Empty;
        TokenStream tsPublisher = analyzer.TokenStream(<span class="str">&quot;Publisher&quot;</span>, <span class="kwrd">new</span> System.IO.StringReader(publisher));
        result.HighlightedPublisher = hPublisher.GetBestFragment(tsPublisher, publisher);

        <span class="rem">// the summary can be long, so take the best 3 fragments rather than one</span>
        <span class="kwrd">string</span> summary = book.Summary ?? <span class="kwrd">string</span>.Empty;
        TokenStream tsSummary = analyzer.TokenStream(<span class="str">&quot;Summary&quot;</span>, <span class="kwrd">new</span> System.IO.StringReader(summary));
        result.HighlightedSummary = hSummary.GetBestFragments(tsSummary, summary, 3,
                                                              <span class="str">&quot; ... &lt;br /&gt;&lt;br /&gt; ... &quot;</span>);

        results.Add(result);
    }

    <span class="kwrd">return</span> results;
}

<span class="rem">/// &lt;summary&gt;</span>
<span class="rem">/// Gets the highlighter for the given field.</span>
<span class="rem">/// &lt;/summary&gt;</span>
<span class="rem">/// &lt;param name=&quot;query&quot;&gt;The query.&lt;/param&gt;</span>
<span class="rem">/// &lt;param name=&quot;formatter&quot;&gt;The formatter.&lt;/param&gt;</span>
<span class="rem">/// &lt;param name=&quot;field&quot;&gt;The field.&lt;/param&gt;</span>
<span class="rem">/// &lt;param name=&quot;fragmentSize&quot;&gt;Size of the fragment.&lt;/param&gt;</span>
<span class="rem">/// &lt;returns&gt;&lt;/returns&gt;</span>
<span class="kwrd">private</span> <span class="kwrd">static</span> Highlighter GetHighlighter(Query query, Formatter formatter,
                                          <span class="kwrd">string</span> field, <span class="kwrd">int</span> fragmentSize)
{
    <span class="rem">// create a new query to contain the terms</span>
    BooleanQuery termsQuery = <span class="kwrd">new</span> BooleanQuery();

    <span class="rem">// extract terms for this field only</span>
    WeightedTerm[] terms = QueryTermExtractor.GetTerms(query, <span class="kwrd">true</span>, field);
    <span class="kwrd">foreach</span> (WeightedTerm term <span class="kwrd">in</span> terms)
    {
        <span class="rem">// create new term query and add to list</span>
        TermQuery termQuery = <span class="kwrd">new</span> TermQuery(<span class="kwrd">new</span> Term(field, term.GetTerm()));
        termsQuery.Add(termQuery, BooleanClause.Occur.SHOULD);
    }

    <span class="rem">// create query scorer based on term queries (field specific)</span>
    QueryScorer scorer = <span class="kwrd">new</span> QueryScorer(termsQuery);

    Highlighter highlighter = <span class="kwrd">new</span> Highlighter(formatter, scorer);
    highlighter.SetTextFragmenter(<span class="kwrd">new</span> SimpleFragmenter(fragmentSize));

    <span class="kwrd">return</span> highlighter;
}</pre>
<p>First, we parse the user-entered query string using the MultiFieldQueryParser, indicating that we want to match on the Title, Summary, Authors and Publisher fields. This turns the search expression into Lucene-specific instructions. Most users will enter a simple expression containing the words or phrase that they want to find. If the search term &#8216;XML&#8217; is entered, for example, Lucene will convert this into the expression &quot;Title:XML Summary:XML Authors:XML Publisher:XML&quot;, which effectively means &quot;find any record where &#8216;XML&#8217; exists in any of the fields&quot;.</p>
<p>The user can also enter field-specific instructions directly, such as &quot;Title:Architecture Authors:Fowler&quot;, which means &quot;find any books that have &#8216;Architecture&#8217; in the Title field or &#8216;Fowler&#8217; in the Authors field&quot;. Boolean expressions can be used to control this further: &quot;(Title:Architecture) AND (Authors:Fowler)&quot; finds only books with &#8216;Architecture&#8217; in the Title that are also authored by &#8216;Fowler&#8217;. When field-specific searches like this are entered, the MultiFieldQueryParser leaves the field-prefixed terms alone and only expands words and phrases that have no field prefix.</p>
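<p>To make the expansion of an un-prefixed term concrete, here is a small sketch that builds the equivalent expression as a string. It is purely an illustration of the result described above: QueryExpansionDemo and Expand are made-up names, and the real MultiFieldQueryParser builds Query objects rather than strings.</p>
<pre class="csharpcode">using System;

public static class QueryExpansionDemo
{
    // Builds &quot;Field1:term Field2:term ...&quot; for a single un-prefixed term.
    // Illustration only, this is not how Lucene parses queries.
    public static string Expand(string term, string[] fields)
    {
        string[] clauses = new string[fields.Length];
        for (int i = 0; i &lt; fields.Length; i++)
        {
            clauses[i] = fields[i] + &quot;:&quot; + term;
        }
        // clauses without AND / OR are optional, so any field may match
        return string.Join(&quot; &quot;, clauses);
    }

    public static void Main()
    {
        string[] fields = new string[] { &quot;Title&quot;, &quot;Summary&quot;, &quot;Authors&quot;, &quot;Publisher&quot; };

        // prints: Title:XML Summary:XML Authors:XML Publisher:XML
        Console.WriteLine(Expand(&quot;XML&quot;, fields));
    }
}</pre>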
<p>Incidentally, in the original Book class we included attributes to control the indexing, such as [Boost(10)] for the Title. This boosts the relevance of matches in certain fields, so a search for &#8216;XML&#8217; in the Title <em><strong>and</strong></em> Summary will rank books with &#8216;XML&#8217; in the Title higher than books that only have &#8216;XML&#8217; in the Summary &#8211; they are more likely to be what the user is searching for in this case.</p>
<p>Lucene does provide many other ways to define a query but this is simple and easy for this example.</p>
<p>Once we have our Lucene query object we use this to create an NHibernate.Search full-text query to return Book objects. This is where NHibernate and Lucene meet (from a querying point of view). It is possible to combine full-text-queries of Lucene with NHibernate queries of the database &#8211; NHibernate.Search handles the searching and returns the relevant objects.</p>
<p>So, we now have a list of Book objects just the same as if it had come directly from NHibernate except that the results are in order based on the rank provided by the Lucene search.</p>
<p>Now, we&#8217;ll use another part of Lucene to highlight the matches. This is done using the SimpleHTMLFormatter, QueryScorer and Highlighter objects which combined allow us to get a fragment for each field with the search terms highlighted.</p>
<p>Note that the SimpleHTMLFormatter class is <em><strong>not</strong></em> in the main Lucene.Net.dll assembly but in a separate contrib assembly called Highlighter.Net.dll &#8211; there are also some other interesting utilities worth exploring in the contrib folder of the Lucene.NET distribution. Remember that in <a href="http://blogs.intesoft.net/post/2008/03/NHibernateSearch-using-Lucene-NET-Full-Text-Index-Part1.aspx">Part 1</a> I mentioned problems with assembly references and different versions of Lucene.Net.dll being used by NHibernate.Search. If you have problems building the solution after adding references to these contrib assemblies, consider rebuilding NHibernate.Search, making sure that it references the same Lucene.Net.dll that the Lucene contrib assemblies were built against.</p>
<p>The Highlighter object for each field has to be based on the query terms for that field only, so the original query is rewritten and split up so that only the terms searched for in that field are used. This isn&#8217;t strictly necessary, but I think it makes more sense: if you search for &#8216;Microsoft&#8217; in the Title of a book <em><strong>only</strong></em>, then occurrences of &#8216;Microsoft&#8217; in the Summary or Publisher fields are <em><strong>not</strong></em> highlighted, and the highlighted results show clearly which terms influenced the results. I have split this functionality into a separate GetHighlighter() method.</p>
<p>For example, without doing this a search for &#8216;Title:Microsoft&#8217; incorrectly highlights the occurrences of &#8216;Microsoft&#8217; found within the Authors, Publisher and Summary fields even though they did not really contribute to the Book being included in the results or its rank within them:</p>
<p><a href="http://blogs.intesoft.net/image.axd?picture=WindowsLiveWriter/NHibernate.Searchu.NETFullTextIndexPart2_9721/highlight_wrong_2.png"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="245" alt="highlight_wrong" src="http://blogs.intesoft.net/image.axd?picture=WindowsLiveWriter/NHibernate.Searchu.NETFullTextIndexPart2_9721/highlight_wrong_thumb.png" width="695" border="0" /></a> </p>
<p>By creating the proper Highlighter for each field based on the terms used to search it the search results can be shown correctly without highlighting the un-searched fields / terms:</p>
<p><a href="http://blogs.intesoft.net/image.axd?picture=WindowsLiveWriter/NHibernate.Searchu.NETFullTextIndexPart2_9721/highlight_correct_2.png"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="195" alt="highlight_correct" src="http://blogs.intesoft.net/image.axd?picture=WindowsLiveWriter/NHibernate.Searchu.NETFullTextIndexPart2_9721/highlight_correct_thumb.png" width="695" border="0" /></a> </p>
<p>Also, note that the fragments produced for the Summary are different &#8211; if separate terms are used for the Title and Summary, then having the Title terms highlighted in the Summary could produce incorrect or sub-standard fragments.</p>
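<p>The idea of &#8216;only the terms for this field&#8217; can be sketched without Lucene at all. The helper below is a deliberately naive stand-in for what QueryTermExtractor does for us in GetHighlighter(): it only understands simple space-separated Field:term tokens, and the FieldTermsDemo and TermsForField names are made up for this example.</p>
<pre class="csharpcode">using System;
using System.Collections.Generic;

public static class FieldTermsDemo
{
    // Returns the terms that were searched for in the given field only.
    // Naive: handles &quot;Field:term&quot; tokens separated by spaces, nothing more.
    public static List&lt;string&gt; TermsForField(string query, string field)
    {
        List&lt;string&gt; terms = new List&lt;string&gt;();
        string prefix = field + &quot;:&quot;;

        foreach (string token in query.Split(' '))
        {
            if (token.StartsWith(prefix, StringComparison.Ordinal)
                &amp;&amp; token.Length &gt; prefix.Length)
            {
                terms.Add(token.Substring(prefix.Length));
            }
        }
        return terms;
    }

    public static void Main()
    {
        string query = &quot;Title:Microsoft Summary:Office&quot;;

        // prints: Microsoft
        Console.WriteLine(string.Join(&quot;, &quot;, TermsForField(query, &quot;Title&quot;).ToArray()));

        // prints: 0 (nothing to highlight in the Publisher field)
        Console.WriteLine(TermsForField(query, &quot;Publisher&quot;).Count);
    }
}</pre>
<p>A highlighter built from the Title terms alone has no reason to mark &#8216;Office&#8217; in the Title or &#8216;Microsoft&#8217; in the Publisher.</p>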
<p>&#160;</p>
<p>Having built our Highlighters we can then iterate over the results creating a BookSearchResult to wrap each book in the result set. The same analyzer used in the initial query is then used to get a TokenStream for each field which the Highlighter instance needs to create the highlighted fragment from.</p>
<p>For the Title, Authors and Publisher fields we return a single fragment, which will normally be the field itself with the highlighted search terms wrapped in &lt;b class='term'&gt; &#8230; &lt;/b&gt; Html tags (courtesy of the SimpleHTMLFormatter class). The highlighted Summary is set to the best 3 fragments separated by &#8216;&#8230; &lt;br /&gt;&lt;br /&gt; &#8230;&#8217;. However big the summary is, this ensures that the results contain a similar-sized chunk of text with the best fragments shown (those containing the most highlighted terms).</p>
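<p>If you just want to see the shape of the markup that ends up in the Highlighted&#8230; properties, the following stand-in produces it with a regular expression. This is not the Lucene Highlighter (there is no analyzer, scoring or fragment selection, and no Html encoding of the text) and the HighlightDemo class is made up for this example.</p>
<pre class="csharpcode">using System;
using System.Text.RegularExpressions;

public static class HighlightDemo
{
    // Wraps whole-word, case-insensitive matches of 'term' in the same
    // tags that were passed to the SimpleHTMLFormatter above.
    public static string Highlight(string text, string term)
    {
        string pattern = &quot;\\b&quot; + Regex.Escape(term) + &quot;\\b&quot;;
        // $0 is the matched text, so the original casing is preserved
        return Regex.Replace(text, pattern, &quot;&lt;b class='term'&gt;$0&lt;/b&gt;&quot;,
                             RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // prints: &lt;b class='term'&gt;Software&lt;/b&gt; Requirements
        Console.WriteLine(Highlight(&quot;Software Requirements&quot;, &quot;software&quot;));
    }
}</pre>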
<p>Here is an example of the results for &#8216;Title:Software Summary:Requirements Authors:Steve&#8217; after formatting and CSS applied to show the highlighted terms in yellow:</p>
<p><a href="http://blogs.intesoft.net/image.axd?picture=WindowsLiveWriter/NHibernate.Searchu.NETFullTextIndexPart2_9721/search_results_6.png"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="950" alt="search_results" src="http://blogs.intesoft.net/image.axd?picture=WindowsLiveWriter/NHibernate.Searchu.NETFullTextIndexPart2_9721/search_results_thumb_2.png" width="665" border="0" /></a> </p>
<p>&#160;</p>
<p>Lucene.NET can do a lot more than I&#8217;ve shown here. I found the best resource for learning about how to use it is the &#8216;Lucene in Action&#8217; book:</p>
<table border="0">
<tbody>
<tr>
<td valign="top"><a href="http://www.amazon.com/gp/redirect.html%3FASIN=1932394281%26tag=ws%26lcode=sp1%26cID=2025%26ccmID=165953%26location=/o/ASIN/1932394281%253FSubscriptionId=0525E2PQ81DD7ZTWTK82"><img src="http://ecx.images-amazon.com/images/I/115SA6PS8SL.jpg" border="1" /></a></td>
<td valign="top"><p><b>Lucene in Action (In Action series)</b></p>
<p>by Otis Gospodnetic, Erik Hatcher</p>
<p><a href="http://www.amazon.com/gp/redirect.html%3FASIN=1932394281%26tag=ws%26lcode=sp1%26cID=2025%26ccmID=165953%26location=/o/ASIN/1932394281%253FSubscriptionId=0525E2PQ81DD7ZTWTK82">Read more about this book&#8230;</a></p></td>
</tr>
</tbody>
</table>
<p>Note that this covers the Java version but applies equally well to the .NET port which is practically identical.</p>
<p>&#160;</p>
<p>I hope this has been useful. In Part 3 I&#8217;ll try and demonstrate using the Lucene.NET index to find similar items based on the frequency of shared terms. This can be used to provide &#8216;other books you may like&#8217; or &#8216;blog posts like this one&#8217; type functionality.</p>


]]></content:encoded>
			<wfw:commentRss>http://www.captaincodeman.com/2008/03/30/nhibernatesearch-using-lucene-net-full-text-index-part2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NHibernate.Search using Lucene.NET Full Text Index (1)</title>
		<link>http://www.captaincodeman.com/2008/03/10/nhibernatesearch-using-lucene-net-full-text-index-part1/</link>
		<comments>http://www.captaincodeman.com/2008/03/10/nhibernatesearch-using-lucene-net-full-text-index-part1/#comments</comments>
		<pubDate>Mon, 10 Mar 2008 21:36:00 +0000</pubDate>
		<dc:creator>Captain Codeman</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[full text indexing]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[MVC]]></category>
		<category><![CDATA[nhibernate]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blogs.intesoft.net/post.aspx?id=1fc95aeb-3e10-44fb-a5eb-8b7b880d2ff6</guid>
		<description><![CDATA[Ayende added the NHibernate.Search last year but I&#39;ve never seen a great deal of documentation or examples around it so hopefully this post will help others to get started with it. Basically, this addition to NHibernate brings two of the best open source libraries together &#8211; NHibernate as the Object Relational Mapper that persists your


]]></description>
			<content:encoded><![CDATA[<p>
<a href="http://www.ayende.com/Blog/" target="_blank">Ayende</a> added the <a href="http://www.ayende.com/Blog/archive/2007/04/02/NHibernate-Search.aspx" target="_blank">NHibernate.Search</a> last year but I&#39;ve never seen a great deal of documentation or examples around it so hopefully this post will help others to get started with it.
</p>
<p>
Basically, this addition to NHibernate brings two of the best open source libraries together &#8211; NHibernate as the Object Relational Mapper that persists your objects to a database and Lucene.NET which provides full-text indexing and query support.
</p>
<p>
So how do you use it?
</p>
<p>
The first problem you will run into is actually finding it. Unfortunately the release of NHibernate does not include it in the \bin although it is there in the source. <a href="http://www.nhibernate.org/" target="_blank">Download the latest version of the NHibernate source</a> (1.2.1 GA as of writing) and compile it to produce the NHibernate.Search.dll assembly.
</p>
<p>
Before you do this though, you <em><strong>may</strong></em> want to also <a href="http://incubator.apache.org/lucene.net/download/" target="_blank">download the latest Lucene.NET release</a> (2.0.004) and replace the Lucene.Net.dll assembly in the NHibernate \lib\net\2.0 folder (I&#39;m assuming you are using .NET 2.0). The Lucene.Net.dll that ships with NHibernate has the same version number and did work fine on its own, but the file sizes are different and I ran into some problems when trying to use some of the extra Lucene.NET assemblies for hit-highlighting and similarity matching.
</p>
<p>
The first step is of course to add a reference to NHibernate.Search.dll to your Visual Studio.NET Project.
</p>
<p>
Next, you need to add some additional properties to the session-factory element of the NHibernate configuration section (normally stored in your web.config file):
</p>
<pre class="csharpcode">
<span class="kwrd">&lt;</span><span class="html">property</span> <span class="attr">name</span><span class="kwrd">=&quot;hibernate.search.default.directory_provider&quot;</span><span class="kwrd">&gt;</span>NHibernate.Search.Storage.FSDirectoryProvider, NHibernate.Search<span class="kwrd">&lt;/</span><span class="html">property</span><span class="kwrd">&gt;</span>
<span class="kwrd">&lt;</span><span class="html">property</span> <span class="attr">name</span><span class="kwrd">=&quot;hibernate.search.default.indexBase&quot;</span><span class="kwrd">&gt;</span>~/Index<span class="kwrd">&lt;/</span><span class="html">property</span><span class="kwrd">&gt;</span>
</pre>
<p>
&nbsp;
</p>
<p>
If you&#39;ve used Lucene.NET much you will know that it has the concept of different directory providers for storing the index, such as RAM or FS (File System). The entries above indicate that we want the Lucene index to be stored on the file system, in the ~/Index folder of the website (it could of course be outside the website&#39;s mapped folder). It&#39;s well worth reading a book such as <a href="http://www.manning.com/hatcher2/" target="_blank">Lucene in Action</a> to get a good idea of how Lucene works and what it can do (it&#39;s for the Java version but is still excellent for learning the .NET implementation).
</p>
<p>
The next step requires that you decorate your C# class with some attributes to control the indexing operation. Personally, I don&#39;t like this as it means I need to start referencing NHibernate and Lucene assemblies from my otherwise nice, clean POCO (Plain Old CLR Object) project. It would have been much nicer IMO if this information could have been put in the NHibernate .hbm.xml mapping files, but it&#39;s a small price to pay and some people already use the attribute approach for NHibernate anyway.
</p>
<p>
Here is an example of a Book class for a library application with the additional attributes:
</p>
<pre class="csharpcode">
[Indexed(Index = <span class="str">&quot;Book&quot;</span>)]
<span class="kwrd">public</span> <span class="kwrd">class</span> Book : IBook
{
    <span class="kwrd">private</span> Guid _id;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _title;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _summary;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _summaryHtml;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _authors;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _url;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _smallImageUrl;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _mediumImageUrl;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _largeImageUrl;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _isbn;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _published;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _publisher;
    <span class="kwrd">private</span> <span class="kwrd">string</span> _binding;

    [DocumentId]
    [FieldBridge(<span class="kwrd">typeof</span>(GuidBridge))]
    <span class="kwrd">public</span> Guid Id
    {
        get { <span class="kwrd">return</span> _id; }
        set { _id = <span class="kwrd">value</span>; }
    }

    [Field(Index.Tokenized, Store = Store.No)]
    [Analyzer(<span class="kwrd">typeof</span>(StandardAnalyzer))]
    [Boost(2)]
    <span class="kwrd">public</span> <span class="kwrd">string</span> Title
    {
        get { <span class="kwrd">return</span> _title; }
        set { _title = <span class="kwrd">value</span>; }
    }

    [Field(Index.Tokenized, Store = Store.No)]
    [Analyzer(<span class="kwrd">typeof</span>(StandardAnalyzer))]
    <span class="kwrd">public</span> <span class="kwrd">string</span> Summary
    {
        get { <span class="kwrd">return</span> _summary; }
        set { _summary = <span class="kwrd">value</span>; }
    }

    <span class="kwrd">public</span> <span class="kwrd">string</span> SummaryHtml
    {
        get
        {
            <span class="kwrd">if</span> (_summaryHtml == <span class="kwrd">null</span> || _summaryHtml.Length == 0)
            {
                <span class="kwrd">return</span> _summary;
            }
            <span class="kwrd">return</span> _summaryHtml;
        }
        set { _summaryHtml = <span class="kwrd">value</span>; }
    }

    [Field(Index.Tokenized, Store = Store.No)]
    [Analyzer(<span class="kwrd">typeof</span>(StandardAnalyzer))]
    <span class="kwrd">public</span> <span class="kwrd">string</span> Authors
    {
        get { <span class="kwrd">return</span> _authors; }
        set { _authors = <span class="kwrd">value</span>; }
    }

    <span class="kwrd">public</span> <span class="kwrd">string</span> Url
    {
        get { <span class="kwrd">return</span> _url; }
        set { _url = <span class="kwrd">value</span>; }
    }

    <span class="kwrd">public</span> <span class="kwrd">string</span> SmallImageUrl
    {
        get { <span class="kwrd">return</span> _smallImageUrl; }
        set { _smallImageUrl = <span class="kwrd">value</span>; }
    }

    <span class="kwrd">public</span> <span class="kwrd">string</span> MediumImageUrl
    {
        get { <span class="kwrd">return</span> _mediumImageUrl; }
        set { _mediumImageUrl = <span class="kwrd">value</span>; }
    }

    <span class="kwrd">public</span> <span class="kwrd">string</span> LargeImageUrl
    {
        get { <span class="kwrd">return</span> _largeImageUrl; }
        set { _largeImageUrl = <span class="kwrd">value</span>; }
    }

    [Field(Index.UnTokenized, Store = Store.Yes)]
    <span class="kwrd">public</span> <span class="kwrd">string</span> Isbn
    {
        get { <span class="kwrd">return</span> _isbn; }
        set { _isbn = <span class="kwrd">value</span>; }
    }

    [Field(Index.UnTokenized, Store = Store.No)]
    <span class="kwrd">public</span> <span class="kwrd">string</span> Published
    {
        get { <span class="kwrd">return</span> _published; }
        set { _published = <span class="kwrd">value</span>; }
    }

    [Field(Index.Tokenized, Store = Store.No)]
    [Analyzer(<span class="kwrd">typeof</span>(StandardAnalyzer))]
    <span class="kwrd">public</span> <span class="kwrd">string</span> Publisher
    {
        get { <span class="kwrd">return</span> _publisher; }
        set { _publisher = <span class="kwrd">value</span>; }
    }

    <span class="kwrd">public</span> <span class="kwrd">string</span> Binding
    {
        get { <span class="kwrd">return</span> _binding; }
        set { _binding = <span class="kwrd">value</span>; }
    }
}
</pre>
<p>
Now we&#39;re ready to start using it from NHibernate. To do this we need to create a FullTextSession and use this instead of the regular NHibernate Session (which it wraps / extends):
</p>
<pre class="csharpcode">
ISession session = sessionFactory.OpenSession(<span class="kwrd">new</span> SearchInterceptor());
IFullTextSession fullTextSession = Search.CreateFullTextSession(session);
</pre>
<p>
&nbsp;
</p>
<p>
And that&#39;s it. You can use the IFullTextSession in place of the regular ISession (even casting it for places where you are just doing normal NHibernate operations). All the magic happens inside NHibernate.Search &#8211; when you add, update or delete records the &#39;documents&#39; in the Lucene index are automatically updated which provides you with an excellent Full Text index without a Windows Service in sight!
</p>
<p>
You can check that it&#39;s working by looking in the Index folder &#8211; there should be a &#39;Book&#39; folder containing the Lucene index files (with .cfs extensions).
</p>
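<p>
If you would rather check from code than from Explorer, a few lines of System.IO will do. This is only a sketch: the IndexCheck class is made up for this example, and the relative &#39;Index&#39; path is an assumption that needs replacing with wherever ~/Index maps to on your machine.
</p>
<pre class="csharpcode">using System;
using System.IO;

public static class IndexCheck
{
    // Returns the files in [indexBase]\[indexName], or an empty array
    // if the folder has not been created yet.
    public static string[] ListIndexFiles(string indexBase, string indexName)
    {
        string path = Path.Combine(indexBase, indexName);
        if (!Directory.Exists(path))
        {
            return new string[0];
        }
        return Directory.GetFiles(path);
    }

    public static void Main()
    {
        // &quot;Index&quot; is a placeholder for the physical path of ~/Index
        string[] files = ListIndexFiles(&quot;Index&quot;, &quot;Book&quot;);

        Console.WriteLine(files.Length + &quot; file(s) in the Book index folder&quot;);
        foreach (string file in files)
        {
            Console.WriteLine(Path.GetFileName(file));
        }
    }
}</pre>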
<p>
In the next post I&#39;ll demonstrate using the index to do some queries, including hit-highlighting for presenting the results, but for now you may want to download and try <a href="http://www.getopt.org/luke/" target="_blank">Luke &#8211; a Java program to browse Lucene index catalogs</a> (the file format is identical between the two implementations).</p>


]]></content:encoded>
			<wfw:commentRss>http://www.captaincodeman.com/2008/03/10/nhibernatesearch-using-lucene-net-full-text-index-part1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

