
I have been using Lucene for indexing and searching through deep class structures for many years.  Somewhere along the way Hibernate introduced entity indexing and searching through hibernate-search.  I have not yet started using it.  So for the time being we have rolled our own little custom search engine with Lucene, and we pack Hibernate and Lucene together through session facades.  It works perfectly well.

One of the problems I ran into with Lucene was a product requirement to make the search index uniformly available across all nodes in a JEE cluster.  This was clearly a problem because Lucene likes its indexes saved on the file system, and sharing file systems as network resources was ugly.  So we resorted to an elegant implementation from the Compass project: I pulled in its JdbcDirectory façade, which let us use the database as the index store, and it once again worked beautifully.  There was no visible reduction in search performance for the type of indexing we wanted to do.

Recently we ran into some problems when the code was refactored for JEE6 compliance and deployed into JBoss AS7, the de facto production-quality open source JEE6-compliant server.  The problem seemed to be coming from JdbcDirectory and, more specifically, had something to do with transaction management.  So I took a critical look at what we were doing and refactored the SearchEngine façade to work seamlessly with JdbcDirectory.

Hibernate-search has a promising solution that uses Infinispan to provide a cluster-wide directory.  I have not used it yet.  It looks like a very interesting direction as well, since Infinispan itself is a storage abstraction.

In this example we have a SearchEngine that works closely with the database transactions.  It indexes the key searchable fields of the Person class.  Notably, it indexes ‘hobbies’, a field saved to the database as a @Lob, which is particularly useful.  Finally, we have all of these working together nicely in a test case.  The testing is real, with the DataSource and EntityManager created and managed by an embedded GlassFish container.  Thanks to Arquillian, setting it up and testing is a breeze.  SearchEngine is written as a @Singleton session bean.

@EJB
SearchEngine searcher;
@Test
public void doSearch() throws Exception{
    searcher.doIndex(personDao.save(new Person("Boni","IN",new String[]{"reading","coding","cycling"})));
    searcher.doIndex(personDao.save(new Person("nikhil","IN",new String[]{"painting","cycling","dancing"})));
    searcher.doIndex(personDao.save(new Person("urmi","IN",new String[]{"singing","cooking","sleeping","reading"})));
    searcher.doIndex(personDao.save(new Person("baba","IN",new String[]{"reading","cooking"})));
    searcher.doIndex(personDao.save(new Person("piku","IN",new String[]{"laughing","football"})));
    List<Person> persons = personDao.finadAll();
    assertEquals(5,persons.size());
    searchAndPrintResults("cooking",2);
    searchAndPrintResults("laughing",1);
    searchAndPrintResults("reading",3);
    searchAndPrintResults("foot",1);
    searchAndPrintResults("nonexitentsearch",0);
}
 
private void searchAndPrintResults(String searchString, int expectedCount) throws Exception{
    System.out.println("Search Query:" + searchString);
    List<Person> persons = searcher.doSearch(Person.class, searchString);
    assertEquals(expectedCount, persons.size());
    for (Person aPerson : persons){
        System.out.println("Name:" + aPerson.getName());
    }
    
}

This is the test.  I am creating persons with different hobbies.  The doIndex() method indexes the data.  doSearch() searches the indexed data and returns the matching entities for a partial query string.  To locate each entity it uses some metadata saved along with the Lucene document.  Fairly simple.
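To see why the test expects those hit counts, here is a minimal sketch of the partial-match behaviour in plain Java.  It is not Lucene and not the code from this post: PrefixSearchSketch and its methods are hypothetical names, and the tokenizing is a crude whitespace split rather than a real analyzer.  It only mimics the idea of one combined search text per person, matched by prefix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy stand-in for a combined search field with prefix matching (illustration only, not Lucene). */
final class PrefixSearchSketch {

    // Person name -> lower-cased tokens of name, country and hobbies.
    private final Map<String, List<String>> tokensByName = new LinkedHashMap<>();

    /** Combines all searchable values into one text and splits it on whitespace. */
    void index(String name, String country, String... hobbies) {
        String combined = name + " " + country + " " + String.join(" ", hobbies);
        List<String> tokens = new ArrayList<>();
        for (String token : combined.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        tokensByName.put(name, tokens);
    }

    /** Returns the names of all persons that have at least one token starting with the prefix. */
    List<String> search(String prefix) {
        String needle = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : tokensByName.entrySet()) {
            for (String token : entry.getValue()) {
                if (token.startsWith(needle)) {
                    hits.add(entry.getKey());
                    break; // one matching token is enough for this person
                }
            }
        }
        return hits;
    }

    /** Same five persons as in the test case above. */
    static PrefixSearchSketch sample() {
        PrefixSearchSketch index = new PrefixSearchSketch();
        index.index("Boni", "IN", "reading", "coding", "cycling");
        index.index("nikhil", "IN", "painting", "cycling", "dancing");
        index.index("urmi", "IN", "singing", "cooking", "sleeping", "reading");
        index.index("baba", "IN", "reading", "cooking");
        index.index("piku", "IN", "laughing", "football");
        return index;
    }

    public static void main(String[] args) {
        PrefixSearchSketch index = sample();
        for (String query : new String[] {"cooking", "laughing", "reading", "foot", "nonexitentsearch"}) {
            System.out.println(query + " -> " + index.search(query));
        }
    }
}
```

With this data, "foot" finds only piku because of "football", which is the same partial-match behaviour the test relies on.  A real analyzer also drops stop words and handles punctuation, which this sketch ignores.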

SearchEngine itself is a stateless singleton session bean.  It is annotated with @Singleton to control its instantiation strategy.  In its @PostConstruct method it initializes the search resources, including the JdbcDirectory.  The JdbcDirectory uses a table to save the search index, and this table needs to be created manually.  If you are using Derby, its schema DDL will look something like this.

CREATE TABLE search_index(
    NAME_ VARCHAR(50), 
    VALUE_ BLOB, 
    SIZE_ INT, 
    LF_ TIMESTAMP, 
    DELETED_ CHAR(5),
    PRIMARY KEY(NAME_)
);

For this example I took a shortcut and leveraged Hibernate’s automatic schema creation to create the table for me.  I just created an entity bean with the following structure, and the JPA configuration took care of the rest.

@Entity
@Table(name="SEARCH_INDEX")
public class SearchIndex {
    @Id
    @Column(name="NAME_")
    String name;
    @Lob
    @Column(name="VALUE_")
    Serializable value;
    @Column(name="SIZE_")
    Integer size;
    @Column(name="LF_")
    Timestamp timeStamp;
    @Column(name="DELETED_")
    String deleted;
//...

Then, to use this table as the storage location for my JdbcDirectory, all I needed to do was instantiate the class with the table name and the DataSource injected by my container.

@Singleton
public class SearchEngine {
    @Resource(mappedName="jdbc/testDs") 
    DataSource dataSource;
    
    @PersistenceContext(unitName = "arquilianPU")
    protected EntityManager entityManager;
    
    private static IndexWriter iw = null;
    private static QueryParser parser = null;
    private static Analyzer analyzer = null;
    private static Directory d = null;
    
    @PostConstruct
    public void postConstruct() throws Exception{
        d = 
        new JdbcDirectory(dataSource, "SEARCH_INDEX"){
            @Override
            public String[] listAll() throws IOException {
                return list();
            }
        };
        
        analyzer = new StandardAnalyzer(Version.LUCENE_30);
        iw = new IndexWriter(d, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        parser = new QueryParser(Version.LUCENE_30, "searchfield",analyzer);
    }

An obvious question at this point is how the datasource gets configured.  In this example I am using HSQLDB as my database, and my persistence provider is Hibernate.  The datasource is defined in a GlassFish configuration file, which Arquillian uses to create the datasource for me.  Once created, it is referenced from my persistence.xml through a JNDI lookup.  Hence all the pieces connect together.

Thanks to Arquillian, all these resources can be injected into my components in just the same way as when the application runs in a JEE container.  What is even more awesome is that the test code is to a large extent portable.  By changing the configuration XML files to those specific to some other container, say JBoss, I can easily test the same code in it.  This lets engineers code with far more confidence.  One large missing link in the puzzle was portable JNDI names.  Now that JEE6 has standardized them, that is not a concern either: you can look up your components with the same JNDI name across containers.  All these changes have transformed the sophistication and testability of JEE code.
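As a rough illustration of those portable names, the sketch below builds a JEE6-style global JNDI name following the pattern java:global[/app-name]/module-name/bean-name[!fully-qualified-interface-name].  The helper class PortableJndiNames, the module name "search-test" and the package com.example are assumptions made up for this example; they are not taken from the project.

```java
/** Builds JEE6-style portable JNDI names (hypothetical helper, for illustration only). */
final class PortableJndiNames {

    private PortableJndiNames() {
    }

    /**
     * java:global[/app-name]/module-name/bean-name[!fully-qualified-interface-name]
     * appName is omitted for a standalone module; viewClassName is optional.
     */
    static String global(String appName, String moduleName, String beanName, String viewClassName) {
        StringBuilder name = new StringBuilder("java:global");
        if (appName != null && !appName.isEmpty()) {
            name.append('/').append(appName);
        }
        name.append('/').append(moduleName).append('/').append(beanName);
        if (viewClassName != null && !viewClassName.isEmpty()) {
            name.append('!').append(viewClassName);
        }
        return name.toString();
    }

    /** java:module/bean-name, valid only for lookups from within the same module. */
    static String module(String beanName) {
        return "java:module/" + beanName;
    }

    public static void main(String[] args) {
        // "search-test" and com.example are assumed names, not the real ones from the repository.
        System.out.println(global(null, "search-test", "SearchEngine", null));
        System.out.println(global("shop", "search-test", "SearchEngine", "com.example.SearchEngine"));
        System.out.println(module("SearchEngine"));
    }
}
```

Inside a container, a string like this would be passed to new InitialContext().lookup(...).  In the test above that is not needed because @EJB injection does the lookup for me.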

The code snippet for the actual indexing and retrieval is, ironically, not as important.  It is standard boilerplate with some clever use of a GenericDao.  It is powerful enough to index and retrieve different entity types, though the code is not fully elaborated to that level.

public <T> void doIndex(T entity) throws Exception{
    iw.addDocument(getDocument(entity));
    iw.commit();
}
 
@PersistenceContext(unitName = "arquilianPU")
protected EntityManager entityManager;
public <T> List<T> doSearch(Class<T> classToSearch, String query) throws Exception{
    // Open a searcher over the shared directory and make sure it is released afterwards.
    IndexSearcher searcher = new IndexSearcher(d);
    try {
        // The trailing wildcard makes this a prefix search, so "foot" also matches "football".
        Query q = parser.parse(query+"*");
        System.out.println(q);
        // No filter, at most 10 hits.
        TopDocs rs = searcher.search(q, null, 10);
        System.out.println("Total Hits:" + rs.totalHits);
        ScoreDoc[] scoreDocs =  rs.scoreDocs;
        if (null == scoreDocs) return new ArrayList<T>();
        List<T> matching = new ArrayList<T>();
        GenericDao<T, Long> dao = new GenericJpaDaoImpl<T, Long>(classToSearch){};
        dao.setEntityManager(entityManager);
        for (ScoreDoc aDoc : scoreDocs){
            // The stored "id" field is the metadata used to reload the entity from the database.
            Document document = searcher.doc(aDoc.doc);
            Long id = Long.valueOf(document.get("id")) ;
            matching.add(dao.findById(id));
        }
        return matching;
    } finally {
        searcher.close();
    }
}
 
private <T> Document getDocument(T entity){
    if (entity instanceof Person){
        Person p = (Person)(entity);
        // Named doc so that it does not shadow the static Directory field d.
        Document doc = new Document();
        // id and class are stored as-is; doSearch() uses id to reload the entity.
        doc.add(new Field("id", p.getId().toString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("class", Person.class.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("name", p.getName(),Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("country", p.getCountry(),Field.Store.YES, Field.Index.ANALYZED));
        ArrayList<String> hobbies = p.getHobbies();
        String hobbiesString = null;
        if (null != hobbies){
            // Join the hobbies into one space-separated string.
            StringBuilder joined = new StringBuilder();
            for (String aHobby : hobbies){
                joined.append(aHobby).append(' ');
            }
            hobbiesString = joined.toString();
            doc.add(new Field("hobbies", hobbiesString ,Field.Store.NO, Field.Index.ANALYZED));
        }
        // Catch-all field; this is the default field the QueryParser searches.
        doc.add(new Field("searchfield", (p.getName() + " " + p.getCountry())  + (hobbiesString == null ? "" : " " + hobbiesString),Field.Store.NO, Field.Index.ANALYZED));
        System.out.println(doc);
        return doc;
    }
    // Only Person is handled so far.
    return null;
}

The code referenced in this post can be cloned from my GitHub repository.  To run it, use the Maven command:

mvn clean install -Pgf3

One Thought on “Testing Lucene JdbcDirectory with Arquillian”

  1. Cool… Will try and explore more. Thanks a lot Boni!!!
