10 April 2009

Compass Configuration for Spring

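My project needs full-text search. I started by looking at Lucene and then found Compass, which wraps the low-level Lucene APIs and integrates with Hibernate and Spring, so the index is updated whenever Hibernate writes data. That is far more convenient than using Lucene directly. The catch is that there is not much documentation, and a lot of what exists is outdated, so I spent quite a while experimenting. Naturally, I have to write it down! I am only recording the configuration here, though. As for usage... I'm feeling lazy, so I'll tidy up those notes when I get the chance...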

1. Maven Dependencies

<!--Search Engine-->
<dependency>
 <groupId>org.compass-project</groupId>
 <artifactId>compass</artifactId>
 <version>2.1.3</version>
</dependency>
<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-core</artifactId>
 <version>2.4.1</version>
</dependency>
<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-highlighter</artifactId>
 <version>2.4.1</version>
</dependency>
<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-analyzers</artifactId>
 <version>2.4.1</version>
</dependency>
<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-queries</artifactId>
 <version>2.4.1</version>
</dependency>

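Releases actually come fairly quickly: whenever Lucene is updated, Compass soon follows with a matching version. These are roughly all the Maven dependencies you will need.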
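Since the four Lucene artifacts need to stay on the same version, one option is to factor the version into a Maven property. This is only a sketch: the property name lucene.version is my own illustrative choice, the <properties> element goes directly under <project>, and only one of the four dependencies is shown.

```xml
<properties>
 <!-- lucene.version is an illustrative property name, not something Compass requires -->
 <lucene.version>2.4.1</lucene.version>
</properties>

<!-- Repeat the same ${lucene.version} reference for lucene-highlighter,
     lucene-analyzers and lucene-queries -->
<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-core</artifactId>
 <version>${lucene.version}</version>
</dependency>
```

With this in place, moving to a newer Lucene release means changing one line, alongside the Compass version.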

2. Spring Beans Configuration

<bean id="annotationConfiguration"
  class="org.compass.annotations.config.CompassAnnotationsConfiguration" />
<bean id="compass" class="org.compass.spring.LocalCompassBean">
 <!-- xml configuration mode 
 <property name="resourceLocations">
  <list>
   <value>classpath:your/domain/Entity.cmd.xml</value>
  </list>
 </property>
 -->
 <!-- annotation mode -->
 <property name="classMappings">
  <list>
   <!--<value>your.domain.Entity</value>-->
  </list>
 </property>
 <property name="compassConfiguration" ref="annotationConfiguration" />
 <property name="compassSettings">
  <props>
   <prop key="compass.engine.connection">
    ${compass.engine.connection}</prop>
   <prop key="compass.transaction.factory">
    org.compass.spring.transaction.SpringSyncTransactionFactory</prop>
   <prop key="compass.engine.optimizer.aggressive.mergeFactor">0</prop>
   <prop key="compass.engine.analyzer.default.type">
    org.apache.lucene.analysis.cjk.CJKAnalyzer</prop>
  </props>
 </property>
 <property name="transactionManager" ref="transactionManager" />
</bean>
<bean id="hibernateGpsDevice" class="org.compass.gps.device.hibernate.HibernateGpsDevice">
 <property name="name" value="hibernateDevice" />
 <property name="sessionFactory" ref="sessionFactory" />
 <property name="nativeExtractor">
  <bean class="org.compass.spring.device.hibernate.SpringNativeHibernateExtractor" />
 </property>
</bean>

<bean id="compassGps" class="org.compass.gps.impl.SingleCompassGps"
 init-method="start" destroy-method="stop">
 <property name="compass" ref="compass" />
 <property name="gpsDevices">
  <list>
   <!--
   When using {SpringSyncTransactionFactory}, this gps device
   wrapper(SpringSyncTransactionGpsDeviceWrapper) should be used to 
   wrap all the devices
   -->
   <bean
    class="org.compass.spring.device.SpringSyncTransactionGpsDeviceWrapper">
    <property name="transactionManager" ref="transactionManager" />
    <property name="gpsDevice" ref="hibernateGpsDevice" />
   </bean>
  </list>
 </property>
</bean>

Most of this is fixed boilerplate, and only a few settings are meant to be changed. Compared with other examples you can find online, my configuration differs in hibernateGpsDevice and compassGps.gpsDevices. hibernateGpsDevice uses org.compass.gps.device.hibernate.HibernateGpsDevice because org.compass.spring.device.hibernate.SpringHibernate3GpsDevice was deprecated in 2.0M1, so HibernateGpsDevice must be used instead. The reason gpsDevices uses org.compass.spring.device.SpringSyncTransactionGpsDeviceWrapper is given in that class's Javadoc: when SpringSyncTransactionFactory manages transactions, every other GPS device has to be wrapped in a SpringSyncTransactionGpsDeviceWrapper.
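One of the settings that does vary is compass.engine.connection, which the compass bean reads from the ${compass.engine.connection} placeholder. A minimal sketch of how that placeholder might be resolved, assuming a properties file named compass.properties on the classpath (the file name, the bean id, and the sample value are my assumptions, not part of the configuration above):

```xml
<!-- Assumed file: compass.properties on the classpath, containing for example
     compass.engine.connection=file://target/index
     Check the Compass reference for the connection formats your version supports. -->
<bean id="propertyConfigurer"
  class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
 <property name="location" value="classpath:compass.properties" />
</bean>
```

Keeping the index location in a properties file lets each environment point at its own index directory without touching the bean definitions.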

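Next, declare the searchable classes and their index properties through cmd.xml or annotations, and everything becomes easy to work with. The Compass reference documentation is quite clear, and a careful read answers most questions. Combined with open-source libraries such as PDFBox and POI, you can also extract the content of PDF, Word, Excel, and PowerPoint files as search data, so building a simple full-text search system really takes little effort...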
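For the cmd.xml route, a mapping file might look roughly like the sketch below. It assumes a hypothetical your.domain.Entity class with id and title properties, matching the classpath:your/domain/Entity.cmd.xml placeholder in the Spring configuration. The DOCTYPE declaration is left out on purpose, since it depends on the Compass version; take it from the reference documentation for the release you use.

```xml
<?xml version="1.0"?>
<!-- Add the compass-core-mapping DOCTYPE for your Compass version here -->
<compass-core-mapping package="your.domain">
 <!-- Entity, id and title are hypothetical names used for illustration -->
 <class name="Entity" alias="entity">
  <!-- The identifier Compass uses to locate the indexed resource -->
  <id name="id" />
  <!-- Index the title property under the meta-data name "title" -->
  <property name="title">
   <meta-data>title</meta-data>
  </property>
 </class>
</compass-core-mapping>
```

To use it, the file would be listed under resourceLocations in the compass bean in place of the classMappings entry used for annotation mode.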