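Our project needed full-text search, so I started with Lucene and later came across Compass. Compass wraps the Lucene internals and integrates with Hibernate and Spring, so the index is updated whenever Hibernate updates data, which is far more convenient than using Lucene directly. The catch is that there is not much material available, and a lot of what exists is outdated. I spent quite some time experimenting, so of course I have to write it down! I am only recording the configuration here, though; as for usage... I'm lazy, so I'll tidy up those notes when I get the chance...
1. Maven Dependencies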
<!--Search Engine-->
<dependency>
    <groupId>org.compass-project</groupId>
    <artifactId>compass</artifactId>
    <version>2.1.3</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queries</artifactId>
    <version>2.4.1</version>
</dependency>
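Releases actually come fairly quickly: whenever Lucene is updated, Compass soon follows with a matching version. These are roughly the dependencies you will need in Maven.
2. Spring Beans Configuration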
<bean id="annotationConfiguration"
    class="org.compass.annotations.config.CompassAnnotationsConfiguration" />

<bean id="compass" class="org.compass.spring.LocalCompassBean">
    <!-- xml configuration mode
    <property name="resourceLocations">
        <list>
            <value>classpath:your/domain/Entity.cmd.xml</value>
        </list>
    </property>
    -->
    <!-- annotation mode -->
    <property name="classMappings">
        <list>
            <!--<value>your.domain.Entity</value>-->
        </list>
    </property>
    <property name="compassConfiguration" ref="annotationConfiguration" />
    <property name="compassSettings">
        <props>
            <prop key="compass.engine.connection">${compass.engine.connection}</prop>
            <prop key="compass.transaction.factory">org.compass.spring.transaction.SpringSyncTransactionFactory</prop>
            <prop key="compass.engine.optimizer.aggressive.mergeFactor">0</prop>
            <prop key="compass.engine.analyzer.default.type">org.apache.lucene.analysis.cjk.CJKAnalyzer</prop>
        </props>
    </property>
    <property name="transactionManager" ref="transactionManager" />
</bean>

<bean id="hibernateGpsDevice" class="org.compass.gps.device.hibernate.HibernateGpsDevice">
    <property name="name" value="hibernateDevice" />
    <property name="sessionFactory" ref="sessionFactory" />
    <property name="nativeExtractor">
        <bean class="org.compass.spring.device.hibernate.SpringNativeHibernateExtractor" />
    </property>
</bean>

<bean id="compassGps" class="org.compass.gps.impl.SingleCompassGps"
    init-method="start" destroy-method="stop">
    <property name="compass" ref="compass" />
    <property name="gpsDevices">
        <list>
            <!--
                When using {SpringSyncTransactionFactory}, this gps device
                wrapper(SpringSyncTransactionGpsDeviceWrapper) should be used to
                wrap all the devices
            -->
            <bean
                class="org.compass.spring.device.SpringSyncTransactionGpsDeviceWrapper">
                <property name="transactionManager" ref="transactionManager" />
                <property name="gpsDevice" ref="hibernateGpsDevice" />
            </bean>
        </list>
    </property>
</bean>
Most of this is fixed boilerplate; only a few settings are yours to change. Compared with other examples you can find online, my configuration differs in hibernateGpsDevice and compassGps.gpsDevices. hibernateGpsDevice uses org.compass.gps.device.hibernate.HibernateGpsDevice because org.compass.spring.device.hibernate.SpringHibernate3GpsDevice was marked deprecated in 2.0M1 and must be replaced by HibernateGpsDevice. The reason gpsDevices uses org.compass.spring.device.SpringSyncTransactionGpsDeviceWrapper is given in that class's Javadoc: when SpringSyncTransactionFactory manages transactions, every other GPS device has to be wrapped in SpringSyncTransactionGpsDeviceWrapper.
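The compass.engine.analyzer.default.type setting in the configuration above selects Lucene's CJKAnalyzer, which indexes Chinese, Japanese, and Korean text as overlapping two-character tokens (bigrams) instead of splitting on whitespace. As a rough illustration only, the plain-Java sketch below mimics that idea for a single run of CJK characters. CjkBigramSketch and bigrams are hypothetical names made up for this note, and the real analyzer also deals with Latin text and other details that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigramSketch {

    /**
     * Splits a run of characters into overlapping two-character tokens.
     * Assumption of this sketch: a run shorter than two characters is
     * returned as a single token, and an empty run yields no tokens.
     */
    public static List<String> bigrams(String run) {
        List<String> tokens = new ArrayList<>();
        if (run.isEmpty()) {
            return tokens;
        }
        if (run.length() == 1) {
            tokens.add(run);
            return tokens;
        }
        // Slide a window of width 2 across the run, one character at a time.
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the three overlapping pairs of the four-character phrase.
        System.out.println(bigrams("全文檢索"));
    }
}
```

Because every adjacent pair of characters becomes a token, a two-character query can match anywhere inside a longer phrase without a dictionary-based word segmenter, at the cost of a larger index.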
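Next, once you define the Searchable classes and their index properties through cmd.xml or annotations, everything is easy to work with. The Compass reference documentation is quite clear, and a careful read answers most questions. Combined with open-source libraries such as PDFBox and POI, you can also extract the contents of PDF, Word, Excel, and PowerPoint files as searchable data, so building a simple full-text search system really takes very little effort...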
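For the annotation route, a mapped class might look roughly like the sketch below. It assumes the Compass annotations from the org.compass.annotations package and reuses the placeholder name your.domain.Entity from the configuration; the fields id, title, and content are hypothetical and only serve as an illustration. The Hibernate mapping for the same class is configured separately and is not shown. The class needs the Compass jar from the Maven section on the classpath to compile.

```java
package your.domain;

import org.compass.annotations.Searchable;
import org.compass.annotations.SearchableId;
import org.compass.annotations.SearchableProperty;

// Hypothetical entity used only for illustration; the field names are assumptions.
@Searchable
public class Entity {

    // Identifies the object in the index.
    @SearchableId
    private Long id;

    // Each annotated field is indexed as a searchable property.
    @SearchableProperty
    private String title;

    @SearchableProperty
    private String content;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
}
```

With a class like this, the commented-out your.domain.Entity value in the classMappings list would be enabled so that the compass bean picks up the mapping.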