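The project needed full-text search, so I started by looking at Lucene and later came across Compass. Compass wraps the Lucene internals and integrates with Hibernate and Spring, so the index can be updated automatically whenever Hibernate updates data, which is far more convenient than using Lucene directly. Documentation is scarce, though, and much of it is outdated, so I spent quite a while experimenting. Naturally that's worth writing down! I'm only recording the configuration here; as for actual usage... too lazy. I'll tidy up those notes when I get the chance...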
1. Maven Dependencies
<!--Search Engine-->
<dependency>
    <groupId>org.compass-project</groupId>
    <artifactId>compass</artifactId>
    <version>2.1.3</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queries</artifactId>
    <version>2.4.1</version>
</dependency>
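Releases are actually quite frequent: whenever Lucene is updated, Compass soon follows with a matching version. These are roughly the Maven dependencies you will need.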
2. Spring Beans Configuration
<bean id="annotationConfiguration"
      class="org.compass.annotations.config.CompassAnnotationsConfiguration" />

<bean id="compass" class="org.compass.spring.LocalCompassBean">
    <!-- xml configuration mode
    <property name="resourceLocations">
        <list>
            <value>classpath:your/domain/Entity.cmd.xml</value>
        </list>
    </property>
    -->
    <!-- annotation mode -->
    <property name="classMappings">
        <list>
            <!--<value>your.domain.Entity</value>-->
        </list>
    </property>
    <property name="compassConfiguration" ref="annotationConfiguration" />
    <property name="compassSettings">
        <props>
            <prop key="compass.engine.connection">${compass.engine.connection}</prop>
            <prop key="compass.transaction.factory">org.compass.spring.transaction.SpringSyncTransactionFactory</prop>
            <prop key="compass.engine.optimizer.aggressive.mergeFactor">0</prop>
            <prop key="compass.engine.analyzer.default.type">org.apache.lucene.analysis.cjk.CJKAnalyzer</prop>
        </props>
    </property>
    <property name="transactionManager" ref="transactionManager" />
</bean>

<bean id="hibernateGpsDevice"
      class="org.compass.gps.device.hibernate.HibernateGpsDevice">
    <property name="name" value="hibernateDevice" />
    <property name="sessionFactory" ref="sessionFactory" />
    <property name="nativeExtractor">
        <bean class="org.compass.spring.device.hibernate.SpringNativeHibernateExtractor" />
    </property>
</bean>

<bean id="compassGps" class="org.compass.gps.impl.SingleCompassGps"
      init-method="start" destroy-method="stop">
    <property name="compass" ref="compass" />
    <property name="gpsDevices">
        <list>
            <!-- When using {SpringSyncTransactionFactory}, this gps device wrapper
                 (SpringSyncTransactionGpsDeviceWrapper) should be used to wrap all the devices -->
            <bean class="org.compass.spring.device.SpringSyncTransactionGpsDeviceWrapper">
                <property name="transactionManager" ref="transactionManager" />
                <property name="gpsDevice" ref="hibernateGpsDevice" />
            </bean>
        </list>
    </property>
</bean>
Most of this is boilerplate; only a few parts are yours to change. Compared with other material you can find online, my configuration differs in hibernateGpsDevice and compassGps.gpsDevices. hibernateGpsDevice uses org.compass.gps.device.hibernate.HibernateGpsDevice because org.compass.spring.device.hibernate.SpringHibernate3GpsDevice was marked deprecated as of 2.0M1, so HibernateGpsDevice must be used instead. The reason gpsDevices uses org.compass.spring.device.SpringSyncTransactionGpsDeviceWrapper is given in that class's Javadoc: when SpringSyncTransactionFactory manages transactions, the other GPS devices must be wrapped in SpringSyncTransactionGpsDeviceWrapper.
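Next, define the searchable classes and their index properties through cmd.xml or annotations, and everything becomes easy to work with. The Compass reference documentation is quite clear, and a careful read answers most questions. Combined with open-source libraries such as PDFBox and POI, you can also extract the content of PDF, Word, Excel, and PowerPoint files to use as search data. Building a simple full-text search system really takes very little effort...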
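For the cmd.xml route, a mapping file might look roughly like the sketch below. This is only an illustration written from memory of the Compass OSEM mapping format, not something taken from my project: the class name Entity, the package your.domain, and the properties id, title, and content are all hypothetical placeholders, and the DOCTYPE declaration is left out because the exact DTD version depends on the Compass release you use (check the reference documentation).

```xml
<!-- Hypothetical your/domain/Entity.cmd.xml; DOCTYPE intentionally omitted -->
<compass-core-mapping package="your.domain">
    <!-- alias is the name under which instances of the class are indexed -->
    <class name="Entity" alias="entity">
        <!-- the identifier property of the class (assumed to be called "id") -->
        <id name="id" />
        <!-- each property is stored in the index under the given meta-data name -->
        <property name="title">
            <meta-data>title</meta-data>
        </property>
        <property name="content">
            <meta-data>content</meta-data>
        </property>
    </class>
</compass-core-mapping>
```

A file like this would be the one referenced by the commented-out resourceLocations property in the compass bean; in annotation mode the same information goes on the class itself and the class is listed under classMappings instead.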