Make your own Google - Linux

By Robert
at 2009-11-24T21:06


http://combine.it.lth.se/SearchEngineBox/




SearchEngine in a Box using Combine/Zebra

The system sprang from development in the EU project ALVIS (IST-1-002068-STP),
with the help of .SE's Internetfond, and is based on two systems: the Combine
focused crawler and the Zebra text indexing and retrieval engine. It allows you
to build a vertical search engine for your favorite topic in just 5 easy steps.
Before that, however, you have to install the system on your machine (or you
can try it out online before installing).
Installation and testing instructions

Edit /etc/apt/sources.list and add these lines:
deb http://combine.it.lth.se/ debian/
deb http://ftp.indexdata.dk/debian sarge main
deb-src http://ftp.indexdata.dk/debian sarge main
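If you script the installation, you may want to avoid adding the same repository lines twice when the script is re-run. A minimal sketch, assuming a POSIX shell; `add_apt_line` is a hypothetical helper, not part of apt or Combine:

```shell
# Hypothetical helper: append a line to an apt sources file only if an
# identical line is not already present.
add_apt_line() {
    file=$1
    line=$2
    # -q quiet, -x whole-line match, -F literal (non-regex) pattern.
    # If the file is missing, grep fails and the line is simply appended.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage, as root (sources.list is root-owned):
#   add_apt_line /etc/apt/sources.list 'deb http://combine.it.lth.se/ debian/'
#   add_apt_line /etc/apt/sources.list 'deb http://ftp.indexdata.dk/debian sarge main'
#   add_apt_line /etc/apt/sources.list 'deb-src http://ftp.indexdata.dk/debian sarge main'
```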
Get the crawler, indexer and XSLT tools. Run:
sudo apt-get update
sudo apt-get install combine idzebra2.0 yaz xsltproc
Make sure you have combine version 3.4 or later.
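One way to check the version requirement from a script is sketched below. It assumes GNU `sort` with its `-V` (version sort) option; `version_at_least` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if version $1 is >= version $2.
# GNU sort -V orders version strings numerically, so 3.10 sorts after 3.4.
version_at_least() {
    have=$1
    need=$2
    # Whichever version sorts first is the smaller one; if that is $need,
    # then $have is at least $need.
    lowest=$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$need" ]
}

# Usage: dpkg-query prints the installed version of the "combine" package.
#   v=$(dpkg-query -W -f='${Version}' combine)
#   version_at_least "$v" 3.4 || echo "combine $v is older than 3.4" >&2
```

On Debian, `dpkg --compare-versions "$v" ge 3.4` performs the same test using Debian's own version-ordering rules (including epochs such as `1:`), which `sort -V` does not know about.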
Download the 'SearchEngine in a Box' system, unpack it, and change to the
directory where the software was unpacked. Run:
tar zxf SEbox.tgz
cd SearchEngineBox
Initialize the crawler for a simple test. Run:
sudo combineINIT --jobname atest
combineCtrl --jobname atest load < seeds.txt
Change to the Zebra configuration directory and run:
cd ZebraConf
make Combine
Tell Zebra where it should run. Edit ZebraConf.xml and change
<host>ldbkit06</host>
<port>3003</port>
to the host you are running on and your preferred port.
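The edit can also be made non-interactively. A sketch assuming GNU `sed` (for `-i`) and that, as in the shipped file, each `<host>` and `<port>` element sits on a single line and these are the only such elements in ZebraConf.xml; `set_zebra_listen` is a hypothetical helper:

```shell
# Hypothetical helper: rewrite the <host> and <port> values in ZebraConf.xml.
set_zebra_listen() {
    conf=$1
    host=$2
    port=$3
    # GNU sed -i edits the file in place; "|" is used as the delimiter so
    # the values may contain "/".
    sed -i \
        -e "s|<host>[^<]*</host>|<host>$host</host>|" \
        -e "s|<port>[^<]*</port>|<port>$port</port>|" \
        "$conf"
}

# Usage, from the ZebraConf directory:
#   set_zebra_listen ZebraConf.xml "$(hostname)" 3003
```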
Tell the crawler where the indexer is. Edit /etc/combine/atest/combine.cfg
and add
ZebraHost = <host>:<port>
at the end.
For example, with the original ZebraConf.xml it would be
ZebraHost = ldbkit06:3003
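Appending by hand works once; if the setup is repeated, a small helper can replace an existing `ZebraHost` line instead of adding a second one. A sketch assuming GNU `sed` and the `ZebraHost = <host>:<port>` format shown above; `set_zebra_host` is a hypothetical name:

```shell
# Hypothetical helper: replace an existing "ZebraHost = ..." line in
# combine.cfg, or append one at the end if there is none.
set_zebra_host() {
    cfg=$1
    hostport=$2
    if grep -q '^ZebraHost[[:space:]]*=' "$cfg" 2>/dev/null; then
        # Replace the existing setting in place (GNU sed -i).
        sed -i "s|^ZebraHost[[:space:]]*=.*|ZebraHost = $hostport|" "$cfg"
    else
        # If the file does not end with a newline, add one first so the
        # new setting starts on its own line.
        if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
            echo >> "$cfg"
        fi
        printf 'ZebraHost = %s\n' "$hostport" >> "$cfg"
    fi
}

# Usage, as root (combineINIT creates the file):
#   set_zebra_host /etc/combine/atest/combine.cfg ldbkit06:3003
```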
Generate the Zebra configuration. Run:
make rmConfs
make
Start the Zebra indexing and database server. Run:
rm -f server.log
zebrasrv -f yazserver.xml -l server.log &
You may want to copy the simple UI to a web server (see the instructions at
the end of the README file in this directory).
Test everything by starting the simple test crawl. Run:
combineCtrl --jobname atest start
You should see activity in the Zebra log, ZebraConf/server.log.
Test searching your new database. Use either or both of these options:
Use the explain facility of the database directly by opening the URL
http://<host>:<port>/ in an XML-enabled browser such as Firefox (use the host
and port you configured above in the ZebraConf.xml file).
Test searching using the simple UI from the ZebraConf directory.
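If you prefer the command line to a browser, the explain URL can be derived from ZebraConf.xml and fetched with `curl`. A sketch assuming each `<host>` and `<port>` element is on one line; `zebra_explain_url` is a hypothetical helper:

```shell
# Hypothetical helper: print http://<host>:<port>/ using the first <host>
# and <port> values found in the given ZebraConf.xml.
zebra_explain_url() {
    conf=$1
    host=$(sed -n 's|.*<host>\([^<]*\)</host>.*|\1|p' "$conf" | head -n 1)
    port=$(sed -n 's|.*<port>\([^<]*\)</port>.*|\1|p' "$conf" | head -n 1)
    printf 'http://%s:%s/\n' "$host" "$port"
}

# Usage, from the ZebraConf directory while zebrasrv is running:
#   curl --silent --max-time 10 "$(zebra_explain_url ZebraConf.xml)"
```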
Stop the crawler and the Zebra server. Run:
combineCtrl --jobname atest kill
kill `cat lock/zebrasrv.pid`
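The `kill` line above fails noisily if lock/zebrasrv.pid is missing or the server has already exited. A slightly more defensive sketch follows; `stop_from_pidfile` is a hypothetical helper, and it relies only on the standard `kill -0` test for whether a process exists:

```shell
# Hypothetical helper: send SIGTERM to the process whose pid is stored in
# the given file, but only if the file exists and the process is running.
stop_from_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "no pid file: $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"
    else
        echo "process $pid from $pidfile is not running" >&2
    fi
}

# Usage, from the ZebraConf directory:
#   combineCtrl --jobname atest kill
#   stop_from_pidfile lock/zebrasrv.pid
```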
Now you are ready to tailor it to your own application:
Build a vertical search engine in just 5 easy steps

Once the software is installed and tested:
Create a configuration for Zebra (see the ZebraConf directory).
Configure Combine for the crawl you want. Please refer to the Combine
documentation sections 'Configuration' and 'Use Scenarios'. In particular, you
have to create a topic definition (section 'Crawler operation') for your
particular topic.
Create the crawler
sudo combineINIT --jobname atest --topic YourTopicDefFile.txt
combineCtrl --jobname atest load < seeds.txt
Tell the crawler where the indexer is. Edit /etc/combine/atest/combine.cfg
and add
ZebraHost = <host>:<port>
at the end, where host and port match your Zebra configuration.
Start Zebra and the crawler
zebrasrv -f yazserver.xml -l server.log &
combineCtrl --jobname atest start
Make your own UI
And now it's ready for use, building the database as we speak.
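The crawler-creation, indexer-configuration and start-up steps above can be collected into one script. The sketch below is only an illustration: the function name, its parameters and the `DRY_RUN` switch are assumptions, not part of Combine or Zebra, and it assumes the /etc/combine/<jobname>/combine.cfg layout seen above. With `DRY_RUN=1` it prints the commands it would run, so you can review the sequence before executing it.

```shell
# Hypothetical wrapper for "Create the crawler", "Tell the crawler where the
# indexer is" and "Start Zebra and the crawler".
#   $1 job name   $2 topic definition file   $3 seed URL file
#   $4 host:port of Zebra   $5 Zebra configuration directory (ZebraConf)
start_vertical_search() {
    job=$1 topic=$2 seeds=$3 hostport=$4 zebra_dir=$5
    cfg=/etc/combine/$job/combine.cfg

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' \
            "sudo combineINIT --jobname $job --topic $topic" \
            "combineCtrl --jobname $job load < $seeds" \
            "echo 'ZebraHost = $hostport' | sudo tee -a $cfg" \
            "cd $zebra_dir && zebrasrv -f yazserver.xml -l server.log &" \
            "combineCtrl --jobname $job start"
        return 0
    fi

    sudo combineINIT --jobname "$job" --topic "$topic" || return 1
    combineCtrl --jobname "$job" load < "$seeds" || return 1
    # Appends unconditionally, like the manual edit; avoid running this twice.
    echo "ZebraHost = $hostport" | sudo tee -a "$cfg" > /dev/null || return 1
    # Start the Zebra server in the background from its configuration directory.
    (cd "$zebra_dir" && zebrasrv -f yazserver.xml -l server.log &)
    combineCtrl --jobname "$job" start
}

# Usage:
#   DRY_RUN=1 start_vertical_search atest YourTopicDefFile.txt seeds.txt ldbkit06:3003 ZebraConf
#   start_vertical_search atest YourTopicDefFile.txt seeds.txt ldbkit06:3003 ZebraConf
```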
Demos

Simple demonstrators of Vertical Search Engines are available here.
Create your own Vertical Search Engine.

Last updated 2009-06-16 by Anders Ardö

--
TW -> Bangkok -> AMS -> PRAHA

--
Tags: Linux
