April 2023 – Page 4069 – haodro.com

2023-04-20 15:07 | Source: compiled from the web

A question about coreseek

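You have to use the returned ID values to look the records back up in the database, because coreseek only stores information for the fields you index. To get the effect you want, index a few more fields so that the information shown in the result list comes back with the search results and no second database lookup is needed.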
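A minimal sketch of that lookup step, assuming the search hands back an ordered list of document IDs. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the documents table with id, title and content columns is hypothetical. The one subtle point is that WHERE id IN (...) does not preserve the search ranking, so the rows are re-sorted by the ID list.

```python
import sqlite3


def fetch_rows_by_ids(conn: sqlite3.Connection, ids: list[int]) -> list[tuple]:
    """Fetch full rows for document IDs returned by a search, in ranking order.

    Assumes a hypothetical table documents(id, title, content).
    """
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title, content FROM documents WHERE id IN ({placeholders})",
        ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) returns rows in no guaranteed order: restore the search ranking
    # and skip IDs whose rows were deleted after the index was built.
    return [by_id[doc_id] for doc_id in ids if doc_id in by_id]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?)",
        [(1, "test one", "..."), (2, "test two", "..."), (4, "doc number four", "...")],
    )
    for row in fetch_rows_by_ids(conn, [4, 1, 2]):
        print(row)
```

Issuing one IN query for the whole page of results, rather than one query per ID, keeps the extra database cost to a single round trip.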

Does anyone have a coreseek installer for Windows?

I. About Sphinx

Sphinx is a full-text search engine released under the GPLv2. Commercial licensing (for example, embedding it in other programs) requires contacting the author (Sphinxsearch.com) for a commercial license. Generally speaking, Sphinx is a standalone search engine meant to give other applications fast, space-efficient, highly relevant full-text search. It integrates very easily with SQL databases and scripting languages. Built-in data source support currently covers MySQL and PostgreSQL, and it can also read XML data in a specific format from standard input. By modifying the source code, users can add new data sources themselves (for example, native support for other kinds of DBMS). The search API supports PHP, Python, Perl, Ruby and Java, and Sphinx can also be used as a MySQL storage engine. The search API is very simple and can be ported to a new language within a few hours.

Sphinx features:
* Fast indexing (peak performance of up to 10 MB/s on modern CPUs);
* High-performance search (average query time under 0.1 s on 2-4 GB of text);
* Handles large amounts of data (known to handle over 100 GB of text, and 100M documents on a single-CPU system);
* Excellent relevance ranking, combining phrase proximity and statistical (BM25) ranking;
* Distributed search;
* Document excerpt (snippet) generation;
* Can serve searches as a MySQL storage engine;
* Multiple query modes, including boolean, phrase and word proximity;
* Multiple full-text fields per document (up to 32);
* Multiple additional attributes per document (e.g. group information, timestamps);
* Stopwords;
* Single-byte encodings and UTF-8;
* Native MySQL support (both MyISAM and InnoDB);
* Native PostgreSQL support.

A Chinese-language manual is available (alternate download on Kuqin: sphinx_doc_zhcn_0.9.pdf).

II. Installing Sphinx on Windows

1. Download the latest Windows release. I used "Win32 release binaries with MySQL support" and extracted it to D:\sphinx.

2. Under D:\sphinx, create a data directory for the index files and a log directory for the log files. Copy D:\sphinx\sphinx.conf.in to D:\sphinx\bin\sphinx.conf (note the change of file name).

3. Edit D:\sphinx\bin\sphinx.conf. These are the settings that need changing:

    type = mysql            # data source type; mine is mysql
    sql_host = localhost    # database server
    sql_user = root         # database user name
    sql_pass =              # database password (empty here)
    sql_db = test           # database
    sql_port = 3306         # database port
    sql_query_pre = SET NAMES utf8    # uncomment this line if your database is utf8-encoded

    index test1
    {
        # index file path and base name (without extension)
        path = D:/sphinx/data/test1
        # encoding
        charset_type = utf-8
        # character table for utf-8
        charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
        # simple segmentation; only 0 and 1 are supported. Set it to 1 to search Chinese
        ngram_len = 1
        # characters to segment; uncomment this line to search Chinese
        ngram_chars = U+3000..U+2FA1F
    }

    # index test1stemmed : test1
    # {
    # path = @CONFDIR@/data/test1stemmed
    # morphology = stem_en
    # }

    # If you have no distributed index, comment out the following
    # index dist1
    # {
    # 'distributed' index type MUST be specified
    # type = distributed
    # local index to be searched
    # there can be many local indexes configured
    # local = test1
    # local = test1stemmed
    # remote agent
    # multiple remote agents may be specified
    # syntax is 'hostname:port:index1,
    # agent = localhost:3313:remote1
    # agent = localhost:3314:remote2,remote3
    # remote agent connection timeout, milliseconds
    # optional, default is 1000 ms, ie. 1 sec
    # agent_connect_timeout = 1000
    # remote agent query timeout, milliseconds
    # optional, default is 3000 ms, ie. 3 sec
    # agent_query_timeout = 3000
    # }

    # Settings to change for the search service
    searchd
    {
        # log
        log = D:/sphinx/log/searchd.log
        # PID file, searchd process ID file name
        pid_file = D:/sphinx/log/searchd.pid
        # this must stay commented out to start searchd on Windows
        # seamless_rotate = 1
    }

4. Import the test data:

    C:\Program Files\MySQL\MySQL Server 5.0\bin> mysql -uroot test < d:/sphinx/example.sql

5. Build the index:

    D:\sphinx\bin> indexer.exe --all
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    using config file './sphinx.conf'...
    indexing index 'test1'...
    collected 4 docs, 0.0 MB
    sorted 0.0 Mhits, 100.0% done
    total 4 docs, 193 bytes
    total 0.101 sec, 1916.30 bytes/sec, 39.72 docs/sec
    D:\sphinx\bin>

6. Try searching for 'test':

    D:\sphinx\bin> search.exe test
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    using config file './sphinx.conf'...
    index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec
    displaying matches:
    1. document=1, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
            id=1
            group_id=1
            group_id2=5
            date_added=2008-11-26 14:58:59
            title=test one
            content=this is my test document number one. also checking search within phrases.
    2. document=2, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
            id=2
            group_id=1
            group_id2=6
            date_added=2008-11-26 14:58:59
            title=test two
            content=this is my test document number two
    3. document=4, weight=1, group_id=2, date_added=Wed Nov 26 14:58:59 2008
            id=4
            group_id=2
            group_id2=8
            date_added=2008-11-26 14:58:59
            title=doc number four
            content=this is to test groups
    words:
    1. 'test': 3 documents, 5 hits
    D:\sphinx\bin>

All the matching documents were found.

7. Test Chinese search. Modify the documents table in the test database:

    UPDATE `test`.`documents` SET `title` = '测试中文', `content` = 'this is my test document number two，应该搜的到吧' WHERE `documents`.`id` = 2;

Rebuild the index:

    D:\sphinx\bin> indexer.exe --all

Try searching for '中文':

    D:\sphinx\bin> search.exe 中文
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    using config file './sphinx.conf'...
    index 'test1': query '中文 ': returned 0 matches of 0 total in 0.000 sec
    words:
    D:\sphinx\bin>

Nothing seems to be found. That is because the Windows command prompt uses GBK encoding, so of course the search fails. We can try it from a program instead: create a file foo.php under D:\sphinx\api, and make sure to save it as UTF-8:

    <?php
    require 'sphinxapi.php';
    $s = new SphinxClient();
    $s->SetServer('localhost', 3312);
    $result = $s->Query('中文');
    var_dump($result);
    ?>

Start the Sphinx searchd service:

    D:\sphinx\bin> searchd.exe
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    WARNING: forcing --console mode on Windows
    using config file './sphinx.conf'...
    creating server socket on 0.0.0.0:3312
    accepting connections

Run the PHP query:

    php d:/sphinx/api/foo.php
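To see why the UTF-8 query from foo.php can match document 2 while the GBK console query cannot, here is a rough Python approximation of how the test1 settings above split text into tokens. It is a sketch, not Sphinx's actual tokenizer: it models only the ASCII part of charset_table (the Cyrillic U+410..U+44F mappings are left out) and treats every character in ngram_chars = U+3000..U+2FA1F as a one-character token, as ngram_len = 1 describes. Matching happens on these tokens, so the query presumably has to reach searchd as the same UTF-8 characters that were indexed; GBK bytes from the console do not decode to those characters.

```python
NGRAM_LO, NGRAM_HI = 0x3000, 0x2FA1F  # ngram_chars = U+3000..U+2FA1F


def tokenize(text: str) -> list[str]:
    """Split text roughly the way the test1 index settings above would."""
    tokens: list[str] = []
    word: list[str] = []

    def flush() -> None:
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if NGRAM_LO <= ord(ch) <= NGRAM_HI:
            # ngram_len = 1: each CJK character becomes its own token
            flush()
            tokens.append(ch)
        elif ch.isascii() and (ch.isalnum() or ch == "_"):
            # charset_table: 0..9, a..z and _ are word characters; A..Z folds to a..z
            word.append(ch.lower())
        else:
            # anything not in the (simplified) charset acts as a separator
            flush()
    flush()
    return tokens


if __name__ == "__main__":
    print(tokenize("测试中文 Test_1"))
    print(tokenize("中文"))
```

Both characters of the query '中文' appear as tokens in the updated title of document 2, which is what lets the UTF-8 query find it.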

How do I implement LIKE-style fuzzy queries in coreseek (Sphinx)?

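1. Sphinx has two main processes, indexer and searchd. indexer collects the raw data from the database (or another data source) and builds the corresponding indexes; searchd answers client requests by reading the indexes that indexer built.
2. Before Sphinx can work, you must edit the configuration file:
   a. tell it where the data comes from (the source section);
   b. define the index: which part of the source data to index, plus the other details (everything indexer needs);
   c. run indexer to build the index, and only then start the searchd service.
3. Applications mainly use Sphinx through its API, which can be called from PHP, Perl, Python, Ruby and other languages.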
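The split between the two processes can be illustrated with a toy model in Python. This is only an analogy (a plain inverted index with all-words matching and no ranking), and the names indexer and searchd_query are made up for the illustration: indexing reads the source once and builds a structure, and querying touches only that structure. That is also why rows added to the database after indexing stay invisible until indexer runs again.

```python
from collections import defaultdict


def indexer(rows: list[tuple[int, str]]) -> dict[str, set[int]]:
    """Toy 'indexer': read (doc_id, text) rows from a source, build an inverted index."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, text in rows:
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def searchd_query(index: dict[str, set[int]], query: str) -> list[int]:
    """Toy 'searchd': answer a query from the prebuilt index only (all words must match)."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


if __name__ == "__main__":
    source = [(1, "test one"), (2, "test two"), (4, "doc number four")]
    idx = indexer(source)
    print(searchd_query(idx, "test"))              # [1, 2]
    source.append((5, "test five"))                # new row in the "database"...
    print(searchd_query(idx, "test"))              # ...still [1, 2] until indexer runs again
    print(searchd_query(indexer(source), "test"))  # [1, 2, 5]
```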

How to use full-text search in DedeCMS (织梦) v5.7

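1.1 Preparation

First, in the sphinx configuration directory, set up the directories where sphinx stores its variable data, indexes and logs.

1.2 Creating the configuration file

Since DedeCMS uses MySQL, we need a MySQL-based sphinx configuration template: copy csft_mysql.conf and rename it to csft_dedecmsv57.conf. As an example, here we only set up full-text search for articles, which requires the following configuration.

First create a statistics table in DedeCMS. You can do this by running the following SQL from the DedeCMS admin panel:

    CREATE TABLE `dede_sphinx` (
      `countid` int(11) unsigned NOT NULL,
      `maxaid` int(11) unsigned NOT NULL,
      PRIMARY KEY (`countid`)
    ) ENGINE=MyISAM DEFAULT CHARSET=gbk

This is a sphinx content statistics table, used so that the index can be generated in batches when there is a large amount of data.

After creating the table, edit the sphinx configuration file csft_dedecmsv57.conf as follows (comments included):

--------------------------------------------------------------------------------

    # Source definition
    source mysql
    {
        type = mysql
        # basic database server settings
        sql_host = 192.168.0.103
        sql_user = dedev57
        sql_pass = dedecms
        sql_db = dedecmsv57gbk
        sql_port = 3306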

        # Set the encoding. We use gbk here; for utf-8 you can set:
        # sql_query_pre = SET NAMES utf8
        sql_query_pre = SET NAMES gbk

        # step size (number of ids per batch) for ranged data fetching
        sql_range_step = 1000

        # record the id of the current newest document
        sql_query_pre = REPLACE INTO dede_sphinx SELECT 1, MAX(id) FROM dede_archives

        # query
        sql_query = SELECT ARC.id,ARC.typeid,ARC.typeid2,ARC.sortrank,ARC.flag,ARC.channel,ARC.ismake,ARC.arcrank,ARC.click,ARC.title,ARC.shorttitle,ARC.color,ARC.writer,ARC.source,ARC.litpic,ARC.pubdate,ARC.senddate,ARC.mtype,ARC.description,ARC.badpost,ARC.goodpost,ARC.scores,ARC.lastpost,ARC.keywords,ARC.mid,ART.body FROM dede_archives AS ARC LEFT JOIN dede_addonarticle AS ART ON ARC.id = ART.aid WHERE ARC.id>=$start AND ARC.id<=$end
        # the first column of sql_query (the id) must be an integer
        # title and body are string/text fields and are full-text indexed

http://www.dede58.com/a/dedeaz/1678.html
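sql_range_step only takes effect together with sql_query_range, a standard Sphinx directive whose query returns the minimum and maximum document ID; the excerpt above is cut off before any such line appears. With both set, indexer runs sql_query once per ID window, substituting $start and $end. The Python sketch below shows how the range [min, max] can be cut into windows of sql_range_step IDs; treat it as an illustration of the idea rather than a copy of indexer's code. The helper names id_windows and render_query are hypothetical.

```python
def id_windows(min_id: int, max_id: int, step: int) -> list[tuple[int, int]]:
    """Cut the inclusive ID range [min_id, max_id] into ($start, $end) windows."""
    if step <= 0:
        raise ValueError("step must be positive")
    windows = []
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)
        windows.append((start, end))
        start = end + 1
    return windows


def render_query(sql_query: str, start: int, end: int) -> str:
    """Substitute the $start/$end macros used in the sql_query above."""
    return sql_query.replace("$start", str(start)).replace("$end", str(end))


if __name__ == "__main__":
    sql = "SELECT id FROM dede_archives WHERE id>=$start AND id<=$end"
    for start, end in id_windows(1, 2500, 1000):
        print(render_query(sql, start, end))
```

Fetching in fixed-size windows keeps each individual query small, which is the point of batching on a large dede_archives table.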

Can sphinx/coreseek full-text search still work if the table has no auto-increment primary key?

No. I just tested it: it only works with an auto-increment primary key; with any other kind of primary key, nothing is found.
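For what it's worth, the Sphinx documentation states the underlying requirement as: the first column returned by sql_query must be a unique, non-zero unsigned integer document ID. Auto-increment is simply the usual way to get one. If a table is keyed by something else, such as a string code, one possible workaround (not something tested in the answer above) is a separate mapping table that assigns each key an integer ID, which sql_query then joins in. A minimal sketch with sqlite3 standing in for MySQL; the tables articles(code, title) and doc_ids(doc_id, code) are hypothetical.

```python
import sqlite3


def assign_doc_ids(conn: sqlite3.Connection) -> None:
    """Give every article key a stable integer document ID (safe to rerun)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_ids ("
        " doc_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " code TEXT NOT NULL UNIQUE)"
    )
    # Keys that already have an ID are skipped thanks to the UNIQUE constraint.
    conn.execute(
        "INSERT OR IGNORE INTO doc_ids (code) SELECT code FROM articles ORDER BY code"
    )


def rows_for_indexer(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """What sql_query would return: the integer ID first, then the text to index."""
    return conn.execute(
        "SELECT d.doc_id, a.title FROM articles AS a"
        " JOIN doc_ids AS d ON d.code = a.code ORDER BY d.doc_id"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (code TEXT PRIMARY KEY, title TEXT NOT NULL)")
    conn.executemany("INSERT INTO articles VALUES (?, ?)", [("b-2", "beta"), ("a-1", "alpha")])
    assign_doc_ids(conn)
    print(rows_for_indexer(conn))
```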

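Real-time indexing: when I insert a new row into the database, coreseek can't find it, because the row hasn't been written into the index file yet. Also, coreseek's delta index updates automatically!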

How do I connect MySQL to coreseek?

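Install MySQL and CoreSeek. Here I use MySQL 5.0 and Coreseek 4.0.1-win32. Installation reference: the MySQL 5.0 installation guide.
Default host: localhost
Default MySQL port: 3306
Download and extract Coreseek.
Change the root user's password. By default, root has no password. Open cmd in the bin directory of the MySQL installation and run:
    mysqladmin -u root password 1234
This sets the root password to 1234.
Create the database. Log in:
    mysql -u root -p1234
Create a database:
    create database html_url;
Show the databases:
    show databases;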

How do I install coreseek on Windows?

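1. Download it. I downloaded "Win64 binaries w/MySQL+PgSQL+libstemmer+id64 support"; the downloaded file is named sphinx-2.0.6-release-win64-id64-full.zip.
2. Extract it to D:\sphinx, and under D:\sphinx create the directories data (for index files) and log (for log files).
3. Copy D:\sphinx\sphinx.conf.in to D:\sphinx\bin\sphinx.conf.in and rename the copy to sphinx.conf.
4. Edit D:\sphinx\bin\sphinx.conf as follows:
4.1 Find source src1 and change the contents of its {...} block:
    # database type
    type = mysql
    # server
    sql_host = localhost
    # database login name
    sql_user = root
    # database login password
    sql_pass = root
    # name of the database to use
    sql_db = test
    # database server port
    sql_port = 3306
    # set the connection encoding, if you use utf-8
    sql_query_pre = SET NAMES utf8
(If any of the 7 settings above start with #, remove it.)
4.2 Find index test1 and change the contents of its {...} block:
    # index file path and base name (without extension)
    path = D:/sphinx/data/test1
    # encoding
    charset_type = utf-8
    # utf-8 character table
    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
    # simple segmentation; only 0 and 1 are allowed, and it must be 1 to search Chinese
    ngram_len = 1
    # characters to segment; required for searching Chinese
    ngram_chars = U+3000..U+2FA1F
(If any of the 5 settings above start with #, remove it.)
5. Import the test data: run the statements in D:\sphinx\example.sql against the test database. Note that the test database must be created with the utf-8 character set.
6. Open a cmd window and change to D:\sphinx\bin.
7. Build the index by running indexer.exe test1 (test1 is index test1 in sphinx.conf):
    Sphinx 2.0.6-id64-release (r3473)
    Copyright (c) 2001-2012, Andrew Aksyonoff
    Copyright (c) 2008-2012, Sphinx Technologies Inc (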

How do I update the coreseek 3.2 index from PHP?

PHP cannot update the coreseek index by itself. You need coreseek's own commands, combined with scheduled tasks, to update the index automatically.

This takes a fair amount of writing. Our system happens to use coreseek, so here is my approach.

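1. First create a search table. It stores the data you want to search, already segmented into words. Pick whichever word segmentation system you like; I use pscws4, a Chinese word segmenter for PHP.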

    DROP TABLE IF EXISTS `search`;
    CREATE TABLE `search` (
      `searchid` int(11) NOT NULL AUTO_INCREMENT,
      `title` varchar(255) NOT NULL,
      `content` text NOT NULL,
      `add_time` int(11) NOT NULL,
      PRIMARY KEY (`searchid`)
    ) ENGINE=MyISAM AUTO_INCREMENT=15209 DEFAULT CHARSET=utf8;

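2. You also need an index counter table, search_counter, which stores the largest ID after each index update. The next update then doesn't have to start from the beginning; it only needs the rows whose IDs are larger than that one.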

    DROP TABLE IF EXISTS `search_counter`;
    CREATE TABLE `search_counter` (
      `counter_id` int(11) NOT NULL,
      `max_doc_id` int(11) NOT NULL,
      PRIMARY KEY (`counter_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

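3. Configure coreseek. Below is my coreseek configuration file on Windows (the Linux one is on the server and I didn't go looking for it). It defines two index sources: main, and a delta index called delta. That way you don't have to rebuild the whole index every time; you only need to merge main and delta.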

    # Source definitions
    source main
    {
        type = mysql
        sql_host = 192.168.0.10
        sql_user = root
        sql_pass = root
        sql_db = database
        sql_port = 3306
        sql_query_pre = SET NAMES utf8
        sql_query_pre = REPLACE INTO search_counter SELECT 1, MAX(searchid) FROM search
        sql_query = SELECT searchid, title, content, controller_id, controller, add_time FROM search
        # the first column of sql_query (the id) must be an integer
        # title and content are string/text fields and are full-text indexed
        #sql_attr_uint = searchid
        # values read from SQL must be integers
        sql_attr_uint = controller_id    # filter by database ID
        sql_attr_uint = controller       # filter by controller
        # must be an integer read from SQL; used as a time attribute
        sql_attr_timestamp = add_time
        # set the correct character set for command-line queries
        sql_query_info_pre = SET NAMES utf8
        # read the original row from the database for command-line queries
        #sql_query_info = SELECT * FROM search WHERE searchid=$id
    }

    source delta : main
    {
        sql_query_pre = SET NAMES utf8
        sql_query = SELECT searchid, title, content, controller_id, controller, add_time FROM search WHERE searchid > ( SELECT max_doc_id FROM search_counter WHERE counter_id=1 )
        sql_query_post = REPLACE INTO search_counter SELECT 1, MAX(searchid) FROM search
    }

    # Index definitions
    index main
    {
        source = main    # name of the corresponding source
        path = D:/WebSoft/coreseek/var/data/main    # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
        docinfo = extern
        mlock = 0
        morphology = none
        min_word_len = 1
        html_strip = 0
        # Chinese word segmentation settings; for details see:
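The main/delta counter scheme can be simulated with sqlite3 standing in for MySQL, reusing the search and search_counter table names but trimming the columns to the ID and title. This sketch follows the delta-update pattern from the Sphinx documentation, in which only the main build records MAX(searchid) and delta selects everything newer. The configuration above additionally advances the counter in delta's sql_query_post; since indexer rebuilds the delta index from scratch on every run, that variant appears to leave only the most recent batch in delta, so rows indexed by earlier, not-yet-merged delta runs would drop out of search. The sketch deliberately leaves the counter to main. COALESCE is an addition here to keep the NOT NULL counter valid when the table is empty.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Trimmed versions of the search / search_counter tables defined above.
    conn.execute("CREATE TABLE search (searchid INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE search_counter (counter_id INTEGER PRIMARY KEY, max_doc_id INTEGER NOT NULL)"
    )


def add_rows(conn: sqlite3.Connection, *titles: str) -> None:
    conn.executemany("INSERT INTO search (title) VALUES (?)", [(t,) for t in titles])


def build_main(conn: sqlite3.Connection) -> list[int]:
    # main's sql_query_pre: remember the newest ID (COALESCE guards an empty table).
    conn.execute("REPLACE INTO search_counter SELECT 1, COALESCE(MAX(searchid), 0) FROM search")
    # main's sql_query: every row.
    return [r[0] for r in conn.execute("SELECT searchid FROM search ORDER BY searchid")]


def build_delta(conn: sqlite3.Connection) -> list[int]:
    # delta's sql_query: only rows newer than the ID recorded by the last main build.
    return [
        r[0]
        for r in conn.execute(
            "SELECT searchid FROM search WHERE searchid >"
            " (SELECT max_doc_id FROM search_counter WHERE counter_id = 1)"
            " ORDER BY searchid"
        )
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_rows(conn, "a", "b", "c")
    print(build_main(conn))   # [1, 2, 3]
    add_rows(conn, "d", "e")
    print(build_delta(conn))  # [4, 5]
    add_rows(conn, "f")
    print(build_delta(conn))  # [4, 5, 6]: each delta rebuild covers everything since main
```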

4. Build the index

    D:\WebSoft\coreseek\bin\indexer --all --config d:\WebSoft\coreseek\bin\sphinx.conf

5. Configure and start the service

    D:\WebSoft\coreseek\bin\searchd --install --config D:\WebSoft\coreseek\bin\sphinx.conf --servicename coreseek

6. Create a Windows scheduled task that updates the index every minute

    D:\WebSoft\coreseek\bin\indexer.exe --config D:\WebSoft\coreseek\bin\sphinx.conf delta --rotate
    echo indexing, window will close when complete

7. Create a Windows scheduled task that merges the indexes every day at 2 a.m.

    D:\WebSoft\coreseek\bin\indexer.exe --config D:\WebSoft\coreseek\bin\sphinx.conf --merge main delta --rotate
    echo indexing, window will close when complete

8. Appendix: how to build, rebuild and merge indexes on Windows and Linux, plus some small issues you may run into

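Windows:

Build the index:
    D:\WebSoft\coreseek\bin\indexer --all --config d:\WebSoft\coreseek\bin\sphinx.conf
Rebuild the index:
    D:\WebSoft\coreseek\bin\indexer --config D:\WebSoft\coreseek\bin\sphinx.conf main --rotate
Update the delta index:
    D:\WebSoft\coreseek\bin\indexer --config D:\WebSoft\coreseek\bin\sphinx.conf delta --rotate
Merge the indexes:
    D:\WebSoft\coreseek\bin\indexer --config D:\WebSoft\coreseek\bin\sphinx.conf --merge main delta --rotate
Configure and start the service:
    D:\WebSoft\coreseek\bin\searchd --install --config D:\WebSoft\coreseek\bin\sphinx.conf --servicename coreseek

Creating a custom dictionary:
1. Download the dictionaries you need from the Sogou cell dictionary site, http://pinyin.sogou.com/dict/
2. Convert them to txt with the Shenlan (深蓝) dictionary converter.
3. Convert the resulting txt into the format coreseek needs with a PHP program.
4. Append the result to unigram.txt.
5. Update the segmentation dictionary: open cmd, change to the bin directory and run:
    mmseg -u D:\WebSoft\coreseek\etc\unigram.txt
6. Rename the generated unigram.txt.uni to uni.lib.
7. Rebuild the index.
8. Restart the coreseek service.

Note: the index must be built before the service can start.

Common problems:
1. Indexing or searching reports "ERROR: invalid token in etc". This means the configuration file is not saved as UTF-8 without BOM, so it cannot be parsed. Open the file in an editor and save it as UTF-8 without BOM.
2. "failed to lock ..... try --rotate": the index has already been built; use the rebuild command (with --rotate).
3. Warning "failed to scanf pid from": the coreseek service is not running.
4. Filter values for search results must be passed as an array. Only these attribute types are supported: unsigned integers (1 to 32 bits wide); UNIX timestamps; floating-point values (32-bit IEEE 754 single precision); string ordinals (specially computed integer values); and multi-value attributes, MVA (variable-length sequences of 32-bit unsigned integers). For example:
    $this->shpinx->SetFilter('controller', array(1, 2));

CentOS:

To start the coreseek search service at boot, run vi /etc/rc.d/rc.local and add this as the last line:
    /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/bin/sphinx.conf
To stop the search service, use:
    /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/bin/sphinx.conf --stop
To update the index while the service is running, use:
    /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/bin/sphinx.conf --all --rotate

Scheduled tasks on Linux (crontab -e):
    # merge the indexes at 4 a.m.; update the delta index every minute at other times
    * 0-3 * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf delta --rotate
    * 6-23 * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf delta --rotate
    0 4 * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --merge main delta --rotate

Start the service:
    /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/bin/sphinx.conf
Build the index:
    /usr/local/coreseek/bin/indexer --all --config /usr/local/coreseek/bin/sphinx.conf
Rebuild the index:
    /usr/local/coreseek/bin/indexer --config /usr/local/coreseek/bin/sphinx.conf main --rotate
Update the delta index:
    /usr/local/coreseek/bin/indexer --config /usr/local/coreseek/bin/sphinx.conf delta --rotate
Merge the indexes:
    /usr/local/coreseek/bin/indexer --config /usr/local/coreseek/bin/sphinx.conf --merge main delta --rotate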
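The first common problem (a configuration file that is not UTF-8 without BOM) can be checked from a script before running indexer. A minimal Python sketch; the function name strip_utf8_bom is hypothetical, and it only removes a leading UTF-8 BOM and confirms the remaining bytes decode as UTF-8. It does not convert a file saved as GBK; it just reports it by raising UnicodeDecodeError.

```python
import codecs
import tempfile
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from a config file in place.

    Returns True if a BOM was removed. Raises UnicodeDecodeError if the
    remaining bytes are not valid UTF-8 (e.g. the file was saved as GBK).
    """
    data = path.read_bytes()
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]
    data.decode("utf-8")  # validation only; raises on non-UTF-8 content
    if had_bom:
        path.write_bytes(data)
    return had_bom


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "sphinx.conf"
        conf.write_bytes(codecs.BOM_UTF8 + "# 中文注释\nsearchd\n{\n}\n".encode("utf-8"))
        print(strip_utf8_bom(conf))  # True
        print(strip_utf8_bom(conf))  # False: already clean
```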

I set sql_attr_string = code in coreseek, but queries return 0 for it. What is going on?

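Declares a string ordinal attribute (sql_attr_str2ordinal). Multiple attributes of this type can be declared under different names. Optional. Applies to SQL sources only (mysql, pgsql, mssql). This attribute type (string ordinal, for short) is designed to allow sorting by string value without storing the strings themselves. When string ordinals are indexed, the string values are fetched from the database, stored temporarily and sorted, and each one is then replaced by its position in the sorted array. A string ordinal is therefore an integer, and comparing two ordinals gives the same result as comparing the original strings lexicographically.

In earlier versions, indexing string ordinals could consume a lot of RAM. Since r1112, accumulating and sorting string ordinals is also done in a fixed amount of memory (at the cost of extra temporary disk space), bounded by the mem_limit setting.

Ideally, strings would be sorted according to their character encoding and locale. For example, if the strings are known to be Russian text in KOI8R encoding, sorting the bytes 0xE0, 0xE1 and 0xE2 should yield 0xE1, 0xE2, 0xE0, because in KOI8R the character for 0xE0 clearly belongs after the characters for 0xE1 and 0xE2. Unfortunately, Sphinx does not currently support this; it simply sorts by byte value.

Note that the ordinals are computed by each index from its own data, so when several indexes are read at once they cannot be merged while keeping the correct order. Each processed string is replaced by its sequence number in the index being processed, but different indexes contain different sets of strings. For example, if the 'main' index contains the strings "aaa", "bbb", "ccc" and so on up to "zzz", they are assigned the numbers 1, 2, 3 and so on up to 26. But if 'delta' contains only "zzz", that string is assigned 1. After a merge, the order is therefore broken. Unfortunately, this cannot be solved without storing the original strings (and once the original strings are stored, the ordinals become pointless).

Your version is newer than mine; I use Sphinx 0.9.9 / Coreseek 3.2.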
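The ordinal scheme, and why merging breaks it, can be reproduced in a few lines of Python. This is a sketch of the idea described above (distinct values sorted by raw byte value and numbered from 1 within one index), not Sphinx's own code; string_ordinals is a hypothetical helper.

```python
import string


def string_ordinals(values: list[str], encoding: str = "utf-8") -> dict[str, int]:
    """Number the distinct strings of one index in plain byte-value order, from 1."""
    distinct = sorted(set(values), key=lambda s: s.encode(encoding))
    return {s: rank for rank, s in enumerate(distinct, start=1)}


if __name__ == "__main__":
    main = string_ordinals([c * 3 for c in string.ascii_lowercase])  # "aaa" .. "zzz"
    delta = string_ordinals(["zzz"])
    print(main["aaa"], main["zzz"])  # 1 26
    print(delta["zzz"])              # 1
    # After a merge, "zzz" from delta (1) would sort before "bbb" from main (2).
```

It also shows why such an attribute comes back from a query as an integer rather than as the string itself.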
