Installing and Configuring Chinese Word Segmentation in Elasticsearch 6.x

Current Elasticsearch version:
"number": "6.4.2"
Find the matching release of the analysis-ik word segmentation plugin:

https://github.com/medcl/elasticsearch-analysis-ik/releases
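Since the plugin release is picked to match the Elasticsearch version, the download URL can be derived from the version number. The following is a minimal sketch, assuming the release assets keep the naming pattern of the 6.4.2 URL used in this post; `ikPluginUrl` is a hypothetical helper, not part of the plugin or of the PHP client.

```php
<?php
// Hypothetical helper: builds the analysis-ik download URL for a given
// Elasticsearch version, following the pattern of the 6.4.2 release URL.
function ikPluginUrl(string $esVersion): string
{
    // Accept only plain x.y.z version strings to avoid building odd URLs.
    if (!preg_match('/^\d+\.\d+\.\d+$/', $esVersion)) {
        throw new InvalidArgumentException("Unexpected version: {$esVersion}");
    }

    return sprintf(
        'https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%1$s/elasticsearch-analysis-ik-%1$s.zip',
        $esVersion
    );
}
```

With the official PHP client, the running version should be available as `$client->info()['version']['number']`, so the result of `ikPluginUrl()` for that value is the URL to pass to `elasticsearch-plugin install`.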


elasticsearch-plugin install to …
~ » elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.4.2/elasticsearch-analysis-ik-6.4.2.zip
-> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.4.2/elasticsearch-analysis-ik-6.4.2.zip
[=================================================] 100%
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.net.SocketPermission * connect,resolve
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed analysis-ik
After restarting Elasticsearch, check the plugin status:
"plugins": [
  {
    "name": "analysis-ik",
    "version": "6.4.2",
    "elasticsearch_version": "6.4.2",
    "java_version": "1.8",
    "description": "IK Analyzer for Elasticsearch",
    "classname": "org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin",
    "extended_plugins": [ ],
    "has_native_controller": false
  }
]
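The same check can be done from PHP. This is a sketch under the assumption that the nodes-info response (what `$client->nodes()->info()` returns in the official PHP client) nests a `plugins` list under each node ID, as in the fragment above; `pluginVersionsByNode` is a hypothetical helper name.

```php
<?php
// Sketch: collect the version of one plugin per node from a decoded
// nodes-info response. Nodes without the plugin are simply left out.
function pluginVersionsByNode(array $nodesInfo, string $pluginName): array
{
    $versions = [];
    foreach ($nodesInfo['nodes'] ?? [] as $nodeId => $node) {
        foreach ($node['plugins'] ?? [] as $plugin) {
            if (($plugin['name'] ?? null) === $pluginName) {
                $versions[$nodeId] = $plugin['version'] ?? null;
            }
        }
    }
    return $versions;
}
```

Calling `pluginVersionsByNode($client->nodes()->info(), 'analysis-ik')` should then list every node that has the plugin loaded; a node missing from the result probably needs the plugin installed or a restart.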
Next, define the analyzer in the index mappings:
// Requires `use Elasticsearch\ClientBuilder;` at the top of the file.
public function createdIndex()
{
    $params = [
        'index' => 'test_index', // index name
        'body' => [
            'settings' => [
                'number_of_shards' => 3,   // number of primary shards
                'number_of_replicas' => 2, // number of replicas
            ],
            'mappings' => [
                'test_type' => [
                    'properties' => [
                        // define the analyzer
                        'title' => [
                            'type' => 'text',
                            'analyzer' => 'ik_max_word',
                            'search_analyzer' => 'ik_max_word',
                        ],
                    ],
                ],
            ],
        ],
    ];

    $result = ClientBuilder::create()
        ->setHosts(['localhost:9200'])
        ->build()
        ->indices()
        ->create($params);

    return $result;
}

Note: if the index already exists, you can update its mappings instead. For this demonstration I create a fresh index with new mappings.
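For the existing-index case, a mapping update might look like the sketch below. It only builds the parameter array for `indices()->putMapping()` in the official PHP client; `buildIkPutMappingParams` is a hypothetical helper, and the field name passed to it is whatever new field you want to add.

```php
<?php
// Sketch: parameters for indices()->putMapping() that add a text field
// analyzed with ik_max_word to an existing index and type.
function buildIkPutMappingParams(string $index, string $type, string $field): array
{
    return [
        'index' => $index,
        'type'  => $type,
        'body'  => [
            $type => [
                'properties' => [
                    $field => [
                        'type'            => 'text',
                        'analyzer'        => 'ik_max_word',
                        'search_analyzer' => 'ik_max_word',
                    ],
                ],
            ],
        ],
    ];
}
```

For example, `$client->indices()->putMapping(buildIkPutMappingParams('test_index', 'test_type', 'summary'))`, where `summary` is a made-up field. As far as I know, Elasticsearch rejects a change to the `analyzer` of a field that is already mapped, so this approach suits adding fields; changing an existing field means creating a new index and reindexing.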

View the mappings:
array:1 [
  "test_index" => array:1 [
    "mappings" => array:1 [
      "test_type" => array:1 [
        "properties" => array:1 [
          "title" => array:2 [
            "type" => "text"
            "analyzer" => "ik_max_word"
          ]
        ]
      ]
    ]
  ]
]
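To verify the analyzer programmatically instead of reading the dump, something like the following sketch could be used. It assumes the response shape shown above (index, then `mappings`, then type, then `properties`); `fieldAnalyzer` is a hypothetical helper.

```php
<?php
// Sketch: read the analyzer configured for one field out of a decoded
// get-mapping response. Returns null when any level of the path is missing.
function fieldAnalyzer(array $mappingResponse, string $index, string $type, string $field): ?string
{
    return $mappingResponse[$index]['mappings'][$type]['properties'][$field]['analyzer'] ?? null;
}
```

The response itself would come from `$client->indices()->getMapping(['index' => 'test_index'])`; a `null` result for `title` would suggest the mapping was not applied as intended.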
Create a few documents:
  1. 中华人民共和国
  2. 人民解放战争已经胜利
  3. 我军是共和国最强大的一支军队
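The post does not show how these documents were written, so here is one possible way, sketched with the bulk API of the official PHP client. `buildBulkIndexParams` is a hypothetical helper; the IDs 1 to 3 follow the numbering of the list above.

```php
<?php
// Sketch: build the body for $client->bulk() from an id => title map.
function buildBulkIndexParams(string $index, string $type, array $titlesById): array
{
    $params = ['body' => []];
    foreach ($titlesById as $id => $title) {
        // Each document takes two entries: the action line, then the source.
        $params['body'][] = ['index' => ['_index' => $index, '_type' => $type, '_id' => (string) $id]];
        $params['body'][] = ['title' => $title];
    }
    return $params;
}

$params = buildBulkIndexParams('test_index', 'test_type', [
    1 => '中华人民共和国',
    2 => '人民解放战争已经胜利',
    3 => '我军是共和国最强大的一支军队',
]);
```

Passing `$params` to `$client->bulk()` should index all three documents in a single request.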
Taking 中华人民共和国 as an example, the segmentation plugin automatically splits it into terms such as 中华, 人民, 共和, 共和国, 中华人民, and so on.
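To see the actual terms the analyzer produces for a piece of text, the `_analyze` API can be called through the PHP client. A minimal sketch, assuming the usual response shape with a top-level `tokens` list; both function names are hypothetical helpers.

```php
<?php
// Sketch: parameters for $client->indices()->analyze().
function buildAnalyzeParams(string $text, string $analyzer = 'ik_max_word'): array
{
    return ['body' => ['analyzer' => $analyzer, 'text' => $text]];
}

// Pull the token strings out of a decoded _analyze response.
function tokenStrings(array $analyzeResponse): array
{
    return array_column($analyzeResponse['tokens'] ?? [], 'token');
}
```

Usage would be along the lines of `tokenStrings($client->indices()->analyze(buildAnalyzeParams('中华人民共和国')))`. Passing `'ik_smart'` as the second argument lets you compare against the plugin's coarser-grained analyzer.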
Querying the index returns all three documents:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "人民解放战争已经胜利"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "中华人民共和国"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "我军是共和国最强大的一支军队"
        }
      }
    ]
  }
}
Build the search:
/**
 * Search by the given criteria.
 *
 * @return array
 */
public function search()
{
    $params = [
        'index' => 'test_index',
        'type' => 'test_type',
        'body' => [
            'query' => [
                'match' => [
                    'title' => '人民万岁,共和国万岁',
                ],
            ],
            // highlight tags
            'highlight' => [
                'pre_tags' => ["<em>"],
                'post_tags' => ["</em>"],
                'fields' => [
                    // an empty object: highlight title with default options
                    'title' => new \stdClass(),
                ],
            ],
        ],
    ];

    $result = ClientBuilder::create()
        ->setHosts(['localhost:9200'])
        ->build()
        ->search($params);

    return $result;
}
The search returns:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.3170843,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1.3170843,
        "_source": {
          "title": "中华人民共和国"
        },
        "highlight": {
          "title": [
            "中华<em>人民</em><em>共和</em><em>国</em>"
          ]
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "3",
        "_score": 0.51676416,
        "_source": {
          "title": "我军是共和国最强大的一支军队"
        },
        "highlight": {
          "title": [
            "我军是<em>共和</em><em>国</em>最强大的一支军队"
          ]
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "title": "人民解放战争已经胜利"
        },
        "highlight": {
          "title": [
            "<em>人民</em>解放战争已经胜利"
          ]
        }
      }
    ]
  }
}
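In application code you usually want just the highlighted fragments rather than the whole response. A small sketch, assuming the response shape shown above; `highlightedTitles` is a hypothetical helper and not part of the PHP client.

```php
<?php
// Sketch: map each hit's _id to its highlighted title fragments.
function highlightedTitles(array $searchResponse): array
{
    $out = [];
    foreach ($searchResponse['hits']['hits'] ?? [] as $hit) {
        // Hits without a highlight section fall back to the stored title.
        $out[$hit['_id']] = $hit['highlight']['title'] ?? [$hit['_source']['title'] ?? ''];
    }
    return $out;
}
```

Feeding it the array returned by `search()` gives one entry per document, ready to be rendered with the `<em>` tags intact. Note that PHP turns numeric string IDs such as "1" into integer array keys.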