Elasticsearch Scrolling 滚动加载数据

批量生成文档
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
/**
* @return array
* 批量生成一批文档
*/
public function createAllData()
{
//假数据生成器,指定为中文
$faker = \Faker\Factory::create('zh_CN');
for ($i = 1;$i < 1001;$i++){
$params['body'][] = [
'index' => [
'_index' => 'test_index',
'_type' => 'test_type',
'_id' => $i,
]
];
$params['body'][] = [
'title' => $faker->name
];
}
//批量生成
$result = ClientBuilder::create()
->setHosts(['localhost:9200'])
->build()
->bulk($params);
return $result;
}
成功构建出1000条数据
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1000,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "14",
"_score": 1,
"_source": {
"title": "顾伟"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "19",
"_score": 1,
"_source": {
"title": "鄢伟"
}
}
........
先来理解下 Scrolling

Scrolling (游标或者滚动),可以联想成页面不停的向下滚动,随着滚动,数据源源不断的加载取出
场景: 在用批量读取数据时,经常要用 Scrolling 功能对文档进行分页处理,如输出一个用户的所有文档。Scrolling 要比常规的搜索更加高效,因为这里不需要对文档执行性能消耗较大的排序操作。
Scrolling 会保留某个时间点的索引快照数据,然后用快照数据进行分页。即使后台正在执行索引文档、更新文档和删除文档,游标查询窗口仍然可以持续正常分页。如何使用呢?首先,你要在发送搜索请求时增加 Scroll 参数。然后就会返回一个文档『页数』信息,还有一个用来获取 Hits 分页数据的 scroll_id

看代码中实现
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
/**
* @return array
* 根据条件检索
*/
public function search()
{
$hosts = [
'localhost:9200'
];
$client = ClientBuilder::create()->setHosts($hosts)->build();
$params = [
'index' => 'test_index',
'type' => 'test_type',
'scroll' => '30s', //_scroll_id有效时间
'size' => 20, //返回hits条数
'body' => [
// 忽略条件
'query' => [
'match_all' => new \stdClass()
]
]
];
$result = $client->search($params);
return $result;
}
返回结果
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
{
"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAqOFmNCN0JzRzdIVHFXQldkVmFRWnBRVUEAAAAAAAAKjxZjQjdCc0c3SFRxV0JXZFZhUVpwUVVBAAAAAAAACpAWY0I3QnNHN0hUcVdCV2RWYVFacFFVQQAAAAAAAAqSFmNCN0JzRzdIVHFXQldkVmFRWnBRVUEAAAAAAAAKkRZjQjdCc0c3SFRxV0JXZFZhUVpwUVVB",
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1000,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "14",
"_score": 1,
"_source": {
"title": "顾伟"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "19",
"_score": 1,
"_source": {
"title": "鄢伟"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "22",
"_score": 1,
"_source": {
"title": "原秀云"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "24",
"_score": 1,
"_source": {
"title": "董子安"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "25",
"_score": 1,
"_source": {
"title": "原斌"
}
}
]
}
}

注意第一个参数(_scroll_id),怎么去理解呢,可以把当前查询的5条数据,以base64的方式加密生成一个字符串,这个时候从search()这个方法中获取的是第一组数据,随后包含在scroll()方法中构建第二组,也就是下一页文档数据

1
2
3
4
5
6
7
8
9
10
11
12
13
if (isset($result['hits']['hits']) && count($result['hits']['hits']) > 0) {
// 这里根据返回的$result['hits']['hits] 编写逻辑
......
// 获取到_scroll_id
$scroll_id = $result['_scroll_id'];
//执行下一个滚动请求
$response = $this->setClient()->scroll([
'scroll_id' => $scroll_id,
'scroll' => '30s',
]);
//再次返回与$result一样的结构,同样包含 _scroll_id,最终实现数据滚动加载刷新
return $response;
}

可能理解起来有点绕,这里就像一场接力赛,这场比赛,规定跑多少圈(result[hits][total]/count(result['hits']['total'] / count(result[‘hits’][‘hits’])),首先第一个运动员跑到某个地点,获得一个接力棒(scroll_id),随后他跑一圈,下一个运动员接过接力棒,再往下跑…,直到圈数跑完为止,这是我的理解方式,当然这个特性可以满足的场景其实很多,第一方面可以实现常规的分页,计算好页码数,上一页可以作为旧(scroll_id),下一页通过产生新的(scroll_id),再者就是鼠标滚动加载数据,使用这种方式,是最为高效的

视图
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
<table>
<tr>
<th>ID</th>
<th>姓名</th>
</tr>
@if(isset($result['hits']['hits']) && count($result['hits']['hits']) > 0)
@foreach($result['hits']['hits'] as $hits)
<tr>
<td>{{ $hits['_id'] }}</td>
<td>{{ $hits['_source']['title'] }}</td>
</tr>
@endforeach
<tr>
<th colspan="2">
共 {{ $result['hits']['total'] / count($result['hits']['hits']) }} 页
<a href="?sorll={{ $result['_scroll_id'] }}">下一页</a>
</th>
</tr>
@else
无数据
@endif
</table>
展示

QQ20181024-124153.png