Table of Contents
1. Shallow paging with from+size
2. Deep paging with scroll
3. Deep paging with search_after
1. Shallow paging with from+size
The idea behind shallow paging is simple: to return results 10–20, ES queries the top 20 matches and cuts off the first 10, so the work spent retrieving those first 10 documents is simply wasted.
from+size is the default paging mechanism in ES. For deep pages it is very inefficient, but it does allow jumping to an arbitrary page.
To protect performance, ES limits paging depth: the current maximum is max_result_window = 10000, meaning from + size must not exceed 10000.
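If a deeper window is genuinely required (usually a sign the access pattern should change), the limit can be raised per index through the standard index-settings API; the index name here matches the demo_index used throughout:

```
PUT demo_index/_settings
{
  "index.max_result_window": 20000
}
```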
DSL query:
GET demo_index/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 10,
  "sort": [
    { "id": { "order": "asc" } },
    { "publish_time": { "order": "asc" } }
  ]
}
Note: ES is shard-based. Suppose there are 3 shards, from=100 and size=10. Each shard must return its top from + size = 110 documents by the sort order; the coordinating node then merges the 330 results, sorts them, and returns documents 101–110. The per-shard cost therefore grows linearly with from.
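To make that fan-out concrete, here is a small standalone sketch (illustrative only; the class and method names are mine and not part of any ES API or the client code below) of how many documents the cluster touches for a single page:

```java
// Illustrative cost model for from+size on a sharded index.
public class PagingCost {

    // Each shard must hand the coordinating node its top (from + size)
    // documents so that the global sort order can be re-established.
    static long perShardDocs(long from, long size) {
        return from + size;
    }

    // Total documents gathered and merged to serve one page.
    static long totalDocsFetched(long shards, long from, long size) {
        return shards * perShardDocs(from, size);
    }

    public static void main(String[] args) {
        // 3 shards, from=100, size=10: 110 docs per shard, 330 merged
        System.out.println(totalDocsFetched(3, 100, 10));   // prints 330
        // near max_result_window: from=9990, size=10
        System.out.println(totalDocsFetched(3, 9990, 10));  // prints 30000
    }
}
```

Only 10 of those merged documents are returned; everything else is discarded, which is exactly the waste that scroll and search_after avoid.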
RestHighLevelClient query:
/**
 * @Description from+size shallow paging query
 * @create by meng
 */
private List<SearchHit> docSearch(Date time, String title) {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder builder = QueryBuilders.boolQuery();
    builder.must(QueryBuilders.matchQuery("title", title))
            .filter(QueryBuilders.rangeQuery("publish_time").gt(time.getTime()));
    try {
        searchSourceBuilder.query(builder)
                .sort("id", SortOrder.ASC)
                .sort("publish_time", SortOrder.ASC)
                .from(0)
                .size(10);
        SearchRequest searchRequest = new SearchRequest("demo_index").source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = searchResponse.getHits().getHits();
        if (hits.length > 0) {
            return Arrays.asList(hits);
        } else {
            return null;
        }
    } catch (IOException e) {
        log.error("doc paging query failed: {} ", e);
    }
    return null;
}
2. Deep paging with scroll
from+size queries are still workable within roughly 10,000–50,000 documents (pages 1,000–5,000), but with more data the deep-paging problem appears. To address it, ES provides the scroll query.
A scroll search saves a snapshot of the index at the time of the first search and serves all subsequent requests from that snapshot; if the data changes in the meantime, the user will not see the changes. It is recommended for non-real-time processing of large data sets, and is not suitable for scenarios that require page jumping.
DSL query:
GET demo_index/_search?scroll=3m
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 10,
  "sort": [
    { "id": { "order": "asc" } },
    { "publish_time": { "order": "asc" } }
  ]
}
Note: scroll=3m keeps the scroll context (the scroll_id) alive for 3 minutes.
With scroll, from must be set to 0; size determines how many hits each subsequent _search call returns.
Use the _scroll_id returned in the response to read the next page; each request reads the next 10 hits, until the data is exhausted or the scroll_id's keep-alive expires:
GET _search/scroll
{
  "scroll_id": "mengliulUaGVuRmV0Y2g7NTsxMDk5NDpkUmpiR2FjOFNhNnlCM1ZDMWpWYnRRO==",
  "scroll": "3m"
}
Note: the cursor's keep-alive must be set to 3 minutes again on every follow-up request; both GET and POST work. Scroll contexts consume significant resources, so explicitly delete the scroll_id as soon as you no longer need it.
Clear a specific scroll_id:
DELETE _search/scroll/mengliulUaGVuRmV0Y2g7NTsxMDk5NDpkUmpiR2FjOFNhNnlCM1ZDMWpWYnRRO==
Clear all scrolls:
DELETE _search/scroll/_all
RestHighLevelClient query:
/**
 * @Description scroll deep paging
 * @create by meng
 */
private void docSearch(Date time, String title) {
    List<SearchHit> searchHits = new ArrayList<>();
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder builder = QueryBuilders.boolQuery();
    builder.must(QueryBuilders.matchQuery("title", title))
            .filter(QueryBuilders.rangeQuery("publish_time").gt(time.getTime()));
    try {
        searchSourceBuilder.query(builder)
                .sort("id", SortOrder.ASC)
                .sort("publish_time", SortOrder.ASC)
                .from(0)
                .size(10);
        SearchRequest searchRequest = new SearchRequest("demo_index").source(searchSourceBuilder);
        // keep the scroll context alive for 3 minutes
        Scroll scroll = new Scroll(TimeValue.timeValueMinutes(3));
        // take the snapshot
        searchRequest.scroll(scroll);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // compute the total number of pages at 10 hits per page
        long totalCount = searchResponse.getHits().getTotalHits().value;
        int totalPages = (int) Math.ceil((double) totalCount / 10);
        // collect the first page returned by the initial search
        Collections.addAll(searchHits, searchResponse.getHits().getHits());
        // walk the remaining pages via the scroll_id
        String scrollId = searchResponse.getScrollId();
        for (int i = 2; i <= totalPages; i++) {
            SearchScrollRequest searchScrollRequest = new SearchScrollRequest(scrollId);
            // the keep-alive must be set again on every request
            searchScrollRequest.scroll(scroll);
            SearchResponse response = restHighLevelClient.scroll(searchScrollRequest, RequestOptions.DEFAULT);
            SearchHits hits = response.getHits();
            scrollId = response.getScrollId();
            for (SearchHit hit : hits) {
                searchHits.add(hit);
            }
        }
        // scroll contexts are expensive: clear the scroll_id once done
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollId);
        restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
    } catch (IOException e) {
        log.error("doc paging query failed: {} ", e);
    }
}
3. Deep paging with search_after
search_after supports deep paging over real-time data. To locate the last document of each page, every document must carry a globally unique sort value.
It is not suitable for scenarios that require page jumping.
DSL query:
GET demo_index/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 10,
  "sort": [
    { "id": { "order": "asc" } },
    { "publish_time": { "order": "asc" } }
  ]
}
With search_after, from must be set to 0.
In the DSL above, id is a unique, non-repeating field, while publish_time may contain duplicates.
Note: every hit in the result carries a sort field; take the sort values of the last hit in the result set and supply them as search_after in the next query:
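For illustration, a hit's sort array lists its values in sort-field order, id first and publish_time second; the values below are invented for this example, not real data:

```
"sort": [
  "mengliu20211202",
  1638374400000
]
```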
GET demo_index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 10,
  "search_after": [
    "mengliu20211202",
    1638374400000
  ],
  "sort": [
    { "id": { "order": "asc" } },
    { "publish_time": { "order": "asc" } }
  ]
}
RestHighLevelClient query:
/**
 * @Description search_after deep paging
 * @create by meng
 */
private void docSearch(Date time, String title) {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder builder = QueryBuilders.boolQuery();
    builder.must(QueryBuilders.matchQuery("title", title))
            .filter(QueryBuilders.rangeQuery("publish_time").gt(time.getTime()));
    try {
        searchSourceBuilder.query(builder)
                .sort("id", SortOrder.ASC)
                .sort("publish_time", SortOrder.ASC)
                .from(0)
                .size(10);
        SearchRequest searchRequest = new SearchRequest("demo_index").source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHit[] hits = searchResponse.getHits().getHits();
        // take the last hit of this page
        SearchHit result = hits[hits.length - 1];
        // build the query for the next page
        SearchSourceBuilder searchSourceBuilder2 = new SearchSourceBuilder();
        searchSourceBuilder2.query(builder)
                .sort("id", SortOrder.ASC)
                .sort("publish_time", SortOrder.ASC)
                .size(10);
        // pass the previous page's sort values as search_after
        searchSourceBuilder2.searchAfter(result.getSortValues());
        SearchRequest searchRequest2 = new SearchRequest("demo_index").source(searchSourceBuilder2);
        SearchResponse searchResponse2 = restHighLevelClient.search(searchRequest2, RequestOptions.DEFAULT);
        SearchHit[] hits2 = searchResponse2.getHits().getHits();
    } catch (IOException e) {
        log.error("doc paging query failed: {} ", e);
}