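I. Overview

1. Elasticsearch's default analyzer (the standard analyzer) handles Chinese poorly because it splits Chinese text into individual characters. The IK analyzer handles Chinese much better and offers two modes: ik_max_word, which produces the finest-grained segmentation (every word it can find, including overlapping ones), and ik_smart, which produces the coarsest-grained segmentation.
2. Environment
OS: CentOS
Elasticsearch version: 6.0.0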
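The per-character behaviour described in item 1 can be imitated with a short Java sketch. This is a hypothetical illustration (HanSplitSketch and split are made-up names), not Lucene's StandardTokenizer, which implements the Unicode UAX #29 word-break rules and also lowercases. The sketch simply makes every Han character its own token, keeps runs of other letters and digits together, and treats everything else as a separator.

```java
import java.util.ArrayList;
import java.util.List;

class HanSplitSketch {
    // Approximates per-character CJK tokenization: each Han ideograph is its own token,
    // runs of other letters/digits form one token, everything else is a separator.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp);
            } else {
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending non-Han word, if any, and resets the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("提前祝大家春节快乐！"));
    }
}
```

Running main prints nine single-character tokens for 提前祝大家春节快乐！. On a live cluster, the real behaviour can be checked by sending the same text to _analyze with "analyzer": "standard".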

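II. Installing the plugin

1. Plugin repository: https://github.com/medcl/elasticsearch-analysis-ik
2. From the Elasticsearch root directory, run the following command (the plugin version must match the Elasticsearch version exactly, here 6.0.0):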

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.0.0/elasticsearch-analysis-ik-6.0.0.zip

When the command finishes, a new analysis-ik directory appears under both the plugins and the config directories of the Elasticsearch root (esroot).
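Because the IK release must match the Elasticsearch version exactly, the download URL changes with every upgrade. The following sketch builds it from a version string. The URL pattern is an assumption inferred from the 6.0.0 command above, and IkPluginUrl and forVersion are hypothetical names; newer IK releases may be published at a different location, so check the repository's releases page before relying on it.

```java
class IkPluginUrl {
    // Assumed pattern, inferred from the 6.0.0 install command; not an official API.
    private static final String BASE =
            "https://github.com/medcl/elasticsearch-analysis-ik/releases/download/";

    // Builds the release URL for the IK version that matches the given Elasticsearch version.
    static String forVersion(String esVersion) {
        if (!esVersion.matches("\\d+\\.\\d+\\.\\d+")) {
            throw new IllegalArgumentException("expected a version like 6.0.0, got: " + esVersion);
        }
        return BASE + "v" + esVersion + "/elasticsearch-analysis-ik-" + esVersion + ".zip";
    }

    public static void main(String[] args) {
        String version = args.length > 0 ? args[0] : "6.0.0";
        // Prints the full install command for the requested version.
        System.out.println("./bin/elasticsearch-plugin install " + forVersion(version));
    }
}
```

With no arguments, main prints the same install command used above for 6.0.0.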

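III. Restarting Elasticsearch

Plugins are loaded only at startup, so Elasticsearch must be restarted after installing IK.

1. Find the Elasticsearch process: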

ps -ef | grep elastic

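2. Stop the process. In the author's output from the command above, the Elasticsearch PID was 12776; use the PID shown in your own output. Run: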

kill 12776

3. Start Elasticsearch in the background (as a non-root user; Elasticsearch refuses to start as root):

./bin/elasticsearch -d

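Note: restarting Elasticsearch causes its shards to be reallocated and recovered, so take care when doing this in production.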
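The ps/grep/kill steps can also be done from Java 9+ with the standard ProcessHandle API. This sketch rests on one assumption: that the Elasticsearch JVM's command line contains the main class org.elasticsearch.bootstrap.Elasticsearch, which is assumed here to match how the 6.x startup script launches it. Unlike grep elastic, it does not match the grep process itself. FindEsProcess, isElasticsearch and findPids are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

class FindEsProcess {
    // Assumption: the Elasticsearch JVM is started with this main class on its command line.
    static final String ES_MAIN_CLASS = "org.elasticsearch.bootstrap.Elasticsearch";

    // Pure check on a command line, so it can be tested without a running node.
    static boolean isElasticsearch(String commandLine) {
        return commandLine != null && commandLine.contains(ES_MAIN_CLASS);
    }

    // Returns the PIDs of local processes whose command line looks like Elasticsearch.
    // Processes whose command line is not available are skipped.
    static List<Long> findPids() {
        return ProcessHandle.allProcesses()
                .filter(p -> p.info().commandLine().map(FindEsProcess::isElasticsearch).orElse(false))
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (long pid : findPids()) {
            System.out.println("Elasticsearch PID: " + pid);
            // destroy() requests normal termination (typically SIGTERM on Linux, like `kill <pid>`).
            // Left commented out so that running the sketch does not stop a node by accident.
            // ProcessHandle.of(pid).ifPresent(ProcessHandle::destroy);
        }
    }
}
```

The destroy call is commented out on purpose; uncomment it only when you really want to stop the node.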

IV. Testing

1. Analyzing with ik_max_word (console request syntax, for example in Kibana Dev Tools)

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}

Result (ik_max_word emits every word it can find, so many tokens overlap):

{
  "tokens": [
    {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
    {"token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1},
    {"token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2},
    {"token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3},
    {"token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4},
    {"token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5},
    {"token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6},
    {"token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7},
    {"token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8},
    {"token": "国歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9}
  ]
}
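Judging from the offsets, the analyzed text is 中华人民共和国国歌 (9 characters), and start_offset/end_offset form a half-open range into it. The sketch below (OffsetCheck is a hypothetical name) copies the ten ik_max_word tokens and checks that text.substring(start_offset, end_offset) reproduces each one. It assumes the offsets are Java char positions, which is how Lucene reports them; for these characters, all in the Basic Multilingual Plane, chars and characters coincide anyway.

```java
import java.util.Arrays;
import java.util.List;

class OffsetCheck {
    // One token from the _analyze response: the term plus its [start_offset, end_offset) span.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    static final String TEXT = "中华人民共和国国歌";

    // Tokens copied from the ik_max_word result.
    static final List<Token> IK_MAX_WORD = Arrays.asList(
            new Token("中华人民共和国", 0, 7), new Token("中华人民", 0, 4),
            new Token("中华", 0, 2), new Token("华人", 1, 3),
            new Token("人民共和国", 2, 7), new Token("人民", 2, 4),
            new Token("共和国", 4, 7), new Token("共和", 4, 6),
            new Token("国", 6, 7), new Token("国歌", 7, 9));

    // True if the token's offsets select exactly its term from the original text.
    static boolean matches(String text, Token t) {
        return text.substring(t.start, t.end).equals(t.term);
    }

    public static void main(String[] args) {
        for (Token t : IK_MAX_WORD) {
            System.out.println(t.term + " [" + t.start + ", " + t.end + ") "
                    + (matches(TEXT, t) ? "ok" : "MISMATCH"));
        }
    }
}
```

Every line of the output ends in "ok", which also makes the overlaps visible: 中华人民共和国, 中华人民 and 中华 all start at offset 0.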

2. Analyzing with ik_smart

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国国歌"
}

Result (ik_smart produces the coarsest-grained segmentation):

{
  "tokens": [
    {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
    {"token": "国歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 1}
  ]
}
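The two results differ in one easily checked way: the ik_smart tokens cover the text without overlapping, while many ik_max_word tokens overlap. A minimal sketch of that check on the [start_offset, end_offset) spans copied from the two results (SpanOverlap and nonOverlapping are hypothetical names):

```java
import java.util.Arrays;
import java.util.Comparator;

class SpanOverlap {
    // Returns true if no two [start, end) token spans overlap.
    static boolean nonOverlapping(int[][] spans) {
        int[][] sorted = spans.clone(); // shallow copy so the caller's order is untouched
        Arrays.sort(sorted, Comparator.comparingInt(s -> s[0]));
        for (int i = 1; i < sorted.length; i++) {
            // With spans sorted by start, comparing each span with its predecessor is enough.
            if (sorted[i][0] < sorted[i - 1][1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] ikSmart = {{0, 7}, {7, 9}};
        int[][] ikMaxWord = {{0, 7}, {0, 4}, {0, 2}, {1, 3}, {2, 7}, {2, 4}, {4, 7}, {4, 6}, {6, 7}, {7, 9}};
        System.out.println("ik_smart non-overlapping: " + nonOverlapping(ikSmart));
        System.out.println("ik_max_word non-overlapping: " + nonOverlapping(ikMaxWord));
    }
}
```

main prints true for the ik_smart spans and false for the ik_max_word spans.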

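V. Testing analysis through the Java API

The tests below call the analyzers through the Elasticsearch 6.0 Java TransportClient.

1. Calling ik_max_word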

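import java.util.List;

import org.elasticsearch.action.admin.indices.analyze.AnalyzeRequest;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.junit.Test;

@Test
public void analyzer_ik_max_word() throws Exception {
    String text = "提前祝大家春节快乐！";
    // EsClient.get() is the author's own helper that returns a connected TransportClient
    TransportClient client = EsClient.get();
    // Analyze the text with ik_max_word; no index is needed for a named analyzer
    AnalyzeRequest request = new AnalyzeRequest().analyzer("ik_max_word").text(text);
    List<AnalyzeResponse.AnalyzeToken> tokens =
            client.admin().indices().analyze(request).actionGet().getTokens();
    // Print the token count, then one term per line
    System.out.println(tokens.size());
    for (AnalyzeResponse.AnalyzeToken token : tokens) {
        System.out.println(token.getTerm());
    }
}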

Output (the full-width ！ produces no token):

6
提前
祝
大家
春节快乐
春节
快乐

2. Calling ik_smart

@Test
public void analyzer_ik_smart() throws Exception {
    String text = "提前祝大家春节快乐！";
    TransportClient client = EsClient.get();
    // Same request as in the previous test, but with the coarse-grained ik_smart analyzer
    AnalyzeRequest request = new AnalyzeRequest().analyzer("ik_smart").text(text);
    List<AnalyzeResponse.AnalyzeToken> tokens =
            client.admin().indices().analyze(request).actionGet().getTokens();
    System.out.println(tokens.size());
    for (AnalyzeResponse.AnalyzeToken token : tokens) {
        System.out.println(token.getTerm());
    }
}

Output:

4
提前
祝
大家
春节快乐
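The TransportClient used above was deprecated in Elasticsearch 7.0 and removed in 8.0, so the same analysis can also be sent over HTTP. The following sketch assumes Java 11+ (java.net.http) and a node listening on localhost:9200 with the IK plugin installed; AnalyzeOverHttp, body and escape are hypothetical names, and the JSON escaping is deliberately minimal. Elasticsearch 6.0 rejects request bodies without a Content-Type header, hence the explicit application/json.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class AnalyzeOverHttp {
    // Builds the same JSON body as the console requests in section IV.
    static String body(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters as unicode escapes.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a local node on port 9200 with the IK plugin installed.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:9200/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body("ik_smart", "提前祝大家春节快乐！")))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Prints the raw JSON token list returned by the node.
        System.out.println(response.body());
    }
}
```

For production code, a real JSON library and the official Elasticsearch REST client would be the more robust choice; the hand-written body just keeps the sketch dependency-free.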

秒客网
