Special characters in the Elasticsearch completion type

Stack Overflow user

Asked on 2015-02-28 19:09:11

I'm new to Elasticsearch, and I have this mapping:

curl -X PUT localhost:9200/vee_trade -d '
{
  "mappings": {
    "sDocument": {
      "properties": {
        "id": { "type": "long" },
        "docId": { "type": "string" },
        "documentType": { "type": "string" },
        "rating": { "type": "float" },
        "suggestion": { "type": "completion" }
      }
    }
  }
}'

One of the sample documents is:

 _index: "test"
 _type: "sDocument"
 _id: "CATEGORY7"
 _score: 1
 _source{}
 docId: "CATEGORY7"
 documentType: "CATEGORY"
 id: 7
 suggestion[]
 "Kids's wear"
 rating: null

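Basically, my goal is to enable autosuggest. It works for queries, but each autosuggest entry only gives me the term and the score, and I also want the other field values. So I run a match query against the suggestion field again and build the autosuggest items from that: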

{
  "query" : {
   "match" : {
    "suggestion" : "Men's"  
    }
   }
}

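But I get no data back, because Elasticsearch strips the special characters from the term (I'm not sure how it is stored and indexed). So please tell me:

How can I retrieve the other field values along with the suggested terms in autosuggest? Or how can I make the match query work?

Thanks in advance.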

autocomplete

elasticsearch

1 Answer

Stack Overflow user

Answered on 2015-03-01 17:56:20

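Warning: long answer ahead. It's hard to tell exactly what the problem is from what you posted, so I'll give you a few things that might help you work it out.

There are a few different ways to do what you're after. I wrote up two autocomplete approaches on the Qbox blog: one using the completion suggester, and another using a more elaborate setup involving ngrams and multiple fields.

In practice I've found the completion suggester a bit clunky (because you have to tell it explicitly what to respond with), so I tend to rely more on custom analysis setups. One way to experiment with analyzers is to set up several sub-fields on a property (previously known as multi-fields). I'll give a few examples below.

I'll set up a field with two sub-fields that analyze the text in different ways, then run a match query against each one to show how it behaves.

Take a look at this: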

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "analysis": {
         "filter": {
            "nGram_filter": {
               "type": "nGram",
               "min_gram": 2,
               "max_gram": 20,
               "token_chars": [
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
         "analyzer": {
            "nGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "nGram_filter"
               ]
            },
            "whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "text_field": {
               "type": "string",
               "index_analyzer": "standard",
               "search_analyzer": "standard",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  },
                  "ngram": {
                     "type": "string",
                     "index_analyzer": "nGram_analyzer",
                     "search_analyzer": "whitespace_analyzer"
                  }
               }
            }
         }
      }
   }
}

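There's a lot going on here, and I'd encourage you to read up on analysis and ngrams. I also took some of this code from my partial-word autocomplete post, so you may find it helpful to read that for a more in-depth explanation.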

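Basically, though, I have one field, "text_field", that is analyzed with the standard analyzer both for indexing (i.e., the terms generated for a given document and field when the inverted index is built) and for searching (how the search phrase is broken up into terms to be matched against the inverted index). Under that field I have two different sub-fields. One is not analyzed at all, so the only term we can match against is the original text of the document field. The second sub-field uses "nGram_analyzer" for index analysis and "whitespace_analyzer" for search; both of those analyzers are defined in the index's "settings".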
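To make the difference between the two custom analyzers more concrete, here is a rough pure-Python approximation of the terms each one would produce. This is only a sketch and not Elasticsearch code: the function names are made up for illustration, the asciifolding step is left out, and the order in which terms are emitted is not meant to match what Lucene does.

```python
# Rough approximation of the two custom analyzers defined in the settings.
# Assumption: asciifolding is skipped, and term order is not significant.

def whitespace_analyzer(text):
    """Whitespace tokenizer + lowercase filter: split on spaces only."""
    return text.lower().split()

def ngram_analyzer(text, min_gram=2, max_gram=20):
    """Same as above, then emit every substring of each token whose
    length lies between min_gram and max_gram (inclusive)."""
    terms = []
    for token in whitespace_analyzer(text):
        longest = min(max_gram, len(token))
        for size in range(min_gram, longest + 1):
            for start in range(len(token) - size + 1):
                terms.append(token[start:start + size])
    return terms

# Index time: the document text is expanded into many partial-word terms.
print(sorted(set(ngram_analyzer("Men's wear"))))

# Search time: the query is only split and lowercased, not n-grammed.
print(whitespace_analyzer("men's we"))
```

The point of using a different analyzer at search time is that a partially typed word such as "we" is looked up as-is among the n-grams that were generated at index time.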

Now, if we index a couple of documents:

PUT /test_index/doc/1
{
    "text_field": "Kid's wear"
}

PUT /test_index/doc/2
{
    "text_field": "Men's wear"
}

we can query them in different ways.

Querying "text_field.raw" requires the exact, complete text to get a match:

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field.raw": "Men's wear"
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 1,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}
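As a toy illustration of why the "raw" sub-field is so strict: a not_analyzed field keeps the entire value as a single term, so only a query with exactly the same text matches. The snippet below is a plain Python model of that behaviour, not Elasticsearch code, and `match_raw` is a hypothetical helper name.

```python
# Toy model of a not_analyzed field: the whole value is one term.
docs = {"1": "Kid's wear", "2": "Men's wear"}

def match_raw(query):
    """Return ids of documents whose stored value equals the query exactly."""
    return [doc_id for doc_id, value in docs.items() if value == query]

print(match_raw("Men's wear"))   # ['2']
print(match_raw("Men's"))        # []  (partial text does not match)
print(match_raw("men's wear"))   # []  (case differs, so no match)
```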

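A standard "match" query against "text_field" works as expected, because "Men's" is analyzed the same way at index time and at search time, so both sides produce the same term: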

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field": "Men's"
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.625,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.625,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}

But if we add the second term, we get results that are probably not what we want:

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field": "Men's wear"
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.72711754,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.72711754,
            "_source": {
               "text_field": "Men's wear"
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.09494676,
            "_source": {
               "text_field": "Kid's wear"
            }
         }
      ]
   }
}

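This happens because of the way the terms are generated, and because the default operator for a match query is "or". We can narrow the results by telling the match query to use the "and" operator: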

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field": {
             "query":  "Men's wear",
             "operator": "and"
         }
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.72711754,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.72711754,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}
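The effect of the operator can be sketched in a few lines of Python. This is a simplified model and not how Elasticsearch is implemented: `analyze` is a hypothetical stand-in that just lowercases and splits on whitespace, and scoring is ignored entirely.

```python
# Toy model of the match query's "operator" option (no scoring).
docs = {"1": "Kid's wear", "2": "Men's wear"}

def analyze(text):
    """Hypothetical stand-in for the analyzer: lowercase, split on spaces."""
    return set(text.lower().split())

def match(query, operator="or"):
    """Return the ids of matching documents, sorted."""
    query_terms = analyze(query)
    hits = []
    for doc_id, text in docs.items():
        doc_terms = analyze(text)
        if operator == "and":
            matched = query_terms <= doc_terms       # every term required
        else:
            matched = bool(query_terms & doc_terms)  # any term is enough
        if matched:
            hits.append(doc_id)
    return sorted(hits)

print(match("Men's wear"))          # ['1', '2']
print(match("Men's wear", "and"))   # ['2']
```

With "or", the shared term "wear" alone is enough to pull in the "Kid's wear" document; with "and", both query terms have to be present.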

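We can use the "text_field.ngram" field to match partial words. Symbols and punctuation such as the apostrophe are kept because the whitespace tokenizer splits only on whitespace. (The "token_chars" list in the "nGram_filter" definition is documented as an ngram tokenizer setting, so it probably has no effect on the token filter used here.)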

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field.ngram": {
             "query":  "men's we",
             "operator": "and"
         }
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.72711754,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.72711754,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}
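Here is a small Python sketch of why the partial query "men's we" finds only the second document. It is a toy model under a few assumptions, not Elasticsearch code: the helper names are invented, asciifolding is skipped, and scoring is ignored. The index side expands each token into n-grams, while the search side only splits and lowercases the query and requires every term to be present ("and").

```python
# Toy model of querying the "text_field.ngram" sub-field with operator "and".
docs = {"1": "Kid's wear", "2": "Men's wear"}

def ngrams(token, min_gram=2, max_gram=20):
    """All substrings of token with length between min_gram and max_gram."""
    return {
        token[start:start + size]
        for size in range(min_gram, min(max_gram, len(token)) + 1)
        for start in range(len(token) - size + 1)
    }

def index_terms(text):
    """Index side: whitespace split, lowercase, then n-grams of each token."""
    terms = set()
    for token in text.lower().split():
        terms |= ngrams(token)
    return terms

def search_ngram_field(query):
    """Search side: whitespace split and lowercase only; all terms required."""
    query_terms = set(query.lower().split())
    return sorted(doc_id for doc_id, text in docs.items()
                  if query_terms <= index_terms(text))

print(search_ngram_field("men's we"))   # ['2']
print(search_ngram_field("wea"))        # ['1', '2']
print(search_ngram_field("m"))          # []  (shorter than min_gram)
```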

Hopefully this gives you some ideas about how to proceed.


Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/28785689
