I want to find all email addresses stored in Elasticsearch. I am currently using the following query, and it returns results.
{
  "query": {
    "regexp": {
      "sys_content": {
        "value": "[-a-zA-Z0-9_]+(\\.[-a-zA-Z0-9_]+)*@[-a-zA-Z0-9_]+(\\.[-a-zA-Z0-9_]+)+",
        "flags_value": 65535,
        "max_determinized_states": 10000,
        "boost": 1.0
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "<span style='color:red'>"
    ],
    "post_tags": [
      "</span>"
    ],
    "fragment_size": 100,
    "require_field_match": true,
    "fields": {
      "sys_content": {}
    }
  }
}
Then I tried searching for "\@" and got no results.
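One possible explanation for the empty result, which the thread does not confirm: Elasticsearch `regexp` queries are evaluated against the individual terms in the index, and the pattern has to match a whole term rather than a substring. A pattern that is just a literal `@` can therefore only match a term that is exactly `@`. The Python sketch below imitates that anchored behaviour with `re.fullmatch`; the term list is a hypothetical stand-in for what an analyzer might emit and is not taken from a real index.

```python
import re

# Hypothetical terms for the sample text. The real terms depend entirely on
# the analyzer configured for sys_content, so treat this list as an assumption.
terms = ["test", "email@gmail.com", "not@", "a@a", "another@email.fr"]


def matching_terms(pattern, candidate_terms):
    """Return the terms that the pattern matches in full (anchored match)."""
    compiled = re.compile(pattern)
    return [term for term in candidate_terms if compiled.fullmatch(term)]


# A bare "@" must equal the whole term, so nothing matches.
only_at = matching_terms(r"@", terms)

# Allowing arbitrary text on both sides matches every term containing "@".
contains_at = matching_terms(r".*@.*", terms)

print(only_at)
print(contains_at)
```

If this is the cause, a pattern such as `.*@.*` would be the anchored way to say "contains @", provided the field's analyzer keeps the `@` inside its terms.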
Posted on 2021-06-22 11:12:54
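Here is a solution that uses the uax_url_email tokenizer. It does most of the work at index time, which makes the search faster.
Create the index with a custom analyzer that produces the tokens, plus a filter that keeps only the tokens of type <EMAIL>: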
PUT test-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": ["extract_email"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "uax_url_email",
          "max_token_length": 50
        }
      },
      "filter": {
        "extract_email": {
          "type": "keep_types",
          "types": [ "<EMAIL>" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "sys_content": {
        "type": "text",
        "fields": {
          "email": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
  }
}
Then add a document:
POST test-index/_doc
{
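  "sys_content": "test email@gmail.com not@ a@a email another@email.fr"
}
Finally, search and highlight the email addresses. Because the uax_url_email tokenizer already identified them at index time, the search only has to match any token in the sys_content.email field: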
GET test-index/_search
{
  "query": {
    "regexp": {
      "sys_content.email": {
        "value": ".*",
        "flags": "ALL",
        "case_insensitive": true,
        "max_determinized_states": 10000,
        "rewrite": "constant_score"
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "<span style='color:red'>"
    ],
    "post_tags": [
      "</span>"
    ],
    "fragment_size": 100,
    "require_field_match": true,
    "fields": {
      "sys_content.email": {}
    }
  }
}
This produces the following result:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test-index",
        "_type" : "_doc",
        "_id" : "GxSbM3oBJxdf7EzzH4jM",
        "_score" : 1.0,
        "_source" : {
          "sys_content" : "test email@gmail.com not@ a@a email another@email.fr"
        },
        "highlight" : {
          "sys_content.email" : [
            "test <span style='color:red'>email@gmail.com</span> not@ a@a email <span style='color:red'>another@email.fr</span>"
          ]
        }
      }
    ]
  }
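}
Note: there must be a better way to match any token in a field without resorting to a regexp search, but I could not find one. In any case this works, and the regular expression is trivial, so it should be fast.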
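As a quick offline cross-check, the regular expression from the question can be run over the sample document in plain Python. This is only a sketch: Python's `re` module is not Lucene's regexp engine, it searches for substrings rather than whole index terms, and the helper names `find_emails` and `highlight_emails` are invented for illustration. On this sample it picks out the same two addresses that are highlighted in the response above.

```python
import re

# The pattern from the question, with non-capturing groups for Python's re.
EMAIL_RE = re.compile(
    r"[-a-zA-Z0-9_]+(?:\.[-a-zA-Z0-9_]+)*@[-a-zA-Z0-9_]+(?:\.[-a-zA-Z0-9_]+)+"
)

# Same tags as the highlight section of the search request.
PRE_TAG = "<span style='color:red'>"
POST_TAG = "</span>"


def find_emails(text):
    """Return every substring of text that the email pattern matches."""
    return [match.group(0) for match in EMAIL_RE.finditer(text)]


def highlight_emails(text):
    """Wrap every matched email address in the highlight tags."""
    return EMAIL_RE.sub(lambda m: PRE_TAG + m.group(0) + POST_TAG, text)


sample = "test email@gmail.com not@ a@a email another@email.fr"
emails = find_emails(sample)
highlighted = highlight_emails(sample)

print(emails)
print(highlighted)
```

The fragments `not@` and `a@a` are skipped because the pattern requires at least one dot-separated label after the `@`.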
https://stackoverflow.com/questions/68081284