Highlight words in a Regex match

Stack Overflow user
Asked 2018-10-29 18:31:01
Answers: 2 · Views: 1.9K · Followers: 0 · Votes: 5

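I'm trying to search a piece of text with Regex. I want the regex to return a given amount of text (X characters) before and after each match, and to add a highlight around every occurrence of the search term.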

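For example, consider the paragraph below. Each result should include at least 10 characters before and after the match, without cutting off any words. The search term is "dog".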

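The Dog is a pet animal. It is one of the most obedient animals. There are many kinds of dogs in the world. Some of them are very friendly while some of them are dangerous. Dogs are of different colors like black, red, white and brown. Some of them have slippery shiny skin and some have rough skin. Dogs are carnivorous animals. They like eating meat. They have four legs, two ears and a tail. Dogs are trained to perform different tasks. They protect us from thieves by guarding our house. They are loving animals. A dog is called man's best friend. They are used by the police to find hidden things. They are one of the most useful animals in the world. Doggonit!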

The result I want is an array that looks like this:

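  • The <strong>Dog</strong> is a pet animal
  • many kinds of <strong>dog</strong>s in the world
  • dangerous. <strong>Dog</strong>s are of different
  • rough skin. <strong>Dog</strong>s are carnivorous
  • and a tail. <strong>Dog</strong>s are trained
  • animals. A <strong>dog</strong> is called
  • the world. <strong>Dog</strong>gonit!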

What I have:

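I searched and found the following regex, which returns exactly the results I need, but without the extra formatting. I wrote a couple of helper methods around it: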

```csharp
// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
private List<List<string>> Search(string text, string searchTerm, int sizeOfResult, bool searchEntireWord) {
    var result = new List<List<string>>();
    var searchTerms = searchTerm.Split(' ');
    foreach (var word in searchTerms) {
        var searchResults = ExtractParagraph(text, word, sizeOfResult, searchEntireWord);
        result.Add(searchResults);
    }
    // Printing is left to the caller, so each result is written only once.
    return result;
}

private List<string> ExtractParagraph(string text, string searchTerm, int sizeOfResult, bool searchEntireWord) {
    var result = new List<string>();
    // Escape the term so characters such as '.' or '+' are matched literally.
    searchTerm = Regex.Escape(searchTerm);
    searchTerm = searchEntireWord ? @"\b" + searchTerm + @"\b" : searchTerm;
    //var expression = @"((^.{0,30}|\w*.{30})\b" + searchTerm + @"\b(.{30}\w*|.{0,30}$))";
    var expression = @"((^.{0," + sizeOfResult + @"}|\w*.{" + sizeOfResult + @"})" + searchTerm + @"(.{" + sizeOfResult + @"}\w*|.{0," + sizeOfResult + @"}$))";
    var wordMatch = new Regex(expression, RegexOptions.IgnoreCase | RegexOptions.Singleline);

    foreach (Match m in wordMatch.Matches(text)) {
        result.Add(m.Value);
    }
    return result;
}
```

I call it like this:

```csharp
var text = "The Dog is a real-pet animal. It is one of...";
var searchResults = Search(text, "dog", 10, false);
if (searchResults.Count > 0) {
    foreach (var searchResult in searchResults) {
        foreach (var result in searchResult) {
            Response.Write("<strong>Result:</strong> " + result + "<br>");
        }
    }
}
```

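I haven't yet worked out what happens when the word occurs more than once within the 10 characters, or how to handle it (for example, the sentence "A dog is of course a dog!"). I guess I'll deal with that later.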
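To see what happens when two occurrences are close together, here is a small sketch. It assumes a whitespace-bounded context pattern modeled on the one in the accepted answer further down this page; the class and method names are hypothetical. Because `Regex.Matches` returns non-overlapping matches, a second occurrence that falls inside the context of the first simply ends up in the same snippet, so the highlighting step has to wrap every occurrence within a snippet, not just one.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CloseOccurrencesDemo
{
    // Hypothetical helper: snippets with up to `size` characters of context on each side,
    // bounded by whitespace so that words are not cut in half.
    public static string[] Snippets(string text, string term, int size)
    {
        var pattern = @"(?si)(?<!\S).{0," + size + @"}(?<!\S)\S*" + Regex.Escape(term)
                    + @"\S*(?!\S).{0," + size + @"}(?!\S)";
        // Matches are found left to right and never overlap.
        return Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    // Wraps every occurrence of the term inside one snippet, case-insensitively.
    public static string Highlight(string snippet, string term) =>
        Regex.Replace(snippet, "(?i)" + Regex.Escape(term), "<strong>$&</strong>");
}
```

With this pattern, `Snippets("A dog, a dog!", "dog", 10)` yields one snippet containing both occurrences, and `Highlight` then wraps both of them. For "A dog is of course a dog!" the two occurrences are far enough apart that they land in two separate snippets.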

Tests:

```csharp
var results0 = Search(text, "dog", 0, false);  // should include only the matched word
var results1 = Search(text, "dog", 1, false);  // should include the matched word and only one word preceding and following the matched word (if any)
var results10 = Search(text, "dog", 10, false); // should include the matched word and up to 10 characters (but not cutting off words in the middle) preceding and following it (if any)
var results50 = Search(text, "dog", 50, false); // should include the matched word and up to 50 characters (but not cutting off words in the middle) preceding and following it (if any)
```

Problem:

My functions can search for searchTerm either as a whole word or as part of a word.

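When displaying the results, I simply do Replace(word, "<strong>" + word + "</strong>"). That works fine when I'm searching for part of a word. But when searching for the whole word, if a result also contains searchTerm as part of another word, that part gets highlighted as well.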

For example, if I search for "dog", the result is: All <strong>dog</strong>s go to <strong>dog</strong> heaven. But I want: All dogs go to <strong>dog</strong> heaven.
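The difference can be reproduced in a few lines. This sketch contrasts the plain `string.Replace` approach described above with a `Regex.Replace` call that uses `\b` word boundaries. The class and method names are hypothetical, and `\b` is only one way to define a "whole word" (it treats letters, digits and `_` as word characters).

```csharp
using System.Text.RegularExpressions;

public static class HighlightDemo
{
    // Approach from the question: every substring occurrence is wrapped,
    // including the "dog" inside "dogs". string.Replace is also case-sensitive.
    public static string NaiveHighlight(string snippet, string word) =>
        snippet.Replace(word, "<strong>" + word + "</strong>");

    // Whole-word approach: \b anchors leave "dog" inside "dogs" alone.
    // Regex.Escape guards against regex metacharacters in the search term.
    public static string WholeWordHighlight(string snippet, string word) =>
        Regex.Replace(snippet, @"\b" + Regex.Escape(word) + @"\b",
                      "<strong>$&</strong>", RegexOptions.IgnoreCase);
}
```

Because `$&` inserts the matched text rather than the search term, `WholeWordHighlight("The Dog barks", "dog")` keeps the original capitalization: `The <strong>Dog</strong> barks`.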

Question:

How can I get the matched word wrapped in some HTML, such as <strong> or whatever else I want?

2 Answers

Stack Overflow user

Accepted answer

Posted 2018-11-05 22:00:55

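Your solution needs to do two main things: 1) extract the matches, i.e. the keyword or phrase plus the additional context to its left and right; and 2) wrap the search term in markup.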

The extraction regex (for example, with 10 characters on the left and right) is:

```text
(?si)(?<!\S).{0,10}(?<!\S)\S*dog\S*(?!\S).{0,10}(?!\S)
```

Regex demo

Details

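  • (?si) - enables the Singleline and IgnoreCase modifiers (. matches any character, including newlines, and the pattern is case-insensitive)
  • (?<!\S) - a left-hand whitespace boundary
  • .{0,10} - 0 to 10 characters
  • (?<!\S) - a left-hand whitespace boundary
  • \S*dog\S* - dog with any 0+ non-whitespace characters around it (note: if searchEntireWord is true, remove the \S* parts from this section of the pattern)
  • (?!\S) - a right-hand whitespace boundary
  • .{0,10} - 0 to 10 characters
  • (?!\S) - a right-hand whitespace boundary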
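Note that (?<!\S) and (?!\S) are whitespace boundaries, not word boundaries like \b. The difference matters for whole-word searches: a term followed directly by punctuation, such as `dog.`, is not surrounded by whitespace on both sides, so a whitespace-bounded pattern skips it. A small sketch (hypothetical names) that counts matches both ways:

```csharp
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Whitespace-bounded: the term must be preceded and followed by whitespace or a string edge.
    public static int CountWhitespaceBounded(string text, string term) =>
        Regex.Matches(text, @"(?i)(?<!\S)" + Regex.Escape(term) + @"(?!\S)").Count;

    // Word-bounded: \b sits between a word character and a non-word character (or a string edge).
    public static int CountWordBounded(string text, string term) =>
        Regex.Matches(text, @"(?i)\b" + Regex.Escape(term) + @"\b").Count;
}
```

For "A dog is called man's best friend. Walk the dog." the whitespace-bounded version finds one match and the \b version finds two. Which behaviour is preferable depends on the data; whitespace boundaries treat a token like `non-dogs.` as one unit, which is what keeps the extracted context from cutting words.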

In C#, it would be defined as:

```csharp
var expression = string.Format(@"(?si)(?<!\S).{{0,{0}}}(?<!\S)\S*{1}\S*(?!\S).{{0,{0}}}(?!\S)", sizeOfResult, Regex.Escape(searchTerm));
if (searchEntireWord) {
    expression = string.Format(@"(?si)(?<!\S).{{0,{0}}}(?<!\S){1}(?!\S).{{0,{0}}}(?!\S)", sizeOfResult, Regex.Escape(searchTerm));
}
```

Note that in a format string, {{ is a literal { and }} is a literal }.
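To confirm the escaping, the sketch below builds the same pattern once with `string.Format` and once with a verbatim interpolated string (`$@"..."`), where doubled braces follow the same rule. Both method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    // {{ and }} become literal braces; {0} and {1} are the format placeholders.
    public static string WithFormat(int size, string term) =>
        string.Format(@"(?si)(?<!\S).{{0,{0}}}(?<!\S)\S*{1}\S*(?!\S).{{0,{0}}}(?!\S)", size, Regex.Escape(term));

    // Interpolated strings use the same doubling rule for literal braces.
    public static string WithInterpolation(int size, string term) =>
        $@"(?si)(?<!\S).{{0,{size}}}(?<!\S)\S*{Regex.Escape(term)}\S*(?!\S).{{0,{size}}}(?!\S)";
}
```

Both calls with `(10, "dog")` produce the pattern shown above, and a term such as `a.b` is escaped to `a\.b` so the dot is matched literally.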

The second regex, which wraps the key term in strong tags, is much simpler:

```csharp
Regex.Replace(x.Value,
            searchEntireWord ?
                string.Format(@"(?i)(?<!\S){0}(?!\S)", Regex.Escape(searchTerm)) :
                string.Format(@"(?i){0}", Regex.Escape(searchTerm)),
            "<strong>$&</strong>")
```

Note that $& in the replacement pattern refers to the whole match value.
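`$&` and `$0` are equivalent substitutions for the whole match (the second answer on this page uses `$0`). Because the matched text is inserted rather than the search term, the original capitalization is preserved. A minimal check with hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class ReplacementDemo
{
    // $& = the entire match.
    public static string WithAmpersand(string input) =>
        Regex.Replace(input, "(?i)dog", "<strong>$&</strong>");

    // $0 = group 0, which is also the entire match.
    public static string WithGroupZero(string input) =>
        Regex.Replace(input, "(?i)dog", "<strong>$0</strong>");
}
```

For "The Dog and the dog" both methods return `The <strong>Dog</strong> and the <strong>dog</strong>`.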

C# code:

```csharp
// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
public static List<string> ExtractTexts(string text, string searchTerm, int sizeOfResult, bool searchEntireWord)
{
    // Step 1: extract the term plus up to sizeOfResult characters of whitespace-bounded context.
    var expression = string.Format(@"(?si)(?<!\S).{{0,{0}}}(?<!\S)\S*{1}\S*(?!\S).{{0,{0}}}(?!\S)", sizeOfResult, Regex.Escape(searchTerm));
    if (searchEntireWord) {
        expression = string.Format(@"(?si)(?<!\S).{{0,{0}}}(?<!\S){1}(?!\S).{{0,{0}}}(?!\S)", sizeOfResult, Regex.Escape(searchTerm));
    }
    // Step 2: wrap every occurrence of the term inside each extracted snippet.
    return Regex.Matches(text, expression)
        .Cast<Match>()
        .Select(x => Regex.Replace(x.Value,
            searchEntireWord ?
                string.Format(@"(?i)(?<!\S){0}(?!\S)", Regex.Escape(searchTerm)) :
                string.Format(@"(?i){0}", Regex.Escape(searchTerm)),
            "<strong>$&</strong>"))
        .ToList();
}
```
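The question also asks for "<strong> or whatever I want". One way to make the tags configurable, sketched below with a hypothetical `TagHighlighter.Wrap` helper that reuses the two highlight patterns above, is to pass a `MatchEvaluator` lambda instead of a replacement string. The evaluator's return value is inserted literally, so a `$` inside a tag is not interpreted as substitution syntax.

```csharp
using System.Text.RegularExpressions;

public static class TagHighlighter
{
    // Hypothetical helper: wraps each occurrence of searchTerm in caller-supplied tags.
    public static string Wrap(string snippet, string searchTerm, bool searchEntireWord,
                              string openTag, string closeTag)
    {
        var escaped = Regex.Escape(searchTerm);
        // Same patterns as ExtractTexts: whitespace-bounded for whole words, plain otherwise.
        var pattern = searchEntireWord ? @"(?i)(?<!\S)" + escaped + @"(?!\S)" : "(?i)" + escaped;
        // The lambda is converted to a MatchEvaluator; its result is used verbatim.
        return Regex.Replace(snippet, pattern, m => openTag + m.Value + closeTag);
    }
}
```

For example, `Wrap("an undogging dog that", "dog", true, "<em>", "</em>")` returns `an undogging <em>dog</em> that`.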

Example usage (see demo):

```csharp
var text = "The Dog is a real-pet animal. There's an undogging dog that only undogs non-dogs. It is one of the most obedient animals. There are many kinds of dogs in the world. Some of the are very friendly while some of them a dangerous. Dogs are of different color like black, red, white and brown. Some old them have slippery shiny skin and some have rough skin. Dogs are carnivorous animals. They like eating meat. They have four legs, two ears and a tail. Dogs are trained to perform different tasks. They protect us from thieves b) guarding our house. They are loving animals. A dog is called man's best friend. They are used by the police to find hidden things. They are one of the most useful animals in the world. Doggonit!";
var searchTerm = "dog";
var searchEntireWord = false;
Console.WriteLine("======= 10 ========");
var results = ExtractTexts(text, searchTerm, 10, searchEntireWord);
foreach (var result in results)
    Console.WriteLine(result);
```

Output:

```text
======= 10 ========
The <strong>Dog</strong> is a
an un<strong>dog</strong>ging <strong>dog</strong> that
only un<strong>dog</strong>s non-<strong>dog</strong>s.
kinds of <strong>dog</strong>s in the
<strong>Dog</strong>s are of
skin. <strong>Dog</strong>s are
a tail. <strong>Dog</strong>s are
A <strong>dog</strong> is called
world. <strong>Dog</strong>gonit!
```

Another example:

```csharp
Console.WriteLine("======= 15 ========");
results = ExtractTexts(text, searchTerm, 15, searchEntireWord);
foreach (var result in results)
    Console.WriteLine(result);
```

Output:

```text
======= 15 ========
The <strong>Dog</strong> is a real-pet
There's an un<strong>dog</strong>ging <strong>dog</strong> that only
un<strong>dog</strong>s non-<strong>dog</strong>s. It is one of
many kinds of <strong>dog</strong>s in the world.
a dangerous. <strong>Dog</strong>s are of
rough skin. <strong>Dog</strong>s are
and a tail. <strong>Dog</strong>s are trained to
animals. A <strong>dog</strong> is called
in the world. <strong>Dog</strong>gonit!
```
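The question's `sizeOfResult = 0` test case can be checked against the same pattern. With zero characters of context, the regex returns each whitespace-delimited token that contains the term, so trailing punctuation and the rest of a longer word (`Doggonit!`) come along. Whether that meets the "only the matched word" expectation depends on how a word is meant to be defined. The sketch repeats the answer's partial-word extraction pattern; the class name is hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ZeroContextDemo
{
    // Same extraction pattern as ExtractTexts (partial-word mode), rebuilt so the sketch is self-contained.
    public static string[] Extract(string text, string term, int size) =>
        Regex.Matches(text, string.Format(@"(?si)(?<!\S).{{0,{0}}}(?<!\S)\S*{1}\S*(?!\S).{{0,{0}}}(?!\S)",
                                          size, Regex.Escape(term)))
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

For "The Dog is a pet. Doggonit!" with a size of 0, this returns the two tokens `Dog` and `Doggonit!`.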
Votes: 1

Stack Overflow user

Posted 2018-10-29 19:56:30

A simple solution using Regex.Replace:

```csharp
// Requires: using System.Text.RegularExpressions;
public bool HighlightExactMatchOnly(string input, string textToHighlight, string expected)
{
    // given
    var escapedHighlight = Regex.Escape(textToHighlight);

    // when
    var result = Regex.Replace(input, @"\b" + escapedHighlight + @"\b", "<strong>$0</strong>");

    return expected == result;
}
```

Test:

```csharp
var text = "My test dogs with a single dog and some text behind";
var expected = "My test dogs with a single <strong>dog</strong> and some text behind";
Console.WriteLine(HighlightExactMatchOnly(text, "dog", expected)); // True
```

Note that this is not the fastest solution.
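If speed matters, one common option (not something measured in this answer) is to build the `Regex` object once and reuse it, instead of passing the pattern string to the static `Regex.Replace` on every call. The static methods already keep a small internal cache of recently used patterns (`Regex.CacheSize`, 15 by default), so the gain depends on the workload; benchmark before relying on it. A sketch with a hypothetical `WordHighlighter` class:

```csharp
using System.Text.RegularExpressions;

public sealed class WordHighlighter
{
    private readonly Regex _pattern;

    // Builds the whole-word pattern once; RegexOptions.Compiled trades
    // slower construction for faster matching on repeated use.
    public WordHighlighter(string word)
    {
        _pattern = new Regex(@"\b" + Regex.Escape(word) + @"\b", RegexOptions.Compiled);
    }

    // Wraps every whole-word occurrence; $0 is the entire match.
    public string Highlight(string input) => _pattern.Replace(input, "<strong>$0</strong>");
}
```

Usage: create one `WordHighlighter("dog")` and call `Highlight` for each snippet.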

Votes: 0
Page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/53051733
