Q: BeautifulSoup replaceWith() method adds escaped HTML, but I want it unescaped

Stack Overflow user

Asked 2015-10-04 18:53:40

Answers: 2, Views: 4.4K, Followers: 0, Votes: 9

I have a Python method (thanks to this snippet) that takes some HTML and wraps <a> tags around unformatted links, using BeautifulSoup and Django's urlize:

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    print(soup)

    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if textNode.parent and getattr(textNode.parent, 'name') == 'a':
            continue  # skip already formatted links
        urlizedText = urlize(textNode)
        textNode.replaceWith(urlizedText)

    print(soup)

    return str(soup)

Sample input text (as printed by the first print statement):

this is a formatted link <a href="http://google.ca">http://google.ca</a>, this one is unformatted and should become formatted: http://google.ca

The returned text (as printed by the second print statement):

this is a formatted link <a href="http://google.ca">http://google.ca</a>, this one is unformatted and should become formatted: &lt;a href="http://google.ca"&gt;http://google.ca&lt;/a&gt;

As you can see, it is formatting the link, but with escaped HTML, so when I print it in a template with {{ my.html|safe }} it does not render as HTML.

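So how can I get the tags added by urlize to stay unescaped and render properly as HTML? I suspect it has something to do with my calling it as a method rather than using it as a template filter. I can't actually find documentation for this method; it does not appear under django.utils.html.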

Edit: It appears the escaping actually happens on this line: textNode.replaceWith(urlizedText).
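
The behaviour described in the edit can be reproduced without Django. The following is a minimal sketch, assuming only that bs4 is installed; the `linked` string is a hand-written stand-in for what urlize would return, not real urlize output, and `replace_with` is the current bs4 name for the `replaceWith` method used above.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>http://google.ca</p>", "html.parser")
text_node = soup.p.string

# Hand-written stand-in (an assumption) for what urlize(text_node) returns.
linked = '<a href="http://google.ca">http://google.ca</a>'

# A plain str is inserted as a text node; it is not parsed as markup.
text_node.replace_with(linked)

# Serialising the tree escapes "<" and ">" inside text nodes.
escaped = str(soup)

# The tree contains no real <a> element, only text that looks like one.
has_link = soup.find("a") is not None

print(escaped)
print(has_link)
```

The escaping is therefore a property of how a text node is serialised, which suggests it is unrelated to whether urlize is called as a function or used as a template filter.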

beautifulsoup

python

django

2 Answers

Stack Overflow user

Accepted answer

Answered 2015-10-04 21:25:07

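You can turn your urlizedText string into a new BeautifulSoup object. It will then be treated as markup in its own right, rather than as text inside a tag (which, as you would expect, gets escaped).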

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    print(soup)

    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if textNode.parent and getattr(textNode.parent, 'name') == 'a':
            continue  # skip already formatted links
        urlizedText = urlize(textNode)
        textNode.replaceWith(BeautifulSoup(urlizedText, "html.parser"))

    print(soup)

    return str(soup)

Votes: 11
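
For readers who want to try this approach without a Django install, here is a hedged, self-contained sketch. `linkify` and `html_linkify` are hypothetical names, and `linkify` is a deliberately simplified regex stand-in for urlize (it does not handle trailing punctuation, bare domains, or email addresses). The sketch also uses the current bs4 spellings `find_all(string=True)` and `replace_with` in place of `findAll(text=True)` and `replaceWith`.

```python
import html
import re

from bs4 import BeautifulSoup

# Simplified URL pattern; an assumption, far less thorough than urlize.
URL_RE = re.compile(r"https?://[^\s<]+")


def linkify(text):
    """Hypothetical stand-in for django.utils.html.urlize."""
    # Escape first so literal "<" or "&" in the text survive re-parsing.
    safe = html.escape(text, quote=False)
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), safe
    )


def html_linkify(markup):
    soup = BeautifulSoup(markup, "html.parser")
    for node in soup.find_all(string=True):
        if node.parent is not None and node.parent.name == "a":
            continue  # skip text that is already inside a link
        # Parse the generated markup so it is inserted as tags, not text.
        fragment = BeautifulSoup(linkify(str(node)), "html.parser")
        node.replace_with(fragment)
    return str(soup)


sample = (
    'this is a formatted link <a href="http://google.ca">http://google.ca</a>, '
    "this one is unformatted and should become formatted: http://google.ca"
)
result = html_linkify(sample)
print(result)
```

Parsing each replacement into its own BeautifulSoup object is what keeps the new tags from being escaped; the rest of the loop matches the answer's code.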

Stack Overflow user

Answered 2015-10-04 19:35:23

The escaping appears to happen where you use BeautifulSoup to replace a text node with a node that contains HTML entities.

One way to achieve your goal is to build a new string from the output of urlize (which does not seem to care whether a link is already formatted).

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    finalFragments = []
    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if getattr(textNode.parent, 'name') == 'a':
            finalFragments.append(str(textNode.parent))
        else:
            finalFragments.append(urlize(textNode))

    return "".join(finalFragments)

However, if you only want to render it in a template, you can simply apply urlize to the input string as a template filter:

{{input_string|urlize}}

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/32937126
