Q: In Python BeautifulSoup4, how do I extract special text like the one shown below?

Stack Overflow user

Asked 2014-05-06 22:03:51

1 answer · 72 views · 0 followers · 1 vote

I am trying to extract some strings from this text:

    text = """<li>(<a rel="nofollow" class="external text" href="http://www.icd9data.com/getICD9Code.ashx?
    icd9=999.1">999.1</a>) <a href="/wiki/Air_embolism" title="Air embolism">Air embolism</a> as
    a complication of medical care not elsewhere classified</li>"""

My target is "as a complication of medical care not elsewhere classified", but this code does not work:

    import bs4

    soup = bs4.BeautifulSoup(text)
    for tag in soup.find_all('li'):
        print(tag.string)  # prints None: .string is None when a tag has several children

Does anyone know a way to get the string I want? Thanks.

python

beautifulsoup

1 Answer

Stack Overflow user

Accepted answer

Answered 2014-05-06 22:16:18

    for tag in soup.find_all('li'):
        print(tag.get_text())

prints

    (999.1) Air embolism as
    a complication of medical care not elsewhere classified

The get_text method returns all the text in a tag, including text that belongs to its child tags.
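
To make the difference concrete, here is a minimal sketch that contrasts `.string` with `get_text()` on the markup from the question. It assumes the built-in `html.parser` backend and joins the URL that was wrapped across lines in the question; the variable names (`html`, `string_value`, `full_text`) are only illustrative.

```python
from bs4 import BeautifulSoup

# Markup from the question; the wrapped href is joined here (an assumption).
html = (
    '<li>(<a rel="nofollow" class="external text" '
    'href="http://www.icd9data.com/getICD9Code.ashx?icd9=999.1">999.1</a>) '
    '<a href="/wiki/Air_embolism" title="Air embolism">Air embolism</a> as\n'
    'a complication of medical care not elsewhere classified</li>'
)

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li")

# .string is only set when a tag has a single child; this <li> has several.
string_value = li.string

# get_text() concatenates every text node; split/join collapses the line break.
full_text = " ".join(li.get_text().split())

print(string_value)  # None
print(full_text)
```

Collapsing whitespace with `split()` and `join()` is optional; it just puts the text on a single line.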

Using lxml, you could use

    import lxml.html as LH
    text = """<li>(<a rel="nofollow" class="external text" href="http://www.icd9data.com/getICD9Code.ashx?
    icd9=999.1">999.1</a>) <a href="/wiki/Air_embolism" title="Air embolism">Air embolism</a> as
    a complication of medical care not elsewhere classified</li>"""

    doc = LH.fromstring(text)
    for tag in doc.xpath('//li/a[2]'):
        print(tag.tail)

to obtain

     as
    a complication of medical care not elsewhere classified
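
A comparable result should be reachable with BeautifulSoup alone, since the text that follows a tag is available as that tag's `next_sibling`. This is a minimal sketch, not part of the original answer, assuming the wanted text always follows the last `<a>` inside the `<li>` and using the built-in `html.parser`; the names `last_link` and `trailing_text` are illustrative.

```python
from bs4 import BeautifulSoup

# Markup from the question; the wrapped href is joined here (an assumption).
html = (
    '<li>(<a rel="nofollow" class="external text" '
    'href="http://www.icd9data.com/getICD9Code.ashx?icd9=999.1">999.1</a>) '
    '<a href="/wiki/Air_embolism" title="Air embolism">Air embolism</a> as\n'
    'a complication of medical care not elsewhere classified</li>'
)

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li")

# The last link in the list item; the text node right after it is the target.
last_link = li.find_all("a")[-1]

# next_sibling is a NavigableString here; normalise its whitespace.
trailing_text = " ".join(str(last_link.next_sibling).split())

print(trailing_text)
```

If the list item could end with a tag instead of text, `next_sibling` would be a tag or `None`, so real code would need to check for that case.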

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/23505303
