Beautiful Soup: get span contents

Stack Overflow user
Asked 2018-06-29 12:56:52
Answers: 2, Views: 714, Followers: 0, Votes: 0

I have parsed an HTML page using Beautiful Soup:

badges = soup.body.find('div', attrs={'class': 'col-md-11'})

After this, my badges object looks like this:

<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
   <p>
      <span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
   </p>
   <p>
      <span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
      <span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
   </p>
</div>

Now I want to extract the text: NEDELCU Paul-Iulian, Baroul Dolj, [activ], Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.

I tried using badges.span.span, but that does not work.
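
A likely reason this fails, shown as a minimal sketch on a shortened copy of the markup above: attribute-style access such as badges.span returns only the first matching descendant, which here is the empty icon span, and that span contains no further span. The variable names below are illustrative and not part of the original question.

```python
from bs4 import BeautifulSoup

# Shortened copy of the <h4> part of the markup from the question.
html = """<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle"></span><span class="label label-success">Avocat definitiv</span>
   </h4>
</div>"""

badges = BeautifulSoup(html, "html.parser").find("div", attrs={"class": "col-md-11"})

first_span = badges.span             # first <span> descendant: the empty icon span
nested = badges.span.span            # the icon span has no <span> inside it, so this is None
all_spans = badges.find_all("span")  # find_all returns every <span> under the div

print(repr(first_span.get_text()))
print(nested)
print(len(all_spans))
```

Because the chained lookup only ever follows the first match, reaching the other elements needs find, find_all, or select with more specific criteria.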


2 Answers

Stack Overflow user

Accepted answer

Posted 2018-06-29 13:03:34

Use soup.find.

Demo:

from bs4 import BeautifulSoup
s = """<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
   <p>
      <span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
   </p>
   <p>
      <span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
      <span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
   </p>
</div>"""

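# Parse the snippet with the built-in parser.
soup = BeautifulSoup(s, "html.parser")
# Name: the bold <font> text plus the text node that follows it (", Baroul Dolj").
val = soup.find("font", {"style": "font-weight:bold;"})
print("{} {}".format(val.text, val.next_sibling).strip())
# Status: the green bold <span>.
print(soup.find("span", {"style": "color:green;font-weight:bold;"}).text.strip())
# Address: the text node right after the map-marker icon <span>.
print(soup.find("span", class_="fas fa-map-marker text-red padding-right-sm").next_sibling.strip())
# E-mail: the text inside <span class="text-nowrap">.
print(soup.find("span", class_="text-nowrap").text.strip())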

Output:

NEDELCU Paul-Iulian , Baroul Dolj
[activ]
Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
paul_iulyan@yahoo.com
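
If matching on exact style and class strings feels brittle, one alternative (not part of the answer above) is to walk every non-empty text node of the div with Beautiful Soup's stripped_strings generator. The sketch below assumes the same markup as in the question; note that it also returns the "Avocat definitiv" label, which would need to be filtered out if unwanted.

```python
from bs4 import BeautifulSoup

html = """<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
   <p>
      <span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
   </p>
   <p>
      <span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
      <span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
   </p>
</div>"""

badges = BeautifulSoup(html, "html.parser").find("div", class_="col-md-11")

# stripped_strings yields each text node with surrounding whitespace removed
# and skips nodes that are whitespace only.
strings = list(badges.stripped_strings)

for text in strings:
    print(text)
```

This trades precision for simplicity: it depends on the order of text nodes rather than on specific attributes, so it suits cases where all visible text of the block is wanted.
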
Votes: 2

Stack Overflow user

Posted 2018-06-29 13:21:20

An optimized solution using the soup.select method:

result = []  # collect the extracted strings here
for el in badges.select('h4 font, h4 span:nth-of-type(3), p:nth-of-type(1), p:nth-of-type(2) > span.text-nowrap'):
    if el.name == 'font':
        # keep the name and the text node that follows the <font> tag
        result.extend([el.text.strip(), el.next_sibling.strip()])
    else:
        result.append(el.text.strip())

print(result)

Output (formatted):

['NEDELCU Paul-Iulian',
 ', Baroul Dolj',
 '[activ]',
 'Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.',
 'paul_iulyan@yahoo.com']
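
The list above is positional, so callers have to remember which index holds which value. As a hedged variation on the same CSS-selector idea, the sketch below uses select_one to build a dictionary of named fields. The helper text_of, the field names, and the selector h4 span[style] are illustrative choices, not taken from the answer, and the markup is assumed to match the question.

```python
from bs4 import BeautifulSoup

html = """<div class="col-md-11">
   <h4>
      <span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
      <font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
      <span style="color:green;font-weight:bold;"> [activ]</span>
   </h4>
   <p>
      <span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
   </p>
   <p>
      <span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
   </p>
</div>"""

badges = BeautifulSoup(html, "html.parser").find("div", class_="col-md-11")


def text_of(selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = badges.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


record = {
    "name": text_of("h4 font"),            # bold <font> element
    "status": text_of("h4 span[style]"),   # the only <span> in <h4> with a style attribute
    "address": text_of("p"),               # first <p> under the div
    "email": text_of("span.text-nowrap"),  # span wrapping the envelope icon and address
}

print(record)
```

Returning None for a missing element keeps the sketch from raising AttributeError on pages where a field is absent.
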
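Votes: 1

The original content of this page is provided by Stack Overflow. Translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: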

https://stackoverflow.com/questions/51102307
