I have parsed an HTML page using Beautiful Soup:
badges = soup.body.find('div', attrs={'class': 'col-md-11'})
After this, my badges object looks like this:
<div class="col-md-11">
<h4>
<span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
<font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
<span style="color:green;font-weight:bold;"> [activ]</span>
</h4>
<p>
<span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
</p>
<p>
<span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
<span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
</p>
</div>
Now I want to extract NEDELCU Paul-Iulian, Baroul Dolj, activ, Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel., and the email address.
I tried using badges.span.span, but that does not work.
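A likely reason badges.span.span fails: attribute-style navigation returns only the first matching descendant, and the first span in this markup is the empty icon span, which contains no nested span. The sketch below reproduces this on a shortened copy of the markup; the variable names (html, first_span, nested, label) are illustrative and not part of the original post.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumption: same structure).
html = """<div class="col-md-11">
<h4>
<span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
<font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
</h4>
</div>"""

badges = BeautifulSoup(html, "html.parser").find("div", attrs={"class": "col-md-11"})

# badges.span is shorthand for badges.find("span"): the first <span> in document order.
first_span = badges.span
print(first_span["class"])

# That first span is the empty icon, so it has no nested <span>.
nested = first_span.span
print(nested)

# Targeting an element by tag and class avoids depending on document order.
label = badges.find("span", class_="label")
print(label.get_text())
```

Selecting each element by its tag, class, or style, as the answers do, sidesteps the problem.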
Posted on 2018-06-29 13:03:34
Use soup.find.
Demo:
from bs4 import BeautifulSoup
s = """<div class="col-md-11">
<h4>
<span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
<font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
<span style="color:green;font-weight:bold;"> [activ]</span>
</h4>
<p>
<span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
</p>
<p>
<span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
<span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
</p>
</div>"""
soup = BeautifulSoup(s, "html.parser")
val = soup.find("font", {"style":"font-weight:bold;"})
print( "{} {}".format(val.text, val.next_sibling ).strip() )
print( soup.find("span", {"style":"color:green;font-weight:bold;"}).text.strip() )
print( soup.find("span", class_="fas fa-map-marker text-red padding-right-sm").next_sibling.strip() )
print( soup.find("span", class_="text-nowrap").text.strip() )
Output:
NEDELCU Paul-Iulian , Baroul Dolj
[activ]
Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
paul_iulyan@yahoo.com
Posted on 2018-06-29 13:21:20
An optimized solution using a single select call:
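result = []
# One CSS selector list picks the name, the status, the address paragraph, and the email span.
for el in badges.select('h4 font, h4 span:nth-of-type(3), p:nth-of-type(1), p:nth-of-type(2) > span.text-nowrap'):
    if el.name == 'font':
        # The bar name is the text node that follows the <font> tag.
        result.extend([el.text.strip(), el.next_sibling.strip()])
    else:
        result.append(el.text.strip())
print(result)
Output (formatted):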
['NEDELCU Paul-Iulian',
', Baroul Dolj',
'[activ]',
'Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.',
'paul_iulyan@yahoo.com']
https://stackoverflow.com/questions/51102307
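As a possible follow-up to the answers above, the fields can be collected into a dictionary, with a guard for elements that are missing on some pages. This is a minimal sketch, not either answerer's method: the helper names text_or_none and extract_badge, the dictionary keys, and the choice of selectors are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup, Tag

html = """<div class="col-md-11">
<h4>
<span class="fas fa-user-circle padding-right-sm text-green"></span><span class="label label-success">Avocat definitiv</span>
<font style="font-weight:bold;">NEDELCU Paul-Iulian</font>, Baroul Dolj
<span style="color:green;font-weight:bold;"> [activ]</span>
</h4>
<p>
<span class="fas fa-map-marker text-red padding-right-sm"></span>Sediu principal în Baroul Dolj, adresă: mun.Craiova, str.Mihail kogălniceanu, nr.16, jud.Dolj, tel.
</p>
<p>
<span class="padding-right-md text-primary"><span class="fal fa-phone text-primary padding-right-sm"></span></span>
<span class="text-nowrap"><span class="fal fa-envelope text-info padding-right-sm"></span>paul_iulyan@yahoo.com</span>
</p>
</div>"""


def text_or_none(node):
    """Hypothetical helper: stripped text of a tag or text node, or None."""
    if node is None:
        return None
    text = node.get_text() if isinstance(node, Tag) else str(node)
    return text.strip() or None


def extract_badge(badges):
    """Hypothetical helper: collect the requested fields into a dict."""
    name = badges.select_one("h4 font")
    status = text_or_none(badges.select_one("h4 span[style]"))
    marker = badges.select_one("p span.fa-map-marker")
    # The bar name and the address are bare text nodes following a tag.
    bar = text_or_none(name.next_sibling) if name is not None else None
    address = text_or_none(marker.next_sibling) if marker is not None else None
    return {
        "name": text_or_none(name),
        "bar": bar.lstrip(", ") if bar else None,
        "status": status.strip("[]") if status else None,
        "address": address,
        "email": text_or_none(badges.select_one("span.text-nowrap")),
    }


soup = BeautifulSoup(html, "html.parser")
badges = soup.find("div", attrs={"class": "col-md-11"})
record = extract_badge(badges)
print(record)
```

Returning None for a missing field, rather than raising AttributeError, keeps a loop over many such div blocks from stopping at the first incomplete entry.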