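I'm a newbie with Python's re library. I'm doing some web scraping, and I want to match certain string patterns and append the values to lists. For example: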
parking = []
rooms = []
toilets = []
attribute = soup.find('ul', {'class': 'specs-list'}).find_all('li')
for a in attribute:
    print(a.text)

Output of the loop variable a at index 0:
Metters
50 m²
Rooms
2
Toilets
1

Output of the loop variable a at index 1:
Metters
50 m²
parking
1
spends
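340

So, for example, I want to match the heading names and, when a heading is present in a, append its value to the corresponding list.

Pseudocode: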
for a in attribute:
    if a contains "Rooms":
        rooms.append(a)
    if a contains "Parking":
        parking.append(a)
    if a contains "Toilets":
        toilets.append(a)
    if a does not contain any of the strings above:
        rooms.append(nan)
        parking.append(nan)
        toilets.append(nan)

I'm using BeautifulSoup for the scraping, and the attribute variable looks like this:
Output of the attribute variable at index 0:
[<li class="specs-item">
<strong>Metters</strong>
<span>50 m²</span>
</li>,<li class="specs-item">
<strong>Rooms</strong>
<span>2</span>
</li>,<li class="specs-item">
<strong>Toilets</strong>
<span>1</span>
</li>,<li class="specs-item">
<strong>Spends</strong>
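<span>340</span></li>]

attribute holds 5 such values. Each one has HTML similar to the above but with different headings and values: some contain Parking, Rooms and Toilets, while others only have Toilets and Rooms, and so on.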
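The pseudocode above can be made runnable roughly as follows. This is a minimal sketch under stated assumptions, not the asker's code: it assumes each listing has already been reduced to a list of (heading, value) pairs (for example with [(li.strong.text, li.span.text) for li in ul.find_all('li')]), it uses math.nan as the placeholder for a missing heading, and the helper name append_specs and the sample data are hypothetical. Plain string comparison is enough here, so re is not needed. Headings are normalised with strip().capitalize() because the second printed listing uses a lowercase "parking".

```python
import math

# Headings we want to collect; any other heading (e.g. "Metters", "Spends") is ignored.
WANTED = ("Parking", "Rooms", "Toilets")


def append_specs(pairs, columns):
    """Append exactly one value per wanted heading for a single listing (hypothetical helper).

    pairs:   list of (heading, value) tuples for one listing
    columns: dict mapping each wanted heading to its list of values
    """
    # Normalise headings so "parking" and "Parking" map to the same key.
    found = {heading.strip().capitalize(): value.strip() for heading, value in pairs}
    for name in WANTED:
        # Use NaN when this listing lacks the heading, so all lists stay the same length.
        columns[name].append(found.get(name, math.nan))


columns = {name: [] for name in WANTED}
# Two hypothetical listings shaped like the printed output in the question.
append_specs([("Metters", "50 m²"), ("Rooms", "2"), ("Toilets", "1")], columns)
append_specs([("Metters", "50 m²"), ("parking", "1"), ("spends", "340")], columns)
print(columns)
```

For these two sample listings it prints {'Parking': [nan, '1'], 'Rooms': ['2', nan], 'Toilets': ['1', nan]}, so index i in every list refers to the same listing, which is what the NaN branch of the pseudocode is aiming for.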
Posted 2020-10-02 15:25:15

This should help you:
from bs4 import BeautifulSoup
import requests

parking = []
rooms = []
toilets = []

html = requests.get('website url').text
soup = BeautifulSoup(html, 'html.parser')
attribute = soup.find_all('li', {'class': 'specs-item'})

for a in attribute:
    heading = a.strong.text  # text of the <strong> tag, e.g. "Rooms"
    span = a.span.text       # text of the <span> tag, e.g. "2"
    if heading == "Parking":
        parking.append(span)
    elif heading == "Rooms":
        rooms.append(span)
    elif heading == "Toilets":
        toilets.append(span)

print("Parking =", parking)
print("Rooms =", rooms)
print("Toilets =", toilets)

Output for the li values you provided:
Parking = []
Rooms = ['2']
Toilets = ['1']

Edit:

Although this works, I don't think having so many lists is a good approach. Instead, you can use a dictionary. This is how to get the same output with a dictionary:
details_dict = {'Parking': [],
                'Rooms': [],
                'Toilets': []}

for a in attribute:
    heading = a.strong.text
    span = a.span.text
    # Only collect headings that already have a key in the dictionary
    if heading in details_dict:
        details_dict[heading].append(span)

print(details_dict)

Output:
{'Parking': [], 'Rooms': ['2'], 'Toilets': ['1']}

I think this is a better approach, but it's entirely up to you; choose whichever suits your task best.
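The dictionary idea can be taken one step further with collections.defaultdict(list), which creates an empty list the first time a key is used, so no heading has to be listed in advance. This is a sketch of that variant, not part of the original answer; it assumes the (heading, span) pairs have already been read from the <li> tags as in the loop above, and the sample pairs are hypothetical data matching the question's HTML.

```python
from collections import defaultdict

# (heading, span text) pairs as the loop above would read them from each <li>;
# hypothetical data matching the HTML shown in the question.
pairs = [("Metters", "50 m²"), ("Rooms", "2"), ("Toilets", "1"), ("Spends", "340")]

# defaultdict(list) creates an empty list the first time a heading is seen.
details = defaultdict(list)
for heading, span in pairs:
    details[heading].append(span)

# Pick out only the headings of interest; looking up a missing key such as
# "Parking" on a defaultdict inserts and returns an empty list.
wanted = {name: details[name] for name in ("Parking", "Rooms", "Toilets")}
print(wanted)
```

For the sample pairs this prints {'Parking': [], 'Rooms': ['2'], 'Toilets': ['1']}, the same result as the dictionary version, while details also keeps the Metters and Spends values in case they are needed later.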
https://stackoverflow.com/questions/64173449