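I am trying to scrape a dynamically loaded page without using Selenium, so I am trying to use the data-price_excl_tax attributes, which is where the prices for the table come from (see the HTML below), as a way to return qty. I am trying to do this with if statements under the following conditions: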
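**If data-price_excl_tax_5 exists, then data-price_excl_tax_4 = 4,
but if data-price_excl_tax_5 does not exist, then data-price_excl_tax_4 = 4-5.
I also need:
if data-price_excl_tax_12 exists, then data-price_excl_tax_6 = 6-11,
but if it does not exist, then data-price_excl_tax_6 = 6+.**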
Any help would be greatly appreciated.
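A minimal sketch of the two conditions as stated, assuming the form's attributes are available as a plain dict (BeautifulSoup's `Tag.attrs` is one). Whether a *different* tier exists is a membership test on the whole attribute dict, not on a single attribute name. `tier_4_label` and `tier_6_label` are hypothetical helper names, and the values are copied from the form tag in the question.

```python
# Attributes copied from the question's <form> tag; BeautifulSoup's
# cart.attrs is a dict shaped like this one.
cart_attrs = {
    "data-price_excl_tax": "38.99",
    "data-price_excl_tax_2": "24.87",
    "data-price_excl_tax_3": "22.99",
    "data-price_excl_tax_4": "23.89",
    "data-price_excl_tax_5": "23.04",
    "data-price_excl_tax_6": "22.19",
    "data-price_excl_tax_12": "21.86",
}


def tier_4_label(attrs: dict) -> str:
    # "4" when a separate tier-5 price exists, otherwise tier 4 covers 4-5.
    return "4" if "data-price_excl_tax_5" in attrs else "4-5"


def tier_6_label(attrs: dict) -> str:
    # "6-11" when a tier-12 price exists, otherwise tier 6 is open-ended.
    return "6-11" if "data-price_excl_tax_12" in attrs else "6+"


print(tier_4_label(cart_attrs), tier_6_label(cart_attrs))  # 4 6-11

# Dropping the higher tiers switches both helpers to their fallback labels.
reduced = {k: v for k, v in cart_attrs.items()
           if k not in ("data-price_excl_tax_5", "data-price_excl_tax_12")}
print(tier_4_label(reduced), tier_6_label(reduced))  # 4-5 6+
```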
HTML code:
<form id="cart-30102" action="/cart/add/" class="cart" method="post" enctype="multipart/form-data" data-add_savings_message="true" data-price_excl_tax="38.99" data-price_excl_tax_2="24.87" data-price_excl_tax_3="22.99" data-price_excl_tax_4="23.89" data-price_excl_tax_5="23.04" data-price_excl_tax_6="22.19" data-price_excl_tax_12="21.86">
Example URL: merv=11
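A more general sketch, assuming the rule behind both conditions is that each tier covers quantities up to one less than the next tier and the last tier is open-ended. To stay dependency-free it swaps BeautifulSoup for the standard-library `html.parser.HTMLParser`; `CartFormParser`, `PRICE_ATTR` and `tier_labels` are hypothetical names, not part of any library.

```python
import re
from html.parser import HTMLParser

# The <form> tag from the question, closed so the snippet is complete.
HTML = (
    '<form id="cart-30102" action="/cart/add/" class="cart" method="post" '
    'enctype="multipart/form-data" data-add_savings_message="true" '
    'data-price_excl_tax="38.99" data-price_excl_tax_2="24.87" '
    'data-price_excl_tax_3="22.99" data-price_excl_tax_4="23.89" '
    'data-price_excl_tax_5="23.04" data-price_excl_tax_6="22.19" '
    'data-price_excl_tax_12="21.86"></form>'
)

# Matches the base attribute (tier 1) and numbered ones (_2, _3, ..., _12).
PRICE_ATTR = re.compile(r"^data-price_excl_tax(?:_(\d+))?$")


class CartFormParser(HTMLParser):
    """Collects a {min_qty: price} dict for every <form class="cart">."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag != "form" or "cart" not in classes:
            return
        tiers = {}
        for name, value in attrs.items():
            match = PRICE_ATTR.match(name)
            if match and value is not None:
                # No numeric suffix means the single-unit price (tier 1).
                tiers[int(match.group(1) or 1)] = float(value)
        self.forms.append(tiers)


def tier_labels(tiers):
    """Label each tier with the quantity range it covers."""
    qtys = sorted(tiers)
    labels = {}
    for i, n in enumerate(qtys):
        if i == len(qtys) - 1:
            labels[n] = f"{n}+"  # last tier is open-ended
        elif qtys[i + 1] == n + 1:
            labels[n] = str(n)  # next tier starts at the very next quantity
        else:
            labels[n] = f"{n}-{qtys[i + 1] - 1}"
    return labels


parser = CartFormParser()
parser.feed(HTML)
for tiers in parser.forms:
    labels = tier_labels(tiers)
    for n in sorted(tiers):
        print(labels[n], tiers[n])
```

For the form above this yields the labels 1, 2, 3, 4, 5, 6-11 and 12+; if the `_5` attribute were missing, tier 4 would become 4-5, which matches the rules in the question. With BeautifulSoup, the same regex-and-sort step could be applied to `cart.attrs` instead of the parser output.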
Here is my Python code:
from web_sites import web_sites
from bs4 import BeautifulSoup
import requests
import json
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By
PATH = '***'
urls = web_sites
#driver = webdriver.Chrome()
#driver.get(urls)
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    mervs = BeautifulSoup(response.text, 'lxml').find_all('strong')
    product = BeautifulSoup(response.text, 'lxml').find("h1", class_="text-center")
    product_name = product.text
    qty = "something in (data-price_excl_tax)"
    json_schema = soup.find_all('script', attrs={'type': 'application/ld+json'})[1]
    json_file = json.loads(json_schema.get_text())
    the_dict = json_file
    n = the_dict['@graph'][0]
    descriptions = n['description']
    d = the_dict['@graph'][0]['aggregateRating']
    ratingValue = d['ratingValue']
    reviewCount = d['reviewCount']
    for i, cart in enumerate(BeautifulSoup(response.text, 'lxml').find_all('form', class_='cart')):
        for tax in cart.attrs:
            if 'data-price' in tax:
                if 'data-price_excl_tax' in tax:
                    qty = '1'
                if 'data-price_excl_tax_2' in tax:
                    qty = '2'
                if 'data-price_excl_tax_3' in tax:
                    qty = '3'
                #if ('data-price_excl_tax_4' and 'data-price_excl_tax_5') in tax:
                #    qty = "4"
                #if ('data-price_excl_tax_5') in tax:
                #    qty = "4-5"
                if 'data-price_excl_tax_5' in tax:
                    qty = "5"
                #if 'data-price_excl_tax_6' and 'data-price_excl_tax_12' in tax:
                #    qty = "6-11"
                #if 'data-price_excl_tax_6' in tax:
                #    qty = "6+"
                if 'data-price_excl_tax_12' in tax:
                    qty = "12+"
                print(product_name.replace("\n", "").replace("('", "").strip(), mervs[i].get_text(), qty, cart[tax], ratingValue, reviewCount)
                #header = ['merv', 'price', 'json_file']
                data = [product_name.replace("\n", "").replace("('", "").strip(), mervs[i].get_text(), qty, cart[tax], ratingValue, reviewCount]
                with open('products1.csv', 'a', newline='', encoding='UTF8') as csv_file:
                    writer = csv.writer(csv_file, delimiter=',')
                    #writer.writerow(header)
                    writer.writerow(data)
Posted on 2022-07-19 19:36:33
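On the loop in the question's code: inside `for tax in cart.attrs:`, `tax` is one attribute name, so every `... in tax` is a substring test on that name. The sketch below (plain Python, with a hypothetical reduced attribute dict) shows why the commented-out conditions cannot express "if _5 exists" and what a dict-based check looks like. In BeautifulSoup the equivalent check would be `'data-price_excl_tax_5' in cart.attrs` or `cart.has_attr('data-price_excl_tax_5')`.

```python
# Inside `for tax in cart.attrs:`, `tax` is a single attribute *name* (a str),
# so `"..." in tax` is a substring test on that one name.
tax = "data-price_excl_tax_12"

# The base-name check matches every tier, not just the single-unit price.
base_matches = "data-price_excl_tax" in tax

# `and` returns its last operand when all are truthy (non-empty strings are),
# so this is just the string "data-price_excl_tax_5".
combined = ("data-price_excl_tax_4" and "data-price_excl_tax_5")

# Precedence: `in` binds tighter than `and`, so this evaluates as
# "data-price_excl_tax_6" and ("data-price_excl_tax_12" in tax).
mixed = "data-price_excl_tax_6" and "data-price_excl_tax_12" in tax

print(base_matches, combined, mixed)

# Checking whether a *sibling* attribute exists needs the whole dict.
# Hypothetical form that has tiers 4 and 6 but no tier 5:
cart_attrs = {"data-price_excl_tax_4": "23.89", "data-price_excl_tax_6": "22.19"}
qty = "4" if "data-price_excl_tax_5" in cart_attrs else "4-5"
print(qty)
```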
Might the find() method be useful?
for i, cart in enumerate(BeautifulSoup(response.text, 'lxml').find_all('form', class_='cart')):
    for tax in cart.attrs:
        # str.find() returns -1 when the substring is absent (and 0 when the name starts with it)
        if tax.find('data-price') != -1:
            print('found')
https://stackoverflow.com/questions/73040599
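A note on the `find()` suggestion: `str.find()` returns the index of the first match, or -1 when there is none. An attribute name that starts with `data-price` therefore gives 0, which is falsy, so a bare `if tax.find('data-price'):` would skip exactly the wanted attributes and accept the others. Comparing against -1, or using `str.startswith()`, avoids that. The attribute names below are illustrative.

```python
names = ["data-price_excl_tax", "data-price_excl_tax_5", "class", "id"]

for name in names:
    pos = name.find("data-price")  # 0 for the price attributes, -1 otherwise
    # bool(0) is False and bool(-1) is True: the opposite of what is wanted.
    print(name, pos, bool(pos), name.startswith("data-price"))

# Either explicit form selects only the price attributes.
via_find = [n for n in names if n.find("data-price") != -1]
via_startswith = [n for n in names if n.startswith("data-price")]
print(via_find == via_startswith)
```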