
Scraping a school's top 247 college football recruits

Stack Overflow user
Asked 2021-05-28 23:45:16
1 answer, 43 views, 0 followers, 0 votes

I am trying to scrape the table from the following web page in Google Colab: https://247sports.com/college/penn-state/Sport/Football/AllTimeRecruits/

Here is the Python script I am trying to use:

import pandas as pd
import requests
from bs4 import BeautifulSoup

Team = 'penn-state'

url = "https://247sports.com/college/" + str(Team) + "/Sport/Football/AllTimeRecruits/"

# Add the `user-agent` otherwise we will get blocked when sending the request
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"}

response = requests.get(url, headers = headers).content
soup = BeautifulSoup(response, "html.parser")
data = []

for tag in soup.find_all("li", class_="ri-page__list-item"):  # one <li> per recruit
    rank = tag.find_next("span", class_="all-time-rank").text
    school = tag.find_next("span", class_="meta").text
    year = tag.find_next("span", class_="meta").text
    name = tag.find_next("a", class_="ri-page__name-link").text
    position = tag.find_next("div", class_="position").text
    height_weight = tag.find_next("div", class_="metrics").text
    rating = tag.find_next("span", class_="score").text
    nat_rank = tag.find_next("a", class_="natrank").text
    state_rank = tag.find_next("a", class_="sttrank").text
    pos_rank = tag.find_next("a", class_="posrank").text
#    status = tag.find_next("p", class_="commit-date withDate").text

    data.append(
        {
            "Rank": rank,
            "Name": name,
            "School": school,
            "Class of": year,
            "Position": position,
            "Height & Weight": height_weight,
            "Rating": rating,
            "National Rank": nat_rank,
            "State Rank": state_rank,
            "Position Rank": pos_rank,
#            "Date": status,
        }
    )

df = pd.DataFrame(data)

df

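I want a column that says which recruiting class each player belongs to. For example, if a player is from the "Class of 2005", I want "2005" as the value in the "year" column.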

     Rank  Name              School                                       Class of                                     Position  Height & Weight  Rating  National Rank  State Rank  Position Rank
0    1     Derrick Williams  Eleanor Roosevelt (Greenbelt, MD)            Eleanor Roosevelt (Greenbelt, MD)            WR        6-0 / 190        0.9986  4              1           2
1    2     Micah Parsons     Harrisburg (Harrisburg, PA)                  Harrisburg (Harrisburg, PA)                  WDE       6-3 / 235        0.9982  5              1           2
2    3     Justin Shorter    South Brunswick (Monmouth Junction, NJ) ...  South Brunswick (Monmouth Junction, NJ) ...  WR        6-4 / 213        0.9962  8              1           1
3    4     Dan Connor        Strath Haven (Wallingford, PA)               Strath Haven (Wallingford, PA)               ILB       6-3 / 215        0.9944  13             1           2
4    5     Justin King       Gateway (Monroeville, PA)                    Gateway (Monroeville, PA)                    CB        6-0 / 185        0.9942  15             1           2
...  ...   ...               ...                                          ...                                          ...       ...              ...     ...            ...         ...
242  243   Will Levis        Xavier (Middletown, CT)                      Xavier (Middletown, CT)                      PRO       6-4 / 222        0.8689  652            2           28
243  244   Troy Reeder       Salesianum (Wilmington, DE)                  Salesianum (Wilmington, DE)                  ILB       6-2 / 230        0.8687  500            2           22
244  245   Jake Cooper       Archbishop Wood (Warminster, PA)             Archbishop Wood (Warminster, PA)             ILB       6-1 / 220        0.8686  520            11          17
245  246   Jon Ditto         Gateway (Monroeville, PA)                    Gateway (Monroeville, PA)                    WR        6-3 / 221        0.8684  417            16          52
246  247   Shareef Miller    George Washington (Philadelphia, PA)         George Washington (Philadelphia, PA)         SDE       6-5 / 230        0.8681  525            12          27
247 rows × 10 columns

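However, the "Class of" column just duplicates the "School" column. Looking at the page's HTML, both the high school and the year sit in "span" elements with the same class, so both lookups return the school. Is there a way to pull out the high school and the year separately, given how the HTML is set up?

Any help getting this to work would be greatly appreciated.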


1 Answer

Stack Overflow user

Accepted answer

Answered 2021-05-29 00:13:08

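Each list item has two spans with the meta class: the first holds the school and the second holds the year (always in that order). So you can use find_all to get both, then take school from the first and year from the second: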

for tag in soup.find_all("li", class_="ri-page__list-item"):
    meta = tag.find_all("span", class_="meta")
    school = meta[0].text
    year = meta[1].text.replace('Class of ', '')

    # extract other fields...
    # data.append(...)
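
To try the idea without hitting the site, here is a minimal sketch that runs against a small hand-written HTML sample. The sample markup, the player names and schools in it, and the `parse_recruits` helper are all made up for illustration; only the class names come from the question, and the real page may be structured differently. It also shows why the original code produced duplicates: `find_next` returns only the first match, so calling it twice with the same arguments gives the same span both times.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not copied from the real page).
SAMPLE_HTML = """
<ul>
  <li class="ri-page__list-item">
    <a class="ri-page__name-link">Player A</a>
    <span class="meta">Example High (Sometown, PA)</span>
    <span class="meta">Class of 2005</span>
  </li>
  <li class="ri-page__list-item">
    <a class="ri-page__name-link">Player B</a>
    <span class="meta">Sample Prep (Othertown, NJ)</span>
    <span class="meta">Class of 2018</span>
  </li>
</ul>
"""


def parse_recruits(html):
    """Return one dict per list item, with school and year split apart."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tag in soup.find_all("li", class_="ri-page__list-item"):
        # Both spans share the "meta" class, so collect them all
        # and pick by position: [0] is the school, [1] is the year.
        meta = tag.find_all("span", class_="meta")
        if len(meta) < 2:
            continue  # skip items that lack either span
        name = tag.find("a", class_="ri-page__name-link")
        rows.append(
            {
                "Name": name.get_text(strip=True),
                "School": meta[0].get_text(strip=True),
                "Class of": meta[1].get_text(strip=True).replace("Class of ", ""),
            }
        )
    return rows


rows = parse_recruits(SAMPLE_HTML)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` just like `data` in the question. The length check is a defensive choice for items that might be missing a span; drop it if you would rather get an error in that case.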
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/67741919
