Q: Python: isolating re.search results

Stack Overflow user

Asked 2015-06-23 16:33:01

1 answer, 126 views, 0 followers, 4 votes

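So I have this code (probably extremely inefficient, but that's another story) that extracts URLs from a blog's HTML. I have the HTML in a .csv file, which I read into Python and then run a regex over to get the URLs. Here is the code: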

import csv, re # required imports

infile = open('Book1.csv', 'rt')  # open the csv file
reader = csv.reader(infile)  # read the csv file


strings = [] # initialize a list to read the rows into

for row in reader: # loop over all the rows in the csv file 
    strings += row  # put them into the list

link_list = []  # initialize list that all the links will be put in
for i in strings:  #  loop over the list to access each string for regex (can't regex on lists)

    links = re.search(r'((https?|ftp)://|www\.)[^\s/$.?#].[^\s]*', i) # regex to find the links
    if links != None: # if it finds a link..
        link_list.append(links) # put it into the list!

for link in link_list: # iterate the links over a loop so we can have them in a nice column format
    print(link)

However, when I print the results, they come out in this form:

<_sre.SRE_Match object; span=(49, 80), match='http://buy.tableausoftware.com"'>
<_sre.SRE_Match object; span=(29, 115), match='https://c.velaro.com/visitor/requestchat.aspx?sit>
<_sre.SRE_Match object; span=(34, 117), match='https://www.tableau.com/about/blog/2015/6/become->
<_sre.SRE_Match object; span=(32, 115), match='https://www.tableau.com/about/blog/2015/6/become->
<_sre.SRE_Match object; span=(76, 166), match='https://www.tableau.com/about/blog/2015/6/become->
<_sre.SRE_Match object; span=(9, 34), match='http://twitter.com/share"'>

Is there any way to get just the link out of all the other nonsense that comes with it? Also, is this just part of how regex search works? Thanks!

python

regex

csv

1 Answer

Stack Overflow user

Accepted answer

Answered 2015-06-23 16:36:32

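The problem here is that re.search returns a match object, not the matched string. You need to call the match object's group() method to get the result you want.

If you want all the captured groups, you can use the groups() method; for one specific group, pass the number of that group to group().

In this case it looks like you want the whole match, so you can use group(0):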

link_list = []  # initialize list that all the links will be put in
for i in strings:  # loop over the list to access each string for regex (can't regex on lists)

    links = re.search(r'((https?|ftp)://|www\.)[^\s/$.?#].[^\s]*', i)  # regex to find the links
    if links is not None:  # if it finds a link..
        link_list.append(links.group(0))  # store the matched text, not the match object
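To try the fix without the CSV file, here is a minimal sketch that runs the same pattern over a few inline strings. The sample strings and the names URL_PATTERN, first, whole, prefix and captured are made up for illustration; only the regex itself comes from the question.

```python
import re

# The pattern from the question, unchanged.
URL_PATTERN = re.compile(r'((https?|ftp)://|www\.)[^\s/$.?#].[^\s]*')

# Hypothetical stand-ins for the strings read from the CSV file.
strings = [
    'see http://twitter.com/share for details',
    'no link in this cell',
    'visit www.example.com today',
]

link_list = []
for text in strings:
    match = URL_PATTERN.search(text)
    if match is not None:
        # group(0) is the whole matched text as a plain str.
        link_list.append(match.group(0))

for link in link_list:
    print(link)

# The numbered groups correspond to the parentheses in the pattern.
first = URL_PATTERN.search(strings[0])
whole = first.group(0)     # the entire match
prefix = first.group(1)    # outer group: scheme plus '://', or 'www.'
captured = first.groups()  # all numbered groups as a tuple
```

Note that the pattern has two capturing groups, so groups() returns a two-item tuple; for a match that starts with www. the inner scheme group does not participate and comes back as None.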

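group([group1, ...])
Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.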
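The cases in that description can be checked with a short sketch. The pattern and input strings below are hypothetical and chosen only so that each case shows up; they are not from the question.

```python
import re

# Hypothetical pattern: scheme, host, and an optional port.
m = re.search(r'(https?)://([\w.]+)(:\d+)?', 'go to http://example.com/path')

no_args = m.group()     # no argument: defaults to 0, the whole match
whole = m.group(0)      # explicit 0: also the whole match
scheme = m.group(1)     # one argument: a single string
pair = m.group(1, 2)    # several arguments: a tuple, one item per argument
port = m.group(3)       # the optional group did not take part in the match

# A group number larger than the number of groups raises IndexError.
try:
    m.group(4)
    raised = False
except IndexError:
    raised = True

# A group that matched several times keeps only its last match.
last_digit = re.search(r'(\d)+', 'abc 123').group(1)
```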

2 votes

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/31008459
