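I am trying to use Python regular expressions to analyze an earnings call. I want to remove the unnecessary lines that contain only the name and position of the next speaker.
Here is an excerpt of the text I want to analyze: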
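"Questions and Answers\nOperator [1]\n Shannon Siemsen Cross, Cross Research LLC - Co-Founder, Principal & Analyst [2]\nI hope everyone is well. Tim, you talked about seeing some improvement in the second half of April. So I was wondering if you could just talk maybe a bit more on the segment and geographic basis what you're seeing in the various regions that you're selling in and what you're hearing from your customers. And then I have a follow-up.\nTimothy D. Cook, Apple Inc. - CEO & Director [3]"
Every line I want to remove ends with a number in square brackets.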
So I used the following line of code to get these lines:
name_lines = re.findall('.*[\d]]', text)
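This works and gives me the following list: ['Operator [1]', 'Shannon Siemsen Cross, Cross Research LLC - Co-Founder, Principal & Analyst [2]', 'Timothy D. Cook, Apple Inc. - CEO & Director [3]']
So, in the next step, I want to replace these strings in the text using the following lines of code: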
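for i in range(0,len(name_lines)):
    text = re.sub(name_lines[i], '', text)
But this does not work. If I try to replace just one string instead of using the loop, it does not work either, and I don't know why.
Also, if I now use re.findall to search for the lines I got from the first line of code, I get no matches.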
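Answered 2022-11-29 12:14:05
The first argument of re.sub is treated as a regular expression, so the square brackets take on a special meaning (they define a character class) instead of being matched literally.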
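To see why a pattern built from a matched line fails, here is a minimal sketch on a shortened sample text (the names `line`, `text`, `escaped`, and `cleaned` are illustrative, not taken from the question). It also shows `re.escape`, a standard-library function that escapes regex metacharacters, as one possible way to keep using `re.sub`.

```python
import re

line = "Operator [1]"
text = "Questions and Answers\nOperator [1]\nI hope everyone is well."

# Unescaped, "[1]" is a character class that matches the single character "1",
# so the pattern effectively looks for "Operator 1" and finds nothing.
unescaped_match = re.search(line, text)
print(unescaped_match)

# re.escape makes every metacharacter literal, so the brackets match themselves.
escaped = re.escape(line)
cleaned = re.sub(escaped, "", text)
print(repr(cleaned))
```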
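However, you do not need a regular expression for the replacement at all (nor do you need the loop counter i):
for name_line in name_lines:
    text = text.replace(name_line, '')
Answered 2022-11-29 12:37:28
Try using re.sub to replace the matches: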
import re
text = """\
Questions and Answers
Operator [1]
Shannon Siemsen Cross, Cross Research LLC - Co-Founder, Principal & Analyst [2]
I hope everyone is well. Tim, you talked about seeing some improvement in the second half of April. So I was wondering if you could just talk maybe a bit more on the segment and geographic basis what you're seeing in the various regions that you're selling in and what you're hearing from your customers. And then I have a follow-up.
Timothy D. Cook, Apple Inc. - CEO & Director [3]"""
text = re.sub(r".*\d]", "", text)
print(text)
Prints:
Questions and Answers
I hope everyone is well. Tim, you talked about seeing some improvement in the second half of April. So I was wondering if you could just talk maybe a bit more on the segment and geographic basis what you're seeing in the various regions that you're selling in and what you're hearing from your customers. And then I have a follow-up.
https://stackoverflow.com/questions/74613853
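Note that `.` does not match a newline, so a pattern like `.*\d]` removes the contents of each speaker line but leaves its line break behind as an empty line. If that matters, a variant along the following lines may help. It is only a sketch on a shortened sample text, and the names `speaker_line` and `cleaned` are made up for illustration.

```python
import re

text = (
    "Questions and Answers\n"
    "Operator [1]\n"
    "Shannon Siemsen Cross, Cross Research LLC - Co-Founder, Principal & Analyst [2]\n"
    "I hope everyone is well.\n"
    "Timothy D. Cook, Apple Inc. - CEO & Director [3]"
)

# With re.MULTILINE, ^ and $ anchor at line boundaries. \[\d+\] matches a
# literal bracketed number, and \n? also consumes the line break if there is one.
speaker_line = re.compile(r"^.*\[\d+\]$\n?", flags=re.MULTILINE)

cleaned = speaker_line.sub("", text)
print(cleaned)
```

Anchoring on `\[\d+\]$` rather than on any digit followed by `]` is a design choice that makes it less likely that an ordinary sentence containing a bracket gets removed by accident.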