Question: Python regular expression to find and replace in a string

Stack Overflow user

Asked 2013-04-09 14:30:15


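I am trying to load large text data into a NumPy array. NumPy's loadtxt and genfromtxt did not work, for two reasons:

First, I need to remove comment lines that begin with one of the delimiters ['#','!','C'].
Second, the data contains a repeating pattern of the form n*value, where n is an integer repeat count and value is a float.

So I tried reading the text file with readlines() and then using NumPy's loadtxt to convert the data into a NumPy array.

For the reading and replacing I tried regular expressions (the re module) but could not get them to work. The Python code below does work, however. My question is: what is the most efficient and Pythonic way to do this?

If it is a regex, what is the correct regular expression for finding and replacing within the list object returned by readlines()?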

lines = ['1 2 3*2.5 3 6 1*.3 8 \n', '! comment here\n', '1*1 2.0 2*2.1 3 6 0 8 \n']
# Drop blank and comment lines by building a new list;
# deleting from a list while enumerating it can skip elements.
lines = [line for line in lines
         if line.strip() != '' and line.strip()[0] not in ['#','!','C']]
for l, line in enumerate(lines):
    # Collect the tokens written as n*value.
    repls = [word for word in line.strip().split() if '*' in word]
    for repl in repls:
        n, value = repl.split('*')
        # Expand n*value into n space-separated copies of value.
        line = line.replace(repl, ' '.join([value] * int(n)))
    lines[l] = line
print(lines)

The output is:

['1 2 2.5 2.5 2.5 3 6 .3 8 \n', '1 2.0 2.1 2.1 3 6 0 8 \n']

Edit:

In response to the comments, I edited the Python code as follows:

    in_lines = ['1 2 3*2.5 3 6 1*.3 8 \n', '! comment here\n', '1*1 2.0 2*2.1 3 6 0 8 \n']
    lines = []
    for line in in_lines:
        # Skip blank lines and comment lines.
        if line.strip() == '' or line.strip()[0] in ['#','!','C']:
            continue
        repls = [word for word in line.strip().split() if '*' in word]
        for repl in repls:
            n, value = repl.split('*')
            # str.join only accepts strings, so convert each float back with str().
            line = line.replace(repl, ' '.join([str(float(value))] * int(n)))
        lines.append(line)
    print(lines)

Tags: regex, numpy, python

1 Answer

Stack Overflow user

Answered 2013-04-09 14:43:06

The Pythonic way

Using Python's functional features and list comprehensions:

#!/usr/bin/env python

lines = ['1 2 3*2.5 3 6 1*.3 8 \n', '! comment here\n', '1*1 2.0 2*2.1 3 6 0 8 \n']

#filter out comments
lines = [line for line in lines if  line.strip() != '' and line.strip()[0] not in ['#','!','C']]

#turns lines into lists of tokens
lines = [line.split() for line in lines]

# turns a list of strings into a number generator, parsing '*' properly
def generate_numbers(tokens):
  for token in tokens:
    if '*' in token:
      n,m = token.split("*")
      for i in range(int(n)):
        yield float(m)
    else:
      yield float(token)

# use the generator to clean up the lines
lines = [list(generate_numbers(tokens)) for tokens in lines]

print(lines)

Output:

➤ ./try.py 
[[1.0, 2.0, 2.5, 2.5, 2.5, 3.0, 6.0, 0.3, 8.0], [1.0, 2.0, 2.1, 2.1, 3.0, 6.0, 0.0, 8.0]]
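The question also asks what a regex-based find-and-replace might look like. The following is only a minimal sketch of one possibility, using re.sub with a replacement function. The names REPEAT, expand_repeats and is_data_line are made up for this illustration, and the pattern assumes that every repeat token is written as an integer count, an asterisk, and a value containing no whitespace.

```python
import re

# Matches a whole token of the form n*value, e.g. "3*2.5" or "1*.3".
# (?<!\S) requires the match to begin at the start of a token.
# Group 1 is the integer repeat count, group 2 is the value text.
REPEAT = re.compile(r'(?<!\S)(\d+)\*(\S+)')

def expand_repeats(line):
    """Replace each n*value token with n space-separated copies of value."""
    def repl(match):
        count = int(match.group(1))
        return ' '.join([match.group(2)] * count)
    return REPEAT.sub(repl, line)

def is_data_line(line):
    """True for lines that are neither blank nor comments (#, ! or C)."""
    stripped = line.strip()
    return stripped != '' and stripped[0] not in '#!C'

in_lines = ['1 2 3*2.5 3 6 1*.3 8 \n', '! comment here\n', '1*1 2.0 2*2.1 3 6 0 8 \n']
lines = [expand_repeats(line) for line in in_lines if is_data_line(line)]
print(lines)
# ['1 2 2.5 2.5 2.5 3 6 .3 8 \n', '1 2.0 2.1 2.1 3 6 0 8 \n']
```

Unlike str.replace, the pattern only touches complete tokens, so a token such as 1*1 cannot accidentally rewrite part of a longer token like 11*1.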

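The fast and small Pythonic way

This solution uses generators instead of lists, so the whole file never has to be loaded into memory. Note the use of two idioms:

with open("name") as file: this cleans up the file handle once you exit the block.
for line in file: this iterates over the lines of the file with a generator, without loading the whole file into memory.

This gives us: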

#!/usr/bin/env python

# turns a list of strings into a number generator, parsing '*' properly
def generate_numbers(tokens):
  for token in tokens:
    if '*' in token:
      n,m = token.split("*")
      for i in range(int(n)):
        yield float(m)
    else:
      yield float(token)

# Pull this out to make the code more readable
def not_comment(line):
  return line.strip() != '' and line.strip()[0] not in ['#','!','C']

with open("try.dat") as file:
  lines = ( 
    list(generate_numbers(line.split()))
    for line in file if not_comment(line)
  ) # lines is a lazy generator

  for line in lines:
    print(line)

Output:

➤ ./try.py 
[1.0, 2.0, 2.5, 2.5, 2.5, 3.0, 6.0, 0.3, 8.0]
[1.0, 2.0, 2.1, 2.1, 3.0, 6.0, 0.0, 8.0]
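To try the streaming approach without an existing try.dat, here is a rough, self-contained sketch that writes the sample data to a temporary file first. The helper read_rows and the temporary file are assumptions made for this illustration and are not part of the answer above.

```python
import os
import tempfile

def generate_numbers(tokens):
    """Yield floats from tokens, expanding n*value into n copies of value."""
    for token in tokens:
        if '*' in token:
            n, value = token.split('*')
            for _ in range(int(n)):
                yield float(value)
        else:
            yield float(token)

def not_comment(line):
    """True for lines that are neither blank nor comments (#, ! or C)."""
    stripped = line.strip()
    return stripped != '' and stripped[0] not in ['#', '!', 'C']

def read_rows(path):
    """Hypothetical helper: lazily yield one list of floats per data line."""
    with open(path) as file:
        for line in file:
            if not_comment(line):
                yield list(generate_numbers(line.split()))

# Write the sample data to a temporary file so the sketch runs on its own.
sample = '1 2 3*2.5 3 6 1*.3 8 \n! comment here\n\n1*1 2.0 2*2.1 3 6 0 8 \n'
with tempfile.NamedTemporaryFile('w', suffix='.dat', delete=False) as tmp:
    tmp.write(sample)

try:
    rows = list(read_rows(tmp.name))
finally:
    os.remove(tmp.name)

for row in rows:
    print(row)
```

Putting the with block inside a generator function keeps the file open for exactly as long as rows are being consumed. A generator expression created inside a with block would fail if it were iterated after the block had exited, because the file would already be closed.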

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/15904727
