文章/答案/技术大牛

发布

问HackerRank“拯救人类”
EN

Code Review用户

提问于 2016-06-18 18:59:24

回答 1查看 1.5K关注 0票数 3

Hackerrank上的"拯救人类“问题要求我们：

..。在病人DNA中找出与病毒DNA完全匹配的所有子子，或者最多有一个不匹配。例如："aa"和"aa"匹配，"ab"和"aa"匹配，"ab"和"ba"不匹配。输入格式第一行包含测试用例的数量，T.T测试用例如下。每个病例包含两个字符串P(病人DNA)和V(病毒DNA)由空间分离。输出格式输出T行，每个测试用例对应一行。对于每个测试用例，输出一个空格分隔的列表，其中包括根据上述条件与V匹配的P的起始索引(0索引)。指数必须是一个不断增加的顺序。如果没有匹配的输出，则不匹配！

简而言之，它要求我比较两个字符串，并为第二个子字符串在第一个字符串中出现的所有事件找到开头的索引值。问题是允许在子字符串中最多不匹配一次。

我编写了以下解决方案，它似乎对前3种测试用例很有效。问题是它在第三个测试用例之后超时。我似乎无法比我已经做的更多地优化它，除非我的逻辑在某种程度上过于复杂，或者我没有使用足够的内置方法来加快速度。

inp = inp.split('\n')
del inp[0]
for h in inp:
    P,sep,V = str(h).partition(' ')
    upperlimit = len(P) - len(V) + 1
    if P == V:
            print("0")
            continue
    else:
        matches=0
        for i in range(0,upperlimit):
            Psub = P[i:i+len(V)]
            if Psub == V:  # Check if they match exactly
                print(str(i) + " ", end='')
                matches += 1
                continue   # Just proceed with the next iteration of loop if they match
            mismatches = 0
            for j in range(0,len(V)): # If we hit this code then, we're about to see how many characters mismatch in the substring
                if Psub[j] != V[j]:
                    mismatches += 1
                if mismatches>=2: # No point checking the substrings alphabet by alphabet any further, more than 1 mismatch present here.
                    break
            if mismatches<2:
                print(str(i)+" ",end='')
                matches += 1
        if matches==0:      # This is possible ONLY if none of the above conditions were ever met.
            print("No Match!",end='')
    print()

有人能告诉我为什么代码速度慢吗？另外，如果您想让我对我的代码进行更多的注释，请让我知道，这样更容易理解。问题描述的链接对于理解问题应该是非常有用的。

programming-challenge

time-limit-exceeded

python

algorithm

回答 1

Code Review用户

回答已采纳

发布于 2016-06-19 11:54:12

各种挑剔的

range(0, x)最好写成range(x)
print(str(i) + " ", end='')等同于print(i, end=' ')，但速度较慢
if matches == 0读起来比if not matches更好
您可以多次计算len(V)，只需将其值存储在变量中即可。
考虑到您不使用sep，而且h应该已经是一个字符串，那么P,sep,V = str(h).partition(' ')最好写成P, V = h.split()

特例不足以打破规则

检查患者子字符串与病毒之间的相等只是计算它们之间差异的子案例(并发现它们之间的差异为0)。比较整个病人的字符串和病毒也是一样的。所有这些只是以下几个子案例：

    for i in range(0,upperlimit):
        Psub = P[i:i+len(V)]
        for j in range(0,len(V)): # If we hit this code then, we're about to see how many characters mismatch in the substring
            if Psub[j] != V[j]:
                mismatches += 1
            if mismatches>=2: # No point checking the substrings alphabet by alphabet any further, more than 1 mismatch present here.
                break

此外，在检查子串时，会出现比精确匹配更多的不匹配或模糊匹配。所以确切的比较就是让你慢下来。

此外，您可以滥用布尔值是整数的事实，并使用sum内置程序来更快地计算差异数：

    for i in range(upperlimit):
        Psub = P[i:i+len(V)]
        mismatches = sum(Psub[j] != V[j] for j in range(len(V)))

再一次，可以通过使用zip来优化这一点，而不是使用字符串中的索引检索字母：

    for i in range(upperlimit):
        Psub = P[i:i+len(V)]
        mismatches = sum(p != v for p, v in zip(Psub, V))

使用zip的优点是，当到达最短字符串的末尾时，迭代将停止。我们可以使用它简化写到：

    for i in range(upperlimit):
        mismatches = sum(p != v for p, v in zip(P[i:], V))

关注点分离

就目前情况而言，您的代码同时执行三项任务：

迭代输入数据
计算“匹配”子字符串的索引
打印这样的索引

您应该使用函数来分离关注点，并使重用/测试变得更容易。基本布局可能如下所示：

def analyze(patient, virus):
    # build indexes where there is a match

def main():
    # iterate over lines in input
    for line in input_string:
        patient, virus = line.split()
        matches = analyze(patient, virus)
        # print output based on `matches`

它使人们更清楚地了解正在发生的事情。然后，您只需弄清楚如何从analyze返回有意义的值。第一个想法应该是建立一个列表并返回它：

def analyze(patient, virus):
    matches = []
    for i in range(len(patient) - len(virus) + 1):
        if sum(p != v for p, v in zip(patient[i:], virus)) < 2:
            matches.append(i)
    return matches

def main():
    # iterate over lines in input
    for line in input_string:
        patient, virus = line.split()
        matches = analyze(patient, virus)
        if not matches:
            print('No match!')
        else:
            print(' '.join(map(str, matches)))

或者，您可以将analyze转换为生成器，并在调用它时将其计算转换为列表。它允许您避免显式调用append：

提议的改进

def analyze(patient, virus):
    for begin in range(len(patient) - len(virus) + 1):
        if sum(p != v for p, v in zip(patient[begin:], virus)) < 2:
            yield begin


if __name__ == '__main__':
    T = int(input())

    for _ in range(T):
        patient, virus = input().split()
        matches = list(analyze(patient, virus))
        if not matches:
            print('No Match!')
        else:
            print(' '.join(map(str, matches)))

使用生成器的优点是，您甚至不必将其转换为列表，您最初的prints也是这样做的：

    for _ in range(T):
        patient, virus = input().split()
        matches = 0
        for index in analyze(patient, virus):
            print(index, end=' ')
            matches += 1
        if not matches:
            print('No match!')
        else:
            print()

或者你可以动态转换火柴：

    for _ in range(T):
        patient, virus = input().split()
        matches = ' '.join(map(str, analyze(patient, virus)))
        if not matches:
            matches = 'No match!'
        print(matches)

票数 1

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/132395

复制

相似问题

问HackerRank“拯救人类”
EN

回答 1

Code Review用户

各种挑剔的

特例不足以打破规则

关注点分离

提议的改进

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问HackerRank“拯救人类”EN

回答 1

Code Review用户

各种挑剔的

特例不足以打破规则

关注点分离

提议的改进

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问HackerRank“拯救人类”
EN