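I want to sort the links as shown in the desired output, but I don't know how to change my code.
So how can I change the code to get the desired output?
Thanks in advance for your help.

Code: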

    def my_sort(line):
        social_folders = {'engine': 1,
                          'wormix_mm': 2,
                          'wormix_ok': 3}
        line_fields = line.strip().split("/")
        social = line_fields[3]
        print(line_fields[3])
        return social_folders[social]

    numbers = 'First', 'Second', 'Third', 'Fourth'
    with open('./testsort.txt') as testsortf, \
         open('./test_out999.txt', "w") as test_out:
        contents = testsortf.readlines()
        contents[-1] = f'{contents[-1]}\n'
        contents.sort(key=my_sort)
        for i, line in enumerate(contents):
            test_out.write(f'{numbers[i]}:\n{line}')
            if i+1 < len(contents):
                test_out.write('\n')

Input from my .txt file:
https://markus.rmart.ru/engine/preloader/somefold
https://markus.rmart.ru/wormix_ok/preloader/somefold3
https://markus.rmart.ru/engine/preloader/somefold3
https://markus.rmart.ru/engine/preloader/somefold1
https://markus.rmart.ru/wormix_ok/preloader/somefold4
https://markus.rmart.ru/wormix_mm/preloader/somefold2
https://markus.rmart.ru/wormix_mm/preloader/somefold1
https://markus.rmart.ru/engine/preloader/somefold2
https://markus.rmart.ru/engine/preloader/somefold5
https://markus.rmart.ru/wormix_mm/preloader/somefold5
https://markus.rmart.ru/wormix_ok/preloader/somefold1

That is, the input is not sorted in any way.

Desired output:
First:
https://markus.rmart.ru/engine/preloader/somefold
https://markus.rmart.ru/engine/preloader/somefold3
https://markus.rmart.ru/engine/preloader/somefold1
https://markus.rmart.ru/engine/preloader/somefold2
https://markus.rmart.ru/engine/preloader/somefold5
Second:
https://markus.rmart.ru/wormix_mm/preloader/somefold2
https://markus.rmart.ru/wormix_mm/preloader/somefold1
https://markus.rmart.ru/wormix_mm/preloader/somefold5
Third:
https://markus.rmart.ru/wormix_ok/preloader/somefold1
https://markus.rmart.ru/wormix_ok/preloader/somefold4
https://markus.rmart.ru/wormix_ok/preloader/somefold3

Current output:
First:
https://markus.rmart.ru/engine/preloader/somefold
Second:
https://markus.rmart.ru/engine/preloader/somefold1
Third:
https://markus.rmart.ru/engine/preloader/somefold3
Fourth:
https://markus.rmart.ru/engine/preloader/somefoldtest

Posted on 2021-04-25 05:48:36

Try this code (I commented what I did in the code).

    def my_sort(conts):
        social_folders = {'engine': 1, 'wormix_mm': 2, 'wormix_ok': 3}
        line_fields = conts.strip().split("/")
        social = line_fields[3]
        return social_folders[social]

    # I couldn't tell what distinguishes the first and second sections,
    # so I merged them. Adjust this yourself if needed.
    numbers = 'First', 'Second', 'Third'  # , 'Fourth'
    folds = ['engine', 'wormix_mm', 'wormix_ok']

    with open('./testsort.txt') as testsortf, open('./test_out999.txt', "w") as test_out:
        contents = testsortf.readlines()
        contents[-1] = f'{contents[-1]}\n'
        contents.sort(key=my_sort)
        # Two nested loops are needed: one over categories, one over lines
        for k, fold in enumerate(folds):
            # Write a blank line before every category except the first
            if k != 0:
                test_out.write('\n')
            # Write the label of the category
            test_out.write(f'{numbers[k]}:\n')
            for line in contents:
                # Write only the lines that belong to the current category
                if line.strip().split("/")[3] == fold:
                    test_out.write(line)

Posted on 2021-04-25 06:17:27
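itertools.groupby() from the standard library will cluster the list of links the way you want, but it needs a little setup. Specifically, besides a sorted iterable of links, it needs to know which part of the link string to group by. For that, you need something like a regular expression to isolate the key part of the link.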
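To illustrate why the iterable has to be sorted first: groupby() only merges adjacent items that share a key. The following is a minimal sketch, assuming the category is always the first path segment after the domain; the `category` helper is a hypothetical name and uses `str.split` instead of a regular expression.

```python
from itertools import groupby


def category(url):
    # Hypothetical helper: the path segment right after the domain,
    # e.g. "engine" in "https://markus.rmart.ru/engine/preloader/somefold".
    return url.split("/")[3]


links = [
    "https://markus.rmart.ru/engine/preloader/somefold",
    "https://markus.rmart.ru/wormix_ok/preloader/somefold3",
    "https://markus.rmart.ru/engine/preloader/somefold3",
]

# Unsorted input: the two "engine" links are not adjacent,
# so groupby() reports "engine" as two separate groups.
unsorted_keys = [key for key, _ in groupby(links, key=category)]

# Sorted by the same key: every category shows up exactly once.
sorted_keys = [key for key, _ in groupby(sorted(links, key=category), key=category)]

print(unsorted_keys)  # ['engine', 'wormix_ok', 'engine']
print(sorted_keys)    # ['engine', 'wormix_ok']
```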

Example:

    import re
    from itertools import groupby

    # sorted() orders the links alphabetically, which also places
    # links of the same category next to each other
    sorted_links = sorted([
        "https://markus.rmart.ru/engine/preloader/somefold",
        "https://markus.rmart.ru/wormix_ok/preloader/somefold3",
        "https://markus.rmart.ru/engine/preloader/somefold3",
        "https://markus.rmart.ru/engine/preloader/somefold1",
        "https://markus.rmart.ru/wormix_ok/preloader/somefold4",
        "https://markus.rmart.ru/wormix_mm/preloader/somefold2",
        "https://markus.rmart.ru/wormix_mm/preloader/somefold1",
        "https://markus.rmart.ru/engine/preloader/somefold2",
        "https://markus.rmart.ru/engine/preloader/somefold5",
        "https://markus.rmart.ru/wormix_mm/preloader/somefold5",
        "https://markus.rmart.ru/wormix_ok/preloader/somefold1",
    ])

    # Finds the category part of the path that follows the domain name (e.g., "engine")
    category = re.compile(r"https.*\.[a-z]{2,3}\/([^\/]*)")

    for _, group in groupby(sorted_links, lambda url: category.search(url).group(1)):
        for url in group:
            print(url)
        print()

Output:
https://markus.rmart.ru/engine/preloader/somefold
https://markus.rmart.ru/engine/preloader/somefold1
https://markus.rmart.ru/engine/preloader/somefold2
https://markus.rmart.ru/engine/preloader/somefold3
https://markus.rmart.ru/engine/preloader/somefold5
https://markus.rmart.ru/wormix_mm/preloader/somefold1
https://markus.rmart.ru/wormix_mm/preloader/somefold2
https://markus.rmart.ru/wormix_mm/preloader/somefold5
https://markus.rmart.ru/wormix_ok/preloader/somefold1
https://markus.rmart.ru/wormix_ok/preloader/somefold3
https://markus.rmart.ru/wormix_ok/preloader/somefold4

https://stackoverflow.com/questions/67246902
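The groupby() example above sorts the links alphabetically and prints no labels, while the question asks for "First:", "Second:", "Third:" headings and does not appear to sort links within a group. A possible variation, sketched under those assumptions: sort by an explicit rank (Python's sort is stable, so links keep their input order inside each group) and take the label from the rank. The names `ORDER`, `LABELS`, `rank`, and `format_groups` are made up for this illustration, and the blank line between groups follows the first answer's formatting.

```python
from itertools import groupby

# Explicit category order, taken from the social_folders dict in the question.
ORDER = {"engine": 1, "wormix_mm": 2, "wormix_ok": 3}
LABELS = ("First", "Second", "Third")


def rank(url):
    # The category is the path segment right after the domain.
    return ORDER[url.strip().split("/")[3]]


def format_groups(lines):
    # Drop trailing newlines and empty lines, then sort by category rank only.
    urls = [line.strip() for line in lines if line.strip()]
    urls.sort(key=rank)  # stable: input order is kept within a category
    blocks = []
    for r, group in groupby(urls, key=rank):
        # The label comes from the rank, so a missing category does not shift labels.
        blocks.append(f"{LABELS[r - 1]}:\n" + "\n".join(group))
    return "\n\n".join(blocks) + "\n"


sample = [
    "https://markus.rmart.ru/wormix_ok/preloader/somefold3\n",
    "https://markus.rmart.ru/engine/preloader/somefold3\n",
    "https://markus.rmart.ru/wormix_mm/preloader/somefold2\n",
    "https://markus.rmart.ru/engine/preloader/somefold1",
]
result = format_groups(sample)
print(result)
```

To use it with the files from the question, the lines read from './testsort.txt' could be passed to `format_groups` and the returned string written to './test_out999.txt'.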