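I have the following Python code running in a Jupyter Notebook. It downloads a tar file from a source location, extracts it, and uploads the contents to Azure Blob Storage.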
import os
import tarfile
from azure.storage.blob import BlobClient

def upload_folder(local_path):
    connection_string = "XXX"
    container_name = "mycontainername"
    with tarfile.open(local_path, "r") as file:
        # Process the archive one member at a time
        for each in file.getnames():
            print(each)
            # Extract the member into the current working directory
            file.extract(each)
            # A new BlobClient is created for every member
            blob = BlobClient.from_connection_string(connection_string,
                                                     container_name=container_name,
                                                     blob_name=each)
            # Re-open the extracted file, upload it, then delete the local copy
            with open(each, "rb") as f:
                blob.upload_blob(f, overwrite=True)
            os.remove(each)

# MAIN
!wget https://path/to/myarchive.tar.gz
local_path = "myarchive.tar.gz"
upload_folder(local_path)
!rm -rf myarchive.tar.gz
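!rm -rf myarchive

myarchive.tar.gz takes up 1.4 GB, which corresponds to about 4.4 GB of uncompressed data. The problem is that, even for such a small amount of data, this code takes far too long to run: about 5-6 hours.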
What am I doing wrong? Is there a way to optimize my code so that it runs faster?
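A minimal sketch of one possible restructuring, under assumptions the question does not confirm: that the archive holds many files, and that most of the time goes into per-file overhead rather than raw bandwidth. The loop in `upload_folder` extracts each member to disk, reopens it, deletes it, builds a fresh `BlobClient`, and waits for each upload to finish before starting the next, so every file pays its full request latency in series. The sketch instead reads members straight out of the archive with `tarfile.extractfile` (no disk round-trip), shares one `ContainerClient`, and overlaps the uploads in a `ThreadPoolExecutor`. The helper names `upload_tar_members` and `make_azure_uploader`, and the defaults `max_workers=8` and `max_in_flight=16`, are hypothetical choices for illustration. It also assumes a single `ContainerClient` can be shared across threads; if that does not hold for your SDK version, create one client per worker.

```python
import tarfile
import threading
from concurrent.futures import ThreadPoolExecutor


# Hypothetical helper, not part of the Azure SDK.
def upload_tar_members(tar_path, upload, max_workers=8, max_in_flight=16):
    """Upload every regular file in tar_path via upload(name, data).

    Members are decompressed one after another (a gzip stream can only be
    read in order), while the network-bound upload calls run in a thread
    pool. Nothing is written to local disk, and a semaphore caps how many
    member payloads are held in memory at once.
    Returns the number of files uploaded.
    """
    slots = threading.BoundedSemaphore(max_in_flight)
    futures = []

    def task(name, data):
        try:
            upload(name, data)
        finally:
            slots.release()  # free the slot even if the upload failed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # "r|*": forward-only stream mode with automatic compression detection.
        with tarfile.open(tar_path, "r|*") as tar:
            for member in tar:
                if not member.isfile():  # skip directories, links, etc.
                    continue
                slots.acquire()  # wait until fewer than max_in_flight payloads are pending
                data = tar.extractfile(member).read()
                futures.append(pool.submit(task, member.name, data))
    # The pool has shut down, so every task has finished; re-raise the first error, if any.
    for fut in futures:
        fut.result()
    return len(futures)


# Hypothetical helper, not part of the Azure SDK.
def make_azure_uploader(connection_string, container_name):
    """Return an upload(name, data) callable backed by one ContainerClient."""
    # Imported lazily so upload_tar_members can be tried without the Azure SDK.
    from azure.storage.blob import ContainerClient

    # Assumption: one ContainerClient is safely shared by all worker threads.
    container = ContainerClient.from_connection_string(
        connection_string, container_name=container_name
    )

    def upload(name, data):
        container.upload_blob(name=name, data=data, overwrite=True)

    return upload
```

In the notebook this would be called as `upload_tar_members("myarchive.tar.gz", make_azure_uploader("XXX", "mycontainername"))`. Reading each member fully into memory keeps the sketch simple; if individual members are very large, passing the `extractfile` object (and `length=member.size`) to `upload_blob` sequentially avoids holding whole files in RAM, at the cost of the concurrency.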
https://stackoverflow.com/questions/65630687