I am (unfortunately) using the Hitachi Content Platform (HCP) S3 implementation, and I need to sync roughly 400 images to a bucket every 2 minutes. The file names are always the same, so the sync "updates" the original files with the latest images.
Initially I could not overwrite the existing files at all. Unlike other platforms, on HCP you cannot update a file that already exists while versioning is disabled: the request returns a 409 and the file is not stored. So I enabled versioning, which lets the files be overwritten.
The problem now is that HCP is configured to keep old versions for 0 days on my bucket (which, according to my S3 admin, should mean no old versions are kept at all), and "keep deleted versions" is also disabled, yet the bucket still fills up with objects (400 files every 2 minutes = ~288K per day). It seems to cap at that number: after the first day it stays at roughly 288K permanently (so old versions do appear to get dropped eventually, after 1 day).
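For reference, versioning itself reports as enabled on the bucket; this is just how I double-check it from the CLI (the 0-day retention and "keep deleted versions" settings were configured by my S3 admin on the HCP side and are not visible through this call):

```
# Sanity check: confirm versioning is enabled on the bucket
# (the HCP-side retention settings mentioned above do not show up here)
aws --endpoint-url $HCP_HOST s3api get-bucket-versioning --bucket $HCP_BUCKET
```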
Here is a sample script that reproduces the problem:
```
# Generate 400 files with the current date/time in them
for i in $(seq -w 1 400); do
  echo $(date +'%Y%m%d%H%M%S') > "file_${i}.txt"
done

# Sync the current directory to the bucket
aws --endpoint-url $HCP_HOST s3 sync . s3://$HCP_BUCKET/

# Run this a few times to simulate the 2 minute upload cycle
```
The initial syncs are very fast, under 5 seconds, but over the course of the day, as the bucket accumulates more versions, they get slower and slower, eventually sometimes taking more than 2 minutes (which is a problem, since I need to sync the files every 2 minutes).
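To quantify the slowdown I simply wrap a single sync cycle in `time`; nothing HCP-specific here, shown only for completeness:

```
# Rough timing of a single sync cycle, using the same variables as above
time aws --endpoint-url $HCP_HOST s3 sync . s3://$HCP_BUCKET/
```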
If I try to list the objects in the bucket after 1 day, only the 400 files come back, but the listing can take about a minute to return (which is why I had to add `--cli-read-timeout 0`):
```
# List all the files in the bucket
aws --endpoint-url $HCP_HOST s3 ls s3://$HCP_BUCKET/ --cli-read-timeout 0 --summarize

# Output
Total Objects: 400
Total Size: 400
```
I can also list and see all of the old versions I don't want:
```
# List object versions and parse output with jq
aws --endpoint-url $HCP_HOST s3api list-object-versions --bucket $HCP_BUCKET --cli-read-timeout 0 \
  | jq -c '.Versions[] | {"key": .Key, "version_id": .VersionId, "latest": .IsLatest}'
```
Output:
{"key":"file_001.txt","version_id":"107250810359745","latest":false} {"key":"file_001.txt","version_id":"107250814851905","latest":false} {"key":"file_001.txt","version_id":"107250827750849","latest":false} {"key":"file_001.txt","version_id":"107250828383425","latest":false} {"key":"file_001.txt","version_id":"107251210538305","latest":false} {"key":"file_001.txt","version_id":"107251210707777","latest":false} {"key":"file_001.txt","version_id":"107251210872641","latest":false} {"key":"file_001.txt","version_id":"107251212449985","latest":false} {"key":"file_001.txt","version_id":"107251212455681","latest":false} {"key":"file_001.txt","version_id":"107251212464001","latest":false} {"key":"file_001.txt","version_id":"107251212470209","latest":false} {"key":"file_001.txt","version_id":"107251212644161","latest":false} {"key":"file_001.txt","version_id":"107251212651329","latest":false} {"key":"file_001.txt","version_id":"107251217133185","latest":false} {"key":"file_001.txt","version_id":"107251217138817","latest":false} {"key":"file_001.txt","version_id":"107251217145217","latest":false} {"key":"file_001.txt","version_id":"107251217150913","latest":false} {"key":"file_001.txt","version_id":"107251217156609","latest":false} {"key":"file_001.txt","version_id":"107251217163649","latest":false} {"key":"file_001.txt","version_id":"107251217331201","latest":false} {"key":"file_001.txt","version_id":"107251217343617","latest":false} {"key":"file_001.txt","version_id":"107251217413505","latest":false} {"key":"file_001.txt","version_id":"107251217422913","latest":false} {"key":"file_001.txt","version_id":"107251217428289","latest":false} {"key":"file_001.txt","version_id":"107251217433537","latest":false} {"key":"file_001.txt","version_id":"107251344110849","latest":true} // ...
I thought I could run a job that periodically cleans up the old versions, but when I tried deleting an old version it failed with an error:
```
# Try deleting an old version for the file_001.txt key
aws --endpoint-url $HCP_HOST s3api delete-object --bucket $HCP_BUCKET --key "file_001.txt" --version-id 107250810359745

# Error
An error occurred (NotImplemented) when calling the DeleteObject operation: Only the current version of an object can be deleted.
```
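For completeness, this is the cleanup job I had in mind: it simply deletes every non-latest version. It is only a sketch, since on HCP each delete-object call fails with the error above:

```
# Intended cleanup job: delete every non-latest version of every key
# (fails on HCP with the NotImplemented error shown above)
aws --endpoint-url $HCP_HOST s3api list-object-versions --bucket $HCP_BUCKET --cli-read-timeout 0 \
  | jq -r '.Versions[] | select(.IsLatest == false) | "\(.Key) \(.VersionId)"' \
  | while read -r key version_id; do
      aws --endpoint-url $HCP_HOST s3api delete-object --bucket $HCP_BUCKET --key "$key" --version-id "$version_id"
    done
```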
I have tested this with both MinIO and AWS S3, and my use case works fine on both of those platforms.
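Roughly what I verified on AWS S3 and MinIO (bucket name and version id below are placeholders): deleting a specific old version succeeds there, and I can also suspend versioning entirely so that overwrites just replace the current object:

```
# On AWS S3 / MinIO both of these work (placeholders for bucket and version id):
# delete a specific non-current version...
aws s3api delete-object --bucket my-test-bucket --key "file_001.txt" --version-id <old-version-id>

# ...or suspend versioning so that syncs simply overwrite the current objects
aws s3api put-bucket-versioning --bucket my-test-bucket --versioning-configuration Status=Suspended
```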
Is there something I am doing incorrectly, or is there a setting in HCP that I am missing which would let me overwrite objects during the sync without keeping the previous versions? Alternatively, is there a way to delete the previous versions manually?