Writing Directly to Object Storage with OpenDAL

May 16, 2025 · 2 min read

Data AI

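The usual way to get a remote file into S3 is to download it to local disk first and then upload it with an S3 SDK. In that flow, the local disk easily becomes the bottleneck while files are being downloaded. I recommend OpenDAL instead: it can write directly to remote S3, so the data does not have to be staged on local disk.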

Basic Demo

Using Python as an example, here is a demo that downloads GitHub's hourly archive files for 2025-05-15 from gharchive and writes them straight to S3:

import requests
from opendal import Operator

# Initialize the OpenDAL S3 operator
s3_config = {
    "endpoint": "http://127.0.0.1:9900",      # change to your MinIO address
    "access_key_id": "minioadmin",
    "secret_access_key": "minioadmin",
    "bucket": "mystage",
    "region": "us-east-1",               # MinIO usually doesn't need a real region; any value works
    "root": "gh"
}

op = Operator("s3", **s3_config)

def download_and_upload(hour: int):
    url = f"https://data.gharchive.org/2025-05-15-{hour}.json.gz"
    print(f"Downloading {url} ...")
    resp = requests.get(url, timeout=60)  # avoid hanging forever on a stalled connection
    resp.raise_for_status()

    remote_path = f"2025-05-15-{hour}.json.gz"
    print(f"Uploading to {remote_path} ...")
    op.write(remote_path, resp.content)
    print(f"Uploaded {remote_path} successfully.")

def main():
    for h in range(24):
        try:
            download_and_upload(h)
        except Exception as e:
            print(f"Error processing hour {h}: {e}")

if __name__ == "__main__":
    main()

Configuration reference for uploading to Alibaba Cloud OSS

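oss_config = {
    "endpoint": "https://cn-beijing.oss.aliyuncs.com",
    "access_key_id": "x",        # replace with your AccessKey ID
    "secret_access_key": "x",    # replace with your AccessKey secret
    "bucket": "wubx-bj01",
    "region": "cn-beijing",
    "root": "gh",
    "enable_virtual_host_style": "true",  # OSS expects virtual-hosted-style (bucket.endpoint) addressing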
}
op = Operator("s3", **oss_config)
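Since MinIO and OSS are both driven through the same `s3` service, only the config dict changes between them. A minimal sketch of building that dict in one place, assuming a hypothetical `s3_config` helper that reads credentials from hypothetical environment variables `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY` instead of hard-coding them; the endpoints, buckets and regions are the ones used above.

```python
import os


def s3_config(endpoint: str, bucket: str, region: str, *,
              root: str = "gh", virtual_host: bool = False) -> dict:
    """Build an OpenDAL `s3` service config (hypothetical helper)."""
    cfg = {
        "endpoint": endpoint,
        "bucket": bucket,
        "region": region,
        "root": root,
        # Credentials come from the environment (hypothetical variable names).
        "access_key_id": os.environ.get("S3_ACCESS_KEY_ID", ""),
        "secret_access_key": os.environ.get("S3_SECRET_ACCESS_KEY", ""),
    }
    if virtual_host:
        # OpenDAL takes service options as strings, as in the OSS example.
        cfg["enable_virtual_host_style"] = "true"
    return cfg


minio_cfg = s3_config("http://127.0.0.1:9900", "mystage", "us-east-1")
oss_cfg = s3_config("https://cn-beijing.oss.aliyuncs.com", "wubx-bj01", "cn-beijing",
                    virtual_host=True)
```

Either dict is then passed the same way as before, e.g. `Operator("s3", **oss_cfg)`, and the download/upload code stays unchanged.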

References

Official documentation: https://opendal.apache.org/
Python documentation: https://opendal.apache.org/docs/python/
Core services: https://docs.rs/opendal/latest/opendal/services/index.html
S3 configuration: https://opendal.apache.org/docs/rust/opendal/services/struct.S3Config.html

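Read these documents from top to bottom: first get a rough picture of OpenDAL's overall structure, then look at the basic demo, then the services, and finally the S3 configuration options. The same route works for Java.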

💬 Community support
Contact our team with questions: Slack

WeChat: 82565387
