Flyte uses object storage for all task inputs, outputs, intermediate data, and workflow metadata. Storage is configured via the stow library, which provides a unified interface over multiple cloud backends.
How Flyte uses storage
| Data type | Storage path | Who writes it |
|---|---|---|
| Workflow metadata (launch plans, executions) | metadataContainer | FlyteAdmin |
| Task input/output literals | userDataContainer | FlyteCopilot sidecar |
| Large datasets (offloaded literals) | userDataContainer | FlytePropeller |
| Cached task outputs | userDataContainer | DataCatalog |
The metadataContainer and userDataContainer can point to the same bucket. Using separate buckets allows independent lifecycle policies.
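To make the lifecycle-policy point concrete, here is a minimal sketch that builds a lifecycle configuration in the shape S3 accepts (for example through `aws s3api put-bucket-lifecycle-configuration`). The bucket name, rule ID, and 30-day retention are illustrative assumptions, not Flyte defaults. Expiring objects that cached or in-flight executions still reference will break those executions, so treat the numbers as placeholders.

```python
import json


def expire_after_days(rule_id: str, days: int, prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that expires objects after `days`."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # an empty prefix matches every object
                "Expiration": {"Days": days},
            }
        ]
    }


# Hypothetical policy: only the user-data bucket gets an expiration rule.
# The metadata bucket has no entry here, so nothing in it is expired.
lifecycle_by_bucket = {
    "my-flyte-userdata": expire_after_days("expire-user-data", days=30),
}

if __name__ == "__main__":
    for bucket, config in lifecycle_by_bucket.items():
        print(bucket)
        print(json.dumps(config, indent=2))
```

With a single shared bucket, the same effect would need prefix-scoped rules, which is harder to keep correct as paths change.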
Configuring storage backends
S3 with IRSA (recommended for EKS)
configuration:
  storage:
    metadataContainer: my-flyte-metadata
    userDataContainer: my-flyte-userdata
    provider: s3
    providerConfig:
      s3:
        region: "us-east-1"
        authType: "iam" # Uses the pod IAM role / IRSA; no static keys
S3 with static access keys
configuration:
  storage:
    metadataContainer: my-flyte-metadata
    userDataContainer: my-flyte-userdata
    provider: s3
    providerConfig:
      s3:
        region: "us-east-1"
        authType: "accesskey"
        accessKey: "<ACCESS_KEY_ID>"
        secretKey: "<SECRET_ACCESS_KEY>"
Inline stow config (for advanced options)
configuration:
  inline:
    storage:
      type: stow
      stow:
        kind: s3
        config:
          region: us-east-1
          auth_type: iam
      container: my-flyte-bucket
      limits:
        maxDownloadMBs: 1000
GCS with Workload Identity (recommended for GKE)
configuration:
  storage:
    metadataContainer: my-flyte-bucket
    userDataContainer: my-flyte-bucket
    provider: gcs
    providerConfig:
      gcs:
        project: "my-gcp-project"
The GKE node pool’s service account or Workload Identity binding is used automatically when project is set.
GCS with a service account key
Mount a JSON key file as a Kubernetes Secret and set GOOGLE_APPLICATION_CREDENTIALS:
configuration:
  inline:
    storage:
      type: stow
      stow:
        kind: google
        config:
          json: ""
          project_id: my-gcp-project
          scopes: https://www.googleapis.com/auth/devstorage.read_write
      container: my-flyte-bucket
Azure Blob Storage
configuration:
  storage:
    metadataContainer: my-flyte-container
    userDataContainer: my-flyte-container
    provider: azure
    providerConfig:
      azure:
        account: "my-storage-account"
        key: "<STORAGE_ACCOUNT_KEY>"
        configDomainSuffix: ""
        configUploadConcurrency: 4
MinIO
MinIO exposes an S3-compatible API. Use the s3 provider with the MinIO endpoint:
configuration:
  storage:
    metadataContainer: my-s3-bucket
    userDataContainer: my-s3-bucket
    provider: s3
    providerConfig:
      s3:
        disableSSL: true
        v2Signing: true
        endpoint: http://minio.minio-ns.svc.cluster.local:9000
        authType: accesskey
        accessKey: minio
        secretKey: miniostorage
        region: us-east-1 # Required but ignored by MinIO
For pre-signed URLs, override the endpoint used for signing:
configuration:
  inline:
    storage:
      signedURL:
        stowConfigOverride:
          endpoint: http://<NODE_IP>:30002 # For pre-signed URL generation
The stowConfigOverride.endpoint in signedURL must be the externally reachable MinIO endpoint so that pre-signed URLs work from outside the cluster.
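A common mistake is to reuse the in-cluster service address as the signing endpoint. The sketch below is a rough pre-deployment check, not part of Flyte. It flags endpoints whose hostname only resolves through cluster DNS. The function name and the suffix list are assumptions (the default `cluster.local` cluster domain is assumed), and a passing result does not prove the endpoint is reachable from outside.

```python
from urllib.parse import urlparse

# Hostname suffixes that normally resolve only through in-cluster DNS.
# Assumes the default Kubernetes cluster domain, "cluster.local".
CLUSTER_INTERNAL_SUFFIXES = (".svc", ".svc.cluster.local")


def is_cluster_internal(endpoint: str) -> bool:
    """Return True if the endpoint's hostname looks like a cluster-only service name."""
    host = urlparse(endpoint).hostname or ""
    return host.endswith(CLUSTER_INTERNAL_SUFFIXES)


if __name__ == "__main__":
    # The storage endpoint from the MinIO example: fine for in-cluster traffic,
    # but unusable in a pre-signed URL handed to a browser.
    print(is_cluster_internal("http://minio.minio-ns.svc.cluster.local:9000"))
    # A node address with a NodePort (documentation-range IP used as a placeholder).
    print(is_cluster_internal("http://203.0.113.10:30002"))
```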
Signed URLs
Flyte generates pre-signed URLs for the FlyteConsole to let users download task output files directly from the object store. The remoteData config controls how these URLs are created:
configuration:
  inline:
    remoteData:
      region: us-east-1
      scheme: aws # aws, gcs, or azure
      signedUrls:
        durationMinutes: 3
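When a download link fails, it helps to check whether it has simply expired. For AWS Signature Version 4 pre-signed URLs, the signing time and lifetime travel in the `X-Amz-Date` and `X-Amz-Expires` (seconds) query parameters, so a 3-minute duration would be expected to appear as `X-Amz-Expires=180`. The sketch below reads those two parameters. The example URL and its signature are made up, and the helper does not apply to GCS, Azure, or V2-signed URLs.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

# Fabricated example URL; the object path and signature are placeholders.
EXAMPLE_URL = (
    "https://my-flyte-userdata.s3.us-east-1.amazonaws.com/data/output.csv"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20240101T120000Z"
    "&X-Amz-Expires=180"
    "&X-Amz-SignedHeaders=host"
    "&X-Amz-Signature=0000"
)


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 pre-signed URL stops being valid."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


if __name__ == "__main__":
    expiry = presigned_url_expiry(EXAMPLE_URL)
    print("expires at:", expiry.isoformat())
    print("expired now:", datetime.now(timezone.utc) > expiry)
```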
Cache configuration
DataCatalog uses the object store to cache task output metadata. Configure the in-memory cache size:
configuration:
  inline:
    storage:
      cache:
        max_size_mbs: 10
        target_gc_percent: 100
Download limits
To protect against unexpectedly large task outputs being pulled into FlytePropeller memory, set a download limit:
configuration:
  inline:
    storage:
      limits:
        maxDownloadMBs: 1000
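The idea behind a download limit can be sketched as a size check that runs before any bytes are read into memory. This is an illustration of the concept, not FlytePropeller's implementation. The function and exception names are hypothetical, and treating one "MB" as 2**20 bytes is an assumption.

```python
MIB = 1024 * 1024  # assumption: one "MB" in the limit means 2**20 bytes


class DownloadTooLargeError(Exception):
    """Raised when an object is larger than the configured download limit."""


def check_download_size(size_bytes: int, max_download_mbs: int) -> None:
    """Refuse a download whose reported size exceeds the limit."""
    limit_bytes = max_download_mbs * MIB
    if size_bytes > limit_bytes:
        raise DownloadTooLargeError(
            f"object is {size_bytes} bytes; limit is {limit_bytes} bytes"
        )


if __name__ == "__main__":
    check_download_size(size_bytes=200 * MIB, max_download_mbs=1000)  # passes silently
    try:
        check_download_size(size_bytes=2000 * MIB, max_download_mbs=1000)
    except DownloadTooLargeError as err:
        print("rejected:", err)
```

Checking the size from object metadata first means an oversized output fails fast instead of exhausting memory partway through the read.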
Offloaded literal data
Large task inputs and outputs can be offloaded to the object store rather than stored inline in the workflow CRD. This is recommended for workflows that pass large datasets between tasks:
configuration:
  inline:
    propeller:
      literal-offloading-config:
        enabled: true
With offloading enabled, FlytePropeller writes large literals to userDataContainer/data/ and stores a reference in the workflow CRD instead of the raw bytes.
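The reference-instead-of-bytes pattern described above can be sketched in a few lines. This is a conceptual model only. The size threshold, the hash-based object key, the dictionary standing in for the object store, and the shape of the returned record are all assumptions made for illustration, not Flyte's actual behavior or data model.

```python
import hashlib

OFFLOAD_THRESHOLD_BYTES = 1024  # hypothetical cutoff, chosen only for this example


def store_literal(payload: bytes, object_store: dict,
                  container: str = "my-flyte-userdata") -> dict:
    """Keep small payloads inline; write large ones to the store and return a reference."""
    if len(payload) <= OFFLOAD_THRESHOLD_BYTES:
        return {"inline": payload}
    # Hypothetical key scheme: content hash under the container's data/ prefix.
    key = f"{container}/data/{hashlib.sha256(payload).hexdigest()}"
    object_store[key] = payload
    return {"offloaded_uri": key, "size_bytes": len(payload)}


if __name__ == "__main__":
    fake_store = {}  # stands in for the object store
    print(store_literal(b"small value", fake_store))
    record = store_literal(b"x" * 10_000, fake_store)
    print(record["offloaded_uri"], record["size_bytes"])
```

The record kept in place of the payload stays a fixed, small size regardless of how large the literal is, which is what keeps the workflow CRD from growing with the data.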