This guide covers the most common issues encountered when running Flyte. Before diving in, collect the following diagnostic information:
# Get the pod status and events
kubectl describe pod <PodName> -n <namespace>

# Get pod logs
kubectl logs <PodName> -n <namespace>
<PodName> is the node execution string shown in the Flyte UI. <namespace> corresponds to the Flyte project-domain, e.g. flytesnacks-development.
The Flyte UI shows node execution IDs like ab5mg9lzgth62h82qprp-n0-0. This is also the pod name in Kubernetes.
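As a minimal sketch of the naming convention above, the following Python helper assembles the two kubectl commands from a node execution ID and a project-domain pair. The function name `diagnostic_commands` is hypothetical, and it assumes the namespace is always `<project>-<domain>`, as in the flytesnacks-development example.

```python
import shlex


def diagnostic_commands(pod_name: str, project: str, domain: str) -> list[str]:
    """Return the kubectl commands for inspecting a task pod (hypothetical helper)."""
    # Assumption: the namespace is "<project>-<domain>", e.g. flytesnacks-development.
    namespace = f"{project}-{domain}"
    return [
        shlex.join(["kubectl", "describe", "pod", pod_name, "-n", namespace]),
        shlex.join(["kubectl", "logs", pod_name, "-n", namespace]),
    ]


if __name__ == "__main__":
    # The pod name is the node execution ID shown in the Flyte UI.
    for cmd in diagnostic_commands("ab5mg9lzgth62h82qprp-n0-0", "flytesnacks", "development"):
        print(cmd)
```

If your deployment maps projects and domains to namespaces differently, adjust the `namespace` line accordingly.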

Installation and sandbox issues

Error:
Error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock.
Is the docker daemon running?
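This occurs when your container runtime (for example Docker Desktop or Rancher Desktop) exposes its socket at a path other than /var/run/docker.sock, which is where the native Docker engine on Linux places it.
Fix for Docker Desktop on macOS: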
sudo ln -s ~/Library/Containers/com.docker.docker/Data/docker.raw.sock /var/run/docker.sock
Fix for Docker Desktop on Linux:
sudo ln -s ~/.docker/desktop/docker.sock /var/run/docker.sock
Fix for Rancher Desktop on Linux:
sudo ln -s ~/.rd/docker.sock /var/run/docker.sock
If you are using another container runtime, link its socket to /var/run/docker.sock.
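To see which of the sockets listed above actually exists on your machine before creating the symlink, a small check along these lines may help. This is only a sketch: `find_runtime_socket` is a hypothetical helper, and the candidate list contains just the three paths named in this section.

```python
import os
from typing import Callable, Iterable, Optional

# Socket locations taken from the fixes above; "~" is expanded at call time.
CANDIDATE_SOCKETS = (
    "~/Library/Containers/com.docker.docker/Data/docker.raw.sock",  # Docker Desktop on macOS
    "~/.docker/desktop/docker.sock",  # Docker Desktop on Linux
    "~/.rd/docker.sock",  # Rancher Desktop on Linux
)


def find_runtime_socket(
    candidates: Iterable[str] = CANDIDATE_SOCKETS,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the first candidate socket path that exists, or None (hypothetical helper)."""
    for path in candidates:
        expanded = os.path.expanduser(path)
        if exists(expanded):
            return expanded
    return None


if __name__ == "__main__":
    found = find_runtime_socket()
    if found is None:
        print("No known runtime socket found; check your container runtime's documentation.")
    else:
        # Print the command rather than running it, since it needs sudo.
        print(f"sudo ln -s {found} /var/run/docker.sock")
```

The `exists` parameter is injectable only so the function can be exercised without touching the real filesystem.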
Error:
message: '0/1 nodes are available: 1 Insufficient cpu.
preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.'
This is common on macOS with Docker Desktop.
Fix: Open Docker Desktop settings and increase resources to a minimum of 4 CPU cores and 3 GB RAM.
Error:
authentication handshake failed: x509: "Kubernetes Ingress Controller Fake Certificate" certificate is not trusted
This occurs when TLS is not properly configured in a flyte-core deployment.
Fix: Enable TLS in your values.yaml:
ingress:
  host: example.com
  separateGrpcIngress: true
  separateGrpcIngressAnnotations:
    ingress.kubernetes.io/backend-protocol: "grpc"
  annotations:
    ingress.kubernetes.io/app-root: "/console"
    ingress.kubernetes.io/default-backend-redirect: "/console"
    kubernetes.io/ingress.class: haproxy
  tls:
    enabled: true
Also update your flytectl config to disable insecure mode:
admin:
  endpoint: dns:///example.com
  authType: Pkce
  insecure: false
  insecureSkipVerify: true
Error:
OPENSSL_internal:WRONG_VERSION_NUMBER
For flyte-binary: Verify that the endpoint name in your config.yaml matches the DNS names in the SSL certificate (whether self-signed or CA-issued).
For sandbox: Verify the FLYTECTL_CONFIG environment variable points to the correct config file:
export FLYTECTL_CONFIG=~/.flyte/config-sandbox.yaml
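The flyte-binary check above compares the endpoint host against the DNS names in the certificate. A rough sketch of that comparison follows, assuming you have already read the DNS names from the certificate by other means. Both function names are hypothetical, and the wildcard rule is deliberately simplified: a leading `*.` matches exactly one label. IPv6 literals are not handled.

```python
def endpoint_host(endpoint: str) -> str:
    """Strip the dns:/// prefix and any :port suffix from an admin endpoint."""
    host = endpoint.removeprefix("dns:///")
    return host.rsplit(":", 1)[0] if ":" in host else host


def matches_certificate(endpoint: str, dns_names: list[str]) -> bool:
    """Return True if the endpoint host is covered by one of the DNS names."""
    host = endpoint_host(endpoint).lower()
    for name in dns_names:
        name = name.lower()
        if name == host:
            return True
        # Simplified wildcard rule: "*.example.com" covers "x.example.com" only.
        if name.startswith("*.") and "." in host:
            if host.split(".", 1)[1] == name[2:]:
                return True
    return False


if __name__ == "__main__":
    # Hypothetical certificate names, for illustration only.
    print(matches_certificate("dns:///example.com", ["example.com"]))
```

A mismatch reported here points at the endpoint name or the certificate, not at Flyte itself.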

Execution failures

Error:
terminated with exit code (137). Reason [OOMKilled]
The container exceeded its memory limit.
Fix 1: For Helm deployments, update task resource defaults in your values.yaml:
inline:
  task_resources:
    defaults:
      cpu: 100m
      memory: 100Mi
      storage: 100Mi
    limits:
      memory: 1Gi
Fix 2: Override resource limits directly in your task code:
from flytekit import Resources, task

# Raise the memory limit for this task only.
@task(limits=Resources(mem="256Mi"))
def your_task() -> None:
    ...
Fix 3: For EKS deployments, adjust limits in the inline section of eks-production.yaml. Use the most recent Helm charts.
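The exit code 137 in the OOMKilled error above follows the common convention that exit codes above 128 mean the process was terminated by signal number (code - 128), so 137 corresponds to signal 9. The sketch below decodes such codes with Python's standard `signal` module. The helper name `signal_from_exit_code` is hypothetical, and signal names are only resolved on platforms that define them (Linux and macOS define signal 9 as SIGKILL).

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Return the signal name encoded in a container exit code, or None."""
    # Convention: exit codes above 128 mean "terminated by signal (code - 128)".
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None  # not a signal number known on this platform


if __name__ == "__main__":
    print(signal_from_exit_code(137))
```

A kill signal on its own does not prove an out-of-memory condition; the Reason [OOMKilled] field in the pod status is what confirms it.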
Error: Kubernetes cannot pull the task container image.
Fix 1: If your environment uses a network proxy, pass the proxy configuration when starting the sandbox:
flytectl demo start --env HTTP_PROXY=<your-proxy-IP>
Fix 2: Never use latest as an image tag. When no pull policy is set explicitly, Kubernetes defaults imagePullPolicy to Always for the latest tag (and for images with no tag), forcing a registry pull on every pod start. Use a specific version tag:
from flytekit import task

@task(container_image="my-registry.example.com/my-image:v1.2.3")
def my_task() -> None:
    ...
Fix 3: If the registry requires authentication, create a Kubernetes image pull secret and configure it in your pod template.
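The pull-policy behavior behind Fix 2 can be illustrated with a small sketch of the Kubernetes defaulting rule as it applies when imagePullPolicy is not set explicitly. The function `default_pull_policy` is a hypothetical illustration, not part of Kubernetes or Flyte, and the image reference parsing is simplified.

```python
def default_pull_policy(image: str) -> str:
    """Return the imagePullPolicy Kubernetes applies when none is set explicitly."""
    # An image pinned by digest (name@sha256:...) defaults to IfNotPresent.
    if "@" in image:
        return "IfNotPresent"
    # The tag, if any, follows a ":" in the last path component; a ":" earlier
    # in the reference is a registry port, not a tag.
    last_component = image.rsplit("/", 1)[-1]
    _, sep, tag = last_component.partition(":")
    if not sep or tag == "latest":
        return "Always"  # no tag, or the latest tag
    return "IfNotPresent"


if __name__ == "__main__":
    for ref in ("my-registry.example.com/my-image:v1.2.3", "my-image:latest", "my-image"):
        print(ref, "->", default_pull_policy(ref))
```

With a specific version tag, nodes that already hold the image can start the pod without contacting the registry.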
Error:
ModuleNotFoundError: No module named 'mymodule'
Cause: The Python module is not on the container’s path.
Fix: If using a custom Docker image, ensure:
  1. Your Dockerfile is at the same level as the flyte directory.
  2. An empty __init__.py exists in your project folder.
Expected directory layout:
myflyteapp/
├── Dockerfile
├── docker_build_and_tag.sh
└── flyte/
    ├── __init__.py
    └── workflows/
        ├── __init__.py
        └── example.py
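To check the `__init__.py` requirement against a layout like the one above, a sketch such as the following can be run before building the image. `packages_missing_init` is a hypothetical helper. It assumes every directory between the package root (here `flyte/`) and a `.py` file should be a regular package with its own `__init__.py`.

```python
from pathlib import Path


def packages_missing_init(package_root: Path) -> list[Path]:
    """List directories that lead to .py files but lack an __init__.py."""
    missing = set()
    for py_file in package_root.rglob("*.py"):
        directory = py_file.parent
        # Walk upward from the file's directory to package_root, inclusive.
        while True:
            if not (directory / "__init__.py").exists():
                missing.add(directory)
            if directory == package_root:
                break
            directory = directory.parent
    return sorted(missing)


if __name__ == "__main__":
    # Assumes the script is run from the myflyteapp/ directory.
    for directory in packages_missing_init(Path("flyte")):
        print(f"missing __init__.py in {directory}")
```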
Error:
FlyteScopedUserException: 'JavaPackage' object is not callable
Cause: The spark plugin is not enabled in the FlytePropeller configuration.
Fix: Add spark to the enabled-plugins list in your config YAML:
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - K8S-ARRAY
      - spark
    default-for-task-types:
      - container: container
      - container_array: K8S-ARRAY
Error: An execution appears stuck or reports an inconsistent failed + succeeded + running state.
Cause: A malformed dynamic workflow was processed by FlytePropeller. This was a known bug fixed in v1.16.4.
Fix: Upgrade to Flyte v1.16.4 or later. If you cannot upgrade immediately, use RecoverExecution to resume from the last known good state:
grpcurl -plaintext \
  -d '{"id": {"project": "flytesnacks", "domain": "development", "name": "<execution-id>"}}' \
  localhost:81 flyteidl.service.AdminService/RecoverExecution
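If you need to issue the same recovery call for several executions, the grpcurl invocation above can be assembled programmatically. The sketch below only builds the argument list. `recover_execution_command` is a hypothetical helper, and the default address `localhost:81` is copied from the command above and may differ in your setup.

```python
import json


def recover_execution_command(
    project: str,
    domain: str,
    execution_name: str,
    address: str = "localhost:81",  # assumption: same address as the example above
) -> list[str]:
    """Build the grpcurl argument list for a RecoverExecution call."""
    payload = {"id": {"project": project, "domain": domain, "name": execution_name}}
    return [
        "grpcurl",
        "-plaintext",
        "-d",
        json.dumps(payload),
        address,
        "flyteidl.service.AdminService/RecoverExecution",
    ]


if __name__ == "__main__":
    # "<execution-id>" is a placeholder; substitute a real execution name.
    print(recover_execution_command("flytesnacks", "development", "<execution-id>"))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with the JSON payload.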

Storage and data issues

Error:
An error occurred (AccessDenied) when calling the PutObject operation
Cause: The Kubernetes service account Flyte uses does not have the correct IAM role annotation for IRSA (IAM Roles for Service Accounts).
Fix 1: Verify the service account annotation:
kubectl describe sa <my-flyte-sa> -n <flyte-namespace>
Expected output should include:
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/flyte-system-role
Fix 2: If the annotation is missing, add it manually:
kubectl annotate serviceaccount -n <flyte-namespace> <my-flyte-sa> \
  eks.amazonaws.com/role-arn=arn:aws:iam::<account-id>:role/<flyte-iam-role>
Refer to the community-maintained Flyte the Hard Way guide for full EKS IAM configuration.
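The annotation check in Fix 1 can also be scripted. The sketch below assumes you capture the service account as JSON, for example with `kubectl get sa <my-flyte-sa> -n <flyte-namespace> -o json`, and then reads the IRSA annotation from it. `role_arn_from_service_account` is a hypothetical helper.

```python
import json
from typing import Optional

IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def role_arn_from_service_account(sa_json: str) -> Optional[str]:
    """Return the IRSA role ARN from a service account JSON document, or None."""
    service_account = json.loads(sa_json)
    # "annotations" may be absent or null when nothing is annotated.
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    return annotations.get(IRSA_ANNOTATION)


if __name__ == "__main__":
    # Hypothetical sample document with no annotations, for illustration only.
    sample = '{"metadata": {"name": "default"}}'
    print(role_arn_from_service_account(sample))
```

A `None` result means the annotation is missing and Fix 2 applies.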
The local sandbox runs Minio for object storage. For debugging, set these environment variables when running tasks locally so they point at the sandbox Minio instance:
export FLYTE_AWS_ENDPOINT="http://localhost:30002"
export FLYTE_AWS_ACCESS_KEY_ID="minio"
export FLYTE_AWS_SECRET_ACCESS_KEY="miniostorage"

Authentication issues

Error:
rpc error: code = Unauthenticated
Fix 1: Re-authenticate:
flytectl config init --host flyte.example.com
Fix 2: Verify your config file has the correct auth settings:
admin:
  endpoint: dns:///flyte.example.com
  authType: Pkce        # or ClientSecret for service accounts
  insecure: false
Fix 3: For development/sandbox, you can disable auth entirely:
admin:
  endpoint: dns:///localhost:30080
  insecure: true
After running flytectl demo start, the sandbox config is written to ~/.flyte/config-sandbox.yaml. Export it:
export FLYTECTL_CONFIG=~/.flyte/config-sandbox.yaml
Add this to your shell profile to persist it across sessions.
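A quick way to confirm that the variable is set and points at a real file is sketched below. `flytectl_config_problem` is a hypothetical helper; it checks only that the file exists, not that its contents are valid.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def flytectl_config_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a description of what is wrong with FLYTECTL_CONFIG, or None if it looks fine."""
    value = env.get("FLYTECTL_CONFIG")
    if not value:
        return "FLYTECTL_CONFIG is not set"
    path = Path(value).expanduser()
    if not path.is_file():
        return f"FLYTECTL_CONFIG points to a missing file: {path}"
    return None


if __name__ == "__main__":
    problem = flytectl_config_problem()
    print(problem or "FLYTECTL_CONFIG looks fine")
```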

Getting more help

GitHub Issues

Open a bug report or feature request.

Slack Community

Get real-time help in the #ask-the-community channel.

GitHub Discussions

Ask questions or share ideas with the community.

Documentation

Browse the official Flyte documentation.