FastAPI for ML inference

How to use FastAPI for machine learning inference, Docker-ized and deployed using Kubernetes

Feb 28, 2023 5 min read


Why FastAPI?

FastAPI is fast to build with, iterate on, and deploy. It uses Python type hints to automatically validate requests and generate interactive OpenAPI docs. For machine learning model inference, it’s a great way to quickly get a REST API running in development, and it’s just as quick to Dockerize and deploy to Kubernetes for production.

The power of Python type hints

From the FastAPI docs:

By declaring parameters with standard Python type hints, you get editor support: completion and type checks.

And FastAPI uses the same declarations to define requirements, convert the incoming data, validate it, and automatically document the API with OpenAPI.
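FastAPI reads those same annotations through standard Python introspection. A minimal sketch (stdlib only, no FastAPI required; the function and parameter names are illustrative) of what the framework can see from a hinted function:

```python
from typing import get_type_hints

# A hypothetical handler; parameter names are illustrative only
def predict(passenger_id: int, cabin_class: str = "3rd") -> dict:
    return {"passenger_id": passenger_id, "cabin_class": cabin_class}

# FastAPI uses annotations like these to validate query/body parameters
# and to generate the OpenAPI schema behind /docs
hints = get_type_hints(predict)
print(hints["passenger_id"])  # <class 'int'>
print(hints["return"])        # <class 'dict'>
```

One set of declarations drives validation, conversion, and documentation, which is why there’s no separate schema file to keep in sync.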

FastAPI setup

Check out the demo repo here for this example of setting up FastAPI with a redirect to the auto-built /docs.

Plus two simple health-check endpoints: one for the load balancer and one for the app.

import os
from datetime import datetime, timezone

from fastapi import FastAPI, Response
from starlette.responses import RedirectResponse

from app.env import load_env_vars

load_env_vars()

app = FastAPI()


booted_at = {}

@app.on_event("startup")
def startup_event():
    booted_at["time"] = datetime.now(timezone.utc)

# Redirect base url to /docs
@app.get("/", include_in_schema=False)
def redirect_to_docs():
    return RedirectResponse(url="/docs")

# ...

@app.get("/health_check")
def health_check():
    return {
        "aws_region": os.environ.get("AWS_REGION"),
        "booted_at": booted_at.get("time"),
        "health": "OK",
        "k8s_env": os.environ.get("K8S_ENV"),
        "python_env": os.environ["PYTHON_ENV"],
        "titanic_model_version": TITANIC_MODEL_VERSION,
        "version": os.environ.get(
            "APP_REVISION", "Missing $APP_REVISION env var, not set"
        ),
    }

# Don't render JSON for load balancer health check, just return 200
@app.get("/health_check_load_balancer")
def health_check_load_balancer():
    return Response(status_code=200)

Add an inference endpoint

Next we add an inference endpoint, something like:

@app.post(
    "/predict_survival_by_passengers", response_model=SurvivalPredictionByPassengerIds
)
def predict_survival_by_passengers(passengers: list[Passenger]):
    # Convert passengers to a list of dicts; this is what the model expects
    # There must be a better way to do this.
    passengers = [passenger.dict() for passenger in passengers]

    # Make predictions
    results = [] if not passengers else titanic_survival_predict(passengers)

    return {
        "predictions": results,
        "metadata": {
            "titanic_model_version": TITANIC_MODEL_VERSION,
        },
    }

That endpoint simply calls the predict() function (here titanic_survival_predict()), which loads the model and makes predictions. See the demo app for more code.

FastAPI handles the type checking, validation, and most errors for us automatically.
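The Passenger and SurvivalPredictionByPassengerIds types referenced above are ordinary Pydantic models. The exact fields live in the demo repo; a minimal sketch with assumed field names might look like:

```python
from typing import Optional

from pydantic import BaseModel

# Field names here are illustrative assumptions, not the demo app's exact schema
class Passenger(BaseModel):
    pclass: int
    sex: str
    age: Optional[float] = None
    fare: Optional[float] = None

class PredictionMetadata(BaseModel):
    titanic_model_version: str

class SurvivalPredictionByPassengerIds(BaseModel):
    predictions: list[dict]
    metadata: PredictionMetadata

p = Passenger(pclass=3, sex="male", age=22)
print(p.dict()["pclass"])  # 3
```

With response_model set on the endpoint, FastAPI validates the return value against this schema, filters out any extra fields, and documents the shape in /docs.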


I’ve found Poetry to be the best way to manage dependencies, and you’ll see poetry commands in the Dockerfile below.
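The --without dev,test flag used below assumes dependencies are split into Poetry groups in pyproject.toml. A sketch of the relevant sections (package names and versions are illustrative, not the demo app's exact pins):

```
[tool.poetry.dependencies]
python = "^3.10"
fastapi = "^0.92"
uvicorn = "^0.20"

[tool.poetry.group.dev.dependencies]
black = "*"

[tool.poetry.group.test.dependencies]
pytest = "*"
```

Dev and test tooling stays out of the production image, which keeps it smaller and shrinks the attack surface.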

Here’s the Dockerfile:

# Versions with defaults. Override with env var to build a different version.
ARG PYTHON_VERSION=3.10

# More args
# For security, set a non-root user. Name is arbitrary.
ARG USER=nonroot
ARG USER_ID=1001
ARG PYTHON_ENV=production

# Use the official lightweight Python image.
FROM python:${PYTHON_VERSION}-slim-buster

# Args needed for this container
# (args declared before FROM must be re-declared to be visible in this stage)
ARG USER
ARG USER_ID
ARG PYTHON_ENV
ENV PYTHON_ENV=${PYTHON_ENV}

# Recommended by hadolint
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Add non-root user
RUN groupadd --gid $USER_ID $USER \
  && useradd --uid $USER_ID --gid $USER --shell /bin/bash --create-home $USER

# Create a directory for the app code
RUN mkdir /app \
  && chown -R $USER:$USER /app

# Run subsequent commands from the app directory
WORKDIR /app

# Copy dependency definitions for production-only requirements
COPY --chown=$USER:$USER pyproject.toml ./pyproject.toml

# Install poetry
RUN pip install poetry
# Don't create a virtualenv, we'll use the container's python so everything else just works
RUN poetry config virtualenvs.create false
# Install production-only dependencies
# --no-root: the app code isn't copied yet, so skip installing the project itself
RUN poetry install --without dev,test --no-root

# COPY the app code that is needed to start the server and run the app
# This will be at `app/app` in the container.
COPY --chown=$USER:$USER app app
COPY --chown=$USER:$USER .env.production .env.production

# Set user to non-root $USER
# This needs to be the numeric uid, not the username, for the k8s
# securityContext: runAsNonRoot check to work.
USER $USER_ID

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
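Building and running locally is then the usual Docker flow (the image tag and port mapping here are arbitrary examples):

```shell
# Build, optionally overriding the default Python version
docker build --build-arg PYTHON_VERSION=3.10 -t fastapi-ml-demo .

# Run locally, mapping container port 80 to localhost:8000
docker run --rm -p 8000:80 fastapi-ml-demo

# Then open http://localhost:8000 (redirects to /docs)
```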

Kubernetes deployment and security best practices

Assuming you already have an existing Kubernetes cluster, it’s straightforward to deploy this app. Check out the demo app’s kubernetes/ directory for the deployment and service specs.
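Applying the specs is the standard kubectl flow (the directory name comes from the demo repo; namespace and context flags are omitted):

```shell
# Apply the deployment and service specs
kubectl apply -f kubernetes/

# Watch the pods come up
kubectl get pods -w
```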

I always recommend following the NSA/CISA Kubernetes Hardening Guidance as much as possible.

In the demo app, you’ll notice it follows some key security recommendations:

- Run containers as a non-root user
- Disallow privilege escalation
- Use a read-only root filesystem

The demo app does all of the above, and everyone really should. They’re straightforward to implement. See the Dockerfile for the “nonroot” USER setup and USER_ID.

The other parts are as simple as passing some options in the K8s Deployment spec:

spec:
  # Must match Dockerfile's USER_ID for User and Group
  securityContext:
    runAsUser: 1001
    runAsGroup: 1001
    # Set ownership of mounted volumes to the user running the container
    fsGroup: 1001
  containers:
    - name: demo-fastapi-ml-container
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        runAsNonRoot: true
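The two health-check endpoints from earlier can also back Kubernetes probes. A sketch for the container spec (the paths and timings are assumptions, not the demo app's exact values):

```
      # Bare 200 for the kubelet, no JSON rendering
      livenessProbe:
        httpGet:
          path: /health_check_load_balancer
          port: 80
        periodSeconds: 10
      # Full JSON health check with app metadata
      readinessProbe:
        httpGet:
          path: /health_check
          port: 80
        periodSeconds: 10
```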

Wrapping up

FastAPI is a great framework for building APIs, including for machine learning model inference. It’s easy to use, has great documentation, and is fast. It’s also easy to Dockerize and to deploy to Kubernetes.

Check out the demo app code and let me know what you think.

Read more posts like this in the Software Engineering Toolbox collection.