Serving a function
In this section we will learn how to serve a function as a web endpoint. This will allow you to call your function from any language that supports HTTP requests.
Python functions to Web Endpoints
Any fal function can be turned into a production-ready web endpoint with a single line of configuration change. When the serve=True option is added to the @fal.function decorator, fal wraps the function with a FastAPI web server. Like any other fal function, this web server runs serverlessly and scales down to zero when it is not actively used.
import fal

@fal.function(
    "virtualenv",
    requirements=["pyjokes"],
    serve=True,
)
def tell_a_joke() -> str:
    import pyjokes

    joke = pyjokes.get_joke()
    return joke
Deploying a basic app through the CLI
This function can be deployed as a serverless web endpoint by running the following command:
fal fn serve ./path/to/tell_a_joke.py tell_a_joke --alias docs_tell_joke
You'll receive a revision ID in the following format: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX. This is the revision ID of your deployed serverless app. Every time you call the fal fn serve command, a new revision ID is generated. Old revisions are kept around so you can still access them.
Serving a function with the --alias option will create a URL that includes the alias you specified instead of the revision ID. If you serve a new revision with the same alias, the URL will point to the most recent revision of the function.
>> Registered a new revision for function 'docs_tell_joke' (revision='21847a72-93e6-4227-ae6f-56bf3a90142d').
>> URL: https://000000-docs_tell_joke.gateway.alpha.fal.ai
Passing arguments and leveraging Pydantic
fal functions and FastAPI are fully compatible with Pydantic, so any Pydantic features used in function arguments will also work.
Pydantic can handle data validation for your function. In the example below, some parameters are optional, some have default values, and others carry additional validation such as type and range constraints.
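The effect of such constraints is easiest to see in isolation. Below is a minimal sketch, using a hypothetical StepsInput model (not part of the fal API) that mirrors the num_inference_steps field from the example that follows:

```python
from pydantic import BaseModel, Field, ValidationError

class StepsInput(BaseModel):
    # Default of 25; the value must be greater than 0 and at most 100.
    num_inference_steps: int = Field(default=25, gt=0, le=100)

print(StepsInput().num_inference_steps)                        # default applies: 25
print(StepsInput(num_inference_steps=50).num_inference_steps)  # valid override: 50

try:
    StepsInput(num_inference_steps=0)  # violates gt=0
except ValidationError:
    print("rejected out-of-range value")
```

Because validation happens when the model is instantiated, invalid requests are rejected before your function body ever runs.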
import fal
from pydantic import BaseModel, Field
from fal.toolkit import Image

MODEL_NAME = "google/ddpm-cat-256"

class ImageModelInput(BaseModel):
    seed: int | None = Field(
        default=None,
        description="""
            The same seed and the same prompt given to the same version of Stable Diffusion
            will output the same image every time.
        """,
        examples=[176400],
    )
    num_inference_steps: int = Field(
        default=25,
        description="""
            Increasing the number of steps tells the model to take more steps
            to generate your final result, which can increase the amount of detail in your image.
        """,
        gt=0,
        le=100,
    )

@fal.function(
    requirements=[
        "diffusers[torch]",
        "transformers",
        "pydantic<2",
    ],
    machine_type="GPU-T4",
    keep_alive=60,
    serve=True,
)
def generate_image(input: ImageModelInput) -> Image:
    import torch
    from diffusers import DDPMPipeline

    pipe = DDPMPipeline.from_pretrained(MODEL_NAME, use_safetensors=True)
    pipe = pipe.to("cuda")
    result = pipe(
        num_inference_steps=input.num_inference_steps,
        generator=torch.manual_seed(input.seed or torch.seed()),
    )
    return Image.from_pil(result.images[0])
Running functions
Served functions become web endpoints that expect a POST request with a JSON body and return the JSON representation of the function's result, so any language that supports HTTP requests can call them using its standard or community-provided libraries.
Let's see how to call the generate_image function with a few popular methods:
curl --request POST \
--url https://$FUNCTION_ID.gateway.alpha.fal.ai/ \
--header "Authorization: Key $FAL_KEY_ID:$FAL_KEY_SECRET" \
--header 'Content-Type: application/json' \
--data '{ "seed": 17600 }'
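The same call can be made from Python using only the standard library. This is a sketch, assuming a hypothetical endpoint URL (replace it with the one printed by fal fn serve) and that your keys are exported as FAL_KEY_ID and FAL_KEY_SECRET:

```python
import json
import os
from urllib import request

# Hypothetical URL -- substitute the URL of your own deployed function.
ENDPOINT_URL = "https://000000-docs_generate_image.gateway.alpha.fal.ai/"

def build_request(seed: int, num_inference_steps: int = 25) -> request.Request:
    """Build an authenticated POST request carrying the function arguments as JSON."""
    body = json.dumps(
        {"seed": seed, "num_inference_steps": num_inference_steps}
    ).encode("utf-8")
    key_id = os.environ.get("FAL_KEY_ID", "")
    key_secret = os.environ.get("FAL_KEY_SECRET", "")
    return request.Request(
        ENDPOINT_URL,
        data=body,
        headers={
            "Authorization": f"Key {key_id}:{key_secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending the request returns the JSON representation of the function's result:
# with request.urlopen(build_request(seed=176400)) as response:
#     print(json.load(response))
```

The urlopen call is left commented out because it performs a real network request against the deployed endpoint.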
Implementation best practices
While it is possible to execute the function by calling the HTTP endpoint directly, certain scenarios, especially the long-running requests common in machine learning inference and training, demand a more nuanced approach. We recommend reading about the queue and the fal clients to learn how to implement such use cases properly.
Running a Web Endpoint Function through the SDK
For fast iteration during development, it is advisable to invoke fal functions directly from your local Python environment. You can do this by passing the serve=False option to the on method of the fal function, which returns a new function reference that you can call directly without publishing the function as an endpoint.
Using the previous generate_image example, add this to the end of the file:
if __name__ == "__main__":
    generate_image_through_sdk = generate_image.on(
        serve=False,
        keep_alive=None,
    )
    cat_image = generate_image_through_sdk(input=ImageModelInput(seed=176400))
    print(f"Here is your cat: {cat_image.url}")
Now you can execute it like any other Python file in your local environment. Note that in this example we not only changed serve to False, but also set keep_alive to None, because the keep-alive option is only relevant when the function is served as an endpoint. You may have use cases where you want to run subsequent tests locally and keep the function alive, so tweak it as needed.
Prefer the queue in production
While this is a convenience we use ourselves, make sure you publish your functions and consume them through web endpoints for a production-ready implementation.
Authentication
By default, each registered function is private, and all requests to the web endpoint must be authenticated.
The simplest way to authenticate is to create keys and pass them in the Authorization header of the request. If this is your first time accessing a web endpoint, navigate to [Key management](https://fal.ai/dashboard/keys) to create keys.
curl -X POST https://1714827-joke.gateway.alpha.fal.ai \
-H "Authorization: Key $FAL_KEY_ID:$FAL_KEY_SECRET"
Public Web Endpoints
Alternatively, you can mark your web endpoint as public. When an endpoint is marked as public, the authentication step provided by Fal is skipped, and your endpoint is publicly accessible on the internet.
fal fn serve ./path/to/server.py tell_a_joke --alias joke --auth public
Checking Logs
Web endpoint logs can be accessed via the fal CLI and the Logs tab of the dashboard. Visit the Logs Viewer on the fal dashboard or use the following command to access the logs through the CLI:
fal fn logs --url https://1714827-joke.gateway.alpha.fal.ai
The command above prints the latest 100 log entries of your web endpoint. To view more entries, pass the desired number with --lines:
fal fn logs --url https://1714827-joke.gateway.alpha.fal.ai --lines 1000
Scaling Web Endpoints
You can configure the maximum number of concurrent instances of your app using the max_concurrency property of the @fal.function decorator. By default, this property is set to 2.
import fal

@fal.function(
    "virtualenv",
    requirements=["pyjokes"],
    serve=True,
    max_concurrency=5,
)
def tell_a_joke() -> str:
    import pyjokes

    joke = pyjokes.get_joke()
    return joke