Chapter 10: Advanced Runnables & Production Deployment with LangServe
Congratulations on reaching the final chapter of the course! You have built chains, managed data flows, configured multi-turn tool loops, connected database memory, and implemented resilience.
The final challenge is putting your application into production. If you are building a SaaS product or integrating an AI assistant into an enterprise codebase, you cannot ship your code as a set of command-line scripts. You need to:
- Define Custom Operations: Wrap arbitrary Python execution blocks into clean, streamable runnables.
- Stream Intermediate Events: Stream back-channel details (like showing "retrieving files..." or "invoking calculator...") to the frontend in real-time.
- Expose APIs: Expose your chain as a fast, documented REST endpoint.
In this chapter, we will master the @chain decorator, explore the astream_events() API, and deploy our chain using LangServe.
10.1 The @chain Decorator
In Chapter 3, we used RunnableLambda to wrap custom Python functions. That becomes awkward when your custom function calls several other runnables in sequence and needs to work with their intermediate results.
The @chain decorator (imported from langchain_core.runnables) wraps any Python function in a RunnableLambda. A function that invokes other runnables therefore becomes a fully compliant Runnable, and the runnables it calls internally are traced as nested child runs.
from langchain_core.runnables import chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser
model = ChatOllama(model="llama3.2")
prompt1 = ChatPromptTemplate.from_template("Give me a list of ingredients for {dish}.")
prompt2 = ChatPromptTemplate.from_template("What is a recipe that uses these ingredients: {ingredients}?")
chain1 = prompt1 | model | StrOutputParser()
chain2 = prompt2 | model | StrOutputParser()
# The @chain decorator turns this function into a Runnable
@chain
def recipe_generator_chain(inputs: dict) -> str:
    # We invoke chain1 synchronously inside the function
    ingredients = chain1.invoke({"dish": inputs["dish"]})
    print(f"-> Intermediate ingredients extracted: {ingredients[:30]}...")
    # Pass output directly to chain2
    recipe = chain2.invoke({"ingredients": ingredients})
    return recipe
By applying the decorator, recipe_generator_chain gets access to the standard Runnable methods: .invoke(), .ainvoke(), .stream(), and .batch().
10.2 Intermediate Event Streaming: astream_events()
When you run a multi-step RAG chain or tool-calling loop, the standard .stream() method only yields chunks of the chain's final output (for example, the answer tokens). It tells you nothing about the steps in between.
If your frontend needs to show a spinner stating "Searching vector store..." or "Calculating compound interest...", you need intermediate event data.
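The astream_events(version="v2") API is designed for this. It is an async generator that yields fine-grained events for every component in the pipeline: a start and an end event for each step, plus stream events for components that stream.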
The most common event types are:
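- on_chat_model_stream: fired for each chunk a chat model streams; event["data"]["chunk"] holds the message chunk.
- on_retriever_end: fired when a retriever (for example, a vector store retriever) finishes; event["data"]["output"] contains the retrieved documents.
- on_tool_start / on_tool_end: fired when a tool invocation starts and ends.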
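Let's write a script that consumes these events from recipe_generator_chain (defined in 10.1). Because astream_events() is an async generator, the loop has to run inside an async function:
import asyncio
async def main() -> None:
    async for event in recipe_generator_chain.astream_events({"dish": "lasagna"}, version="v2"):
        event_type = event["event"]
        if event_type == "on_chat_model_stream":
            # Stream text token
            token = event["data"]["chunk"].content
            print(token, end="", flush=True)
        elif event_type == "on_retriever_end":
            # Capture retrieved sources (only fires if the chain contains a retriever)
            docs = event["data"]["output"]
            print(f"\n[EVENT] Retrieved {len(docs)} document sources.")
        elif event_type == "on_tool_start":
            # Only fires if the chain invokes tools
            print(f"\n[EVENT] Executing tool: {event['name']}...")
asyncio.run(main())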
10.3 Deploying APIs with LangServe
Once your chain is built, you can expose it with LangServe. LangServe mounts your LCEL runnables on a FastAPI application and generates REST endpoints for invoking, batching, and streaming them, with input and output schemas derived from the chain itself.
Step 1: Install LangServe
pip install "langserve[server]" fastapi uvicorn
Step 2: Define and Serve the Application
Create a file named server.py:
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama
from langserve import add_routes
# 1. Define the Runnable Chain
model = ChatOllama(model="llama3.2", temperature=0.2)
prompt = ChatPromptTemplate.from_template("Translate the following sentence into {language}:\n\n{text}")
chain = prompt | model | StrOutputParser()
# 2. Initialize FastAPI
app = FastAPI(
    title="LangChain API Server",
    version="1.0",
    description="A production-grade translation server powered by LangChain and LangServe",
)
# 3. Add Routes using LangServe
# This automatically creates /translate/invoke, /translate/stream, /translate/batch, etc.
add_routes(
    app,
    chain,
    path="/translate",
)
if __name__ == "__main__":
    import uvicorn
    # Start the server on http://localhost:8000
    uvicorn.run(app, host="localhost", port=8000)
Step 3: Running and Using the Endpoints
Run the server using your terminal:
python server.py
LangServe creates several endpoints automatically, including:
- POST /translate/invoke: submits a single request and returns the final parsed result.
- POST /translate/stream: returns a server-sent event stream of output chunks, suitable for streaming text to a frontend client.
- GET /translate/playground: an interactive browser UI where you can fill in the input variables, adjust settings, and watch the output stream in real time.
10.4 Querying LangServe from a Remote Client
To consume your served runnables in another Python project, LangServe provides a RemoteRunnable client that mirrors the native Runnable interface.
from langserve import RemoteRunnable
# Initialize connection
remote_chain = RemoteRunnable("http://localhost:8000/translate")
# Call it exactly like a local runnable!
result = remote_chain.invoke({
    "language": "Italian",
    "text": "Antigravity makes development effortless."
})
print(result)
This completes the cycle: you write a chain locally, host it on a server with LangServe, and query it from other Python services with the same LCEL methods via RemoteRunnable. Clients written in other languages, such as a mobile app, can call the same endpoints over plain HTTP.
10.5 Course Conclusion
Congratulations! You have completed the LangChain Core & LCEL Course.
Throughout these 10 chapters, you have acquired the skills to:
- Navigate the modular modern architecture (langchain-core vs. partner packages).
- Deconstruct and compose the Runnable protocol using the pipe operator (|).
- Shape complex input and output schemas using RunnablePassthrough and RunnableParallel.
- Inject custom Python logic using RunnableLambda and the @chain decorator.
- Route inputs dynamically using functional conditionals.
- Produce schema-validated structured outputs using .with_structured_output() and Pydantic.
- Design multi-turn tool execution loops with manual ToolMessage handling.
- Connect database storage to stateful conversations and write token-trimming rules.
- Build RAG pipelines from scratch with source document citation.
- Ensure fault tolerance via .with_fallbacks() and trace telemetry via callbacks.
- Stream intermediate events and serve chains as APIs using LangServe.
With these primitives, you are equipped to build production-grade, highly optimized, and maintainable AI applications.
Happy Coding!