
Chapter 10: Advanced Runnables & Production Deployment with LangServe

Congratulations on reaching the final chapter of the course! You have built chains, managed data flows, configured multi-turn tool loops, connected database memory, and implemented resilience.

The final challenge is putting your application into production. If you are building a SaaS product or integrating an AI helper into an enterprise codebase, running your code as command-line scripts is not enough. You need to:

  1. Define Custom Operations: Wrap arbitrary Python execution blocks into clean, streamable runnables.
  2. Stream Intermediate Events: Stream intermediate details (such as "retrieving files..." or "invoking calculator...") to the frontend in real time.
  3. Expose APIs: Expose your chain as a fast, documented REST endpoint.

In this chapter, we will master the @chain decorator, explore the astream_events() API, and deploy our chain using LangServe.


10.1 The @chain Decorator

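In Chapter 3, we used RunnableLambda to wrap custom Python functions. When a custom function calls other runnables internally, the @chain decorator expresses the same idea more compactly.

The @chain decorator (imported from langchain_core.runnables) turns a Python function into a Runnable object; it is functionally equivalent to wrapping the function in RunnableLambda. The runnable is named after the function, and any runnables invoked inside the function are traced as nested steps of the parent run.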

from langchain_core.runnables import chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser

model = ChatOllama(model="llama3.2")
prompt1 = ChatPromptTemplate.from_template("Give me a list of ingredients for {dish}.")
prompt2 = ChatPromptTemplate.from_template("What is a recipe that uses these ingredients: {ingredients}?")

chain1 = prompt1 | model | StrOutputParser()
chain2 = prompt2 | model | StrOutputParser()

# The @chain decorator turns this function into a Runnable
@chain
def recipe_generator_chain(inputs: dict) -> str:
    # We invoke chain1 synchronously inside the function
    ingredients = chain1.invoke({"dish": inputs["dish"]})
    print(f"-> Intermediate ingredients extracted: {ingredients[:30]}...")
    
    # Pass output directly to chain2
    recipe = chain2.invoke({"ingredients": ingredients})
    return recipe

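By applying the decorator, recipe_generator_chain gets the standard Runnable methods: .invoke(), .ainvoke(), .stream(), and .batch(). Note that the function returns only after chain2 has finished, so calling .stream() on recipe_generator_chain yields the recipe as one complete chunk rather than token by token.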


10.2 Intermediate Event Streaming: astream_events()

When executing a complex RAG chain or tool-calling loop, the standard .stream() method yields only chunks of the chain's final output; it tells you nothing about what the intermediate steps are doing.

If your frontend needs to show a spinner stating "Searching vector store..." or "Calculating compound interest...", you need intermediate event data.

The astream_events(version="v2") API is designed for this. It yields a stream of fine-grained events for every step in the pipeline.

The most common event types are:

  • on_chat_model_stream: Fired when model tokens are streaming.
  • on_retriever_end: Fired when a vector store retrieval finishes (contains retrieved documents).
  • on_tool_start / on_tool_end: Fired when tool invocations start and end.

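Let's write a script to consume these events from recipe_generator_chain (section 10.1). An async for loop must run inside a coroutine, so we wrap it in main():

import asyncio

async def main():
    events = recipe_generator_chain.astream_events({"dish": "lasagna"}, version="v2")
    async for event in events:
        event_type = event["event"]

        if event_type == "on_chat_model_stream":
            # Stream each text token as the model produces it
            token = event["data"]["chunk"].content
            print(token, end="", flush=True)

        elif event_type == "on_retriever_end":
            # Capture retrieved sources (fires only if the chain contains a retriever)
            docs = event["data"]["output"]
            print(f"\n[EVENT] Retrieved {len(docs)} document sources.")

        elif event_type == "on_tool_start":
            # Fires only if the chain calls tools
            print(f"\n[EVENT] Executing tool: {event['name']}...")

asyncio.run(main())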
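For the spinner use case, it can help to separate the event-to-message mapping from the streaming loop so it can be tested in isolation. The sketch below is one possible way to do that; status_message is a hypothetical helper, and the sample events are hand-built dictionaries that mimic only the "event", "name", and "data" keys (real events carry more fields).

```python
from typing import Any, Optional


def status_message(event: dict[str, Any]) -> Optional[str]:
    """Map one astream_events-style event dict to a UI status line, or None."""
    kind = event.get("event")
    name = event.get("name", "unknown")
    if kind == "on_tool_start":
        return f"Running tool: {name}..."
    if kind == "on_tool_end":
        return f"Finished tool: {name}"
    if kind == "on_retriever_end":
        docs = event.get("data", {}).get("output") or []
        return f"Retrieved {len(docs)} document(s)"
    return None  # token events and everything else are handled elsewhere


# Hand-built sample events for illustration only.
sample_events = [
    {"event": "on_tool_start", "name": "calculator", "data": {"input": {"x": 2}}},
    {"event": "on_chat_model_stream", "name": "ChatOllama", "data": {}},
    {"event": "on_retriever_end", "name": "vector_store", "data": {"output": ["doc1", "doc2"]}},
]

messages = [m for m in map(status_message, sample_events) if m is not None]
print(messages)
```

Inside a real consumer loop, you would call status_message(event) on each event and forward any non-None result to the frontend.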

10.3 Deploying APIs with LangServe

Once your chain is built, you can expose it using LangServe. LangServe wraps your LCEL runnables in a FastAPI application and generates invoke, batch, and stream endpoints for them, with input and output schemas inferred from the runnable.

Step 1: Install LangServe

pip install "langserve[server]" fastapi uvicorn

Step 2: Define and Serve the Application

Create a file named server.py:

from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama
from langserve import add_routes

# 1. Define the Runnable Chain
model = ChatOllama(model="llama3.2", temperature=0.2)
prompt = ChatPromptTemplate.from_template("Translate the following sentence into {language}:\n\n{text}")
chain = prompt | model | StrOutputParser()

# 2. Initialize FastAPI
app = FastAPI(
    title="LangChain API Server",
    version="1.0",
    description="A production-grade translation server powered by LangChain and LangServe"
)

# 3. Add Routes using LangServe
# This automatically creates /translate/invoke, /translate/stream, /translate/batch, etc.
add_routes(
    app,
    chain,
    path="/translate"
)

if __name__ == "__main__":
    import uvicorn
    # Start the server on http://localhost:8000
    uvicorn.run(app, host="localhost", port=8000)

Step 3: Running and Using the Endpoints

Run the server using your terminal:

python server.py

LangServe creates several endpoints automatically:

  • POST /translate/invoke: Submits a single request and returns the final parsed result.
  • POST /translate/batch: Submits a list of inputs and returns a list of results.
  • POST /translate/stream: Returns a server-sent event stream of output chunks, suitable for streaming text to a frontend client.
  • GET /translate/playground/: An interactive browser UI where you can fill in the input variables and watch the output stream in real time.
  • GET /docs: The OpenAPI documentation page that FastAPI generates for all routes.
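As a rough sketch of what a plain HTTP call to the invoke endpoint looks like, the snippet below builds the request using only the standard library. It assumes the server from server.py is listening on localhost:8000 and that the endpoint takes the chain input under an "input" key and returns the result under "output"; build_invoke_request and call_invoke are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/translate"  # assumed to match server.py


def build_invoke_request(language: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /invoke endpoint."""
    body = json.dumps({"input": {"language": language, "text": text}}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_invoke(language: str, text: str) -> str:
    """Send the request; this requires the server to be running."""
    with urllib.request.urlopen(build_invoke_request(language, text)) as resp:
        return json.loads(resp.read())["output"]


# Inspect the request without contacting the server.
req = build_invoke_request("Italian", "Good morning.")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With the server running, call_invoke("Italian", "Good morning.") would return the translated string.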

10.4 Querying LangServe from a Remote Client

To consume your served runnables in another Python project, LangServe provides a RemoteRunnable client that mirrors the native Runnable interface.

from langserve import RemoteRunnable

# Initialize connection
remote_chain = RemoteRunnable("http://localhost:8000/translate")

# Call it exactly like a local runnable!
result = remote_chain.invoke({
    "language": "Italian",
    "text": "Antigravity makes development effortless."
})

print(result)

This completes the cycle: you write a chain locally, host it on a server with LangServe, and query it from other Python services with the same Runnable methods. Clients that cannot use RemoteRunnable, such as a mobile application, call the REST endpoints directly.
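A client that talks to the /stream endpoint over raw HTTP has to decode the server-sent event stream itself. The sketch below shows one way to do that, assuming the stream carries "metadata", "data", and "end" events with JSON-encoded payloads; that event layout is my understanding of LangServe's format and should be checked against your server's actual output. The iter_stream_chunks helper and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield the decoded payload of each 'data' event from SSE text lines."""
    event_name = None
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line terminates one event.
            if event_name == "end":
                return
            if event_name == "data" and data_lines:
                # Chunks are strings here because the chain ends in StrOutputParser.
                yield json.loads("\n".join(data_lines))
            event_name, data_lines = None, []


# Hand-written sample stream for illustration only.
sample_lines = [
    "event: metadata", 'data: {"run_id": "example-run-id"}', "",
    "event: data", 'data: "Buon"', "",
    "event: data", 'data: "giorno."', "",
    "event: end", "",
]

text = "".join(iter_stream_chunks(sample_lines))
print(text)
```

In a real client, the lines would come from the HTTP response body, read incrementally, instead of a hard-coded list.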


10.5 Course Conclusion

Congratulations! You have completed the LangChain Core & LCEL Course.

Throughout these 10 chapters, you have acquired the skills to:

  1. Navigate the modular modern architecture (langchain-core vs. partner packages).
  2. Deconstruct and compose the Runnable protocol using the pipe operator (|).
  3. Control complex schema architectures using RunnablePassthrough and RunnableParallel.
  4. Inject custom Python algorithms using RunnableLambda and the @chain decorator.
  5. Route inputs dynamically using functional conditionals.
  6. Request schema-validated structured outputs using .with_structured_output() and Pydantic.
  7. Design multi-turn custom tool execution loops using manual ToolMessage handling.
  8. Connect database storage to stateful conversations and write token-trimming rules.
  9. Build RAG pipelines from scratch with source document citation.
  10. Ensure fault tolerance via .with_fallbacks() and trace telemetry via Callbacks.
  11. Stream intermediate events and serve models as APIs using LangServe.

With these primitives, you are equipped to build production-grade, highly optimized, and maintainable AI applications.

Happy Coding!

    Chapter 10: Advanced Runnables & Production Deployment with LangServe — Mastering LangChain: From Basics to Stateful Agents | Krishna Tiwari