Chapter 2: Demystifying LCEL & The Runnable Protocol

At the heart of modern LangChain lies LCEL (LangChain Expression Language). To many developers, LCEL looks like magic because it uses Python's bitwise OR operator (|) to stitch prompts, models, and output parsers together.

However, LCEL is not magic. It is a declarative way to compose objects that implement a single, unified interface: the Runnable Protocol.

In this chapter, we will deconstruct the Runnable protocol, examine how chains are compiled behind the scenes, and write a production-grade asynchronous, streaming pipeline.

2.1 The Runnable Interface

Most core components in modern LangChain (prompt templates, chat models, retrievers, and output parsers) inherit from the base class Runnable (defined in langchain_core.runnables). Embedding models are a notable exception: they expose their own embed_documents and embed_query interface rather than the Runnable methods.

The Runnable protocol guarantees that every such component implements a standard set of methods for both synchronous and asynchronous execution:

Sync Method	Async Counterpart	Input Type	Output Type	Description
`invoke(input)`	`ainvoke(input)`	Dict / String	Output Object	Executes the component on a single input.
`stream(input)`	`astream(input)`	Dict / String	Iterator / AsyncIterator	Yields chunks of the output as they are produced.
`batch(inputs)`	`abatch(inputs)`	List of inputs	List of outputs	Runs multiple inputs concurrently (a thread pool for batch, asyncio for abatch).

By enforcing this design pattern, LangChain components are fully composable. Since every component accepts an input and produces an output using the same protocol, you can chain them together like Lego blocks.
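To make the shape of the interface concrete, here is a minimal pure-Python sketch in which a component implements only invoke and the other five methods are derived from it. ToyRunnable and Shout are hypothetical names invented for this illustration; this is a toy model of the idea, not LangChain's actual implementation.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, AsyncIterator, Iterator, List


class ToyRunnable:
    """Toy stand-in for the Runnable idea; not LangChain's real class."""

    def invoke(self, value: Any) -> Any:
        raise NotImplementedError

    def stream(self, value: Any) -> Iterator[Any]:
        # A component that cannot stream natively yields one final chunk.
        yield self.invoke(value)

    def batch(self, values: List[Any]) -> List[Any]:
        # Run invoke on a thread pool; map() preserves input order.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(self.invoke, values))

    async def ainvoke(self, value: Any) -> Any:
        # Run the synchronous invoke in a worker thread.
        return await asyncio.to_thread(self.invoke, value)

    async def astream(self, value: Any) -> AsyncIterator[Any]:
        yield await self.ainvoke(value)

    async def abatch(self, values: List[Any]) -> List[Any]:
        # Schedule all calls at once; gather() preserves input order.
        return list(await asyncio.gather(*(self.ainvoke(v) for v in values)))


class Shout(ToyRunnable):
    """Hypothetical component: only invoke is written by hand."""

    def invoke(self, value: str) -> str:
        return value.upper() + "!"


shout = Shout()
single = shout.invoke("hello")
chunks = list(shout.stream("hello"))
many = shout.batch(["a", "b"])
async_many = asyncio.run(shout.abatch(["c", "d"]))
```

The point of the sketch is that a component author writes one method and the caller still gets a uniform set of six entry points.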

2.2 How the Pipe Operator (`|`) Works

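In Python, a class can define how its instances behave with the bitwise OR operator (|) by implementing two special methods: __or__, which is called when the instance is the left operand, and __ror__, which is called when the instance is the right operand and the left operand does not handle the operation itself.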

When you write:

chain = prompt | llm | parser

LangChain's base Runnable class handles the operator through these methods. Behind the scenes, it executes code equivalent to:

# Roughly what the two | operators build
from langchain_core.runnables import RunnableSequence

chain = RunnableSequence(first=prompt, middle=[llm], last=parser)

A RunnableSequence is itself a Runnable. When you call chain.invoke(input), the sequence coordinates the data flow:

1. It passes input to prompt.invoke(input).
2. It takes the output (a PromptValue object) and passes it to llm.invoke(PromptValue).
3. It takes the model's output (an AIMessage) and passes it to parser.invoke(AIMessage).
4. It returns the final parser output.

Because the sequence implements the same protocol as its parts, it supports streaming, batching, and asynchronous execution out of the box, and an exception raised by any step propagates to the caller.
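The operator mechanics described in this section can be reproduced in a few lines of plain Python. The sketch below uses hypothetical ToyStep and ToySequence classes and a helper named _steps_of; it only mimics the behavior described above and should not be read as LangChain's source code.

```python
from typing import Any, Callable, List


class ToyStep:
    """Hypothetical toy component that mimics the shape of a Runnable."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "ToySequence":
        # Called for `self | other`.
        return ToySequence(_steps_of(self) + _steps_of(other))

    def __ror__(self, other: Any) -> "ToySequence":
        # Called for `other | self` when `other` does not handle `|`,
        # for example when `other` is a plain function.
        return ToySequence(_steps_of(other) + _steps_of(self))


class ToySequence(ToyStep):
    """Runs its steps in order, feeding each output into the next step."""

    def __init__(self, steps: List[ToyStep]) -> None:
        self.steps = steps

    def invoke(self, value: Any) -> Any:
        for step in self.steps:
            value = step.invoke(value)
        return value


def _steps_of(obj: Any) -> List[ToyStep]:
    # Flatten nested sequences and wrap plain callables as steps.
    if isinstance(obj, ToySequence):
        return list(obj.steps)
    if isinstance(obj, ToyStep):
        return [obj]
    if callable(obj):
        return [ToyStep(obj)]
    raise TypeError(f"Cannot use {type(obj).__name__} as a pipeline step")


# Stand-ins for a prompt, a model, and a parser.
prompt = ToyStep(lambda data: f"Reply to: {data['email']}")
model = ToyStep(lambda text: {"content": text.upper()})
parser = ToyStep(lambda message: message["content"])

chain = prompt | model | parser
result = chain.invoke({"email": "hi"})


def strip_text(text: str) -> str:
    return text.strip()


# A plain function on the left triggers ToyStep.__ror__.
cleaner = strip_text | ToyStep(str.upper)
cleaned = cleaner.invoke("  ok ")
```

Note that chaining three steps yields one flat sequence of three steps rather than a sequence nested inside another, which matches the first/middle/last layout shown earlier.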

2.3 Constructing a Production-Grade Chain

Let's build a clean, real-world example: an automated email response draft generator. The chain needs to:

Accept the customer's email and their sentiment classification.
Inject these values into a system prompt.
Query the LLM.
Parse the output into a clean string.

import asyncio
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama
# Alternatively, you can import ChatOpenAI or ChatGroq
# from langchain_openai import ChatOpenAI

# 1. Instantiate the Chat Model
model = ChatOllama(model="llama3.2", temperature=0.2)

# 2. Formulate the Prompt Template
email_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an executive customer relations specialist. Write a professional email reply. "
               "Tone must match the requested sentiment: {sentiment}."),
    ("human", "Customer Email:\n{email_body}")
])

# 3. Use the standard String Output Parser
# This extracts the `.content` from the AIMessage automatically
output_parser = StrOutputParser()

# 4. Compose the RunnableSequence using LCEL
support_reply_chain = email_prompt | model | output_parser

Now we have a unified support_reply_chain object that behaves as a single Runnable.

2.4 Asynchronous Invocation & Streaming

To show why this interface is powerful, let's write a complete asynchronous execution script. We will run a single request, a batch of requests, and a streaming response.

async def run_examples():
    email = """
    Hi team, I bought your software last week, and it keeps crashing on startup. 
    I need a refund or a fix immediately! This is blocking my work.
    """
    
    # --- Example A: Async Invoke ---
    print("--- Async Invocation ---")
    response = await support_reply_chain.ainvoke({
        "sentiment": "apologetic and helpful",
        "email_body": email
    })
    print(response)
    print("\n" + "="*50 + "\n")

    # --- Example B: Async Real-time Token Streaming ---
    print("--- Async Token Streaming ---")
    inputs = {
        "sentiment": "calm, polite, and offering debugging steps",
        "email_body": email
    }
    
    # We call astream() which yields tokens as they are generated by the LLM
    async for token in support_reply_chain.astream(inputs):
        print(token, end="", flush=True)
    print("\n\n" + "="*50 + "\n")

    # --- Example C: Async Batch Processing ---
    print("--- Async Batch Processing ---")
    # Batch running allows running multiple requests concurrently.
    # LangChain executes these in parallel using asyncio tasks under the hood.
    batch_inputs = [
        {
            "sentiment": "concise and corporate", 
            "email_body": "Just checking in on the status of my order #1029."
        },
        {
            "sentiment": "highly enthusiastic", 
            "email_body": "I love your product! Can I write a blog post about it?"
        }
    ]
    
    results = await support_reply_chain.abatch(batch_inputs)
    
    for idx, reply in enumerate(results):
        print(f"Reply {idx + 1}:\n{reply}")
        print("-" * 25)

# To execute the async workflow
if __name__ == "__main__":
    asyncio.run(run_examples())

2.5 Deconstructing Runnable Batch & Threading

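When you call .batch() or .abatch(), LangChain runs the inputs concurrently rather than one at a time. By default, .batch() dispatches the calls on a thread pool, while .abatch() schedules them on the asyncio event loop. In both cases you can cap the parallelism by passing config={"max_concurrency": n}.

For models that use network APIs (like OpenAI, Groq, or Gemini), calling .abatch() fires the HTTP requests concurrently using async I/O. Because the requests wait on the network at the same time, this is typically much faster than running them in a sequential for loop, subject to the provider's rate limits.
For local models (like Ollama running on your local machine), the batch requests are submitted to the local server at the same time, but the server decides how many it actually processes in parallel, so the speedup depends on its configuration and your hardware.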

This highlights the advantage of LCEL: you write your chain once, and you get synchronous execution, asynchronous execution, streaming, and parallel batching automatically without changing the pipeline definition.
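The benefit of concurrent batching for network-bound calls can be demonstrated with the standard library alone. The following sketch simulates each request with a short sleep and compares a sequential loop against a bounded concurrent fan-out. The fake_api_call function and the 0.2 second delay are assumptions made for the demonstration, and the max_concurrency parameter is simply named after the config key of the same name; this illustrates the general technique rather than LangChain's internals.

```python
import asyncio
import time
from typing import Any, Coroutine, List, Tuple


async def fake_api_call(email: str, delay: float = 0.2) -> str:
    # Hypothetical stand-in for one network-bound model request.
    await asyncio.sleep(delay)
    return f"reply to: {email}"


async def run_sequential(emails: List[str]) -> List[str]:
    # Each call starts only after the previous one has finished.
    return [await fake_api_call(email) for email in emails]


async def run_concurrent(emails: List[str], max_concurrency: int) -> List[str]:
    # A semaphore caps how many calls are in flight at once.
    gate = asyncio.Semaphore(max_concurrency)

    async def guarded(email: str) -> str:
        async with gate:
            return await fake_api_call(email)

    # gather() returns results in input order, whatever order calls finish in.
    return list(await asyncio.gather(*(guarded(email) for email in emails)))


def timed(coro: Coroutine[Any, Any, List[str]]) -> Tuple[List[str], float]:
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


emails = [f"email {i}" for i in range(4)]
seq_result, seq_seconds = timed(run_sequential(emails))
con_result, con_seconds = timed(run_concurrent(emails, max_concurrency=4))
```

Both runs return the same replies in the same order; only the elapsed time differs, because the concurrent version overlaps the waiting periods instead of adding them up.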

2.6 Summary

You now understand:

The core structure of the Runnable protocol and its six key execution methods.
How the Python | operator compiles into a RunnableSequence.
How to assemble prompts, models, and parsers into clean chains.
How to invoke chains asynchronously, stream outputs token-by-token using astream(), and execute parallel workloads using abatch().

In the next chapter, we will expand on LCEL by looking at components that modify the data flow itself: RunnablePassthrough, RunnableParallel, and RunnableLambda. We will see how to manipulate inputs, run sub-chains concurrently, and integrate custom Python code directly into the sequence.