LangChain
Overview
LangChain is an open-source framework for building applications that integrate large language models with external code, data sources, memory systems, and tools. The framework provides a compositional architecture where individual components (models, prompts, retrievers, tools) are connected into chains using a declarative syntax. LangChain enables developers to build complex AI workflows including conversational agents, retrieval-augmented generation (RAG) systems, and multi-step reasoning applications.
The framework addresses the problem of orchestrating LLM capabilities with external systems through a Runnable interface that standardizes component interactions. LangChain supports Python and JavaScript, with extensive community-driven integrations for data sources, vector databases, and AI services.
Key technical components covered:
- Core architecture and package structure
- Runnable interface and execution patterns
- LangChain Expression Language (LCEL)
- Memory management systems
- Agent framework and ReAct pattern
- Vector stores, retrievers, and RAG architecture
- Output parsers and structured data
- LangGraph for stateful workflows
- LangSmith for observability
- Version history and breaking changes
Core Architecture and Package Structure
LangChain’s architecture comprises three core packages:
langchain-core defines base abstractions for LLMs, vector stores, retrievers, and other components. It establishes the interfaces needed for component interoperability while keeping dependencies lightweight, and it provides the Runnable interface as the foundational abstraction for all executable components.
langchain builds on langchain-core with higher-level functionality including chains, agents, and retrieval strategies. Forms the cognitive architecture of applications through composition of core primitives. As of version 0.2.0 (May 2024), this package became integration-agnostic, requiring explicit specification of chat models, LLMs, embedding models, and vector stores.
langchain-community contains third-party integrations maintained by the community. Includes 500+ connectors for components like LLMs (OpenAI, Anthropic, HuggingFace), vector stores (Pinecone, Weaviate, Chroma), and various tools and data sources.
Integration packages like langchain-openai and langchain-pinecone provide lightweight adapters wrapping provider APIs and exposing them as LangChain components.
Runnable Interface and Execution Patterns
The Runnable interface serves as the foundational abstraction enabling consistent interaction with all LangChain components. Each Runnable transforms inputs into outputs through standardized methods:
invoke processes a single input synchronously, returning the corresponding output.
ainvoke asynchronously processes a single input, returning an awaitable output.
batch efficiently processes multiple inputs in parallel using a thread pool executor, returning a list of outputs in the same order as inputs. The default implementation is effective for I/O-bound operations, though Python’s Global Interpreter Lock limits true parallelism for CPU-bound tasks. Some Runnables provide optimized batch implementations.
stream yields output chunks as they are produced, enabling real-time processing and faster time-to-first-token.
streamEvents (astream_events in Python) provides advanced streaming of intermediate steps and final output.
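To make these method contracts concrete, here is a minimal pure-Python sketch. ToyRunnable is a hypothetical class written for illustration and is not LangChain's implementation; it only mirrors the semantics described above: invoke handles one input, batch fans out over a thread pool while preserving input order, and stream yields chunks as they become available.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List


class ToyRunnable:
    """Hypothetical stand-in for a Runnable; wraps a str -> str function."""

    def __init__(self, func: Callable[[str], str]):
        self.func = func

    def invoke(self, value: str) -> str:
        # One input in, one output out, synchronously.
        return self.func(value)

    def batch(self, values: List[str], max_concurrency: int = 4) -> List[str]:
        # executor.map returns results in input order, even if work finishes out of order.
        with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
            return list(executor.map(self.invoke, values))

    def stream(self, value: str) -> Iterator[str]:
        # Yield the output word by word to imitate incremental chunks.
        for word in self.invoke(value).split():
            yield word


shout = ToyRunnable(lambda text: text.upper())
single = shout.invoke("hello world")
many = shout.batch(["a", "b", "c"])
chunks = list(shout.stream("hello world"))
```

The thread pool helps when each call waits on I/O, such as a network request to a model provider; for CPU-bound functions the Global Interpreter Lock would serialize most of the work.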
Each Runnable defines specific input and output types:
- Prompt: Input object, Output PromptValue
- ChatModel: Input string/messages/PromptValue, Output ChatMessage
- LLM: Input string/messages/PromptValue, Output string
- OutputParser: Input LLM/ChatModel output, Output varies by parser
- Retriever: Input string, Output list of Documents
- Tool: Input string/object, Output varies by tool
RunnableConfig enables runtime customization with parameters including runName (custom run identifier), runId (unique identifier), tags (for filtering), metadata (tracking data), callbacks (lifecycle hooks), maxConcurrency (batch parallelism limit), and recursionLimit (prevents infinite recursion). These are the JavaScript names; the Python API uses the snake_case equivalents (run_name, run_id, max_concurrency, recursion_limit).
RunnableLambda wraps arbitrary functions into Runnables for simple transformations. RunnableGenerator handles more complex transformations when streaming is required.
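The difference between a function-style and a generator-style transformation can be shown without LangChain at all. In the sketch below, all names are hypothetical: strip_all must see the whole input before it can return, while upper_stream passes each chunk along as soon as it arrives, which is the property that matters when a chain is streamed.

```python
from typing import Iterable, Iterator


def strip_all(chunks: Iterable[str]) -> str:
    # Function-style transform: must consume every chunk before returning anything.
    return "".join(chunks).strip()


def upper_stream(chunks: Iterable[str]) -> Iterator[str]:
    # Generator-style transform: emits each chunk as soon as it is received.
    for chunk in chunks:
        yield chunk.upper()


def fake_model_stream() -> Iterator[str]:
    # Hypothetical token stream standing in for a model's streamed output.
    yield from ["lang", "chain ", "rocks"]


streamed = list(upper_stream(fake_model_stream()))
joined = strip_all(fake_model_stream())
```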
LangChain Expression Language (LCEL)
LCEL provides declarative syntax for composing components into chains using Python operators. The design enables LangChain to optimize execution automatically without requiring developers to specify implementation details.
Pipe operator (|) creates RunnableSequence for sequential composition:
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
prompt = PromptTemplate.from_template("Tell me a joke about {topic}")
model = ChatOpenAI(model="gpt-4")
chain = prompt | model
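result = chain.invoke({"topic": "programming"})
RunnableParallel runs several Runnables on the same input concurrently and returns a dictionary of their outputs; inside a chain, a dictionary literal is coerced to a RunnableParallel automatically: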
from langchain_core.runnables import RunnableParallel, RunnableLambda
parallel_chain = RunnableParallel(
    uppercase=RunnableLambda(lambda x: x.upper()),
    lowercase=RunnableLambda(lambda x: x.lower())
)
result = parallel_chain.invoke("Hello World")
# Output: {'uppercase': 'HELLO WORLD', 'lowercase': 'hello world'}
LCEL supports asynchronous execution through the Runnable Async API, enabling efficient handling of concurrent requests. Chains can stream outputs incrementally, providing faster response times in real-time applications. The declarative nature allows LangChain to apply optimizations like automatic batching and parallel execution where appropriate.
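How an operator can build a pipeline is easier to see in a toy version. The sketch below is an assumption-laden illustration, not LangChain's RunnableSequence or RunnableParallel: Step and parallel are hypothetical names, and fake_model stands in for a real model call. It overloads | to chain steps and runs named branches on the same input through a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


class Step:
    """Hypothetical composable step; not LangChain's RunnableSequence."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**branches: Step) -> Step:
    # Run every branch on the same input and collect results by key.
    def run(value: Any) -> Dict[str, Any]:
        with ThreadPoolExecutor() as executor:
            futures = {key: executor.submit(step.invoke, value) for key, step in branches.items()}
            return {key: future.result() for key, future in futures.items()}

    return Step(run)


format_prompt = Step(lambda topic: f"Tell me a joke about {topic}")
fake_model = Step(lambda prompt: f"[model reply to: {prompt}]")  # stand-in for an LLM call
chain = format_prompt | fake_model | parallel(
    text=Step(lambda reply: reply),
    length=Step(len),
)
result = chain.invoke("programming")
```

Because the chain is described as data (a sequence of steps) rather than as imperative calls, a framework is free to decide how each step is scheduled.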
Memory Management Systems
LangChain provides multiple memory implementations for maintaining conversation context:
ConversationBufferMemory stores complete conversation history. Maintains all exchanges between user and AI, providing full context but consuming increasing tokens as conversations lengthen.
ConversationBufferWindowMemory implements a sliding window mechanism retaining only the most recent k exchanges. When the buffer exceeds k exchanges, the oldest messages are automatically discarded. The save_context() method stores new exchanges, while load_memory_variables() retrieves the current buffer as conversation history for the LLM.
from langchain.memory import ConversationBufferWindowMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
memory = ConversationBufferWindowMemory(k=3)
llm = ChatOpenAI(model="gpt-3.5-turbo")
conversation = ConversationChain(llm=llm, memory=memory)
ConversationSummaryMemory periodically summarizes conversation history to reduce token usage while maintaining context. Uses an LLM to generate summaries of past exchanges.
ConversationSummaryBufferMemory combines buffer and summary approaches, keeping recent messages verbatim while summarizing older exchanges.
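The combined buffer-plus-summary idea can be sketched in plain Python. Everything here is hypothetical: ToySummaryBuffer is not the LangChain class, the budget is counted in words rather than tokens, and summarize_stub replaces the LLM call that a real implementation would make.

```python
from typing import Callable, List


def summarize_stub(previous_summary: str, dropped: List[str]) -> str:
    # Hypothetical summarizer; a real system would call an LLM here.
    note = f"{len(dropped)} earlier message(s) condensed"
    return f"{previous_summary}; {note}" if previous_summary else note


class ToySummaryBuffer:
    """Keeps recent messages verbatim and folds evicted ones into a summary."""

    def __init__(self, max_words: int, summarize: Callable[[str, List[str]], str]):
        self.max_words = max_words
        self.summarize = summarize
        self.summary = ""
        self.recent: List[str] = []

    def add(self, message: str) -> None:
        self.recent.append(message)
        dropped: List[str] = []
        # Evict the oldest messages until the verbatim buffer fits the word budget.
        while sum(len(m.split()) for m in self.recent) > self.max_words and len(self.recent) > 1:
            dropped.append(self.recent.pop(0))
        if dropped:
            self.summary = self.summarize(self.summary, dropped)


memory = ToySummaryBuffer(max_words=6, summarize=summarize_stub)
for message in ["Hi I am Ada", "Hello Ada", "What is LCEL"]:
    memory.add(message)
```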
Memory systems integrate with chains through the memory parameter, automatically managing context loading and saving during chain execution.
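The sliding-window behavior described for ConversationBufferWindowMemory can be illustrated with a bounded deque. This is a sketch under simplifying assumptions: WindowMemory is a hypothetical class, and its save_context and load_memory_variables methods take plain strings rather than the dictionaries the LangChain methods of the same name work with.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WindowMemory:
    """Hypothetical sliding-window buffer; keeps only the last k exchanges."""

    def __init__(self, k: int):
        # deque(maxlen=k) silently drops the oldest item once k is exceeded.
        self.exchanges: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.exchanges.append((user_input, ai_output))

    def load_memory_variables(self) -> Dict[str, str]:
        lines = []
        for user_input, ai_output in self.exchanges:
            lines.append(f"Human: {user_input}")
            lines.append(f"AI: {ai_output}")
        return {"history": "\n".join(lines)}


memory = WindowMemory(k=2)
memory.save_context("Hi, I'm Ada.", "Hello Ada!")
memory.save_context("What is LCEL?", "A composition syntax.")
memory.save_context("Thanks.", "You're welcome.")
history = memory.load_memory_variables()["history"]
```

With k=2, the first exchange has already been discarded by the time the third is saved, so the model would no longer see the user's name.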
Agent Framework and ReAct Pattern
LangChain agents follow the ReAct (Reasoning + Acting) pattern, alternating between reasoning steps and tool executions. Agents reason about tasks, decide on necessary actions, invoke tools, observe results, and iteratively refine responses.
Agent components:
Model serves as the reasoning engine, typically an LLM supporting function calling.
Tools are external functions or APIs the agent can invoke. Defined using the @tool decorator:
from langchain_core.tools import tool
@tool
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"
Prompt template structures the agent’s reasoning and action process with placeholders for tools, tool names, and agent scratchpad tracking thoughts and actions.
Implementation uses create_react_agent:
from langchain import hub
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
# One option: pull a published ReAct prompt that exposes {tools}, {tool_names}, and {agent_scratchpad}
prompt = hub.pull("hwchase17/react")
model = ChatOpenAI(model="gpt-4", temperature=0.1)
# `search` is the @tool-decorated function defined above
agent = create_react_agent(model, tools=[search], prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=[search], verbose=True)
response = agent_executor.invoke({"input": "What is the capital of France?"})
The agent iteratively executes Thought → Action → Observation cycles until reaching a final answer or meeting termination conditions.
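The control flow of that loop can be sketched with a scripted stand-in for the model. Nothing below is LangChain code: TOOLS, scripted_model, and run_agent are hypothetical names, and the "model" simply requests one tool call and then answers from the observation. The point is the loop shape and the iteration cap as a termination condition.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; a real agent would expose @tool-decorated functions.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: "Paris" if "France" in query else "unknown",
}


def scripted_model(question: str, scratchpad: List[Tuple[str, str, str]]) -> Dict[str, str]:
    # Stand-in for an LLM: first asks for a tool, then answers from the observation.
    if not scratchpad:
        return {"thought": "I should look this up.", "action": "search", "input": question}
    last_observation = scratchpad[-1][2]
    return {"thought": "I have what I need.", "final": f"The answer is {last_observation}."}


def run_agent(question: str, max_iterations: int = 5) -> str:
    scratchpad: List[Tuple[str, str, str]] = []  # (thought, action, observation)
    for _ in range(max_iterations):
        step = scripted_model(question, scratchpad)
        if "final" in step:
            return step["final"]
        observation = TOOLS[step["action"]](step["input"])
        scratchpad.append((step["thought"], step["action"], observation))
    return "Stopped: iteration limit reached."


answer = run_agent("What is the capital of France?")
capped = run_agent("What is the capital of France?", max_iterations=1)
```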
Vector Stores, Retrievers, and RAG Architecture
LangChain’s RAG (Retrieval-Augmented Generation) architecture combines LLM generation with external knowledge retrieval through vector similarity search.
Embeddings are fixed-length numerical vectors representing semantic content of text. LangChain integrates with embedding models from OpenAI, Cohere, HuggingFace, and others. Embedding models convert text into vector representations enabling semantic similarity searches.
Vector stores are specialized databases storing embeddings with efficient similarity search capabilities. LangChain supports Elasticsearch, MongoDB Atlas, Pinecone, Weaviate, Chroma, Qdrant, and others. Each vector store implementation provides methods for storing document embeddings with metadata and performing similarity searches.
Retrievers interface with vector stores to retrieve relevant documents based on queries. Created from vector stores using .as_retriever() method. Retrievers abstract the logic of document fetching and support various retrieval strategies including similarity search, MMR (Maximum Marginal Relevance), and hybrid retrieval combining sparse and dense search techniques.
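To show how plain similarity search and MMR can return different results, here is a small self-contained sketch. It is not a vector store implementation: the 2-D vectors are hand-made stand-ins for embeddings, lambda_mult is an assumed name for the relevance/diversity trade-off, and the greedy MMR scoring rule used is the textbook form (relevance to the query minus similarity to documents already selected).

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query: Sequence[float], docs: List[Sequence[float]], k: int) -> List[int]:
    # Rank document indices by cosine similarity to the query.
    ranked = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return ranked[:k]


def mmr_search(query: Sequence[float], docs: List[Sequence[float]], k: int,
               lambda_mult: float = 0.5) -> List[int]:
    # Greedy MMR: reward relevance to the query, penalize similarity to already-picked docs.
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * cosine(query, docs[i]) - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hand-made 2-D vectors standing in for real embeddings.
doc_vectors = [(1.0, 0.0), (0.9, 0.1), (0.6, 0.8)]
query_vector = (1.0, 0.0)
top_similar = similarity_search(query_vector, doc_vectors, k=2)
top_diverse = mmr_search(query_vector, doc_vectors, k=2, lambda_mult=0.3)
```

Plain similarity returns the two near-duplicate vectors, while MMR with a low lambda_mult swaps the second one for a less redundant document.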
RAG workflow:
- Document processing: Raw documents are loaded and split into chunks
- Embedding generation: Each chunk is converted to vector embedding
- Storage: Embeddings with metadata stored in vector store
- Retrieval: Query converted to embedding, similar documents retrieved via similarity search
- Augmentation: Retrieved documents augment the original query
- Generation: LLM generates response using retrieved context
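The steps above can be sketched end to end without external services. This is a toy under stated assumptions: embed is a bag-of-words counter rather than an embedding model, the "vector store" is a Python list, chunks are split on sentence boundaries, and the generation step is left as a prompt string that a real application would send to an LLM.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


raw_document = (
    "LangChain composes models and tools into chains. "
    "Chroma is a vector store that indexes embeddings. "
    "Retrievers fetch the chunks most similar to a query."
)

# Steps 1-3: split into chunks, embed each chunk, store vectors next to the text.
chunks = [sentence.strip() for sentence in raw_document.split(".") if sentence.strip()]
store: List[Tuple[Counter, str]] = [(embed(chunk), chunk) for chunk in chunks]


def retrieve(query: str, k: int = 1) -> List[str]:
    # Step 4: embed the query and rank stored chunks by similarity.
    query_vector = embed(query)
    ranked = sorted(store, key=lambda item: similarity(query_vector, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


# Step 5: augment the question with retrieved context (step 6 would send this to an LLM).
question = "Which vector store indexes embeddings?"
context = retrieve(question, k=1)
augmented_prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
```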
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
# Create embeddings and store
embeddings = OpenAIEmbeddings()
documents = [Document(page_content="...", metadata={...})]
vectorstore = Chroma.from_documents(documents, embeddings)
# Retrieve and generate
retriever = vectorstore.as_retriever()
docs = retriever.invoke("query")
Output Parsers and Structured Data
LangChain provides multiple approaches for parsing LLM outputs into structured formats using Pydantic models.
with_structured_output() method works with models natively supporting structured outputs (function/tool calling):
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
class ResponseFormatter(BaseModel):
    answer: str = Field(description="Main answer")
    confidence: float = Field(description="Confidence score 0-1")
    sources: list[str] = Field(description="Reference sources")
model = ChatOpenAI(model="gpt-4")
structured_llm = model.with_structured_output(ResponseFormatter)
response = structured_llm.invoke("What is LangChain?")
PydanticOutputParser works with models not natively supporting structured outputs. Uses prompt templates to instruct models to generate specific formats:
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
class Person(BaseModel):
    name: str = Field(..., description="Person's name")
    height_in_meters: float = Field(..., description="Height in meters")
parser = PydanticOutputParser(pydantic_object=Person)
# The parser supplies the format instructions that are injected into the system message
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the query. Wrap output in `json` tags\n{format_instructions}"),
    ("human", "{query}"),
]).partial(format_instructions=parser.get_format_instructions())
model = ChatOpenAI(model="gpt-4")
chain = prompt | model | parser
StructuredOutputParser parses outputs using defined response schemas supporting multiple fields with type specifications.
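What a prompt-based parser has to do after the model replies can be sketched with the standard library alone. The example below is an assumption-heavy illustration, not PydanticOutputParser: it uses a dataclass instead of a Pydantic model, parse_person is a hypothetical helper, and validation is reduced to a type check and a float conversion.

```python
import json
import re
from dataclasses import dataclass

FENCE = "`" * 3  # three backticks, built indirectly to keep this listing readable
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)\s*" + FENCE, re.DOTALL)


@dataclass
class Person:
    name: str
    height_in_meters: float


def parse_person(model_text: str) -> Person:
    # Accept bare JSON or JSON wrapped in a fenced json block.
    match = JSON_BLOCK.search(model_text)
    payload = json.loads(match.group(1) if match else model_text)
    if not isinstance(payload.get("name"), str):
        raise ValueError("name must be a string")
    # float() coerces ints and numeric strings, and raises on anything else.
    return Person(name=payload["name"], height_in_meters=float(payload["height_in_meters"]))


raw_reply = "Here you go:\n" + FENCE + 'json\n{"name": "Ada", "height_in_meters": 1.7}\n' + FENCE
person = parse_person(raw_reply)
bare = parse_person('{"name": "Bo", "height_in_meters": 2}')
```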
LangGraph: Stateful Workflows
LangGraph extends LangChain for building stateful multi-actor applications by modeling workflows as graphs with nodes and edges. Enables features including human-in-the-loop, memory, time travel, and fault tolerance.
Threads assign each workflow execution a unique thread_id representing a distinct session. Enables tracking and managing state across multiple interactions.
Checkpoints are snapshots of state saved at each super-step during graph execution. Each checkpoint includes config, metadata, values (current state channel values), next (nodes scheduled for next execution), and tasks (task information including errors/interrupts).
Checkpointer implementations:
- InMemorySaver: Stores checkpoints in memory for experimentation
- SqliteSaver: Saves checkpoints to SQLite database for local workflows
- PostgresSaver: Uses PostgreSQL for production environments
Each checkpointer implements .put (store checkpoints), .get_tuple (fetch checkpoint), and .list (list checkpoints by criteria).
Serializers convert state into storable format. Default JsonPlusSerializer handles LangChain/LangGraph primitives, datetimes, and enums, with an optional fallback to pickle (enabled through its pickle_fallback argument) for unsupported objects.
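The general shape of a JSON-first serializer with a pickle fallback might look like the sketch below. It is not JsonPlusSerializer: serialize_state and the tagged-dictionary encoding are hypothetical, chosen only to show how a type tag lets the reader pick the right decoder later.

```python
import json
import pickle
from datetime import datetime
from enum import Enum
from typing import Any, Tuple


class Color(Enum):
    RED = "red"


def _encode_special(obj: Any) -> Any:
    # Called by json.dumps for objects it cannot encode natively.
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    if isinstance(obj, Enum):
        return {"__type__": "enum", "value": obj.value}
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def serialize_state(state: Any) -> Tuple[str, bytes]:
    # Try JSON first; fall back to pickle when some value is not representable.
    try:
        return "json", json.dumps(state, default=_encode_special).encode("utf-8")
    except TypeError:
        return "pickle", pickle.dumps(state)


json_kind, _ = serialize_state({"when": datetime(2024, 5, 1), "color": Color.RED})
pickle_kind, blob = serialize_state({"tags": {1, 2}})  # sets are not handled by the JSON path
```

Unpickling executes arbitrary code, so a fallback of this kind is only appropriate for checkpoints that come from a trusted source.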
Checkpointing enables human-in-the-loop (inspect, interrupt, approve steps), memory (retain context across interactions), time travel (replay executions, fork at arbitrary checkpoints), and fault tolerance (resume from last successful checkpoint).
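A toy checkpointer makes the fault-tolerance and time-travel claims easier to follow. The sketch is not LangGraph code: ToyCheckpointer, its put/get/history methods, and the three-node NODES list are hypothetical, and the "graph" is just a list of node names executed in order with a snapshot after each step.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional

NODES = ["draft", "review", "publish"]  # hypothetical graph executed in order


class ToyCheckpointer:
    """Hypothetical in-memory store; one ordered checkpoint list per thread_id."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[Dict[str, Any]]] = {}

    def put(self, thread_id: str, values: Dict[str, Any], next_nodes: List[str]) -> int:
        history = self._threads.setdefault(thread_id, [])
        history.append({"values": deepcopy(values), "next": list(next_nodes)})
        return len(history) - 1  # checkpoint index within the thread

    def get(self, thread_id: str, index: Optional[int] = None) -> Optional[Dict[str, Any]]:
        history = self._threads.get(thread_id, [])
        if not history:
            return None
        return deepcopy(history[-1 if index is None else index])

    def history(self, thread_id: str) -> List[Dict[str, Any]]:
        return deepcopy(self._threads.get(thread_id, []))


saver = ToyCheckpointer()


def run(thread_id: str, fail_at: Optional[str] = None) -> Dict[str, Any]:
    # Resume from the latest checkpoint of this thread, or start fresh.
    saved = saver.get(thread_id)
    state = saved["values"] if saved else {"done": []}
    pending = saved["next"] if saved else list(NODES)
    while pending:
        node = pending[0]
        if node == fail_at:
            raise RuntimeError(f"{node} failed")
        state["done"].append(node)
        pending = pending[1:]
        saver.put(thread_id, state, pending)  # snapshot after every step
    return state


try:
    run("thread-1", fail_at="review")
except RuntimeError:
    pass  # the checkpoint written after "draft" survives the failure
final_state = run("thread-1")  # resumes at "review" instead of redoing "draft"
first_snapshot = saver.get("thread-1", 0)  # time travel: inspect an earlier checkpoint
```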
LangSmith: Observability and Monitoring
LangSmith provides tracing, evaluation, and monitoring for LLM applications through a unified architecture.
Tracing captures inputs, outputs, and internal states at each step using the @traceable decorator. Execution data is organized into hierarchical run trees visualizing operation sequences and relationships.
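How a decorator can assemble a run tree is sketched below in plain Python. toy_traceable is a hypothetical decorator written for illustration; it is not LangSmith's @traceable, sends nothing over the network, and uses a module-level stack that is neither thread-safe nor async-aware.

```python
import functools
from typing import Any, Callable, Dict, List

ROOT_RUNS: List[Dict[str, Any]] = []  # completed or in-progress top-level runs
_stack: List[Dict[str, Any]] = []     # runs currently executing, innermost last


def toy_traceable(func: Callable) -> Callable:
    """Hypothetical decorator that records inputs, outputs, and nesting."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        run = {"name": func.__name__, "inputs": args, "children": [], "output": None}
        # Attach to the parent run if one is active, otherwise start a new tree.
        (_stack[-1]["children"] if _stack else ROOT_RUNS).append(run)
        _stack.append(run)
        try:
            run["output"] = func(*args, **kwargs)
            return run["output"]
        finally:
            _stack.pop()

    return wrapper


@toy_traceable
def retrieve(query: str) -> List[str]:
    return [f"doc about {query}"]


@toy_traceable
def answer(query: str) -> str:
    docs = retrieve(query)
    return f"Based on {len(docs)} document(s): ..."


reply = answer("LCEL")
```

After the call, ROOT_RUNS holds one tree whose root is the answer run and whose single child is the nested retrieve run.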
Evaluation creates datasets from production traces or manual uploads. Supports experiments running over datasets to assess model performance. Provides LLM-as-Judge evaluators using language models to assess outputs and heuristic evaluators applying deterministic metrics.
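A heuristic evaluator run over a small dataset can be sketched as follows. This does not use the LangSmith SDK: the dataset rows, the target function, and run_experiment are all hypothetical, and exact_match is one simple example of a deterministic metric.

```python
from typing import Callable, Dict, List

dataset: List[Dict[str, str]] = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "largest planet", "expected": "Jupiter"},
]


def target(question: str) -> str:
    # Hypothetical application under test; answers two of three questions correctly.
    canned = {"2 + 2": "4", "capital of France": "paris", "largest planet": "Saturn"}
    return canned[question]


def exact_match(output: str, expected: str) -> float:
    # Deterministic heuristic evaluator: case-insensitive string equality.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_experiment(app: Callable[[str], str], evaluator: Callable[[str, str], float]) -> float:
    # Score every dataset row and report the mean.
    scores = [evaluator(app(row["input"]), row["expected"]) for row in dataset]
    return sum(scores) / len(scores)


accuracy = run_experiment(target, exact_match)
```

An LLM-as-Judge evaluator would have the same signature as exact_match but would ask a model to grade the output instead of comparing strings.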
Monitoring provides real-time dashboards displaying response times, token usage, error rates, and cost estimations. Supports alerts for anomalies and automations triggering actions like sending data to annotation queues or running online evaluations.
Setup:
pip install "langsmith[otel]" langchain
export LANGSMITH_OTEL_ENABLED=true
export LANGSMITH_TRACING=true
export LANGSMITH_ENDPOINT=https://api.smith.langchain.com
export LANGSMITH_API_KEY=<your_api_key>
LangSmith seamlessly integrates with LangChain and LangGraph, providing observability without significant code modifications.
Version History and Breaking Changes
Version 0.1.0 (January 2024) marked certain classes and methods for removal in 0.2.0.
Version 0.2.0 (May 2024) made the langchain package integration-agnostic: functions and classes that previously fell back to a default model or store now require the LLM, chat model, embedding model, or vector store to be passed explicitly. The @tool decorator’s behavior changed to assign the function docstring as the tool description. Migration required installing the 0.2.x versions, verifying code, using langchain-cli to update imports, resolving deprecation warnings, and migrating astream_events to version 2.
Version 0.3.0 (September 2024) upgraded all packages from Pydantic 1 to Pydantic 2, eliminating compatibility bridges. Dropped Python 3.8 support. In JavaScript, @langchain/core became a peer dependency requiring explicit installation, callbacks became non-blocking by default requiring await in serverless environments, and deprecated document loaders and Google PaLM entry points were removed.
Version 1.0.0 (October 2025) dropped Python 3.9 support, requiring Python 3.10 or higher. Legacy code outside the standard interfaces and agents moved to the langchain-classic package. Pre-bound models are no longer supported. Import paths for agents and related components changed. Migration requires upgrading to Python 3.10+, installing langchain-classic for legacy functionality, and updating import paths.