Notebooks contain complete working sample code for end-to-end solutions.
Learn how to build an end-to-end document processing pipeline that processes PDFs from S3 and stores structured results in MongoDB. Features VLM-powered partitioning, semantic chunking, and vector embeddings using the Unstructured Workflows API.
Unstructured API
Workflows
S3
MongoDB
VLM
Embeddings
Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data from your Azure Blob Storage into your IBM watsonx.data instance.
Unstructured API
Workflows
Azure Blob Storage
IBM watsonx.data
Use Snowflake Cortex and RAG to do natural-language searches across a Snowflake table that contains data provided by Unstructured. Additional Snowflake Cortex functions are also explored.
Unstructured API
Snowflake Cortex
RAG Search
Workflows
S3
Build Agentic RAG with LangGraph and Together AI, and compare the results with Vanilla RAG in pure Python
Unstructured API
Workflows
Agents
LangGraph
Together AI
Astra DB
Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data from your Azure Blob Storage into your Snowflake table.
Unstructured API
Workflows
Azure Blob Storage
Snowflake
Learn how to use the Unstructured API to create a Graph RAG-based workflow that writes data with named entity recognition (NER) to your Astra DB.
Unstructured API
Workflows
Graph RAG
NER
Astra DB
Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data into your Delta Table.
Unstructured API
Workflows
Databricks
S3
Crawl websites with Firecrawl and build a RAG workflow powered by Unstructured and MongoDB Atlas vector search.
Unstructured API
Workflows
MongoDB
Build an end-to-end workflow in Unstructured programmatically by using the Unstructured Workflow Endpoint.
Unstructured API
Workflows
S3
Build RAG with Databricks Vector Search with context preprocessed from multiple sources by Unstructured.
Databricks
Introductory notebook
Build Agentic RAG with the smolagents library and compare the results with Vanilla RAG in pure Python
GPT-4o
smolagents
Agents
DataStax
S3
Advanced notebook
Evaluate Llama3.2 for your RAG system with Unstructured, GPT-4o, Ragas, and LangChain
GPT-4o
Ragas
LangChain
Llama3.2
Pinecone
S3
Advanced notebook
Process a file in S3 with Unstructured and return images in your RAG output
S3
FAISS
GPT-4o-mini
Advanced notebook
From Pixels to Insights: Seamlessly Extracting and Visualizing Table Data with Unstructured and Hex
Unstructured API
Hex
Advanced notebook
Remove Personally Identifiable Information (PII) as a part of unstructured data preprocessing.
Unstructured API
PII
GLiNER
Advanced notebook
Extract custom metadata, and enable metadata pre-filtering in your RAG.
Unstructured API
MongoDB
Metadata
Advanced notebook
End-to-end data processing pipeline using Unstructured Serverless API.
Unstructured API
Hugging Face
Advanced notebook
A RAG system with the Llama 3 model from Hugging Face.
Unstructured API
🤗 Hugging Face
LangChain
Llama 3
Introductory notebook
Learn to ingest, partition, chunk, embed and load data from an S3 bucket into SingleStore DB.
Unstructured API
SingleStoreDB
AWS S3
Introductory notebook
Embed your Google Drive Docs in an Astra Vector Database with Unstructured Serverless API
Unstructured API
Google
DataStax
Introductory notebook
Embed your local documents in a Weaviate Vector Database with Unstructured Serverless API
Unstructured API
OpenAI
Weaviate
Introductory notebook
Ingest PDF documents from an S3 bucket, transform them into normalized JSON with Unstructured Serverless API, then chunk, embed, and load them into Elasticsearch.
Unstructured API
AWS S3
Elasticsearch
Introductory notebook
Preprocess documents from Google Drive with Unstructured Serverless API and load them into a Databricks Volume.
Unstructured API
Google Drive
Databricks
Introductory notebook
Add document source references to RAG responses based on document metadata.
Unstructured API
RAG
LangChain
Intermediate notebook
Send a PDF to Unstructured for processing, and send a subset of the returned PDF’s processed text to HuggingChat for chatbot-style querying.
Unstructured API
🤗 Hugging Face
🤗 HuggingChat
Introductory notebook
Build a local RAG app for your emails with Unstructured, LangChain and Ollama.
Unstructured API
LangChain
Ollama
Llama 3
Introductory notebook
A RAG solution that is based on PowerPoint files.
Unstructured API
🤗 Hugging Face
LangChain
Llama 3
Introductory notebook
Build a Synthetic Test Dataset for your RAG system in 5 easy steps
Unstructured API
GPT-4o
Ragas
LangChain
Advanced notebook