The Unstructured user interface (UI) is a no-code, pay-as-you-go platform for transforming your unstructured data into data that is ready for Retrieval Augmented Generation (RAG).
Here is a screenshot of the Unstructured UI Start page:
This 90-second video provides a brief overview of the Unstructured UI:
To get your data RAG-ready, Unstructured moves it through the following process:
Connect
Unstructured offers multiple source connectors to connect to your data in its existing location.
Route
Routing determines which strategy Unstructured uses to transform your documents into Unstructured’s canonical JSON schema. Unstructured provides four partitioning strategies for document transformation, as follows.
Unstructured recommends that you choose the Auto partitioning strategy in most cases. With Auto, Unstructured does all the heavy lifting, optimizing page by page at runtime for the highest quality at the lowest cost.
You should consider the following additional strategies only if you are absolutely sure that your documents are of the same type. Each of the following strategies is best suited for specific situations. Choosing a strategy other than Auto for sets of documents of different types could produce undesirable results, including reduced transformation quality.
.bmp, .gif, .heic, .jpeg, .jpg, .pdf, .png, .tiff, and .webp.
Transform
Your source document is transformed into Unstructured’s canonical JSON schema. Regardless of the input document, this JSON schema gives you a standardized output. It contains more than 20 elements, such as Header, Footer, Title, NarrativeText, Table, Image, and many more. Each document is wrapped in extensive metadata so you can understand languages, file types, sources, hierarchies, and much more.
Chunk
Unstructured provides these chunking strategies:
Enrich
Images and tables can be optionally summarized. This generates enriched content around the images or tables that were parsed during the transformation process.
Embed
Unstructured uses optional third-party embedding providers such as OpenAI.
Persist
Unstructured offers multiple destination connectors, including all major vector databases.
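To make the Transform step more concrete, here is a minimal Python sketch of how the standardized output might be inspected once it has been written to a destination. It assumes each element is a JSON object with `type`, `text`, and `metadata` fields; those field names, the sample values, and the helper functions `count_by_type` and `narrative_text` are illustrative assumptions, not taken from this page.

```python
from collections import Counter

# Illustrative sample only: hand-written to resemble the kind of output
# described under Transform. The field names "type", "text", and "metadata"
# are assumptions made for this sketch.
sample_elements = [
    {"type": "Title", "text": "Quarterly Report",
     "metadata": {"filetype": "application/pdf", "languages": ["eng"]}},
    {"type": "NarrativeText", "text": "Revenue grew in the third quarter.",
     "metadata": {"filetype": "application/pdf", "languages": ["eng"]}},
    {"type": "Table", "text": "Region Revenue North 10 South 12",
     "metadata": {"filetype": "application/pdf", "languages": ["eng"]}},
    {"type": "NarrativeText", "text": "Costs stayed flat.",
     "metadata": {"filetype": "application/pdf", "languages": ["eng"]}},
]


def count_by_type(elements):
    """Count how many elements of each type appear in the output."""
    return Counter(element["type"] for element in elements)


def narrative_text(elements):
    """Join the text of all NarrativeText elements, in document order."""
    return "\n".join(
        element["text"]
        for element in elements
        if element["type"] == "NarrativeText"
    )


counts = count_by_type(sample_elements)
print(dict(counts))
print(narrative_text(sample_elements))
```

Because every input document is described as producing the same kind of element list, a helper like this would not need to change between file types.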
To simplify this process and provide it as a no-code solution, Unstructured brings together these key concepts:
Source Connectors
Source connectors ingest your data into Unstructured for transformation.
Destination Connectors
Destination connectors tell Unstructured where to write your transformed data.
Workflow
A workflow connects sources to destinations and provides chunking, embedding, and scheduling options.
Jobs
Jobs enable you to monitor data transformation progress.
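As a mental model only, the relationship between these four concepts can be sketched as a small set of Python data classes. The UI itself requires no code; every class, field, and status name below is hypothetical and chosen for illustration, not drawn from any Unstructured API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobStatus(Enum):
    """Hypothetical progress states for a job."""
    SCHEDULED = "scheduled"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


@dataclass
class Connector:
    """A source or destination connector (hypothetical shape)."""
    name: str
    kind: str  # either "source" or "destination"


@dataclass
class Workflow:
    """Connects one source to one destination, with optional settings."""
    name: str
    source: Connector
    destination: Connector
    chunking_strategy: Optional[str] = None
    embedding_provider: Optional[str] = None
    schedule: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the connectors are wired the wrong way round."""
        if self.source.kind != "source":
            raise ValueError("workflow source must be a source connector")
        if self.destination.kind != "destination":
            raise ValueError("workflow destination must be a destination connector")


@dataclass
class Job:
    """One run of a workflow, used to monitor transformation progress."""
    workflow: Workflow
    status: JobStatus = JobStatus.SCHEDULED


workflow = Workflow(
    name="example-workflow",
    source=Connector(name="my-source", kind="source"),
    destination=Connector(name="my-destination", kind="destination"),
    schedule="daily",
)
workflow.validate()
job = Job(workflow=workflow)
print(job.status.value)
```

The point of the sketch is the direction of the references: a workflow owns the link between a source and a destination, and a job refers back to the workflow it runs.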
The platform is designed for global reach with SOC2 Type 1, SOC2 Type 2, and HIPAA compliance. It supports more than 50 languages.
To use the Unstructured UI, you must have an Unstructured account for billing purposes.
Unstructured offers three account pricing plans:
For more details, see the Unstructured Pricing page.
Some of these plans are billed on a per-page basis.
Unstructured calculates a page as follows:
.pdf, .pptx, and .tiff.
For .docx files that have page metadata, Unstructured calculates the number of pages based on that metadata.
Skip ahead to the quickstart.
Contact us directly, or join our Slack community.
For enterprise support, email Unstructured Sales at sales@unstructured.io.