The Unstructured JavaScript/TypeScript SDK client allows you to send one file at a time for processing by the Unstructured Partition Endpoint.
To use the JavaScript/TypeScript SDK, you’ll first need to set an environment variable named UNSTRUCTURED_API_KEY, representing your Unstructured API key. To get your API key, do the following:
1. Sign in to your Unstructured account.
2. Get your Unstructured API key:
   a. In the Unstructured UI, click API Keys on the sidebar.
   b. Click Generate API Key.
   c. Follow the on-screen instructions to finish generating the key.
   d. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.
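Once the key is in your environment, code can read it and fail early when it is missing. The following is a minimal sketch; the helper name getApiKey is hypothetical and is not part of the SDK.

```typescript
// Hypothetical helper (not part of the SDK): read the API key from the
// environment and fail fast with a clear message when it is missing.
function getApiKey(
  env: Record<string, string | undefined> = process.env,
): string {
  const key = env.UNSTRUCTURED_API_KEY;
  if (!key || key.trim() === "") {
    throw new Error("UNSTRUCTURED_API_KEY is not set");
  }
  return key.trim();
}
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.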
Before using the SDK to interact with Unstructured, install the library, for example with npm install unstructured-client.
The SDK uses semantic versioning, so a major version bump can introduce breaking changes. Pin the version you install.
Let’s start with a simple example in which you send a PDF document to the Unstructured Partition Endpoint to be partitioned by Unstructured.
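As a rough illustration of the request shape, the sketch below reads a PDF from disk and wraps it in a partitionParameters object. It does not import the real SDK: PartitionClient is a hypothetical stand-in interface, and the field names files, content, fileName, statusCode, and elements are assumptions to check against the SDK's own type definitions.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed request shape: everything goes inside `partitionParameters`.
interface PartitionRequest {
  partitionParameters: {
    files: { content: Uint8Array; fileName: string };
    splitPdfPage?: boolean;
  };
}

// Hypothetical stand-in for the SDK client; only the one call used here.
interface PartitionClient {
  partition(
    request: PartitionRequest,
  ): Promise<{ statusCode: number; elements?: unknown[] }>;
}

// Send a single file for partitioning and return the resulting elements.
async function partitionPdf(
  client: PartitionClient,
  filePath: string,
): Promise<unknown[]> {
  const request: PartitionRequest = {
    partitionParameters: {
      files: {
        content: fs.readFileSync(filePath),
        fileName: path.basename(filePath),
      },
    },
  };
  const response = await client.partition(request);
  if (response.statusCode !== 200) {
    throw new Error(`Partition failed with status ${response.statusCode}`);
  }
  return response.elements ?? [];
}
```

Depending on an interface rather than the concrete client lets you substitute a fake in tests and swap in the real SDK client in production code.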
The JavaScript/TypeScript SDK has the following breaking changes in v0.11.0:
- Imports under the dist path have moved up a level.
- Enums are now used for parameters with a fixed set of options. This includes chunkingStrategy, outputFormat, and strategy.
- All parameters to partition have moved to a partitionParameters object.
For a code example that works with an entire directory of files instead of just a single PDF, see the Processing multiple files section.
To speed up processing of large PDF files, the splitPdfPage* parameter is true by default. This causes the PDF to be split into small batches of pages before requests are sent to the API. The client awaits all parallel requests and combines the responses into a single response object. This applies only to PDF files; for other file types, the parameter is ignored.
The number of parallel requests is controlled by splitPdfConcurrencyLevel*. The default is 8, and the maximum is 15 to avoid high resource usage and costs.
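To make the mechanics concrete, here is a minimal sketch of page batching with a bounded number of parallel requests. It is not the SDK's implementation: the function names and the batch size are hypothetical, and only the default of 8 and the maximum of 15 come from the text above.

```typescript
const DEFAULT_CONCURRENCY = 8;
const MAX_CONCURRENCY = 15;

// Keep the requested concurrency level within 1..15, defaulting to 8.
function clampConcurrency(requested?: number): number {
  if (requested === undefined) return DEFAULT_CONCURRENCY;
  return Math.min(Math.max(1, Math.floor(requested)), MAX_CONCURRENCY);
}

// Split pages 1..totalPages into inclusive [firstPage, lastPage] ranges.
function splitIntoBatches(
  totalPages: number,
  batchSize: number,
): Array<[number, number]> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const batches: Array<[number, number]> = [];
  for (let first = 1; first <= totalPages; first += batchSize) {
    batches.push([first, Math.min(first + batchSize - 1, totalPages)]);
  }
  return batches;
}

// Run tasks with at most `limit` in flight; results keep the input order.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Storing each result at its task's index keeps the combined output in page order even when batches finish out of order.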
If at least one request is successful, the responses are combined into a single response object. An error is returned only if all requests failed or there was an error during splitting.
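The partial-failure rule can be sketched with Promise.allSettled. This is an illustration of the behavior described above, not the SDK's own code, and combineBatchResults is a hypothetical name.

```typescript
// Combine the elements from every successful batch request, in order.
// Throw only when every request failed.
async function combineBatchResults<T>(
  requests: Array<Promise<T[]>>,
): Promise<T[]> {
  const settled = await Promise.allSettled(requests);
  const combined: T[] = [];
  let failures = 0;
  for (const outcome of settled) {
    if (outcome.status === "fulfilled") {
      combined.push(...outcome.value);
    } else {
      failures++;
    }
  }
  if (settled.length > 0 && failures === settled.length) {
    throw new Error(`All ${failures} batch requests failed`);
  }
  return combined;
}
```

Note that a partially failed run silently returns fewer elements, so callers that need every page may want to compare the result against the expected page count.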
This feature may lead to unexpected results when chunking, because the server does not see the entire document context at once. If you’d like to chunk across the whole document and still get the speedup from parallel processing, you can:
1. Partition the PDF with splitPdfPage set to true, without any chunking parameters.
2. Store the returned elements in results.json.
3. Partition results.json with the chunking parameters you want.
You can also change the defaults for retries through the retryConfig* when initializing the client. If a request to the API fails, the client retries the request with an exponential backoff strategy, up to a maximum interval of one minute between attempts. It keeps retrying until the total elapsed time exceeds maxElapsedTime*, which defaults to one hour.
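The retry behavior can be sketched as a small exponential backoff loop. This is a sketch and not the SDK's implementation: initialInterval, exponent, and maxInterval are assumed names, and the real client may add jitter or retry only on certain errors. Only the one-minute cap and the one-hour maxElapsedTime default come from the text above.

```typescript
interface BackoffConfig {
  initialInterval: number; // ms before the first retry (assumed name)
  maxInterval: number; // cap on any single wait, e.g. 60_000 ms
  exponent: number; // growth factor between attempts (assumed name)
  maxElapsedTime: number; // total retry budget, e.g. 3_600_000 ms
}

// Wait time before retry number `attempt` (0-based), capped at maxInterval.
function delayForAttempt(cfg: BackoffConfig, attempt: number): number {
  const delay = cfg.initialInterval * Math.pow(cfg.exponent, attempt);
  return Math.min(delay, cfg.maxInterval);
}

// Retry `operation` until it succeeds or the elapsed time exceeds the budget.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  cfg: BackoffConfig,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
  now: () => number = Date.now,
): Promise<T> {
  const start = now();
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (now() - start > cfg.maxElapsedTime) throw err;
      await sleep(delayForAttempt(cfg, attempt));
    }
  }
}
```

The sleep and now parameters are injectable so the loop can be exercised with a fake clock instead of real waits.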
The code example in the Basics section processes a single PDF file. But what if you want to process multiple files inside a directory with a mixture of subdirectories and files with different file types?
The following example takes an input directory path to read files from and an output directory path to write the processed data to, processing one file at a time.
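A minimal sketch of that directory walk follows. The SDK call is abstracted behind a hypothetical PartitionFn callback so the traversal logic stands alone; the choice to mirror the input layout and append .json to each output file name is an assumption, not documented behavior.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical stand-in for the SDK call that partitions one file.
type PartitionFn = (fileName: string, content: Uint8Array) => Promise<unknown[]>;

// Recursively collect every file path under `dir`, in sorted order.
function listFiles(dir: string): string[] {
  const files: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listFiles(fullPath));
    } else if (entry.isFile()) {
      files.push(fullPath);
    }
  }
  return files.sort();
}

// Process one file at a time, mirroring the input layout under outputDir.
async function processDirectory(
  inputDir: string,
  outputDir: string,
  partition: PartitionFn,
): Promise<string[]> {
  const written: string[] = [];
  for (const filePath of listFiles(inputDir)) {
    const relativePath = path.relative(inputDir, filePath);
    const outPath = path.join(outputDir, `${relativePath}.json`);
    const elements = await partition(
      path.basename(filePath),
      fs.readFileSync(filePath),
    );
    fs.mkdirSync(path.dirname(outPath), { recursive: true });
    fs.writeFileSync(outPath, JSON.stringify(elements, null, 2));
    written.push(outPath);
  }
  return written;
}
```

Awaiting each file inside the loop keeps processing strictly sequential, which matches the one-file-at-a-time behavior described above.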
* The parameter names used in this document are for the JavaScript/TypeScript SDK, which follows the camelCase convention. The Python SDK follows the snake_case convention. Other than this difference in naming convention, the names used in the SDKs are the same across all methods.
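The naming relationship can be illustrated with a small conversion helper. This is only a sketch for reading the Python documentation alongside this one; toSnakeCase and toSnakeCaseKeys are hypothetical names, not SDK functions.

```typescript
// Convert a camelCase parameter name to its snake_case equivalent.
function toSnakeCase(camelCaseName: string): string {
  return camelCaseName.replace(
    /[A-Z]/g,
    (upper) => `_${upper.toLowerCase()}`,
  );
}

// Convert every top-level key of a parameter object to snake_case.
function toSnakeCaseKeys(
  params: Record<string, unknown>,
): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(params).map(([key, value]) => [toSnakeCase(key), value]),
  );
}

// toSnakeCase("splitPdfPage") returns "split_pdf_page"
// toSnakeCase("maxElapsedTime") returns "max_elapsed_time"
```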