If you’re new to Unstructured, read this note first.
Before you can create a source connector, you must first sign in to your Unstructured account:
After you sign in, the Unstructured user interface (UI) appears, which you use to get your Unstructured API key. To learn how, watch this 40-second how-to video.
After you create the source connector, add it along with a destination connector to a workflow. Then run the workflow as a job. To learn how, try out the hands-on Workflow Endpoint quickstart, go directly to the quickstart notebook, or watch the two 4-minute video tutorials for the Unstructured Python SDK.
You can also create source connectors with the Unstructured user interface (UI). Learn how.
If you need help, reach out to the community on Slack, or contact us directly.
You are now ready to start creating a source connector! Keep reading to learn how.
Ingest your files into Unstructured from Snowflake.
The requirements are as follows.
A Snowflake account and its account identifier.
To get the identifier for the current Snowflake account:
Alternatively, the following Snowflake query returns the current account’s identifier:
The Snowflake user’s login name (not its username) and its password in the account.
To view the login name for a user:
Alternatively, the following Snowflake query returns information about the user with the username of <my-user>, including their login_name value representing their login name:
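As a minimal sketch of the two lookups above, the following Python helper builds the statements as plain strings. It assumes the standard Snowflake functions CURRENT_ORGANIZATION_NAME() and CURRENT_ACCOUNT_NAME() and the SHOW USERS command. The helper name identity_queries is hypothetical, and the exact queries this page intends may differ.

```python
def identity_queries(username: str) -> dict[str, str]:
    """Return Snowflake statements for looking up the account identifier
    and a user's login name. Illustrative helper, not part of any SDK."""
    # Double any single quotes so the username is safe inside a SQL string literal.
    safe_username = username.replace("'", "''")
    return {
        # Assumption: the identifier takes the <organization>-<account> form.
        "account_identifier": (
            "SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME() "
            'AS "Account Identifier";'
        ),
        # The SHOW USERS output includes a login_name column.
        "login_name": f"SHOW USERS LIKE '{safe_username}';",
    }


queries = identity_queries("my-user")
print(queries["account_identifier"])
print(queries["login_name"])
```

Run the printed statements in a Snowflake worksheet or through your own client, then read the login_name column from the SHOW USERS result.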
The name of the Snowflake role that the user belongs to and that also has sufficient access to the Snowflake database, schema, table, and host.
To create a database: CREATE DATABASE privilege at the current account level; and USAGE privilege on the warehouse that is used to create the database.
To create a schema: USAGE privilege on the database and the warehouse that is used to create the schema; and CREATE SCHEMA on the database.
To create a table: USAGE privilege on the database and schema and the warehouse that is used to create the table; and CREATE TABLE on the schema.
To write to a table: USAGE privilege on the database and schema and the warehouse that is used to write to the table; and INSERT on the table.
To read from a table: USAGE privilege on the database and schema and the warehouse that is used to read from the table; and SELECT on the table.
To view a list of available roles in the current Snowflake account:
Alternatively, the following Snowflake query returns a list of available roles in the current account:
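For a source connector, the role mainly needs the read privileges listed above. The following is a minimal sketch that generates the corresponding GRANT statements, along with the SHOW ROLES command for listing roles. The helper name source_read_grants and the role, warehouse, database, schema, and table names are hypothetical; adjust the statements to your own access policy.

```python
def source_read_grants(role: str, warehouse: str, database: str,
                       schema: str, table: str) -> list[str]:
    """Build GRANT statements that let `role` read one table.
    Illustrative helper: identifiers are inserted as-is, so pass trusted names only."""
    return [
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role};",
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO ROLE {role};",
    ]


# Standard Snowflake command that lists the roles visible to the current user.
LIST_ROLES = "SHOW ROLES;"

grants = source_read_grants(
    role="UNSTRUCTURED_READER",  # hypothetical role name
    warehouse="COMPUTE_WH",      # hypothetical warehouse name
    database="MY_DB",
    schema="MY_SCHEMA",
    table="ELEMENTS",
)
for statement in [LIST_ROLES, *grants]:
    print(statement)
```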
The Snowflake warehouse’s hostname and its port number in the account.
To view a list of available warehouses in the current Snowflake account:
The following Snowflake query returns a list of available warehouse types, hostnames, and port numbers in the current account. Look for the row with a type of SNOWFLAKE_DEPLOYMENT:
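The sketch below shows one way to pick out the SNOWFLAKE_DEPLOYMENT row programmatically. It assumes the query in question is SELECT SYSTEM$ALLOWLIST(); which returns a JSON array of objects with type, host, and port keys. The function name and the sample hostnames are made up for illustration.

```python
import json


def find_deployment_endpoint(allowlist_json: str) -> tuple[str, int]:
    """Return (host, port) of the first SNOWFLAKE_DEPLOYMENT entry."""
    for entry in json.loads(allowlist_json):
        if entry.get("type") == "SNOWFLAKE_DEPLOYMENT":
            return entry["host"], int(entry["port"])
    raise ValueError("no SNOWFLAKE_DEPLOYMENT entry found")


# Made-up sample in the shape that SELECT SYSTEM$ALLOWLIST() is assumed to return.
sample = json.dumps([
    {"type": "STAGE", "host": "example-stage.s3.amazonaws.com", "port": 443},
    {"type": "SNOWFLAKE_DEPLOYMENT", "host": "xy12345.snowflakecomputing.com", "port": 443},
])

host, port = find_deployment_endpoint(sample)
print(host, port)
```

The returned host and port are the values to supply for the connector's <host> and <port> settings.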
The name of the Snowflake database in the account.
To view a list of available databases in the current Snowflake account:
Alternatively, the following Snowflake query returns a list of available databases in the current account:
The name of the schema in the database.
To view a list of available schemas for a database in the current Snowflake account:
Alternatively, the following Snowflake query returns a list of available schemas in the current account:
The following Snowflake query returns a list of available schemas for the database named <database_name> in the current account:
The name of the table in the schema.
To view a list of available tables for a schema in a database in the current Snowflake account:
Alternatively, the following Snowflake query returns a list of available tables for the schema named <schema_name> in the database named <database_name> in the current account:
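The database, schema, and table lookups above can be sketched with the standard Snowflake SHOW commands. The helper below only builds the statements; the function name listing_queries and the example database and schema names are hypothetical, and the queries this page intends may be worded differently.

```python
def listing_queries(database: str, schema: str) -> dict[str, str]:
    """Build SHOW statements for discovering databases, schemas, and tables.
    Illustrative helper: identifiers are inserted as-is, so pass trusted names only."""
    return {
        "databases": "SHOW DATABASES;",
        "schemas_in_account": "SHOW SCHEMAS IN ACCOUNT;",
        "schemas_in_database": f"SHOW SCHEMAS IN DATABASE {database};",
        "tables_in_schema": f"SHOW TABLES IN SCHEMA {database}.{schema};",
    }


statements = listing_queries("MY_DB", "MY_SCHEMA")
for label, statement in statements.items():
    print(f"{label}: {statement}")
```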
Snowflake requires the target table to have a defined schema before Unstructured can write to the table. The recommended table schema for Unstructured is as follows. In the following CREATE TABLE statement, replace the following placeholders with the appropriate values:
<database_name>: The name of the target database in the Snowflake account.
<schema_name>: The name of the target schema in the database.
<number-of-dimensions>: The number of dimensions for any embeddings that you plan to use. This value must match the number of dimensions for any embeddings that are […]
The name of the column in the table that uniquely identifies each record (for example, RECORD_ID).
To create a Snowflake source connector, see the following examples.
Replace the preceding placeholders as follows:
<name> (required): A unique name for this connector.
<account> (required): The target Snowflake account’s identifier.
<role> (required): The name of the Snowflake role that the user belongs to. This role must have the appropriate access to the target Snowflake warehouse, database, schema, and table.
<user> (required): The target Snowflake user’s login name (not their username).
<password> (required): The user’s password.
<host> (required): The hostname of the target Snowflake warehouse.
<port> (required): The warehouse’s port number. The default is 443 if not otherwise specified.
<database> (required): The name of the target Snowflake database.
<schema> (required): The name of the target Snowflake schema within the database.
<table_name>: The name of the target Snowflake table within the database’s schema. For the destination connector, the default is elements if not otherwise specified.
<columns> (source connector only): A comma-separated list of columns to fetch from the table. By default, all columns are fetched unless otherwise specified.
<id-column> (required, source connector only): The name of the column that uniquely identifies each record in the table.
<record-id-key> (destination connector only): The name of the column that uniquely identifies each record in the table. The default is record_id if not otherwise specified.
<batch-size> (required): The maximum number of rows to fetch for each batch. The default is 50 if not otherwise specified.
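To make the settings above concrete, here is a minimal sketch that assembles them into a JSON-serializable payload for a source connector. The function name, the key names (which simply mirror the placeholder names), and the "snowflake" type value are assumptions for illustration, not a confirmed API schema; check the connector's API reference before sending a real request.

```python
import json
from typing import Optional


def build_snowflake_source(name: str, account: str, role: str, user: str,
                           password: str, host: str, database: str, schema: str,
                           table_name: str, id_column: str, port: int = 443,
                           columns: Optional[list[str]] = None,
                           batch_size: int = 50) -> dict:
    """Assemble a source connector payload. Key names are assumed, not official."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    config = {
        "account": account, "role": role, "user": user, "password": password,
        "host": host, "port": port, "database": database, "schema": schema,
        "table_name": table_name, "id_column": id_column, "batch_size": batch_size,
    }
    if columns:
        # The settings list describes this value as a comma-separated list.
        config["columns"] = ",".join(columns)
    # Omitting "columns" means all columns are fetched.
    return {"name": name, "type": "snowflake", "config": config}


payload = build_snowflake_source(
    name="my-snowflake-source", account="myorg-myaccount", role="UNSTRUCTURED_READER",
    user="MY_LOGIN_NAME", password="example-password",
    host="xy12345.snowflakecomputing.com", database="MY_DB", schema="MY_SCHEMA",
    table_name="ELEMENTS", id_column="ID", columns=["ID", "TEXT"],
)
print(json.dumps(payload, indent=2))
```

In practice, read the password from an environment variable or a secret store rather than hardcoding it as this example does.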