You want to specify the type of agent that you use when performing optical character recognition (OCR) on files, such as images and PDFs.
Set the environment variable named OCR_AGENT to one of the following supported values:
unstructured.partition.utils.ocr_models.tesseract_ocr.OCRAgentTesseract to use Tesseract OCR. This is the default if not otherwise specified.
unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle to use Paddle OCR.
unstructured.partition.utils.ocr_models.google_vision_ocr.OCRAgentGoogleVision to use Google Cloud Vision OCR.
Also, be sure to install the corresponding OCR agent and its dependencies, if you have not already done so.
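One way to set OCR_AGENT is from Python itself, before any partitioning code runs. The following is a minimal sketch: the helper name select_ocr_agent and the short keys ("tesseract", "paddle", "google_vision") are hypothetical conveniences for this illustration, and only the three module paths come from the list of supported values.

```python
import os

# Supported OCR_AGENT values. The short keys are hypothetical aliases
# introduced for this sketch; the module paths are the supported values.
OCR_AGENTS = {
    "tesseract": "unstructured.partition.utils.ocr_models.tesseract_ocr.OCRAgentTesseract",
    "paddle": "unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle",
    "google_vision": "unstructured.partition.utils.ocr_models.google_vision_ocr.OCRAgentGoogleVision",
}


def select_ocr_agent(name: str = "tesseract") -> str:
    """Set OCR_AGENT for the current process and return the value that was set."""
    if name not in OCR_AGENTS:
        raise ValueError(f"Unknown OCR agent {name!r}; expected one of {sorted(OCR_AGENTS)}")
    os.environ["OCR_AGENT"] = OCR_AGENTS[name]
    return OCR_AGENTS[name]


# Example: switch this process to Paddle OCR.
select_ocr_agent("paddle")
```

Setting the variable in the shell before starting Python has the same effect. The helper only guards against typos in the long module path.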
This example uses a PNG file with an embedded combination of English and Korean text. This example uses Tesseract OCR.
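A sketch of what such a call could look like, assuming the unstructured library and Tesseract (with its English and Korean language data) are installed. The wrapper name partition_image_with_ocr and the file name in the commented usage are hypothetical. The import is deferred into the function so that the module can be loaded even where unstructured is not installed.

```python
import os

# Tesseract language codes for English and Korean.
LANGUAGES = ["eng", "kor"]

TESSERACT_AGENT = "unstructured.partition.utils.ocr_models.tesseract_ocr.OCRAgentTesseract"


def partition_image_with_ocr(filename: str, languages: list):
    """Partition a file using Tesseract OCR and return the extracted elements."""
    # Select the OCR agent explicitly before partitioning.
    os.environ["OCR_AGENT"] = TESSERACT_AGENT

    # Deferred import: requires the unstructured library to be installed.
    from unstructured.partition.auto import partition

    return partition(filename=filename, languages=languages)


# Usage (hypothetical file name):
# elements = partition_image_with_ocr("english-and-korean.png", LANGUAGES)
# for element in elements:
#     print(element.text)
```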
Language codes differ depending on the OCR agent you use. For example, Tesseract uses three-letter codes such as eng for English and kor for Korean.