Introduction
Have you ever wondered how a computer can read text from images? It is all thanks to Optical Character Recognition, or OCR. Python has several libraries that help your programs recognize text in pictures, from the long-established Tesseract engine to EasyOCR’s deep learning models. Let’s take a look at OCR libraries in Python and see how they turn images into readable text!
1. EasyOCR
EasyOCR simplifies text extraction from images in Python with its user-friendly approach and deep learning-powered models. It supports many languages, making it versatile for international applications. It handles a range of text styles and orientations in printed text, although results on handwriting depend heavily on the language and the image. It can use a GPU for faster inference, which helps in time-sensitive applications. EasyOCR is open source, so users can modify it, contribute to it, and tailor it to their own requirements. It provides a dependable and easy-to-use method for extracting text from photos, optimizing document processing, creating apps, and improving accessibility.
Steps to Install and Implement EasyOCR
Step1: Install Python
First, ensure you have Python installed on your system. You can download it from the official Python website and follow the installation instructions.
Step2: Install EasyOCR
Once Python is installed, open your command line or terminal and run the following command to install EasyOCR using pip.
pip install easyocr
Step3: Install Dependencies
EasyOCR has a few dependencies that need to be installed. Don’t worry; pip will take care of these for you automatically.
Step4: Usage of EasyOCR
Now that EasyOCR is installed, your Python scripts can use it. This is a basic illustration of how to extract text from an image using EasyOCR.
import easyocr
# Create an OCR reader object
reader = easyocr.Reader(['en'])
# Read text from an image
result = reader.readtext('image.jpg')
# Print the extracted text
for detection in result:
    print(detection[1])
With EasyOCR installed, you can now extract text from photos in your Python programs. EasyOCR makes text extraction simple, whether you’re improving accessibility or automating data entry.
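Each item returned by readtext() is a tuple of bounding box, text, and confidence score, so low-confidence detections can be dropped before the text is used. The following is a minimal sketch under that assumption. The helper names filter_detections and read_confident_text, and the 0.5 threshold, are hypothetical choices for illustration and are not part of EasyOCR.

```python
def filter_detections(detections, min_confidence=0.5):
    """Keep the text of detections whose confidence meets the threshold.

    Each detection is assumed to be (bounding_box, text, confidence),
    the layout readtext() returns by default.
    """
    return [text for _box, text, confidence in detections
            if confidence >= min_confidence]


def read_confident_text(image_path, languages=("en",), min_confidence=0.5):
    """Run EasyOCR on an image and return only the confident text strings."""
    import easyocr  # imported lazily so filter_detections works without EasyOCR

    reader = easyocr.Reader(list(languages))
    return filter_detections(reader.readtext(image_path), min_confidence)


# Hand-written sample data in the same shape as readtext() output
sample = [
    ([[0, 0], [50, 0], [50, 20], [0, 20]], "Invoice", 0.98),
    ([[0, 30], [50, 30], [50, 50], [0, 50]], "t0ta1", 0.21),
]
print(filter_detections(sample))
```

With a real image, read_confident_text('image.jpg') would apply the same filter to the output of the reader.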
2. Doctr
Doctr (docTR) is an open-source Python library from Mindee for document text recognition, built on deep learning. It performs OCR in two stages: a detection model locates the words in a document, and a recognition model reads the characters in each detected region. It accepts both images and PDFs, and it returns results in a structured form that groups words into lines and blocks for each page, which is useful for layout-aware text extraction. Doctr focuses on text detection and recognition; tasks such as named entity recognition or sentiment analysis need separate NLP tools applied to the extracted text. It ships with pre-trained models, encourages community contributions, and lets you choose among different detection and recognition architectures.
Steps to Install and Implement Doctr
Step1: Install Doctr
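You can install Doctr using pip, Python’s package manager. The package is published on PyPI as python-doctr. Depending on the version, you may also need to select a deep learning backend through an extra, so check the installation notes for the release you use. Open your command line or terminal and run the following command.
pip install python-doctr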
Step2: Import the necessary modules
In your Python script or notebook, import the Doctr modules you’ll need for your document processing tasks. For example:
from doctr.models import ocr_predictor
Step3: Load a document
Depending on your use case, load the document you want to process. Doctr supports various document formats, including PDFs and images.
Step4: Perform document understanding tasks
Use Doctr’s OCR predictor to detect and recognize the text in your document. For example, you can extract text from an image:
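from doctr.io import DocumentFile
# Load an image as a document
image_path = "example_image.jpg"
doc = DocumentFile.from_images(image_path)
# Create an OCR predictor with pre-trained detection and recognition models
predictor = ocr_predictor(pretrained=True)
# Perform OCR on the document
result = predictor(doc)
# Print the extracted text
print(result.render())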
Step5: Integrate with your workflow
Once you’re satisfied with your implementation, integrate Doctr into your application to automate document processing tasks.
These steps will help you install and use Doctr in your Python environment, so that you can quickly and effectively complete document understanding and processing tasks.
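Beyond plain text, a Doctr result can be exported as a nested dictionary of pages, blocks, lines, and words by calling export() on it. The sketch below walks such a dictionary and rebuilds one string per line. It assumes the nesting described here (keys "pages", "blocks", "lines", "words", and "value"); the helper name lines_from_export and the sample data are hypothetical and only illustrate the shape.

```python
def lines_from_export(exported):
    """Return one string per text line from a docTR-style export dictionary."""
    lines = []
    for page in exported.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [word["value"] for word in line.get("words", [])]
                if words:
                    lines.append(" ".join(words))
    return lines


# Hand-written sample that mimics the nesting of an exported result
sample_export = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {"words": [{"value": "Total", "confidence": 0.99},
                                   {"value": "due", "confidence": 0.97}]},
                        {"words": [{"value": "42.00", "confidence": 0.95}]},
                    ]
                }
            ]
        }
    ]
}
print(lines_from_export(sample_export))
```

With a real OCR result, the call would be lines_from_export(result.export()).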
3. Keras-OCR
Keras-OCR is a Python library that simplifies OCR tasks through the Keras and TensorFlow frameworks. It packages a text detector and a text recognizer with pre-trained weights that work well across many text and font styles. Its user-friendly API allows for easy implementation, and the detector and recognizer can be configured or retrained on your own data when the defaults do not fit. It is open source, which makes it straightforward to inspect, adapt, and integrate into Python applications.
Steps to Install and Implement Keras-OCR
To implement Keras-OCR for text recognition in Python, follow these steps:
Step1: Install Keras-OCR
Use pip to install the Keras-OCR library in your Python environment.
pip install keras-ocr
Step2: Import Necessary Modules
In your Python script or notebook, import the required modules from Keras-OCR.
import keras_ocr
Step3: Load Pre-Trained Model
Keras-OCR provides pre-trained models for text recognition. You can load them by creating an instance of the pipeline.Pipeline class.
pipeline = keras_ocr.pipeline.Pipeline()
Step4: Perform Text Recognition
Use the loaded pipeline to perform text recognition on images. The recognize() function expects a list of images loaded as arrays, so read each file first with keras_ocr.tools.read():
image_paths = ['image1.jpg', 'image2.jpg'] # List of image file paths
images = [keras_ocr.tools.read(path) for path in image_paths] # Load the images as arrays
predictions = pipeline.recognize(images)
This will return one list of predictions per image. Each entry pairs a recognized word with the box of its detected text region.
Step5: Display Results
You can then iterate through the predictions to display the recognized text and visualize the text regions:
import matplotlib.pyplot as plt
for image, prediction in zip(images, predictions):
    keras_ocr.tools.drawAnnotations(image=image, predictions=prediction)
plt.show()
Step6: Integration
Finally, integrate the text recognition functionality into your Python application or workflow as needed.
By following these steps, you can implement Keras-OCR for text recognition in your Python projects and extract text from images accurately and efficiently.
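The predictions for an image are word-level, so the words often need to be put back into reading order before they are useful as text. The sketch below groups words into lines by the top edge of their boxes and sorts each line from left to right. It assumes each prediction is a (word, box) pair where box holds four (x, y) corner points. The function name words_in_reading_order and the 10-pixel line_tolerance are hypothetical, and this simple grouping is only a heuristic for roughly horizontal text.

```python
def words_in_reading_order(prediction, line_tolerance=10):
    """Join (word, box) pairs into lines, top to bottom and left to right."""
    items = []
    for word, box in prediction:
        xs = [float(point[0]) for point in box]
        ys = [float(point[1]) for point in box]
        items.append((min(ys), min(xs), word))
    items.sort()  # order by top edge first

    lines, current, current_y = [], [], None
    for y, x, word in items:
        if current_y is None or abs(y - current_y) <= line_tolerance:
            current.append((x, word))
            if current_y is None:
                current_y = y  # the first word sets the line's reference height
        else:
            lines.append(current)
            current, current_y = [(x, word)], y
    if current:
        lines.append(current)

    # Within each line, sort by left edge and join the words
    return [" ".join(word for _x, word in sorted(line)) for line in lines]


# Hand-written sample in the same shape as one image's predictions
sample_prediction = [
    ("world", [[60, 10], [100, 10], [100, 30], [60, 30]]),
    ("hello", [[10, 12], [50, 12], [50, 32], [10, 32]]),
    ("again", [[10, 50], [50, 50], [50, 70], [10, 70]]),
]
print(words_in_reading_order(sample_prediction))
```

For real output, the function would be applied to each element of predictions in turn.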
4. Tesseract
Tesseract is an open-source OCR engine that was originally developed at Hewlett-Packard, later sponsored by Google, and is now maintained by the open-source community. It is widely used and can be very accurate on clean, printed text. It supports over 100 languages and can handle various image types, including scanned documents and photographs. Users can customize parameters like page segmentation mode and language models to optimize recognition accuracy. Tesseract encourages community contributions and is easily integrated with Python, providing a straightforward interface for developers to incorporate OCR capabilities into their applications.
Steps to Install and Implement Tesseract
To use Tesseract OCR in Python, you need both the Tesseract engine itself and the pytesseract library, which wraps the engine so it can be called from Python code. Here are the detailed steps:
Step1: Install Tesseract
First, you need to install the Tesseract OCR engine on your system. You can download it by following the installation instructions in the Tesseract project’s official documentation.
Step2: Install pytesseract
Next, install the pytesseract library using pip:
pip install pytesseract
Step3: Import pytesseract
Import the pytesseract module in your Python script or notebook:
import pytesseract
Step4: Set Tesseract Path (Optional)
The pytesseract.pytesseract.tesseract_cmd variable must be used to define the location of Tesseract if it is not installed in the default system path:
pytesseract.pytesseract.tesseract_cmd = r'/path/to/tesseract'
Step5: Perform OCR
Use the image_to_string() function to perform OCR on an image. Pass the image file path as an argument:
# Perform OCR on an image
text = pytesseract.image_to_string('image.jpg')
This will extract text from the image and store it in the text variable.
Step6: Display Results
You can then print or manipulate the extracted text as needed:
print(text)
By following these instructions, you can quickly integrate Tesseract OCR into your Python environment to extract text from images. Remember that Tesseract’s accuracy can vary with a number of factors, including language, text complexity, and image quality. For particular use cases, adjusting the parameters and preprocessing the images can help increase OCR accuracy.
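Parameters such as the page segmentation mode are passed to Tesseract as command-line style options through the config argument of image_to_string(). The sketch below builds such a string from the --psm and --oem options. The helper names build_tesseract_config and ocr_with_options are hypothetical, and the chosen values are examples only: --psm 6 treats the image as a single uniform block of text, and --oem 3 lets Tesseract pick the default engine.

```python
def build_tesseract_config(psm=3, oem=3):
    """Build a config string from a page segmentation mode and an engine mode."""
    if psm not in range(0, 14):
        raise ValueError("psm must be between 0 and 13")
    if oem not in range(0, 4):
        raise ValueError("oem must be between 0 and 3")
    return f"--oem {oem} --psm {psm}"


def ocr_with_options(image_path, lang="eng", psm=6, oem=3):
    """Run Tesseract on an image file with explicit language and modes."""
    import pytesseract  # imported lazily so the builder works on its own

    config = build_tesseract_config(psm=psm, oem=oem)
    return pytesseract.image_to_string(image_path, lang=lang, config=config)


print(build_tesseract_config(psm=6))
```

Trying a few segmentation modes on a representative sample of your images is usually the quickest way to see which one suits a given layout.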
5. GOCR
GOCR is an open-source OCR engine released under the GNU General Public License that lets users extract text from images on a range of platforms. It offers basic text recognition features and runs on numerous systems. However, it is mostly focused on English and has little support for other languages. Because of its lack of active development and its restricted language support, it may be less effective for some applications than more modern options.
Steps to Install and Implement GOCR
Implementing GOCR means installing the GOCR program and using its command-line interface (CLI) to perform optical character recognition on images. Here is a general outline:
Step1: Install GOCR
Depending on your operating system, you may be able to install GOCR using package managers like apt on Ubuntu or Homebrew on macOS. Alternatively, you can download the source code and compile it manually.
Step2: Prepare Images
Prepare the images containing the text you want to recognize. Ensure that the images are clear and of sufficient quality for accurate OCR.
Step3: Run the library from the Command Line
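Use the GOCR command-line interface to perform OCR on your images. Here’s a basic command to run GOCR on an image file named “image.jpg”. GOCR works natively with PNM-family images (PBM, PGM, PPM), so reading other formats such as JPEG typically depends on external conversion tools being installed.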
gocr image.jpg
It will process the image and output the recognized text to the terminal.
Step4: Process Output
Once GOCR has finished processing the image, you can capture the output text from the terminal and use it in your application as needed.
Keep in mind that GOCR may have limitations compared to more modern OCR engines in terms of accuracy, language support, and ease of use. If GOCR does not meet your needs, assess your requirements and consider other OCR options.
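Because GOCR is a command-line program rather than a Python library, one way to capture its output in a Python application is to call it as a subprocess. The sketch below is a minimal example under that assumption. It uses GOCR's -i option to name the input file; the helper names build_gocr_command and run_gocr are hypothetical.

```python
import shutil
import subprocess


def build_gocr_command(image_path):
    """Return the argument list for running gocr on one image file."""
    return ["gocr", "-i", image_path]


def run_gocr(image_path):
    """Run gocr on an image and return the recognized text from stdout."""
    if shutil.which("gocr") is None:
        raise FileNotFoundError("The gocr executable was not found on PATH")
    completed = subprocess.run(
        build_gocr_command(image_path),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output as text
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


print(build_gocr_command("image.pnm"))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.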
6. Pytesseract
Pytesseract is a Python wrapper that allows the Tesseract OCR engine to be integrated into Python programs. It offers an efficient method for optical character recognition. Because of its simple interface, users can extract text from images with only a few lines of code. Pytesseract supports a variety of languages, including English, French, Spanish, and German, and is compatible with the Windows, macOS, and Linux operating systems. It can process text in different fonts, sizes, and styles, and developers can adjust OCR parameters to maximize accuracy. Additionally, Pytesseract works with the Python imaging library Pillow, enabling preprocessing prior to OCR.
Steps to Install and Implement Pytesseract
Implementing Pytesseract means installing the pytesseract library and using it to perform optical character recognition (OCR) on images. Here’s how to use Pytesseract in Python, step-by-step:
Step1: Install Tesseract
Before using pytesseract, you need to install the Tesseract OCR engine on your system. You can download it by following the installation instructions in the Tesseract project’s official documentation.
Step2: Install pytesseract
Next, install the pytesseract library using pip:
pip install pytesseract
Step3: Import pytesseract
Import the pytesseract module in your Python script or notebook:
import pytesseract
Step4: Perform OCR on an Image
Use the image_to_string() function from pytesseract to perform OCR on an image. Pass the image file path as an argument:
# Perform OCR on an image
text = pytesseract.image_to_string('image.jpg')
This will extract text from the image and store it in the text variable.
Step5: Optional Configuration
You can configure pytesseract to use specific OCR parameters, such as language and page segmentation mode. For example:
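# Point pytesseract at the Tesseract executable if it is not on your PATH
pytesseract.pytesseract.tesseract_cmd = r'/path/to/tesseract'
# Set the language data directory and the page segmentation mode
custom_config = '--tessdata-dir "/usr/share/tesseract-ocr/4.00/tessdata" --psm 6'
# Set the language (default is English)
text = pytesseract.image_to_string('image.jpg', lang='eng', config=custom_config)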
Step6: Display Results
Finally, you can print or manipulate the extracted text as needed:
print(text)
These steps will help you quickly integrate Pytesseract into your Python environment so that you can use OCR to extract text from images. Remember that a number of factors, like language, text complexity, and image quality, can affect how accurate OCR is. For particular use cases, adjusting the parameters and preprocessing the images can help increase OCR accuracy.
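Besides image_to_string(), pytesseract offers image_to_data(), which reports a confidence value for each recognized word. The sketch below keeps only the words above a threshold. It assumes the dictionary form of the output, with parallel "text" and "conf" lists in which non-word entries carry a confidence of -1. The helper names confident_words and ocr_confident_words and the threshold of 60 are hypothetical.

```python
def confident_words(data, min_confidence=60.0):
    """Return the non-empty words whose confidence meets the threshold."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may arrive as a string or a number depending on the version
        if text.strip() and float(conf) >= min_confidence:
            words.append(text)
    return words


def ocr_confident_words(image_path, min_confidence=60.0):
    """Run Tesseract on an image and keep only the confident words."""
    import pytesseract  # imported lazily so confident_words works on its own
    from PIL import Image
    from pytesseract import Output

    data = pytesseract.image_to_data(Image.open(image_path),
                                     output_type=Output.DICT)
    return confident_words(data, min_confidence)


# Hand-written sample in the same shape as the dictionary output
sample_data = {
    "text": ["", "Hello", "w0rld", "  "],
    "conf": ["-1", "96.5", "31.2", "95"],
}
print(confident_words(sample_data))
```

Low-confidence words can also be logged rather than discarded, which helps when deciding how to preprocess the images.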
7. OpenCV
OpenCV, originally created by Intel and now kept up to date by a global developer community, is an essential tool for computer vision and machine learning. It provides an extensive range of features and techniques for uses such as image processing, object detection, face recognition, augmented reality, and robotics. OpenCV’s Python interface facilitates quick development and prototyping, and its cross-platform compatibility makes it accessible across multiple systems. OpenCV is a foundational library in computer vision that works well with other Python libraries such as NumPy, SciPy, and TensorFlow. In OCR workflows, it is most often used to prepare images before they are passed to an OCR engine.
Steps to Install and Implement OpenCV
Implementing OpenCV means installing the library and using its functions to carry out different computer vision tasks. Here is a simple illustration of how to process images using OpenCV in Python:
Step1: Install OpenCV
Use pip to install the OpenCV library in your Python environment.
pip install opencv-python
Step2: Import OpenCV
Import the OpenCV library in your Python script or notebook:
import cv2
Step3: Read an Image
Use the cv2.imread() function to read an image from a file:
# Read an image from file
image = cv2.imread('image.jpg')
Step4: Display the Image
Use the cv2.imshow() function to display the image in a window:
# Display the image in a window
cv2.imshow('Image', image)
Step5: Wait for User Input
Use the cv2.waitKey() function to wait for a key press to close the window:
# Wait for a key press and close the window
cv2.waitKey(0)
cv2.destroyAllWindows()
Step6: Perform Image Processing (Optional)
You can use various OpenCV functions to perform image processing tasks, such as resizing, cropping, filtering, and more:
# Resize the image to a target width and height in pixels (example values)
width, height = 800, 600
resized_image = cv2.resize(image, (width, height))
# Convert the image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Apply Gaussian blur to the image
blurred_image = cv2.GaussianBlur(image, (5, 5), 0)
Step7: Save the Processed Image (Optional)
Use the cv2.imwrite() function to save the processed image to a file:
# Save a processed image (here, the grayscale version) to file
cv2.imwrite('processed_image.jpg', gray_image)
By following these steps, you can easily implement OpenCV in your Python environment to perform various image processing tasks. OpenCV offers a wide range of functions and capabilities, allowing you to manipulate images, detect objects, track motion, and much more. Experimenting with different functions and parameters will help you explore the full potential of OpenCV for your computer vision applications.
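The processing steps above can be combined into a small preprocessing routine that prepares an image for an OCR engine. The sketch below resizes while keeping the aspect ratio, converts to grayscale, and binarizes with Otsu thresholding. This particular sequence is one common choice and not a prescribed recipe. The function names scaled_size and preprocess_for_ocr and the default target width of 1600 pixels are hypothetical.

```python
def scaled_size(width, height, target_width):
    """Return (new_width, new_height) that preserves the aspect ratio."""
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("All dimensions must be positive")
    new_height = max(1, round(height * target_width / width))
    return target_width, new_height


def preprocess_for_ocr(image_path, target_width=1600):
    """Resize, convert to grayscale, and binarize an image for OCR."""
    import cv2  # imported lazily so scaled_size works without OpenCV

    image = cv2.imread(image_path)
    if image is None:  # imread returns None when the file cannot be read
        raise FileNotFoundError(f"Could not read image: {image_path}")
    height, width = image.shape[:2]
    resized = cv2.resize(image, scaled_size(width, height, target_width))
    gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold value automatically
    _threshold, binary = cv2.threshold(
        gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


print(scaled_size(800, 600, 1600))
```

The returned array can be saved with cv2.imwrite() or passed directly to an OCR library that accepts image arrays.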
8. Amazon Textract
Amazon Textract is a machine learning service by Amazon Web Services (AWS) that extracts text and data from documents. It identifies and analyzes structured data, including text, tables, and forms, which makes it particularly useful for documents such as financial reports and invoices. Textract automates key-value pair extraction and form data extraction, streamlining data entry and processing workflows. Amazon Textract integrates with other AWS services, which helps with scalability and reliability, and it is used for document processing across various sectors, including finance, healthcare, legal, and government.
Steps to Install and Implement Amazon Textract
Implementing Amazon Textract involves using the AWS SDK to interact with the Textract API. Here’s a high-level overview of the steps to implement Amazon Textract in Python:
Step1: Set Up AWS Credentials
Ensure you have AWS credentials configured with appropriate permissions to access the Textract service.
Step2: Install the AWS SDK
Install the AWS SDK for Python (Boto3) using pip:
pip install boto3
Step3: Create a Textract Client
Create a Textract client object using the Boto3 library and your AWS credentials:
import boto3
# Initialize Textract client
textract_client = boto3.client('textract', region_name="your-region", aws_access_key_id='your-access-key-id', aws_secret_access_key='your-secret-access-key')
Step4: Process Documents
Use the analyze_document() method of the Textract client to analyze documents and extract text and data:
# Process document
response = textract_client.analyze_document(Document={'S3Object': {'Bucket': 'your-bucket-name', 'Name': 'your-document-key'}}, FeatureTypes=['TABLES', 'FORMS'])
This will return a response containing extracted text, tables, and forms from the document.
Step5: Access the Extracted Data
The response contains a list of blocks, which can be filtered by type and further processed as needed:
# All blocks returned by Textract
blocks = response['Blocks']
# Extract lines of text
extracted_text = [block['Text'] for block in blocks if block['BlockType'] == 'LINE']
# Extract tables
extracted_tables = [block for block in blocks if block['BlockType'] == 'TABLE']
# Extract forms (key-value pairs)
extracted_forms = [block for block in blocks if block['BlockType'] == 'KEY_VALUE_SET']
Step6: Handle Errors and Exceptions
Implement error handling to gracefully handle exceptions and errors that may occur during document processing:
try:
    response = textract_client.analyze_document(Document={'S3Object': {'Bucket': 'your-bucket-name', 'Name': 'your-document-key'}}, FeatureTypes=['TABLES', 'FORMS'])
except Exception as e:
    print(f'Error processing document: {e}')
Step7: Further Processing and Integration
Depending on your application requirements, you may need to further process the extracted text, tables, and forms, and integrate them into your workflow or application.
By following these steps, you can implement Amazon Textract in your Python application to extract text and data from documents stored in Amazon S3. Make sure to refer to the AWS documentation for detailed information on the Textract API and its usage.
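A Textract response is a list of blocks of different types, so a small helper can make the text easier to work with. The sketch below collects the text of LINE blocks and optionally drops lines below a confidence threshold. It assumes each LINE block carries "Text" and "Confidence" fields; the helper name lines_from_response and the sample response are hypothetical.

```python
def lines_from_response(response, min_confidence=0.0):
    """Return the text of LINE blocks that meet the confidence threshold."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


# Hand-written sample that mimics the shape of a Textract response
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 1001", "Confidence": 99.1},
        {"BlockType": "WORD", "Text": "Invoice", "Confidence": 99.4},
        {"BlockType": "LINE", "Text": "Tota1 due", "Confidence": 48.0},
    ]
}
print(lines_from_response(sample_response, min_confidence=90.0))
```

The same pattern extends to other block types, such as TABLE or KEY_VALUE_SET, by changing the type that is matched.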
Conclusion
Optical character recognition (OCR) has changed how computers work with text in images, enabling a wide range of applications. This article covered eight OCR options for Python, each with its own strengths. EasyOCR is user-friendly, Tesseract is accurate, and Amazon Textract is efficient. These libraries cater to diverse needs and use cases, automating tasks, streamlining workflows, and extracting valuable insights from unstructured data. With advancements in machine learning and computer vision, the future of OCR holds promising prospects for innovation and enhancement.
By Analytics Vidhya, April 24, 2024.