Solving Text Image Orientation Mysteries

Gofaizanashraf
5 min read · Sep 16, 2024


There I was, a content coder, happily working on my latest project, living the dream of building an automated system to process ***** ***** using OCR. Everything was smooth sailing. My coffee was hot, my code was compiling, and AWS Textract was doing all the heavy lifting — or so I thought.

But then, without warning…

My peaceful existence was rudely interrupted. My document processing pipeline was crumbling under the weight of a single, unavoidable enemy: image orientation issues. As an engineer, I can handle bugs, but this…this was personal. Textract, my trusted ally, was extracting the text but failing to tell me if the image was rotated upside-down or sideways.

My mission was clear…

Find the answer Textract wouldn’t give me: I had to detect the orientation myself and correct it. The plan? Use the layout of the bounding boxes, specifically the positions of the first and last characters of each word, to figure out how the text is rotated.

Think about it: If the first letter is to the left of the last, we’re good. But if the first letter is at the bottom or to the right? We’ve got a problem.

How the System Works

This system is designed to correct the orientation of text in an image by combining two technologies: Tesseract and AWS Textract. Here’s a step-by-step breakdown of the process:

Step-by-Step Process

Image Loading and Preparation
The image is loaded with OpenCV and converted from OpenCV’s default BGR channel order to RGB, which Tesseract expects. If the image fails to load, the system stops processing it.

Step 1: Detecting Initial Orientation with Tesseract
Tesseract’s Orientation and Script Detection (OSD) feature is used to analyze the text in the image and identify how the image is rotated. Tesseract gives an angle (0°, 90°, 180°, or 270°), which tells us how much the image is rotated:

  • 0° (Upright): No rotation is needed. The text is already upright.
  • 90° Clockwise: The text is rotated 90° clockwise.
  • 180° (Upside Down): The text is upside down and needs to be rotated 180°.
  • 270° Counterclockwise: The text is rotated 270° counterclockwise.

Once the angle is determined, the system uses OpenCV’s rotation function to rotate the image so that it is properly aligned (upright). The corrected image is saved for further processing.
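The loading and Tesseract steps above can be sketched roughly like this in Python. It assumes OpenCV (`cv2`) and `pytesseract` are installed with the `tesseract` binary on the PATH; `osd_rotation_angle` and `correct_orientation` are illustrative names, not the original code:

```python
import re

def osd_rotation_angle(osd_text):
    """Pull the 'Rotate: N' value out of Tesseract's OSD report; this is
    the clockwise rotation Tesseract suggests to make the text upright."""
    match = re.search(r"Rotate:\s*(\d+)", osd_text)
    return int(match.group(1)) if match else 0

def correct_orientation(image_path, out_path):
    # Imported here so the helper above stays usable without cv2/pytesseract.
    import cv2
    import pytesseract

    img = cv2.imread(image_path)
    if img is None:                      # image failed to load: stop here
        return None
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # Tesseract expects RGB
    angle = osd_rotation_angle(pytesseract.image_to_osd(rgb))
    rotations = {90: cv2.ROTATE_90_CLOCKWISE,
                 180: cv2.ROTATE_180,
                 270: cv2.ROTATE_90_COUNTERCLOCKWISE}
    if angle in rotations:
        img = cv2.rotate(img, rotations[angle])
    cv2.imwrite(out_path, img)           # save for the Textract pass
    return img
```

The OSD report is plain text, so a small regex is enough to pick out the suggested rotation before mapping it onto OpenCV’s fixed rotation constants.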

Step 2: Detecting and Extracting Text with AWS Textract
Next, the system sends the corrected image to AWS Textract. Textract identifies the words in the image and returns their bounding boxes.

What’s a Bounding Box?
A bounding box is an invisible rectangle around each word, defined by the coordinates of its corners. It tells us the position and size of each word in the image. Textract returns these corner coordinates in the word’s reading order, so the first and last points act as stand-ins for the first and last character of the word.
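A minimal sketch of this step: one function sends the corrected image to Textract, another pulls each word’s corner points out of the response. It assumes `boto3` with working AWS credentials, and `detect_words` / `word_polygons` are hypothetical helper names:

```python
def detect_words(image_bytes, region="us-east-1"):
    """Send the corrected image to AWS Textract's synchronous text API."""
    import boto3  # imported here so word_polygons works without boto3
    client = boto3.client("textract", region_name=region)
    return client.detect_document_text(Document={"Bytes": image_bytes})

def word_polygons(response):
    """Return (text, polygon) pairs for every WORD block in the response.

    Each polygon is a list of {'X': ..., 'Y': ...} corner points given in
    the word's reading order, so its first and last points roughly stand
    in for the first and last character of the word."""
    return [(block["Text"], block["Geometry"]["Polygon"])
            for block in response.get("Blocks", [])
            if block["BlockType"] == "WORD"]
```

Keeping the parsing separate from the API call also makes the geometry logic easy to test against a canned response.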

Step 3: Inferring the Orientation Using Bounding Boxes
Now, the system examines the X and Y coordinates of the first and last character of each word in the bounding boxes to infer the overall orientation of the text. Here’s how it works:

  • 0° (Upright): The first character of the word is on the left side, and the last character is on the right side.
  • 90° Clockwise: The first character is at the top, and the last character is at the bottom.
  • 180° (Upside Down): The first character is on the right, and the last character is on the left.
  • 270° Counterclockwise: The first character is at the bottom, and the last character is at the top.

The system compares the differences between the X and Y coordinates to determine how the text is oriented:

  • If the X-coordinate difference is larger, the text runs horizontally: first character left of the last means upright (0°); first character right of the last means upside down (180°).
  • If the Y-coordinate difference is larger, the text runs vertically: first character above the last means rotated 90° clockwise; first character below the last means rotated 270°.

This analysis is done for each word, and the system tracks the majority orientation. Some words may be laid out differently (vertical text in logos or headers, for example), so majority voting picks the most common orientation across all words.
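The per-word comparison plus the majority vote might look like this (`word_orientation` and `page_orientation` are illustrative names; note that Y grows downward in image coordinates):

```python
from collections import Counter

def word_orientation(polygon):
    """Infer one word's rotation from the start and end points of its
    polygon, which are listed in the word's reading order."""
    start, end = polygon[0], polygon[1]
    dx = end["X"] - start["X"]
    dy = end["Y"] - start["Y"]
    if abs(dx) >= abs(dy):               # text runs horizontally
        return 0 if dx > 0 else 180      # left-to-right vs right-to-left
    return 90 if dy > 0 else 270         # top-to-bottom vs bottom-to-top

def page_orientation(polygons):
    """Majority vote across all words, so stray vertical text (logos,
    headers) is outvoted by the dominant reading direction."""
    votes = Counter(word_orientation(p) for p in polygons)
    return votes.most_common(1)[0][0]
```

Because every word gets exactly one vote, a handful of oddly oriented words can’t flip the decision for the whole page.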

Step 4: Final Image Rotation Based on AWS Textract
After determining the correct orientation using the bounding box data from Textract, the system rotates the image one last time if necessary. The final image is rotated using OpenCV to match the orientation identified by AWS Textract (0°, 90°, 180°, or 270°). The fully corrected image is saved as the final output.
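Undoing the detected orientation is one more rotation. A sketch, where `correction_for` (an assumed name) computes the clockwise rotation that cancels the detected angle, and the `cv2` import again assumes OpenCV:

```python
def correction_for(orientation):
    """Clockwise degrees that cancel the detected text orientation, e.g.
    text read as rotated 90° clockwise needs a 270° clockwise (that is,
    90° counterclockwise) correction."""
    return (360 - orientation) % 360

def apply_final_rotation(img, orientation):
    import cv2  # imported here so correction_for stays dependency-free
    rotations = {90: cv2.ROTATE_90_CLOCKWISE,
                 180: cv2.ROTATE_180,
                 270: cv2.ROTATE_90_COUNTERCLOCKWISE}
    fix = correction_for(orientation)
    return cv2.rotate(img, rotations[fix]) if fix in rotations else img
```

Note the difference from the Tesseract step: OSD already reports the correction to apply, while here the system detects the text’s orientation and must invert it.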

After working out this trick, I set out to write a script that would check the bounding box positions returned by Textract and correct the orientation accordingly. With my trusty coding skills, I fixed the rotations, and voila! The system was back on track.

Now, I felt like a hero who had just defeated a boss-level bug.

Code

Feel free to explore the complete code.

Call to Action:

If you’re looking for someone who can tackle real-world problems like these with a touch of cleverness, feel free to connect with me on LinkedIn.

