Image Recognition – The heart of sophisticated RPA

RPA platforms use element IDs to identify the elements inside the target applications and then perform actions such as mouse clicks or hotkeys to get the desired end result. When we work with remote systems such as Citrix, or with the open web, it is not easy for the robot to identify the selector regions on screen.

In Citrix automation, or automation in virtual environments generally, there is no way to get the information normally required for automation, such as the element ID. The application is available only as pixels, so all the bot sees is an image. Image recognition is the way forward when you are faced with this situation. It looks easy, but it has its own challenges: colors may vary, and resolution differences can cause image recognition to fail.
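
To make the color problem concrete, here is a minimal sketch in Python, not taken from any RPA product. It assumes the screen and the reference image are small grayscale grids, and the function name `find_template` and the `tolerance` parameter are hypothetical. An exact pixel comparison misses a button that is rendered slightly darker in the remote session, while a small tolerance finds it.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels (0-255), stored row by row


def find_template(screen: Image, template: Image,
                  tolerance: int = 0) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first position where every template pixel
    is within `tolerance` of the screen pixel, or None if nothing matches."""
    t_rows, t_cols = len(template), len(template[0])
    for r in range(len(screen) - t_rows + 1):
        for c in range(len(screen[0]) - t_cols + 1):
            if all(
                abs(screen[r + i][c + j] - template[i][j]) <= tolerance
                for i in range(t_rows)
                for j in range(t_cols)
            ):
                return (r, c)
    return None


# Hypothetical 2x2 "button" captured when the bot was built.
button = [[200, 200],
          [200, 50]]

# The same button as rendered in the remote session, slightly darker.
screen = [[0, 0, 0, 0],
          [0, 190, 192, 0],
          [0, 191, 45, 0],
          [0, 0, 0, 0]]

exact = find_template(screen, button)                    # no tolerance
tolerant = find_template(screen, button, tolerance=12)   # allow small drift
print(exact, tolerant)
```

Real image-recognition activities work on full-color screenshots and usually expose a similar accuracy or threshold setting; the trade-off is that a looser threshold also increases the chance of a false match.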

How do we overcome these issues?

Is there a way to retrieve the desired text data from the applications?

The answer is yes. The OCR engine is the hero of image-based automation: it can recognize the text in an image.

But does it ensure 100% accuracy in the output?

No. The OCR engines currently available on the market are not 100% accurate. Hence, there is a chance that our automation fails if it relies on the output of an OCR engine.
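
As a small illustration of why this matters, the sketch below uses Python's standard `difflib` module. The strings are made up: `ocr_output` is a hypothetical misread in which "I" became "l" and "l" became "1", a typical kind of OCR confusion. The text is almost right, yet an exact comparison against the expected label fails.

```python
from difflib import SequenceMatcher

expected = "Invoice Total"
ocr_output = "lnvoice Tota1"  # hypothetical OCR misread, for illustration only

# A bot that looks for the exact label will not find it.
exact_match = (ocr_output == expected)

# The similarity ratio (0.0 to 1.0) shows how close the misread text still is.
similarity = SequenceMatcher(None, expected, ocr_output).ratio()

print(exact_match, round(similarity, 2))
```

A step that depends on the exact string will break on output like this, even though a human would read it without trouble.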

Is there a workaround?

Yes! The smartest and easiest way is to retrieve the text directly: either part of a page (for structured data, where we know which part needs to be fetched) or the complete page. Depending on the application from which the data has to be fetched, this can be extended to multiple pages.

So, how do I fetch the data?

An Example:

My bot has to open a remote desktop, log in to a webpage and get me the data from the third column of the first row of a structured table that is part of the webpage.

The first thing anyone would think of is to find an image of the table name and fetch the data at a position relative to it by specifying coordinates. That’s quite simple!

But here comes the challenging part!

The cells of this table do not have fixed coordinates. A cell shrinks when it holds little or no data and expands to wrap the data when the content is large. So the row height as well as the column width changes, and fetching data by relative position will fail miserably!


The simple yet smart approach to solve this issue is –

  • Open the webpage, select its whole content by sending the hotkeys ‘ctrl + a’ and copy it by sending the hotkeys ‘ctrl + c’
  • Open Excel (since the data is in a table, Excel is the best editor in this scenario) and paste the whole data by sending the hotkeys ‘ctrl + v’
  • Now, in the Excel sheet, find the table that holds the first row you want by searching for a keyword, such as the table name or any other text relative to which you can locate your row
    • In the Excel sheet, make sure the cursor is in the first cell so that the search starts from the top of the sheet (send hotkeys ‘ctrl + g’, type ‘A1’ and press the Enter key)
    • Send hotkeys ‘ctrl + f’ to open Excel’s ‘Find and Replace’ dialog. Make sure you are on the ‘Find’ tab, enter the text that you want to find and press the Enter key
    • Starting from the found cell, use the up, down, left and right keys to reach the third column of the first row, from which you have to retrieve the data
  • Copy the found value by sending hotkeys ‘ctrl + c’
  • RPA tools usually have a clipboard activity that reads the copied data so that you can use it in your automation
  • If your tool does not have clipboard activities, paste the copied data into a document or Excel sheet and read that file to fetch the data
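
The logic of these steps can be sketched outside any RPA tool. The Python below is only an illustration under stated assumptions: it assumes the copied page text arrives as lines with tab-separated cells (roughly what a pasted table looks like), and the sample page, `parse_grid` and `fetch_relative` are hypothetical names, not part of any product. Finding the keyword plays the role of ‘ctrl + f’, and the `down` and `right` offsets play the role of the arrow keys.

```python
from typing import List, Optional


def parse_grid(clipboard_text: str) -> List[List[str]]:
    """Split pasted text into rows and tab-separated cells, like a sheet."""
    return [line.split("\t") for line in clipboard_text.splitlines()]


def fetch_relative(grid: List[List[str]], keyword: str,
                   down: int, right: int) -> Optional[str]:
    """Find the first cell containing `keyword`, then move `down` rows and
    `right` columns from it. Return that cell, or None if it does not exist."""
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if keyword in cell:
                tr, tc = r + down, c + right
                if 0 <= tr < len(grid) and 0 <= tc < len(grid[tr]):
                    return grid[tr][tc]
                return None
    return None


# Hypothetical page text as it might look after copy and paste.
page_text = (
    "Welcome, user\n"
    "\n"
    "Open Orders\n"
    "Order ID\tCustomer\tAmount\n"
    "1001\tA long customer name that wraps in the browser\t250.00\n"
    "1002\tB\t75.50\n"
)

grid = parse_grid(page_text)
# Table name -> skip the header row -> first data row, third column.
value = fetch_relative(grid, "Open Orders", down=2, right=2)
print(value)
```

Because the lookup works on cells rather than pixels, it does not matter how wide the customer column is drawn on screen, which is exactly the case where the coordinate-based approach fails.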

Where can I use this method to extract data?

  1. Extracting data from webpages
  2. Applications like Word, PDF readers and Excel, where the steps can be made simpler by skipping the copy-to-Excel part
  3. Finding table data in a PDF in a simpler way: copying the data into Excel does the trick of finding the table, instead of fetching the whole PDF and using string manipulation to find the table data

Where does this solution not work?

Scanned PDFs are images, so their text cannot be selected or copied, and this method does not work on them. The solution works in all applications where copying the data is allowed.

Image recognition, when paired with other RPA features, can expand the capabilities of business process automation. If you are looking for an automation tool that works for your process, iNatrix can assess your processes, work out an automation score and the overall predicted ROI.

Vinutha is an RPA developer who is fascinated by BOTs. For her, Building a BOT and getting the job done by it is a great feeling in itself! A long drive with loud music refills her brain with Joy. A day at the beach restores her soul.
