Donated 104,390 rice grains using tesseract OCR (MARK I)

For each answer you get right, we donate 10 grains of rice through the World Food programme to help end hunger

That’s what guys at say. So, I answered 10,439 questions correctly, ending up donating 104,390 rice grains. 😀 As I mentioned in one of my previous posts, I am working on OCR (Optical Character Recognition) to win a bet with my friend. I have completed building an OCR system and donated 104,390 rice grains on this website in a single day under United Nations world food program. As I said, I built this just to win a bet. This whole post is one down and dirty way  to build a working OCR system. I know this isn’t efficient way to do this even. (that’s the reason for MARK I 😛 ) Here’s a screenshot of my score (aka rice grains donated) on that site-

I let the code to run for a whole night and this is what it did by morning. 104390 rice grains are donated under UN World Food Program.
I let the code to run for a whole night and this is what it did by morning. 104,390 rice grains are donated under UN World Food Program.

And a screenshot while the code is running:

Working fine but with 60% accuracy. (answered correctly 594 questions out of 1000, to be more specific)

So, lets dive into the juicy part, the code- The whole process is divided into small points.

  1. Take a screenshot while the required page, in our case, is opened, crop it to get the area we are interested in to recognize characters i.e where the multiplication tables are. (Here in this project it is going to answer only multiplication type questions) Save the cropped image.
  2. Analyse that image with Tesseract OCR to get a text file of recognized characters as output. (eventually saving it as a txt file next to our main program, I mean the folder)
  3. Analyze that txt file and get the recognized information. (We are dealing with integers, so, convert a string txt file into integers)
  4. After analysing the information in txt file, we get to know the question (7×4 or something like that) and check for answer in options.
  5. If the answer matches any of the options, move the mouse onto specific region. (works for me. It basically click on that pixel value, where that option is on screen, which I found by trail and error on my laptop)
  6. If tesseract couldn’t find the correct answer, (answer we got by solving the first line  (7×4 for example) and answer by analysing the options) it randomly clicks on any of the four options just not to break the loop. (LOOP? where’s that? See next point)
  7. Embed everything from 1 to 6 points in a loop so, it does its work while you are sleeping. 😀

I have briefly commented what’s each line is contributing to the code, making it as a whole.

#Import required libraries. We need to download some, if you don't have tham.
import cv2
import os
import pyscreenshot as ImageGrab
import numpy as np
import time
from pymouse import PyMouse
import random

#defining a function rand.
def rand():
    m = PyMouse()
    #find a random int and put it into 'do'
    do= random.randint(1,4)
    #basic if, elif loop
    if do == 1:
    #clicking at point (395, 429). Here 1 implies a   left-click, 429, 1)
    elif do== 2:, 466, 1)
    elif do == 3:, 505, 1)
    else:, 544, 1)
    #wait for 1 sec, giving time to browser to refresh

trails= 0
#two forloops because, I am waiting for 5 secs after every 10 calculations just to make the system stable
for guns in range (0,1000):
  for buns in range(0,10):
  #Using try,catch to avoid any errors
      img= ImageGrab.grab() #taking a screenshot'output.png')
      pic= cv2.imread('output.png')
      pic2= pic[360:570, 380:470] #cropping the pic, works in my case
      cv2.imwrite('output.png', pic2)
      u= 'convert output.png -resize 700 output.png'
      os.system(u) #writing to terminal (re-sizing the pic)
      s= 'tesseract output.png output'
      os.system(s) #writing to terminal (running Tesseract)
      f= open('output.txt', 'r')
      string='\n', ' ')
      string= string.replace(' ', ' ')
      string= string.replace(' ', ' ')
      first= string[:string.find('x')] #finding first integer
      second= string[string.find('x')+1:string.find(' ')] #finding second integer
      pro= int(first)*int(second)
      m= PyMouse()
      string= string[string.find(' ')+1:]
      a= int(string[:string.find(' ')])
      #checking if product is equal to any of answers and clicking on that particular option
      if pro == a:, 429, 1)
        m.move(50,50) #move cursor to any random point which is not in our area of interest, avoiding tesseract to think it as some character
        string= string[string.find(' ')+1:]
        b= int(string[:string.find(' ')])
        if pro == b:
, 466, 1)
          string= string[string.find(' ')+1:]
          c= int(string[:string.find(' ')])
          if pro == c:
  , 505, 1)
            d= int(string[string.find(' ')+1:])
            if pro == d:
    , 544, 1)
              rand() #tesseract can't detect 100% accurately. So, tick any option randomly in case it didn't find correct answer
    except (ValueError, NameError, TypeError):
      rand() #tick randomly in case of any errors
  trails+= 10
  print("Total= " + str(trails))
  time.sleep(5) #waiting for 5secs after every 10 loops to make my system stable.

I’ll be back soon with MARK II of OCR system and next time I’ll not be using TesseractOCR. (Target accuracy- 90% (just a thought, though)) If you have any questions or some feedback, please feel free to add comments. I’d be happy to get some feedback from you.   So, happy donating. Any time

-SuryaTeja Cheedella