Donated 104,390 rice grains using tesseract OCR (MARK I)

For each answer you get right, we donate 10 grains of rice through the World Food Programme to help end hunger.

That’s what the folks at freerice.com say. I answered 10,439 questions correctly and ended up donating 104,390 rice grains. 😀 As I mentioned in one of my previous posts, I have been working on OCR (Optical Character Recognition) to win a bet with my friend. I have now built a working OCR system and used it to donate 104,390 rice grains on freerice.com in a single day through the United Nations World Food Programme. As I said, I built this just to win a bet. This whole post is one down-and-dirty way to build a working OCR system. I know it isn’t even an efficient way to do it (that’s the reason for MARK I 😛). Here’s a screenshot of my score (aka rice grains donated) on that site:

I let the code run for a whole night and this is what it had done by morning: 104,390 rice grains donated through the UN World Food Programme.

And a screenshot while the code is running:

Working fine, but with roughly 60% accuracy (594 questions out of 1,000 answered correctly, to be more specific).

So, let’s dive into the juicy part, the code. The whole process is divided into small steps:

  1. Take a screenshot while the required page (freerice.com in our case) is open, and crop it to the area whose characters we want to recognize, i.e. where the multiplication question and its options are. (In this project it answers only multiplication questions.) Save the cropped image.
  2. Run Tesseract OCR on that image to get the recognized characters as a text file (saved in the same folder as the main program).
  3. Read that txt file and extract the recognized information. (We are dealing with integers, so convert the strings from the txt file into integers.)
  4. From that information, work out the question (7×4 or something like that) and look for its answer among the options.
  5. If the answer matches one of the options, click on that option. (It simply clicks the pixel where that option sits on screen, which I found by trial and error on my laptop, so the coordinates work for me.)
  6. If the computed answer (from solving the first line, 7×4 for example) matches none of the recognized options, click randomly on any of the four options just so the loop doesn’t break. (LOOP? Where’s that? See the next point.)
  7. Wrap steps 1 to 6 in a loop, so it does its work while you are sleeping. 😀
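Steps 3 to 6 can be sketched as one small function that is easy to try without a browser. This is only a sketch, not the script used for the run above: the name choose_option is made up for illustration, and it uses a regular expression instead of the string slicing in the full script, so it also tolerates spaces around the 'x'.

```python
import re

def choose_option(ocr_text):
    """Return the 0-based index of the option equal to the product, or None."""
    # The question looks like "7x4"; Tesseract may also emit 'X' or the '×' sign.
    match = re.search(r"(\d+)\s*[xX×]\s*(\d+)", ocr_text)
    if match is None:
        return None
    product = int(match.group(1)) * int(match.group(2))
    # Every integer after the question is treated as an answer option.
    options = [int(tok) for tok in re.findall(r"\d+", ocr_text[match.end():])]
    if product in options[:4]:
        return options.index(product)
    return None  # caller falls back to a random click, as in step 6

print(choose_option("7x4\n21\n28\n35\n24\n"))
```

A return value of None corresponds to the "click randomly" case; any other value picks which of the four hard-coded screen positions to click.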

I have briefly commented what each line contributes to the code as a whole.

#Import required libraries. You need to install some of them, if you don't have them.
import cv2
import os
import pyscreenshot as ImageGrab
import numpy as np
import time
from pymouse import PyMouse
import random

#defining a function rand, which clicks one of the four options at random
def rand():
    m = PyMouse()
    #pick a random int from 1 to 4 and put it into 'do'
    do= random.randint(1,4)
    #basic if/elif chain
    if do == 1:
        #clicking at point (395, 429). Here 1 implies a left-click
        m.click(395, 429, 1)
    elif do == 2:
        m.click(395, 466, 1)
    elif do == 3:
        m.click(395, 505, 1)
    else:
        m.click(395, 544, 1)
    m.move(50,50)
    print("Rand")
    #wait for 1 sec, giving the browser time to refresh
    time.sleep(1)

trails= 0
#two for loops because I wait for 5 secs after every 10 calculations, just to keep the system stable
for guns in range(0,1000):
  for buns in range(0,10):
    #using try/except so that one bad OCR read doesn't stop the loop
    try:
      img= ImageGrab.grab() #taking a screenshot
      img.save('output.png')
      pic= cv2.imread('output.png')
      pic2= pic[360:570, 380:470] #cropping the pic, works in my case
      cv2.imwrite('output.png', pic2)
      u= 'convert output.png -resize 700 output.png'
      os.system(u) #writing to terminal (re-sizing the pic)
      s= 'tesseract output.png output'
      os.system(s) #writing to terminal (running Tesseract)
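      with open('output.txt', 'r') as f: #reading Tesseract's output; the file is closed automatically
        string= f.read().replace('\n', ' ')
      string= string.replace('  ', ' ') #collapsing double spaces into one (done twice to catch longer runs)
      string= string.replace('  ', ' ')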
      first= string[:string.find('x')] #finding first integer
      second= string[string.find('x')+1:string.find(' ')] #finding second integer
      pro= int(first)*int(second)
      print(pro)
      print(string)
      m= PyMouse()
      string= string[string.find(' ')+1:]
      a= int(string[:string.find(' ')])
      #print(a)
      #checking if product is equal to any of answers and clicking on that particular option
      if pro == a:
        m.click(395, 429, 1)
        m.move(50,50) #move cursor to any random point which is not in our area of interest, avoiding tesseract to think it as some character
        print("Pass")
        time.sleep(1)
      else:
        string= string[string.find(' ')+1:]
        b= int(string[:string.find(' ')])
        #print(b)
        if pro == b:
          m.click(395, 466, 1)
          m.move(50,50)
          print("Pass")
          time.sleep(1)
        else:
          string= string[string.find(' ')+1:]
          c= int(string[:string.find(' ')])
          #print(c)
          if pro == c:
            m.click(395, 505, 1)
            m.move(50,50)
            print("Pass")
            time.sleep(1)
          else:
            d= int(string[string.find(' ')+1:])
            #print(d)
            if pro == d:
              m.click(395, 544, 1)
              m.move(50,50)
              print("Pass")
              time.sleep(1)
            else:
              rand() #tesseract can't detect 100% accurately. So, tick any option randomly in case it didn't find correct answer
              #print("haha")
    except (ValueError, NameError, TypeError):
      rand() #tick randomly in case of any errors
  trails+= 10
  print("Total= " + str(trails))
  time.sleep(5) #waiting for 5secs after every 10 loops to make my system stable.

I’ll be back soon with MARK II of the OCR system, and next time I won’t be using Tesseract OCR. (Target accuracy: 90%. Just a thought, though.) If you have any questions or feedback, please feel free to add comments. I’d be happy to hear from you. So, happy donating. Any time.

-SuryaTeja Cheedella

Car detection in MATLAB

myCarDetection

Hello guys, how’s it going

Today we are going to train a cascadeDetector, which returns an XML file. We can use that XML file to detect objects in an image, cars (only from the side view) in this case.

As we are going to use MATLAB, I assume you have MATLAB installed on your PC along with the Image Processing and Computer Vision toolboxes. The whole post has two steps:

  1. Train our cascade detector with all the data files.
  2. Use the output XML file to detect objects in a pic.

The following pic. says it all.

Overview
Overview of what we are going to do in here.

 

 

Before going into the topics, let’s see what we are going to build:

Detected correctly
This is the final output we are going to get by the end.

1.  Training the cascade file.

First things first: to train a cascade detector we need a dataSet. A dataSet contains a lot of positive and negative images for a specific object. So, download the image dataBase from here. You will see a lot of image files (.pgm) in the folders testImages and trainImages. You can get an overview by reading the ‘README.txt’ file in the downloaded folder. In this part we concentrate only on the trainImages folder; in the next part we get to testImages. Make new folders ‘trainImagesNeg’ and ‘trainImagesPos’ and remember their paths. Then copy or move the pictures from the ‘trainImages’ folder into these new folders, negatives into one and positives into the other. (If you read that ‘.txt’ file, you know the negative pictures are named neg-n.pgm and the positive ones pos-n.pgm.)

So, here is the line to train your data:

trainCascadeObjectDetector('carTraindata4.xml', mydata, negativeFolder);

So, what’s with those arguments? Where the heck are they initialized?

Here we go. The first argument is the name of an XML file that will be saved in the current directory, so that we can use it for detecting objects later. You can name it as you wish, but don’t forget the ‘.xml’ extension. The next argument is a struct in MATLAB that holds the data for all positive images. It contains two fields, imageFilename and objectBoundingBoxes. The size of this struct is 1×(number of positive images), 1×550 in this case as we have 550 positive images. Have a look at this:

Struct-mydata
Screenshot of struct of positive images with objectBoundingBoxes field

In the first field, the paths of all 550 positive images are entered, and in the second field the bounding boxes of the object of interest. As we got all these images from a dataSets site rather than collecting them from the internet, we don’t need to take on the huge task of manually entering bounding box values into the second field. (Thank God.) The values in the second field have the form [x of top-left point, y of top-left point, width, height]. All the positive pictures in the dataSet are 100×40 pixels (width × height) and are already cropped to the object of interest, so we can select the whole picture with [1, 1, 100, 40]. Finally, add the trainImagesPos folder to the MATLAB path by right-clicking on it and choosing Add to Path.

Okay, I see where this is going. You mean I should do this 550 times? (as there are 550 pos images)

It’s absolutely your wish, or you could use this for loop after initializing the struct ‘mydata’ (the code is self-explanatory):

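%placeholder values; both fields are overwritten in the loop below
mydata= struct('imageFilename', 'Just a random string', 'objectBoundingBoxes', 'Just a random string');
for i=0:549,
 %file names run from pos-0.pgm to pos-549.pgm
 mydata(i+1).imageFilename = strcat('trainImagesPos/pos-', num2str(i), '.pgm');
 %the whole 100x40 picture is the object: [x, y, width, height]
 mydata(i+1).objectBoundingBoxes = [1, 1, 100, 40];
end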
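If you want to sanity-check the file names and boxes outside MATLAB, the same table can be sketched in Python. This is not part of the MATLAB workflow; the function name build_positive_table is made up for illustration, and the folder name and box are simply the values used in this post.

```python
def build_positive_table(count=550, folder="trainImagesPos", box=(1, 1, 100, 40)):
    """List one record per positive image: its file path plus its bounding box."""
    return [
        # Same field names as the MATLAB struct; the box is [x, y, width, height].
        {"imageFilename": f"{folder}/pos-{i}.pgm", "objectBoundingBoxes": list(box)}
        for i in range(count)
    ]

table = build_positive_table()
print(len(table), table[0]["imageFilename"], table[-1]["imageFilename"])
```

The first and last entries should point at pos-0.pgm and pos-549.pgm, which matches the 0:549 range of the MATLAB loop.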

 

That wraps up the second argument, ‘mydata’. As the name suggests, the third argument, ‘negativeFolder’, is just a folder containing negative images. Negative images need no bounding boxes, so there is no need for anything like a struct. Just assign the folder path to a variable named negativeFolder:

negativeFolder= fullfile('C:\Users\Surya Teja Cheedella\Documents\MATLAB\carDetection\carData\trainImagesNeg')

For good training there should be a large number of negative images. As the number of negative images in the dataSet is relatively low, I copied a lot of my personal images into that trainImagesNeg folder (make sure yours don’t contain cars in side view).

You can learn more about this function trainCascadeObjectDetector here.

Now, run the code with all arguments initialized. It took around 40 minutes to complete 13 stages of training on my laptop and returned an XML file.

Stages? What do you mean by them? Where did they come from?

See THIS.

Stages while training
An overview of what it’s gonna do in various stages.

 

2.  Detecting objects in an image.

After successful training, we can use the xml file to detect objects (cars in this case) in a picture. These lines of code will do that for us:

%initialising the variable detector with the xml file saved by the training step
detector= vision.CascadeObjectDetector('carTraindata4.xml');
%reading an image file in current dir.
img= imread('sun.png');
%bounding box around detected object
box= step(detector, img);
%inserting that bounding box in given picture and showing it
figure,imshow(insertObjectAnnotation(img, 'rectangle', box,' '));

I have manually tested my trained xml file with all the pics in the testImages folder. It has an accuracy of 93% and out of 180 images these are the statistics:

  • False Positives- 10 (single object in 120 pics and double objects in remaining)
  • True Negatives- 9

Here is the code (just a for loop) to run the detector on a large number of images and display the results:

for j= 1:100,
 img= imread(strcat('test-', num2str(j-1), '.pgm'));
 bbox= step(detector, img);
 figure,imshow(insertObjectAnnotation(img, 'rectangle', bbox,' '));
 pause(0.5);
end

As usual my training has a small defect. You can understand by seeing the pic below 😛

Wrongly detected images

 

So, Happy Training!


Surya

Face Detection using openCV in Python

faceDetection

Hello folks, How’s it going!

Today I am going to introduce my face detection program (not a big one, though). Don’t think that this is a huge task! I am not working from scratch, meaning I am not gathering a huge data set of pictures (both positive and negative, i.e. with and without faces) and training my own algorithm. I have used Haar cascade files.

What the heck are they? Here we go.

They are made by taking a lot of sample images of both types (with faces and without faces), training the algorithm on them, making it learn whenever (most probably every time 😛) it makes a mistake, and storing the resulting data in XML files. In this case I am using Haar cascade XML files. They are huge. Wanna know their size? 35k lines in each XML file! Yes, you read that right, 35K lines of random-looking numbers. 😛

Coming to the juicy part, the code, here it is. I have commented briefly what each line is contributing.


#import numpy as np
import cv2
#import PIL

#import the xml files. here i am using frontal face and eye.
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

img= cv2.imread('5.jpg') #reading the image
gray_img= cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) #converting into grayscale (the cascades work on grayscale images)

faces = face_cascade.detectMultiScale(gray_img, 1.3, 5) #detecting faces (scaleFactor=1.3, minNeighbors=5). lighting conditions may affect the output
print (faces) #just printing to console. this will print the bounding box (x, y, width, height) of each detected face

#searching for eyes only in faces. easy and efficient to search only in face rather than whole image
for (p,q,r,s) in faces:
    cv2.rectangle(img,(p,q),(p+r,q+s),(150,125,0),2) #drawing a rectangle indicating face
    face_gray = gray_img[q:q+s, p:p+r] #cropping face in gray image
    face_color = img[q:q+s, p:p+r] #cropping face in color image
    eyes = eye_cascade.detectMultiScale(face_gray) #searching for eyes in grayscale img
    for (ep,eq,er,es) in eyes:
        cv2.rectangle(face_color,(ep,eq),(ep+er,eq+es),(100,210,150),2) #for each eye drawing rectangle

cv2.imshow("output", img)
cv2.waitKey(0) #showing the img

#this only takes an image and shows the detected faces and eyes in a window. it doesn't save the annotated image to disk.
#if you want to save the resulting image use this...
#cv2.imwrite("output.jpg", img)
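One detail worth spelling out: the eye boxes come from running the detector on the cropped face, so their coordinates are relative to the face box and not to the whole picture. The code above gets away with this because face_color is a slice of img, so drawing on it also draws on img. If you ever need the eye positions in full-image coordinates, a small helper like the one below would do it (the name eye_in_image_coords is made up for illustration, and the numbers are invented, not real detections).

```python
def eye_in_image_coords(face, eye):
    """Shift an eye box found inside a face crop back into full-image coordinates."""
    p, q, r, s = face      # face box in the full image: x, y, width, height
    ep, eq, er, es = eye   # eye box relative to the face crop
    # Only the top-left corner moves; width and height stay the same.
    return (p + ep, q + eq, er, es)

print(eye_in_image_coords((100, 50, 200, 200), (30, 40, 25, 25)))
```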

Wanna know what it did? I gave this image

Input image

and it returned THIS!…….

Output
Screenshot of the output

 

Currently, I am working on an alphabet detection thing to win a bet with my friend.

Bieeee  ^_^

Here somewhere in Milky Way

-Surya 

First ML Code on Gradient descent!

Hii there,

Hmm… my first program exceeding 20 lines.

Basically, it is a gradient descent problem (don’t know much about it 😛). As I am taking the Machine Learning course on Coursera, I wanna solve some problems on ML. I found some AI problems (don’t know much about these either) on the HackerRank site and started solving this one. This guy is the output of my 5 hours of work. 😀

import java.util.Scanner;


public class houseCosts {
	public static void main(String[] args){
		System.out.println("Enter");
		Scanner in = new Scanner(System.in);
		int n = in.nextInt();
		int m = in.nextInt();
		//System.out.println(n+""+m);
		float[][] x = new float[n+2][m];
		float[] t = new float[n+1];                       
		float[] temp = new float[n+1];
		float alpha = (float) 0.3;                      //alpha
		//inputs
		for(int j = 0; j<m; j++){
			for(int i=1; i<n+2; i++){
				x[i][j] = in.nextFloat();
				//System.out.println("x "+i+" "+j+"= "+x[i][j]);
			}
		}
		
		int num = in.nextInt();
		//System.out.println(num);
		float[][] out = new float[n+1][num];
		for(int j = 0; j<num; j++){
			for(int i=1; i<n+1; i++){
				out[i][j] = in.nextFloat();
				//System.out.println("out "+i+" "+j+"= "+out[i][j]);
			}
		}
		for(int i=0; i<num; i++){
			out[0][i] = 1;
			//System.out.println("x "+0+" "+i+"= "+x[0][i]);
		}
		
		//Initializations 
		for(int i=0; i<m; i++){
			x[0][i] = 1;
			
			//System.out.println("x "+0+" "+i+"= "+x[0][i]);
		}
		for(int j=0; j<n+1; j++){
			t[j] = 0;                                  //theta value initializing
		}
		
		for(int p = 0; p<500; p++){                   //no. of times
			//body
			for(int k=0; k<n+1; k++){
				float dum = 0;
				for(int j=0; j<m; j++){
					float ans = 0;
					for(int i=0; i<n+1; i++){
						ans+= t[i] * x[i][j];
						//System.out.println(ans);
					}
					ans-= x[n+1][j];
					//System.out.println(ans);
					ans*= x[k][j];
					dum+=ans;
					//System.out.println("x "+k+" "+j+" ="+x[k][j]);
					//System.out.println(ans);
				}
				//System.out.println(dum);
				temp[k] = (float) (t[k]-(alpha * dum * (1.0/m)));
				//System.out.println(temp[k]);
			}
			for(int k=0; k<=n; k++){
				t[k]=temp[k];
				//System.out.print(t[k]+" ");
			}
			//System.out.println(" ");
		}
		
		for(int i = 0; i<num; i++){
			float foo = 0;
			for(int j=0; j<n+1; j++){
				foo+=out[j][i] * t[j];
			}
			System.out.println(foo);
		}
		
	}
}

Don’t judge me by this code, coz I don’t know much about algorithms and ML. And there are so many commented-out stdouts because I dunno how to debug in Eclipse or any IDE for that matter. BTW, I got ten on ten for this problem.
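For anyone who finds the nested Java loops hard to follow, here is the same idea as a short Python sketch: batch gradient descent for linear regression, with every theta updated at the same time from the old values (the role of the temp array in the Java code). The function name and the toy data are made up for illustration; alpha = 0.3 and 500 iterations are taken from the Java code, and the toy data follows y = 1 + 2x, so the result should come out close to [1, 2].

```python
def gradient_descent(rows, targets, alpha=0.3, iterations=500):
    """Batch gradient descent for linear regression with a bias term."""
    m = len(rows)
    xs = [[1.0] + list(row) for row in rows]  # prepend x0 = 1 for the bias
    theta = [0.0] * len(xs[0])                # start with all thetas at zero
    for _ in range(iterations):
        # Prediction error for every training example under the current theta.
        errors = [sum(t * x for t, x in zip(theta, row)) - y
                  for row, y in zip(xs, targets)]
        # Simultaneous update: every new theta is computed from the old ones.
        theta = [t - alpha / m * sum(e * row[k] for e, row in zip(errors, xs))
                 for k, t in enumerate(theta)]
    return theta

theta = gradient_descent([[0.0], [0.5], [1.0]], [1.0, 2.0, 3.0])
print([round(t, 3) for t in theta])
```

The features in the toy data are kept between 0 and 1 on purpose: with a step size as large as 0.3, unscaled features can make the updates blow up instead of settling down.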

Something Productive- CHECK!

Don’t forget to travel in time.

Cheerios

Surya