Object Classification on aYahoo image dataset

1. You need to have Caffe installed on your system to run the tasks below. If not, follow the tutorial here to get Caffe up and running on your system.
2. You need to download the aYahoo image dataset here.
3. You need to get all the pre-trained models: alexnet.caffemodel, caffenet.caffemodel and googlenet.caffemodel to get started on extracting features from images.

We will start by getting the paths of all the downloaded images into a text file. Run this line in a terminal with the correct path to the images folder and the path to the text file to which we append all the image paths.
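If you'd rather do this from Python than from the shell, here is a rough equivalent sketch; the throwaway demo directory and fake file names below are made up just to show it works:

```python
import os
import tempfile

def list_images(root):
    """Collect the absolute paths of all files under root, sorted."""
    paths = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.abspath(os.path.join(dirpath, name)))
    return sorted(paths)

# quick demo on a throwaway directory with two fake image files
tmp = tempfile.mkdtemp()
for fake in ('bag_1.jpg', 'goat_1.jpg'):
    open(os.path.join(tmp, fake), 'w').close()
paths = list_images(tmp)
print(len(paths))  # 2

# for the real dataset you would write the list to the text file, e.g.:
# open('examples/tempFeaExtr/temp.txt', 'w').write('\n'.join(list_images('examples/images/ayahoo')))
```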

find `pwd`/examples/images/ayahoo -type f -exec echo {} \; > examples/tempFeaExtr/temp.txt

Now we need to label the images. I am assigning labels starting from 0. It is recommended to sort all the names in the text file, so we can divide the whole dataset into train and test sets with a reasonable number of images of each class in both, rather than splitting the whole data randomly.

outfile = open('zz_temp.txt', 'w')
infile = open('z_temp.txt', 'r')

for line in infile:
    #strip the directory part and the trailing newline to get the file name
    a = line[(len(line) - (line[::-1].index('/'))):-1]
    #the class name is everything before the first underscore, e.g. 'bag_1.jpg' -> 'bag'
    a = a[:a.index('_')]

    if a == "bag":
        b = "0"
    elif a == "building":
        b = "1"
    elif a == "carriage":
        b = "2"
    elif a == "centaur":
        b = "3"
    elif a == "donkey":
        b = "4"
    elif a == "goat":
        b = "5"
    elif a == "jetski":
        b = "6"
    elif a == "monkey":
        b = "7"
    elif a == "mug":
        b = "8"
    elif a == "statue":
        b = "9"
    elif a == "wolf":
        b = "10"
    elif a == "zebra":
        b = "11"

    outfile.write(line[:-1] + " " + b + "\n")

infile.close()
outfile.close()


#sort the labeled lines so images of each class stay together
infile = open('zz_temp.txt', 'r')
outfile = open('z_temp.txt', 'w')
myset = []

for line in infile:
    myset.append(line)

myset.sort()
for a in myset:
    outfile.write(a)

infile.close()
outfile.close()

As we have labeled and sorted the whole dataset, we now divide it into two sets, namely train_data.txt and test_data.txt. Try to keep the ratio of no. of train images to no. of test images constant across all the classes. Now we have train_data and test_data to proceed to the next step.
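A minimal sketch of such a per-class split (the 80/20 ratio and the helper name are my own choices; the input lines follow the "path label" format produced above):

```python
import random
from collections import defaultdict

def stratified_split(lines, train_frac=0.8, seed=0):
    """Split 'path label' lines into train/test lists, keeping the
    train/test ratio roughly constant for every class."""
    by_class = defaultdict(list)
    for line in lines:
        by_class[line.rsplit(' ', 1)[1]].append(line)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        cut = int(len(items) * train_frac)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test

# toy example: 10 images each of class 0 and class 1
lines = ['img%d.jpg 0' % i for i in range(10)] + ['img%d.jpg 1' % i for i in range(10)]
train, test = stratified_split(lines)
print(len(train), len(test))  # 16 4
```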

I am going to extract features of every image using the pre-trained models. After extracting them, I am going to train SVM classifiers to predict the classes of test_data.

The code to extract features and to train SVMs is here on GitHub.
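As a rough sketch of the SVM stage (this is not the GitHub code: here scikit-learn's LinearSVC is trained on synthetic feature vectors standing in for the extracted CNN features, just to show the shape of the pipeline):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
n_classes, dim = 12, 64   # aYahoo has 12 classes; real fc7 features would be 4096-dim

# synthetic stand-ins for extracted features: one well-separated blob per class
centers = rng.randn(n_classes, dim) * 5
X_train = np.vstack([c + rng.randn(40, dim) for c in centers])
y_train = np.repeat(np.arange(n_classes), 40)
X_test = np.vstack([c + rng.randn(10, dim) for c in centers])
y_test = np.repeat(np.arange(n_classes), 10)

# a linear SVM (one-vs-rest) on top of the feature vectors
clf = LinearSVC()
clf.fit(X_train, y_train)
acc = (clf.predict(X_test) == y_test).mean()
print(acc)
```

With real features you would replace the synthetic blobs with the vectors extracted by the Caffe models.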

Happy classifying images.

Install Caffe on Ubuntu 14.04

1. As my machine has no GPU hardware, I am going to install Caffe without CUDA support.
2. I am going to install only Python wrapper for Caffe.

As the installation page of Caffe has no detailed instructions for installing all the required dependencies and getting your system ready to run CNNs, I am writing this small tutorial to get Caffe up and running on your machine.

Firstly, we will get started by downloading the latest release of Caffe from GitHub. (You need to have Git installed on your machine.)

git clone https://github.com/BVLC/caffe.git
#add these following lines in the end of your .bashrc file
export PYTHONPATH=$PYTHONPATH:$HOME/caffe/python
export PYTHONPATH=$PYTHONPATH:$HOME/caffe/python/caffe

It is advised to use the Anaconda Python distribution, as it installs most of the requirements for the Python wrapper around Caffe (though it installs a lot of packages we have no use for). But installing Anaconda is up to you.
Download Anaconda and run the shell script. Add the path of its bin directory to $PATH in your ~/.bashrc file.

bash Anaconda-2.1.0-Linux-x86_64.sh
#add the following line at the end to your .bashrc
export PATH=/home/suryateja/anaconda/bin:$PATH

Get gflags and install on your system.

wget https://github.com/schuhschuh/gflags/archive/master.zip
unzip master.zip
cd gflags-master
mkdir build && cd build
export CXXFLAGS="-fPIC" && cmake .. && make VERBOSE=1
sudo make install

Download google-glog and install it. (You can also install it through apt-get.) And install snappy.

tar -xvzf google-glog_0.3.3.orig.tar.gz
cd glog-0.3.3/
./configure
make
sudo make install
sudo apt-get install libgoogle-glog-dev
sudo apt-get install libsnappy-dev

Get leveldb onto your machine, and lmdb with its dependencies.

git clone https://code.google.com/p/leveldb/
cd leveldb
make
sudo cp --preserve=links libleveldb.* /usr/local/lib
sudo cp -r include/leveldb /usr/local/include/
sudo ldconfig
sudo apt-get install libffi-dev python-dev build-essential
sudo apt-get install liblmdb-dev

Download Atlas and Lapack (the downloads start directly when you click the links!). But before installing, you need to modify your CPU frequency scaling. If you get errors saving the file, try a different editor. I tried Sublime and gedit; finally I could modify it with emacs. Change the scaling governor for all of your CPU cores by replacing the single word in each file with ‘performance’.

gksu emacs /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
gksu emacs /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
gksu emacs /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
gksu emacs /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor

bunzip2 -c atlas3.10.2.tar.bz2 | tar xfm -
mv ATLAS ATLAS3.10.2
cd ATLAS3.10.2
mkdir Linux_C2D64SSE3
cd Linux_C2D64SSE3
../configure -b 64 -D c -DPentiumCPS=2400 --prefix=/home/suryateja/lib/atlas --with-netlib-lapack-tarfile=/home/suryateja/Downloads/lapack-3.5.0.tgz

make build
make check
make ptcheck
make time
sudo make install

Install the dependencies for the Python wrapper if you didn’t choose to install the Anaconda Python distribution. The dependencies file is in caffe/python. And install hdf5 via apt-get.

cd python/
for req in $(cat requirements.txt); do sudo pip install $req; done
sudo apt-get install libhdf5-serial-dev

Download the latest 3.0.0-beta version of OpenCV (the file downloads automatically when you click) and build it. But before that, install all the dependencies of OpenCV.

sudo apt-get install build-essential
sudo apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
sudo apt-get install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev

cd Downloads
cd opencv-3.0.0-beta/
mkdir release
cd release
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ..
make
sudo make install

Now we are almost done with getting all the dependencies of Caffe. So, let’s dive into getting Caffe up and running on our machines. Modify the makefile to your needs as shown on the installation page.

cd caffe
cp Makefile.config.example Makefile.config
# uncomment CPU_ONLY := 1 in Makefile.config, as we are building without a GPU
make all
make test
make runtest

If you get errors while installing even after following this tutorial, you can search for any known issues on their issues page. One of the errors I faced is this, and I was able to solve it as suggested in the comments section of the issue page.

At last, now, you have Caffe installed and running on your machine.
Happy training Deep Neural Networks. 😀

Donated 104,390 rice grains using tesseract OCR (MARK I)

For each answer you get right, we donate 10 grains of rice through the World Food programme to help end hunger

That’s what the guys at freerice.com say. So, I answered 10,439 questions correctly, ending up donating 104,390 rice grains. 😀 As I mentioned in one of my previous posts, I am working on OCR (Optical Character Recognition) to win a bet with my friend. I have completed building an OCR system and donated 104,390 rice grains on freerice.com in a single day under the United Nations World Food Programme. As I said, I built this just to win a bet. This whole post is one down and dirty way to build a working OCR system. I know this isn’t even an efficient way to do it. (that’s the reason for MARK I 😛 ) Here’s a screenshot of my score (aka rice grains donated) on that site-

I let the code run for a whole night and this is what it did by morning. 104,390 rice grains were donated under the UN World Food Programme.

And a screenshot while the code is running:

Working fine, but with 60% accuracy. (It answered 594 questions correctly out of 1000, to be more specific.)

So, let’s dive into the juicy part, the code. The whole process is divided into small steps.

  1. Take a screenshot while the required page, freerice.com in our case, is opened; crop it to get the area we are interested in recognizing characters from, i.e. where the multiplication tables are. (In this project it is only going to answer multiplication-type questions.) Save the cropped image.
  2. Analyse that image with Tesseract OCR to get a text file of recognized characters as output. (It saves a txt file next to our main program, in the same folder.)
  3. Analyse that txt file and get the recognized information. (We are dealing with integers, so convert the strings in the txt file into integers.)
  4. After analysing the information in the txt file, we know the question (7×4 or something like that) and check for the answer in the options.
  5. If the answer matches any of the options, move the mouse onto the specific region and click. (It basically clicks on the pixel position where that option is on screen, which I found by trial and error on my laptop.)
  6. If Tesseract couldn’t find the correct answer (i.e. the answer we got by solving the first line (7×4 for example) doesn’t match any of the options), it randomly clicks on one of the four options just to not break the loop. (LOOP? Where’s that? See the next point.)
  7. Embed everything from points 1 to 6 in a loop, so it does its work while you are sleeping. 😀
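Steps 3 and 4 above (turning Tesseract’s text output into the question and the options) can be sketched in isolation. The sample strings below are invented, and real Tesseract output will vary, so treat this only as an illustration of the parsing idea:

```python
def parse_and_answer(text):
    """Return (product, index_of_correct_option) from OCR text of the
    form 'A x B opt1 opt2 opt3 opt4', or (product, None) if no option matches."""
    tokens = text.replace('\n', ' ').split()
    x = tokens.index('x')
    product = int(tokens[x - 1]) * int(tokens[x + 1])
    options = [int(t) for t in tokens[x + 2:x + 6]]
    for i, opt in enumerate(options):
        if opt == product:
            return product, i
    return product, None

print(parse_and_answer('7 x 4\n21 28 35 32'))  # (28, 1)
```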

I have briefly commented on what each line contributes to the code, making it work as a whole.

#Import required libraries. You may need to install some of them first.
import cv2
import os
import pyscreenshot as ImageGrab
import time
from pymouse import PyMouse
import random

#function to click on a random option when we can't find the correct answer
def rand():
    m = PyMouse()
    #pick a random int from 1 to 4 and put it into 'do'
    do = random.randint(1, 4)
    #basic if, elif chain
    if do == 1:
        #clicking at point (395, 429); here 1 implies a left-click
        m.click(395, 429, 1)
    elif do == 2:
        m.click(395, 466, 1)
    elif do == 3:
        m.click(395, 505, 1)
    else:
        m.click(395, 544, 1)
    #wait for 1 sec, giving the browser time to refresh
    time.sleep(1)

trials = 0
#two for loops because I am waiting for 5 secs after every 10 calculations just to keep the system stable
for guns in range(0, 1000):
    for buns in range(0, 10):
        #using try/except to avoid breaking the loop on any errors
        try:
            img = ImageGrab.grab()  #taking a screenshot
            img.save('output.png')
            pic = cv2.imread('output.png')
            pic2 = pic[360:570, 380:470]  #cropping the pic, works in my case
            cv2.imwrite('output.png', pic2)
            u = 'convert output.png -resize 700 output.png'
            os.system(u)  #writing to terminal (re-sizing the pic)
            s = 'tesseract output.png output'
            os.system(s)  #writing to terminal (running Tesseract)
            f = open('output.txt', 'r')
            string = f.read().replace('\n', ' ')
            f.close()
            while '  ' in string:
                string = string.replace('  ', ' ')  #collapse multiple spaces
            first = string[:string.find('x')]  #finding the first integer
            second = string[string.find('x')+1:string.find(' ')]  #finding the second integer
            pro = int(first) * int(second)
            m = PyMouse()
            string = string[string.find(' ')+1:]
            a = int(string[:string.find(' ')])
            #checking if the product is equal to any of the answers and clicking on that particular option
            if pro == a:
                m.click(395, 429, 1)
            else:
                string = string[string.find(' ')+1:]
                b = int(string[:string.find(' ')])
                if pro == b:
                    m.click(395, 466, 1)
                else:
                    string = string[string.find(' ')+1:]
                    c = int(string[:string.find(' ')])
                    if pro == c:
                        m.click(395, 505, 1)
                    else:
                        d = int(string[string.find(' ')+1:])
                        if pro == d:
                            m.click(395, 544, 1)
                        else:
                            #tesseract can't detect 100% accurately; tick a random option in case it didn't find the correct answer
                            rand()
            #move the cursor outside our area of interest, so tesseract doesn't mistake it for a character
            m.move(50, 50)
        except (ValueError, NameError, TypeError):
            rand()  #tick randomly in case of any errors
    trials += 10
    print("Total= " + str(trials))
    time.sleep(5)  #waiting for 5 secs after every 10 loops to keep my system stable

I’ll be back soon with MARK II of the OCR system, and next time I’ll not be using Tesseract OCR. (Target accuracy: 90%. (just a thought, though)) If you have any questions or some feedback, please feel free to add comments. I’d be happy to get some feedback from you. So, happy donating. Any time!

-SuryaTeja Cheedella

Car detection in MATLAB


Hello guys, how’s it going?

Today we are going to train a cascade detector, which returns an XML file. We can use that XML file to detect objects, cars (only from side view) in this case, in an image.

As we are going to use MATLAB, I assume you have MATLAB installed on your PC along with the Image Processing and Computer Vision toolboxes. The whole post consists of two steps:

  1. Train our cascade detector with all the data files.
  2. Use the output XML file to detect objects in a pic.

The following pic. says it all.

Overview of what we are going to do in here.



Before going into the topics, let’s see what we are going to build:

Detected correctly
This is the final output we are going to get by the end.
1.  Training the cascade file

First things first: to train a cascade detector we need a dataSet. A dataSet contains a lot of positive and negative images of a specific object. So, download the image dataBase from here. You can see a lot of image files (.pgm) in the folders testImages and trainImages. You can get an overview by reading the ‘README.txt’ file in that downloaded folder. In this part we are concentrating only on the trainImages folder; in the next part we get onto testImages. Make new folders ‘trainImagesNeg’ and ‘trainImagesPos’ and remember the path. Copy&Paste or Cut&Paste the pictures in the ‘trainImages’ folder into these new folders. (As you may know if you read that ‘.txt’ file, all the negative pictures are named neg-n.pgm and the positive pictures pos-n.pgm.)

So, here is the line to train your data:

trainCascadeObjectDetector('carTraindata4.xml', mydata, negativeFolder);

So, what’s with those arguments? Where the heck are they initialized?

Here we go. The first argument is an XML file that is going to be saved in our current directory, so that we can use it for detecting objects. You can name it as you wish, but don’t forget the extension ‘.xml’. The next argument is actually a struct in MATLAB, which holds the data of all positive images. It contains two fields, namely imageFilename and objectBoundingBoxes. The size of this struct would be 1x(no. of pos images), 1×550 in this case as we have 550 pos images. Have a look at this:

Screenshot of struct of positive images with objectBoundingBoxes field

In the first field, the paths of all 550 pos images are entered, and in the second field the bounding boxes of our object of interest. As we got this whole image data from a dataSets site, rather than collecting it from the internet, we don’t need to take on the huge task of manually putting the bounding-box values into the second field. (Thank God) The values in the second field are like [x of top-left point, y of top-left, width, height]. All the pictures in the dataSet are of size (100,40) and are already cropped to the object of interest. So, we can just select the whole pic by giving the arguments as [1, 1, 100, 40]. And add the folder trainImagesPos to the MATLAB path by right-clicking on it and clicking Add to Path.
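To make the [x of top-left, y of top-left, width, height] convention concrete, here is a tiny hypothetical helper (not part of the MATLAB workflow, just illustrating the format):

```python
def corners_to_bbox(x1, y1, x2, y2):
    """Top-left/bottom-right corners (1-based, inclusive) to
    [x of top-left, y of top-left, width, height]."""
    return [x1, y1, x2 - x1 + 1, y2 - y1 + 1]

# selecting a whole 100x40 image (1-based pixel coordinates)
print(corners_to_bbox(1, 1, 100, 40))  # [1, 1, 100, 40]
```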

Okay, I see where this is going. You mean I should do this 550 times? (as there are 550 pos images)

It’s absolutely your wish, or you could use this for loop after initializing the struct ‘mydata’ (the code is self-explanatory):

mydata= struct('imageFilename', 'Just a random string', 'objectBoundingBoxes', 'Just a random string');
for i=0:549,
 mydata(i+1).imageFilename = strcat('trainImagesPos/pos-', num2str(i), '.pgm');
 mydata(i+1).objectBoundingBoxes = [1, 1, 100, 40];
end


Now, the whole thing with the second argument ‘mydata’ is settled. As the name suggests, the third argument ‘negativeFolder’ is just a folder containing negative images. There is no need for bounding boxes for negative images, so no need for anything like a struct. Just assign the folder path to this variable named negativeFolder:

negativeFolder= fullfile('C:\Users\Surya Teja Cheedella\Documents\MATLAB\carDetection\carData\trainImagesNeg')

For good training, there should be a large number of negative images. As the number of neg. images in the dataSet is relatively low, I copy&pasted a lot of my personal images into that trainImagesNeg folder (make sure they don’t have pics of cars in side view).

You can learn more about this function trainCascadeObjectDetector here.

Now, run the code with all the arguments initialized. It took around 40 mins to complete 13 stages of training on my laptop and returned an XML file.

Stages? What do you mean by them? Where did they come from?


Stages while training
An overview of what it’s gonna do in various stages.


2.  Detecting objects in an image.

After successful training, we can use the xml file to detect objects (cars in this case) in a picture. These lines of code will do that for us:

%initialising the variable detector with the xml file
detector= vision.CascadeObjectDetector('carTraindata3.xml');
%reading an image file in current dir.
img= imread('sun.png');
%bounding box around detected object
box= step(detector, img);
%inserting that bounding box in given picture and showing it
figure,imshow(insertObjectAnnotation(img, 'rectangle', box,' '));

I have manually tested my trained XML file with all the pics in the testImages folder. It has an accuracy of 93%, and out of 180 images these are the statistics:

  • False Positives- 10 (single object in 120 pics and double objects in remaining)
  • True Negatives- 9

Here is the code (just a for loop) to detect a large number of images and display them-

for j= 1:100,
 img= imread(strcat('test-', num2str(j-1), '.pgm'));
 bbox= step(detector, img);
 figure,imshow(insertObjectAnnotation(img, 'rectangle', bbox,' '));
end

As usual, my training has a small defect. You can see it in the pic below 😛

Wrongly detected images


So, Happy Training!