Object Classification on the aYahoo Image Dataset

Prerequisites:
1. You need to have Caffe installed on your system to run the tasks below. If you do not, follow the tutorial here to get Caffe up and running.
2. You need to download the aYahoo image dataset here.
3. You need all the pre-trained models (alexnet.caffemodel, caffenet.caffemodel and googlenet.caffemodel) to extract features from the images.

We will start by collecting the paths of all the downloaded images into a text file. Run this line in a terminal, replacing the path to the images folder and the path to the output text file with your own. Note that > overwrites the text file on every run rather than appending to it.

find `pwd`/examples/images/ayahoo -type f -exec echo {} \; > examples/tempFeaExtr/temp.txt
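
If you would rather stay in Python, roughly the same list can be built with os.walk. This is only a sketch of an equivalent: the function name write_image_paths is made up for illustration, and the paths in the usage comment are simply the ones from the find command above.

```python
import os


def write_image_paths(image_dir, list_file):
    """Write the absolute path of every file under image_dir to list_file, one per line."""
    paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            paths.append(os.path.abspath(os.path.join(root, name)))
    paths.sort()  # sorted output makes the list reproducible between runs
    with open(list_file, "w") as out:
        for path in paths:
            out.write(path + "\n")
    return paths


# Example call with the paths used in the find command above:
# write_image_paths("examples/images/ayahoo", "examples/tempFeaExtr/temp.txt")
```

Like the find command, this picks up every regular file in the folder, so any non-image files stored next to the images end up in the list too.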

Now we need to label the images. I am assigning labels starting from 0. It is recommended to sort the lines of the text file afterwards, so that images of the same class sit together. That way we can divide the dataset into train and test sets with a reasonable number of images of each class in both, rather than splitting the whole dataset at random.

#to_label
import os

# Class names in label order: the label of a class is its index in this list.
classes = ["bag", "building", "carriage", "centaur", "donkey", "goat",
           "jetski", "monkey", "mug", "statue", "wolf", "zebra"]
myset = []

# z_temp.txt holds one image path per line (point this at your own list of paths).
with open('z_temp.txt', 'r') as infile, open('zz_temp.txt', 'w') as outfile:
    for line in infile:
        path = line.rstrip('\n')
        # File names look like <class>_<rest>; keep the part before the first '_'.
        a = os.path.basename(path).split('_')[0]
        myset.append(a)

        if a not in classes:
            continue  # skip files that do not belong to one of the 12 classes

        b = str(classes.index(a))
        outfile.write(path + " " + b + "\n")

# Print the class names that were found, as a quick sanity check.
print(set(myset))

#to_sort
# Read the labeled lines, sort them, and write them back to z_temp.txt.
with open('zz_temp.txt', 'r') as infile:
    myset = [line.rstrip('\n') for line in infile]

myset.sort()

with open('z_temp.txt', 'w') as outfile:
    for a in myset:
        outfile.write(a + "\n")

Now that the whole dataset is labeled and sorted, we divide it into two sets, train_data.txt and test_data.txt. Try to keep the ratio of train images to test images the same for every class. With train_data and test_data in hand, we can proceed to the next step.
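
One way to do the split is sketched below. It assumes the labeled list is in z_temp.txt with one "path label" pair per line, and it sends the same fraction of every class to the test set. The function names split_by_class and write_split and the 0.2 test fraction are assumptions made for this example, not something fixed by the dataset.

```python
from collections import defaultdict


def split_by_class(lines, test_fraction=0.2):
    """Split 'path label' lines into train and test lists, class by class."""
    by_label = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The label is the last space-separated field on the line.
        label = line.rsplit(" ", 1)[1]
        by_label[label].append(line)

    train, test = [], []
    for label in sorted(by_label, key=int):
        items = by_label[label]
        # Same fraction for every class, so the train/test ratio stays roughly constant.
        n_test = int(round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def write_split(list_file="z_temp.txt", train_file="train_data.txt",
                test_file="test_data.txt", test_fraction=0.2):
    """Read the labeled list and write the two split files."""
    with open(list_file) as f:
        train, test = split_by_class(f, test_fraction)
    with open(train_file, "w") as f:
        f.writelines(line + "\n" for line in train)
    with open(test_file, "w") as f:
        f.writelines(line + "\n" for line in test)
```

The sketch takes the first images of each class for the test set. If the file names within a class follow some order that matters, shuffling each class with a fixed random seed before slicing would be a reasonable variation.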

I am going to extract features from every image using the pre-trained models. After extracting them, I am going to train SVM classifiers to predict the classes of test_data.
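
Both steps begin by reading a split file back into image paths and integer labels. Here is a minimal sketch of that reading step, assuming the "path label" line format used for the labeled list. The helper name read_split is hypothetical and is not taken from the GitHub code.

```python
def read_split(split_file):
    """Return (paths, labels) read from a file of 'path label' lines."""
    paths, labels = [], []
    with open(split_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the last space only, so paths containing spaces still work.
            path, label = line.rsplit(" ", 1)
            paths.append(path)
            labels.append(int(label))
    return paths, labels


# Example: paths and labels for the training set
# train_paths, train_labels = read_split("train_data.txt")
```

The paths go to the feature extractor one image at a time, and the labels list lines up index by index with the resulting feature vectors when the SVM is trained.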

The code to extract the features and to train the SVMs is here on GitHub.

Happy classifying images.
Adios