K-Nearest Neighbors in OpenCV

K-Nearest Neighbors is a very simple machine learning algorithm. And OpenCV comes with it built in! In this post, we’ll use a freely available dataset (of handwritten digits), train a K-Nearest algorithm, and then use it to recognize digits. We will be using a few file operations (fopen, fread, etc). If you’re not familiar with those, just review them once.

Handwritten digits dataset

A massive dataset of handwritten digits is available on Yann LeCun’s website. Get all four files. The files contain the training images, training labels, testing images and actual labels for test images.

The files won’t open with a standard image viewer (Picasa, etc). We’ll have to write our own code to read images and labels from these files.

Training: Loading the dataset

We’ll use standard file operations in C/C++ to load the dataset. Yann’s website describes how data is stored in each file.

For files containing images, the first sixteen bytes contain information about the file. The first four bytes form a magic number (to ensure you’re reading a correct file). The next four bytes contain the number of images (an integer has 4 bytes). The next four bytes have the number of rows. The next four have the number of columns. After this, individual pixels of each image are listed (row-wise).

And for files containing labels, the first four bytes form the magic number. The next four bytes is the number of labels. And after that, each byte corresponds to one label. Each label is between 0 and 9.

With that in mind, create a new project. Include the OpenCV headers + standard header and link to syour OpenCV libraries.

#include <cv.h>
#include <highgui.h>
#include <ml.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
 
using namespace cv;
 
int main()
{
    return 0;
}

I’ll be using the C++ interface of OpenCV, so I’ve included the cv namespace too. Start off by creating two new file pointers:

int main()
{
    FILE *fp = fopen("C:\\train-images.idx3-ubyte", "rb");
    FILE *fp2 = fopen("C:\\train-labels.idx1-ubyte", "rb");
 
    return 0;
}

This opens both files in binary, read only mode. Then a little exception handling. It one of the files didn’t open, we can’t proceed.

    if(!fp || !fp2)
        return 0;

Next, we read values from both files:

    int magicNumber = readFlippedInteger(fp);
    int numImages = readFlippedInteger(fp);
    int numRows = readFlippedInteger(fp);
    int numCols = readFlippedInteger(fp);
 
    fseek(fp2, 0x08, SEEK_SET);

We’ll get to the readFlippedInteger function a little later. If you’re on an Intel machine (which is little endian), you need to use it. If you’re on some other high-endian processor, you can simply read using fread.

Labels start at byte 8, so we also skip the first two integers (thus, 8 bytes) in fp2.

Now, we create memory to store all the images and labels:

    int size = numRows*numCols;
    CvMat *trainingVectors = cvCreateMat(numImages, size, CV_32FC1);
    CvMat *trainingLabels = cvCreateMat(numImages, 1, CV_32FC1);

“vectors” refers to the images. Okay, so with memory in place, we read data from the files:

    BYTE *temp = new BYTE[size];
    BYTE tempClass=0;
    for(int i=0;i<numImages;i++)
    {
        fread((void*)temp, size, 1, fp);
        fread((void*)(&tempClass), sizeof(BYTE), 1, fp2);
 
        trainingLabels->data.fl[i] = tempClass;
 
        for(int k=0;k<size;k++)
            trainingVectors->data.fl[i*size+k] = temp[k];
    }

This code loads the i’th image into temp and its label into tempClass. And then, it fills appropriate elements in the trainingVectors and traingLabels matrices. Why? Because the matrices are floating point matrices. And the files have just single bytes. 4 bytes vs 1 byte does not match. So all the mess. If you know a better way to do this, please do let me know!

With all images and labels in the RAM, we can load these into the K-Nearest Neighbors algorithm:

    KNearest knn(trainingVectors, trainingLabels);

And then, we can print out the maximum value of k. And we close both files.

    printf("Maximum k: %d\n", knn.get_max_k());  
 
    fclose(fp);
    fclose(fp2);

We can also release the matrices. KNearest keeps its own copy of training vectors and classes.

    cvReleaseMat(&trainingVectors);
    cvReleaseMat(&trainingLabels);

Testing: Using K-Nearest Neighbors to recognize

We’ve trained the algorithm. Now its time to run it against test images and see how accurate it is.

We’ll continue in the same project. Add these lines at the bottom:

    fp = fopen("C:\\t10k-images.idx3-ubyte", "rb");
    fp2 = fopen("C:\\t10k-labels.idx1-ubyte", "rb");

Then we read initial information from these files and set the position of fp2 to the 8th byte:

    magicNumber = readFlippedInteger(fp);
    numImages = readFlippedInteger(fp);
    numRows = readFlippedInteger(fp);
    numCols = readFlippedInteger(fp);  
 
    fseek(fp2, 0x08, SEEK_SET);

Next, we allocate memory for all the images and labels stored in those files:

    CvMat *testVectors = cvCreateMat(numImages, size, CV_32FC1);
    CvMat *testLabels = cvCreateMat(numImages, 1, CV_32FC1);
    CvMat *actualLabels = cvCreateMat(numImages, 1, CV_32FC1);

testVectors store the actual image. testLabels store the label predicted by the algorithm. actualLabels stores the correct label (taken from the file fp2).

Then, we create some temporary variables:

    temp = new BYTE[size];
    tempClass=1;
    CvMat *currentTest = cvCreateMat(1, size, CV_32FC1);
    CvMat *currentLabel = cvCreateMat(1, 1, CV_32FC1);
    int totalCorrect=0;

temp and tempClass hold  the current image and its actual label. totalCorrect keeps a track of correct predictions. Now, we iterate through all images in the file:

    for(int i=0;i<numImages;i++)
    {
        fread((void*)temp, size, 1, fp);
        fread((void*)(&tempClass), sizeof(BYTE), 1, fp2);
 
        actualLabels->data.fl[i] = (float)tempClass;
 
        for(int k=0;k<size;k++)
        {
            testVectors->data.fl[i*size+k] = temp[k];
            currentTest->data.fl[k] = temp[k];
        }

This simply reads an image and its label from the files and stores them in the matrices. Then we try to predict the label for this image:

        knn.find_nearest(currentTest, 5, currentLabel);
        testLabels->data.fl[i] = currentLabel->data.fl[0];
 
        if(currentLabel->data.fl[0]==actualLabels->data.fl[i])
            totalCorrect++;
    }

Finally, we print the number of correct predictions:

    printf("Time: %d\nAccuracy: %f\n\n", (int)time, (double)totalCorrect*100/(double)numImages);
 
    return 0;
}

And this finishes our program.

Flipped integers and Endian-ness

Different processors have different Endian-ness. Suppose you have a 32-bit integer. So it’s stored over 4 bytes. Some processors read the most significant byte first. Others read the least significant byte first.

True, this just adds to confusion, but you can’t really control Intel or AMD.

Probably, the creates of the MNIST dataset have a non-Intel machine. So they’ve designed the file format that way. Here’s the code for reading a “flipped integer”.

int readFlippedInteger(FILE *fp)
{
    int ret = 0;
    BYTE *temp;
 
    temp = (BYTE*)(&ret);
    fread(&temp[3], sizeof(BYTE), 1, fp);
    fread(&temp[2], sizeof(BYTE), 1, fp);
    fread(&temp[1], sizeof(BYTE), 1, fp);
    fread(&temp[0], sizeof(BYTE), 1, fp);
 
    return ret;
}

It creates a Byte pointer to an integer and then reads bytes accordingly.

Fin

And with that, you’ve just made a machine learn something! Of course, there are much better alternatives. See if you can find some algorithms. Or if you can improve this algorithm (hint: maybe not all pixels of an image are as important).

Enjoy!

Issues? Suggestions? Visit the Github issue tracker for AI Shack

Back to top

18 Comments

  1. Joel
    Posted October 6, 2010 at 6:29 am | Permalink

    very good tutorial and website! i love aishack.

    how about the recognition error rate using this method ?

  2. Paulo Trigueiros
    Posted November 23, 2010 at 5:59 pm | Permalink

    The tutorial is very good….
    unfortunatly i get stucked at the BYTE declaration. How do I declare a BYTE?
    Another problem is that when I try to compile the code it gives the following error:
    main.cpp:69: error: undefined reference to `CvKNearest::CvKNearest(CvMat const*, CvMat const*, CvMat const*, bool, int)’

    but I used the code has you show it!
    :-(
    Can you help me?
    Thanks

    • Posted December 9, 2010 at 11:56 am | Permalink

      What version of OpenCV are you using? Instead of BYTE you can use unsigned char. It should work almost the same.

  3. YPWong
    Posted December 13, 2010 at 6:35 pm | Permalink

    Hi Utkarsh,

    Very cool and useful site, I have spent a week going through your examples.

    It will be nice if you can provide the full source code so that we do not need to cut and paste.

    Some errors in your above program:
    First,

        FILE *fp = fopen("C:\\train-images-idx3-ubyte", "rb");
        FILE *fp2 = fopen("C:\\train-labels-idx1-ubyte", "rb");

    should be:

        FILE *fp = fopen("C:\\train-images.idx3-ubyte", "rb");
        FILE *fp2 = fopen("C:\\train-labels.idx1-ubyte", "rb");

    [i.e. a dot and not dash before idx]

    Second,

       CvMat *trainingLabels = cvCreateMat(numImages, 1, CV_32FC1);

    should be:

        CvMat *trainingClasses = cvCreateMat(numImages, 1, CV_32FC1);

    Third,

        KNearest knn(trainingVectors, trainingClasses);

    should be:

        CvKNearest knn(trainingVectors, trainingClasses);

    Fourth,

    #include <ml.h>

    [for CvKNearest]

    Fifth,

    #include <time.h>

    [for time()]

    #include <stdlib.h>
    #include <stdio.h>

    [for FILE and printf()]
    Not all compiler automatically include this.

    Btw, i am using MingW / G++

    Thanks.

    • Posted December 15, 2010 at 12:51 am | Permalink

      I’ll fix those right now. From the next article on, I’ll make it a point to attach source code! Thanks for the suggestions!

  4. Nadeeshani
    Posted January 11, 2011 at 10:46 pm | Permalink

    Hello… Your tutorial is very good. Thanks for it. I am doing an ancient coins recognition system for my final year undergraduate project. What I have done so far is,
    1.convert to grayscale
    2.Smoothing using Gaussian Blur
    3.Adaptive Thresholding
    4.cvMorphologyEx
    5.Canny edge detection
    6.Contour detection

    Can you please help me how to train coins? To train coins, I have to give feature vector right? How can I give it from the things done until now?

    • Posted January 13, 2011 at 9:48 pm | Permalink

      Um.. I don’t think it would work that way. Try having a look at template matching. That might work. Or maybe I’m missing some piece of information.

  5. Nadeeshani
    Posted January 15, 2011 at 11:22 am | Permalink

    Thx for replying me. When I go through research papers, almost all researchers has chosen k-nearest neighbour algorithm for ancient coins recognition. In my case, I can recognize coins based on King’s name minted on coins. Since ancient coins in our country was not minted using machines, but my human, even two coins from the same type may vary a little. For an example, just think that the character ‘A’ is in one type of an ancient coin. But if I take another coin from the same type, that character ‘A’ might be little vary. (like hand written characters are varied from person to person, so ‘A’ will write in different ways by different people). So, will I be able to use template matching?. To use template matching, I should have each character in different ways right? Problem is that those characters are not in english letters. Those are in some old character type. Even it’s very hard to find out separate characters. Can you please help me? I am new to image processing and AI. I was reading a lot of tutorials but those ware really hard to understand. But, your tutorials are easy to understand. I understood may things by reading your tutorials. Thanks a lot. :)

    • Posted January 19, 2011 at 9:47 pm | Permalink

      You could use photoshop to “extract” individual characters. I don’t think there could be any other way for using template matching.

      Or you could use the entire ruler’s name as a template “Ashoka” for example would be one template.

      • Nadeeshani
        Posted January 26, 2011 at 12:15 am | Permalink

        Thanks a lot for your help.

  6. Nadeeshani
    Posted January 26, 2011 at 12:14 am | Permalink

    Hi. You have used binary file C:\\train-images.idx3-ubyte . How can I create this file? Can you please explain about this file and how to generate this file? Thanks in advance.

  7. Richárd Szabó
    Posted April 13, 2011 at 7:27 pm | Permalink

    One small remark to your excellent tutorial: your are still using some parts of the C interface of openCV while you mention that you use the C++ one. For example you could create a matrix like this:

    Mat trainingVectors(numImages, size, CV_32FC1);

    instead of this:

    CvMat *trainingVectors = cvCreateMat(numImages, size, CV_32FC1);

    Anyway keep up this good work.

    • Posted April 29, 2011 at 9:25 pm | Permalink

      Yes. That was intentional. For some reason, a few functions don’t (didn’t?) have an equivalent C++ interface. So you’re forced to use the C interface.

  8. Richárd Szabó
    Posted April 13, 2011 at 7:51 pm | Permalink

    Could you compile your code with KNearest? Is it a valid reference in an OpenCV version?
    Because you still use KNearest instead of CvKNearest (YPWong already pointed that).

    It would be a good idea to make your final source code available.

  9. Posted May 24, 2011 at 3:15 pm | Permalink

    Can i use the code
    i mean is it free open source??

    • Posted June 3, 2011 at 12:11 am | Permalink

      Yes, you can do whatever you want with the code!

  10. Ursache
    Posted June 11, 2011 at 7:07 pm | Permalink

    Hello, I would like to thank you a lot about this excellent tutorial, it’s a great work and I appreciate your effort to help us. Thank you again.
    I’m just a beginner in this field, but I learn every day new things…because some people as you…

    I would also like to ask if I can use other images with the same form to recognize them … Firstly, where can I find an other database or just some samples to test them. And how can I change this source code, using OpenCV, to read, recognize and classify images. I mean, I want to input an image, recognize it and classify it, in this way, the user can see the results.

    Thank you a lot. It’s very important for me to get an answer.

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>