Engineyard Competition
Recently I saw on some news site a competition to calculate SHA1 collisions. Naturally, I tried it out, but used Python, my preferred language. Unfortunately, my Python implementation was significantly slower than others using C (which was smoot since both C and Python implementations were completely dwarfed by CUDA implementations). I entered the competition after it was already in progress, and managed to get a pretty good score, but that was with an insane amount of computers crunching it.

So that leads me back to the topic of this post. Convinced that I had more hardware than anyone else in the competition, and that even though I started with less than 12 hours to beat everyone, I was pretty sure I should have won (before I saw the CUDA performance). So I started rewriting my hastily written Python program in C. The competition was already over, so this was purely for self improvement.

A Little Rusty
My first programming language was C, and I was an expert in it when I was 13 or so (probably 1992). Since then, my attention was driven towards Python and PHP, as the defacto languages of the web. This made me realize exactly how much I had forgotten about plain old C (my C++ was not horrible, atleast I could still understand C++ code written by decent programmers). Anyhow, part of the competition included taking numbers and comparing how much each of the 160 bits varied (ie 1011 and 1001 differ by 1 bit, 1100 and 1111 differ by 2 bits etc). So I wanted to make a C function that converts any arbitrary integer to a binary string (which really wasn’t the goal of the exercise, as the hashes were in hex, but it made me try and remember how to convert an integer as it had been so long).

Conclusion to a Very Boring Story
So I ended up writing an int2binary function and a program that converts them and prints them out. It will break with very large number, but I included a function to estimate it’s size (always going to the next number divisible by 4 as I like that padding), so linking that to a malloc would be trivial. I don’t suggest this is run anywhere as it hasn’t been really tested or code reviewed, but I like my answer better than most of the answers I found on the net, so I will share it with you, and hope to get a higher google score than the ones I don’t like:

int2bin.c

#include <stdio.h>

void int2binary(unsigned int n, char *buffer, unsigned int buffer_size);
unsigned int buffer_sizer(unsigned int n);

int main(int argc, char **argv)
{
        char buffer[2048];
        unsigned int binaryNumbers[9] = { 1, 4, 6, 8, 10, 55, 123, 2000, 2048 };
        unsigned int i = 0;

        for(i = 0; i < 9; i++)
        {
                int2binary(binaryNumbers[i], buffer, (buffer_sizer(binaryNumbers[i]) + 1));
                printf("%4u: %s\n", binaryNumbers[i], buffer);
        }

        return 0;
}

void int2binary(unsigned int n, char *buffer, unsigned int buffer_size)
{
        unsigned int i = (buffer_size - 1);
        buffer[i] = '\0';

        while(i > 0)
        {
                if(n & 0x01)
                {
                        buffer[--i] = '1';
                } else
                {
                        buffer[--i] = '0';
                }

                n >>= 1;
        }
}

unsigned int buffer_sizer(unsigned int n)
{
        unsigned int x = 1;
        unsigned int i = 0;
        unsigned int lastFound = 0;

        // Takes care of n = 0, and we dont need to loop for 1st case
        if(n < 16)
        {
                return 4;
        }

        while(1)
        {
                i++;
                if(n / x)
                {
                        lastFound = i;
                } else
                {
                        break;
                }

                x <<= 1;
        }

        // Not a multiple of 4, how much do we need to add to it
        if(lastFound % 4)
        {
                lastFound += (4 - (lastFound % 4));
        }

        return lastFound;
}

As you can see I did a pretty poor job commenting it and naming variables, basically though in int2binary we need to start from the right hand side and work our way left, so we start variable i at the end of the string and work our way back. We use & to compare variable n and 0×01 (which means that if the rightmost position has a 1, we add a 1 to the buffer, else add a 0 to the buffer).

For buffer sizer, this just tried to take a guess at how long that string should be. If the integer is less than 16 it will always be 4, so no need to do anything else (I was in recursive mode for a different algorithm so I was cutting things off at beginning mode). Otherwise we loop through, and try to divide variable n by variable x. Each time it is ran, variable x is shifted to the left, which has the effect of squaring the number. This is also super fast on the cpu.

For people not in C development mode, didn’t make a make file, but to compile it:

gcc int2bin.c

Output should be:

[sharms@sparrow ~]$ ./a.out
   1: 0001
   4: 0100
   6: 0110
   8: 1000
  10: 1010
  55: 00110111
 123: 01111011
2000: 011111010000
2048: 100000000000

Related posts:

  1. Why?
  2. Creating graphs in python
  3. Programming using IOCTL to interface with Linux kernel drivers
  4. Annoying people with code: A gentle introduction to C# and Mono Part 2 – Data Types
  5. Adventures in 3D Dual Head for Dell D620 with intel