I have a camera with a motor attached, which I can rotate over a serial cable. I figured it would be fun to have it analyze the webcam shots and turn toward wherever there was motion. I pulled out Python and pygame and created a prototype. Unfortunately, I can't make Python go very fast. I made two test cases, one in C and one in Python, to figure out whether a rewrite would be worthwhile:
array-speed-test.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
void compare_arrays(int **screen1, int **screen2)
{
    int x;
    int y;
    int diff;
    int mult;
    for(x = 0; x < 1920; x++) {
        for(y = 0; y < 1080; y++)
        {
            mult = screen1[x][y] * screen2[x][y];
            diff = abs(screen1[x][y] - screen2[x][y]);
        }
    }
}
int main(int argc, char **argv)
{
    srand(time(NULL));
    printf("Generating arrays...\n");
    int **screen1;
    int **screen2;
    int i = 0;
    int j = 0;
    struct timeval now;
    struct timeval end;
    int usecs_passed;
    screen1 = malloc(1920 * sizeof(int *));
    screen2 = malloc(1920 * sizeof(int *));
    for(i = 0; i < 1920; i++)
    {
        screen1[i] = malloc(1080 * sizeof(int));
        screen2[i] = malloc(1080 * sizeof(int));
    }
    for(i = 0; i < 1920; i++)
    {
        for(j = 0; j < 1080; j++)
        {
            screen1[i][j] = rand() % 255;
            screen2[i][j] = rand() % 255;
        }
    }
    printf("Comparing arrays...\n");
    gettimeofday(&now, NULL);
    compare_arrays(screen1, screen2);
    gettimeofday(&end, NULL);
    /* Count whole seconds as well; tv_usec alone wraps at every second boundary. */
    usecs_passed = (int)((end.tv_sec - now.tv_sec) * 1000000 + (end.tv_usec - now.tv_usec));
    printf("Time passed: %dms\n", (usecs_passed / 1000));
    for(i = 0; i < 1920; i++)
    {
        free(screen1[i]);
        free(screen2[i]);
    }
    return 0;
}
And my python code:
array-speed-test.py
#!/usr/bin/python
import random
import time
from math import fabs
def generateArray():
    array_to_gen = [None] * 1920
    for i in range(0, 1920):
        array_to_gen[i] = [None] * 1080
    for x in range(0,1920):
        for y in range(0, 1080):
            array_to_gen[x][y] = random.randrange(0,255)
    return array_to_gen
def compareArrays(screen1, screen2):
    for x in range(0, 1920):
        for y in range(0, 1080):
            diff = fabs(screen1[x][y] - screen2[x][y])
            combo = screen1[x][y] * screen2[x][y]
if __name__ == "__main__":
    print "Generating arrays..."
    screen1 = generateArray()
    screen2 = generateArray()
    print "Created two screens. Comparing..."
    startTime = time.time()
    compareArrays(screen1, screen2)
    print "Time taken: " + str((time.time() - startTime) * 1000) + "ms"
So far, the C program runs in 25ms, while the Python program consistently takes 1100ms. I might have to ditch Python for real-time analysis, unless someone wants to point out how I am doing this completely wrong (I am assuming the comments will say "use NumPy"?).
#1 by redbrain on November 24, 2009 - 7:04 pm
Yeah, image processing is one of the few areas that really tests a computer to its limits. Most people forget these days that a normal x86 desktop with about 2GB of RAM is massive for most tasks, but with image manipulation you have to process pretty much everything, so you're probably best sticking with C. I have a good friend who knows a lot in this area; he did a motion detection thing in C# and it's too slow. But if you're clever you can implement some filters, like the Sobel filter (I think it is called), to make the image easier to read thanks to the reduced noise and color. http://en.wikipedia.org/wiki/Sobel_operator
But this isn't an area I am comfortable in, I am a compiler hacker lol. But yeah, you have a memory leak in your C code: you didn't free the 'screen1' and 'screen2' arrays lol. The language I have been working on is a fair bit faster than Python atm, but it isn't a fair comparison since my language is still in its very early days.
#2 by Wouter on November 24, 2009 - 7:23 pm
I wonder if using Psyco would get the python results close to the C results…?
#3 by sharms on November 24, 2009 - 7:42 pm
Unfortunately it appears Psyco doesn't work on the x86_64 architecture, which is all I use.
#4 by Leandro on November 24, 2009 - 8:38 pm
Have you tried using dicts instead of lists? Using (x, y) tuples as indices, things should be faster — lists in python are, AFAIK, linked lists, and not really suited for direct addressing.
#5 by Leandro on November 24, 2009 - 8:50 pm
Tried here; using dicts makes things about three times slower than using lists. Oops.
#6 by John Doe on November 24, 2009 - 8:52 pm
You should try using xrange() instead of range(). Also you may try Python arrays.
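The "Python arrays" suggestion could look something like the sketch below. It is written for Python 3, where `range` is already lazy like Python 2's `xrange`. The flat row-by-row layout, the `'B'` (unsigned byte) typecode and the function names are assumptions made for illustration, and whether this beats nested lists would need to be measured rather than assumed.

```python
from array import array
import random

WID, HEI = 1920, 1080  # frame size used in the post


def generate_array(width=WID, height=HEI):
    # One flat, typed buffer of unsigned bytes instead of a list of lists.
    return array('B', (random.randrange(0, 256) for _ in range(width * height)))


def compare_arrays(screen1, screen2):
    # Walk both buffers in lockstep and accumulate the absolute differences.
    total = 0
    for p1, p2 in zip(screen1, screen2):
        total += abs(p1 - p2)
    return total


# Small demo frames so the example runs instantly.
small1 = generate_array(4, 3)
small2 = generate_array(4, 3)
print(len(small1), small1.typecode, compare_arrays(small1, small2))
```

A pixel at column `x`, row `y` would live at index `y * width + x` in this layout.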
#7 by Eddward on November 24, 2009 - 9:06 pm
I don’t use python much, but I’m guessing it’s a bit like perl. How big does diff and combo get? Are they turning into bigints in python and just overflowing in C?
It looks like it’s all integer math in C. Python is probably using floating point unless you do something to make it do otherwise.
Lastly, diff & mult are just allocated on the stack without much in the way of initialisation. You probably have a lot more overhead in creating diff & combo in python. Could you create them globally, or create and pre-size a result array in the python case?
These are all guesses. I only have a few minutes right now and can't try it and, like I say, I'm deducing based on perl knowledge which could be all wrong.
Edd
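The type questions above can be checked directly. A minimal sketch, written for Python 3 rather than the Python 2 used in the post, with arbitrary sample values: `math.fabs` always returns a float, the built-in `abs` keeps an integer an integer, and Python integers grow as needed instead of overflowing. With pixel values below 255 the largest product is 254 * 254 = 64516, which also fits comfortably in a C `int`.

```python
from math import fabs

a, b = 12, 200  # two arbitrary pixel values in the 0-255 range

float_diff = fabs(a - b)  # math.fabs converts its argument to float
int_diff = abs(a - b)     # built-in abs keeps the int type
product = a * b           # small enough for a C int as well

# Python ints are arbitrary precision, so a large product does not wrap around.
big = (2 ** 31) * (2 ** 31)

print(type(float_diff).__name__, type(int_diff).__name__, product, big)
```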
#8 by John Doe on November 24, 2009 - 9:31 pm
I’ve tried using abs() instead of fabs() and I’m getting a 400ms speedup (~1500ms vs ~1900ms).
#9 by John Doe on November 24, 2009 - 9:34 pm
I’ve shaved 400ms more by doing this:
def compareArrays(screen1, screen2):
    for x in xrange(0, 1920):
        s1x = screen1[x]
        s2x = screen2[x]
        for y in xrange(0, 1080):
            diff = abs(s1x[y] - s2x[y])
            combo = s1x[y] * s2x[y]
#10 by Ulrik on November 24, 2009 - 9:34 pm
Hi. Numpy rocks.
You will see.
The reason is not only that it is blazingly fast. But look at the code. Do you like a good abstraction or do you like a good abstraction? The nice thing is how numpy totally removes the loops. Here it goes from 4.7 sec to 0.26 sec, not counting the awfully long array generation in the first case.
Full code here:
http://codepad.org/bwOhnS5D
Important part here:
import time
import numpy
WID = 1920
HEI = 1080
def generateArray():
    return numpy.random.random_integers(0, 255, (WID,HEI))
def compareArrays(screen1, screen2):
    diff_matrix = numpy.abs(screen1 - screen2)
    combo_matrix = screen1 * screen2
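For anyone running this today, here is a self-contained sketch of the same idea for Python 3 and a recent NumPy. It swaps in `numpy.random.default_rng(...).integers` because `random_integers` was deprecated in later NumPy releases. The seed, the timing code and the returned tuple are assumptions added for illustration and are not part of the original snippet.

```python
import time

import numpy as np

WID, HEI = 1920, 1080  # same dimensions as the post


def generate_array(rng):
    # Integers in [0, 255]; the upper bound of Generator.integers is exclusive.
    return rng.integers(0, 256, size=(WID, HEI))


def compare_arrays(screen1, screen2):
    # Both operations run element-wise over the whole frame, with no Python loop.
    diff_matrix = np.abs(screen1 - screen2)
    combo_matrix = screen1 * screen2
    return diff_matrix, combo_matrix


rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
screen1 = generate_array(rng)
screen2 = generate_array(rng)

start = time.perf_counter()
diff_matrix, combo_matrix = compare_arrays(screen1, screen2)
elapsed_ms = (time.perf_counter() - start) * 1000
print("Time taken: %.1fms" % elapsed_ms)
```

The arrays default to a wide integer dtype here, so the subtraction cannot wrap around. With `uint8` frames from a real camera, the values would need to be cast to a signed type before subtracting.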
#11 by Ulrik on November 24, 2009 - 9:38 pm
Numpy — written in C so that you don’t have to.
#12 by John Doe on November 24, 2009 - 9:53 pm
100ms more!
def compareArrays(screen1, screen2):
    for x in xrange(0, 1920):
        s2x = screen2[x]
        for y, e in enumerate(screen1[x]):
            diff = abs(e - s2x[y])
            combo = e * s2x[y]
#13 by Ulrik on November 24, 2009 - 9:57 pm
John Doe — if you want to speed up pure-python loops, here are some things you can try: Loop by value, not by index.
for x, y in itertools.izip(screen1, screen2): diff = _abs(x - y); combo = x * y
tip nr 2:
for function calls inside loops (avoid these, but we have abs above), rebind the function to a local variable for the loop. So just before the for loop we define _abs = abs.
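Both tips can be combined in one sketch. It is written for Python 3, where the built-in `zip` is already lazy like `itertools.izip`. Since the screens are lists of rows, the rows are zipped first and then the pixels within each pair of rows. The running total and the return value are assumptions added so the function produces something checkable.

```python
def compare_arrays(screen1, screen2):
    _abs = abs  # bind the builtin to a local name once, outside the loops
    total_diff = 0
    # Tip 1: loop by value. Pair up rows, then pair up pixels within each row.
    for row1, row2 in zip(screen1, screen2):
        for p1, p2 in zip(row1, row2):
            diff = _abs(p1 - p2)  # Tip 2: call through the local name
            combo = p1 * p2
            total_diff += diff
    return total_diff


# Tiny 2x2 "screens" to show the call shape.
a = [[1, 2], [3, 4]]
b = [[4, 2], [0, 10]]
print(compare_arrays(a, b))
```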
#14 by John Doe on November 24, 2009 - 10:35 pm
Ulrik: I’ve tried with itertools.izip (and normal zip) before posting the code and it was a lot slower. I forgot about the _abs trick but it worked well (-100ms).
def compareArrays(screen1, screen2):
    _abs = abs
    for x in xrange(0, 1920):
        s2x = screen2[x]
        for y, e in enumerate(screen1[x]):
            diff = _abs(e - s2x[y])
            combo = e * s2x[y]
#15 by sharms on November 25, 2009 - 12:19 am
Ulrik – that is fantastic, I will stick with numpy. Thanks for saving me a lot of time and ignorance
#16 by oliver on November 25, 2009 - 5:26 am
Would also recommend numpy, or opencv for more complicated image analysis.
#17 by Ronan on November 25, 2009 - 5:41 am
Do you realise that your compare_arrays() C function will be optimized into a no-op by gcc if you use -O1 or above?
#18 by Holly on November 25, 2009 - 5:53 am
Maybe I’m missing something, but why have you got
screen1[x][y] * screen2[x][y] in the loops. The result doesn’t appear to be used for anything.
#19 by Spider on November 25, 2009 - 6:29 am
@Ronan: Most probably yes. But my guess would be that in “reality” it’d be used to create a difference average/difference-map between the two. However, since this testcase just doesn’t actually do anything with the counts, it would end up being void.
#20 by sharms on November 25, 2009 - 10:28 am
Spider is correct, these are just tests, in real life I use the output of a bunch of these types of calculations.
I compiled it with 'gcc -g -Wall', which appears to use -O0: when I recompiled with -O1, my time passed was 0ms.
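A benchmark whose results are never used invites exactly this kind of elimination, so a more representative test would consume the output. The sketch below is one guess at what "using the output" might look like: a mean absolute difference compared against a threshold. The name `motion_score`, the `THRESHOLD` value and the sample frames are all hypothetical and not taken from the real project.

```python
def motion_score(screen1, screen2):
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row1, row2 in zip(screen1, screen2):
        for p1, p2 in zip(row1, row2):
            total += abs(p1 - p2)
            count += 1
    return total / count


THRESHOLD = 10.0  # hypothetical cut-off for "something moved"

frame_a = [[10, 10], [10, 10]]
frame_b = [[10, 60], [10, 10]]  # one pixel changed by 50

score = motion_score(frame_a, frame_b)
print(score, score > THRESHOLD)  # 12.5 True
```

Because the function returns a value that the caller inspects, the work inside the loops can no longer be discarded as dead code.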