How to Write Faster Python

This is a high-level guide on how to approach speeding up slow apps. I have been writing a computer vision app that runs on the backend of a server, and I came to the realization that it is easy to write slow Python, and that Python itself is not fast. If you're iterating over a 1000 × 1000 pixel (1 megapixel) image, whatever you do inside the loop will run 1 million times. The first version of my code, an iterative algorithm, took over 2 minutes to run. I was able to get that down to under 10 seconds, and it only took a few hours' work with minimal code changes. The hardest part was converting some snippets of code into C, which I plan to detail in a future post. The basic approach I followed was:

  1. Set a baseline with working code & test
  2. Find the bottlenecks
  3. Find a way to optimize bottlenecks
  4. Repeat

And fair warning: these sorts of optimizations only pay off if you have selected a good algorithm. As we used to say at work, "You can't polish a turd."

Slow, but working code

I am going to start with an example program. It resizes an image using the median of each region: if you resize to 1/4 of the original width and height, each pixel of the new image is the median of a 4×4 block of input pixels. It is not too important that you understand the details, but I have included the code for reference:

import cv2
import numpy as np
import math
from datetime import datetime

def _median(values):
    """
    Return the median of a list of values.
    """

    sorted_values = sorted(values)
    if len(sorted_values) % 2 == 0:
        #have to take avg of middle two
        i = len(sorted_values) / 2
        #int() to convert from uint8 to avoid overflow
        return (int(sorted_values[i-1]) + int(sorted_values[i])) / 2.0
    else:
        #find the middle (remembering that lists start at 0)
        i = len(sorted_values) / 2
        return sorted_values[i]

def resize_median(img, block_size):
    """
    Take the original image, break it down into blocks of size block_size
    and get the median of each block.
    """

    #figure out how many blocks we'll have in the output
    height, width, depth = img.shape
    num_w = int(math.ceil(float(width)/block_size))
    num_h = int(math.ceil(float(height)/block_size))

    #create the output image
    out_img = np.zeros((num_h, num_w, 3), dtype=np.uint8)

    #iterate over the img and get medians
    row_min = 0
    #TODO: optimize this, maybe precalculate height & widths
    while row_min < height:
        r_max = row_min + block_size
        if r_max > height:
            r_max = height

        col_min = 0
        while col_min < width:
            c_max = col_min + block_size
            if c_max > width:
                c_max = width

            #block info:
            block_b = []
            block_g = []
            block_r = []
            for r_i in xrange(row_min, r_max):
                for c_i in xrange(col_min, c_max):
                    block_b.append(img[r_i, c_i, 0])
                    block_g.append(img[r_i, c_i, 1])
                    block_r.append(img[r_i, c_i, 2])

            #have the block info, compute the median of each channel
            median_colors = [
                int(_median(block_b)),
                int(_median(block_g)),
                int(_median(block_r))
            ]

            out_img[int(row_min/block_size), int(col_min/block_size)] = median_colors
            col_min += block_size
        row_min += block_size
    return out_img

def run():
    img = cv2.imread('Brandberg_Massif_Landsat.jpg')
    block_size = 16
    start_time = datetime.now()
    resized = resize_median(img, block_size)
    print "Time: {0}".format(datetime.now() - start_time)
    cv2.imwrite('proc.png', resized)

if __name__ == '__main__':
    run()

This takes around 3 seconds to run against a random 1000×1000 image from Wikipedia:

500 loops, best of 3: 3.09 sec per loop
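
Timings in this format come from Python's built-in timeit module. Here is a minimal sketch of how to reproduce one, assuming run() is importable from myscript (note this times the file I/O as well as the resize, and that timeit picks the loop count automatically unless you pass -n):

python -m timeit -r 3 -s "from myscript import run" "run()"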

So this is a good starting point – it works! I am also saving the output image so that, when I make changes, I can spot-check that I didn't break anything.
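
A spot-check like this can also be automated. Here is a minimal sketch, assuming a known-good output has already been saved as reference.png (a hypothetical filename):

import cv2
import numpy as np

def matches_reference(img, ref_path='reference.png'):
    #compare the new output to a known-good image, pixel for pixel
    ref = cv2.imread(ref_path)
    return ref is not None and np.array_equal(img, ref)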

Profiling

Python's standard library has a profiler that works very well (see The Python Profilers). I'm not a fan of its text output (I feel it does not intuitively roll up results), but you can dump the stats to a file and view them in a GUI.

So I can profile the script:

python -m cProfile -o stats.profile myscript.py

Note that the profiler can sometimes add a lot of overhead to the run itself.
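
If you want a quick look without leaving the terminal, the standard library's pstats module can read the same dump. A short sketch:

import pstats

#load the cProfile output and print the 10 most expensive calls
stats = pstats.Stats('stats.profile')
stats.sort_stats('cumulative').print_stats(10)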

And after installing runsnakerun I can view the stats:

runsnake stats.profile

which shows me: [screenshot of the RunSnakeRun GUI]

A few things jump out as easy fixes. First, the app spends around a third of its time appending colors. Second, finding the median also takes a significant amount of time, most of it in the sorted call (this is not visible in the image above). The remaining lines of code are not significant enough to display.

Optimizing

I have a few quick rules of thumb for optimizing:

  • creating/appending to lists is slow due to memory allocation: prefer list comprehensions (see the sketch after this list)
  • avoid operations that create copies of objects unless they are required: these are also slow due to memory allocation
  • dereferencing is slow: if you read mylist[i] several times, assign myvar = mylist[i] once up front
  • use libraries as much as possible: many are written in C/C++ and are fast
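
To make the first and third rules concrete, here is a contrived sketch (not code from the app):

#placeholder data: 1000 rows of 1000 numbers
rows = [range(1000) for _ in xrange(1000)]

#slow: repeated dereferencing (rows[i][j]) and appending inside the loop
totals = []
for i in xrange(len(rows)):
    total = 0
    for j in xrange(len(rows[i])):
        total += rows[i][j]
    totals.append(total)

#faster: no manual indexing, and the result list is built in one expression
totals = [sum(row) for row in rows]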

Beyond that, make use of a search engine like Google or DuckDuckGo. You can tweak your Python code (you might discover you're doing something wrong!), use Cython, write C libraries, or find another solution.

So profiling tells me that appending is slowing down my code. I can get around this by declaring each list once, up front, and keeping track of how many elements I actually "add". I also know that sorted() creates a copy of the original list, which I can avoid by sorting the list in place with list.sort(). I make these changes, run the code, and the output still looks right, so I probably didn't break anything. Let's time it:

500 loops, best of 3: 2.51 sec per loop

That's almost 20% faster (3.09 s down to 2.51 s, a 19% reduction)! Not bad for a few minutes of effort.

For completeness, here is the modified code:

import cv2
import numpy as np
import math
from datetime import datetime

def _median(values, num_elems):
    """
    Return the median of the first num_elems entries of values.
    """

    #edge blocks fill fewer slots than the full list holds,
    #so slice off the stale entries first
    if num_elems != len(values):
        values = values[0:num_elems]
    #sort in place rather than calling sorted(), which copies the list
    values.sort()
    if len(values) % 2 == 0:
        #have to take avg of middle two
        i = len(values) / 2
        #int() to convert from uint8 to avoid overflow
        return (int(values[i-1]) + int(values[i])) / 2.0
    else:
        #find the middle (remembering that lists start at 0)
        i = len(values) / 2
        return values[i]

def resize_median(img, block_size):
    """
    Take the original image, break it down into blocks of size block_size
    and get the median of each block.
    """

    #figure out how many blocks we'll have in the output
    height, width, depth = img.shape
    num_w = int(math.ceil(float(width)/block_size))
    num_h = int(math.ceil(float(height)/block_size))

    #create the output image
    out_img = np.zeros((num_h, num_w, 3), dtype=np.uint8)

    #iterate over the img and get medians
    row_min = 0
    #TODO: optimize this, maybe precalculate height & widths
    num_elems = block_size * block_size
    block_b = [0] * num_elems
    block_g = [0] * num_elems
    block_r = [0] * num_elems
    while row_min < height:
        r_max = row_min + block_size
        if r_max > height:
            r_max = height

        col_min = 0
        while col_min < width:
            c_max = col_min + block_size
            if c_max > width:
                c_max = width

            #block info:
            num_elems = (r_max-row_min) * (c_max-col_min)
            block_i = 0
            for r_i in xrange(row_min, r_max):
                for c_i in xrange(col_min, c_max):
                    block_b[block_i] = img[r_i, c_i, 0]
                    block_g[block_i] = img[r_i, c_i, 1]
                    block_r[block_i] = img[r_i, c_i, 2]
                    block_i += 1

            #have the block info, compute the median of each channel
            median_colors = [
                int(_median(block_b, num_elems)),
                int(_median(block_g, num_elems)),
                int(_median(block_r, num_elems))
            ]

            out_img[int(row_min/block_size), int(col_min/block_size)] = median_colors
            col_min += block_size
        row_min += block_size
    return out_img

def run():
    img = cv2.imread('Brandberg_Massif_Landsat.jpg')
    block_size = 16
    start_time = datetime.now()
    resized = resize_median(img, block_size)
    print "Time: {0}".format(datetime.now() - start_time)
    cv2.imwrite('proc.png', resized)

if __name__ == '__main__':
    run()

Repeat

Now that I have optimized the code a little, I repeat the process to make it even better. I try to decide what should be slow and what should not. Sorting, for example, is still not fast here, but that makes sense: it is the most complex part of this code, while the rest is just iterating and keeping track of observed colors.
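
Since most of the remaining time is spent sorting in pure Python, one more application of the "use libraries" rule is to push the whole block-median computation into numpy. This is a sketch rather than a drop-in replacement: it assumes the image dimensions are exact multiples of block_size, and a numpy version where np.median accepts a tuple of axes:

import numpy as np

def resize_median_np(img, block_size):
    #split each image axis into (num_blocks, block_size), then take the
    #median over both block_size axes in one vectorized call
    h, w, d = img.shape
    blocks = img.reshape(h // block_size, block_size,
                         w // block_size, block_size, d)
    return np.median(blocks, axis=(1, 3)).astype(np.uint8)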

Final Thoughts

After having optimized a few projects I have a few final thoughts and lessons learned:

  • Optimizing a slow/bad algorithm is a waste of time; you'll need to redesign it anyway
  • The granularity of the profiler is sometimes a little funny. You can get around this by pulling sections of code into their own functions, since metrics on functions are usually captured.
  • The profiler groups calls to a library function together even when they come from different places, so be careful when reading metrics for libraries/utils, e.g. numpy.
  • Use search to find ways to optimize your code.

If all else fails, you can implement the slow parts in C relatively painlessly. This is the option I usually go with, and I will detail it with some code in the next few weeks.

Have fun speeding up your code!
