Equality Operator

One thing that has always bothered me, and many others, is how useless the == operator is in JavaScript. It seems to combine the worst of both worlds, equality and identity:

  • When comparing primitive types, it acts as a lax equality operator
  • When comparing objects, it acts as an identity operator

I can’t think of any use case where such behavior is desired. In fact, reversing this behavior would produce a much more useful operator, since the concept of “equals” tends to relax as data gets more complex: some dimensions are simply irrelevant. Do I care that two objects were not created at the exact same time? Do I care that their IDs differ? Do I care if the measured length is 4.00000000001 inches rather than 4? When it comes to equality comparison, JavaScript fails miserably. This is one of Douglas Crockford’s pet peeves about the language, which he explains in more detail in his book, JavaScript: The Good Parts.

Fortunately, there is also a more consistent identity operator (===), often erroneously called “strict equality” by people who don’t understand the difference between identity and equality. Unfortunately, as I just mentioned, there is no equality operator in the language at all. For those unfamiliar with these concepts, identity answers the question “Do A and B refer to the same object?” while equality answers “Is B a clone of A?”

Naturally, both of these questions come up in programming a lot, so it’s a shame that only one can be easily answered in JavaScript. You have probably already stumbled into this problem if you ever tried something like this:

new Date('1/1/2000') == new Date('1/1/2000')

or this:

{"foo": 1} == {"foo": 1}

The problem is not unique to JavaScript; many other languages lack an equality operator as well. Typically, however, those languages have an alternative mechanism for handling this case, such as operator overloading. Indeed, there are cases where operator overloading is actually superior, such as when objects contain metadata that one wishes to omit from comparison (e.g. exact creation time, unique ID, etc.). Unfortunately, JavaScript doesn’t allow operator overloading either, and herein lies the problem. While one can easily roll a proper equality function (and many libraries, such as underscore, already include one), you would still have to decide whether it’s worth using on a case-by-case basis.

We’re not in the FORTRAN age anymore, where developers had to tweak each operation. Today we’re spoiled: we can often get away with simply telling the compiler what to do rather than how, and enjoy optimal performance 90% of the time with negligible overhead. One such example is the sort function. When was the last time you had to wonder whether to use merge sort or insertion sort? You simply use the built-in sort and assume that unless there is something very special about your data, you’re better off moving on to other things rather than trying to squeeze the last 10% out of the algorithm. In most modern languages the equality operator falls into the same category. Sure, you could shave off a few microseconds by replacing == (equality) with is (identity) for certain comparisons in Python, but is it worth the extra brain cycles? Most of the time the answer is “No”.
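Python draws exactly this line between the two operators, which makes it a handy illustration of the distinction; a quick sketch:

```python
a = [1, {"foo": 2}]
b = [1, {"foo": 2}]  # a structurally identical clone of a

print(a == b)  # True: == is equality, comparing contents (even nested ones)
print(a is b)  # False: `is` is identity, and these are two separate objects
print(a is a)  # True: same object
```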

Why then do we have to be explicitly aware of this in JavaScript? Moreover, why can’t RapydScript compile == into deep equality? First, I should mention that, like some other libraries/languages for JavaScript, RapydScript already has deep equality comparison via the built-in eq function. Yes, eq(a, b) (previously called deep_eq) has been supported for years. The problem is that, until recently, I wasn’t able to make the decision for you of whether you want eq or ==. The issue boils down to this: if I compile == to equality across the board, I introduce enormous overhead (about 700% according to jsPerf) in cases where the developer expected identity comparison instead (which is about 90% of the time, since primitives are far more common). There has been a lot of discussion about this, which you can follow in issues 93 and 94 on my GitHub page.

As you probably guessed from the previous paragraph, I’m now able to bridge that gap. Effective last month (that’s right, I snuck a change in without anyone noticing), RapydScript is the first JavaScript-based transcompiler to support high-performance deep equality. How performant is this operation, you may ask? Well, take a look for yourself:

[Screenshot: jsPerf benchmark comparing identity and deep-equality operations, 2015-12-05]

According to jsPerf, the overhead is negligible (that’s right, the measurement noise is greater than any visible difference, as you can see from the deep-equality version outperforming identity). So what changed? Why am I suddenly able to blow the doors off typical deep equality? Well, nothing new happened in the JavaScript world; what changed is my approach. I decided to think about the problem creatively and had a sudden eureka moment (which I’m sure other compilers will copy in the future). Unfortunately, you’ve probably already looked at the code from the jsPerf test above, ruining my surprise. But in case you haven’t, here is the pattern I came up with:

A === B || typeof A === "object" && eq(A, B)

How does it work? As you probably learned in your introductory programming class (assuming it was a legitimate C/Java class rather than an online tutorial about JavaScript), logical operators short-circuit in just about all languages. This means that if the left-hand side is truthy for the || (or) operator, or falsy for the && (and) operator, the rest of the line is ignored (it’s as if it’s not there). So if A and B are primitives that are equal, the above is equivalent to a simple A === B comparison. That handles the positive case, but the negative one is a bit trickier. I struggled with it for a while (yes, the above expression makes it seem simpler than it really is; everything seems easy in hindsight), until I found a performant operation that works in both browser and Node (my first attempt was to check whether A.constructor exists, which doesn’t work on all platforms). Fortunately, the very same “feature” that makes typeof useless in most cases (the fact that every non-primitive is an object) becomes the saving grace of this operation. As you can see from the jsPerf test, the overhead for this check is negligible as well.
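The same short-circuit behavior is easy to observe in Python, whose or/and short-circuit just like JavaScript’s || and &&. This sketch is a Python analogue of the pattern, with a hypothetical eq() stand-in that records whether it was ever invoked:

```python
# Hypothetical eq() stand-in that records whether it was ever invoked.
calls = []

def eq(a, b):
    calls.append((a, b))
    return a == b  # Python's == is already a deep equality check

PRIMITIVES = (int, float, str, bool, type(None))

def compare(a, b):
    # Python analogue of: A === B || typeof A === "object" && eq(A, B)
    return a is b or (not isinstance(a, PRIMITIVES) and eq(a, b))

x = "hello"
print(compare(x, x))    # True, and eq() was never reached
print(calls)            # []

d1, d2 = {"foo": 1}, {"foo": 1}
print(compare(d1, d2))  # True, via the deep-equality fallback
print(len(calls))       # 1
```

When the operands are identical, the expensive call on the right-hand side simply never runs, which is what keeps the common case fast.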

The best part is that unlike native JavaScript, where the developer would have to type that out by hand (because hiding it in a function call introduces the 700% overhead we’re trying to avoid), RapydScript can unroll the == operator into that magic automatically. You’re probably wondering, then, how this change can have been in RapydScript for over a month if == still compiles to ===. Well, for safety I’ve hidden this operator behind an import. If you wish all your == operators to compile to the proper equality test shown above, add the following line at the top of your file:

from danger_zone import equality

That’s it. Now all your == will compile to the magic shown above, and != will compile to its inverse (thanks to De Morgan’s laws). The above import will also tweak the implementations of indexOf and lastIndexOf to perform deep equality, which in turn makes tests like a in arr use deep equality as well (delivering a consistent experience across the board). As before, the identity operator is still there too, via is and is not, which compile to === and !==, respectively. In the future, I also want to add an optimizer to the compiler that strips away the remainder of the expression at compile time when one of the operands is a constant.

Python & C

As promised, I took a script written in Python and ported parts of it to C. The Python interface is the same, but behind the scenes is blazing-fast C code. The example is a little different from what I usually do, but I discovered a few things I had never seen before.

I am working off the same example as the previous post, How to Write Faster Python, which has the full original code. The example takes an input image and resizes it down by a factor of N: each N×N block is reduced to a single pixel, the median RGB of the original block.
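As a quick sanity check on the output dimensions (assuming the roughly 1000×1000 test image and the block size of 16 used in these posts):

```python
import math

width = height = 1000
block_size = 16

# ceil so a partial block on the edge still produces an output pixel
num_w = int(math.ceil(float(width) / block_size))
num_h = int(math.ceil(float(height) / block_size))
print(num_w, num_h)  # 63 63 (1000 / 16 = 62.5, rounded up)
```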

Small Port

This example is unusual for me because it does one task, resizing, and the core of that task is a very small bit of logic: the median function. I first ported the median code to C and checked the performance. The C code has an edge here because C’s qsort lets you specify how many elements to sort, whereas in Python you sort the whole list (as far as I know), so in C I was able to avoid the checks on the size of the list versus the number of elements I want to sort.

Below is the code, but I have a few changes I want to mention as well.

myscript2.py:

import cv2
import numpy as np
import math
from datetime import datetime
import ctypes

fast_tools = ctypes.cdll.LoadLibrary('./myscript_tools.so')
fast_tools.median.argtypes = (ctypes.c_void_p, ctypes.c_int)


def _median(list, ne):
    """
    Return the median.
    """

    ret = fast_tools.median(list.ctypes.data, ne)
    return ret


def resize_median(img, block_size):
    """
    Take the original image, break it down into blocks of size block_size
    and get the median of each block.
    """

    #figure out how many blocks we'll have in the output
    height, width, depth = img.shape
    num_w = math.ceil(float(width)/block_size)
    num_h = math.ceil(float(height)/block_size)

    #create the output image
    out_img = np.zeros((num_h, num_w, 3), dtype=np.uint8)

    #iterate over the img and get medians
    row_min = 0
    num_elems = block_size * block_size
    block_b = np.zeros(num_elems, dtype=np.uint8)
    block_g = np.zeros(num_elems, dtype=np.uint8)
    block_r = np.zeros(num_elems, dtype=np.uint8)
    while row_min < height:
        r_max = row_min + block_size
        if r_max > height:
            r_max = height

        col_min = 0
        new_row = []
        while col_min < width:
            c_max = col_min + block_size
            if c_max > width:
                c_max = width

            #block info:
            num_elems = (r_max-row_min) * (c_max-col_min)
            block_i = 0
            for r_i in xrange(row_min, r_max):
                for c_i in xrange(col_min, c_max):
                    block_b[block_i] = img[r_i, c_i, 0]
                    block_g[block_i] = img[r_i, c_i, 1]
                    block_r[block_i] = img[r_i, c_i, 2]
                    block_i += 1

            #have the block info, sort by L
            median_colors = [
                int(_median(block_b, num_elems)),
                int(_median(block_g, num_elems)),
                int(_median(block_r, num_elems))
            ]

            out_img[int(row_min/block_size), int(col_min/block_size)] = median_colors
            col_min += block_size
        row_min += block_size
    return out_img

def run():
    img = cv2.imread('Brandberg_Massif_Landsat.jpg')
    block_size = 16
    start_time = datetime.now()
    resized = resize_median(img, block_size)
    print "Time: {0}".format(datetime.now() - start_time)
    cv2.imwrite('proc.png', resized)

if __name__ == '__main__':
    run()

myscript_tools.c

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h> // for malloc
#include <math.h>

int cmpfunc (const void * a, const void * b)
{
   return ( *(uint8_t*)a - *(uint8_t*)b );
}

int _median(uint8_t * list, int ne){
    //sort inplace
    qsort(list, ne, sizeof(uint8_t), cmpfunc);
    int i;
    if (ne % 2 == 0){
        i = (int)(ne / 2);
        return ((int)(list[i-1]) + (int)(list[i])) / 2;
    } else {
        i = (int)(ne / 2);
        return list[i];
    }
}

int median(const void * outdatav, int ne){
    int med;
    uint8_t * outdata = (uint8_t *) outdatav;
    med = _median(outdata, ne);
    return med;
}

and to compile:

gcc -fPIC -shared -o myscript_tools.so myscript_tools.c

Note: I discovered the .so filename should not match any Python script you plan to import. Otherwise you will get “ImportError: dynamic module does not define init function” because Python tries to import the C library instead of your Python module.

The main change I made was to port the median function to C. I also switched from using a list of colors to using a numpy array of colors, both because it’s easier to pass numpy arrays to C and because that is more useful in general. Also, so I wouldn’t have to convert the list to a numpy array on every median call, I created the numpy arrays in the resize_median code itself. Before this last change, my benchmark showed I was actually running slower: the faster C code was not making up for all the list → numpy array conversions!

After the second fix I timed it

python -m timeit "import myscript2; myscript2.run()"

and got

500 loops, best of 3: 2.04 sec per loop

Nice! That’s compared to 2.51 seconds in my last post, so it’s around 18% faster.

All C (sort of)

I wasn’t happy with the 18% boost; I thought it would be faster! But then, only a very small portion had been ported. I decided to port more, and since there wasn’t a good natural place to split the resize function up any further, I ported the whole thing. This way I can also show how to send a whole image to C and receive one back, which I do by reference/pointer.

myscript2.py

import cv2
import numpy as np
import math
from datetime import datetime
import ctypes

fast_tools = ctypes.cdll.LoadLibrary('./myscript_tools.so')

def c_resize_median(img, block_size):
    height, width, depth = img.shape

    num_w = math.ceil(float(width)/block_size)
    num_h = math.ceil(float(height)/block_size)

    #create a numpy array where the C will write the output
    out_img = np.zeros((num_h, num_w, 3), dtype=np.uint8)

    fast_tools.resize_median(ctypes.c_void_p(img.ctypes.data),
                             ctypes.c_int(height), ctypes.c_int(width),
                             ctypes.c_int(block_size),
                             ctypes.c_void_p(out_img.ctypes.data))
    return out_img


def run():
    img = cv2.imread('Brandberg_Massif_Landsat.jpg')
    block_size = 16
    start_time = datetime.now()
    resized = c_resize_median(img, block_size)
    print "Time: {0}".format(datetime.now() - start_time)
    cv2.imwrite('proc.png', resized)

if __name__ == '__main__':
    run()

myscript_tools.c

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h> // for malloc
#include <math.h>

int cmpfunc (const void * a, const void * b)
{
   return ( *(uint8_t*)a - *(uint8_t*)b );
}

int _median(uint8_t * list, int ne){
    //sort inplace
    qsort(list, ne, sizeof(uint8_t), cmpfunc);
    int i;
    if (ne % 2 == 0){
        i = (int)(ne / 2);
        return ((int)(list[i-1]) + (int)(list[i])) / 2;
    } else {
        i = (int)(ne / 2);
        return list[i];
    }
}

void resize_median(const void * bgr_imgv, const int height, const int width,
                   const int block_size, void * outdatav) {
    const uint8_t * bgr_img = (uint8_t  *) bgr_imgv;
    uint8_t * outdata = (uint8_t *) outdatav;

    int col, row, c_max, r_max, c_si, r_si;
    int max_val;
    const int num_elems = block_size * block_size;
    int j, offset;

    uint8_t *b_values;
    uint8_t *g_values;
    uint8_t *r_values;
    b_values = (uint8_t *) malloc(sizeof(uint8_t) * num_elems);
    g_values = (uint8_t *) malloc(sizeof(uint8_t) * num_elems);
    r_values = (uint8_t *) malloc(sizeof(uint8_t) * num_elems);

    int out_i = 0;

    row = 0;
    while (row < height) {
        col = 0;
        r_max = row + block_size;
        if (r_max > height) {
            r_max = height;
        }
        while (col < width) {
            c_max = col + block_size;
            if (c_max > width) {
                c_max = width;
            }

            // block info:
            j = 0;
            for (r_si = row; r_si < r_max; ++r_si) {
                for (c_si = col; c_si < c_max; ++c_si) {
                    offset = ((r_si*width)+c_si)*3;
                    b_values[j] = bgr_img[offset];
                    g_values[j] = bgr_img[offset+1];
                    r_values[j] = bgr_img[offset+2];
                    j += 1;
                }
            }

            // have the block info, get medians
            max_val = j;
            outdata[out_i]   = _median(b_values, max_val);
            outdata[out_i+1] = _median(g_values, max_val);
            outdata[out_i+2] = _median(r_values, max_val);

            // update indexes
            out_i += 3;
            col += block_size;
        }
        row += block_size;
    }

    free(b_values);
    free(g_values);
    free(r_values);
}

This one is a little more complicated. With multi-dimensional arrays, numpy stores the data flattened in row-major order, so the array C sees is [pixel00_blue, pixel00_green, pixel00_red, pixel01_blue, ...]; see the offset variable above for the row/column equation. I also pass pointers with no type, so before using the arrays I have to typecast them. I realize this example is unusual because the whole script is basically ported, but it illustrates many things that are non-trivial and took some time to figure out.
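The offset equation can be checked from Python directly; this sketch verifies that img[r, c, channel] lives at ((r * width) + c) * 3 + channel in the flat buffer that C sees:

```python
import numpy as np

height, width = 4, 5
img = np.arange(height * width * 3, dtype=np.uint8).reshape(height, width, 3)
flat = img.ravel()  # same row-major layout that img.ctypes.data points at

r, c = 2, 3
offset = ((r * width) + c) * 3
print(flat[offset] == img[r, c, 0])      # blue channel
print(flat[offset + 1] == img[r, c, 1])  # green
print(flat[offset + 2] == img[r, c, 2])  # red
```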

And for the moment of truth…

500 loops, best of 3: 440 msec per loop

82% faster!

Although it’s mostly C at this point.

Final Thoughts

So in this case, the Python version took about five times as long to run as the C version. When something takes a few milliseconds, a 5–10x difference isn’t that important, but as the workload gets bigger, it starts to matter (assuming you’re not I/O bound). Here I ported pretty much the whole module. I think that only makes sense for library-style code, not code you will read often; Python is just so much easier to read.

An idea popped into my head as I was writing this as well: this should be easy to port to RapydScript. JavaScript has the Image object I could use in place of OpenCV. I wonder where JavaScript would fall on the spectrum.

How to Write Faster Python

This is a high-level guide on how to approach speeding up slow apps. I have been writing a computer vision app on the backend of a server and came to the realization that it is easy to write slow Python, and Python itself is not fast. If you’re iterating over a 1000×1000 pixel (1 megapixel) image, whatever you’re doing inside will run 1 million times. My code ran an iterative algorithm, and the first version took over 2 minutes to run. I was able to get that time down to under 10 seconds, and it only took a few hours’ time with minimal code changes. The hardest part was converting some snippets of code into C, which I plan to detail in the future. The basic approach I followed was:

  1. Set a baseline with working code & test
  2. Find the bottlenecks
  3. Find a way to optimize bottlenecks
  4. Repeat

And fair warning: these sorts of optimizations only work if you have selected a good algorithm. As we used to say at work, “You can’t polish a turd.”

Slow, but working code

I am going to start with an example program. It resizes an image using the medians of each region: if you resize to 1/4th of the original size, each pixel of the new image will be the median of a 4×4 block from the original. It is not too important that you understand the details, but I still included the code for reference:

import cv2
import numpy as np
import math
from datetime import datetime

def _median(list):
    """
    Return the median
    """

    sorted_list = sorted(list)
    if len(sorted_list) % 2 == 0:
        #have to take avg of middle two
        i = len(sorted_list) / 2
        #int() to convert from uint8 to avoid overflow
        return (int(sorted_list[i-1]) + int(sorted_list[i])) / 2.0
    else:
        #find the middle (remembering that lists start at 0)
        i = len(sorted_list) / 2
        return sorted_list[i]

def resize_median(img, block_size):
    """
    Take the original image, break it down into blocks of size block_size
    and get the median of each block.
    """

    #figure out how many blocks we'll have in the output
    height, width, depth = img.shape
    num_w = math.ceil(float(width)/block_size)
    num_h = math.ceil(float(height)/block_size)

    #create the output image
    out_img = np.zeros((num_h, num_w, 3), dtype=np.uint8)

    #iterate over the img and get medians
    row_min = 0
    #TODO: optimize this, maybe precalculate height & widths
    while row_min < height:
        r_max = row_min + block_size
        if r_max > height:
            r_max = height

        col_min = 0
        new_row = []
        while col_min < width:
            c_max = col_min + block_size
            if c_max > width:
                c_max = width

            #block info:
            block_b = []
            block_g = []
            block_r = []
            for r_i in xrange(row_min, r_max):
                for c_i in xrange(col_min, c_max):
                    block_b.append(img[r_i, c_i, 0])
                    block_g.append(img[r_i, c_i, 1])
                    block_r.append(img[r_i, c_i, 2])

            #have the block info, sort by L
            median_colors = [
                int(_median(block_b)),
                int(_median(block_g)),
                int(_median(block_r))
            ]

            out_img[int(row_min/block_size), int(col_min/block_size)] = median_colors
            col_min += block_size
        row_min += block_size
    return out_img

def run():
    img = cv2.imread('Brandberg_Massif_Landsat.jpg')
    block_size = 16
    start_time = datetime.now()
    resized = resize_median(img, block_size)
    print "Time: {0}".format(datetime.now() - start_time)
    cv2.imwrite('proc.png', resized)

if __name__ == '__main__':
    run()

This takes around 3 seconds to run against a roughly 1000×1000 image from Wikipedia:

500 loops, best of 3: 3.09 sec per loop

So this is a good starting point – it works! I am also saving the output image, so that when I make changes, I can spotcheck that I didn’t break something.

Profiling

Python’s standard library has a profiler that works very well (see The Python Profilers). Although I’m not a fan of the text output – I feel it does not intuitively roll up results – you can dump stats to a file which you can view in a UI.

So I can profile the script:

python -m cProfile -o stats.profile myscript.py

Note that it can sometimes add a lot of overhead.

And after installing runsnakerun I can view the stats:

runsnake stats.profile

which shows me:

[Screenshot: RunSnakeRun GUI displaying the profile breakdown]

So there are a few things that jump out as easy fixes. First, the app spends around a third of its time appending colors. Second, finding the median takes a significant amount of time as well, most of it in the sorted call (this is not visible in the image above). The other lines of code are not significant enough to display.
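If you would rather stay in the terminal, the standard library’s pstats module can read the same dump; here is a self-contained sketch with a toy workload standing in for the real script:

```python
import cProfile
import pstats

def work():
    # toy workload standing in for resize_median
    return sorted(range(100000), reverse=True)

# equivalent of `python -m cProfile -o stats.profile myscript.py`
cProfile.runctx('work()', globals(), {}, 'stats.profile')

stats = pstats.Stats('stats.profile')
# sort by cumulative time and show the 10 most expensive entries
stats.sort_stats('cumulative').print_stats(10)
```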

Optimizing

I have a few quick rules of thumb for optimizing:

  • creating/appending to lists is slow due to memory allocation – try to use list comprehensions
  • try not to run operations that create copies of objects unless it’s required – this is also slow due to memory allocation
  • dereferencing is slow: if you’re doing mylist[i] several times, just do myvar = mylist[i] up front
  • use libraries as much as possible – many are written in C/C++ and fast
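A minimal sketch of the first three rules in action (the names here are illustrative):

```python
# Rules 1 & 2: sorted() allocates a copy; list.sort() reorders in place.
data = [5, 3, 8, 1, 9, 2]
copy = sorted(data)        # new list, data untouched
data.sort()                # no new list created
print(data == copy)        # True: same ordering either way

# Rule 1: preallocate a buffer instead of appending inside a hot loop.
doubled = [0] * len(data)
for i in range(len(data)):
    doubled[i] = data[i] * 2

# Rule 3: hoist a repeated dereference out of the inner loop.
grid = [[1, 2, 3], [4, 5, 6]]
total = 0
for r in range(len(grid)):
    row = grid[r]          # myvar = mylist[i], done once per row
    for c in range(len(row)):
        total += row[c]
print(total)               # 21
```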

Other than that, make use of search engines like Google or DuckDuckGo. You might tweak your Python code (and discover you were doing something wrong!), use Cython, write C libraries, or find another solution.

So profiling tells me that appending is slowing my code down. I can get around this by declaring the list once and keeping track of how many elements I “add”. I also know that sorted() is not preferred here because it creates a copy of the original list; I can instead use list.sort() and sort the list in place. I make these changes and run the code, and the output is still good, so I probably did not break anything. Let’s time it.

500 loops, best of 3: 2.51 sec per loop

That’s almost 20% faster! Not bad for a few minutes of effort.

For completeness, here is the modified code:

import cv2
import numpy as np
import math
from datetime import datetime

def _median(list, ne):
    """
    Return the median.
    """

    #sort inplace
    if ne != len(list):
        list = list[0:ne]
    list.sort()
    if len(list) % 2 == 0:
        #have to take avg of middle two
        i = len(list) / 2
        #int() to convert from uint8 to avoid overflow
        return (int(list[i-1]) + int(list[i])) / 2.0
    else:
        #find the middle (remembering that lists start at 0)
        i = len(list) / 2
        return list[i]

def resize_median(img, block_size):
    """
    Take the original image, break it down into blocks of size block_size
    and get the median of each block.
    """

    #figure out how many blocks we'll have in the output
    height, width, depth = img.shape
    num_w = math.ceil(float(width)/block_size)
    num_h = math.ceil(float(height)/block_size)

    #create the output image
    out_img = np.zeros((num_h, num_w, 3), dtype=np.uint8)

    #iterate over the img and get medians
    row_min = 0
    #TODO: optimize this, maybe precalculate height & widths
    num_elems = block_size * block_size
    block_b = [0] * num_elems
    block_g = [0] * num_elems
    block_r = [0] * num_elems
    while row_min < height:
        r_max = row_min + block_size
        if r_max > height:
            r_max = height

        col_min = 0
        new_row = []
        while col_min < width:
            c_max = col_min + block_size
            if c_max > width:
                c_max = width

            #block info:
            num_elems = (r_max-row_min) * (c_max-col_min)
            block_i = 0
            for r_i in xrange(row_min, r_max):
                for c_i in xrange(col_min, c_max):
                    block_b[block_i] = img[r_i, c_i, 0]
                    block_g[block_i] = img[r_i, c_i, 1]
                    block_r[block_i] = img[r_i, c_i, 2]
                    block_i += 1

            #have the block info, sort by L
            median_colors = [
                int(_median(block_b, num_elems)),
                int(_median(block_g, num_elems)),
                int(_median(block_r, num_elems))
            ]

            out_img[int(row_min/block_size), int(col_min/block_size)] = median_colors
            col_min += block_size
        row_min += block_size
    return out_img

def run():
    img = cv2.imread('Brandberg_Massif_Landsat.jpg')
    block_size = 16
    start_time = datetime.now()
    resized = resize_median(img, block_size)
    print "Time: {0}".format(datetime.now() - start_time)
    cv2.imwrite('proc.png', resized)

if __name__ == '__main__':
    run()

Repeat

Now that I optimized the code a little, I repeat the process to make it even better. I try to decide what should be slow, and what should not. So, for example, sorting is not very fast here, but that makes sense. Sorting is the most complex part of this code – the rest is iterating and keeping track of observed colors.

Final Thoughts

After having optimized a few projects I have a few final thoughts and lessons learned:

  • Optimizing a slow/bad algorithm is a waste of time; you’ll need to redesign it anyway
  • The granularity of the profiler is sometimes a little funny. You can get around this by making sections of code into functions – metrics on functions are usually captured.
  • The profiler will group calls to library functions together even if they are called from different places. So be careful with metrics for libraries/utils, e.g. numpy.
  • Use search to find ways to optimize your code.

If all else fails, you can implement slow parts in C relatively painlessly. This is the option I usually go with, and I will detail it out with some code in the next few weeks.

Have fun speeding up your code!

Productivity vs Performance

When I was writing software in college, there was more emphasis on program execution speed than on time spent implementing it. In startups and most work environments, the reverse tends to be true. It took me a while to figure this out, and for the first few years of programming, I would often introduce optimizations that were not necessary, or make code uglier than it needed to be for the sake of performance. I’m not talking about premature optimization; I’m talking about poor design decisions stemming from the assumption that performance trumps legibility.

I’ve spent a lot of time refactoring poorly written code in Grafpad – code that wasn’t necessarily bad to begin with, but quickly outgrew its initial purpose as more special cases were introduced to it. What gave me even more grief, however, were the special cases I imposed on myself in an attempt to preserve bandwidth, CPU, and memory. For example, each shape point in Grafpad consists of 3 items: x-coordinate, y-coordinate, and a curvature flag. In the original version of Grafpad, shapes that I knew couldn’t have curvatures omitted the curvature flag. As a further optimization, I wrote faster versions of multiple algorithms (edge detection, intersection, bounding-box computation) that didn’t have to deal with curved lines. I later ended up regretting that, having realized that I did twice as much work to handle a case that may have been fast enough anyway. I wasted time I could have invested elsewhere, and I introduced special-case logic that didn’t need to be there.

Another example is the logic I created for transmitting data to the server and back. I didn’t like that JSON.stringify included a lot of irrelevant information, which I wouldn’t need since I knew exactly what kind of object I was sending over. My packing method transmitted only the values themselves, and I unpacked them correctly by following the same order of operations. Once again, I performed a bunch of work that JSON.stringify could have handled for me, and ended up with a more fragile solution that depended on the pack/unpack logic on the front-end and back-end staying identical.

I’m not saying the work I did was pointless; it simply wasn’t the kind of work I needed to do at an early-stage startup. By the time these kinds of optimizations become relevant, the product should already have multiple users and a team of developers who have time to do them. An early-stage startup should concentrate on getting the product out the door and fixing bugs that affect users; performance issues rarely matter at that stage. And with proper use of polymorphism, those optimizations will be easy to add later in the cases where they do matter.

Why Pyjamas Isn’t a Good Framework for Web Apps

Earlier this week, I stated that Pyjamas no longer seems like a viable solution for Grafpad (or many other web apps, for that matter). In this post, I will explain the flaws in Pyjamas that ultimately made me decide to switch away from it. I’m aware that the Pyjamas project is currently getting an overhaul, and I hope these flaws get addressed in upcoming releases. Before I go any further into bashing Pyjamas, I want to mention that I’ve been using it for several years, writing over 20,000 lines of Python code that runs inside the browser (as well as several Pyjamas wrappers for extending its functionality). I appreciate the problem Pyjamas is trying to solve, and I definitely think it’s a useful tool. Perhaps one day Pyjamas will be good enough for the browser; unfortunately, it has a lot of issues to solve before that’s the case.

Experienced JavaScript developers might already be familiar with many of the points I will bring up. To summarize Pyjamas’ flaws in one sentence: it assumes that JavaScript is still the joke of a language it was several years ago, and applies outdated solutions that don’t scale well. Today’s JavaScript, however, can run circles around its predecessor, both in terms of performance and functionality. Many innovative design patterns have been published for keeping JavaScript code clean and object-oriented. In some ways, JavaScript has even surpassed Python in terms of design; Python, for example, still lacks proper private variables. So what are some of the big offenders in Pyjamas?

Browser Detection instead of Feature Detection

Many of you are probably familiar with Pyjamas’ compilation scheme. If not, it basically creates multiple versions of the JavaScript code, one for each major browser (IE, Firefox, Safari/Chrome, Opera), and serves the appropriate one based on your user-agent string. A quick Google search will reveal thousands of pages explaining the problems with this technique (called browser detection), so there is really no point in going into much detail here. The first problem with browser detection is the assumption that the user is running one of the browsers we’re detecting (sorry, Konqueror). The second is the assumption that the user is running a version of that browser with the same issues/functionality. I’ve already posted about the changes I had to make to get Pyjamas to use IE9 properly: IE9 has full canvas support, yet Pyjamas still treats it like IE6 (ironically, IE9 actually behaves more like WebKit than like IE6). The third problem is that many browsers spoof the user-agent string, pretending to be a different browser (for various reasons). These browsers may support features that the spoofed browser doesn’t, and vice versa, forcing an unnecessary work-around for a feature the browser supports natively (just like IE9 being forced to use VML instead of canvas), or preventing the feature from working altogether (imagine Chrome, which has no VML support, spoofing the IE6 user-agent string).
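The alternative, feature detection, asks the environment directly whether a capability exists instead of guessing from the user-agent string. Here is a minimal sketch; the mock `document` objects are stand-ins so the example is self-contained, whereas in a browser you would pass the real `document`:

```javascript
// Feature detection for canvas: probe for the capability itself rather
// than inferring it from which browser the user-agent string claims.
function supportsCanvas(doc) {
  var el = doc.createElement('canvas');
  return !!(el.getContext && el.getContext('2d'));
}

// Mock documents simulating an old IE (no canvas) and a modern browser.
var oldIE = { createElement: function () { return {}; } };
var modern = {
  createElement: function () {
    return { getContext: function () { return {}; } };
  }
};

console.log(supportsCanvas(oldIE));   // false
console.log(supportsCanvas(modern));  // true
```

The check keeps working no matter what the user-agent string says, which is exactly what browser detection cannot guarantee.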

Bloat and Boilerplate Hell

If you’ve peeked at Grafpad’s JavaScript, you probably saw 80,000 lines of code in a 3.5MB file. But did you know that the pre-compiled version of the Grafpad front-end is only about 8,000 lines of code? We have 10 times the needed code just to pretend we’re still using Python. What’s worse, most of that code is only there to support obscure Python functionality most of us will never use in a web app anyway. Pyjamas has become the most complete Python framework for the browser; unfortunately, it has also become the most bloated one, with most other frameworks (such as py2js) needing to generate only about 1.5 lines of JavaScript per line of Python. You can see the 80/20 principle at work here: 20% of Python’s features account for 80% of Pyjamas’ boilerplate. In my opinion, it would make a lot more sense to support only the commonly used features of Python, letting the user rewrite the bits that don’t translate well to JavaScript. After all, the most tedious things to port between languages are the algorithms, not the object structure.

Debugging

In theory, Pyjamas is much easier to debug than JavaScript. Unlike JavaScript, which either throws vague errors or, worse yet, silently fails a block of code and continues execution like nothing happened, Pyjamas throws Pythonic exceptions, which most of the time do a very good job of pinpointing the exact line that caused the problem… at least when you run your program through Pyjamas Desktop. The problem is, Pyjamas Desktop has been broken for almost three years now, requiring you to either use a three-year-old Linux distribution (the last version known to support python-hulahop) or rely on the WebKit or MSXML implementations, neither of which supports canvas.

Alternatively, you can debug your code directly in the browser. Pyjamas sports a good set of Python exceptions emulated in JavaScript through clever use of try/catch blocks. Unfortunately, this alternative lacks not only a proper stack trace but also the original code (and compiling your Pyjamas app with any of the debug modes doesn’t solve this, regardless of what various outdated posts on the mailing list claim). Needless to say, the errors raised by Pyjamas in the browser are not very useful. If you cause an IndexError on line 50 of your code by referencing “object.array[5]”, for example, expect Pyjamas to throw some weird error on line 30,000 of your compiled JavaScript (chances are it won’t even realize it’s an IndexError, or at least won’t report it well; the except blocks do seem to work correctly). That line will reference $p['getattr'](object, 'array').__getitem__(5), surrounded by a bunch of other boilerplate, any of which could have failed as a consequence of an earlier error in your code or a Pyjamas bug. Even when debugging with Pyjamas Desktop, the browser errors can occasionally be inconsistent with normal Python (usually due to a bug in Pyjamas), and these are a pain to troubleshoot. And in my opinion, there is really not much Pyjamas can do to remedy this.

Adding extra assertions to catch every possible case and throw Pythonic errors is a fool’s errand, no different than trying to parse HTML with regular expressions. Python’s ability to throw relevant errors stems from its fundamental design: it is very strict about using non-existent/undefined variables and about comparing unrelated types. JavaScript, on the other hand, is very lazy/permissive about these, much like Perl. Python is proactive about its assertions; Pyjamas tries to be reactive. It’s unrealistic to foresee every special case that could arise and account for it with an assertion the same way Python would. Even if you managed to do so, you would have added even more code to Pyjamas’ already large chunk of boilerplate (not to mention the potential for new Pyjamas bugs). One option is to compile these assertions away when the debug flag isn’t set, but even then you would be doing the exercise of examining every possible error Python could throw in each case and plugging in more “reactive” logic to make JavaScript behave the same way. Instead, we should make the framework easy to debug in the environment it’s meant to run in. Since we can’t make JavaScript behave like Python, and we can’t do compile-time debugging like we would with C++ or Java, we should make the output easy to understand, so that we can map it back to the original code.
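To make the “reactive” point concrete, here is a rough sketch of what wrapping every subscript operation to emulate Python’s IndexError looks like. The names are illustrative, not actual Pyjamas internals:

```javascript
// A reactive IndexError emulation: every subscript must go through a
// wrapper, because plain JS would silently yield undefined instead.
function IndexError(message) {
  this.name = 'IndexError';
  this.message = message;
}

function getitem(seq, i) {
  if (i < 0) i += seq.length;  // also emulate Python's negative indexing
  if (i < 0 || i >= seq.length) {
    throw new IndexError('list index out of range');
  }
  return seq[i];
}

var arr = [1, 2, 3];
console.log(getitem(arr, 1));   // 2
console.log(getitem(arr, -1));  // 3
try {
  getitem(arr, 5);              // would be undefined in plain JS
} catch (e) {
  console.log(e.name + ': ' + e.message);
}
```

Multiply this by every operation Python checks proactively (attribute access, type comparisons, division) and you get the boilerplate explosion described above.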

Python is not Java, DOM is not a Desktop

This brings me to my next point. GWT (the original inspiration for Pyjamas) might be more bloated than Pyjamas, but there is something it can do that Pyjamas can’t: compile-time error catching. If it weren’t for Python being a dynamically-typed language, much of my rant in the previous section about debugging would be irrelevant. Additionally, I don’t feel that Pyjamas is approaching the problem from the right angle. Python has the advantage of being much more similar to JavaScript than Java ever will be, and a lot of Pyjamas’ wrapper logic wouldn’t even be necessary if Pyjamas didn’t try to pretend to be GWT (in addition to pretending to be Python). GWT was designed to make web development similar to Desktop GUI development, since that’s the background many Java developers come from. What other purpose is there for faking MouseListener and KeyboardListener in an environment that was never designed to need either? (KeyboardListener, by the way, is another source of grief for Pyjamas: it’s what makes the keyboard pop up all over the place on mobile devices, and it attaches a fake input element to the current element, pretending they’re the same element, adding even more boilerplate and wrappers to the code.) What other purpose is there for building the entire DOM dynamically (which, by the way, is also extremely inefficient)? The browser page was not designed to function the same way your Desktop calculator app does. Anyone who has taken a few minutes to learn how the DOM works probably agrees that it’s actually superior to the old-fashioned Desktop way of writing a GUI. I’m lazy (otherwise I wouldn’t have written my front-end in Python), so when a new technology comes along that clearly makes my life easier, why ignore it?

If it weren’t for trying to fake a Desktop GUI, Pyjamas wouldn’t need all these wrappers. Most other Python-faking frameworks allow one to invoke JavaScript logic as if it were a regular Python object/function. Pyjamas, on the other hand, requires one to first write a wrapper for Pyjamas Desktop using Python, then one for the browser using some limbo version of a Python/JavaScript hybrid (where you can’t even access elements of an array using standard indexing), and finally rewrite a separate version of your limbo code for each non-compliant browser (definitely IE, and possibly some others). This wrapping might have been necessary in Java, but it shouldn’t be needed for Python at all, and could have been prevented with better design. But wait, there IS an alternative! You can put raw JavaScript in your code using the JS() method, passing it one giant string of JavaScript code. Unfortunately, that chunk of code will get completely ignored in Pyjamas Desktop (which you’re using to debug your entire app, since the browser debugger is no help at all), and to actually reference anything from this chunk of code in the browser, you will need to reference the variables the same way: “a = JS(‘a’)” (again, don’t expect “a” to get set in Pyjamas Desktop). Oh, and don’t try to modify any of the DOM elements created by Pyjamas from anything other than Pyjamas, or you will run into object-state sync issues. Pyjamas wraps each DOM element in a Python object, which then stores the element’s state as a set of variables and assumes it doesn’t change without Pyjamas’ permission. Pyjamas plays well with other JavaScript frameworks… as long as they don’t touch any portion of the DOM that Pyjamas uses.

JavaScript has its Strengths

JavaScript might not be the cleanest language, and I still much prefer Python. But I must give credit where credit is due. First of all, JavaScript integrates the DOM into itself really well. I can take any DOM element, assign a function to its onMouseDown event as if it were a regular JavaScript object property, and all of a sudden I have an element that reacts to my mouse clicks. No need for complicated ClickHandlers.
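For example (a plain object stands in for a real DOM element here so the sketch is self-contained; in the browser you would grab the element with document.getElementById):

```javascript
// Direct event assignment: the element is just an object, and the
// handler is just a property. No listener classes, no registration API.
var element = {};  // stand-in for a real DOM element

element.onmousedown = function () {
  return 'clicked';
};

// The browser invokes this on a real mouse press; we call it directly:
console.log(element.onmousedown());  // 'clicked'
```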

Pyjamas has a lot of abstraction layers, both to hide JavaScript inconsistencies and to make it easier to build widgets. However, native JavaScript libraries like jQuery do a much better job at both. Yes, jQuery doesn’t scale well for larger projects, but there are libraries that do, like MooTools (which, by the way, was inspired by Python). And realistically, if you create a simple wrapper for generating classes (or loot one from John Resig’s blog, the same guy who wrote jQuery), even jQuery becomes good enough for large projects. Pyjamas, on the other hand, adds so much abstraction that sometimes I need hacks just to manipulate the DOM. If you look at the DOM of a typical Pyjamas app, you will notice layers of unnecessary elements: images wrapped in divs, wrapped in more divs, placed inside some table that resides inside yet another table. When I try to render my app on a tablet, it often crashes from the DOM bloat.
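The class-generating wrapper mentioned above can be surprisingly small. Here is a minimal sketch in the spirit of Resig’s “Simple JavaScript Inheritance” post (simplified, not his actual code; `makeClass` and the widget names are illustrative):

```javascript
// A tiny class helper: prototype chaining plus an init() convention.
function makeClass(parent, methods) {
  function Klass() {
    if (this.init) this.init.apply(this, arguments);
  }
  Klass.prototype = Object.create(parent ? parent.prototype : Object.prototype);
  for (var name in methods) {
    Klass.prototype[name] = methods[name];
  }
  Klass.prototype.constructor = Klass;
  return Klass;
}

var Widget = makeClass(null, {
  init: function (id) { this.id = id; },
  render: function () { return '<div id="' + this.id + '"></div>'; }
});

// Button inherits init() from Widget and overrides render().
var Button = makeClass(Widget, {
  render: function () { return '<button id="' + this.id + '"></button>'; }
});

var b = new Button('save');
console.log(b.render());  // '<button id="save"></button>'
```

Twenty-odd lines buy you inheritance and overriding, which is a far cry from the layers of wrappers Pyjamas generates for the same purpose.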

Pyjamas also assumes that JavaScript is slow, which was true when the project first started. As a result, it duplicates parts of its boilerplate code to avoid an extra function call (while adding excessive function calls and abstractions in other places). Ironically, JavaScript engines have come a long way, and a lot of Pyjamas’ optimizations are no longer relevant (such as using object["property"] instead of object.property). In fact, a quick paint app I wrote in Pyjamas actually runs faster in Chrome than in Pyjamas Desktop. That same app runs faster still when written in pure JavaScript. It’s especially noticeable when using the paint-bucket tool, which works by pixel-scanning and takes a couple of seconds in Pyjamas yet is almost instantaneous in JavaScript.
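For what it’s worth, the two access styles read the same property, so in a modern engine the bracket form buys nothing; a trivial sketch:

```javascript
// Dotted and bracket notation are two spellings of the same lookup;
// modern engines optimize the dotted form at least as well.
var point = { x: 3, y: 4 };

console.log(point.x + point.y);        // 7 (dotted access)
console.log(point['x'] + point['y']);  // 7 (bracket access)
```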

Summary

While Pyjamas is the most complete Python emulation in a browser, it has become a very bloated and brittle framework. It doesn’t embrace any part of JavaScript or the DOM, trying instead to hide them away like some sort of deformed beast. By pretending to be pure Python, it not only puts unrealistic expectations on itself but also fails to make use of the good parts of JavaScript. Instead, Pyjamas embraces a solution designed for a statically-typed language, favoring a GUI structure that should have died a decade ago.

So What’s The Alternative?

I did mention that I am porting Grafpad away from Pyjamas. However, I’m not crazy enough to rewrite the entire project in pure JavaScript. Rewriting all the code in a language with different quirks and troubleshooting differences like division rounding and modulo sign behavior is not my idea of fun. I also still prefer to keep my front-end code interchangeable with the back-end (more or less), which has already provided multiple advantages, such as moving the proprietary clipping and recognition algorithms to the back-end with just a few hours of work. I happen to have another ace up my sleeve. In the next post, I will review multiple alternatives to Pyjamas and explain the solution I’ve chosen.