State of RapydScript

Many of you probably noticed that I’ve been slow to contribute to RapydScript lately. It’s hard to summarize all of the reasons, but here are the top few:

  • Despite the occasional bug, the project is in a solid state and quite usable in production (I do eat my own dog food, by the way).
  • I’ve been working on other projects, the most recent one being TileZone.
  • I am one man, and my time doesn’t scale. I wish I could continue improving the language and fix the bugs people report but at the end of the day I don’t get paid for it, and as a result I prioritize my other projects and my job over this. I’m grateful for the contributions from other members, but the community isn’t large enough to encourage others to contribute regularly.
  • With ECMAScript 6 and Babel, the future of RapydScript seems somewhat uncertain. Babel is an awesome library that brings many similar features to RapydScript (although with uglier syntax, in my opinion). I think, however, that RapydScript, if continued, will be more relevant in an ES6 world than something like CoffeeScript.
  • As mentioned before, I don’t believe Python is a panacea. It’s one of my favorite languages, but I’ll be the first one to admit that it made quite a few mistakes design-wise. Some issues that come to mind are global-scope collection methods (len, map, filter), lambda functions, inconsistent naming scheme for native objects (why are native classes called int and str when the convention is to use title case for them?), exposing of optimization methods (xrange) to the user when they should really be handled behind the scenes, and so on. As a result, it’s frustrating to see new RapydScript users complain that the language should be more like Python, especially the parts of Python that I hate.

With that said, I love RapydScript, and will continue using it. I prefer it to both Python and JavaScript. Ironically, I started RapydScript to avoid dealing with JavaScript, but in the process learned to appreciate it more. Some design patterns I use in RapydScript would not be possible in Python. JavaScript is a powerful language, and yet it’s messy. This StackExchange answer to a question about language verbosity does an amazing job explaining what I like about RapydScript. If we were to grade the 4 languages I mentioned on those 2 categories, I would rate them as follows:

Language Signal vs Noise Cryptic vs Clear
JavaScript ⭐⭐⭐⭐⭐
CoffeeScript ⭐⭐⭐⭐⭐
Python ⭐⭐⭐⭐ ⭐⭐⭐⭐
RapydScript ⭐⭐⭐⭐ ⭐⭐⭐⭐⭐

JavaScript is definitely the most verbose of the 4. Having to type out all of those prototypes, optional argument checks, array[array.length-1], and brackets gets repetitive very fast. Looking at all that code gets repetitive even faster. As for CoffeeScript, there is no arguing that it’s terse. Its signal-to-noise ratio is amazing; every character serves a purpose. The only problem is making sense of all that code later, remembering the differences between the fat and skinny arrows, and wrapping your head around all those invisible parentheses. CoffeeScript solved JavaScript’s verbosity problem, but introduced a new one. Its cryptic syntax is the reason many JavaScript developers recommend steering clear of it. Python has a good balance of the two, and as a result has gained quite a large following from developers who’re tired of dealing with the visual clutter of other languages. But, to be honest, it’s not as clear as modern JavaScript. Let me show you a few examples:

Python | JavaScript | Explanation
len(a) | a.length | Global scope, unnecessary abbreviation, should I say more?
def | function | Another vague abbreviation, but at least def makes it very clear what type of object we’re defining
lambda | function | At least they didn’t abbreviate it
*rest | ...rest | They went out of their way to rename all && and || and then introduced this?
**kw | {foo:"bar"} | Nothing says dictionary like 2 multiply signs next to each other
','.join(array) | array.join(',') | Are we joining elements via delimiter or delimiters via elements?
str | String | Another useless abbreviation that ignores the convention of capitalizing classes
list | Array | Linked list?
ZeroDivisionError | Infinity | Sure, it’s mathematically correct, but when I divide by zero, 9 times out of 10 infinity is what I want, I’ll end up assigning 999999 to the number in the catch block anyway

You may argue that Python’s operators are clearer than JavaScript’s || and &&, but which ones do most languages follow? Developers are so used to seeing those operators that it really doesn’t matter. RapydScript does as well as Python on signal, and sometimes a bit better (thanks to closures and the ability to inject JS). I gave it an extra point for clarity because it can often leverage the cleaner JS approach, as I usually do in my code (although, to be fair, JavaScript still inches ahead a bit here).
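
To make the JavaScript column of the table concrete, here is a minimal sketch of a few of those idioms in plain JavaScript. The names items and collect are made up for illustration and don’t come from any real project.

```javascript
// A small array to work with (illustrative data only).
const items = ['a', 'b', 'c'];

// Length is a property of the collection, not a global function.
const count = items.length;

// join() is called on the array and takes the delimiter as its argument.
const joined = items.join(',');

// Rest parameters collect the remaining positional arguments into an array.
function collect(first, ...rest) {
    return { first: first, rest: rest };
}
const collected = collect(1, 2, 3);

// Indexing the last element is where JavaScript gets noisier.
const last = items[items.length - 1];
```

Everything here hangs off the object being operated on, which is the part I find clearer, even if the last line shows where the verbosity creeps in.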

Going Forward

So where does that leave us? As I said, I love RapydScript, but I don’t have the manpower to continue fixing the bugs or make RapydScript more like Python when I have no need for Python. My backends are all in node.js now. It also seems like Kovid has been more active in enabling Python users to feel at home with his fork, so I’ll let him continue that. At this point the philosophies have diverged somewhat: I disagree with adding extra syntax and special-casing variables for Python, while he disagrees with breaking Python rules. I’m feeling like Python has become a bit of a shackle. There is no reason I should repeat the same mistakes as Guido in the name of backwards compatibility with code that wasn’t even written for the same platform.

My goal is to make RapydScript relevant in the ES6 age, and here is how I plan to do that (at my own pace of course, given other projects, but you’re more than welcome to help me out):

  1. Fix low-hanging fruit bugs.
  2. Separate tokenizer from lexer (I’ll explain why in a bit).
  3. Rewrite output module to generate ES6 instead, let Babel handle special cases, not us.
  4. Modify output module to fill in templates rather than handcrafting output code.
  5. Drop pythonisms from core (no more int, str, bool) and embrace native JS types (Number, String, Boolean).
  6. Add optional static typing and ability to infer types, let users leverage this information to create/optimize better templates.
  7. Now that tokenizer is separate from lexer (step 2 accomplished this), add a macro-processing step in between that can mutate the code as the AST is being formed. This will give RapydScript powers similar to sweet.js, but with the ability to handle whitespace.

After all 7 of those steps are complete, users can enhance the language as they see fit, without having to modify the core. If they want a new operator, or pythonic variable names, it would be as simple as creating a macro. If they want different output/format, it would be as easy as writing an output template. Admittedly, this will probably take me a while. But if anyone wants to help, we can probably get this done much sooner.
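
As a rough sketch of what template-driven output (step 4) could look like, assume a table of output templates keyed by node type and a tiny function that fills in the placeholders. The names templates and render, the node types, and the {placeholder} syntax are all hypothetical; none of this exists in RapydScript today.

```javascript
// Hypothetical output templates keyed by node type (not RapydScript's actual API).
const templates = {
    assign: 'let {name} = {value};',
    call: '{callee}({args})'
};

// Fill a template by replacing each {placeholder} with the matching field.
// Unknown placeholders are left untouched so mistakes stay visible in the output.
function render(type, fields) {
    return templates[type].replace(/\{(\w+)\}/g, function (match, key) {
        return key in fields ? fields[key] : match;
    });
}

// Templates compose: the rendered call becomes the value of the assignment.
const callCode = render('call', { callee: 'sum', args: 'a, b' });
const assignment = render('assign', { name: 'total', value: callCode });
```

With something like this, changing the generated format would mean editing an entry in templates (for example, swapping let for var) rather than touching the code that walks the tree.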

Productivity vs Performance

When I was writing software in college, there was more emphasis on program execution speed than on time spent implementing it. In startups and most work environments, the reverse tends to be true. It took me a while to figure this out, and for the first few years of programming, I would often introduce optimizations that were not necessary, or make code uglier than it needed to be for the sake of performance. I’m not talking about premature optimization; I’m talking about poor design decisions stemming from the assumption that performance trumps legibility.

I’ve spent a lot of time refactoring poorly written code in Grafpad. The code wasn’t necessarily bad to begin with, but it quickly outgrew its initial purpose as more special cases were introduced to it. What gave me even more grief, however, were the special cases I imposed on myself in an attempt to conserve bandwidth, CPU and memory. For example, each shape point in Grafpad consists of 3 items: x-coordinate, y-coordinate, and curvature flag. In the original version of Grafpad, shapes that I knew couldn’t have curvatures omitted that curvature flag. As a further optimization, I wrote faster versions of multiple algorithms (edge detection, intersection, bounding box computations) which didn’t have to deal with curved lines. I later ended up regretting that, having realized that I did twice as much work to handle a case that may have been fast enough anyway. I wasted time I could have invested elsewhere, and I introduced special-case logic that didn’t need to be there.

Another example is the logic I created for transmitting data to the server and back. I didn’t like that JSON.stringify included a lot of irrelevant information, which I didn’t need since I knew exactly what kind of object I was sending over. My packing method transmitted only the values themselves, and I was unpacking them correctly by following the same order of operations. Once again, I performed a bunch of work that JSON.stringify could have handled for me, and ended up with a more fragile solution that depended on the pack/unpack logic on the front-end and back-end being identical.
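
The fragility is easier to see in code. The following is only a sketch under assumptions: the point record, its field names, and the pack/unpack helpers are invented for illustration and are not Grafpad’s actual wire format.

```javascript
// Hypothetical point record; field names are illustrative only.
const point = { x: 10, y: 20, curved: false };

// Self-describing: keys travel with the values, so field order does not matter.
const asJson = JSON.stringify(point);
const fromJson = JSON.parse(asJson);

// Positional packing: smaller payload, but both sides must agree on field order.
function pack(p) {
    return [p.x, p.y, p.curved ? 1 : 0].join(',');
}
function unpack(text) {
    const parts = text.split(',');
    return { x: Number(parts[0]), y: Number(parts[1]), curved: parts[2] === '1' };
}

// If one side changes the order and the other does not, the data is silently wrong.
function unpackWithSwappedOrder(text) {
    const parts = text.split(',');
    return { x: Number(parts[1]), y: Number(parts[0]), curved: parts[2] === '1' };
}

const packed = pack(point);
const roundTrip = unpack(packed);
const mismatched = unpackWithSwappedOrder(packed);
```

The packed string is shorter than the JSON one, but nothing complains when the two ends drift apart; the coordinates simply come back swapped.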

I’m not saying the work I did was pointless. It simply wasn’t the kind of work I needed to do at an early-stage startup. By the time these kinds of optimizations become relevant, the product should already have multiple users and a team of developers who have time to do these optimizations. An early-stage startup should concentrate on getting the product out the door and fixing bugs that affect users; performance issues rarely matter at that stage. And with proper use of polymorphism, those optimizations will be easy to add later on in cases where they do matter.

Python to JavaScript: Compilers vs. Translators

One thing Alex and I always say about RapydScript is that it is really JavaScript with a Pythonic syntax. Following that, people often ask me what that means for them. They want to know how this affects how they develop code, and how it is different from something like Pyjamas/Pyjs. I want to answer that here for anyone who has wondered what “RapydScript is Pythonic JavaScript” means, how compilers like Pyjs are different from translators like RapydScript, and why I (full disclosure) prefer translators.

An Example

A clear example of the difference is with division. Say your source looks like:

a = 5
b = 0
c = a / b

The translator will output JavaScript like:

var a = 5;
var b = 0;
var c = a / b;

The translator process is very simple to understand. It’s pretty much just changing the syntax, but this leads to some gotchas. When this code runs, c will be set to Infinity, a JavaScript constant, while the original Pythonic source would have raised an exception.

A compiler, on the other hand, attempts to mimic Python exactly so it may have an output more similar to:

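// Illustrative only: a simplified stand-in for what a Python-emulating compiler might emit.
function ZeroDivisionError(message) {
    this.name = 'ZeroDivisionError';
    this.message = message;
}
var py_int = function(val){
    this.val = val;
};
py_int.prototype.div = function(denom){
    if (denom.val === 0) {
        throw new ZeroDivisionError('division by zero');
    }
    return new py_int(this.val / denom.val);
};
var a = new py_int(5);
var b = new py_int(0);
var c = a.div(b);  // throws ZeroDivisionError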

The variables here will all be objects that include methods for all the operations. The division doesn’t directly divide 2 numbers, it runs the divide method in the objects. So when this code runs it will throw a ZeroDivisionError exception just like Python does.

So what are the tradeoffs?

Writing using a compiler is nice because you get to think like a Python developer, which can abstract away some things like cross-browser support. It also means that, in many cases, code can be moved between the frontend and backend with no changes. So it’s easy to have Python code compile to JavaScript. But if you’re doing something that’s JavaScript-specific, like getting HTML elements, taking in keyboard inputs, etc., the compiler you’re using will have to have a working and documented API for accessing these functions.

The real drawback, though, is that the output code is slower, significantly heavier, and, with the compilers I’ve used, unreadable. I have several issues with this, but it really boils down to two: unreadable code leads to useless tracebacks when running code in a browser, and apps don’t run (or don’t run well) on mobile devices.

Translators take a very different approach and don’t try to run just like Python. The idea here is to do 80% of the work for 20% of the cost. Your code may look like Python but it will run like JavaScript, as with the division example. There’s a lot of overlap between the two languages, but they’re not exactly the same, so the main drawback here is that you might see some unexpected, but predictable, behavior. This is very easy to deal with if you know JavaScript, and if not, it’s easy to learn the differences.
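
A few examples of that “unexpected, but predictable” behavior, shown directly in JavaScript since that is what translated code ends up running as. The divide helper at the end is a hypothetical illustration of opting back into the Python behavior where it matters; it is not part of RapydScript.

```javascript
// Division by zero: Python raises ZeroDivisionError, JavaScript returns Infinity.
const quotient = 5 / 0;

// Empty arrays: an empty list is falsy in Python, but an empty array is truthy in JavaScript.
const emptyIsTruthy = Boolean([]);

// Adding arrays: Python concatenates lists, JavaScript converts both sides to strings.
const added = [1, 2] + [3];

// When the Python behavior matters, an explicit check restores it (hypothetical helper).
function divide(numerator, denominator) {
    if (denominator === 0) {
        throw new RangeError('division by zero');
    }
    return numerator / denominator;
}
```

None of these are random: once you know the JavaScript rule, the result is always the same, which is what makes the differences learnable.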

There are some nice benefits to using a translator. Your input code and output code will look very similar. They will be roughly the same size and it will be easy to map a line with an error in the JavaScript output back to the Pythonic input for easy debugging. The output will be on the order of kB instead of MB and will run faster.

So those are the main differences between compilers and translators like RapydScript. RapydScript is really JavaScript behind the scenes so it will behave differently than Python, which lets it run a lot more efficiently.

Which is right for you?

The choice of which to use comes down to a few things:

  • First, something I have not mentioned: libraries. In general, JavaScript libraries work better with translators and Python libraries work better with compilers. The one caveat is that compilers can only translate pure Python, so if you’re using something like numpy, which uses C, there’s no easy answer for you.
  • Second, if you don’t know any JavaScript, you will have a tougher time with a translator. Speaking from experience though, JavaScript is not very different from Python, and I encourage you to try a translator because you’ll save time debugging, and your app will be more maintainable in the long term.
  • Lastly is performance requirements. If performance is important, you will want to use a translator over a compiler.

I think it makes sense for a beginner who may be writing a simple internal app that won’t have any performance requirements to use a compiler. But if you’ll be writing many apps, or even a complex one, I would go with a translator. Having picked up the differences between JavaScript and Python, I now exclusively use RapydScript, a translator, even if I don’t need the performance. It’s easier to debug in the browser where it actually runs, which saves me time.

Implicit Logic Is Not Your Friend

When creating RapydML and RapydScript, I had to make quite a few design choices – similar design choices other developers make when coming up with a new language, or even an API. For inspiration, I’ve looked into Python, existing JavaScript abstraction languages like CoffeeScript, and even JavaScript itself. While doing so, I’ve noticed a few features in CoffeeScript and related languages that should never have been borrowed from Ruby, and that Ruby in turn should never have borrowed from Perl. Most of these features relate to implicit logic, where the compiler makes assumptions for you. While they seem like nice shortcuts at first, more often than not, they harm your productivity more than they help. In fact, they’re not shortcuts at all, but rather branching paths in a maze that often lead to a dead end.

You’ve probably already been bitten by a few of these implicit “shortcuts” in the past, such as JavaScript’s “optional” semi-colons. If this feature didn’t exist, the compiler would complain about the missing semi-colon as soon as the page loads, and you would be able to fix the bug right away. But since it’s a “feature”, JavaScript tries to guess where to insert the semi-colon for you. As a rule of thumb, whenever you have the compiler guessing anything, you’re asking for trouble. You’ve probably already seen an example bug resulting from this logic, something along the lines of:

return
    {
        font: 'Verdana'
    };

The intent here was to return the object literal. Instead, JavaScript assumes a semi-colon at the end of the return statement and returns nothing. While I would disagree with such alignment of the return statement anyway, I can definitely understand the frustration a programmer writing this would go through. An easy solution would be to move the bracket to the same line as the return statement, but a novice programmer who is unaware of this, and who is trying to follow a simple code convention that says curly brackets must have the same indentation as a matching bracket, will likely let this one slip through the cracks.

As you can see, implicit semi-colons prevented an easy-to-find bug we could have fixed at compile time at the cost of a more annoying one that we won’t find until several hours of debugging later. Some might argue that this is an easy bug to prevent if the programmer knows the language, but the truth is most bugs are easy to prevent if you design your code conventions around them. All code conventions do is train the eye to notice errors; in this case JavaScript does the reverse. In most languages it’s either the semi-colon or the newline that finalizes a statement, and your eye is trained to look for them. In JavaScript, it’s the semi-colon, unless there is a newline, unless the statement is incomplete. Your eye can’t do that kind of logic, and your brain should be scanning for more serious bugs. This is a common trend I noticed with implicit logic: it prevents easily-detectable bugs at the expense of more devious ones later on.
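
Here is a small runnable sketch of that behavior; the function names brokenStyle and fixedStyle are made up for illustration. It stays silent only because the object has a single property: the stray braces are then read as a block containing a labeled statement. With several comma-separated properties the block no longer parses and the engine reports a syntax error instead.

```javascript
// The newline after `return` ends the statement; the braces become an unreachable block.
function brokenStyle() {
    return
    {
        font: 'Verdana'
    };
}

// Keeping the opening brace on the `return` line returns the object as intended.
function fixedStyle() {
    return {
        font: 'Verdana'
    };
}

const broken = brokenStyle();
const fixed = fixedStyle();
```

Calling brokenStyle() gives back undefined without any error or warning at load time, which is exactly the kind of failure that surfaces far away from its cause.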

Let’s look at a few more examples. CoffeeScript introduced optional parentheses (like Ruby and Perl). At first it seems like a cool feature: the code has less clutter in it and we save a character. The problems start occurring when we wrap function calls, or even use multiple arguments. For example, let’s say you’ve written some code and a few weeks later noticed a bug. You traced the bug to this line:

a b,c d

Without additional context, you have no way of telling what the bug is by glancing at this line, or even what the line is trying to do. Was d supposed to be a third argument to a and you accidentally omitted the comma? Was the comma placed there in error and b is a method that was supposed to take c(d) as an argument? Was the comma supposed to be between c and d instead? Had you used parentheses, the error would immediately be obvious without looking at the definitions of these variables. In fact, you probably wouldn’t have made it in the first place.
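
For comparison, here are the candidate readings written with explicit parentheses in JavaScript. This is just a sketch: the call helper is a hypothetical stub that produces functions which describe how they were called, so the different interpretations can be told apart. To my knowledge, CoffeeScript itself treats the original line as the last form.

```javascript
// Build a stub that reports its own call as a string (illustrative helper, not a real API).
function call(name) {
    const fn = (...args) => name + '(' + args.join(', ') + ')';
    // When a stub is passed as a plain argument, show just its name.
    fn.toString = () => name;
    return fn;
}
const a = call('a'), b = call('b'), c = call('c'), d = call('d');

// Reading 1: d was meant to be a third argument and a comma went missing.
const threeArguments = a(b, c, d);

// Reading 2: the comma was a mistake and b takes c(d) as its argument.
const nestedCalls = a(b(c(d)));

// Reading 3: the comma belonged between c and d.
const commaMoved = a(b(c), d);

// Reading 4: the line as written, with implicit calls extending to the end of the line.
const asWritten = a(b, c(d));
```

Each version is unambiguous at a glance, and a misplaced comma or parenthesis would stand out immediately.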

Sure, this example uses poor variable names, but if you’ve been developing for a while, you’ve probably noticed that unless there are strict code conventions, many projects’ variable names aren’t much better. And even if you do use good naming conventions, you’re not immune from this. Imagine if the line you were debugging looked like this instead:

my_function MyClass ['item']

Was the intent here to pass a new instance of MyClass (whose constructor was initialized using an array consisting of 1 string) or to pass the item attribute of MyClass? LiveScript takes this “feature” a step further, making commas implicit as well for non-callable arguments (strings, numbers, arrays), making things even more ambiguous. Take a look at the following line of valid LiveScript, and try to figure out who’s calling who with what arguments:

a b c 1 [d 2] 3 e [f 'g' h] 4 i [j 5] k 'l' m

This is great for code golf and maybe riddles, but I definitely don’t want to see this kind of code in my project.

Shall we continue with more examples? How about implicit returns? Automatically returning the last-performed operation of a function seems like a great idea, because we can’t be bothered with putting 6 extra characters at the bottom of our function to signify a proper return. Too bad you (or another developer) could miss the subtle returns when modifying the function later.

For example, let’s imagine you have a function with an implicit return whose return value is used by another function. Several months later you notice a bug due to the function not resetting some global setting or a setting in the class it belongs to. Being a busy guy, you delegate this task to another developer. Sure enough, he goes and fixes the bug by setting that global/class setting correctly at the end of the function. Too bad he forgot to check that another function was using this function’s return value. If you’re lucky, the code will break as soon as it runs, and the developer will notice his error and fix it before submitting the change. If you’re not lucky, the affected logic won’t get triggered during the test (not all tests have 100% coverage), the developer will submit broken code, and you will pat him on the back for doing a good job.
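
JavaScript’s closest analogue is the expression-bodied arrow function, so the scenario can be sketched there. The settings object and the computeTotal functions are hypothetical names used only for this illustration, and the details differ from CoffeeScript (where the appended statement’s value would be returned instead of undefined), but the failure mode is the same: the return value changes without anyone touching a return statement.

```javascript
// Hypothetical shared setting that the "fix" is supposed to reset.
const settings = { dirty: true };

// Original: expression-bodied arrow function, the sum is returned implicitly.
const computeTotal = (prices) => prices.reduce((sum, p) => sum + p, 0);

// After the "fix": a block body was needed for the extra statement, and the
// sum is no longer returned because nothing says `return`.
const computeTotalAfterFix = (prices) => {
    prices.reduce((sum, p) => sum + p, 0);
    settings.dirty = false;
};

// With an explicit return, appending the reset cannot change the result.
function computeTotalExplicit(prices) {
    const total = prices.reduce((sum, p) => sum + p, 0);
    settings.dirty = false;
    return total;
}

const before = computeTotal([1, 2, 3]);
const afterFix = computeTotalAfterFix([1, 2, 3]);
const explicit = computeTotalExplicit([1, 2, 3]);
```

The caller that relied on the total now receives undefined, and nothing in the diff of the “fix” hints at that.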

Even if you’re perfect, and never make mistakes, code is rarely developed in isolation. It’s in your interest to make code easy to understand to other developers, not just yourself. But if you’re like the rest of us, mortals, you will probably break your own code if you have to deal with it several months later. As another example, let’s imagine you have a long function with the following format (assuming implicit returns):

def fun(args):
    ...
    some_var = ...
    ...
    if SOME_GLOBAL_VAR == True:
        if some_var:
            ...
        else:
            ...
    else:
        ...

Let’s also imagine that you’re calling it from multiple places, one of which uses its return value for additional computations. Let’s also imagine that you’ve modified the logic in one of the other places calling this function (one that previously didn’t need the return value, and that always sets SOME_GLOBAL_VAR to True before calling fun()) such that it now needs to know if some_var got set or not. “No problem,” you say to yourself, slapping “return some_var” at the end of the outer “if” block, breaking the implicit return that one of the other functions was expecting.

There are countless other examples of implicit logic in languages that seemed like a great idea at first, but with time proved to do more harm than good. Some examples are:

  • JavaScript/Perl functions automatically discarding extra arguments
  • JavaScript/Perl functions automatically setting missing arguments to undefined
  • JavaScript implicitly converting operand types when using + operator
  • JavaScript implicitly converting unrelated types when using ==
  • JavaScript/C++ making brackets optional for single-line conditional statements
  • Switch statements without break in JavaScript/C++ automatically falling through to next case
  • Object attributes defaulting to public in Python
  • JavaScript assuming global scope when var isn’t used
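
Several of the JavaScript items in the list above can be reproduced in a few lines. This is a minimal sketch with made-up function names (pair, describe), meant only to show each behavior in isolation.

```javascript
function pair(first, second) {
    return [first, second];
}

// Extra arguments are silently discarded.
const extraDropped = pair(1, 2, 3);

// Missing arguments are silently set to undefined.
const missingFilled = pair(1);

// The + operator converts the number to a string and concatenates.
const concatenated = 1 + '2';

// Loose equality converts unrelated types before comparing; strict equality does not.
const looselyEqual = (0 == '');
const strictlyEqual = (0 === '');

// A case without break falls through to the next one.
function describe(code) {
    const seen = [];
    switch (code) {
        case 1:
            seen.push('one');
        case 2:
            seen.push('two');
            break;
        default:
            seen.push('other');
    }
    return seen;
}
const fellThrough = describe(1);
```

None of these lines raises an error, which is the point: the language quietly picks an interpretation and keeps going.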

There are very few cases when implicit logic doesn’t cause confusion. A couple that come to mind are tuple packing/unpacking in Python and implicit boolean typecasting in many languages’ if statements without having to say == True. As a rule of thumb, if you’re asking yourself whether you should make something implicit, you probably should not.

To summarize, here are all the reasons why implicit anything is bad:

  • It saves time when writing the code at the expense of time spent debugging it
  • It makes code more ambiguous to other developers as well as yourself in the future
  • In cases when it relies on compiler inferring your intent, it can be inferred incorrectly (or rather your assumptions about how it will be interpreted could be incorrect)
  • It makes the code depend on nearby context, increasing the likelihood that something will break when you add more logic
  • It hides some of the logic from untrained eye, increasing the likelihood that something will break when you add more logic and you won’t notice it
  • It hides some of the logic from untrained eye, increasing the likelihood that something will be lost in translation when refactoring the code, or rewriting it in a different language

Even if you never make mistakes, you probably have other developers on the team. It’s in your interest to make the code clear to them, not just yourself. You want to decrease ambiguity, and implicit logic does the opposite.