Modifying Native JavaScript Objects

Random Number Generator There are still debates in JavaScript community about modifying native JavaScript objects (Array, Number, String, Object). Some developers believe it’s evil, while other encourage it. If you’ve looked at RapydScript’s stdlib, you probably already know my stance on it – I’m actually in favor of such modifications when they make sense. I might not have as much experience in JavaScript as the gurus, but I have enough experience in language design to form sane opinions about which practices are evil and which are not.

Most arguments against modifying the native objects hold no water. They give examples where the developer overrides a native method to work differently from original implementation. That is indeed evil, assuming that calling this method with the same arguments as original JavaScript implementation no longer produces the same result. For example, if I decided to rewrite String.prototype.replace such that it worked globally, like in most other languages, instead of on a first occurrence:

String.prototype.replace = function(orig, sub) {
    return this.split(orig).join(sub);
}

I could break (or make them behave differently than they should) other libraries/widgets on the same page that assume that replace() will only replace the first occurrence. I completely agree that one should never override existing methods to do something else. It’s important that the basic subset of the JavaScript language works exactly as others would expect it to. If you can’t guarantee that the foundation of your house stays level, you can’t guarantee that your house will not collapse.

There are some gray area cases, where the developer could extend the functionality of existing method, such that it still works as expected given original arguments, but does something else when more arguments are given:

Array.prototype.pop = function(index) {
    if (!arguments.length) {
        index = this.length - 1;
    }
    return this.splice(index, 1)[0];
}

In this case, myArray.pop() will still work exactly as the user would expect. However, if called with myArray.pop(0), the function will behave like myArray.shift(), or like splice() method if called with an index in the middle. The only real disadvantage here is that by overriding the native method, the developer has made the function slower than the original (native methods tend to work faster). Claiming that this is bad because a function could break another logic that mistakenly called it with an argument (which it ignored before) on the other hand is not a legitimate argument. The bug is in the function making the bad call, not in the overriding logic. This is why I like Python, it will complain about unexpected arguments right away instead of ignoring them so they could become a bigger problem later.

Appending your own methods, on the other hand, is the most benign way to modify a native object. This is when we implement something like this, for example:

Array.prototype.copy = function() {
    return this.slice();
}

When doing, so, however, it’s important to be aware if other libraries are appending a method with the same name but different functionality, then you might want to pick a different name (this is usually not a problem with libraries/APIs since you’re unlikely to need more than one library for native object manipulation). So far, the most popular argument I’ve seen against this type of native object modification is potential name collision if future version of JavaScript decides to add a method by the same name. This argument is moot. You’re not developing your app for a hypothetical language that will exist 10 years from now, and when the time comes you’ll easily be able to rename the offending function, since you know all references to it are your own. The entire EcmaScript specification is easily available, and unless you plan to drop support for all browsers older than 6 months, the JavaScript implementation you’ll be using as basis will probably lag behind that by a few months to a few years, giving you more than enough time to handle naming collisions.

You might notice, however, that if you start appending methods to the native Object object, any jQuery code running on the same page will break. The problem is not with overwriting native methods, but with poor assumptions made by jQuery developers who wrote buggy code. The problem is that jQuery developers are part of the group that believe that appending to native JavaScript objects is evil, and instead of properly testing that they’re iterating through object’s own properties, they make the assumption that no one else will append anything to Object when they scan through it using “for key in Object” type logic. To avoid the same mistake as jQuery made, make sure to iterate only through your own keys:

for (var key in obj) {
    if obj.hasOwnProperty(key) {
        ...
    }
}

Alternatively, if you only care about the latest browsers, you can use Object.getOwnPropertyNames() instead. It was a poor design on JavaScript’s part to default to iterating through every property of the object, but that can’t be changed now. jQuery developers have been confronted about their assumption before, they claim the main reason was performance. Independent tests showed about 5% performance hit from adding this check, so I don’t buy the performance claim, especially since jQuery already makes multiple performance sacrifices in the name of usability (show/hide logic having safety checks to figure out the object’s current state, for example). In my opinion, John Resig’s stance on this is no different than failing to do a “division by zero” check (in Python, because JavaScript will automatically return infinity) and then claiming that it’s for performance reasons and anyone who passes arguments that eventually result in a division by zero is the one at fault.

I haven’t checked if this is fixed in jQuery 2.0. Without support for older browsers there is no reason not to use Object.getOwnPropertyNames(). In the meantime, try not to overwrite Object (other native objects are fine) if you’re using jQuery anywhere at all (if you’re not, it’s not a problem). I should also mention that if you don’t plan to support older browsers (such as IE8), a better practice would be to use defineProperty method rather than appending to the prototype, since it will omit the method from getting picked up when iterating through the object, making the jQuery bug a non-issue.

As far as my own stance, it’s fine to append to the functionality of a native object, but never to remove. Modifying existing property to work differently than before, given the same function signature, counts as removal, and is evil as well. I have removed the dependence on overwriting native methods in RapydScript’s stdlib2, but that was mainly to clean up the standard library and make it compatible with jQuery, not because I’m against modifying native objects.

Implicit Logic Is Not Your Friend

When creating RapydML and RapydScript, I had to make quite a few design choices – similar design choices other developers make when coming up with a new language, or even an API. For inspiration, I’ve looked into Python, existing JavaScript abstraction languages like CoffeeScript, and even JavaScript itself. While doing so, I’ve noticed a few features in CoffeeScript and related languages that should never have been borrowed from Ruby, and that Ruby in turn should never have borrowed from Perl. Most of these features relate to implicit logic, where the compiler makes assumptions for you. While they seem like nice shortcuts at first, more often than not, they harm your productivity more than they help. In fact, they’re not shortcuts at all, but rather branching paths in a maze that often lead to a dead end.

You’ve probably already been bitten by a few of these implicit “shortcuts” in the past, such as JavaScript’s “optional” semi-colons. If this feature didn’t exist, the compiler would complain about the missing semi-colon as soon as the page loads, and you would be able to fix the bug right away. But since it’s a “feature”, JavaScript tries to guess where to insert the semi-colon for you. As a rule of thumb, whenever you have the compiler guessing anything, you’re asking for trouble. You’ve probably already seen an example bug resulting from this logic, something along the lines of:

return
    {
        font: 'Verdana',
        size: 10,
        type: ['italic', 'bold']
    };

The intent here was to return the object literal, instead JavaScript assumes a semi-colon at the end of the return statement and returns nothing. While I would disagree with such alignment of return statement anyway, I can definitely understand the frustration a programmer writing this would go through. An easy solution would be to move the bracket to the same line as the return statement, but a novice programmer unaware of this trying to follow a simple code convention that says curly brackets must have the same indentation as a matching bracket will likely let this one slip through the cracks.

As you can see, implicit semi-colons prevented an easy-to-find bug we could have fixed at compile time at the cost of a more annoying one that we won’t find until several hours of debugging later. Some might argue that this is an easy bug to prevent if the programmer knows the language, but the truth is most bugs are easy to prevent if you design your code conventions around them. All code conventions do is train the eye to notice errors, in this case JavaScript does the reverse. In most languages it’s either the semi-colon or the newline that finalizes a statement, your eye is trained to look for them. In JavaScript, it’s the semi-colon, unless there is a newline, unless the statement is incomplete. Your eye can’t do that kind of logic, and your brain should be scanning for more serious bugs. This is a common trend I noticed with implicit logic, it prevents easily-detectable bugs at the expense of more devious ones later on.

Let’s look at a few more examples. CoffeeScript introduced optional parentheses (like Ruby and Perl). At first it seems like a cool feature, the code has less clutter in it and we save a character. The problems start occurring when we wrap function calls, or even use multiple arguments. For example, let’s say you’ve written some code and a few weeks later noticed a bug. You traced the bug to this line:

a b,c d

Without additional context, you have no way of telling what the bug is by glancing at this line, or even what the line is trying to do. Was d supposed to be a third argument to a and you accidentally omitted the comma? Was the comma placed there in error and b is a method that was supposed to take c(d) as an argument? Was the comma supposed to be between c and d instead? Had you used parentheses, the error would immediately be obvious without looking at the definitions of these variables. In fact, you probably wouldn’t have made it in the first place.

Sure, this example uses poor variable names, but if you’ve been developing for a while, you’ve probably noticed that unless there are strict code conventions, many projects’ variable names aren’t much better. And even if you do use good naming conventions, you’re not immune from this. Imagine if the line you were debugging looked like this instead:

my_function MyClass ['item']

Was the intent here to pass a new instance of MyClass (whose constructor was initialized using an array consisting of 1 string) or to pass the item attribute of My_Class? LiveScript takes this “feature” a step further, making commas implicit as well for non-callable arguments (strings, numbers, arrays), making things even more ambiguous. Take a look at the following line of valid LiveScript, and try to figure out who’s calling who with what arguments:

a b c 1 [d 2] 3 e [f 'g' h] 4 i [j 5] k 'l' m

This is great for code golf and maybe riddles, but I definitely don’t want to see this kind of code in my project.

Shall we continue with more examples? How about implicit returns. Automatically returning last-performed operation of a function seems like a great idea, because we can’t be bothered with putting 6 extra characters at the bottom of our function to signify a proper return. Too bad you (or another developer) could miss the subtle returns when modifying the function later.

For example, let’s imagine you have a function with an implicit return whose return value is used by another function. Several months later you notice a bug due to the function not resetting some global setting or a setting in the class it belongs to. Being a busy guy, you delegate this task to another developer. Sure enough, he goes and fixes the bug by setting that global/class setting correctly at the end of the function. Too bad he forgot to check that another function was using this function’s return value. If you’re lucky, the code will break as soon as it runs, developer will notice his error and fix it before submitting the change. If you’re not lucky, the affected logic won’t get triggered during the test (not all tests have 100% coverage), developer will submit broken code and you will pat him on the back for doing a good job.

Even if you’re perfect, and never make mistakes, code is rarely developed in isolation. It’s in your interest to make code easy to understand to other developers, not just yourself. But if you’re like the rest of us, mortals, you will probably break your own code if you have to deal with it several months later. As another example, let’s imagine you have a long function with the following format (assuming implicit returns):

def fun(args):
    ...
    some_var = ...
    ...
    if SOME_GLOBAL_VAR == True:
        if some_var:
            ...
        else:
            ...
    else
        ...

Let’s also imagine that you’re calling it from multiple places, one of which uses its return for doing additional computations. Let’s also imagine that you’ve modified the logic in one of the other places calling this function (that previously didn’t need the return value, and that always sets SOME_GLOBAL_VAR to True before calling fun()) such that it now needs to know if some_var got set or not. “No problem” you decide to yourself, slapping “return some_var” at the end of the outer “if” block, breaking the implicit return that one of the other functions was expecting.

There are countless other examples of implicit logic in languages that seemed like a great idea at first, but with time proved to do more harm than good. Some examples are:

  • JavaScript/Perl functions automatically discarding extra arguments
  • JavaScript/Perl functions automatically setting missing arguments to undefined
  • JavaScript implicitly converting operand types when using + operator
  • JavaScript implicitly converting unrelated types when using ==
  • JavaScript/C++ making brackets optional for single-line conditional statements
  • Switch statements without break in JavaScript/C++ automatically falling through to next case
  • Object attributes defaulting to public in Python
  • JavaScript assuming global scope when var isn’t used

There are very few cases when implicit logic doesn’t cause confusion. A couple that come to mind are tuple packing/unpacking in Python and implicit boolean typecasting in many languages’ if statements without having to say == True. As a rule of thumb, if you’re asking yourself whether you should make something implicit, you probably should not.

To summarize, here are all the reasons why implicit anything is bad:

  • It saves time when writing the code at the expense of time spent debugging it
  • It makes code more ambiguous to other developers as well as yourself in the future
  • In cases when it relies on compiler inferring your intent, it can be inferred incorrectly (or rather your assumptions about how it will be interpreted could be incorrect)
  • It makes the code depend on nearby context, increasing the likelihood that something will break when you add more logic
  • It hides some of the logic from untrained eye, increasing the likelihood that something will break when you add more logic and you won’t notice it
  • It hides some of the logic from untrained eye, increasing the likelihood that something will be lost in translation when refactoring the code, or rewriting it in a different language

Even if you never make mistakes, you probably have other developers on the team. It’s in your interest to make the code clear to them, not just yourself. You want to decrease ambiguity, and implicit logic does the opposite.