Imports, Namespaces, and References

I had a decent idea of how importing and namespaces worked, but it wasn’t until I started setting up unit tests that everything clicked. False assumptions I had about namespaces and imports caused test failures. That forced me to do things the right way, so that I was changing the right objects instead of creating objects in random places. I learned that most things are references, and when you think of variables that way, namespaces and imports make a lot of sense.

Some basics

When you write a module, you start defining things: functions, variables, classes, etc. When you define something, Python creates an object in memory. The variable name you use is how you refer to that object. Everything is a reference behind the scenes.

A namespace is a container for these references. It maps variable names to objects in memory. Any module you write has its own namespace. When you define a variable mylist in your module, you are defining mylist in your module’s namespace. Variables defined in other modules, or I should say in other modules’ namespaces, are not accessible in your module – unless you import them.
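To make the "container of references" idea concrete, here is a small sketch that peeks at a module’s namespace. It uses the standard math module purely as a convenient example, and mylist is just the illustrative name from the paragraph above.

```python
import math

# A module's namespace is exposed as an ordinary dictionary: name -> object.
math_namespace = vars(math)          # the same dict as math.__dict__

mylist = [1, 2, 3]                   # defined in *this* module, not in math

# Attribute access on a module is a lookup in that dictionary.
sqrt_is_same_object = math_namespace["sqrt"] is math.sqrt
# Names defined here never leak into another module's namespace.
math_knows_mylist = "mylist" in math_namespace

print(sqrt_is_same_object)   # True
print(math_knows_mylist)     # False
```

The second check is the important one: defining mylist here does nothing to math, because each module keeps its own mapping.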

You will work with open source libraries, and to unlock their power you need to access their functions and classes, which are in the library’s namespace. You gain access through importing. At a high level, Python imports modules by doing the following:

  1. See if the module name is in sys.modules – this is like a cache of imported modules.
  2. If not, look for the file that matches the module name.
  3. Run the module being imported. This sets up all global variables, including variables for classes & functions.
  4. Put the loaded module into sys.modules.
  5. Set up references so you can access imports – where the fun happens.

I want to explain these in a little more detail as well as how they affect modifying variables inside of an import. I will go through the steps out of order because step 1, loading from the cache, doesn’t mean much except after step 4, putting a module into the cache.
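The five steps can be sketched in a few lines using importlib. This is only a rough approximation, not what Python literally runs: simple_import and the cars_sketch module are names I made up for illustration, and the sketch caches the module just before executing it, which is the order the real machinery uses. It needs Python 3.5 or later.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def simple_import(name):
    """Hypothetical helper: a rough approximation of `import name`."""
    # Step 1: a module that was imported before comes straight from the cache.
    if name in sys.modules:
        return sys.modules[name]

    # Step 2: search sys.path for something that matches the name.
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("No module named {!r}".format(name))
    module = importlib.util.module_from_spec(spec)

    # Step 4: cache the module (done just before running the code).
    sys.modules[name] = module

    # Step 3: run the top-level code inside the module's own namespace.
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway module file (the name "cars_sketch" is made up).
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "cars_sketch.py"), "w") as handle:
    handle.write("car_list = [1]\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

first = simple_import("cars_sketch")    # found, executed, cached
second = simple_import("cars_sketch")   # served from sys.modules

print(first.car_list)    # [1]
print(first is second)   # True
```

Step 5 is whatever the caller does with the return value: assigning it to a name, as first = simple_import(...) does here, is the reference setup.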

Step 2: Finding what to import

Finding the modules is pretty useful and interesting, but does not have a lot to do with references, so I am skipping the details here. Also, the Python docs outline this clearly and succinctly:

https://docs.python.org/3.3/tutorial/modules.html#the-module-search-path

Step 3: Executing Code

After finding the file, Python loads and executes it. To give a concrete example, I have 2 simple modules:

cars.py

car_list = [1]

def myfun():
    return car_list

print("Importing...")

dealer.py

import cars

If you run dealer.py, you see the following:

> python dealer.py
Importing...

So by loading and executing I mean Python runs everything at the top level. That includes setting variables, defining functions, and running any other code, such as the print statement in cars.

It also does this in each module’s own namespace. Defining car_list in cars basically adds a pointer in cars to this new list. It does not affect dealer, which could have its own, completely independent variable named car_list. After the import is done, dealer will also not have direct access to car_list. cars has the reference/pointer to that object in memory, and the only way dealer can access the list is to go through cars: cars.car_list.
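Here is a small sketch of that independence. Instead of real files it builds two stand-in modules in memory with types.ModuleType and exec, so cars_demo and dealer_demo are hypothetical names, and the last assignment only imitates what import cars would set up inside dealer.

```python
import types

# Two stand-in modules built in memory (hypothetical names).
cars = types.ModuleType("cars_demo")
dealer = types.ModuleType("dealer_demo")

# Each exec runs in that module's own namespace.
exec("car_list = [1]", cars.__dict__)
exec("car_list = ['unrelated']", dealer.__dict__)

# Roughly what `import cars` inside dealer sets up: a reference to the module.
dealer.cars = cars

print(cars.car_list)                          # [1]
print(dealer.car_list)                        # ['unrelated']
print(dealer.cars.car_list is cars.car_list)  # True
```

The two car_list names never collide, and the only path from dealer to the list owned by cars goes through the cars reference.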

Step 4 & Step 1

Caching goes hand in hand with loading the module. Python stores the module in a module object, which it puts into the sys.modules dictionary. (Strictly speaking, Python registers the module there just before running its code, but that ordering rarely matters outside of circular imports.) The key is usually the name of the module, with one exception: if the module is the one you are directly running, its key is '__main__'. (Aside: this is why we use if __name__ == '__main__'.)

What’s cool is that you can access modules through sys.modules. So back to the simple module. After importing it, you can access the list as cars.car_list. You can also access it through sys.modules:

>>> import sys
>>> sys.modules['cars'].car_list
[1]

The caching in sys.modules has many effects. Importing a cached module means it doesn’t get executed again. Reimporting something is cheap! It also means that all the classes, functions, and variables defined at the top level are only defined once, and statements only execute once. The print statement in cars won’t print if it is imported again, or imported anywhere else in the same process.
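A quick way to convince yourself of this is to import the same module twice and count how often its top-level print runs. The sketch below writes a throwaway module to a temporary directory (cars_cache_demo is a made-up name) and captures standard output so the count can be checked.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Write a throwaway module that announces itself when executed.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "cars_cache_demo.py"), "w") as handle:
    handle.write('car_list = [1]\nprint("Importing...")\n')
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Import it twice while capturing whatever it prints.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    first = importlib.import_module("cars_cache_demo")
    second = importlib.import_module("cars_cache_demo")

times_executed = captured.getvalue().count("Importing...")

print(times_executed)                              # 1
print(first is second)                             # True
print(sys.modules["cars_cache_demo"] is first)     # True
```

The second import is just a dictionary lookup, which is why both names end up pointing at the very same module object.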

Step 5: Reference Setup

This is where the most interesting stuff happens. Depending on how you import something, references are set up in your module’s namespace to let you access what you imported. There are a few variations, and I have some code that shows what Python effectively does. Remember, the modules are already loaded into sys.modules at this point.

  1. import cars

    cars = sys.modules['cars']

  2. import cars as vehicles

    vehicles = sys.modules['cars']

  3. from cars import car_list

    car_list = sys.modules['cars'].car_list

No matter how you import something, code from a module always runs in its own namespace. So when cars looks up a variable, it looks it up in cars’s own namespace, not in the namespace of whoever imported it.
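The three import forms can be checked with is. To keep the sketch self-contained, it registers an in-memory stand-in for cars.py directly in sys.modules, so the import statements resolve without any file on disk. The name cars_refs_demo is hypothetical.

```python
import sys
import types

# Build a stand-in for cars.py in memory and put it in the module cache.
source = "car_list = [1]\ndef myfun():\n    return car_list\n"
stand_in = types.ModuleType("cars_refs_demo")
exec(source, stand_in.__dict__)
sys.modules["cars_refs_demo"] = stand_in

# The three import forms from the list above.
import cars_refs_demo
import cars_refs_demo as vehicles
from cars_refs_demo import car_list

# All module names refer to the one cached module object.
same_module = cars_refs_demo is vehicles is sys.modules["cars_refs_demo"]
# The from-import created a second name for the same list.
same_list = car_list is cars_refs_demo.car_list

# Rebinding the local name does not change what the module itself looks up.
car_list = ["rebound locally"]
module_view = cars_refs_demo.myfun()

print(same_module)   # True
print(same_list)     # True
print(module_view)   # [1]
```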

Mutability

Imported variables behave the same way as regular variables, and that behavior depends on mutability.

Remember, when Python creates an object, it puts it somewhere in memory. For mutable objects (dict, list, most other objects), when you make a change to that object, you change the value in memory. For immutable objects (int, float, str, tuple), you can never change the value in memory. For immutable objects you can reassign the variable to point to a new value, but you cannot change the original value in memory.

To illustrate this I can use the id function, which returns an object’s identity (in CPython, its address in memory). I modify the value of an int and a list.

Immutable:

>>> a = 1
>>> id(a)
140619792058920
>>> a += 1
>>> id(a)
140619792058896

Mutable:

>>> b = [1]
>>> id(b)
4444200832
>>> b += [2]
>>> id(b)
4444200832

When I change the value of a, the immutable integer, the id changes. But when I do the same for a list, the id remains the same.

There is a sneaky catch though! Many times when you assign a variable, even a mutable one, you change the reference.

>>> b = [1]
>>> id(b)
4320637944
>>> b = b + [2]
>>> id(b)
4320647936

Behind the scenes, Python is creating a new list and changing where the b reference is pointing. The difference between this case and the += case above can be subtle. I wouldn’t worry about it much, since it has never affected me, but it’s good to be aware of.
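Where the difference does show up is when a second name points at the same list, which is exactly the situation a from-import creates. A minimal sketch, with alias names of my own choosing:

```python
# In-place update: both names keep pointing at the same, modified list.
b = [1]
alias_b = b
b += [2]                      # extends the existing list object
shared_after_iadd = alias_b is b

# Rebinding: the new name gets a new list, the alias keeps the old one.
c = [1]
alias_c = c
c = c + [2]                   # builds a new list, then rebinds c
shared_after_add = alias_c is c

print(alias_b, shared_after_iadd)   # [1, 2] True
print(alias_c, shared_after_add)    # [1] False
```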

Modifying Imports

So how can I change values in imported modules? And how does mutability affect imports?

The way you access, and sometimes change, variables in imported modules may or may not have an effect on the module. This is where unit tests taught me a lot, since my tests imported modules and tried to set variables in order to set up certain test cases. I learned it all comes down to references. There are a few key rules.

Modules execute their code in their own namespace

When a function in cars tries to use car_list, it looks up car_list in cars’s namespace. That means the module is only affected if:

  • you update the object data behind that reference, or
  • you change where the reference points in the module’s namespace

There are some simple ways to update what’s behind the reference – which work no matter how you import. Let’s run this code:

tester_1.py

import cars
from cars import car_list, myfun

# update data through the cars module
cars.car_list.append(2)
print("updated through cars: {}".format(cars.myfun()))

# update through my namespace
car_list.append(3)
print("updated imported var: {}".format(myfun()))

This outputs:

> python tester_1.py
Importing...
updated through cars: [1, 2]
updated imported var: [1, 2, 3]

It’s trickier to change cars’s reference, which does depend on how you import:

tester_2.py

import cars
from cars import car_list, myfun

# change reference through the cars module
cars.car_list = [2]
print("changed cars reference    : {}".format(cars.myfun()))

# the imported variable still points to the original value
print("test original import      : {}".format(car_list))

# changing the imported variable will not affect the "real" value
# so this should print the same as the first call
car_list = [3]
print("changed imported reference: {}".format(myfun()))

This outputs:

> python tester_2.py
Importing...
changed cars reference    : [2]
test original import      : [1]
changed imported reference: [2]

In both cases, we are creating a new list, and changing what the variables are referencing. But remember, myfun is in cars, and gets variables from the cars namespace. In the first case we are going to the cars namespace, and setting its car_list to this new list. When myfun runs, it gets the reference we just updated and returns our new list.

When we run from cars import car_list, we create a car_list in the tester_2 namespace which points to the same list cars has. When cars’s reference is set to [2], tester_2’s reference stays unchanged, which is why it keeps the value [1]. When we update car_list in tester_2’s namespace, it only changes where tester_2 is pointing. It does not change where cars points (and does not change the value in memory where cars points), so it has no effect on how the cars module runs. So when we check myfun, it returns [2].
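In a unit test, one common way to change the module’s own reference, and get it restored afterwards, is unittest.mock.patch.object from the standard library. I am not claiming this is how my original tests were written; it is just a sketch of the rule above, using an in-memory stand-in module with the hypothetical name cars_patch_demo.

```python
import types
from unittest import mock

# In-memory stand-in for cars.py (hypothetical name).
cars = types.ModuleType("cars_patch_demo")
exec("car_list = [1]\ndef myfun():\n    return car_list\n", cars.__dict__)

# Patch the name *inside the module's namespace*, where myfun looks it up.
with mock.patch.object(cars, "car_list", [2]):
    inside_patch = cars.myfun()

# Leaving the with-block puts the original reference back.
after_patch = cars.myfun()

print(inside_patch)   # [2]
print(after_patch)    # [1]
```

Because the patch targets cars’s namespace rather than a name copied into the test module, the function under test actually sees the replacement.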

Final thoughts

When you run into a problem where you set a variable in one module, then access it through another, knowing what’s going on behind the scenes helps make sense of it all. With imports, it’s easy to figure out what’s going on once you realize that variables are pointers to objects in memory. If your code is not doing what you expect when setting variables, then something is pointing to the wrong object. It’s really not bad once you realize it’s all just references.