Python modules, how do they work?
TRIBON, 2011/10/16 21:15
As we start writing larger Python programs, the number of names of our variables, functions, classes, etc. grows so large that it becomes necessary to organise them into categories or subsets, commonly called namespaces. The following language structures offer such facilities: modules, packages, and classes.
In this article we will not talk about classes, as they deserve a separate discussion. We will, however, try to describe the functionality offered by Python modules and packages, paying special attention to the module importing mechanism. I hope this will help Python programmers better understand how to use such useful packages as wxPython or NumPy.
This article was inspired by questions I was asked by users of the shipbuilding system Tribon M3, made by AVEVA Solutions Ltd, which uses an embedded Python 2.3 interpreter with more than 600 specialised functions as its customisation tool. The functionality discussed here is, however, a feature of the Python language in general, valid not only for Python 2.3 but also for the newer releases (2.4 and 2.5).
2 Modules
2.1 What are they?
In short, modules are just collections of definitions of variables, functions, and classes. They can also contain directly executable code, but we will discuss this feature later. In general, we can distinguish three types of modules:
- Built-in modules
They are always accessible, as they are built into the Python interpreter in use. Examples: sys, math, time.
The names of the built-in modules are kept in the tuple sys.builtin_module_names. The following code prints out the names of all built-in modules available in the running Python interpreter:
import sys
for name in sys.builtin_module_names:
    print name
- Standard modules written in Python
Here we need either the module's source code (written in the Python language) or at least a special binary version (file extension .pyc) containing the module's precompiled bytecode. To be available to the program, this file must be located in one of the folders listed in sys.path. Examples: os, re, csv, types.
- Modules written in other programming languages (e.g. C), and compiled into a DLL
Using a fully compiled programming language can improve performance compared with a pure-Python solution, and makes even very low-level API calls available. In order to make the DLL available as a module to the Python interpreter, the code must follow certain conventions, described in detail in the manual 'Extending and Embedding the Python Interpreter' written by Guido van Rossum, the author of the Python language.
By convention, the compiled DLL file has the extension .pyd, not .dll, and should be located in one of the folders listed in sys.path, like any other standard Python module. Examples: _tkinter, _sre.
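The three kinds of modules can also be told apart at run time. The sketch below is written for a modern Python 3 interpreter (it relies on importlib.machinery, which the Python 2.x releases discussed here do not have), and the helper module_kind() is a hypothetical name. Which category a module such as math falls into can differ between platforms and builds, so only sys and csv have a predictable answer.

```python
import importlib
import importlib.machinery
import sys

def module_kind(name):
    """Hypothetical helper: classify a module as one of the three types above."""
    if name in sys.builtin_module_names:
        return "built-in"
    module = importlib.import_module(name)
    filename = getattr(module, "__file__", None) or ""
    if filename.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        # e.g. '.pyd' on Windows, '.so' on Unix-like systems
        return "compiled extension"
    python_suffixes = (importlib.machinery.SOURCE_SUFFIXES
                       + importlib.machinery.BYTECODE_SUFFIXES)
    if filename.endswith(tuple(python_suffixes)):
        return "written in Python"
    return "other (e.g. frozen)"

for name in ("sys", "csv", "math"):
    print(name, "->", module_kind(name))
```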
2.2 How are they used?
In order to use a definition from any module, the module must be imported first. This is done through the import statement, which can use any of the following syntax variants:
· import moduleName
· import moduleName as alternateName
· from moduleName import *
· from moduleName import id1, id2, …
· from moduleName import id1 as altName1, id2 as altName2, …
The general workflow of a module import is as follows:
1. First, the interpreter looks up the given module name in the dictionary sys.modules, which maps the names of the modules loaded and initialised so far to the corresponding module objects. If the module is found, the system proceeds directly to step 3, skipping step 2.
2. If the module has not been loaded yet, the system first searches the list of built-in modules and, if unsuccessful, continues by searching for a matching file in the folders listed in sys.path. If the module is found, it is loaded and initialised. Finally, if no error occurred during loading or initialisation, it is registered in the sys.modules dictionary. If the module is not found, an ImportError exception is raised.
Note that once a module has been registered in the sys.modules dictionary, subsequent import statements for the same module WILL NOT initialise it again!
3. At this stage, the activities performed by the Python interpreter depend on the syntax variant of the import statement that has been used.
A. import moduleName
The name of the module is bound in the local namespace. All identifiers defined in the module become available to the program by using the names qualified with the module's name prefix (see section 2.7).
import myModule
res = myModule.doIt(arg)
B. import moduleName as alternateName
The alternate name of the module is bound in the local namespace. All identifiers defined in the module become available to the program by using the names qualified with the module's alternate name prefix (see section 2.7).
#Define the module 'utils' by importing either
#winUtils or linuxUtils depending on the platform
import sys
if sys.platform == 'win32':
    import winUtils as utils
else:
    import linuxUtils as utils
#platform-independent module access
user = utils.getUser()
C. from moduleName import *
Here, the name of the module is not bound in the local namespace. Instead, the system searches the namespace of the loaded module and binds the names of all its PUBLIC identifiers in the local namespace.
How does Python determine which identifiers in a module are PUBLIC? You will find the answer in section 2.3.
D. from moduleName import id1, id2, …
from moduleName import id1 as altName1, id2 as altName2, …
Here, the name of the module is not bound in the local namespace either. Instead, the system searches the namespace of the loaded module and binds the LISTED identifiers (or their alternate names) in the local namespace.
Of course, if some of the listed names are not found, an ImportError exception is raised.
Note that when using syntax C or D, there is a danger of rebinding existing names to the imported items, effectively losing access to the objects these names previously referred to:
doIt = True
from myModule import *
print "doIt =", doIt
The above code simply assumes that the identifier doIt still refers to the Boolean variable defined in the first line. We are in trouble, however, if the module myModule defines a public identifier with the same name. Then our name doIt is rebound (overwritten) to the object imported from myModule. When we later try to print out the original variable, we find that doIt is no longer a Boolean variable but, e.g., a function or a string, and we have effectively lost access to the original variable (it might even have been garbage-collected!).
This would not happen if we used a different form of the import statement:
doIt = True
import myModule
print "doIt =", doIt
Here, the identifier doIt still refers to the Boolean variable defined in the first line: doIt and myModule.doIt are simply two different names, referring to two different objects!
Of course, a good naming convention for the identifiers in our modules would also minimise the risk of such name conflicts.
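The name clash can be reproduced without creating any files. In this Python 3 sketch, a hypothetical module named clash_demo is built in memory with types.ModuleType and registered in sys.modules, so the import statements find it in step 1 of the workflow described earlier. The last part shows the ImportError raised by syntax D when a listed name does not exist in the module.

```python
import sys
import types

# Hypothetical module 'clash_demo', built in memory and registered in
# sys.modules so that the import statements below find it without a file.
clash_demo_module = types.ModuleType("clash_demo")
exec("def doIt():\n    return 'module function'\n", clash_demo_module.__dict__)
sys.modules["clash_demo"] = clash_demo_module

doIt = True
import clash_demo            # syntax A: binds only the name 'clash_demo'
after_import = doIt          # still True: doIt and clash_demo.doIt are different names

from clash_demo import *     # syntax C: rebinds the public name 'doIt'
after_star = doIt            # now the module's function; the Boolean is lost

try:
    from clash_demo import noSuchName   # syntax D with a name the module lacks
except ImportError as exc:
    missing = exc                        # keep the exception for inspection

print(after_import)          # True
print(after_star())          # module function
```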
2.3 Public identifiers
In the previous section we mentioned that the statement
from myModule import *
imports all public identifiers from the given module. How does Python recognise which names are public and which are not?
First, Python checks whether the imported module defines the global variable __all__. If it does, this variable should be a sequence of strings naming the identifiers that are considered public. This feature of the language makes it possible to prevent certain names, especially those private to the module, from being imported.
Imagine a large module containing many functions, but only one main function that is called from other modules. The other functions are just auxiliary functions, called only internally within the module. This is an ideal situation for the __all__ variable! Example:
The module myModule.py:
def fun1(): #Private function
    pass #body omitted
def main(): #Main (public) function
    pass #body omitted
__all__ = ['main']
Now, we can try to execute the following commands:
from myModule import *
res = main()
No problem here … Let's try something else:
res = fun1()
Oops! A NameError exception is raised! What happened?
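Even though both fun1 and main exist in the imported module, only main has been imported. The exception is raised because the name 'fun1' does not appear in the __all__ list, so the import statement ignored this function. Note that __all__ affects only the from myModule import * form: fun1 can still be reached as myModule.fun1 after import myModule, or imported explicitly with from myModule import fun1.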
If the __all__ variable does not exist in the module, all names in the module's namespace that do not start with an underscore ('_') are considered public.
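The following Python 3 sketch checks both rules. A hypothetical module all_demo, mirroring myModule.py above plus one underscore-prefixed helper, is built in memory and registered in sys.modules; each from all_demo import * is executed with exec() in a fresh dictionary, so the names it binds can be listed.

```python
import sys
import types

# Hypothetical module mirroring myModule.py, plus an underscore-prefixed helper.
SOURCE = '''
def fun1():            # auxiliary (private) function
    return "fun1"

def main():            # main (public) function
    return "main uses " + fun1()

def _helper():         # leading underscore: private by convention
    return "helper"

__all__ = ['main']
'''
all_demo = types.ModuleType("all_demo")
exec(SOURCE, all_demo.__dict__)
sys.modules["all_demo"] = all_demo   # lets import statements find it

def star_names():
    """Return the names that 'from all_demo import *' binds in a fresh namespace."""
    namespace = {}
    exec("from all_demo import *", namespace)
    return sorted(name for name in namespace if name != "__builtins__")

with_all = star_names()
print(with_all)              # ['main']

del all_demo.__all__         # fall back to the underscore rule
without_all = star_names()
print(without_all)           # ['fun1', 'main']

from all_demo import fun1    # __all__ never blocks an explicit import
print(fun1())                # fun1
```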
2.4 Module initialisation
If Python needs to initialise a module, it does so either by running a special module initialisation function (for modules not written in Python) or by executing the module's body (for modules written in Python). In the latter case, the module's code is executed from top to bottom: the def and class statements create the functions and classes and bind their names, and any directly executable code is run.
As already said in section 2.2, a module is initialised only once: when it is not yet found in the sys.modules dictionary. This has an important implication for programs importing a module that contains directly executable code. Since this code is executed only during the module's initialisation, it is executed only ONCE, no matter how many times the module is imported in our program!
Therefore, it is bad design practice to put directly executable code in a module that is imported several times in the program: we simply cannot assume that this code will be executed on every import! Instead, we should put this code inside a function (e.g. run()) and use the following pattern, calling the run() function explicitly:
import myModule
myModule.run()
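To see the one-time initialisation in action, the Python 3 sketch below writes a hypothetical module init_once_demo.py to a temporary folder, adds the folder to sys.path, and imports the module twice. Its body appends to a list, so the list shows how often each piece of code really ran. The call to importlib.invalidate_caches() is only needed because the file is created while the program is already running.

```python
import importlib
import os
import sys
import tempfile

SOURCE = (
    "log = []\n"
    "log.append('body executed')      # directly executable code\n"
    "\n"
    "def run():\n"
    "    log.append('run() called')   # the same kind of work, wrapped in a function\n"
)

folder = tempfile.mkdtemp()
with open(os.path.join(folder, "init_once_demo.py"), "w") as f:
    f.write(SOURCE)
sys.path.insert(0, folder)
importlib.invalidate_caches()   # the file was created after the program started

import init_once_demo           # first import: module found, loaded, body executed
import init_once_demo           # second import: taken from sys.modules, body NOT executed
print(init_once_demo.log)       # ['body executed']

init_once_demo.run()            # the explicit call works on every use
init_once_demo.run()
print(init_once_demo.log)       # ['body executed', 'run() called', 'run() called']
```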
In certain situations, however, we need to write a module that is sometimes imported, and sometimes just run directly as the main program. This is especially useful during the development of the module, as we can run the module to execute some unit tests.
Following the previous advice, we would hide the test code (in general, the directly executable code) inside a function (e.g. runTest()). This function should be executed if the module is run, but not if the module is imported. To achieve this, we need to execute the function conditionally, detecting whether the module is run or imported. We can do this by adding the following code at the end of the module:
if __name__ == '__main__':
    runTest()
The special string variable __name__ is assigned the module's name if the module is imported, and the value '__main__' if the module is run. Therefore, the above if statement executes the runTest() function only when the module is run, not when it is imported.
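This behaviour can be checked with real interpreter processes. The Python 3 sketch below writes a hypothetical module name_demo.py to a temporary folder, then runs it once as the main program and once imports it from another process, capturing what each run prints.

```python
import os
import subprocess
import sys
import tempfile

SOURCE = (
    "def runTest():\n"
    "    print('running tests')\n"
    "\n"
    "print('__name__ is ' + __name__)\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    runTest()\n"
)

folder = tempfile.mkdtemp()
path = os.path.join(folder, "name_demo.py")      # hypothetical module
with open(path, "w") as f:
    f.write(SOURCE)

# 1. Run the file directly as the main program.
as_script = subprocess.check_output([sys.executable, path], universal_newlines=True)

# 2. Import it from another interpreter process.
importer = "import sys; sys.path.insert(0, %r); import name_demo" % folder
as_import = subprocess.check_output([sys.executable, "-c", importer],
                                    universal_newlines=True)

print(as_script)    # '__name__ is __main__' followed by 'running tests'
print(as_import)    # '__name__ is name_demo' only
```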
To overcome the limitation of the one-time initialisation of a module, the Python language offers the built-in reload() function, which takes the module object (not its name) as its argument:
reload(myModule)
This causes the module to be reinitialised, and returns the module object as a result.
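A short sketch of reload() for Python 3, where the function lives in the importlib module (importlib.reload()). The hypothetical module reinit_demo counts how often its body has run. Since reload() re-executes the body in the same module namespace, the previous value of count is still visible through globals(), and the function hands back the same module object.

```python
import importlib
import os
import sys
import tempfile

folder = tempfile.mkdtemp()
with open(os.path.join(folder, "reinit_demo.py"), "w") as f:
    # Hypothetical module body: count how many times it has been executed.
    # On reload the module's dictionary is kept, so globals() still holds
    # the previous value of 'count'.
    f.write("count = globals().get('count', 0) + 1\n")

sys.path.insert(0, folder)
importlib.invalidate_caches()

import reinit_demo
first = reinit_demo.count                     # 1: body run by the first import
import reinit_demo
second = reinit_demo.count                    # 1: second import used sys.modules
same = importlib.reload(reinit_demo)          # Python 2.x: reload(reinit_demo)
third = reinit_demo.count                     # 2: body executed again
print(first, second, third, same is reinit_demo)   # 1 1 2 True
```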
2.5 .py, .pyc, and .pyo files
If a module is written in Python, its source code is stored in a file with the '.py' extension (e.g. 'myModule.py' for the module 'myModule').
Whenever this file is successfully compiled, an attempt is made to write the compiled version to the file with the same base name, and the '.pyc' extension (e.g. 'myModule.pyc'). It is not an error if this attempt fails; if for any reason the file is not written completely, the resulting '.pyc' file will be recognized as invalid and thus ignored later.
The contents of the '.pyc' files are platform independent, so a Python module directory can be shared by machines of different architectures.
If the Python interpreter is invoked with the -O flag (or -OO), Python performs some optimisation on the compiled code and, instead of the '.pyc' file, creates a file with the extension '.pyo'. It is used in the same way as the '.pyc' file; the only difference is that '.pyc' files are used when no optimisation is requested, whereas '.pyo' files are used when the Python interpreter works in optimising mode.
The compiled file ('.pyc' or '.pyo') stores the modification date of the corresponding '.py' file. Python loads the compiled version of the module (without recompiling the '.py' file):
- If the '.py' file does not exist
- If the '.py' file does exist, and its modification date matches the date stored in the compiled file
Otherwise, Python ignores the compiled file, compiles the source file again, and tries to write an updated compiled file. This simplifies the development process, as you don't have to worry about explicitly updating your compiled files: Python will do it for you … usually.
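The stored modification date can be inspected directly. This sketch assumes Python 3.7 or newer, where a '.pyc' file starts with a 16-byte header (magic number, flags, source modification time, source size; see PEP 552) and is written to a __pycache__ subfolder; the Python 2.x files described here have a shorter header holding only the magic number and the modification time. stamp_demo.py is a hypothetical throw-away module, and the timestamp-based format is requested explicitly so that the layout below applies.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, "stamp_demo.py")          # hypothetical throw-away module
with open(src, "w") as f:
    f.write("X = 1\n")

# Compile explicitly, asking for the timestamp-based .pyc format.
pyc = py_compile.compile(
    src, doraise=True,
    invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)

with open(pyc, "rb") as f:
    header = f.read(16)

magic = header[:4]
flags, mtime, size = struct.unpack("<III", header[4:16])   # little-endian 32-bit fields
st = os.stat(src)

print(magic == importlib.util.MAGIC_NUMBER)          # True
print(flags)                                         # 0
print(mtime == (int(st.st_mtime) & 0xFFFFFFFF))      # True: source date stored in the header
print(size == (st.st_size & 0xFFFFFFFF))             # True
```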
When developing a Python program, you may sometimes notice that Python 'does not see' the changes you have made to your module until you quit and restart your development environment. This can be explained as follows:
1. The first time you run your program (e.g. testing some part of it), Python executes it and reaches the import statement referring to your module. Since this module has not been imported before (in the current session of the Python interpreter), it is not listed in the sys.modules dictionary. Therefore, Python loads and initialises the module, creating or updating its compiled version ('.pyc' file), and stores the module object in the sys.modules dictionary.
2. Then you change the module's source code, save the file, and run your test again. Oops! Something is wrong! The test run looks the same as before, as if you had not made any changes to your module.
3. This happens because Python executes the program as before and reaches the import statement referring to the changed module. However, unlike in step 1, the module IS now already listed in the sys.modules dictionary. Therefore, Python makes no attempt to load the module again; it just takes the existing module object from the sys.modules dictionary. Unfortunately, this is not the latest version of the module: it does not contain the changes made in step 2.
Fortunately, most development environments provide a way to request the reloading of an already imported module (some kind of 'Reload Modules' button). If not, you can always put a temporary reload(moduleName) statement in your program, forcing the module to be reloaded immediately after the import.
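The three steps above, and the reload() remedy, can be replayed in a single Python 3 process. The hypothetical module edit_demo.py is saved twice with different contents; sys.dont_write_bytecode is set only to keep '.pyc' files out of this experiment.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True   # keep the demo independent of .pyc timestamps

folder = tempfile.mkdtemp()
path = os.path.join(folder, "edit_demo.py")   # hypothetical module under development

def save(version):
    """Simulate saving the module's source from an editor."""
    with open(path, "w") as f:
        f.write("VERSION = %r\n" % version)

save("first")
sys.path.insert(0, folder)
importlib.invalidate_caches()

import edit_demo                   # step 1: loaded from the file
v1 = edit_demo.VERSION             # 'first'

save("second, edited")             # step 2: change and save the source
import edit_demo                   # step 3: found in sys.modules, file not read again
v2 = edit_demo.VERSION             # still 'first'

edit_demo = importlib.reload(edit_demo)   # Python 2.x: reload(edit_demo)
v3 = edit_demo.VERSION             # 'second, edited'
print(v1, "|", v2, "|", v3)        # first | first | second, edited
```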
2.6 sys.path variable
The variable sys.path defines the list of folders searched for imported module files. It is built during the interpreter's initialisation, and can also be customised at run time. It is important to know how this list is built, and how we can add our own paths to it, to let our scripts find their modules.
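First of all, this variable is initialised from the environment variable PYTHONPATH: the folders it names are placed in sys.path right after the folder of the running script, and are followed by installation-dependent default folders. Python installations and various other software systems using the Python interpreter (e.g. Tribon M3) can define or update this variable, setting it to a list of paths separated by semicolons (on Windows; Unix-like systems use colons).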
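A quick way to confirm that PYTHONPATH feeds sys.path is to start a child interpreter with a modified environment. In this Python 3 sketch a temporary folder stands in for a folder of private modules, and os.pathsep supplies the platform's separator (';' on Windows, ':' elsewhere).

```python
import os
import subprocess
import sys
import tempfile

extra = os.path.abspath(tempfile.mkdtemp())   # stands in for a folder of private modules

env = dict(os.environ)
parts = [extra] + ([env["PYTHONPATH"]] if env.get("PYTHONPATH") else [])
env["PYTHONPATH"] = os.pathsep.join(parts)    # prepend our folder to any existing value

out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    env=env, universal_newlines=True)
# Normalise the child's entries so the comparison ignores case/format differences.
child_path = [os.path.normcase(os.path.abspath(p)) for p in out.splitlines() if p]

print(os.path.normcase(extra) in child_path)   # True
```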
Then the special site module is imported. Note that there is no need to issue the statement import site: the Python interpreter does this for you at startup. This module is a standard Python module, which can be customised by the user to add specific changes to the environment, e.g. adding new folders to the sys.path list. By default (if you install a standalone Python interpreter), it adds some standard folders, such as '\Python23\lib\site-packages'.
Additionally, the site module searches these site-packages folders for *.pth files (path configuration files). All such files are read, and the paths listed in them (provided the folders exist) are automatically added to the sys.path list. Such files are used by some Python packages, e.g. wxPython, to define the location of the package's modules.
Further system customisation can be placed in an optional sitecustomize module, which the site module attempts to import. This way, we can leave the site module unchanged and put all the customisation in the sitecustomize module.
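The .pth mechanism can be exercised directly with site.addsitedir(), the function the site module uses to add a site folder and process the *.pth files found in it. In this sketch a temporary folder plays the role of site-packages, and both the file demo.pth and the folder it names are hypothetical.

```python
import os
import site
import sys
import tempfile

site_dir = os.path.abspath(tempfile.mkdtemp())        # plays the role of site-packages
package_dir = os.path.join(site_dir, "demo_modules")  # hypothetical folder named in the .pth file
os.mkdir(package_dir)                                 # .pth entries are added only if they exist

with open(os.path.join(site_dir, "demo.pth"), "w") as f:
    f.write("# lines starting with '#' are comments\n")
    f.write("demo_modules\n")                         # relative to the folder of the .pth file

site.addsitedir(site_dir)          # adds site_dir, then processes every *.pth file in it
print(site_dir in sys.path)        # True
print(package_dir in sys.path)     # True
```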
Finally, the module search path (sys.path) can be customised at run time. Example:
import sys
path = 'E:\\PRIVATE\\MODULES'
if path not in sys.path:
    sys.path.append(path)
import my_test_module
where the file my_test_module.py is located in the folder E:\PRIVATE\MODULES.
In the above example, the user-defined folder is placed at the end of the sys.path list. If you prefer to place it at the beginning of the list, just replace the statement
sys.path.append(path)
with
sys.path.insert(0, path)
Why is this important? Imagine that the module my_test_module is located not only in the folder E:\PRIVATE\MODULES, but also in some other folder listed in sys.path. It might be the same module (a copy), another (maybe older!) version of the same module, or even a completely unrelated module that only coincidentally has the same name as ours. Whatever the reason for this duplicate, the rule is simple:
The folders in the sys.path list are searched in the order in which they appear in the list, and the first matching file found is used.
So, without any warning, you might import a different module than the one you wanted to import …
Of course, a good naming convention for modules can minimise the risk of such ambiguities.
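The "first match wins" rule, combined with append() versus insert(0, …), can be demonstrated with two temporary folders that both contain a module named shadow_demo (a hypothetical name). The sketch assumes Python 3.

```python
import importlib
import os
import sys
import tempfile

def make_folder(label):
    """Create a temporary folder holding a module 'shadow_demo' that reports its origin."""
    folder = tempfile.mkdtemp()
    with open(os.path.join(folder, "shadow_demo.py"), "w") as f:
        f.write("ORIGIN = %r\n" % label)
    return folder

older = make_folder("older copy")
newer = make_folder("newer copy")

sys.path.append(older)        # searched last
sys.path.insert(0, newer)     # searched first
importlib.invalidate_caches()

import shadow_demo
print(shadow_demo.ORIGIN)     # newer copy
```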
2.7 Accessing the imported data
As discussed in section 2.2, the way you access identifiers imported from a module depends on how they were imported. The performance of your program also depends on the namespace in which the imported identifier is bound.
Let's analyse the following two import statements:
1. import math -> math.sin(…)
2. from math import sin -> sin(…)
In the first case, the module name itself is bound in the current namespace, while the name 'sin' is bound in the module's namespace. Therefore, we need to use the qualified name math.sin here.
In the second case, the identifier sin itself is bound in the current namespace, which allows using this name directly. You cannot use the qualified name math.sin here, because the module's name (math) does not exist in the current namespace.
Summing up, the from version of the import statement lets you write shorter identifiers, possibly also improving performance, but it also clutters the current namespace with new identifiers. This negative effect is strongest for the statement
from moduleName import *
as it can import really many names into the current namespace.
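Running each form of the import statement in a fresh dictionary shows exactly which names it binds. In this Python 3 sketch exec() is used only to give every statement its own namespace; bound() is a hypothetical helper.

```python
ns_module = {}
exec("import math", ns_module)
ns_name = {}
exec("from math import sin", ns_name)
ns_star = {}
exec("from math import *", ns_star)

def bound(ns):
    """Names bound by the statement, ignoring the __builtins__ entry exec() adds."""
    return sorted(k for k in ns if k != "__builtins__")

print(bound(ns_module))            # ['math']
print(bound(ns_name))              # ['sin']
print(len(bound(ns_star)) > 20)    # True: every public name of math
print("math" in ns_star)           # False: the module name itself is not bound
```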
2.8 Namespaces and the name resolution
In order to understand how Python resolves names (qualified or not), translating them into the objects (variables, functions, etc.) they refer to, we need to be aware of the namespaces available for searching. In general, there are three namespaces, searched in the order given below (inside nested functions, the namespaces of the enclosing functions are searched between the local and the global one; see section 2.10):
- Local namespace
- Global (module's) namespace
- Built-in namespace
The local namespace contains all names available in the current scope (e.g. local variables in a function). The global or module namespace contains all names available in the current module (the main program is also considered a module here); here you will find all 'global' variables, functions, classes, etc. The last one, the built-in namespace, contains the names defined in the module __builtin__, which is always accessible. It contains names such as:
- names of exceptions (e.g. ArithmeticError, KeyError, ValueError, etc.)
- names of built-in functions (e.g. ord, abs, int, str, range, etc.)
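A minimal Python 3 sketch of this search order; there, the built-in namespace is reached through the builtins module, which is called __builtin__ in Python 2. The function and variable names are illustrative only.

```python
import builtins   # the built-in namespace (named __builtin__ in Python 2)

scope = "global"                     # lives in the global (module) namespace

def lookup():
    scope = "local"                  # lives in the local namespace of lookup()
    return scope                     # found in the first namespace searched

def lookup_global():
    return scope                     # no local 'scope', so the global one is used

def lookup_builtin():
    return len                       # neither local nor global: built-in namespace

print(lookup(), lookup_global())           # local global
print(lookup_builtin() is builtins.len)    # True
```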
Understanding the order, in which the namespaces are searched, when resolving a name, may help to write more efficient programs. Example:
def fun(name):
    codeList = []
    for c in name:
        codeList.append(ord(c))
    return codeList
The above function produces a list of ASCII codes of the characters in the string name. If this is a really long string, it may be worthwhile to optimise the for loop inside this function. Let's consider here the names that are resolved within the loop:
codeList.append is a qualified name, which requires two searches. First, Python needs to find codeList, which fortunately is defined in the local namespace. Then it finds that codeList is a list, and searches for the identifier append in the list object's definition. This two-level search has to be repeated in every iteration of the for loop. We can reduce the overhead by defining a local alias to the bound method codeList.append and using it inside the loop.
The next candidate for optimisation is the ord function. It is a built-in function, so Python finds it only after searching the local and global namespaces in vain. By defining a local alias to this function, we let Python find it in the first namespace searched: the local one.
Summing up, the optimised code looks as follows:
def fun(name):
    codeList = []
    locAppend = codeList.append #Note: no parentheses!
    locOrd = ord #Again: no parentheses!
    for c in name:
        locAppend(locOrd(c))
    return codeList
Of course, this is not yet the fastest version of our function. There is still some room for optimisation, but it goes beyond our topic of namespaces and name resolution. Therefore, I leave finding better code as an exercise for the reader.
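To measure whether the aliases pay off on your interpreter, both versions can be timed with the standard timeit module. The function names plain and aliased are hypothetical, and the numbers depend on the machine and interpreter version (newer CPython releases may speed up global and attribute lookups by themselves, so the gain can be small), which is why no timings are quoted here.

```python
import timeit

def plain(name):
    codeList = []
    for c in name:
        codeList.append(ord(c))      # attribute and built-in lookups in every iteration
    return codeList

def aliased(name):
    codeList = []
    locAppend = codeList.append      # bound method looked up once
    locOrd = ord                     # built-in function looked up once
    for c in name:
        locAppend(locOrd(c))
    return codeList

text = "Tribon M3 " * 10000          # a long input string
assert plain(text) == aliased(text)  # both versions compute the same list

for func in (plain, aliased):
    seconds = timeit.timeit(lambda: func(text), number=20)
    print("%-8s %.4f s" % (func.__name__, seconds))
```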
Once we understand the basic principles of name resolution, it becomes clear how Python deals with qualified names. For example, the identifier csv.DictWriter.writerow refers to the method writerow(…) of the class DictWriter, defined in the module csv. Let's analyse the following code:
def fun():
    import csv
    method = csv.DictWriter.writerow
When executing the function fun(), after successfully importing the module csv, Python places the name 'csv' in the local namespace of the function fun().
When executing the next statement, Python performs the following activities:
- It looks for the identifier 'csv' in the local namespace. Since the search is successful, Python does not look for it any further (in the global and built-in namespaces).
- Python finds that the name 'csv' refers to a module object (the same object that is registered in the sys.modules dictionary).
- Python then searches the namespace of the csv module, looking for the identifier 'DictWriter'. The search is successful.
- Python finds that the name 'DictWriter' refers to a class, so it locates the class object in memory.
- Python then searches the namespace of the DictWriter class, looking for the identifier 'writerow'. The search is successful.
- Python finds that the name 'writerow' refers to a method of the DictWriter class, so it locates the method's code.
- Summing up, Python has resolved the qualified name csv.DictWriter.writerow to the writerow method object, which has then been bound to the local name 'method' for faster access.
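The same resolution can be replayed by hand with getattr(), one dotted component at a time. This Python 3 sketch follows the fun() example and also confirms that the local name csv refers to the very object cached in sys.modules.

```python
import sys

def fun():
    import csv                               # binds 'csv' in fun()'s local namespace
    local_names = sorted(locals())           # ['csv']
    module = csv                             # found in the local namespace
    cls = getattr(module, "DictWriter")      # searched in the module's namespace
    method = getattr(cls, "writerow")        # searched in the class's namespace
    return local_names, module, method

local_names, module, method = fun()

import csv
print(local_names)                           # ['csv']
print(module is sys.modules["csv"])          # True
print(method == csv.DictWriter.writerow)     # True
```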
Finally, we must understand how Python handles assignments, binding values to old or new identifiers. By default, Python binds the value to a name in the local namespace, possibly hiding other objects with the same name that exist in other namespaces. Example:
abs = 5
After executing the above assignment, we have simply lost access to the built-in function abs(). As long as the newly created variable abs exists in the current scope, the built-in function abs() is hidden. We can recover from this situation by deleting the variable abs, which unhides the function abs():
del abs
x = abs(-3)
The alternative is to define the alias BEFORE hiding the function:
locAbs = abs
abs = 5
x = locAbs(-3)
Of course, the best solution is to avoid such ambiguities altogether.
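The hiding and the recovery can be checked in a few lines of Python 3, where the built-in namespace is available as the builtins module (in Python 2 it is __builtin__). The built-in function is never destroyed: it is only shadowed by the module-level name.

```python
import builtins   # the built-in namespace (named __builtin__ in Python 2)

abs = 5                          # module-level name hides the built-in abs()
hidden = abs                     # 5: resolution stops at the global namespace
original = builtins.abs(-3)      # 3: the built-in still exists, it is only hidden
del abs                          # remove the global name ...
recovered = abs(-3)              # ... and resolution reaches the built-in again: 3

print(hidden, original, recovered)   # 5 3 3
```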
It is also possible to request that an assignment go to the global namespace instead of the local one. Example:
lastX = None
def fun(x):
    global lastX
    lastX = x
In the above code, the assignment lastX = x does not create a local variable lastX, but instead updates the global variable – thanks to the global declaration.
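A minimal sketch contrasting an assignment with and without the global declaration; set_local() and set_global() are hypothetical names.

```python
lastX = None

def set_local(x):
    lastX = x          # no declaration: creates a new LOCAL variable
    return lastX

def set_global(x):
    global lastX       # assignments now go to the global (module) namespace
    lastX = x

set_local(1)
after_local = lastX    # None: the global variable was not touched
set_global(2)
after_global = lastX   # 2
print(after_local, after_global)   # None 2
```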
2.9 The __import__ function
The import statement, as discussed so far, offers static import facilities: the name of the imported module is hardcoded in the source code of your program. The Python language also offers a function that makes it possible to import modules dynamically, when the name of the imported module is not known in advance. Example:
module = __import__(modName)
The __import__ function takes the name of the module as a string argument; it is also invoked internally by the import statement. It imports the given module and returns the module object, which can then be accessed in the same way as through the module name. For example, we could write:
res = module.run()
using the module variable obtained as a result of the __import__ function call.
The __import__ function supports additional, optional arguments:
module = __import__(modName, globalDict, localDict, fromList)
The globalDict dictionary contains the global identifiers (you may use the globals() function here), and localDict the local identifiers (available through the locals() function). The fromList argument is used to simulate the from modName import id1, id2, … syntax: it contains the list of names to import.
Sometimes the built-in __import__ function is replaced by a custom implementation with a compatible interface, to support some special way of importing modules. The imp module is useful if you need to write your own __import__ function.
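A Python 3 sketch of dynamic imports, using standard modules only. It also shows a detail that matters for packages: for a dotted name, __import__ returns the top-level package unless the fromList argument is non-empty. importlib.import_module(), available since Python 2.7, returns the named module directly and is usually the more convenient choice.

```python
import importlib
import os

modName = "json"                         # a module name known only at run time
module = __import__(modName)
print(module.dumps([1, 2]))              # [1, 2]

top = __import__("os.path")              # dotted name, empty fromList: returns 'os'
leaf = __import__("os.path", globals(), locals(), ["join"])   # returns os.path itself
print(top is os, leaf is os.path)        # True True

print(importlib.import_module("os.path") is os.path)         # True
```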
2.10 Nested functions
This topic does not concern modules explicitly, but it is strongly related to our recent discussion of namespaces and name resolution. The Python language allows functions to be defined inside other functions. Example:
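def fun(x):
    status = 0
    def inner1(x): #hypothetical name, body not shown
        if status == 0: pass
    def inner2(x): #hypothetical name, body not shown
        if status == 0: pass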
Here, the argument x is passed to the internal functions in the usual way. Please note, however, that the internal functions also have access to the variables defined in the outer scope (status), i.e. in the main function fun(). This allows passing fewer variables as arguments to the internal functions.
This is, however, not 100% foolproof. You may safely read such variables, but if you attempt to assign to such a variable inside a nested function, you will instead create a local version of it, valid only inside this nested function and hiding the outer variable.
Summing up, nested functions can help you hide some private functions. This is sometimes useful, but only for short, simple functions that do not modify the local variables of the outer scope.
There can be a temptation to use this approach to reduce the number of externally 'visible' functions in a module, but this can lead to hard-to-find errors and decrease the readability of the source code. Therefore, I recommend using this feature sparingly.
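The Python 3 sketch below shows the situations discussed above; check(), reset(), broken(), bump() and counter() are hypothetical names. Reading an outer variable works; assigning to it creates a new local variable; and reading it before assigning in the same nested function raises UnboundLocalError. Python 3 (but not Python 2) adds the nonlocal declaration, which lets a nested function rebind the outer variable.

```python
def fun(x):
    status = 0                       # local variable of the outer function

    def check(x):                    # only READS the outer variable
        return x if status == 0 else None

    def reset():                     # ASSIGNS to a variable named status
        status = 1                   # creates a new local variable inside reset()
        return status

    reset()
    return check(x), status          # the outer status is still 0

print(fun(5))                        # (5, 0)

def broken():
    status = 0
    def bump():
        status = status + 1          # status is local to bump(), read before assignment
    bump()

try:
    broken()
except UnboundLocalError as exc:     # UnboundLocalError is a subclass of NameError
    error = exc

def counter():
    count = 0
    def bump():
        nonlocal count               # Python 3 only: rebind the enclosing variable
        count += 1
    bump()
    bump()
    return count

print(counter())                     # 2
```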
2.11 Is my module available?
Sometimes we write code assuming that a given module will be available on the target system. If this module may be missing, we may want to provide an alternative for this case. This requires the ability to detect whether the given module can be imported:
try:
    import myModule
    myModuleOK = True
except ImportError:
    myModuleOK = False
    import alternativeModule as myModule
The Boolean variable myModuleOK tells whether the module myModule could be imported. Additionally, you may want to import an alternative module instead, offering similar functionality but implemented using other methods or resources.
Thanks to the as clause, the rest of the code can simply assume that the module myModule is present; it does not need to know that a replacement has been provided instead of the original version.
Note that we should react only to the ImportError exception. Other exceptions rather indicate an error in the code, not the inability to import the module.
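A runnable variant of the pattern, in which optional_fast_json is a hypothetical module assumed to be absent and the standard json module serves as the fallback with a compatible dumps() function.

```python
try:
    import optional_fast_json as jsonlib    # hypothetical optional module, assumed absent
    jsonlibOK = True
except ImportError:                          # react ONLY to a failed import
    jsonlibOK = False
    import json as jsonlib                   # standard-library fallback

print(jsonlibOK)                       # False (the optional module is not installed)
print(jsonlib.dumps({"a": 1}))         # {"a": 1}
```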
3 Packages
In general, packages are hierarchical structures of modules. Therefore, most of what we have said about modules also applies to packages. In this section we will focus on the features specific to packages.
When using packages, module names are composed of several names separated by dots, e.g. win32com.client (from the pywin32 Python for Windows extensions package).
3.1 Package folder structure
We will analyse the package folder structure using the pywin32 package as an example. After installing it, you will find in the folder \Python23\Lib\site-packages the file pywin32.pth, which extends the sys.path list by adding the following folders (see section 2.6):
These are standard Windows folders with additional modules supplied by the package. But that's not all! In the site-packages folder we can find some more folders coming from the pywin32 package but not listed in the pywin32.pth file. One of them is the \Python23\Lib\site-packages\win32com folder.
What's special about this folder? You will find that it contains the file __init__.py, commented at the top as 'Initialization for the win32com package'. This folder contains a few other Python source files (e.g. 'util.py'), but also a few subfolders. One of them is 'client'. Interestingly, it also contains an __init__.py file.
It turns out that Python's import statement treats folders containing an __init__.py file as packages, which can be imported just like modules, so the following statements work fine:
- import win32com – loads the file win32com\__init__.py and binds the name win32com; its submodules become accessible as win32com.util and win32com.client once they have been imported (by __init__.py itself or by a later import statement)
- from win32com import util – loads the file win32com\__init__.py and, if needed, the submodule file win32com\util.py, binding the name 'util' in the local namespace
This approach is then applied recursively, if the imported element is also a subfolder with the __init__.py file:
- import win32com.client – loads the file win32com\__init__.py and then win32com\client\__init__.py, making names such as 'Dispatch' available as qualified names (win32com.client.Dispatch is a commonly used function for creating COM object instances)
- from win32com.client import Dispatch – loads the same files, binding the name 'Dispatch' in the local namespace.
A subfolder is considered a subpackage only if it contains an __init__.py file (which may be empty!).
If you look closely, you will find that the function Dispatch is defined in the file __init__.py in the 'client' subfolder, but apart from this file, the 'client' subfolder also contains some other Python source files. They are submodules of the win32com.client package. Summing up, in order to use the function win32com.client.Dispatch(), we should first execute one of the following import statements:
- import win32com.client -> obj = win32com.client.Dispatch(…)
- from win32com.client import Dispatch -> obj = Dispatch(…)
- from win32com import client -> obj = client.Dispatch(…)
As you can see from the above examples, the statement from package import item is able to import either a subpackage, submodule, or some other name defined in the package.
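The folder rules can be tried without pywin32. The Python 3 sketch below builds a hypothetical package demo_pkg with the same shape as win32com (an __init__.py, a util.py submodule, and a client subpackage whose __init__.py defines a stand-in Dispatch() function), and then uses the three import forms listed above.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
files = {
    # Hypothetical package with the same shape as win32com.
    "demo_pkg/__init__.py": "# Initialization for the demo_pkg package\n",
    "demo_pkg/util.py": "NAME = 'demo_pkg.util'\n",
    "demo_pkg/client/__init__.py":
        "def Dispatch(progid):\n    return 'object for ' + progid\n",
    "demo_pkg/client/build.py": "NAME = 'demo_pkg.client.build'\n",
}
for relpath, text in files.items():
    path = os.path.join(root, *relpath.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg.client                   # runs demo_pkg/__init__.py, then client/__init__.py
a = demo_pkg.client.Dispatch("A")
from demo_pkg.client import Dispatch     # binds only the name Dispatch
b = Dispatch("B")
from demo_pkg import client              # binds the subpackage as client
c = client.Dispatch("C")

print(a, b, c, sep=" | ")                # object for A | object for B | object for C
print("demo_pkg.util" in sys.modules)    # False: submodules are loaded only on demand
```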
3.2 from package import *
The Python interpreter cannot find the submodules of a given package or subpackage by itself. We need to help it by providing the __all__ variable (see also section 2.3): a sequence naming the available public submodules. We should define this variable in the __init__.py file of the particular package.
Let's assume that we add the following definition to the file __init__.py in the 'client' subfolder:
__all__ = ['build', 'util']
Then, the statement
from win32com.client import *
would import into the current namespace only the submodules 'build' and 'util', even though the 'client' folder contains some more modules.
If the __all__ definition is missing, the import statement does NOT import all the submodules of the package. It only ensures that the package has been successfully loaded, and then imports into the current namespace only the following identifiers:
- names defined in the package's __init__.py file,
- submodules explicitly loaded by the package's __init__.py file,
- any submodules that have been loaded by previous import statements.
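A sketch of the __all__ rule for packages, again with a hypothetical package (star_pkg) created in a temporary folder. Its __init__.py lists build and util, while a third submodule, other, is left out; Python 3 is assumed.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "star_pkg")          # hypothetical package
os.mkdir(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("__all__ = ['build', 'util']\n")
for name in ("build", "util", "other"):
    with open(os.path.join(pkg, name + ".py"), "w") as f:
        f.write("NAME = %r\n" % name)

sys.path.insert(0, root)
importlib.invalidate_caches()

namespace = {}
exec("from star_pkg import *", namespace)     # run in a fresh namespace to list the result
imported = sorted(k for k in namespace if k != "__builtins__")
print(imported)                               # ['build', 'util']
print("star_pkg.other" in sys.modules)        # False: not listed, so never loaded
```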
3.3 Intra-package references
Submodules often need to reference other submodules of the same package. If the other submodule is located within the same subpackage, we can use a simple, unqualified reference.
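If we had a submodule named 'test' in the win32com.client subpackage, it could import the util submodule (also from the win32com.client subpackage) using the simple statement
import util
without having to use the qualified name
import win32com.client.util
This works because, for packages, the module search sequence is modified to include the current subpackage as the first place to search. (This implicit form is specific to Python 2.x: Python 2.5 added the explicit relative form from . import util, which is the only relative form Python 3 accepts.)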
If a reference is made to a submodule of another subpackage, the fully qualified name must be used.
Here we are still considering the fictitious submodule 'test' in the win32com.client subpackage. We would like to import the submodule 'policy' from the package win32com.server. The following statement is required:
import win32com.server.policy
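Since the implicit form import util no longer works in Python 3, the sketch below uses the explicit relative form for the same-subpackage reference and the fully qualified name for the other subpackage. The package ref_pkg and all its submodules are hypothetical, laid out like win32com.client and win32com.server.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
files = {
    # Hypothetical package mirroring win32com.client / win32com.server.
    "ref_pkg/__init__.py": "",
    "ref_pkg/client/__init__.py": "",
    "ref_pkg/client/util.py": "NAME = 'client.util'\n",
    "ref_pkg/client/test.py": (
        "from . import util                 # same subpackage: explicit relative import\n"
        "from ref_pkg.server import policy  # other subpackage: fully qualified name\n"
        "RESULT = (util.NAME, policy.NAME)\n"),
    "ref_pkg/server/__init__.py": "",
    "ref_pkg/server/policy.py": "NAME = 'server.policy'\n",
}
for relpath, text in files.items():
    path = os.path.join(root, *relpath.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

sys.path.insert(0, root)
importlib.invalidate_caches()

import ref_pkg.client.test
print(ref_pkg.client.test.RESULT)    # ('client.util', 'server.policy')
```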
3.4 A few comments about wxPython
One of the common questions about wxPython concerns its dual naming convention. You can use either of them, although the old style is being deprecated and its use is discouraged:
Old style (first variant):
from wxPython import wx
Old style (second variant):
from wxPython.wx import *
New style:
import wx
Article written by Tomasz Lisowski