How python's import machinery works
Posted on Mon 22 January 2018 in python
Glossary
Let's begin with the basics. From now on, when we say python we mean python 3.6. Python 3 is the future and (who knows) python 4 will be the future of python 3. Stop using python 2!
I'll try to be concise and simple, although the documentation about modules and packages is pretty straightforward. Go ahead and take a look, either to learn what these concepts are or just to refresh your memory.
Python module
When we say python module, or just module, we mean a single python file, e.g. models.py or utils.py.
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.
When you put a bunch of modules in a directory, that directory is called a python package.
Python package
When we say python package, or simply package, we mean a directory that contains one or more modules.
Packages are a way of structuring Python’s module namespace by using “dotted module names”.
Of course, a package may contain other directories (packages), which themselves may contain other modules mixed with other packages, and so on. You get the point. That's how an application is structured. This, of course, is not specific to Python; most programming languages organize code in a similar way.
Now, a package may or may not contain an __init__.py file; its presence is what distinguishes a regular package from a namespace package. I won't go into detail about namespace packages because, first of all, I have never used one and, second, this article assumes a regular package (one that has an __init__.py in it).
Project setup
The following (super simple) project setup is assumed throughout this article:
tmp/
    my_package/
        __init__.py
        my_module.py
You can follow along with this article by creating the above structure in your home directory, so that you end up with ~/tmp/my_package/.
This is the content of my_module.py:
class MyClass:
    def __init__(self):
        print('init called')

    def caps(self, word):
        print(word.upper())
Nothing fancy here. Super simple because we want to focus on the import system. Not classes, methods etc.
The easy way
From now on we will assume that the python interpreter is always started from the directory that contains the above structure. No virtual environments. I promised to keep it simple in order to get a grip on this import machinery monster.
On my system, it lives under ~/tmp.
~> cd tmp
~/tmp> python3.6
Python 3.6.3 (default, Dec 3 2017, 22:10:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Side tip: you can change the interactive python prompt from the default one (>>>) to, e.g., > with: import sys; sys.ps1 = '> '
~/tmp> python3.6
Python 3.6.3 (default, Dec 3 2017, 22:10:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.ps1 = '> '
>
Now, if we type:
>>> from my_package import my_module
>>>
>>> obj = my_module.MyClass()
init called
>>> obj.caps('hello')
HELLO
>>>
Import done! MyClass instance created! caps() method called! But you already know that. This is the first thing taught when you start learning python.
Go get a coffee/chocolate/tea/beer because here comes the good part. The internals have not yet begun...
The dynamic way
There is another way to import a module (or package): the import_module function of the importlib package. It is quite easy to understand, and you're encouraged to use it if you want to import things dynamically (instead of the old classic way of calling __import__()).
>>> from importlib import import_module
>>>
>>> my_module = import_module('my_package.my_module')
>>> obj = my_module.MyClass()
init called
>>> obj.caps('hello')
HELLO
As seen above, we have named the module returned by import_module my_module, the same as the module's name, for consistency. We could also have named it, e.g., mod (or whatever), but then we would have to write obj = mod.MyClass().
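Because import_module takes the module name as a plain string, the name can be computed at runtime (from a config file, a command-line argument, and so on). Here is a minimal, self-contained sketch. It assumes a throwaway package, dyn_pkg (a hypothetical stand-in for my_package), created in a temporary directory, and it also shows the package argument, which import_module needs in order to resolve a relative name such as '.my_module'.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package laid out like my_package (hypothetical name "dyn_pkg").
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'dyn_pkg')
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, '__init__.py'), 'w').close()   # empty __init__.py
with open(os.path.join(pkg_dir, 'my_module.py'), 'w') as f:
    f.write('GREETING = "hello"\n')

sys.path.insert(0, root)        # make the temporary directory importable
importlib.invalidate_caches()   # the files were created after the interpreter started

# The module name is an ordinary string, so it can be computed at runtime.
name = 'dyn_pkg' + '.' + 'my_module'
absolute = importlib.import_module(name)

# A relative name ('.my_module') needs the package argument as its anchor.
relative = importlib.import_module('.my_module', package='dyn_pkg')

print(absolute is relative, absolute.GREETING)   # True hello
```

Both calls end up with the same object because the second lookup is answered from sys.modules, which is covered further down.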
Now take a look at this:
>>> from importlib import import_module
>>>
>>> my_module = import_module('my_package')
>>> obj = my_module.my_module.MyClass()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'my_package' has no attribute 'my_module'
What? But I have imported my_package. Why can't the my_module module be found? Simply because it has not been loaded (imported). We have to import it explicitly, just like we did in the previous example. Importing a package does not import its sub-packages/sub-modules too (unless the package's __init__.py imports them itself). This applies to all Python import mechanisms, not only to import_module.
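A quick way to convince yourself, sketched with a throwaway package named lazy_pkg (hypothetical, laid out like my_package with an empty __init__.py): the sub-module only shows up as an attribute of the package once it has been imported.

```python
import importlib
import os
import sys
import tempfile

# A throwaway package laid out like my_package (hypothetical name "lazy_pkg").
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'lazy_pkg')
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, '__init__.py'), 'w').close()   # empty: imports nothing
with open(os.path.join(pkg_dir, 'my_module.py'), 'w') as f:
    f.write('ANSWER = 42\n')
sys.path.insert(0, root)
importlib.invalidate_caches()

pkg = importlib.import_module('lazy_pkg')
before = hasattr(pkg, 'my_module')   # the sub-module has not been imported yet

importlib.import_module('lazy_pkg.my_module')
after = hasattr(pkg, 'my_module')    # importing it also binds it on the parent package

print(before, after, pkg.my_module.ANSWER)   # False True 42
```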
The (almost) manual way
Hang on. We haven't reached our goal, the pure manual way of importing things, yet. Here is another way of importing a module (or package) without using import_module directly.
>>> from importlib.util import spec_from_loader, module_from_spec
>>> from importlib.machinery import SourceFileLoader
>>>
>>> spec = spec_from_loader("my_package.my_module", SourceFileLoader("my_package.my_module", "/home/nick/tmp/my_package/my_module.py"))
>>> spec
ModuleSpec(name='my_package.my_module', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f877568fcc0>, origin='/home/nick/tmp/my_package/my_module.py')
>>> dir(spec)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__',
'__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_cached', '_set_fileattr', 'cached', 'has_location',
'loader', 'loader_state', 'name', 'origin', 'parent', 'submodule_search_locations']
>>>
>>> mod = module_from_spec(spec)
>>> mod
<module 'my_package.my_module' from '/home/nick/tmp/my_package/my_module.py'>
>>> dir(mod)
['__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']
>>>
>>> spec.loader.exec_module(mod)
>>> dir(mod)
['MyClass', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']
>>>
>>> obj = mod.MyClass()
init called
[READER] Hey, wait mister. What are these? Looks more like Klingon language than Python!
[ME] No, no. It does not. It is pretty self-explanatory once we start digging into python's import internals.
Note that there is also another way (the pure manual way) to import a Python module, but I'm keeping it for last so that you stay with me till the end :)
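If you only want to import a single file from a known location, spec_from_file_location is a shorter variant of the same idea: it picks the loader from the file's suffix, so there is no need to build a SourceFileLoader by hand (the importlib documentation has a recipe along these lines). A sketch, assuming a temporary my_module.py and a made-up module name, my_module_by_path; this version of caps returns the string instead of printing it.

```python
import importlib.util
import os
import sys
import tempfile

# Write a stand-alone source file to import (temporary location, hypothetical content).
path = os.path.join(tempfile.mkdtemp(), 'my_module.py')
with open(path, 'w') as f:
    f.write("class MyClass:\n"
            "    def caps(self, word):\n"
            "        return word.upper()\n")

# spec_from_file_location chooses the loader from the suffix (.py -> SourceFileLoader).
spec = importlib.util.spec_from_file_location('my_module_by_path', path)
mod = importlib.util.module_from_spec(spec)   # create the module and set __name__, __file__, ...
sys.modules[spec.name] = mod                  # register it before executing, as import does
spec.loader.exec_module(mod)                  # run the file's code inside mod.__dict__

print(type(spec.loader).__name__, mod.MyClass().caps('hello'))   # SourceFileLoader HELLO
```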
Import machinery's concept
Python has a very nicely abstracted way of thinking about how to import things. On the other hand, someone else could say that this system is quite complicated.
Nevertheless, the system comprises finders, loaders, specifications, modules, caching and a conductor that orchestrates all of them, named the python import machinery. All of these come into play whenever you type import my_package.
At a glance, the job of the finders is to find (locate) the requested module. If the module can be found, a specification (spec) for that module is returned. This spec is used by the loaders to create the module, initialize it and execute it. If the module cannot be found, the classic ModuleNotFoundError exception is raised.
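importlib.util.find_spec lets you ask the finders for a spec without executing anything, which makes the "spec or nothing" contract easy to see. A small sketch using json and a deliberately nonexistent (hypothetical) name, no_such_module_xyz:

```python
import importlib.util

found = importlib.util.find_spec('json')
missing = importlib.util.find_spec('no_such_module_xyz')   # hypothetical, nonexistent name

print(found.name)    # json
print(missing)       # None: no finder could produce a spec

# The import statement turns "no spec" into an exception.
error = None
try:
    import no_such_module_xyz
except ModuleNotFoundError as exc:
    error = exc
print(type(error).__name__, error.name)               # ModuleNotFoundError no_such_module_xyz
print(issubclass(ModuleNotFoundError, ImportError))   # True
```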
The procedure
1. sys.modules
Every time you import a module, the first place searched is the sys.modules dictionary. Its keys are plain strings holding module names, either top-level names or dotted names of sub-modules (e.g. 'os' and 'os.path'), and the values are the module objects themselves. The sys.modules dict acts like a cache: if the module is there, go get it and do not start the import process again.
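To see the cache in action, you can put a module object into sys.modules yourself. The sketch below uses a made-up name, cached_only, for which no file exists anywhere: the import still succeeds, because the finders are never consulted.

```python
import sys
import types

# A hand-made module object, registered under a made-up name with no file behind it.
fake = types.ModuleType('cached_only')
fake.message = 'served from sys.modules'
sys.modules['cached_only'] = fake

import cached_only                  # cache hit: no finder is consulted
print(cached_only is fake, cached_only.message)   # True served from sys.modules

# Without the cache entry, the finders are asked and (here) nobody finds it.
del sys.modules['cached_only']
reimport_failed = False
try:
    import cached_only
except ModuleNotFoundError:
    reimport_failed = True
print(reimport_failed)              # True
```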
Let's have a look at the contents of sys.modules:
>>> import sys
>>>
>>> print(sys.modules)
{'builtins': <module 'builtins' (built-in)>,
'sys': <module 'sys' (built-in)>,
'_frozen_importlib': <module '_frozen_importlib' (frozen)>,
'_imp': <module '_imp' (built-in)>,
'_warnings': <module '_warnings' (built-in)>,
'_thread': <module '_thread' (built-in)>,
'_weakref': <module '_weakref' (built-in)>,
'_frozen_importlib_external': <module '_frozen_importlib_external' (frozen)>,
'_io': <module 'io' (built-in)>, 'marshal': <module 'marshal' (built-in)>,
'posix': <module 'posix' (built-in)>,
'zipimport': <module 'zipimport' (built-in)>,
'encodings': <module 'encodings' from '/usr/local/lib/python3.6/encodings/__init__.py'>,
'codecs': <module 'codecs' from '/usr/local/lib/python3.6/codecs.py'>,
'_codecs': <module '_codecs' (built-in)>,
'encodings.aliases': <module 'encodings.aliases' from '/usr/local/lib/python3.6/encodings/aliases.py'>,
'encodings.utf_8': <module 'encodings.utf_8' from '/usr/local/lib/python3.6/encodings/utf_8.py'>,
'_signal': <module '_signal' (built-in)>, '__main__': <module '__main__' (built-in)>,
'encodings.latin_1': <module 'encodings.latin_1' from '/usr/local/lib/python3.6/encodings/latin_1.py'>,
'io': <module 'io' from '/usr/local/lib/python3.6/io.py'>,
'abc': <module 'abc' from '/usr/local/lib/python3.6/abc.py'>,
'_weakrefset': <module '_weakrefset' from '/usr/local/lib/python3.6/_weakrefset.py'>,
'site': <module 'site' from '/usr/local/lib/python3.6/site.py'>,
'os': <module 'os' from '/usr/local/lib/python3.6/os.py'>,
'errno': <module 'errno' (built-in)>,
'stat': <module 'stat' from '/usr/local/lib/python3.6/stat.py'>,
'_stat': <module '_stat' (built-in)>,
'posixpath': <module 'posixpath' from '/usr/local/lib/python3.6/posixpath.py'>,
'genericpath': <module 'genericpath' from '/usr/local/lib/python3.6/genericpath.py'>,
'os.path': <module 'posixpath' from '/usr/local/lib/python3.6/posixpath.py'>,
'_collections_abc': <module '_collections_abc' from '/usr/local/lib/python3.6/_collections_abc.py'>,
'_sitebuiltins': <module '_sitebuiltins' from '/usr/local/lib/python3.6/_sitebuiltins.py'>,
'sysconfig': <module 'sysconfig' from '/usr/local/lib/python3.6/sysconfig.py'>,
'_sysconfigdata_m_linux_x86_64-linux-gnu': <module '_sysconfigdata_m_linux_x86_64-linux-gnu' from '/usr/local/lib/python3.6/_sysconfigdata_m_linux_x86_64-linux-gnu.py'>,
'readline': <module 'readline' from '/usr/local/lib/python3.6/lib-dynload/readline.cpython-36m-x86_64-linux-gnu.so'>,
'atexit': <module 'atexit' (built-in)>,
'rlcompleter': <module 'rlcompleter' from '/usr/local/lib/python3.6/rlcompleter.py'>
}
>>>
>>> 'my_package.my_module' in sys.modules
False
>>> len(sys.modules.keys())
60
Can you see my_package.my_module in there? No? Good! That's because it has not been imported yet, so it has not been added to this dictionary.
If you write from my_package import my_module then:
# Continue from previous example
>>> from my_package import my_module
>>> print(sys.modules)
{'builtins': <module 'builtins' (built-in)>,
'sys': <module 'sys' (built-in)>,
'_frozen_importlib': <module '_frozen_importlib' (frozen)>,
# ... same as before
'my_package': <module 'my_package' from '/home/nick/tmp/my_package/__init__.py'>,
'my_package.my_module': <module 'my_package.my_module' from '/home/nick/tmp/my_package/my_module.py'>
}
>>> 'my_package' in sys.modules
True
>>> 'my_package.my_module' in sys.modules
True
>>> len(sys.modules.keys())
62
As seen above, the import statement added two entries to sys.modules: the top-level package (my_package) and the dotted name of the module itself (my_package.my_module, i.e. my_module.py). From now on, every time we import my_module it will be taken from sys.modules.
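The following sketch reproduces this with a throwaway package, cache_pkg (a hypothetical stand-in for my_package). It also shows the flip side of the cache: once an entry is removed from sys.modules, the next import finds, loads and executes the file again, producing a new module object.

```python
import importlib
import os
import sys
import tempfile

# Throwaway package laid out like my_package (hypothetical name "cache_pkg").
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'cache_pkg')
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, '__init__.py'), 'w').close()
with open(os.path.join(pkg_dir, 'my_module.py'), 'w') as f:
    f.write('VALUE = 1\n')
sys.path.insert(0, root)
importlib.invalidate_caches()

first = importlib.import_module('cache_pkg.my_module')
print('cache_pkg' in sys.modules, 'cache_pkg.my_module' in sys.modules)   # True True

again = importlib.import_module('cache_pkg.my_module')
print(again is first)    # True: served from sys.modules, not executed again

del sys.modules['cache_pkg.my_module']
fresh = importlib.import_module('cache_pkg.my_module')
print(fresh is first)    # False: the file was found, loaded and executed again
```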
The import statement took care of all the internals, added the module to sys.modules and made it available for us to work with. But how did all this happen? Enter the import machinery!
2. Finders
If the requested module is not found in the sys.modules dict, then the sys.meta_path list is consulted.
This list contains meta path finder objects. Let's look at it:
>>> import sys
>>> print(sys.meta_path)
[
<class '_frozen_importlib.BuiltinImporter'>,
<class '_frozen_importlib.FrozenImporter'>,
<class '_frozen_importlib_external.PathFinder'>
]
The job of each one is to locate the requested module by returning either the module's specification (we'll get to that) or simply None. The method responsible for that is find_spec. Whenever you import something that is not already in sys.modules, find_spec is called on each meta path finder in turn, until one of them returns a spec.
from the docs
These finders are queried in order to see if they know how to handle the named module. Meta path finders must implement a method called find_spec() which takes three arguments: a name, an import path, and (optionally) a target module.
The BuiltinImporter is used purely for builtin modules. You may ask which modules these are. Let's take a look.
>>> import sys
>>> sys.builtin_module_names
('_ast', '_codecs', '_collections', '_functools', '_imp', '_io', '_locale', '_operator',
'_signal', '_sre', '_stat', '_string', '_symtable', '_thread', '_tracemalloc', '_warnings',
'_weakref', 'atexit', 'builtins', 'errno', 'faulthandler', 'gc', 'itertools', 'marshal',
'posix', 'pwd', 'sys', 'time', 'xxsubtype', 'zipimport')
So, whenever you import a builtin module such as itertools (and it isn't already in sys.modules), the finder used to locate it (remember, not to load it) is the BuiltinImporter.
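You can also ask the BuiltinImporter directly. A short sketch; it assumes a standard CPython build, where sys is always built in and json is a pure Python package:

```python
from importlib.machinery import BuiltinImporter

# Ask the BuiltinImporter for a spec, as the import machinery would.
spec = BuiltinImporter.find_spec('sys')
print(spec.name, spec.origin, spec.loader is BuiltinImporter)   # sys built-in True

# json is not compiled into the interpreter, so the BuiltinImporter declines.
print(BuiltinImporter.find_spec('json'))                         # None
```

Note that the BuiltinImporter is also the loader of the spec it returns, which is why it is called an importer (finder plus loader).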
Next comes the FrozenImporter, which is used to locate frozen modules. We will not concern ourselves with that type of module, though.
Last but not least is the PathFinder, which is responsible for locating any module/package not handled by the previous two. More specifically, it searches the entries of sys.path (or, for sub-modules, the parent package's __path__ attribute) with the help of sys.path_hooks and sys.path_importer_cache.
If sys.meta_path processing reaches the end of its list without returning a spec, then a ModuleNotFoundError is raised.
Imagine the above three (default) guys as persons with a flashlight, each looking in a specific location to find a module. If all of them fail, then either the package exists but the finders are not looking in the correct location, or the package does not exist at all. So far, all of this should be quite easy to understand.
And one more thing: you can extend this list by writing your own meta path finder, but more on this later.
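As a taste of what such an extension looks like, here is a hypothetical LoggingFinder (the name is made up). It only records the names it is asked about and returns None, which tells the import machinery to move on to the next finder, so the regular finders still do the real work. colorsys is used merely as a small standard library module that is unlikely to be imported already.

```python
import sys


class LoggingFinder:
    """Hypothetical meta path finder that only records what it is asked for."""

    def __init__(self):
        self.requested = []

    def find_spec(self, fullname, path, target=None):
        self.requested.append(fullname)
        return None  # "not me": the import machinery moves on to the next finder


finder = LoggingFinder()
sys.meta_path.insert(0, finder)        # consulted before BuiltinImporter & co.
try:
    sys.modules.pop('colorsys', None)  # make sure this import is not a cache hit
    import colorsys
finally:
    sys.meta_path.remove(finder)       # always restore the default list

print(finder.requested)                # contains 'colorsys'
print(colorsys.__spec__.origin)        # still located by the PathFinder
```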
Let's dive deeper...
2.1 Spec object
Specification objects, or just spec objects, are the metadata of modules. Every module imported through the import machinery has a spec, because a spec is required in order to import a module. Thus, every such module exposes a __spec__ attribute. Let's inspect one:
>>> import my_package
>>>
>>> dir(my_package)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>>
>>> spec = my_package.__spec__
>>> spec
ModuleSpec(name='my_package', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f9673ceee48>, origin='/home/nick/tmp/my_package/__init__.py', submodule_search_locations=['/home/nick/tmp/my_package'])
>>>
>>> dir(spec)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'_cached', '_initializing', '_set_fileattr', 'cached', 'has_location', 'loader', 'loader_state', 'name', 'origin',
'parent', 'submodule_search_locations'
]
>>> spec.origin
'/home/nick/tmp/my_package/__init__.py'
>>> spec.parent
'my_package'
>>> spec.submodule_search_locations
['/home/nick/tmp/my_package']
Examining the above code, we can see that the spec (an instance of the ModuleSpec class) of the my_package package has its own methods and attributes.
One of its attributes is loader, which tells the import machinery which loader to use while creating and executing the module. Remember, at this stage the module has not been created yet, only the spec. A module is derived from a spec.
There are other attributes as well, like origin (the name of the place from which the module is loaded: "built-in" for built-in modules and the filename for modules loaded from source), parent (the fully-qualified name of the package the module belongs to; for a package, its own name) and submodule_search_locations (a list of strings telling where to find submodules), all of which are useful to the importer.
As you may have guessed, if you import a module (and not a package), you will see that its submodule_search_locations is None, since a module is always a leaf in the tree. In technical terms, unlike a package, a module has no __path__ attribute.
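Here is a quick way to compare the two cases without creating any files, assuming a standard CPython installation where json is a regular package and json.decoder a plain module inside it:

```python
import importlib.util
import json
import json.decoder

pkg_spec = importlib.util.find_spec('json')           # json/__init__.py: a package
mod_spec = importlib.util.find_spec('json.decoder')   # json/decoder.py: a plain module

# A package knows where to look for its sub-modules, a plain module does not.
print(pkg_spec.submodule_search_locations)   # a one-item list: the json directory
print(mod_spec.submodule_search_locations)   # None

# parent: the package name for a sub-module, the name itself for a package.
print(pkg_spec.parent, mod_spec.parent)      # json json

# Only the package gets a __path__ attribute.
print(hasattr(json, '__path__'), hasattr(json.decoder, '__path__'))   # True False
```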
Conclusion: spec objects hold valuable information about the creation of modules. spec objects are needed to construct and import a module.
3. Path Finder
Remember the sys.meta_path list with its three default meta path finder objects (aka importers)? The most common scenario is importing packages and modules from within your project, plus other modules not handled by the first two importers. So the busiest guy is the last one. Let me introduce you to the Path Based Finder.
The Path Based Finder (PBF for short) can be imported (if you want to play with him) under different names. All of them are aliases.
>>> from _frozen_importlib_external import PathFinder as A
>>> from importlib._bootstrap_external import PathFinder as B
>>> from importlib.machinery import PathFinder as C
>>>
>>> A == B == C
True
>>> A is B is C
True
>>> id(A) == id(B) == id(C)
True
>>>
>>> dir(A)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
'__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__',
'__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_get_spec', '_legacy_get_spec',
'_path_hooks', '_path_importer_cache', 'find_module', 'find_spec', 'invalidate_caches'
]
They are all aliases. Personally, I'll stick with the last import statement, since I don't want to mess with underscore-prefixed imports.
Although the PBF seems like a smart guy, he doesn't know how to find anything by himself. He needs help from other guys. Now, this may sound strange. Why doesn't the PBF do the finding all by himself? Well, this is an implementation detail, and maybe it is for the better.
All the PBF does is traverse the individual path entries, associating each of them with a path entry finder that knows how to handle that particular kind of path.
Where are these path entries and path entry finders, you may ask? Once again, the sys module has the answer:
- sys.path: you all know this one. It's a list of strings built from the directory of the script being run (or the current directory), the PYTHONPATH environment variable and installation-dependent defaults such as site-packages; other mechanisms (virtualenvs etc.) adjust it too.
- sys.path_hooks: a list of callables (path hooks) that, given a path entry, return a Path Entry Finder (PEF for short) for it, or raise ImportError if they can't handle it. PEFs are different from the meta path finder we're talking about (the PBF).
- sys.path_importer_cache: a dictionary mapping path entries (strings) to PEF objects (or to None).
- the __path__ attribute on package objects: the same as spec.submodule_search_locations (or my_package.__path__, if you like), a list of strings indicating where the sub-modules/sub-packages are located.
>>> import sys
>>>
>>> sys.path
['', '/usr/local/lib/python36.zip', '/usr/local/lib/python3.6', '/usr/local/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/site-packages']
>>>
>>> sys.path_hooks
[<class 'zipimport.zipimporter'>, <function FileFinder.path_hook.<locals>.path_hook_for_FileFinder at 0x7f9673dfa6a8>]
>>>
>>> sys.path_importer_cache
{'/usr/local/lib/python36.zip': None,
'/usr/local/lib/python3.6': FileFinder('/usr/local/lib/python3.6'),
'/usr/local/lib/python3.6/encodings': FileFinder('/usr/local/lib/python3.6/encodings'),
'/usr/local/lib/python3.6/lib-dynload': FileFinder('/usr/local/lib/python3.6/lib-dynload'),
'/usr/local/lib/python3.6/site-packages': FileFinder('/usr/local/lib/python3.6/site-packages'),
'/home/nick/tmp': FileFinder('/home/nick/tmp'),
'/usr/local/lib/python3.6/collections': FileFinder('/usr/local/lib/python3.6/collections'),
'/home/nick/tmp/my_package': FileFinder('/home/nick/tmp/my_package'),
'/usr/local/lib/python3.6/importlib': FileFinder('/usr/local/lib/python3.6/importlib')
}
Feeling a little lost with all this? Let's make a very brief recap to see where we are:
- An import statement is encountered.
- Is the requested name in sys.modules? If yes, use it. Done!
- If not, call find_spec() on each sys.meta_path importer and wait for a spec ← this is where we are!
- If no spec is returned, raise ModuleNotFoundError.
- If a spec is returned, create the module (we haven't talked about this yet).
- Once the module is created, initialize it with default attributes.
- Add it to sys.modules (aka add it to the cache).
- Execute the module.
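The recap above maps almost line by line onto code. The following simple_import is a deliberately simplified sketch (a hypothetical helper, not the real implementation): it handles top-level names only and ignores parent packages, relative imports and the import lock, all of which the real machinery takes care of.

```python
import importlib.util
import sys


def simple_import(name):
    """Simplified sketch of the import steps for a top-level name (hypothetical helper)."""
    # Is the requested name in sys.modules? If yes, use it.
    if name in sys.modules:
        return sys.modules[name]

    # Call find_spec() on each sys.meta_path importer until one returns a spec.
    for finder in sys.meta_path:
        find_spec = getattr(finder, 'find_spec', None)
        if find_spec is None:
            continue  # legacy finder without find_spec; skipped in this sketch
        spec = find_spec(name, None)
        if spec is not None:
            break
    else:
        # Nobody returned a spec: raise ModuleNotFoundError.
        raise ModuleNotFoundError('No module named {!r}'.format(name), name=name)

    # Create the module and initialize it with default attributes.
    module = importlib.util.module_from_spec(spec)

    # Add it to sys.modules before executing, so circular imports can find it.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)  # execute the module
    except BaseException:
        del sys.modules[name]  # don't leave a half-initialized module behind
        raise
    return module


sys.modules.pop('colorsys', None)            # make sure we really go through the finders
colorsys = simple_import('colorsys')
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))    # (0.0, 1.0, 1.0)

missing_error = None
try:
    simple_import('no_such_module_xyz')      # hypothetical, nonexistent name
except ModuleNotFoundError as exc:
    missing_error = exc
```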
Let's continue!
To find out what the above path entries are and how they relate to the PEFs, we must take a look at the find_spec() implementation of the last meta path finder in the sys.meta_path list, PathFinder.
# Excerpt: classmethods of PathFinder in importlib/_bootstrap_external.py (@classmethod decorators omitted)
def _path_hooks(cls, path):
    """Search sys.path_hooks for a finder for 'path'."""
    if sys.path_hooks is not None and not sys.path_hooks:
        _warnings.warn('sys.path_hooks is empty', ImportWarning)
    for hook in sys.path_hooks:
        try:
            return hook(path)
        except ImportError:
            continue
    else:
        return None

def _path_importer_cache(cls, path):
    """Get the finder for the path entry from sys.path_importer_cache.

    If the path entry is not in the cache, find the appropriate finder
    and cache it. If no finder is available, store None.
    """
    if path == '':
        try:
            path = _os.getcwd()
        except FileNotFoundError:
            # Don't cache the failure as the cwd can easily change to
            # a valid directory later on.
            return None
    try:
        finder = sys.path_importer_cache[path]
    except KeyError:
        finder = cls._path_hooks(path)
        sys.path_importer_cache[path] = finder
    return finder

def _get_spec(cls, fullname, path, target=None):
    """Find the loader or namespace_path for this module/package name."""
    # [...]
    for entry in path:
        finder = cls._path_importer_cache(entry)
        if finder is not None:
            if hasattr(finder, 'find_spec'):
                spec = finder.find_spec(fullname, target)
            else:
                spec = cls._legacy_get_spec(fullname, finder)
            if spec is None:
                continue
            if spec.loader is not None:
                return spec
            # other stuff here [...]
    else:
        spec = _bootstrap.ModuleSpec(fullname, None)
        spec.submodule_search_locations = namespace_path
        return spec

def find_spec(cls, fullname, path=None, target=None):
    """Try to find a spec for 'fullname' on sys.path or 'path'.

    The search is based on sys.path_hooks and sys.path_importer_cache.
    """
    if path is None:
        path = sys.path
    spec = cls._get_spec(fullname, path, target)
    if spec is None:
        return None
    elif spec.loader is None:
        namespace_path = spec.submodule_search_locations
        if namespace_path:
            # We found at least one namespace path. Return a
            # spec which can create the namespace package.
            spec.origin = 'namespace'
            spec.submodule_search_locations = _NamespacePath(fullname, namespace_path, cls._get_spec)
            return spec
        else:
            return None
    else:
        return spec
Take a minute to walk through these fairly self-explanatory methods. It all starts with the find_spec('my_package.my_module', path=['/home/nick/tmp/my_package']) call (for a sub-module, path is the parent package's __path__).
Done?
OK, let's look at these together. Note that you're looking at the meat of how python locates modules in a filesystem.
- If a path is not passed to find_spec, then sys.path is used.
- _get_spec() is called.
- For each path entry in the list of paths (either sys.path or the passed path parameter), _path_importer_cache() is called with this entry.
- As said above, sys.path_importer_cache is a dictionary and acts like a cache (as the name implies). The _path_importer_cache() method returns the value (the Path Entry Finder, e.g. a FileFinder) if one is found for that path.
- If the cache lookup fails, then a new Path Entry Finder is created, if possible (we're inside the _path_hooks() method now): each hook in sys.path_hooks is tried in turn and the first finder returned wins.
- If that fails too, then None is returned and sys.path_importer_cache gets updated with the failed path as the key and None as the value. So, the next time that path is tried, the cache is queried, None is returned and the path entry is simply skipped; if no other entry yields a spec, a ModuleNotFoundError is eventually raised.
- If the creation (initialization, to be more accurate) of the new PEF succeeds (the try block of the _path_hooks method), then this PEF is stored in the cache for this path (we are inside the _path_importer_cache method now).
- (Back in the _get_spec method) the PEF's find_spec method is called. If it returns None, continue to the next path entry. If a spec is found and it has a loader, return it.
- Finally, if the path list reaches the end without a result (we're still inside the _get_spec method), the else clause of the for loop is executed and a new spec is created without a loader: just an empty ModuleSpec object, which is returned.
- Because this ModuleSpec object has no loader associated with it and its submodule_search_locations attribute is an empty list (no namespace package portions were found), None will be returned by the find_spec method.
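The walkthrough above can be observed by calling PathFinder.find_spec directly with an explicit path list. A sketch, assuming a temporary directory that contains a module with the made-up name walkthrough_mod, plus a path entry that does not exist at all:

```python
import importlib
import os
import sys
import tempfile
from importlib.machinery import FileFinder, PathFinder

# A temporary directory holding one module (hypothetical name "walkthrough_mod")...
good_dir = tempfile.mkdtemp()
with open(os.path.join(good_dir, 'walkthrough_mod.py'), 'w') as f:
    f.write('X = 1\n')
# ...and a path entry that does not exist.
bad_dir = os.path.join(good_dir, 'does_not_exist')
importlib.invalidate_caches()

# Ask the Path Based Finder directly, with an explicit path list instead of sys.path.
spec = PathFinder.find_spec('walkthrough_mod', [bad_dir, good_dir])
print(spec.origin)   # <good_dir>/walkthrough_mod.py

# Both entries were cached: a FileFinder for the real directory...
print(isinstance(sys.path_importer_cache[good_dir], FileFinder))   # True
# ...and None for the entry that no path hook could handle.
print(sys.path_importer_cache[bad_dir])                            # None
```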
I want to make clear that a spec is a totally different thing from a finder. A finder is just responsible for finding things: paths, locations, files, URLs, etc. For example, the FileFinder is responsible for locating modules in the file system. Not URLs. Not database queries. Just plain strings, e.g. /path/to/a/location, that represent a valid file system path. A custom finder could be written to interpret other kinds of locations, e.g. a (hypothetical) RedisFinder that locates modules through redis-based URLs.
On the other hand, a spec is created from a set of parameters (location, loader, submodule_search_locations). For modules found by the FileFinder, the place where the spec is actually created, i.e. where importlib.machinery.ModuleSpec gets instantiated, is just a plain function in python's source code named spec_from_file_location (also available as importlib.util.spec_from_file_location).
ModuleSpec() is also called directly in one more place: inside the _get_spec() method shown above (in the else clause of its for loop).
Am I too explicit in describing things, and did you get lost (again)? Here is a short version of the above:
- There are two default path hooks in sys.path_hooks: the zipimporter class and a function created by FileFinder.path_hook (the first is a class and the second a plain function; both are callables). Given a path entry, each returns a Path Entry Finder (a zipimporter or a FileFinder instance) that the Path Based Finder (PathFinder) then uses. This list can be extended. For example, you could write a (hypothetical) HttpFinder that locates modules through URLs, along with the associated loader, of course.
- Python has already populated a cache with the most common paths for you, inside the sys.path_importer_cache dict. Most of them are mapped to a FileFinder and some to None.
- If the package you're trying to import can't be found under any entry of the path list passed to find_spec, then, I'm sorry buddy, but you'll get an error!
- The next time you try to import the same package, unless changes have been made to sys.path or sys.path_importer_cache, you'll get the same error, because path entries that no hook could handle are blacklisted (stored as None) inside sys.path_importer_cache.
Some hints:
- There are two find_spec methods. One is implemented by PathFinder (the meta one, or the Path Based Finder if you like that name) and the other by FileFinder (one of the Path Entry Finders; the other one is the zipimporter).
- Python calls PathFinder's find_spec, which implicitly calls FileFinder's find_spec.
- Either way, find_spec returns either a spec or None. Period.
Now that we have an idea of what a finder is, let's play with that guy and try to find a spec in different ways.
>>> import sys
>>>
>>> sys.meta_path
[<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]
>>> pf = sys.meta_path[-1]
>>> sys.path_importer_cache
{'/usr/local/lib/python36.zip': None,
'/usr/local/lib/python3.6': FileFinder('/usr/local/lib/python3.6'),
'/usr/local/lib/python3.6/encodings': FileFinder('/usr/local/lib/python3.6/encodings'),
'/usr/local/lib/python3.6/lib-dynload': FileFinder('/usr/local/lib/python3.6/lib-dynload'),
'/usr/local/lib/python3.6/site-packages': FileFinder('/usr/local/lib/python3.6/site-packages'),
'/home/nick/tmp': FileFinder('/home/nick/tmp'),
'/usr/local/lib/python3.6/collections': FileFinder('/usr/local/lib/python3.6/collections')
}
>>> del sys.path_importer_cache['/home/nick/tmp']
>>> # now when I try to find the spec for "my_package", the cache will not find it and the
>>> # import mechanism will start.
>>> pf.find_spec('my_package')
ModuleSpec(name='my_package', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7ff3f060fcf8>, origin='/home/nick/tmp/my_package/__init__.py', submodule_search_locations=['/home/nick/tmp/my_package'])
>>> # found it and the cache has been updated
>>> '/home/nick/tmp' in sys.path_importer_cache.keys()
True
>>> # Let's prevent it from locating our package!
>>> del sys.path_importer_cache['/home/nick/tmp']
>>> sys.path
['', '/usr/local/lib/python36.zip', '/usr/local/lib/python3.6', '/usr/local/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/site-packages']
>>> del sys.path[0] # the first element, '' (empty string), stands for the current working directory
>>> # We have removed all potential references of our current working directory. Lets see now
>>> pf.find_spec('my_package') # yeah! None returned!
>>> # Lets add it back
>>> sys.path.append('/home/nick/tmp') # our cwd, so for finding my_package this has the same effect as putting '' back
>>> pf.find_spec('my_package')
ModuleSpec(name='my_package', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7ff3f060fcf8>, origin='/home/nick/tmp/my_package/__init__.py', submodule_search_locations=['/home/nick/tmp/my_package'])
4. Loaders
The hard part is finding the spec. Once the spec is found, things are quite easy. The import system reference in the Python documentation contains pseudo code of the loading process.
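In a sense, the loader uses the spec to create the module: the import machinery calls the loader's create_module(spec). Most of the time this returns None, and the module is created using the default semantics. What does that mean exactly? It means that a plain module object is created, _init_module_attrs runs, and the module object gets filled with the appropriate attributes (__name__, __loader__, __package__, __spec__ and, where applicable, __path__, __file__ and __cached__).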
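importlib.util.module_from_spec performs exactly this step: it calls the loader's create_module and, when that returns None, creates a plain module and fills in its attributes. A small sketch using the standard library's json.decoder; note that the new module is neither executed nor added to sys.modules:

```python
import importlib.util

spec = importlib.util.find_spec('json.decoder')   # a plain source module

# For source files the loader declines to create the module itself...
print(spec.loader.create_module(spec))            # None

# ...so module_from_spec creates a plain module and fills in its attributes.
mod = importlib.util.module_from_spec(spec)
print(mod.__name__, mod.__spec__ is spec, mod.__loader__ is spec.loader)   # json.decoder True True
print(mod.__file__ == spec.origin)                # True

# The module body has not run yet, so none of its definitions exist.
print(hasattr(mod, 'JSONDecoder'))                # False
```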
Continuing, after the module is created, it has to be added to sys.modules (recall this dict: it's the first cache that gets queried when an import statement is encountered) and finally get executed. Done! After a long journey, your module can now be used by the programmer: you!
Have you noticed the secret sauce when you import a package or module? It's exactly this line (inside the exec_module method of the _LoaderBasics class, from which SourceFileLoader inherits):
_bootstrap._call_with_frames_removed(exec, code, module.__dict__)
The module's code (the compiled python file) gets executed using the (sometimes evil) exec builtin function, with the module's __dict__ as its namespace. Since the code has been executed, all your python definitions are now at your fingertips, as attributes of the module.
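The same effect can be reproduced by hand with compile and exec. A sketch, assuming a hypothetical module name, my_module_demo, and a variant of MyClass whose caps returns the string instead of printing it:

```python
import types

source = (
    "class MyClass:\n"
    "    def caps(self, word):\n"
    "        return word.upper()\n"
)

# A bare module object, standing in for what module_from_spec would create.
mod = types.ModuleType('my_module_demo')   # hypothetical module name

# Compile the source and run it with the module's namespace as globals,
# which is essentially what the loader's exec_module does with exec().
code = compile(source, '<my_module_demo>', 'exec')
exec(code, mod.__dict__)

print(mod.MyClass().caps('hello'))                                # HELLO
print(sorted(k for k in vars(mod) if not k.startswith('__')))    # ['MyClass']
```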
Here's an explanation from StackOverflow of the differences between eval, exec and compile. Very descriptive!
Let's look at an example, a complete example this time:
The (pure) manual way
>>> import sys
>>> import types
>>>
>>> # get our lovely finder
>>> pf = sys.meta_path[-1]
>>> pf
<class '_frozen_importlib_external.PathFinder'>
>>>
>>> # get the spec for "my_package.my_module", path is given as the place to search
>>> spec = pf.find_spec('my_package.my_module', path=['/home/nick/tmp/my_package'])
>>> spec
ModuleSpec(name='my_package.my_module', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f9117dfee48>, origin='/home/nick/tmp/my_package/my_module.py')
>>>
>>> sys.path_importer_cache
{'/usr/local/lib/python36.zip': None,
'/usr/local/lib/python3.6': FileFinder('/usr/local/lib/python3.6'),
# ... other locations here
'/home/nick/tmp': FileFinder('/home/nick/tmp'),
'/home/nick/tmp/my_package': FileFinder('/home/nick/tmp/my_package')
}
>>> # ~/tmp/my_package was added to the cache along with its associated FileFinder
>>>
>>> mod = spec.loader.create_module(spec)
>>> mod is None
True
>>> # Hmm... the loader did not create it, let's do it manually
>>> mod = types.ModuleType(spec.name) # spec.name == 'my_package.my_module'
>>> mod
<module 'my_package.my_module'>
>>> # module has been created. At this time we should fill it with proper attributes
>>> # but it works without filling it too. I'll fill it.
>>> mod.__name__ # already equals spec.name
'my_package.my_module'
>>> mod.__loader__ = spec.loader
>>> mod.__file__ = spec.origin
>>> mod.__package__ = spec.parent
>>> sys.modules[spec.name] = mod # add it to cache
>>> mod
<module 'my_package.my_module' from '/home/nick/tmp/my_package/my_module.py'>
>>> # the printed version of mod has changed
>>> dir(mod)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']
>>>
>>> # final step. execute the module
>>> spec.loader.exec_module(mod) # returns None
>>> dir(mod)
['MyClass', '__builtins__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']
>>>
>>> # Wow! Look at that! MyClass. Lets use it...
>>> obj = mod.MyClass()
init called
>>> obj.caps('it worked!')
IT WORKED!
The above all-in manual way of importing our my_module.py module was a bare implementation of the module loading pseudo code. By now, you should be better able to debug any ModuleNotFoundError or ImportError. Of course, in most cases you will not have to dive as deep as we did here, but at least you now know far more than before. In addition to all the above, the import statement does a lot more while loading a module, but that is an implementation detail best left to the core developers.
Finally, here are some nice sources if you want to learn more or just enjoy some reading and watching:
- David Beazley's video on Modules and Packages: Live and Let Die! - PyCon 2015
- Be careful with exec and eval in Python article by Armin Ronacher (the creator of Flask)
- Another article by Armin Ronacher Dealing with the Python Import Blackbox