Introduction
If you have landed on this page, you must have had one of only two possible reactions to the title of this post: either "Hmm, this sounds interesting" or "Just, why?". The straight answer is, well, "just, because". A slightly more articulate answer is: because people in the first category probably enjoy this kind of thing :).
Reactions aside, the subject of this post is how to code an extension for Python in pure Assembly for the Intel x86-64 architecture on a Linux-based operating system. If you are familiar with assembly in general but have never coded for the architecture that we are targeting, it is perhaps worth reading through my previous post "Getting Started with x86-64 Assembly on Linux".
I will also assume that you are somewhat familiar with extending Python with C. If not, it is probably a good idea to go through the official documentation before reading on, or some things might not make much sense. This post proceeds by example and builds on knowledge of C to transition to Assembly. My favourite assembler on Linux is NASM, since it supports the Intel syntax, the one that I am more comfortable with. The only dependencies for following along are therefore the NASM assembler and the GNU linker ld. Optionally, we can make use of a Makefile to assemble and link our code, and perhaps docker to test it in a clean environment. You will find all the relevant files in the linked GitHub repository.
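On a Debian-based system, the dependencies can typically be installed with something along these lines (package names are for Ubuntu; binutils provides ld and is most likely already installed):
sudo apt-get install nasm binutils make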
Now it's time to jump straight into the code.
The Code
There isn't much more to say before we see the code, so here it is. This is the content of my asm.asm source file.
DEFAULT rel
%include "asm/python.inc"
GLOBAL PyInit_asm:function
;; ---------------------------------------------------------------------------
SECTION .rodata
;; ---------------------------------------------------------------------------
l_sayit_name db "sayit", 0
l_sayit_doc db "This method has something important to say.", 0
l_sayit_msg db "Assembly is great fun! :)", 10
l_sayit_msg_len equ $ - l_sayit_msg
l_module_name db "asm", 0
;; ---------------------------------------------------------------------------
SECTION .data
;; ---------------------------------------------------------------------------
l_asm_methods: ;; struct PyMethodDef[] *
ISTRUC PyMethodDef
at PyMethodDef.ml_name , dq l_sayit_name
at PyMethodDef.ml_meth , dq asm_sayit
at PyMethodDef.ml_flags , dq METH_NOARGS
at PyMethodDef.ml_doc , dq l_sayit_doc
IEND
NullMethodDef
l_asm_module: ;; struct PyModuleDef *
ISTRUC PyModuleDef
at PyModuleDef.m_base , PyModuleDef_HEAD_INIT
at PyModuleDef.m_name , dq l_module_name
at PyModuleDef.m_doc , dq NULL
at PyModuleDef.m_size , dq -1
at PyModuleDef.m_methods , dq l_asm_methods
at PyModuleDef.m_slots , dq NULL
at PyModuleDef.m_traverse , dq NULL
at PyModuleDef.m_clear , dq 0
at PyModuleDef.m_free , dq NULL
IEND
;; ---------------------------------------------------------------------------
SECTION .text
;; ---------------------------------------------------------------------------
asm_sayit: ;; ----------------------------------------------------------------
push rbp
mov rbp, rsp
mov rax, 1 ; SYS_WRITE
mov rdi, 1 ; STDOUT
mov rsi, l_sayit_msg
mov rdx, l_sayit_msg_len
syscall
mov rax, Py_None
inc QWORD [rax + PyObject.ob_refcnt]
pop rbp
ret
;; end asm_sayit
PyInit_asm: ;; --------------------------------------------------------------
push rbp
mov rbp, rsp
mov rsi, PYTHON_API_VERSION
mov rdi, l_asm_module
call PyModule_Create2 WRT ..plt
pop rbp
ret
;; end PyInit_asm
If you have never written a C extension for Python before, this might look a bit mysterious, although the general structure, at least, should be quite clear once you have glanced through the official Python documentation on extending Python with C.
We shall now analyse every part of the above code sample in detail to see what each block does.
Shared Object
On the very first line of the source we see the line
DEFAULT rel
Our goal is to assemble and link our code into an ELF64 shared object file. Unlike ordinary program code, shared object files are dynamically loaded at memory addresses that are not known in advance. It is therefore important that all our code is position-independent. One way of achieving this is to make sure that no memory reference is absolute, but rather relative to the value of the RIP register, which points to the current instruction being executed. This guarantees that, no matter where the shared object is loaded into memory, references to local variables are correct. In 64-bit mode, NASM defaults to absolute addresses, so the above line is necessary to switch to RIP-relative addressing.
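As a purely illustrative example (these instructions are not part of asm.asm), this is what the directive changes for a memory operand:
; with DEFAULT rel in effect, a memory reference like this one is RIP-relative
mov al, [l_sayit_msg]        ; equivalent to mov al, [rel l_sayit_msg]
; without the directive, NASM would default to an absolute address
mov al, [abs l_sayit_msg]    ; absolute addressing, unsuitable for a shared object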
The CPython Headers
Next comes the %include directive, which pulls the python.inc file into our main Assembly source. Given the simplicity of this example, we could have included the content of python.inc within asm.asm itself. However, for larger projects it is good practice to separate declarations from actual code, as is usually done in C with .h and .c files. In fact, the python.inc file contains the NASM equivalent of structures and macros declared in the CPython header files. As far as I'm aware, there are no assembly-specific include files provided by the maintainers of CPython, so we have to go through the extra effort of typing them ourselves. We will get back to the content of this file later on.
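To give an idea of what this file contains, here is a minimal sketch of the declarations that the rest of the code relies on; the actual python.inc in the repository is more complete, but the numeric values below are the ones found in the CPython 3.6 headers:
;; python.inc (minimal sketch)
%define NULL                0
%define METH_NOARGS         0x0004          ; from methodobject.h
%define PYTHON_API_VERSION  1013            ; from modsupport.h
%define Py_None             _Py_NoneStruct  ; Py_None is &_Py_NoneStruct in C

EXTERN _Py_NoneStruct
EXTERN PyModule_Create2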
Exporting Global Symbols
The GLOBAL directive is an important one. It exports the symbol PyInit_asm, of type function, and makes it available to external programs. This is the function that CPython calls the moment we load the asm module with import asm from the Python interpreter. If we do not export this symbol, then CPython won't be able to find the code necessary to initialise the module. In analogy with C, this is equivalent to declaring a non-static function.
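Once the shared object has been built (we will see how below), a quick way to check that the symbol is indeed exported is to list the dynamic symbol table:
nm -D asm/asm.so | grep PyInit
which should report PyInit_asm as a defined symbol of type T, i.e. living in the text section.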
Immutable Strings
Next we have the read-only data section
;; ---------------------------------------------------------------------------
SECTION .rodata
;; ---------------------------------------------------------------------------
l_sayit_name db "sayit", 0
l_sayit_doc db "This method has something important to say.", 0
l_sayit_msg db "Assembly is great fun! :)", 10
l_sayit_msg_len equ $ - l_sayit_msg
l_module_name db "asm", 0
Here we initialise the strings that we will need later on. As they won't change during execution, we put them in a read-only section of the shared object. The GNU C compiler does just the same thing with every literal string that you use in C code. You will notice references to their addresses in the following section, that of (read-write) initialised data.
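For comparison, the C counterpart of these definitions would be something along the lines of the following, which GCC would likewise place in .rodata:
/* note: a C string literal is implicitly NUL-terminated, whereas in the
   Assembly version only the strings that CPython reads as C strings get an
   explicit trailing 0; the message relies on its length being tracked instead */
static const char l_sayit_name[]  = "sayit";
static const char l_sayit_doc[]   = "This method has something important to say.";
static const char l_sayit_msg[]   = "Assembly is great fun! :)\n";
static const char l_module_name[] = "asm";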
CPython Data Structures
Next is the section of initialised data.
;; ---------------------------------------------------------------------------
SECTION .data
;; ---------------------------------------------------------------------------
l_asm_methods: ;; struct PyMethodDef[] *
ISTRUC PyMethodDef
at PyMethodDef.ml_name , dq l_sayit_name
at PyMethodDef.ml_meth , dq asm_sayit
at PyMethodDef.ml_flags , dq METH_NOARGS
at PyMethodDef.ml_doc , dq l_sayit_doc
IEND
NullMethodDef
l_asm_module: ;; struct PyModuleDef *
ISTRUC PyModuleDef
at PyModuleDef.m_base , PyModuleDef_HEAD_INIT
at PyModuleDef.m_name , dq l_module_name
at PyModuleDef.m_doc , dq NULL
at PyModuleDef.m_size , dq -1
at PyModuleDef.m_methods , dq l_asm_methods
at PyModuleDef.m_slots , dq NULL
at PyModuleDef.m_traverse , dq NULL
at PyModuleDef.m_clear , dq 0
at PyModuleDef.m_free , dq NULL
IEND
Here is where things start to get interesting, and the content of the python.inc file comes into play. The first two labels point to the beginning of CPython-specific structures. The first is an array of PyMethodDef structures. As the name suggests, each instance of this structure holds information about a method that should be made available to the Python interpreter from within our module. To find out in which header file it is defined, we can use the command
grep -nr /usr/include/python3.6 -e "struct PyMethodDef"
In my case, I get that the structure is defined in /usr/include/python3.6/methodobject.h, starting from line 54. Inside the python.inc file we then have the equivalent structure declaration
STRUC PyMethodDef
.ml_name resq 1 ; const char *
.ml_meth resq 1 ; PyCFunction
.ml_flags resq 1 ; int
.ml_doc resq 1 ; const char *
ENDSTRUC
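For reference, this is the C definition that the above mirrors, as found in methodobject.h:
struct PyMethodDef {
    const char  *ml_name;   /* The name of the built-in function/method */
    PyCFunction  ml_meth;   /* The C function that implements it */
    int          ml_flags;  /* Combination of METH_xxx flags */
    const char  *ml_doc;    /* The __doc__ attribute, or NULL */
};
typedef struct PyMethodDef PyMethodDef;
Note that ml_flags is a 4-byte int in C, but on x86-64 it is followed by 4 bytes of padding so that the ml_doc pointer is 8-byte aligned; reserving a full quadword for it in the NASM structure therefore reproduces the same layout.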
The NullMethodDef is a NASM macro that conveniently defines the sentinel PyMethodDef structure, which marks the end of the PyMethodDef array pointed to by l_asm_methods. Its definition is also in the python.inc file and, as you can see, it simply initialises a new instance of the structure with all the fields set to NULL or 0, depending on their semantics, i.e. whether they are memory pointers or plain integers.
%define NullMethodDef dq NULL, NULL, 0, NULL
Note that NULL is not a native NASM value. To align the coding conventions with C, I have defined NULL as a constant in python.inc and assigned it the value 0. The idea is that, as in C, it makes the intent of the code clearer, since any occurrence of NULL indicates a null pointer rather than just the literal value 0.
The next label, l_asm_module, points to an instance of the PyModuleDef structure, which is pretty much the core data structure of our Python module. It contains all the relevant metadata that is then passed to CPython for the correct initialisation and use of the module. Its definition is in the moduleobject.h header file and, at first sight, it looks a bit complicated, with references to other structures and C macros.
typedef struct PyModuleDef{
PyModuleDef_Base m_base;
const char* m_name;
const char* m_doc;
Py_ssize_t m_size;
PyMethodDef *m_methods;
struct PyModuleDef_Slot* m_slots;
traverseproc m_traverse;
inquiry m_clear;
freefunc m_free;
} PyModuleDef;
So let's take our time to figure out what its byte content looks like. The first field is an instance of the PyModuleDef_Base structure, which is defined in the same header file, just a few lines above. The non-trivial bit in this new structure is the first part, PyObject_HEAD, which looks like a C macro. As the name suggests, its definition is quite likely to be found in object.h. Indeed, there we find
#define PyObject_HEAD PyObject ob_base;
so our chase continues. The definition of the PyObject structure can be found a few lines below. Again, all the fields are quite simple, i.e. just integer values or memory pointers, except for the macro _PyObject_HEAD_EXTRA. We then have to jump back up a few lines to find that this macro is conditionally defined as either nothing or a couple of extra pointer fields. By default, the macro Py_TRACE_REFS is not defined, so in our case _PyObject_HEAD_EXTRA evaluates to nothing.
Backtracking from our macro chase through the CPython headers, we see that we can define the following structures in python.inc
STRUC PyObject
.ob_refcnt resq 1 ; Py_ssize_t
.ob_type resq 1 ; struct _typeobject *
ENDSTRUC
STRUC PyModuleDef_Base
.ob_base resb PyObject_size
.m_init resq 1 ; PyObject *
.m_index resq 1 ; Py_ssize_t
.m_copy resq 1 ; PyObject *
ENDSTRUC
STRUC PyModuleDef
.m_base resb PyModuleDef_Base_size
.m_name resq 1 ; const char *
.m_doc resq 1 ; const char *
.m_size resq 1 ; Py_ssize_t
.m_methods resq 1 ; PyMethodDef *
.m_slots resq 1 ; struct PyModuleDef_Slot *
.m_traverse resq 1 ; traverseproc
.m_clear resq 1 ; inquiry
.m_free resq 1 ; freefunc
ENDSTRUC
As you can easily guess, NASM automatically generates the constants PyObject_size, PyModuleDef_Base_size, etc., so that they can be used to reserve enough memory to hold an entire structure within the definition of another. This makes nesting quite easy to implement in NASM.
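One macro used in the data section that we haven't dissected yet is PyModuleDef_HEAD_INIT. In C it initialises the PyModuleDef_Base part of the structure with a reference count of 1, a NULL type and NULL/0 for the remaining fields, so the NASM counterpart in python.inc can be expected to look more or less like this sketch:
%define PyModuleDef_HEAD_INIT dq 1, NULL, NULL, 0, NULL  ; ob_refcnt, ob_type, m_init, m_index, m_copy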
Local and Global Functions
Finally we get to the actual code that will get executed by CPython when our module is loaded and initialised, and when the methods that it provides are called.
;; ---------------------------------------------------------------------------
SECTION .text
;; ---------------------------------------------------------------------------
asm_sayit: ;; ----------------------------------------------------------------
push rbp
mov rbp, rsp
mov rax, 1 ; SYS_WRITE
mov rdi, 1 ; STDOUT
mov rsi, l_sayit_msg
mov rdx, l_sayit_msg_len
syscall
mov rax, Py_None
inc QWORD [rax + PyObject.ob_refcnt]
pop rbp
ret
;; end asm_sayit
PyInit_asm: ;; --------------------------------------------------------------
push rbp
mov rbp, rsp
mov rsi, PYTHON_API_VERSION
mov rdi, l_asm_module
call PyModule_Create2 WRT ..plt
pop rbp
ret
;; end PyInit_asm
In fact, we have a total of two functions, one local and one global. The first one, asm_sayit, is the only method contained in our module. All it does is write the string l_sayit_msg to standard output by invoking the SYS_WRITE system call. Perhaps the most interesting bit of this function is the pair of instructions involving Py_None, which form the idiom for any function that wishes to return None to Python. Recall that, in Python, None is an object instantiated by CPython. As such, our shared library needs to import it as an external symbol. This is why you will find the macro Py_None defined as
%define Py_None _Py_NoneStruct
together with the line
EXTERN _Py_NoneStruct
in the python.inc file. This is equivalent to the two lines
PyAPI_DATA(PyObject) _Py_NoneStruct; /* Don't use this directly */
#define Py_None (&_Py_NoneStruct)
in the object.h header file, where the None object is defined. All of this explains the mov instruction, but what about the inc that follows it? This has to do with reference counting. In a nutshell, every object created in Python comes with a counter that keeps track of all the references attached to it. When the counter drops to 0, the object can be de-allocated and its memory freed for other objects to use. This is how Python, which relies heavily on malloc, keeps memory leaks at bay. It is therefore very important to properly maintain reference counts in Python extensions. As None is a Python object like any other, when we return a reference to it we have to bump its reference count. In C, this is conveniently done with the Py_INCREF macro. Its definition is in object.h and, as is easy to guess, it just increments the ob_refcnt field of the PyObject structure. This is precisely what the inc instruction does.
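To tie the two worlds together, here is roughly what the same method would look like in C, with the output written through the raw write wrapper for the sake of a close comparison:
#include <Python.h>
#include <unistd.h>

static PyObject *
asm_sayit(PyObject *self, PyObject *args)
{
    /* what the SYS_WRITE syscall does in the Assembly version */
    write(STDOUT_FILENO, "Assembly is great fun! :)\n", 26);

    /* mov rax, Py_None  +  inc QWORD [rax + PyObject.ob_refcnt] */
    Py_INCREF(Py_None);
    return Py_None;
}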
Stack Frames Matter! You might be wondering why we take care of creating a stack frame on function entry and of cleaning up after ourselves on exit. The reason is a pretty obvious one: we don't know what code will call ours, so it is safest to preserve stack alignment across calls by doing what every function is expected to do. Recall that the System V ABI requires the stack pointer to be 16-byte aligned at every call instruction, which leaves it misaligned by the 8 bytes of the return address on function entry; the push rbp restores the 16-byte alignment. When I was laying down the code for this post I was getting a SIGSEGV, and the debugger revealed that a movaps instruction was trying to store the value of the xmm0 register at a memory location that was not a multiple of 16. The problem was solved by the extra 8 bytes from push rbp.
The second and last function is our global exported symbol PyInit_asm. It gets called by CPython as soon as we import the module with import asm. In this simple case, we don't have to do much here. In fact, all we have to do is call a standard CPython function and pass it the instance of PyModuleDef allocated at l_asm_module. As we have briefly seen, this contains all the information about our module, from the documentation to the list of methods.
Now, if you have read through the official documentation on how to extend Python with C, you might be wondering why we are calling PyModule_Create2 instead of PyModule_Create (is there a typo?), and why we are passing it two arguments instead of one. If you are starting to smell a C macro, then you are correct! Long story short, PyModule_Create is a macro defined as
#define PyModule_Create(module) PyModule_Create2(module, PYTHON_API_VERSION)
with PYTHON_API_VERSION defined as the literal 1013. So the actual function to call is indeed PyModule_Create2, and it takes two arguments.
Did you notice that weird WRT ..plt? Remember the discussion about ensuring position-independent code? Since we have no clue where the PyModule_Create2 function will reside in memory, we have to rely on some sort of indirection. This is provided by the so-called Procedure Linkage Table, or PLT for short, which is a piece of code that is part of our shared library. When we call PyModule_Create2 WRT ..plt, we jump into the PLT section of our object file in memory, which contains the code needed to make the actual jump to the function that we want to call.
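If you are curious, once the shared object is linked you can see the indirection at work by dumping its relocations with
readelf -r asm/asm.so
and looking for the R_X86_64_JUMP_SLOT entry for PyModule_Create2 in the .rela.plt section.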
Installation
Once our assembly code is ready, it needs to be assembled and linked into a shared object file. We will now see how to perform these steps, and how to test and install our Python extension.
Assembling and Linking
The NASM assembler is invoked with minimal arguments as
nasm -f elf64 -o asm/asm.o asm/asm.asm
This creates an intermediate object file, asm.o. To create the final shared object file, we use the GNU linker with the following arguments
ld -shared -o asm/asm.so asm/asm.o -I/lib64/ld-linux-x86-64.so.2
Note the use of the -shared switch, which instructs the linker to create a shared object file.
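The Makefile mentioned in the introduction can simply wrap these two commands. The one in the repository might differ, but a minimal sketch with a default rule would look like this (recipe lines must be indented with a tab):
default: asm/asm.so

asm/asm.o: asm/asm.asm asm/python.inc
	nasm -f elf64 -o $@ $<

asm/asm.so: asm/asm.o
	ld -shared -o $@ $< -I/lib64/ld-linux-x86-64.so.2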
How to Test the Module
The first thing that you might want to do is to manually test that the shared object file works fine with Python. For CPython to be able to find the module, we need to ensure that its location is included in the search path. One way is to add it to the PYTHONPATH environment variable. For example, from within the project folder, we can launch Python with
PYTHONPATH=./asm python3
and from the interactive session we should be able to import the module with
>>> import asm
>>>
Alternatively, we can add the search path to sys.path with these few lines of Python code
>>> import sys
>>> sys.path.append("./asm")
>>> import asm
>>>
Once we have successfully imported our module in Python, we can test that its method sayit works as expected
>>> asm.sayit.__doc__
'This method has something important to say.'
>>> asm.sayit()
Assembly is great fun! :)
>>>
I hope you agree :).
Distributing the Module
The simplicity of our sample module wouldn't justify the use of setuptools for distribution. In this case, a plain, old-fashioned Makefile is the simplest solution to go for. Even for larger projects, you would probably still delegate the build of your code to a Makefile, which would then get called from your setup.py at some phase, perhaps during build. Bear in mind, though, that the recommended standard nowadays is to build wheels rather than eggs, which means shipping pre-built binaries with your package.
This being said, let's see how to distribute the module. As we have seen in the previous section, the shared object needs to reside in one of the Python search paths. The easiest way to find out what these paths are on the platform that you are targeting is to launch the Python interpreter and print the value of sys.path. On my platform, I get the following output
Python 3.6.3 (default, Oct 3 2017, 21:45:48)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.6/dist-packages']
>>>
The Makefile could then contain the following line inside the install rule:
install: default
cp asm/asm.so /usr/lib/python${PYTHON_TARGET}/asm.so
with the environment variable PYTHON_TARGET set to 3.6.
To automate the building and testing of the module, we could use Docker to build an image of the target platform, trigger a build and perhaps execute some unit tests too. A simple Dockerfile that does the minimum work to build and test would look something like the following
FROM ubuntu:latest
USER root
ADD . asm
RUN apt-get update &&\
apt-get install -y \
nasm \
python3-pytest \
build-essential
ENV PYTHON_TARGET=3.5
RUN cd asm &&\
make &&\
make install &&\
python3 -m pytest -s
As you can see, we are targeting the latest stable version of Ubuntu, which comes with python3.5. We make sure we install all the required dependencies, namely the assembler and the standard build tools, along with python3-pytest to perform unit testing once our module builds successfully.
The bare minimum that we can test is that the import of the module works fine and that we can call its method. So a possible test_asm.py test script would look like
import asm
def test_asm():
asm.sayit()
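If we also want to check what the method actually prints, pytest's capfd fixture is the right tool: since sayit writes straight to file descriptor 1 via a system call, bypassing sys.stdout, only capturing at the file-descriptor level will see the output. A possible extra test (the function name is of course arbitrary) would be
import asm

def test_sayit_output(capfd):
    # sayit() writes to fd 1 directly, so capture at the file-descriptor level
    asm.sayit()
    out, _ = capfd.readouterr()
    assert "Assembly is great fun!" in out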
Conclusions
Whilst I appreciate that the cases where you'd want to seriously consider extending Python with Assembly code are rare, it is undoubtedly true that, if you enjoy experimenting with code, this can be a fun and instructive experience. In my case, it forced me to dig into the CPython header files, which I probably wouldn't have done had I been using C. I now know more about the internal workings of Python and have a clearer idea of how CPython is structured.
As always, I hope you have enjoyed the read. Happy Assembly coding! :)