Auto-generate a Python module that wraps a Nim programming language module.
The Pymod software consists of Nim bindings & Python scripts to automate the generation of Python C-API extension module boilerplate for Nim "procs" (procedures). After the Pymod script has been run, there will be an auto-generated, auto-compiled Python module that exposes the Nim procs in Python.
There's even a type (`PyArrayObject`) that provides a Nim interface to Numpy arrays, so you can pass Numpy arrays into your Nim procs from Python and access them natively.
The auto-generated C-API boilerplate code handles the parsing & type-checking of the function arguments passed from Python, including correct handling of Python ref-counts if a type error occurs or an exception is raised. The boilerplate code also translates Nim exceptions (including back-traces) to Python exceptions. The boilerplate code even includes auto-generated Python docstrings that have been extracted from the Nim procs.
Pymod is still in development and far from feature-complete, but we have been using it regularly in our own work for about 9 months now. There's a lot of hacky code in there, but it gets the job done.
- Motivation
- Nim
- Example
- Usage
- System requirements
- Per-project configuration
- Procedure parameter & return types
- Docstrings
- PyArrayObject type
- PyArrayIter types
- PyArrayObject & PyArrayIter usage example
- PyArrayIter loop idioms
- Tips, warnings & gotchas
- What about calling Python from Nim?
- Implementation details
Perhaps you have a large body of existing Python code that you can't or don't want to rewrite. Perhaps you want to use Numpy, Scipy or Matplotlib. Perhaps your program's main loop simply must be in Python.
However, you would like to write Nim code and then call your Nim procs from your Python code. (There are many more systems written in Python than in Nim, but it would be great to start extending them in Nim!) You can write Python C-API extension modules to wrap your Nim procs, but all the C-API boilerplate is a huge drag, especially if you check types and manage reference counts and handle Nim exceptions properly.
That's what Pymod is for.
If you'd like to learn more about the Nim programming language, we recommend:
- the official Nim tutorial, part 1 (followed by part 2)
- the very approachable Nim by Example, which offers a series of short, simple lessons about the main Nim features
- the comprehensive Nim manual
- the Python-vs-Nim feature comparison table at Nim for Python Programmers
Here's a short "Hello world" example (assumed to be in a file called `greeting.nim`):

    ## Compile this Nim module using the following command:
    ##   python path/to/pmgen.py greeting.nim

    import strutils  # `%` operator

    import pymod
    import pymodpkg/docstrings

    proc greet*(audience: string): string {.exportpy.} =
      docstring"""Greet the specified audience with a familiar greeting.

      The string returned will be a greeting directed specifically at that audience.
      """
      return "Hello, $1!" % audience

    initPyModule("hw", greet)
Use the Python script `pmgen.py` to auto-generate & compile the boilerplate code:

    python path/to/pmgen.py greeting.nim

There will now be a compiled Python extension module `hw.so` in the current directory. (It is called `hw` because that is the name that was specified in the `initPyModule()` macro.)

In a Python interpreter, you can import the module and invoke the `greet` function:

    >>> import hw
    >>> hw.greet
    <built-in function greet>
    >>> hw.greet("World")
    'Hello, World!'
    >>>
You can also call the Python interpreter's built-in `help` function on the `greet` function:

    >>> help(hw.greet)
    Help on built-in function greet in module hw:

    greet(...)
        greet(audience: str) -> (str)

        Parameters
        ----------
        audience : str -> string

        Returns
        -------
        out : (str) <- (string)

        Greet the specified audience with a familiar greeting.

        The string returned will be a greeting directed specifically at that audience.
There is additional example code in the examples directory.
Using Pymod is a 4-step process. In brief:
- `import pymod` at the top of your Nim module.
- Add the `{.exportpy.}` pragma after each Nim proc.
- Invoke the `initPyModule("modname", proc1, proc2, proc3)` macro at the bottom of your Nim module.
- Run the `pmgen.py` Python script to compile everything.
In more detail:
- At the top of your Nim module, import the module `pymod`.
  - Tip: You might additionally wish to import `pymodpkg/docstrings` (to enable Python-like docstrings) and/or `pymodpkg/pyarrayobject` (to enable the `PyArrayObject` type that corresponds to Numpy's array type).
- In your Nim module, annotate each Nim proc that should be exported to Python with the `{.exportpy.}` pragma. (This pragma is named by analogy with the standard Nim `{.exportc.}` pragma.)
- At the end of your Nim module, configure the Python module to be generated, using the `initPyModule()` macro: Specify the desired Python module name as a string (without a filename suffix), followed by the names of the Nim procs that should be compiled into the Python module.
  - For example, if you supply the string `"foo"` as the first argument to `initPyModule()`, the generated Python module will be called `foo.so`.
  - Tip: You can use the `initPyModule()` macro multiple times at the end of your Nim module, with different Python module names & different combinations of Nim procs, to generate multiple Python modules.
  - Tip: If you specify the empty string `""` as the Python module name, the generated Python module will have the same name as the Nim module, but with an underscore `"_"` prepended. So for example, a Nim module `bar.nim` would be compiled to a Python module `_bar.so`.
- Invoke the supplied Python script `pmgen.py`, supplying the filename of your Nim module as a command-line argument, to auto-generate & invoke a set of Makefiles that will in turn initiate & run the Pymod process.
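As a rough illustration of the module-naming rule described in the steps above, here is a minimal Python sketch. The function `python_module_filename` is a hypothetical helper written for this README, not part of Pymod, and it simply assumes the `.so` suffix used in the examples here.

```python
import os


def python_module_filename(nim_path, module_name):
    """Model the naming rule described above (hypothetical helper, not Pymod code).

    A non-empty module name is used as-is; an empty name falls back to the
    Nim module's base name with an underscore prepended.
    """
    nim_stem = os.path.splitext(os.path.basename(nim_path))[0]
    name = module_name if module_name else "_" + nim_stem
    return name + ".so"


print(python_module_filename("greeting.nim", "hw"))
print(python_module_filename("bar.nim", ""))
```

The first call models `initPyModule("hw", greet)` in `greeting.nim`; the second models an empty module name in `bar.nim`.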
When the script `pmgen.py` is run, it will create a subdirectory `pmgen` in the current directory. All the source code auto-generated by Pymod will be placed into this subdirectory and compiled. At the end of the compilation process, the new Python module, a `.so` (shared object) file, will be moved back into the current directory.
Note that the `{.exportpy.}` pragma & `initPyModule()` macro are inert by default (that is, they have no effect), so you can add them to existing Nim code without changing the default operation of that Nim code. It's only when you run the script `pmgen.py` (which, among other actions, supplies the switch `--define:pmgen` to the Nim compiler) that the `{.exportpy.}` pragma & `initPyModule()` macro are activated.
- The latest Nim compiler from Github
  - either the `master` or `devel` branches
  - but not the recent Nim 0.12.0 release. :(
    - (A suggested fix for this Nim 0.12.0 packaging problem has been proposed on the Nim forums.)
- CPython 2.7 or CPython 3.2+
- Python C development header files & static library
- Numpy
- Numpy C development header files
- Make
- a C compiler for use by Nim
If there is a file `pymod.cfg` in the same directory as the Nim module you want to wrap, Pymod will read this as a configuration file for that project.
By default, Pymod runs the Nim compiler in non-release mode, and additionally performs per-dereference bounds-checking of the `PyArrayObject` iterators. This is safe (and catches all sorts of pesky bugs!) but slow.
If the file `pymod.cfg` in the current directory contains the following directives:

    [all]
    nimSetIsRelease: true

then the Nim compiler will be invoked in release mode, and bounds-checking of the `PyArrayObject` iterators will be switched off. Your code will now run much faster!
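If you want to check from your own build scripts whether a project is configured for release mode, a sketch along the following lines may help. It assumes that `pymod.cfg` can be read as an INI-style file by Python's standard `configparser` module; Pymod's own parsing may differ, and the helper `is_release_mode` is hypothetical.

```python
import configparser

# Sample configuration text, matching the directives shown above.
SAMPLE_CFG = """\
[all]
nimSetIsRelease: true
"""


def is_release_mode(cfg_text):
    """Return True if the [all] section sets nimSetIsRelease to a true value.

    Hypothetical helper: this is not how Pymod itself reads the file.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # Keep key case; configparser lowercases keys by default.
    parser.read_string(cfg_text)
    return parser.getboolean("all", "nimSetIsRelease", fallback=False)


print(is_release_mode(SAMPLE_CFG))
print(is_release_mode(""))  # No directives at all: treated as non-release mode.
```

Falling back to `False` mirrors the default described above, where the absence of the directive means non-release mode with bounds-checking enabled.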
The following Nim types are currently supported by Pymod:
Type family | Nim types | Python2 type | Python3 type |
---|---|---|---|
floating-point | `float`, `float32`, `float64`, `cfloat`, `cdouble` | `float` | `float` |
signed integer | `int`, `int16`, `int32`, `int64`, `cshort`, `cint`, `clong` | `int` | `int` |
unsigned integer | `uint`, `uint8`, `uint16`, `uint32`, `uint64`, `cushort`, `cuint`, `culong`, `byte` | `int` | `int` |
non-unicode character | `char`, `cchar` | `str` | `bytes` |
string | `string` | `str` | `str` |
Numpy array | `ptr PyArrayObject` | `numpy.ndarray` | `numpy.ndarray` |
Support for the following Nim types is in development:
Type family | Nim types | Python2 type | Python3 type |
---|---|---|---|
signed integer | `int8` | `int` | `int` |
boolean | `bool` | `bool` | `bool` |
unicode code point (character) | `unicode.Rune` | `unicode` | `str` |
non-unicode character sequence | `seq[char]` | `str` | `bytes` |
unicode code point sequence | `seq[unicode.Rune]` | `unicode` | `str` |
sequence of a single type T | `seq[T]` | `list` | `list` |
Procedure parameters may be any of the above supported Nim types.
Default parameters are supported to a limited extent, although the parameter type must be specified explicitly, and is currently restricted to the `string`, integer & floating-point types.
Procedure return values may be any of the above supported Nim types or a Nim tuple of any of these types. Nested tuples are currently not supported. By default, named tuples in Nim are returned as raw tuples to Python:

    # Nim                      # Python
    tuple[ a, b: int ]   =>    (a_value, b_value)

If `{.exportpy.}` is specified as `{.exportpy returnDict.}` then the generated code will instead return a Python dict containing the named properties:

    # Nim                      # Python
    tuple[ a, b: int ]   =>    { "a": a_value, "b": b_value }
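To show what this difference looks like from the calling side, here is a small pure-Python sketch. The two functions are hypothetical stand-ins for a Pymod-wrapped Nim proc that returns `tuple[ a, b: int ]`; they are not generated by Pymod.

```python
def min_max_as_tuple(values):
    # Stand-in for a proc exported with plain {.exportpy.}:
    # the named Nim tuple arrives in Python as a raw tuple.
    return (min(values), max(values))


def min_max_as_dict(values):
    # Stand-in for the same proc exported with {.exportpy returnDict.}:
    # the field names "a" and "b" become dict keys.
    return {"a": min(values), "b": max(values)}


# Raw tuple: the caller relies on position.
a, b = min_max_as_tuple([3, 1, 2])
print(a, b)

# Dict: the caller relies on the Nim field names.
result = min_max_as_dict([3, 1, 2])
print(result["a"], result["b"])
```

The dict form costs a little more per call but keeps the Python code readable when a proc returns several values of the same type.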
You can tell Pymod about additional Nim types using the `definePyObjectType()` macro. This will include your additional type-mapping in Pymod's type-mapping registry, similar to how Pymod maps its own `PyArrayObject` type to Numpy's array type. This provides Pymod with a mapping from an already-defined Nim type to the corresponding Python & C-API types, enabling Pymod to generate the Nim-Python conversions & type-checking boilerplate for additional types.
Pymod will also auto-generate a Python docstring for each function in the
extension module, specifying the function's parameter types & return type,
based upon the parameter types & return types of the exported Nim proc.
You can embed additional documentation in each Nim proc you want to export, using the supplied `docstring"""Text goes here."""` string type. Any docstrings in the proc will be extracted automatically and included in the generated Python docstring. There is an example of docstring usage in the code sample above.
Pymod provides the `PyArrayObject` type to allow Python code to pass Numpy ndarrays into Nim procs with appropriate type-safety. To access the `PyArrayObject` Nim type definition, import `pymodpkg/pyarrayobject` in your Nim module (after you have already imported `pymod`).
Because the Numpy array object was allocated in Python, the type of the Nim proc parameter or return value is `ptr PyArrayObject`. Note that it is a Nim `ptr`, not a Nim `ref`. Your code should pass around `ptr PyArrayObject`.
Pymod also wraps many C functions from the Numpy C-API for Numpy array manipulation & attribute access. To review the full list of `PyArrayObject` procs that Pymod provides, browse the Pymod source file `pymodpkg/pyarrayobject.nim`.
Here are some of the Numpy array attributes that Pymod exposes:
- `.data` (returns `pointer`)
- `.data(T)` (returns `ptr T`)
- `.descr`
- `.dimensions`
- `.dtype`
- `.nd`
- `.ndim` (an alias for `.nd`)
- `.shape` (an alias for `.dimensions`)
- `.strides`
Here are some of the Numpy functions for array creation & manipulation that Pymod wraps:
- `createSimpleNew(dims, npType)`
- `createNewCopyNewData(oldArray, order)`
- `copy(oldArray)` (an alias for `createNewCopyNewData`)
- `createAsTypeNewData(oldArray, newType)`
- `doCopyInto(destArray, srcArray)`
- `doFILLWBYTE(destArray, val)`
- `doResizeDataInplace(oldArray, newShape, doRefCheck)`
Note that, due to its typeless Pythonic origin, `PyArrayObject` is not a Nim generic type. So the element data-type of a `PyArrayObject` instance is unknown to Nim. The Nim code must specify the correct element data-type for the `PyArrayObject` elements. The preferred method of accessing the (appropriately-typed) elements of a `PyArrayObject` instance is to use one of the two supplied `PyArrayIter` types:
- `PyArrayForwardIter[T]`, a C++-style Forward Iterator
  - returned by `.iterateFlat(T)`
  - can only be incremented & dereferenced
  - the fastest & safest iteration style
- `PyArrayRandAccIter[T]`, a C++-style Random-Access Iterator
  - returned by `.accessFlat(T)`
  - can be incremented or decremented by any integer; offset (using `+` or `-`) by any integer; indexed by any integer; & dereferenced
  - basically a C pointer with bounds-checking
Both of the `PyArrayIter` types offer 1-D iteration & indexing over a "flat" interpretation of the Numpy N-D array data. These two iterator types are inspired by the C++ iterator category model.
By default, the iterators implement per-dereference bounds-checking.
This bounds-checking can be disabled, as described above in the section
Per-project configuration.
Note that the `PyArrayIter` types can't handle any of the following usage scenarios:
- non-C-contiguous array data
- strides
- multi-dimensional indexing
If you attempt to iterate over a Numpy array with non-C-contiguous data, an `AssertionError` will be raised (even in release mode). If you supply the incorrect array element data-type when invoking `.iterateFlat(T)` or `.accessFlat(T)`, an `ObjectConversionError` will be raised (even in release mode).
Here is a simple example of how to use `PyArrayObject` & `PyArrayForwardIter[T]`:

    import strutils  # `%`
    import pymod
    import pymodpkg/docstrings
    import pymodpkg/pyarrayobject

    proc addVal*(arr: ptr PyArrayObject, val: int32) {.exportpy} =
      docstring"""Add `val` to each element in the supplied Numpy array.

      The array is assumed to have dtype `int32`; otherwise, a ValueError will be
      raised.  The elements in the array will be modified in-place.
      """
      let dt = arr.dtype
      echo "PyArrayObject has shape $1 and dtype $2" % [$arr.shape, $dt]
      if dt == np_int32:
        let bounds = arr.getBounds(int32)  # Iterator bounds
        var iter = arr.iterateFlat(int32)  # Forward iterator
        while iter in bounds:
          iter[] += val
          inc(iter)  # Increment the iterator manually.
      else:
        let msg = "expected array of dtype $1, received dtype $2" % [$np_int32, $dt]
        raise newException(ValueError, msg)

    initPyModule("_myModule", addVal)
You can test the Pymod-wrapped Nim proc `addVal` using a Python script like this:

    import numpy as np
    import _myModule as mm

    int32arr = np.arange(10, dtype=np.int32).reshape((2, 5))
    print(int32arr)
    mm.addVal(int32arr, 101)
    print(int32arr)

    print("")

    float32arr = np.arange(10, dtype=np.float32).reshape((2, 5))
    print(float32arr)
    mm.addVal(float32arr, 101)  # Uh-oh! Our `addVal` proc wants an array with dtype == `np.int32`!
    print(float32arr)
The output from running this script will look something like this:

    [[0 1 2 3 4]
     [5 6 7 8 9]]
    PyArrayObject has shape @[2, 5] and dtype numpy.int32
    [[101 102 103 104 105]
     [106 107 108 109 110]]

    [[ 0.  1.  2.  3.  4.]
     [ 5.  6.  7.  8.  9.]]
    PyArrayObject has shape @[2, 5] and dtype numpy.float32
    Traceback (most recent call last):
      File "test_addvalmod.py", line 13, in <module>
        mm.addVal(float32arr, 101)  # Uh-oh! Our `addVal` proc wants an array with dtype == `np.int32`!
    ValueError: expected array of dtype numpy.int32, received dtype numpy.float32
    Nim traceback (most recent call last):
      File "pmgen_myModule_wrap.nim", line 26, in exportpy_addVal
      File "addvalmod.nim", line 22, in addVal
Observe the `while` loop that was used in `addVal` to iterate over the array. This is the most flexible loop idiom for forward-iterating over an array, since you are able to control where, and how many times, the forward iterator will be incremented within the body of the loop:

    let bounds = arr.getBounds(int32)  # Iterator bounds
    var iter = arr.iterateFlat(int32)  # Forward iterator
    while iter in bounds:
      iter[] += val
      inc(iter)  # Increment the iterator manually
However, this `while` loop idiom is sometimes more verbose than it needs to be. Often, you only need to increment the forward iterator once per iteration, at the end of the body of the loop. If this is all you need, there are 4 iterators that you can use, which enable shorter `for` loop idioms:
- `values(arr, T) -> T`
- `items(PyArrayIter[T]) -> T`
- `mitems(PyArrayIter[T]) -> var T`
- `iitems(PyArrayIter[T]) -> PyArrayIter[T]`
For example, if you need to modify the items in the array:

    for iter in iitems(arr.iterateFlat(int32)):  # `iitems(iter)` iterator yields a PyArrayIter[T].
      iter[] += val

or:

    for mval in mitems(arr.iterateFlat(int32)):  # `mitems(iter)` iterator yields a mutable value.
      mval += val
If you don't need to modify the array data at all, there are even shorter `for` loop idioms that yield a succession of (read-only) array values:

    var maxVal: int32 = low(int32)
    for v in arr.values(int32):  # This uses the `values(arr, T)` iterator.
      if v > maxVal:
        maxVal = v

or:

    var maxVal: int32 = low(int32)
    for v in arr.iterateFlat(int32):  # This uses the implicit `items(iter)` iterator.
      if v > maxVal:
        maxVal = v
Likewise for `PyArrayRandAccIter[T]`, there are several loop idioms, which offer different levels of control & convenience. For the most flexibility, use a `while` loop:

    let bounds = arr.getBounds(int32)  # Iterator bounds
    var iter = arr.accessFlat(int32)  # Random access iterator
    while iter in bounds:
      iter[0] += val  # or equivalently: iter[] += val
      inc(iter, incDelta)  # Increment the iterator manually
There are several `for` loop forms available for `PyArrayRandAccIter[T]`, to make it convenient to iterate over C-contiguous N-dimensional arrays. If you want to visit every iterator position in turn, but retain the ability to index/offset arbitrarily, use the 1-argument form of `accessFlat` with the `iitems()` iterator:

    for iter in arr.accessFlat(int32).iitems:
      iter[0] += val  # or equivalently: iter[] += val
If you want to modify mutable values, but you don't need arbitrary index or offset capability, you can use the `mitems()` iterator:

    for mval in arr.accessFlat(int32).mitems:
      mval += val
If you want to increment by a certain specific delta each time, use the 2-argument form of `accessFlat` with any of the `items()`, `mitems()` or `iitems()` iterators:

    for iter in arr.accessFlat(int32, incDelta).iitems:
      iter[] += val

or:

    for mval in arr.accessFlat(int32, incDelta).mitems:
      mval += val
And finally, if you want the iteration to begin at a certain initial offset, then increment by a certain specific delta each time, use the 3-argument form of `accessFlat`:

    for iter in arr.accessFlat(int32, initialOffset, incDelta).iitems:
      iter[] += val

or:

    for mval in arr.accessFlat(int32, initialOffset, incDelta).mitems:
      mval += val
For example, if you want to visit just the "green" channel of an RGB image, you might use a loop like this:

    let greenIdx = 1
    let numChans = img.shape[2]
    for g in img.accessFlat(uint8, greenIdx, numChans):  # implicit `items(iter)` iterator.
      processGreenComponent(g)
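To make the offset-and-delta arithmetic concrete, here is a pure-Python model of which flat positions such a loop visits. It only mirrors the description above (start at an initial offset, then step by a fixed delta over the flat, C-contiguous data); `flat_indices` and the sample pixel data are hypothetical and are not part of Pymod.

```python
def flat_indices(size, initial_offset=0, inc_delta=1):
    """Flat positions visited when starting at `initial_offset` and
    stepping by `inc_delta` until the end of the flat data."""
    return list(range(initial_offset, size, inc_delta))


# A hypothetical 2x2 RGB image, stored C-contiguously (shape (2, 2, 3)),
# so each pixel's R, G, B values sit next to each other in the flat data.
num_chans = 3
flat_pixels = [10, 20, 30,  11, 21, 31,  12, 22, 32,  13, 23, 33]

green_idx = 1
green_positions = flat_indices(len(flat_pixels), green_idx, num_chans)
green_values = [flat_pixels[i] for i in green_positions]

print(green_positions)
print(green_values)
```

Because the channel is the last (fastest-varying) dimension of a C-contiguous array, stepping by the number of channels lands on the same channel of each successive pixel.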
These code examples are all available in full in the examples directory.
Here are some helpful hints about a few sharp edges of Pymod that can trip you up and then confuse you with obscure compiler error messages. Some of them stem from sharp edges in Nim that we haven't been able to cover over completely.
- If you want to `exportpy` a proc using Pymod, don't give your proc the same name as the Nim module that contains the proc (or in fact, the same name as any other procs in that same module).
- If you want to `exportpy` a proc using Pymod, ensure that the proc is also exported in Nim by marking it with an asterisk after the proc-name.
Pymod enables you to wrap your Nim procs so they can be called from Python.
If instead you want to call Python functions (maybe even the interpreter) from Nim (i.e., the control flows in the opposite direction), Pymod is not what you're looking for.
In that case, python.nim or the NimBorg project might suit you better.
We want to make existing Python types (and extended types like Numpy arrays) available in Nim procs. The idea is that objects of these Python types will be created in the existing Python code, then passed through to Nim procs for processing, and then the results will be returned to the Python code for the pre-existing processing or viewing in Python.
For this reason, we decided not to use the Python ctypes module; the focus of `ctypes` seems to be on propagating types in the opposite direction. Instead of making fully-fledged Python types easily available in C, `ctypes` wraps lowest-common-denominator C types and structs for basic access in Python.
The way ctypes handles pointers & structs requires you to build your object types up in terms of C primitive types, so they can be accessed as method-less, field-only objects in Python. But the Python types we want to use in Nim are already defined in Python! So `ctypes` would be more applicable to us if we wanted to make existing Nim data-structures available in Python, rather than the other way around.
On an initial reading, Python CFFI & cffiwrap appear to be more useful for exposing existing Python types in C than `ctypes` is.
However, for the initial implementation of Pymod, we chose the CPython C-API & Numpy C-API over `cffi` and/or `cffiwrap`.
We reasoned that Nim is going to produce & auto-compile its own C code anyway,
so if we generate C-API C code for our Python integration, this C code can be
compiled & linked into a shared library by the Nim compiler itself -- thus
ensuring that the same compiler settings are used for both the compilation &
linking stages.
We are considering implementing an additional `cffi` back-end for Pymod in the future. This would be a significant step towards compatibility with the PyPy Python interpreter.