I have been aware of Cython for a few years but never had a chance to really test it in practice (apart from a few dummy exercises). Recently I decided to look at it again and try it on my old project adecaptcha. I was quite pleased with the results: I was able to speed up the program significantly with minimal changes to the code.
I used the following approach to improve the application's performance with Cython:
- Profile the application (or the relevant part of it) with cProfile and gprof2dot, which gives a nice graphical view of how computing time is split across functions. (Alternatively you can use pyvmmonitor, which integrates nicely with PyDev or PyCharm.) A minimal sketch of the profiling commands is shown after this list. Below is the profiling result of adecaptcha before the Cython optimization:
Here we can see two branches which take the majority of the program's time. One is for loading the audio file, and it spends most of its time in the Python standard module wave (scipy.io provides a faster alternative, but it is not compatible with all WAV files). The second branch, however, is completely under our control: a lot of time is spent in the twin and wf functions (triangular window calculations).
- Let's look at the functions identified by profiling:
```
def twin(start, stop):
    def wf(len):
        for n in xrange(len):
            yield 1.0 - np.abs(((len - 1) / 2.0 - n) / (len / 2.0))
    len = stop - start
    return np.array(list(wf(len)))
```
This is obviously not very efficient code (it could be marginally optimized with a list comprehension expression – see the sketch after this list – but performance would be approximately the same: 24% of total time vs. 27%).
- To introduce a significant performance change we have to implement this function in Cython:
```
import numpy as np
cimport numpy as np   # C-level numpy interface (not shown in the original snippet), needed for the typed array below

cdef extern from "math.h":
    double fabs(double x)

ctypedef np.double_t DTYPE_t

def twin(start, stop):
    cdef:
        np.ndarray[DTYPE_t] arr
        int n
        int len = stop - start
    arr = np.zeros(len)
    for n in range(len):
        arr[n] = 1.0 - fabs(((len - 1) / 2.0 - n) / (len / 2.0))
    return arr
```
As you can see, the changes are minimal – we just reimplemented the twin function, this time with a statically typed loop control variable and a typed numpy array.
- Profile again to see changes:
As you can see, the twin function is not an issue any more (the whole calc_mfcc branch now basically spends its time in the FFT calculation).
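For reference, here is a minimal sketch of how such a profile can be produced with cProfile and gprof2dot; the work() function and file names below are just placeholders, not adecaptcha's actual API:

```
import cProfile
import pstats

def work():
    # placeholder for the real call, e.g. decoding one audio captcha
    pass

# Run the code under the profiler and dump raw stats to a file.
cProfile.runctx("work()", globals(), locals(), "adecaptcha.pstats")

# Quick textual overview of the hottest functions.
pstats.Stats("adecaptcha.pstats").sort_stats("cumulative").print_stats(15)

# The graphical call graph is then generated outside Python:
#   gprof2dot -f pstats adecaptcha.pstats | dot -Tpng -o adecaptcha-profile.png
```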
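And, for completeness, the list-comprehension variant of twin mentioned above – still pure Python, which is why the gain over the generator version is only marginal:

```
import numpy as np

def twin(start, stop):
    # Same triangular window as before, just built with a list comprehension.
    length = stop - start
    return np.array([1.0 - abs(((length - 1) / 2.0 - n) / (length / 2.0))
                     for n in range(length)])
```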
Cython can also be used to create Python bindings to C and C++ libraries. The interface has to be written manually (whereas some other tools like SWIG can generate it automatically), which requires some effort; on the other hand it forces the author to think about the interface design, and it can result in very nice ‘pythonic’ interfaces, which can save a lot of time later. The key insight here is that we do not need to interface every function and class in a C or C++ library, only those that matter for our needs. Recently I created a binding to libpoppler (a PDF parsing library) and found the task fairly easy and quite enjoyable.
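To illustrate the style (this is not the actual libpoppler binding, just a made-up single-function example wrapping a hypothetical C library), a .pyx file declares only the C call we care about and hides it behind a small pythonic function:

```
# foowrap.pyx - wraps a single function from a hypothetical C library "libfoo"
cdef extern from "foo.h":
    int foo_count_items(const char *path)

def count_items(path):
    """Pythonic wrapper: takes a str, raises on error, returns an int."""
    cdef bytes encoded = path.encode("utf-8")
    cdef int n = foo_count_items(encoded)
    if n < 0:
        raise IOError("foo_count_items failed for %r" % path)
    return n
```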
Conclusions
Even with a few very simple changes to the original code I was able to achieve a significant performance improvement (run time 0.27s vs 0.41s for one audio captcha, which is approximately a 35% improvement).
Programming in Cython is relatively easy, so I was able to implement these changes quickly and without particular issues. Compilation errors were fairly well described, so I could fix problems quickly. Building can be defined in setup.py in a straightforward manner (a minimal sketch is shown below) or can even be automatic with pyximport – however the pyximport machinery takes some time, so for smaller programs manual compilation is more effective.
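For completeness, a minimal setup.py sketch for building such an extension (the module name is illustrative; the numpy include directory is needed because the .pyx file cimports numpy):

```
from setuptools import setup, Extension
from Cython.Build import cythonize
import numpy

extensions = [
    Extension("twin_window", ["twin_window.pyx"],   # hypothetical module name
              include_dirs=[numpy.get_include()]),  # numpy headers for the cimport
]

setup(name="adecaptcha-ext", ext_modules=cythonize(extensions))
```

The extension is then built in place with python setup.py build_ext --inplace; alternatively, calling pyximport.install() before importing the .pyx module skips the setup.py step entirely.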
However, it has to be said that the path to performance improvements with Cython is not always so straightforward. In another project (a Voronoi diagram in a bounded box, part of the myplaces project) I tried it too and was not able to achieve any performance gain – quite the opposite, the cythonized version was about 30% slower. I probably should have tried harder, but obviously it is sometimes difficult to achieve the required improvements.
Cython can also be used to create nice pythonic interfaces to existing C and C++ libraries.
Overall I was quite impressed and I am looking forward to using Cython in future projects.