Quoted from Cython's Documentation:
"... Cython is Python with C data types. ... Almost any piece of Python code is also valid Cython code. ... But Cython is much more than that, because parameters and variables can be declared to have C data types. ..."
According to Cython's documentation, the compilation of Cython code consists of two stages:
- A Cython source code .pyx file is compiled to a .c file.
- The .c file is compiled to an extension module (a .pyd file on the Windows platform or a .so file on Unix-like platforms).
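The same two stages can also be run by hand, which makes the pipeline explicit. A sketch assuming a Unix-like platform with the cython command on the PATH (the include path and Python version are illustrative):

$ cython helloworld.pyx        # stage 1: .pyx -> helloworld.c
$ gcc -shared -fPIC -I/usr/include/python2.7 helloworld.c -o helloworld.so   # stage 2: .c -> extension module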
To compile Cython source code into an extension module, two files are needed:
- the Cython source code .pyx file(s)
- a Python script (usually named setup.py) to build the extension module
During compilation, many warning messages may be generated. As long as an extension module is created, the compilation has succeeded.
After the compilation, a directory named build is created, and intermediate files are placed in it. If a Cython code is compiled with MinGW, the intermediate files include .def and .o files; if with Microsoft Visual C++ Compiler, they include .exp, .lib, .obj and .manifest files.
Basic Tutorial
The "Basic Tutorial" in Cython's Documentation is quite simple and easy, and here it will be complied with WinPython or Microsoft Visual C++ Compiler for Python 2.7.
Both WinPython and Microsoft Visual C++ Compiler for Python 2.7 are installed on my computer.
The Python script setup.py may use either distutils or setuptools to build an extension module.
setup.py using distutils should look like:
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("helloworld.pyx")
)
setup.py using setuptools should look like:
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("helloworld.pyx")
)
WinPython
In WinPython, Cython and MinGW compiler are integrated.
The procedure to compile a Cython code is:
Open [WinPython Command Prompt], and change the current working directory to the directory containing the Cython source code. Then run the command:
$ python setup.py build_ext --inplace
The Python script setup.py using distutils or setuptools will work fine.
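If the build succeeds, the extension module can be imported like any other Python module. Assuming helloworld.pyx contains the tutorial's single print statement, a quick check looks like:

$ python -c "import helloworld"
Hello World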
Microsoft Visual C++ Compiler for Python 2.7
The instructions from Using Microsoft Visual C++ Compiler for Python (only for Python 2.7.x) suggest two methods.
Method 1
The procedure to compile a Cython code is:
Open [Visual C++ 2008 64-bit Command Prompt], and change the current working directory to the directory containing the Cython source code. Then run the command:
$ e:\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\python.exe setup.py build_ext --inplace --compiler=msvc
note: e:\WinPython-64bit-2.7.10.3\ is the path in which WinPython is installed on my computer.
The Python script setup.py using setuptools will work fine; however, the Python script setup.py using distutils will show the following error message:
error: Unable to find vcvarsall.bat

Method 2
The procedure to compile a Cython code is:
Open [Visual C++ 2008 64-bit Command Prompt], and change the current working directory to the directory containing the Cython source code. Then run the commands:
$ SET DISTUTILS_USE_SDK=1
$ SET MSSdk=1
$ e:\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\python.exe setup.py build_ext --inplace --compiler=msvc
note: e:\WinPython-64bit-2.7.10.3\ is the path in which WinPython is installed on my computer.
The Python script setup.py using distutils or setuptools will work fine.
It is suggested to use setuptools in the Python script setup.py.
Working with NumPy
NumPy is a powerful and popular package for handling n-dimensional arrays, matrices, vectors, linear algebra, and more. Supporting NumPy is crucial for any package that aims to succeed.
The Cython documentation page "Working with NumPy" covers the material in sections "Convolution by Python" through "Convolution by Cython rev.4" below.
Convolution by Python
The content of the Python code convolve_py.py is:
from __future__ import division
import numpy as np

def naive_convolve(f, g):
    # f is an image and is indexed by (v, w)
    # g is a filter kernel and is indexed by (s, t),
    #   it needs odd dimensions
    # h is the output image and is indexed by (x, y),
    #   it is not cropped
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # smid and tmid are number of pixels between the center pixel
    # and the edge, ie for a 5x5 filter they will be 2.
    #
    # The output size is calculated by adding smid, tmid to each
    # side of the dimensions of the input image.
    vmax = f.shape[0]
    wmax = f.shape[1]
    smax = g.shape[0]
    tmax = g.shape[1]
    smid = smax // 2
    tmid = tmax // 2
    xmax = vmax + 2*smid
    ymax = wmax + 2*tmid
    # Allocate result image.
    h = np.zeros([xmax, ymax], dtype=f.dtype)
    # Do convolution
    for x in range(xmax):
        for y in range(ymax):
            # Calculate pixel value for h at (x,y). Sum one component
            # for each pixel (s, t) of the filter g.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
Convolution by Cython rev.1
The content of the Cython code convolve_cy1.pyx is exactly the same as the Python code convolve_py.py.
Convolution by Cython rev.2
The content of the Cython code convolve_cy2.pyx is:
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h is typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
def naive_convolve(np.ndarray f, np.ndarray g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
Convolution by Cython rev.3
The content of the Cython code convolve_cy3.pyx is:
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h is typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
Convolution by Cython rev.4
The content of the Cython code convolve_cy4.pyx is:
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h is typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
cimport cython

@cython.boundscheck(False)  # turn off bounds-checking for entire function
@cython.wraparound(False)   # turn off negative index wrapping for entire function
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
Typed Memoryviews
The Cython documentation page "Typed Memoryviews" notes that memoryviews are considerably faster than ndarray objects. It covers the material in section "Convolution by Cython rev.5" below.
Convolution by Cython rev.5
The Cython code convolve_cy5.pyx is based on the previous Cython code convolve_cy4.pyx; the only difference is that the function arguments are typed as memoryviews.
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h is typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
cimport cython

@cython.boundscheck(False)  # turn off bounds-checking for entire function
@cython.wraparound(False)   # turn off negative index wrapping for entire function
def naive_convolve(DTYPE_t[:, :] f, DTYPE_t[:, :] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # memoryview object has no attribute 'dtype'
    # assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
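A minimal usage sketch (the module name assumes the file above was built as convolve_cy5; array sizes are illustrative). NumPy arrays are accepted directly because they support the buffer protocol, so no explicit conversion to a memoryview is needed:

import numpy as np
import convolve_cy5

f = np.arange(100, dtype=np.int).reshape((10, 10))
g = np.arange(9, dtype=np.int).reshape((3, 3))
h = convolve_cy5.naive_convolve(f, g)  # NumPy arrays coerce to typed memoryviews
print h.shape  # (12, 12): the output is padded by the filter radius on each side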
Using Parallelism
The Cython documentation page "Using Parallelism" notes that Cython uses OpenMP to achieve native parallelism. It covers the material in sections "Convolution by Cython rev.6_4" and "Convolution by Cython rev.6_5" below.
Convolution by Cython rev.6_4
The Cython code convolve_cy6_4.pyx is based on the previous Cython code convolve_cy4.pyx; the only difference is that the outer loop uses the parallel loop prange.
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h is typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
cimport cython
from cython.parallel import prange

@cython.boundscheck(False)  # turn off bounds-checking for entire function
@cython.wraparound(False)   # turn off negative index wrapping for entire function
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in prange(xmax, nogil=True):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value = value + g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
Convolution by Cython rev.6_5
The Cython code convolve_cy6_5.pyx is based on the previous Cython code convolve_cy5.pyx; the only difference is that the outer loop uses the parallel loop prange.
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h is typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
cimport cython
from cython.parallel import prange

@cython.boundscheck(False)  # turn off bounds-checking for entire function
@cython.wraparound(False)   # turn off negative index wrapping for entire function
def naive_convolve(DTYPE_t[:, :] f, DTYPE_t[:, :] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # memoryview object has no attribute 'dtype'
    # assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in prange(xmax, nogil=True):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value = value + g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
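Note that prange only runs in parallel when the module is compiled with OpenMP enabled (the -fopenmp or /openmp flags in the setup scripts below). At run time, the number of threads can be controlled through the standard OpenMP environment variable, for example (Windows syntax; use export on Linux):

$ SET OMP_NUM_THREADS=4
$ python convolve.py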
Running Convolution Code
The Python script setup_mingw.py to build ALL extension modules with the MinGW compiler is:
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension
from Cython.Build import cythonize
import numpy

ext_modules = [
    Extension(
        "*", ["*.pyx"],
        extra_compile_args=['-fopenmp'],
        extra_link_args=['-fopenmp'],
    )
]

setup(
    ext_modules = cythonize(ext_modules),
    include_dirs=[numpy.get_include()]
)
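setup_mingw.py is run the same way as setup.py in the Basic Tutorial, from the WinPython Command Prompt:

$ python setup_mingw.py build_ext --inplace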
The Python script setup_msvc.py to build ALL extension modules with the Microsoft Visual C++ compiler is:
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension
from Cython.Build import cythonize
import numpy

ext_modules = [
    Extension(
        "*", ["*.pyx"],
        extra_compile_args=['/openmp'],
        extra_link_args=[''],
    )
]

setup(
    ext_modules = cythonize(ext_modules),
    include_dirs=[numpy.get_include()]
)
It would be appreciated if anyone could show me how to integrate setup.py to work with both the MinGW compiler and the Microsoft Visual C++ compiler; one possible starting point is sketched below.
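A possible unified setup.py (my own sketch, untested across both toolchains): subclass distutils' build_ext and choose the OpenMP flags from the compiler type that distutils reports at build time ('msvc', 'mingw32', or 'unix').

try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension
from distutils.command.build_ext import build_ext
from Cython.Build import cythonize
import numpy

# per-compiler (compile_args, link_args); keys are distutils compiler types
OPENMP_FLAGS = {
    'msvc':    (['/openmp'], []),
    'mingw32': (['-fopenmp'], ['-fopenmp']),
    'unix':    (['-fopenmp'], ['-fopenmp']),
}

class build_ext_openmp(build_ext):
    def build_extensions(self):
        # self.compiler is only known once the build actually starts
        compile_args, link_args = OPENMP_FLAGS.get(
            self.compiler.compiler_type, ([], []))
        for ext in self.extensions:
            ext.extra_compile_args = compile_args
            ext.extra_link_args = link_args
        build_ext.build_extensions(self)

setup(
    cmdclass={'build_ext': build_ext_openmp},
    ext_modules=cythonize([Extension("*", ["*.pyx"])]),
    include_dirs=[numpy.get_include()],
)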
The Python script convolve.py, which runs all convolution codes and reports their running times, is:
import numpy as np

N = 200
f = np.arange(N*N, dtype=np.int).reshape((N, N))
g = np.arange(81, dtype=np.int).reshape((9, 9))

print "2D discrete convolution of an image"
print "f array size", f.shape[0], "by", f.shape[1]
print "g array size", g.shape[0], "by", g.shape[1]
print

import timeit

t_start = timeit.default_timer()
import convolve_py
convolve_py.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_py takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy1
convolve_cy1.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy1 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy2
convolve_cy2.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy2 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy3
convolve_cy3.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy3 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy4
convolve_cy4.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy4 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy5
convolve_cy5.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy5 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy6_4
convolve_cy6_4.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy6_4 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy6_5
convolve_cy6_5.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy6_5 takes {:.5f} second".format(t_end - t_start)
The computer hardware and software specification is:
Operating system: Windows 10, 64bit
Power plan: High performance
Distribution: WinPython-64bit-2.7.10.3
Hardware: Computer 1
Since np.dtype('int').itemsize is 4 bytes on my computer 1, the NumPy array f occupies 200 * 200 * 4 bytes = 160,000 bytes, which is smaller than the CPU's L3 cache.
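The sizing above can be checked directly (a quick snippet; the itemsize may differ on other platforms):

import numpy as np
print np.dtype('int').itemsize               # 4 bytes on this Windows build
print 200 * 200 * np.dtype('int').itemsize   # 160000 bytes, under the L3 cache size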
The running times are:
               | WinPython (1)               | Visual C++ (2)
               | Running Time (s) | Speed-up | Running Time (s) | Speed-up
convolve_py    | 5.19602          | 1.00     | 4.98449          | 1.00
convolve_cy1   | 3.56622          | 1.46     | 3.74166          | 1.33
convolve_cy2   | 1.42561          | 3.64     | 1.58573          | 3.14
convolve_cy3   | 0.01646          | 315.68   | 0.02409          | 206.91
convolve_cy4   | 0.00641          | 810.61   | 0.00685          | 727.66
convolve_cy5   | 0.00572          | 908.40   | 0.00823          | 605.65
convolve_cy6_4 | 0.00406          | 1279.81  | 0.00621          | 802.66
convolve_cy6_5 | 0.00479          | 1084.76  | 0.00534          | 933.43

note1: WinPython-64bit-2.7.10.3
note2: Microsoft Visual C++ Compiler for Python 2.7
In this test, Microsoft Visual C++ Compiler for Python 2.7 does not seem to provide any extra speed-up from using memoryviews.
Before each build and test, all non-source files (.pyc, .c, and .pyd files) are deleted in order to have a fresh start.
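A possible cleanup sequence on Windows before each run (assuming the file layout described in the compilation notes above):

$ del *.pyc *.c *.pyd
$ rmdir /s /q build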
12 Megapixel Resolution
Since high-resolution digital cameras have gained popularity, the NumPy array f is enlarged to a 12-megapixel size, 4,000 by 3,000.
The Python script convolve_12M.py, which runs some of the convolution codes and reports their running times, is:
import numpy as np

N1 = 4000
N2 = 3000
f = np.arange(N1*N2, dtype=np.int).reshape((N1, N2))
g = np.arange(81, dtype=np.int).reshape((9, 9))

print "2D discrete convolution of an image"
print "f array size", f.shape[0], "by", f.shape[1]
print "g array size", g.shape[0], "by", g.shape[1]
print

import timeit

# t_start = timeit.default_timer()
# import convolve_py
# convolve_py.naive_convolve(f, g)
# t_end = timeit.default_timer()
# print "convolve_py takes {:.5f} second".format(t_end - t_start)

# t_start = timeit.default_timer()
# import convolve_cy1
# convolve_cy1.naive_convolve(f, g)
# t_end = timeit.default_timer()
# print "convolve_cy1 takes {:.5f} second".format(t_end - t_start)

# t_start = timeit.default_timer()
# import convolve_cy2
# convolve_cy2.naive_convolve(f, g)
# t_end = timeit.default_timer()
# print "convolve_cy2 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy3
convolve_cy3.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy3 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy4
convolve_cy4.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy4 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy5
convolve_cy5.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy5 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy6_4
convolve_cy6_4.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy6_4 takes {:.5f} second".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy6_5
convolve_cy6_5.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy6_5 takes {:.5f} second".format(t_end - t_start)
The computer hardware and software specification is the same as the former section "Running Convolution Code".
Since np.dtype('int').itemsize is 4 bytes on my computer 1, the NumPy array f occupies 4,000 * 3,000 * 4 bytes = 48,000,000 bytes, which is larger than the CPU's L3 cache.
The running times are:
               | WinPython (1)               | Visual C++ (2)
               | Running Time (s) | Speed-up | Running Time (s) | Speed-up
convolve_py    | -                | -        | -                | -
convolve_cy1   | -                | -        | -                | -
convolve_cy2   | -                | -        | -                | -
convolve_cy3   | 4.63142          | -        | 6.10418          | -
convolve_cy4   | 1.65433          | -        | 1.73886          | -
convolve_cy5   | 1.49566          | -        | 2.14620          | -
convolve_cy6_4 | 0.61500          | -        | 1.20628          | -
convolve_cy6_5 | 0.57597          | -        | 1.14869          | -

note1: WinPython-64bit-2.7.10.3
note2: Microsoft Visual C++ Compiler for Python 2.7
The running times of convolve_py, convolve_cy1, and convolve_cy2 are too long, hence they are skipped.
In this test, Microsoft Visual C++ Compiler for Python 2.7 again provides no extra speed-up from memoryviews, and the running times for WinPython are clearly lower than those for Microsoft Visual C++ Compiler for Python 2.7.
Comparing the Results on the Windows and Linux Platforms
In this section, the results on the Windows platform and the Linux platform are compared. Because my computer 1 has problems running CAELinux 2013, all convolution codes are run on my computer 2.
The computer hardware and software specification is:
Windows platform:
Operating system: Windows 10, 64bit
Power plan: High performance
Distribution: WinPython-64bit-2.7.10.3
Linux platform:
Operating system: CAELinux 2013 (Ubuntu 12.04 LTS 64bit) Linux 3.2.0-111-generic
CPUFreq governor: Performance
Python: Python 2.7.3
Cython: 0.25b0
gcc: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Hardware: Computer 2
On the Linux platform, the following commands are executed:
Check the CPUFreq governor by running:
$ cpufreq-info
Set the CPUFreq governor to "performance" for all cores by running:
$ sudo cpufreq-set -r -g performance
Since memoryviews require Cython 0.16 or higher, update Cython by running:
$ sudo pip install --upgrade git+git://github.com/cython/cython@master
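The installed version can then be verified with (on this machine it reported the 0.25b0 listed in the specification above):

$ python -c "import Cython; print Cython.__version__"
0.25b0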
For the NumPy array f of size 200 by 200, the running times are:
               | WinPython (1)               | Visual C++ (2)              | Linux (3)
               | Running Time (s) | Speed-up | Running Time (s) | Speed-up | Running Time (s) | Speed-up
convolve_py    | 5.40756          | 1.00     | 5.65036          | 1.00     | 4.30537          | 1.00
convolve_cy1   | 4.22743          | 1.28     | 4.52284          | 1.25     | 3.94759          | 1.09
convolve_cy2   | 1.87212          | 2.89     | 1.93266          | 2.92     | 5.43121          | 0.79
convolve_cy3   | 0.02253          | 240.02   | 0.02494          | 226.56   | 0.02066          | 208.39
convolve_cy4   | 0.00746          | 724.87   | 0.00778          | 726.27   | 0.00718          | 599.63
convolve_cy5   | 0.00749          | 721.97   | 0.00852          | 663.19   | 0.00753          | 571.76
convolve_cy6_4 | 0.00484          | 1117.26  | 0.00883          | 639.90   | 0.00787          | 547.06
convolve_cy6_5 | 0.00517          | 1045.95  | 0.00648          | 871.97   | 0.00763          | 564.27

note1: WinPython-64bit-2.7.10.3
note2: Microsoft Visual C++ Compiler for Python 2.7
note3: CAELinux 2013 (Ubuntu 12.04 LTS, 64-bit) + gcc
For the NumPy array f of size 4,000 by 3,000, the running times are:
               | WinPython (1)               | Visual C++ (2)              | Linux (3)
               | Running Time (s) | Speed-up | Running Time (s) | Speed-up | Running Time (s) | Speed-up
convolve_py    | -                | -        | -                | -        | -                | -
convolve_cy1   | -                | -        | -                | -        | -                | -
convolve_cy2   | -                | -        | -                | -        | -                | -
convolve_cy3   | 6.44019          | -        | 7.17841          | -        | 6.14739          | -
convolve_cy4   | 1.90992          | -        | 2.00507          | -        | 2.11415          | -
convolve_cy5   | 1.91129          | -        | 2.19844          | -        | 2.10054          | -
convolve_cy6_4 | 0.99059          | -        | 1.53905          | -        | 2.24326          | -
convolve_cy6_5 | 0.98661          | -        | 1.46488          | -        | 2.19705          | -

note1: WinPython-64bit-2.7.10.3
note2: Microsoft Visual C++ Compiler for Python 2.7
note3: CAELinux 2013 (Ubuntu 12.04 LTS, 64-bit) + gcc
The running times on the Windows and Linux platforms are roughly the same, with two exceptions: convolve_cy2 is significantly slower on Linux than on Windows, and the running times of convolve_cy6_4 and convolve_cy6_5 differ noticeably between WinPython, Visual C++, and Linux.
The codes convolve_cy6_4 and convolve_cy6_5 do not provide any speed boost on the Linux platform, even though no error message is generated during compilation. It would be appreciated if anyone could show me how to make the parallel loop prange work on the Linux platform.