Saturday, October 8, 2016

Running Cython Tutorial

Quoted from Cython's Documentation:
"... Cython is Python with C data types. ... Almost any piece of Python code is also valid Cython code. ... But Cython is much more than that, because parameters and variables can be declared to have C data types. ..."

According to Cython's documentation, the compilation of Cython code consists of two stages:
  • A Cython source code .pyx file is compiled to a .c file
  • The .c file is compiled to an extension module (.pyd file on Windows platform or .so file on Unix-like platform)

To compile Cython source code into an extension module, two kinds of files are needed:
  • Cython source code .pyx file(s)
  • A Python script (usually named setup.py) to build extension module

During compilation, many warning messages may be generated. As long as an extension module is created, the compilation has succeeded.

After the compilation, a directory named build is created to hold intermediate files. If a Cython code is compiled with MinGW, the intermediate files include .def and .o files; with the Microsoft Visual C++ compiler, they include .exp, .lib, .obj and .manifest files.
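For example, building a module example.pyx in place with a 64-bit Python 2.7 leaves a layout roughly like the following (a sketch only; exact names depend on the platform and compiler):

example.pyx           (the Cython source)
example.c             (generated by Cython)
example.pyd           (the extension module, built in place)
build/
    temp.win-amd64-2.7/
        example.def   (MinGW)
        example.o     (MinGW; .exp, .lib, .obj and .manifest with MSVC)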

Basic Tutorial


The "Basic Tutorial" in Cython's Documentation is quite simple and easy, and here it will be complied with WinPython or Microsoft Visual C++ Compiler for Python 2.7.

Both WinPython and Microsoft Visual C++ Compiler for Python 2.7 are installed on my computer.
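The source file compiled here is helloworld.pyx from the Basic Tutorial; it contains just a single line:

print("Hello World")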

The Python script setup.py may use either distutils or setuptools to build an extension module.

setup.py using distutils should look like:
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("helloworld.pyx")
)

setup.py using setuptools should look like:
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension

from Cython.Build import cythonize

setup(
    ext_modules = cythonize("helloworld.pyx")
)

WinPython


WinPython ships with both Cython and the MinGW compiler.

The procedure to compile a Cython code is:

Open [WinPython Command Prompt], and change the current working directory to the directory containing the Cython source code. Then run the command:
$ python setup.py build_ext --inplace

The Python script setup.py using distutils or setuptools will work fine.
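After a successful build, the extension module sits next to the source and can be imported like any ordinary Python module; importing it runs the print statement:

$ python -c "import helloworld"
Hello World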

Microsoft Visual C++ Compiler for Python 2.7


The instructions in "Using Microsoft Visual C++ Compiler for Python (only for Python 2.7.x)" suggest two methods.

Method 1


The procedure to compile a Cython code is:

Open [Visual C++ 2008 64-bit Command Prompt], and change the current working directory to the directory containing the Cython source code. Then run the command:
$ e:\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\python.exe setup.py build_ext --inplace --compiler=msvc

note: e:\WinPython-64bit-2.7.10.3\ is the path in which WinPython is installed on my computer.

The Python script setup.py using setuptools will work fine; however, the Python script setup.py using distutils will show the following error message:
error: Unable to find vcvarsall.bat

Method 2


The procedure to compile a Cython code is:

Open [Visual C++ 2008 64-bit Command Prompt], and change the current working directory to the directory containing the Cython source code. Then run the commands:
$ SET DISTUTILS_USE_SDK=1
$ SET MSSdk=1
$ e:\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\python.exe setup.py build_ext --inplace --compiler=msvc

note: e:\WinPython-64bit-2.7.10.3\ is the path in which WinPython is installed on my computer.

The Python script setup.py using distutils or setuptools will work fine.



Given the results above, it is suggested to use setuptools in the Python script setup.py.





Working with NumPy

NumPy is a powerful and popular package for handling n-dimensional arrays, matrices, vectors, linear algebra and more. Supporting NumPy is crucial for any package that aims to succeed.

The Cython documentation page "Working with NumPy" covers the material from section "Convolution by Python" through section "Convolution by Cython rev.4" below.

Convolution by Python


The content of the Python code convolve_py.py is:
from __future__ import division
import numpy as np
def naive_convolve(f, g):
    # f is an image and is indexed by (v, w)
    # g is a filter kernel and is indexed by (s, t),
    #   it needs odd dimensions
    # h is the output image and is indexed by (x, y),
    #   it is not cropped
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # smid and tmid are number of pixels between the center pixel
    # and the edge, ie for a 5x5 filter they will be 2.
    #
    # The output size is calculated by adding smid, tmid to each
    # side of the dimensions of the input image.
    vmax = f.shape[0]
    wmax = f.shape[1]
    smax = g.shape[0]
    tmax = g.shape[1]
    smid = smax // 2
    tmid = tmax // 2
    xmax = vmax + 2*smid
    ymax = wmax + 2*tmid
    # Allocate result image.
    h = np.zeros([xmax, ymax], dtype=f.dtype)
    # Do convolution
    for x in range(xmax):
        for y in range(ymax):
            # Calculate pixel value for h at (x,y). Sum one component
            # for each pixel (s, t) of the filter g.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
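As a quick sanity check (my own example, not part of the tutorial), the function can be called with small integer arrays; note that the output is larger than the input because it is not cropped:

import numpy as np
from convolve_py import naive_convolve

f = np.arange(9, dtype=np.int).reshape((3, 3))   # 3x3 "image"
g = np.array([[0, 1, 0],
              [1, 2, 1],
              [0, 1, 0]], dtype=np.int)          # 3x3 filter kernel
h = naive_convolve(f, g)
print h.shape                                    # -> (5, 5)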

Convolution by Cython rev.1


The content of the Cython code convolve_cy1.pyx is exactly the same as the Python code convolve_py.py.
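As an aside, for a pure-Python .pyx file like this one, Cython's pyximport can compile it on the fly without any setup.py (a shortcut not used for the timings below):

import pyximport
pyximport.install()   # .pyx files are now compiled automatically on import
import convolve_cy1   # built behind the scenes on first import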

Convolution by Cython rev.2


The content of the Cython code convolve_cy2.pyx is:
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
def naive_convolve(np.ndarray f, np.ndarray g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
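A convenient way to see how much Python overhead remains in each revision is Cython's annotation mode, which writes an HTML report highlighting the lines that still call into the Python C API:

$ cython -a convolve_cy2.pyx

Opening the generated convolve_cy2.html in a browser shows the remaining Python interactions line by line.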

Convolution by Cython rev.3


The content of the Cython code convolve_cy3.pyx is:
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

Convolution by Cython rev.4


The content of the Cython code convolve_cy4.pyx is:
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
cimport cython
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h
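As an equivalent alternative (not used here), the same two directives can be applied file-wide with special comments at the top of the .pyx file instead of decorating each function:

# cython: boundscheck=False
# cython: wraparound=False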

Typed Memoryviews


The Cython documentation page "Typed Memoryviews" notes that memoryviews are considerably faster than ndarray objects. It covers section "Convolution by Cython rev.5" below.

Convolution by Cython rev.5


The Cython code convolve_cy5.pyx is based on the former Cython code convolve_cy4.pyx, and the only difference is that the function arguments are typed as memoryviews.
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# In this revision, f and g are typed as memoryviews rather than
# "np.ndarray" instances (see the function signature below); only the
# output array h is still an np.ndarray instance.
cimport cython
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def naive_convolve(DTYPE_t[:, :] f, DTYPE_t[:, :] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # memoryview object has no attribute 'dtype'
    # assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

Using Parallelism


The Cython documentation page "Using Parallelism" notes that Cython uses OpenMP to achieve native parallelism. It covers sections "Convolution by Cython rev.6_4" and "Convolution by Cython rev.6_5" below.

Convolution by Cython rev.6_4


The Cython code convolve_cy6_4.pyx is based on the former Cython code convolve_cy4.pyx, and the only difference is that the outer loop uses the parallel loop prange.
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
cimport cython

from cython.parallel import prange

@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in prange(xmax, nogil=True):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value = value + g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

Convolution by Cython rev.6_5


The Cython code convolve_cy6_5.pyx is based on the former Cython code convolve_cy5.pyx, and the only difference is that the outer loop uses the parallel loop prange.
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# In this revision, f and g are typed as memoryviews rather than
# "np.ndarray" instances (see the function signature below); only the
# output array h is still an np.ndarray instance.
cimport cython

from cython.parallel import prange

@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def naive_convolve(DTYPE_t[:, :] f, DTYPE_t[:, :] g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # memoryview object has no attribute 'dtype'
    # assert f.dtype == DTYPE and g.dtype == DTYPE
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2*smid
    cdef int ymax = wmax + 2*tmid
    cdef np.ndarray[DTYPE_t, ndim=2] h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to
    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in prange(xmax, nogil=True):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value = value + g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

Running Convolution Code


The Python script setup_mingw.py to build ALL extension modules with the MinGW compiler is:
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension

from Cython.Build import cythonize
import numpy

ext_modules = [
    Extension(
        "*",
        ["*.pyx"],
        extra_compile_args=['-fopenmp'],
        extra_link_args=['-fopenmp'],
    )
]

setup(
    ext_modules = cythonize(ext_modules),
    include_dirs=[numpy.get_include()]
)

The Python script setup_msvc.py to build ALL extension modules with the Microsoft Visual C++ compiler is:
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension

from Cython.Build import cythonize
import numpy

ext_modules = [
    Extension(
        "*",
        ["*.pyx"],
        extra_compile_args=['/openmp'],
        extra_link_args=[],
    )
]

setup(
    ext_modules = cythonize(ext_modules),
    include_dirs=[numpy.get_include()]
)

It would be appreciated if anyone could show me how to integrate setup.py for both MinGW compiler and Microsoft Visual C++ compiler.
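One possible approach, sketched below but not tested here, is to subclass distutils' build_ext and pick the OpenMP flags only after distutils has chosen the compiler; the compiler_type attribute is 'msvc' for Microsoft Visual C++ and 'mingw32' for MinGW:

try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension

from distutils.command.build_ext import build_ext
from Cython.Build import cythonize
import numpy

class build_ext_openmp(build_ext):
    # Defer the flag choice until the actual compiler is known.
    def build_extensions(self):
        if self.compiler.compiler_type == 'msvc':
            compile_args, link_args = ['/openmp'], []
        else:  # gcc-compatible, e.g. 'mingw32' or 'unix'
            compile_args, link_args = ['-fopenmp'], ['-fopenmp']
        for ext in self.extensions:
            ext.extra_compile_args = compile_args
            ext.extra_link_args = link_args
        build_ext.build_extensions(self)

setup(
    cmdclass={'build_ext': build_ext_openmp},
    ext_modules=cythonize([Extension("*", ["*.pyx"])]),
    include_dirs=[numpy.get_include()],
)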

The Python script convolve.py, which runs all the convolution codes and reports their running times, is:
import numpy as np

N = 200
f = np.arange(N*N, dtype=np.int).reshape((N,N))
g = np.arange(81, dtype=np.int).reshape((9, 9))

print "2D discrete convolution of an image"
print "f array size", f.shape[0], "by", f.shape[1]
print "g array size", g.shape[0], "by", g.shape[1]
print

import timeit

t_start = timeit.default_timer()
import convolve_py
convolve_py.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_py takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy1
convolve_cy1.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy1 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy2
convolve_cy2.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy2 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy3
convolve_cy3.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy3 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy4
convolve_cy4.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy4 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy5
convolve_cy5.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy5 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy6_4
convolve_cy6_4.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy6_4 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy6_5
convolve_cy6_5.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy6_5 takes {:.5f} seconds".format(t_end - t_start)

The computer hardware and software specification is:
Operating system: Windows 10, 64bit
Power plan: High performance
Distribution: WinPython-64bit-2.7.10.3
Hardware: Computer 1

Since np.dtype('int').itemsize is 4 bytes on my computer 1, the size of the NumPy array f is 200 * 200 * 4 bytes = 160,000 bytes, which is smaller than the CPU L3 cache.

The running time is:

                 WinPython (1)             Visual C++ (2)
                 Running Time   Speed-up   Running Time   Speed-up
                 (seconds)                 (seconds)
convolve_py      5.19602        1.00       4.98449        1.00
convolve_cy1     3.56622        1.46       3.74166        1.33
convolve_cy2     1.42561        3.64       1.58573        3.14
convolve_cy3     0.01646        315.68     0.02409        206.91
convolve_cy4     0.00641        810.61     0.00685        727.66
convolve_cy5     0.00572        908.40     0.00823        605.65
convolve_cy6_4   0.00406        1279.81    0.00621        802.66
convolve_cy6_5   0.00479        1084.76    0.00534        933.43
note1: WinPython-64bit-2.7.10.3
note2: Microsoft Visual C++ Compiler for Python 2.7

In this test, it seems that Microsoft Visual C++ Compiler for Python 2.7 gains no extra speed-up from memoryviews: convolve_cy5 (0.00823 s) is slower than convolve_cy4 (0.00685 s), whereas WinPython shows the opposite.

Before each build and test, all generated files (.pyc, .c and .pyd) are deleted in order to have a fresh start.

12 Megapixel Resolution


Since high-resolution digital cameras have become popular, the NumPy array f is set to a 12-megapixel size, 4,000 by 3,000.

The Python script convolve_12M.py, which runs some of the convolution codes and reports their running times, is:
import numpy as np

N1 = 4000
N2 = 3000

f = np.arange(N1*N2, dtype=np.int).reshape((N1,N2))
g = np.arange(81, dtype=np.int).reshape((9, 9))

print "2D discrete convolution of an image"
print "f array size", f.shape[0], "by", f.shape[1]
print "g array size", g.shape[0], "by", g.shape[1]
print

import timeit

# t_start = timeit.default_timer()
# import convolve_py
# convolve_py.naive_convolve(f, g)
# t_end = timeit.default_timer()
# print "convolve_py takes {:.5f} seconds".format(t_end - t_start)

# t_start = timeit.default_timer()
# import convolve_cy1
# convolve_cy1.naive_convolve(f, g)
# t_end = timeit.default_timer()
# print "convolve_cy1 takes {:.5f} seconds".format(t_end - t_start)

# t_start = timeit.default_timer()
# import convolve_cy2
# convolve_cy2.naive_convolve(f, g)
# t_end = timeit.default_timer()
# print "convolve_cy2 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy3
convolve_cy3.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy3 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy4
convolve_cy4.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy4 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy5
convolve_cy5.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy5 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy6_4
convolve_cy6_4.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy6_4 takes {:.5f} seconds".format(t_end - t_start)

t_start = timeit.default_timer()
import convolve_cy6_5
convolve_cy6_5.naive_convolve(f, g)
t_end = timeit.default_timer()
print "convolve_cy6_5 takes {:.5f} seconds".format(t_end - t_start)

The computer hardware and software specification is the same as the former section "Running Convolution Code".

Since np.dtype('int').itemsize is 4 bytes on my computer 1, the size of the NumPy array f is 4,000 * 3,000 * 4 bytes = 48,000,000 bytes, which is larger than the CPU L3 cache.

The running time is:

                 WinPython (1)             Visual C++ (2)
                 Running Time   Speed-up   Running Time   Speed-up
                 (seconds)                 (seconds)
convolve_py      -              -          -              -
convolve_cy1     -              -          -              -
convolve_cy2     -              -          -              -
convolve_cy3     4.63142        -          6.10418        -
convolve_cy4     1.65433        -          1.73886        -
convolve_cy5     1.49566        -          2.14620        -
convolve_cy6_4   0.61500        -          1.20628        -
convolve_cy6_5   0.57597        -          1.14869        -
note1: WinPython-64bit-2.7.10.3
note2: Microsoft Visual C++ Compiler for Python 2.7

The running times of convolve_py, convolve_cy1 and convolve_cy2 are too long, hence they are skipped; scaling the 200 by 200 results by the roughly 300x larger pixel count suggests convolve_py alone would take on the order of half an hour.

In this test, Microsoft Visual C++ Compiler for Python 2.7 again shows no extra speed-up from memoryviews (convolve_cy5 is slower than convolve_cy4), and the running times with WinPython are clearly shorter than those with Microsoft Visual C++ Compiler for Python 2.7.

Comparing the Results on Windows and Linux Platforms


In this section, the results on the Windows and Linux platforms are compared. Because my computer 1 has problems running CAELinux 2013, all convolution codes are run on my computer 2.

The computer hardware and software specification is:
Windows platform:
    Operating system: Windows 10, 64bit
    Power plan: High performance
    Distribution: WinPython-64bit-2.7.10.3
Linux platform:
    Operating system: CAELinux 2013 (Ubuntu 12.04 LTS 64bit) Linux 3.2.0-111-generic
    CPUFreq governor: Performance
    Python: Python 2.7.3
    Cython: 0.25b0
    gcc: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Hardware: Computer 2

On the Linux platform, the following commands are executed.

Check the current CPUFreq governor:
$ cpufreq-info

Set the CPUFreq governor to "performance" for all cores:
$ sudo cpufreq-set -r -g performance

Since memoryviews require Cython 0.16 or higher, update Cython:
$ sudo pip install --upgrade git+git://github.com/cython/cython@master
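The installed version can then be verified with:

$ python -c "import Cython; print Cython.__version__"
0.25b0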

For the Numpy array f of size 200 by 200, the running time is:

                 WinPython (1)             Visual C++ (2)            Linux (3)
                 Running Time   Speed-up   Running Time   Speed-up   Running Time   Speed-up
                 (seconds)                 (seconds)                 (seconds)
convolve_py      5.40756        1.00       5.65036        1.00       4.30537        1.00
convolve_cy1     4.22743        1.28       4.52284        1.25       3.94759        1.09
convolve_cy2     1.87212        2.89       1.93266        2.92       5.43121        0.79
convolve_cy3     0.02253        240.02     0.02494        226.56     0.02066        208.39
convolve_cy4     0.00746        724.87     0.00778        726.27     0.00718        599.63
convolve_cy5     0.00749        721.97     0.00852        663.19     0.00753        571.76
convolve_cy6_4   0.00484        1117.26    0.00883        639.90     0.00787        547.06
convolve_cy6_5   0.00517        1045.95    0.00648        871.97     0.00763        564.27
note1: WinPython-64bit-2.7.10.3
note2: Microsoft Visual C++ Compiler for Python 2.7
note3: CAELinux 2013 (Ubuntu 12.04 LTS, 64-bit, gcc 4.6.3)

For the Numpy array f of size 4,000 by 3,000, the running time is:

                 WinPython (1)             Visual C++ (2)            Linux (3)
                 Running Time   Speed-up   Running Time   Speed-up   Running Time   Speed-up
                 (seconds)                 (seconds)                 (seconds)
convolve_py      -              -          -              -          -              -
convolve_cy1     -              -          -              -          -              -
convolve_cy2     -              -          -              -          -              -
convolve_cy3     6.44019        -          7.17841        -          6.14739        -
convolve_cy4     1.90992        -          2.00507        -          2.11415        -
convolve_cy5     1.91129        -          2.19844        -          2.10054        -
convolve_cy6_4   0.99059        -          1.53905        -          2.24326        -
convolve_cy6_5   0.98661        -          1.46488        -          2.19705        -
note1: WinPython-64bit-2.7.10.3
note2: Microsoft Visual C++ Compiler for Python 2.7
note3: CAELinux 2013 (Ubuntu 12.04 LTS, 64-bit, gcc 4.6.3)

The running times on the Windows and Linux platforms are roughly the same, with two exceptions: convolve_cy2 is significantly slower on Linux than on Windows, and the running times of convolve_cy6_4 and convolve_cy6_5 differ noticeably among WinPython, Visual C++ and Linux.

convolve_cy6_4 and convolve_cy6_5 provide no speed boost at all on the Linux platform (for both array sizes they are actually slower than convolve_cy4), even though no error message is generated during compilation. It would be appreciated if anyone could show me how to make the parallel loop prange work on the Linux platform.
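One diagnostic worth trying (a sketch, not verified on this machine) is a tiny hypothetical module openmp_check.pyx that reports how many OpenMP threads are available; if it returns 1, the extension was most likely compiled or linked without -fopenmp:

# openmp_check.pyx
cimport openmp

def max_threads():
    # Returns 1 when OpenMP is absent or was disabled at compile time.
    return openmp.omp_get_max_threads()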
