This post grew out of the idea of using secondhand Android phones as a computing cluster.
Linux on Android Phone
The hardware and software of the Android phone running Linux are as follows:
Android phone (Sony Xperia V LT25i)
Android 4.3
Qualcomm Snapdragon S4 (Krait) 1.5 GHz MSM8960 MDP (dual core)
Debian 9 Stretch chroot installed by Linux Deploy (rooted by z4root)
The system information of Debian on the Android phone is as follows:
android@localhost:~$ uname -a
Linux localhost 3.4.0-perf-g4c8352f-00792-ga64e471 #1 SMP PREEMPT Wed Mar 26 17:56:21 2014 armv7l GNU/Linux
android@localhost:~$ dpkg --print-architecture
armhf
Windows Laptop
The laptop's hardware and software are:
Operating system: Windows 10, 32-bit
Power plan: High performance
Distribution: WinPython-32bit-2.7.10.3
Hardware: Computer 2
Code
The script npdot.py times matrix multiplication (np.dot) for various matrix sizes:
from __future__ import print_function  # print() works in Python 2.7 and 3
import sys
import timeit

import numpy as np

# Matrix sizes to test and the number of repetitions for each size.
n_list = [100, 500, 1000, 1500, 2000, 2500, 3000]
repeat_list = [100, 100, 10, 10, 3, 3, 3]

# Optional command-line overrides: python npdot.py [n [repeat]]
if len(sys.argv) > 1:
    n_list = [int(sys.argv[1])]
    repeat_list = [1]
if len(sys.argv) > 2:
    repeat_list = [int(sys.argv[2])]

for n, repeat in zip(n_list, repeat_list):
    print("n =", n)
    print("repeat =", repeat)
    t = []
    for i in range(repeat):
        print(i, "in repeat", repeat, end="\r")  # progress on a single line
        a = np.random.rand(n, n)
        b = np.random.rand(n, n)
        # Time only the multiplication, not the random matrix generation.
        t1 = timeit.default_timer()
        c = np.dot(a, b)
        t2 = timeit.default_timer()
        t.append(t2 - t1)
    print("average =", np.mean(t))
    print("maximum =", np.max(t))
    print("minimum =", np.min(t))
    print("running time =")
    print(t)
    print()
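Raw seconds are hard to compare across machines and matrix sizes. A common convention, assumed here rather than taken from npdot.py, counts a dense n×n by n×n product as about 2n³ floating-point operations (n³ multiplications plus roughly n³ additions), which turns each timing into GFLOP/s. The function name `gflops` is hypothetical.

```python
def gflops(n, seconds):
    """Approximate GFLOP/s for one n-by-n times n-by-n matrix product.

    Assumes the conventional 2*n**3 operation count (n**3 multiplications
    and about n**3 additions); the exact count is 2*n**3 - n**2.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * n ** 3 / seconds / 1e9


if __name__ == "__main__":
    # Example: a 1000 x 1000 product that takes 0.5 s runs at 4 GFLOP/s.
    print(gflops(1000, 0.5))
```

Feeding the "average =" value for each n into `gflops` gives one throughput figure per machine that can be compared directly.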
Result
The average running time is:
The running time on the Android phone is about 100 times that on Computer 2, which is far too slow. The phone's CPU (1.5 GHz, 2 cores) is not that much slower than Computer 2's (2.3 GHz, 2 cores/4 threads), so a 100-fold gap is unacceptable.
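A back-of-the-envelope check makes this concrete. The sketch below compares aggregate clock rate (GHz × cores), a crude proxy assumed for illustration that ignores SIMD width, caches, memory bandwidth, and the BLAS library in use. With the numbers above, clock rate accounts for only about a 1.5× difference, so most of the roughly 100× gap presumably comes from elsewhere, such as the BLAS library examined in the following sections.

```python
def aggregate_ghz(ghz, cores):
    """Crude throughput proxy: clock frequency times physical core count."""
    return ghz * cores


phone = aggregate_ghz(1.5, 2)      # Sony Xperia V: 1.5 GHz, 2 cores
computer2 = aggregate_ghz(2.3, 2)  # Computer 2: 2.3 GHz, 2 cores (4 threads)
clock_ratio = computer2 / phone    # about 1.53

observed_ratio = 100.0                      # approximate slowdown reported above
unexplained = observed_ratio / clock_ratio  # about 65x not explained by clock

print("clock ratio: {:.2f}".format(clock_ratio))
print("unexplained factor: {:.0f}".format(unexplained))
```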
Introducing ATLAS and OpenBLAS
NumPy and SciPy rely on BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package) for linear algebra. ATLAS and OpenBLAS are high-performance implementations of BLAS, and both also provide an optimized subset of LAPACK. The following commands install ATLAS and OpenBLAS:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install libopenblas-dev
sudo apt-get install libatlas-base-dev
sudo apt-get clean
The following commands switch between the implementations:
root@localhost:/home/android# sudo update-alternatives --config libblas.so.3
There are 3 choices for the alternative libblas.so.3 (providing /usr/lib/libblas.so.3).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/openblas-base/libblas.so.3 40 auto mode
* 1 /usr/lib/atlas-base/atlas/libblas.so.3 35 manual mode
2 /usr/lib/libblas/libblas.so.3 10 manual mode
3 /usr/lib/openblas-base/libblas.so.3 40 manual mode
root@localhost:/home/android# sudo update-alternatives --config liblapack.so.3
There are 3 choices for the alternative liblapack.so.3 (providing /usr/lib/liblapack.so.3).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/openblas-base/liblapack.so.3 40 auto mode
* 1 /usr/lib/atlas-base/atlas/liblapack.so.3 35 manual mode
2 /usr/lib/lapack/liblapack.so.3 10 manual mode
3 /usr/lib/openblas-base/liblapack.so.3 40 manual mode
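Switching interactively with --config gets tedious when benchmarking every implementation. update-alternatives also has a non-interactive --set option; the sketch below, an assumption rather than the procedure used in this post, builds the commands that select one implementation for both libblas.so.3 and liblapack.so.3 (paths copied from the listing above) and then runs npdot.py in a fresh Python process. The fresh process matters because NumPy loads the BLAS shared library at import time, so a running interpreter keeps the old one. `IMPLEMENTATIONS` and `commands_for` are hypothetical names, and the commands are only built, not executed.

```python
# Paths copied from the update-alternatives listings above (Debian 9, armhf).
IMPLEMENTATIONS = {
    "openblas": ("/usr/lib/openblas-base/libblas.so.3",
                 "/usr/lib/openblas-base/liblapack.so.3"),
    "atlas": ("/usr/lib/atlas-base/atlas/libblas.so.3",
              "/usr/lib/atlas-base/atlas/liblapack.so.3"),
    "reference": ("/usr/lib/libblas/libblas.so.3",
                  "/usr/lib/lapack/liblapack.so.3"),
}


def commands_for(impl, n, repeat):
    """Return argv lists that select `impl` and time one matrix size.

    The commands are only built here; run them yourself as root, for
    example with subprocess.check_call.
    """
    blas, lapack = IMPLEMENTATIONS[impl]
    return [
        ["update-alternatives", "--set", "libblas.so.3", blas],
        ["update-alternatives", "--set", "liblapack.so.3", lapack],
        # A fresh interpreter so NumPy loads the newly selected library.
        ["python", "npdot.py", str(n), str(repeat)],
    ]


if __name__ == "__main__":
    for impl in sorted(IMPLEMENTATIONS):
        for argv in commands_for(impl, 1000, 10):
            print(" ".join(argv))
```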
The average running time is:
Before ATLAS and OpenBLAS are installed, the unoptimized reference implementations of BLAS and LAPACK are used. OpenBLAS performs slightly better than ATLAS.
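Since a switch through update-alternatives is easy to forget, it can help to record the active library next to each timing. On Debian, each alternative is a symbolic link under /etc/alternatives whose target is the path shown by --config, so reading that link reveals the current selection. This is a minimal sketch under that assumption; `active_alternative` and `classify` are hypothetical names, and the classification is a plain substring match on the paths listed above.

```python
import os


def active_alternative(name, alt_dir="/etc/alternatives"):
    """Return the path an alternative currently points to."""
    return os.readlink(os.path.join(alt_dir, name))


def classify(path):
    """Map a library path to the implementation name used in this post."""
    if "openblas" in path:
        return "OpenBLAS"
    if "atlas" in path:
        return "ATLAS"
    return "reference"


if __name__ == "__main__":
    for name in ("libblas.so.3", "liblapack.so.3"):
        if os.path.islink(os.path.join("/etc/alternatives", name)):
            target = active_alternative(name)
            print(name, "->", target, "({})".format(classify(target)))
        else:
            print(name, "is not managed by update-alternatives here")
```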
Setting CPUFreq Governor
The following commands install cpufrequtils:
sudo apt-get update
sudo apt-get install cpufrequtils
sudo apt-get clean
The following commands check the CPUFreq governor and set it to "performance" for all cores:
root@localhost:/home/android# cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
driver: msm
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: 0.00 ms.
hardware limits: 384 MHz - 1.51 GHz
available frequency steps: 384 MHz, 486 MHz, 594 MHz, 702 MHz, 810 MHz, 918 MHz, 1.03 GHz, 1.13 GHz, 1.24 GHz, 1.35 GHz, 1.46 GHz, 1.51 GHz
available cpufreq governors: interactive, ondemand, userspace, powersave, performance
current policy: frequency should be within 384 MHz and 1.51 GHz.
The governor "interactive" may decide which speed to use
within this range.
current CPU frequency is 384 MHz (asserted by call to hardware).
cpufreq stats: 384 MHz:91.85%, 486 MHz:0.99%, 594 MHz:1.07%, 702 MHz:0.94%, 810 MHz:1.12%, 918 MHz:0.78%, 1.03 GHz:0.99%, 1.13 GHz:0.42%, 1.24 GHz:0.20%, 1.35 GHz:0.19%, 1.46 GHz:0.25%, 1.51 GHz:1.21% (25834)
analyzing CPU 1:
no or unknown cpufreq driver is active on this CPU
maximum transition latency: 0.00 ms.
root@localhost:/home/android# sudo cpufreq-set -r -g performance
root@localhost:/home/android# cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
driver: msm
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: 0.00 ms.
hardware limits: 384 MHz - 1.51 GHz
available frequency steps: 384 MHz, 486 MHz, 594 MHz, 702 MHz, 810 MHz, 918 MHz, 1.03 GHz, 1.13 GHz, 1.24 GHz, 1.35 GHz, 1.46 GHz, 1.51 GHz
available cpufreq governors: interactive, ondemand, userspace, powersave, performance
current policy: frequency should be within 384 MHz and 1.51 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency is 1.51 GHz (asserted by call to hardware).
cpufreq stats: 384 MHz:91.85%, 486 MHz:0.99%, 594 MHz:1.07%, 702 MHz:0.94%, 810 MHz:1.12%, 918 MHz:0.78%, 1.03 GHz:0.99%, 1.13 GHz:0.42%, 1.24 GHz:0.20%, 1.35 GHz:0.19%, 1.46 GHz:0.25%, 1.51 GHz:1.21% (25846)
analyzing CPU 1:
no or unknown cpufreq driver is active on this CPU
maximum transition latency: 0.00 ms.
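The same information can be read directly from the kernel's cpufreq files in sysfs, which is convenient for logging the governor alongside each benchmark run. A minimal sketch, assuming the standard layout /sys/devices/system/cpu/cpuN/cpufreq/ with scaling_governor and scaling_cur_freq (reported in kHz); a CPU without a cpufreq directory, like CPU 1 above, comes back as None. `cpufreq_status` is a hypothetical name, and the sysfs root is a parameter so it can point at a test copy.

```python
import os


def cpufreq_status(cpu, sysfs="/sys/devices/system/cpu"):
    """Return (governor, current MHz) for one CPU, or None if unavailable."""
    base = os.path.join(sysfs, "cpu{}".format(cpu), "cpufreq")
    try:
        with open(os.path.join(base, "scaling_governor")) as f:
            governor = f.read().strip()
        with open(os.path.join(base, "scaling_cur_freq")) as f:
            mhz = int(f.read().strip()) / 1000.0  # sysfs reports kHz
    except (IOError, OSError, ValueError):
        # No cpufreq driver or the CPU is offline: nothing to report.
        return None
    return governor, mhz


if __name__ == "__main__":
    for cpu in (0, 1):  # the phone has two cores
        print(cpu, cpufreq_status(cpu))
```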
The average running time is:
Only OpenBLAS shows a clear improvement.
Recompiling for Optimal Performance
From the ATLAS Installation Guide:
"Sometimes, ARM systems default to having several of the CPUs turned off to save power. If this happens, you need to turn them back on before doing the ATLAS install, ..."
This explains why cpufreq-info reports "no or unknown cpufreq driver is active on this CPU" for CPU 1. To get the best performance, the installation instructions from ATLAS, from the ARM version of ATLAS, and from SciPy should be followed. ATLAS requires CPU throttling to be disabled during installation, whereas OpenBLAS 0.2.20 detects all ARM CPU cores, including offline ones.
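Whether the second core is merely offline can be checked from /sys/devices/system/cpu/online and /sys/devices/system/cpu/offline, which use the kernel's CPU-list format (for example "0" or "0-1,3"). The sketch below parses that format; `parse_cpu_list` and `read_cpu_list` are hypothetical names. Writing 1 to /sys/devices/system/cpu/cpuN/online as root is the kernel's usual interface for bringing a core back, though whether it stays online on this phone would need to be verified.

```python
def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-1,3" into a sorted list of ints."""
    text = text.strip()
    if not text:
        return []  # the "offline" file is empty when every CPU is online
    cpus = set()
    for part in text.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cpu_list(name, sysfs="/sys/devices/system/cpu"):
    """Read and parse one CPU-list file, e.g. "online" or "offline"."""
    with open("{}/{}".format(sysfs, name)) as f:
        return parse_cpu_list(f.read())
```

For instance, `parse_cpu_list("0,2-3")` returns `[0, 2, 3]`, and `read_cpu_list("offline")` lists the cores that are currently switched off.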
According to the LinearAlgebraLibraries page of DebianScience, ATLAS and OpenBLAS should be recompiled locally for optimal performance. The README.Debian files give instructions:
README.Debian file of ATLAS
README.Debian file of OpenBLAS
When I followed the README.Debian files to recompile ATLAS and OpenBLAS, I ran into a problem: there was not enough free disk space in the Linux installation.
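Checking free space before starting a long build avoids discovering the shortage halfway through. A minimal sketch using os.statvfs, which reports on the filesystem containing a given path; the required amount is a placeholder parameter, since this post does not say how much space the ATLAS and OpenBLAS builds need, and `free_bytes` and `enough_space` are hypothetical names.

```python
import os


def free_bytes(path="/"):
    """Bytes available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def enough_space(required_bytes, path="/"):
    """True if at least required_bytes are free on path's filesystem."""
    return free_bytes(path) >= required_bytes


if __name__ == "__main__":
    # Inside the Debian chroot, "/" is the chroot's root directory.
    print("free: {:.1f} MiB".format(free_bytes("/") / 1024.0 / 1024.0))
```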