CSE 6040 Computing for Data Analytics: Methods and Tools
Lecture 13 – Vectorization in Numpy and R
Da Kuang, Polo Chau
Georgia Tech, Fall 2014

Fall 2014 CSE 6040 COMPUTING FOR DATA ANALYSIS

Vectorization not possible for Python's lists

Python lists do not support elementwise arithmetic (arr ** 4 raises a TypeError on a list). Each element must therefore be processed by an explicit for loop or by map(), and map() still calls func once per element.

```python
import time

def func(x):
    return x**4

arr = range(1048576)

# Explicit for loop over the elements
t0 = time.time()
arr2 = [None] * 1048576
for i in arr:
    arr2[i] = i ** 4
print("Time using for loop:", time.time() - t0)

# map() is lazy in Python 3, so wrap it in list() to do the work inside the timed region
t0 = time.time()
arr3 = list(map(func, arr))
print("Time using map:", time.time() - t0)
```

vectorize() in Numpy

```python
import time
import numpy as np

def func(x):
    return x**4

arr = np.arange(0, 1048576, 1, dtype=np.float64)
arr2 = np.zeros(1048576)

# Explicit loop: one Python-level operation per element
t0 = time.time()
for i in range(len(arr)):
    arr2[i] = arr[i] ** 4
print("Time using for loop:", time.time() - t0)

# vectorize() returns a function object that accepts whole arrays
t0 = time.time()
vecfunc = np.vectorize(func)
arr3 = vecfunc(arr)
print("Time using vectorize:", time.time() - t0)

# Built-in ufunc: the loop over elements runs in compiled code
t0 = time.time()
arr4 = np.power(arr, 4)
print("Time using numpy power:", time.time() - t0)
```

np.vectorize() is provided for convenience rather than performance: internally it still calls func once per element. A built-in operation such as np.power() is the truly vectorized option.

Vectorizing conditional statements

```python
import time
import numpy as np

def func(x):
    if x <= 0:
        return np.exp(x)
    else:
        return np.log(x)

arr = np.random.randn(1048576)

t0 = time.time()
vecfunc = np.vectorize(func)
arr3 = vecfunc(arr)
print("Time using vectorize:", time.time() - t0)

t0 = time.time()
# np.log(arr) is also evaluated on the negative entries (producing NaN that
# np.where then discards), so suppress the resulting RuntimeWarnings
with np.errstate(invalid="ignore", divide="ignore"):
    arr4 = np.where(arr <= 0, np.exp(arr), np.log(arr))
print("Time using numpy where:", time.time() - t0)
```

Vectorization in R

```r
a = 1:1e6
c = 0
# compute sum of squares using a for loop
system.time(for (e in a) c = c + e^2)
##    user  system elapsed
##   0.832   0.001   0.833
system.time(sum(a^2))
##    user  system elapsed
##   0.006   0.002   0.008
```

Summary: Avoid explicit for-loops when manipulating vectors and matrices; use built-in vectorized operations (such as np.power() and np.where() in NumPy, or sum(a^2) in R) instead.
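For comparison with the R timing above, the same sum-of-squares computation can be sketched in NumPy. This is a minimal illustration, not code from the lecture: the helper names sum_squares_loop and sum_squares_vectorized are hypothetical, and no timings are quoted because they depend on the machine.

```python
import time

import numpy as np

def sum_squares_loop(a):
    """Sum of squares with an explicit Python for loop (one element at a time)."""
    total = 0.0
    for e in a:
        total += e ** 2
    return total

def sum_squares_vectorized(a):
    """Sum of squares with whole-array NumPy operations (no Python-level loop)."""
    return np.sum(a ** 2)

# 1, 2, ..., 1e6 as float64, analogous to 1:1e6 in R
a = np.arange(1, 1_000_001, dtype=np.float64)

t0 = time.time()
c_loop = sum_squares_loop(a)
print("Time using for loop:", time.time() - t0)

t0 = time.time()
c_vec = sum_squares_vectorized(a)
print("Time using vectorized sum:", time.time() - t0)

# The summation order differs, so compare with a relative tolerance
print(np.isclose(c_loop, c_vec))
```

Both versions compute the same value; the vectorized one moves the per-element loop out of the Python interpreter, which is the same effect sum(a^2) has in R.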
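In the conditional example, np.where(arr <= 0, np.exp(arr), np.log(arr)) computes both branches on every element before selecting. A minimal sketch of an alternative, assuming a float input array, applies each function only where its condition holds by using a boolean mask. The function name exp_or_log is hypothetical, and this masking approach is a swapped-in technique rather than the lecture's method.

```python
import numpy as np

def exp_or_log(arr):
    """Elementwise exp(x) for x <= 0 and log(x) for x > 0, without a Python loop.

    Each branch is evaluated only on the elements selected by the mask,
    so np.log never receives a non-positive input.
    """
    out = np.empty_like(arr)
    mask = arr <= 0
    out[mask] = np.exp(arr[mask])      # branch for x <= 0
    out[~mask] = np.log(arr[~mask])    # branch for x > 0
    return out

rng = np.random.default_rng(0)
arr = rng.standard_normal(1048576)
result = exp_or_log(arr)

# Reference result from the np.where version, with its warnings suppressed
with np.errstate(invalid="ignore", divide="ignore"):
    expected = np.where(arr <= 0, np.exp(arr), np.log(arr))
print(np.allclose(result, expected))
```

The trade-off is an extra boolean mask and two fancy-indexing copies, in exchange for skipping the wasted log/exp evaluations and the warnings they trigger.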
© Copyright 2025