## Saturday, September 29, 2018

### Comparing numpy scalars directly is time consuming; use .tolist() before a comparison

This is something that I found out about recently when going through the elements of a numpy array in order to do some checks on each number. It turns out you shouldn't just do this:

    for x in nparr:
        if x == 0:
            pass  # something something


as that takes a lot more time than doing this:

    for x in nparr.tolist():
        if x == 0:
            pass  # something something


This is because a for loop iterating over a numpy array does not yield plain Python numbers (constants) but numpy scalars, so every `x == 0` has to go through numpy's comparison machinery, much like comparing a numpy array to a constant. Converting the array into a list with .tolist() before the for loop yields a sequence of plain Python constants instead, and those compare much faster.
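To make the difference concrete, here is a small sketch that looks at the type of the elements each loop actually sees. The array contents and the variable names `direct_types` and `list_types` are just made up for illustration.

```python
import numpy as np

# Hypothetical example data; any 1-D float32 array would do.
nparr = np.array([0.0, 1.5, 2.0], dtype=np.float32)

# Iterating over the array directly yields numpy scalar objects.
direct_types = {type(x) for x in nparr}

# Iterating over .tolist() yields built-in Python floats.
list_types = {type(x) for x in nparr.tolist()}

print(direct_types)  # {<class 'numpy.float32'>}
print(list_types)    # {<class 'float'>}
```

So the `if x == 0` in the first loop is comparing a `numpy.float32` object to a Python int, while in the second loop it is comparing a Python float to a Python int.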

Here is some profiling I've done using cProfile to check different ways to do an 'if' on a numpy array element:

    import cProfile
    import numpy as np

    # cProfile.run executes each code string in the __main__ namespace,
    # so runs, x and y must be defined at module level.
    runs = 1000000

    print('Comparing numpy to numpy')
    # Note: np.array(1.0, np.float32) creates a 0-dimensional array.
    x = np.array(1.0, np.float32)
    y = np.array(1.0, np.float32)
    cProfile.run('''
    for _ in range(runs):
        if x == y:
            pass
    ''')
    print()

    print('Comparing numpy to constant')
    x = np.array(1.0, np.float32)
    cProfile.run('''
    for _ in range(runs):
        if x == 1.0:
            pass
    ''')
    print()

    print('Comparing constant to constant')
    x = 1.0
    cProfile.run('''
    for _ in range(runs):
        if x == 1.0:
            pass
    ''')
    print()

    print('Comparing numpy.tolist() to constant')
    x = np.array(1.0, np.float32)
    cProfile.run('''
    for _ in range(runs):
        if x.tolist() == 1.0:
            pass
    ''')
    print()

    print('Comparing numpy to numpy.array(constant)')
    x = np.array(1.0, np.float32)
    cProfile.run('''
    for _ in range(runs):
        if x == np.array(1.0, np.float32):
            pass
    ''')
    print()

    print('Comparing numpy.tolist() to numpy.tolist()')
    x = np.array(1.0, np.float32)
    y = np.array(1.0, np.float32)
    cProfile.run('''
    for _ in range(runs):
        if x.tolist() == y.tolist():
            pass
    ''')
    print()


Here are the results in order of speed:

Comparing constant to constant: 0.088 seconds

The remaining five comparisons took, from fastest to slowest, 0.288 seconds, 0.508 seconds, 0.684 seconds, 1.192 seconds, and 1.203 seconds.

In these measurements, it was always faster to first convert the numpy scalars into constants via .tolist() than to do anything with them as numpy scalars.
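If you want to check this on your own machine for the original loop over a whole array, a rough sketch using `timeit` could look like the following. The array size, the number of repetitions and the function names `count_zeros_direct` and `count_zeros_tolist` are arbitrary choices of mine, and the timings you get will depend on your hardware and numpy version.

```python
import timeit
import numpy as np

# Hypothetical test data: 100,000 zeros stored as float32.
nparr = np.zeros(100_000, dtype=np.float32)

def count_zeros_direct(arr):
    """Loop over the array itself, comparing numpy scalars to 0."""
    count = 0
    for x in arr:
        if x == 0:
            count += 1
    return count

def count_zeros_tolist(arr):
    """Convert to a list first, comparing plain Python floats to 0."""
    count = 0
    for x in arr.tolist():
        if x == 0:
            count += 1
    return count

# Time 10 full passes over the array with each approach.
direct_time = timeit.timeit(lambda: count_zeros_direct(nparr), number=10)
tolist_time = timeit.timeit(lambda: count_zeros_tolist(nparr), number=10)

print(f'direct loop:    {direct_time:.3f} seconds')
print(f'.tolist() loop: {tolist_time:.3f} seconds')
```

Both functions return the same count, so the only thing that changes between them is the type of the elements being compared.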