Measuring and Optimizing Runtime

Data scientists have somewhat of a reputation for being tinkerers. But as the field is increasingly drawing nearer to software engineering, the demand for concise, highly performant code has increased. The performance of a program should be assessed in terms of time, space, and disk use — keys to scalable performance.

Python offers some profiling utilities to showcase where your code is spending time. To support the monitoring of a function’s runtime, Python offers the timeit function.

%%timeit for i in range(100000):
    i = i**3

Some quick wins when it comes to improving your code while working with pandas:

  1. Use pandas the way it’s meant to be used: do not loop through dataframe rows — use the apply method instead
  2. Leverage NumPy arrays for more even efficient coding

Sign up for more tips