Remember my post about how to bust performance issues? My claim there was that if you picked a project at random from, e.g., GitHub, you'd likely find something that would catch your eye if you ran its code through a profiler. Iterating this process seemed like a good strategy for generating PRs, which is what you need if you want to complete the Hacktoberfest challenge when that time of the year comes around.
But let's not get the wrong idea. You shouldn't walk away from here thinking that performance analysis is as trivial as turning a profiler on during test runs. What my previous post was trying to show is that, in many cases, code is simply never profiled, and therefore some (rather) low-hanging fruit can be picked just by looking at profiling data from the test suite. Once these are out of the way, performance analysis becomes a challenge in itself, and more serious and structured methodologies are required to make further progress.
So how did I actually use a profiler to complete the Hacktoberfest? I started by looking at all the Python projects with the hacktoberfest topic on GitHub and picked some that looked interesting to me. The profiler of choice was (surprise, surprise) Austin, since it requires no instrumentation and has practically no impact on the tracee, meaning that I could just sneak an austin invocation into the command line used to start the tests to get the data that I needed.
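For instance, for a project whose tests run with plain pytest, sneaking Austin in could look something like this (just a sketch using the same options that appear later in this post; profile.austin is a placeholder output file, and pytest is assumed to be on the PATH):
austin -so profile.austin -i 1ms pytest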
As a concrete example, let's look at how I was able to detect and fix a performance regression in pyWhat. I forked the repository, made a local clone and looked at how the test suite is run. Peeking at the GitHub Actions, I could see that the test suite was triggered with nox:
python -m nox
Inside the noxfile.py we can find the tests session, which is the one we are interested in:
@nox.session
def tests(session: Session) -> None:
    """Run the test suite."""
    session.run("poetry", "install", "--no-dev", external=True)
    install_with_constraints(
        session,
        "pytest",
        "pytest-black",
        "pytest-cov",
        "pytest-isort",
        "pytest-flake8",
        "pytest-mypy",
        "types-requests",
        "types-orjson",
    )
    session.run("pytest", "-vv", "--cov=./", "--cov-report=xml")
So let's create a profile session where we run the test suite through Austin. All we have to do is add austin at the right place in the arguments to session.run, plus some additional options, e.g.:
@nox.session
def profile(session: Session) -> None:
    """Profile the test suite."""
    session.run("poetry", "install", "--no-dev", external=True)
    profile_file = os.environ.get("AUSTIN_FILE", "tests.austin")
    install_with_constraints(
        session,
        "pytest",
        "pytest-black",
        "pytest-cov",
        "pytest-isort",
        "pytest-flake8",
        "pytest-mypy",
        "types-requests",
        "types-orjson",
    )
    session.run("austin", "-so", profile_file, "-i", "1ms", "pytest")
Here I've actually removed the options to pytest that I don't care about, like code coverage, as that's not what I want to profile this time. The -s option tells Austin to give us non-idle samples only, effectively giving us a profile of CPU time, while -o sets the output file and -i 1ms sets the sampling interval to one millisecond. I'm also allowing the Austin output file to be specified from the environment via the AUSTIN_FILE variable (note that this requires an import os at the top of the noxfile). This means that, if I want to profile the tests and save the results to tests.austin, all I have to do is invoke
pipx install nox # if not installed already
AUSTIN_FILE=tests.austin nox -rs profile
Once this completes, the profiling data will be sitting in tests.austin, ready to be analysed. With VS Code open on my local copy of pyWhat, I've used the Austin VS Code extension to visualise the data in the form of a flame graph and, by poking around, this is what caught my eye:
The suspect here is the chunky deepcopy frame stack, which is quite noticeable. The question, of course, is whether the deep copy is really needed. Clicking on the check frame takes us straight to the part of the code where the deepcopy is triggered. By inspecting the surrounding lines I couldn't really see the need for making deep copies of those objects. So with this PR I turned it back into a shallow copy (it was originally a shallow copy that was later turned into a deep copy), ran the tests and checked for the expected output. All was looking fine. In fact, things now looked much, much better!
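To give an idea of why this kind of change matters, here is a minimal sketch (not pyWhat's actual code; the data structure below is made up purely for illustration) comparing a deep copy of a nested structure against a shallow copy of the same object:
import copy
import timeit

# A made-up nested structure, standing in for the kind of object being copied.
data = {"patterns": [{"name": f"pattern-{i}", "regex": r"\d+"} for i in range(500)]}

deep = timeit.timeit(lambda: copy.deepcopy(data), number=100)
shallow = timeit.timeit(lambda: copy.copy(data), number=100)

print(f"deepcopy: {deep:.3f}s, shallow copy: {shallow:.3f}s")
A shallow copy only duplicates the outer container, so it is vastly cheaper; of course it is only safe if the nested objects are not mutated through the copy, which is why it is worth inspecting the surrounding code first.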
things now looked much, much better! Rerunning the profile session with the
change produced the following picture:
The deepcopy stacks have disappeared and the check frame is overall much slimmer! And so, just like that, a performance regression has been found and fixed in just a few minutes :).