Remember my post about how to bust performance issues? My claim there was that if you picked a project at random from, e.g., GitHub, you'd likely find something that would catch your eye if you ran its code through a profiler. Iterating this process seemed like a good strategy for generating PRs, which is what you need if you want to complete the Hacktoberfest challenge when that time of the year comes around.
But let's not get the wrong idea. You shouldn't walk away from here thinking that performance analysis is as trivial as turning a profiler on during test runs. What my previous post was trying to show is that, in many cases, code is simply never profiled, and therefore some (rather) low-hanging fruit can be picked just by looking at profiling data from the test suite. Once these are out of the way, performance analysis becomes a challenge in itself, and more serious and structured methodologies are required to make further progress.
So how did I actually use a profiler to complete the Hacktoberfest? I started by looking at all the Python projects with the hacktoberfest topic on GitHub and picked some that looked interesting to me. The profiler of choice was (surprise, surprise) Austin, since it requires no instrumentation and has practically no impact on the tracee, meaning that I could just sneak an austin invocation into the command line used to start the tests to get the data that I needed.
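For instance, for a project whose tests run with plain pytest, sneaking Austin in could look something like this (just a sketch using the same options that appear later in this post; profile.austin is a placeholder output file, and pytest is assumed to be on the PATH):
austin -so profile.austin -i 1ms pytest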
As a concrete example, let's look at how I was able to detect and fix a performance regression in pyWhat. I forked the repository, made a local clone and looked at how the test suite is run. Peeking at the GitHub Actions, I could see that the test suite was triggered with nox:
python -m nox
Inside the noxfile.py we can find the tests session, which is the one we are interested in:
@nox.session
def tests(session: Session) -> None:
    """Run the test suite."""
    session.run("poetry", "install", "--no-dev", external=True)
    install_with_constraints(
        session,
        "pytest",
        "pytest-black",
        "pytest-cov",
        "pytest-isort",
        "pytest-flake8",
        "pytest-mypy",
        "types-requests",
        "types-orjson",
    )
    session.run("pytest", "-vv", "--cov=./", "--cov-report=xml")
So let's create a profile session where we run the test suite through Austin. All we have to do is add austin at the right place in the arguments to session.run, plus some additional options, e.g.:
@nox.session
def profile(session: Session) -> None:
    """Profile the test suite."""
    session.run("poetry", "install", "--no-dev", external=True)
    profile_file = os.environ.get("AUSTIN_FILE", "tests.austin")
    install_with_constraints(
        session,
        "pytest",
        "pytest-black",
        "pytest-cov",
        "pytest-isort",
        "pytest-flake8",
        "pytest-mypy",
        "types-requests",
        "types-orjson",
    )
    session.run("austin", "-so", profile_file, "-i", "1ms", "pytest")
Here I've actually removed the options to pytest that I don't care about, like code coverage, as that's not what I want to profile this time. The -s option tells Austin to give us non-idle samples only, effectively giving us a profile of CPU time, while -o sets the output file and -i 1ms sets the sampling interval to one millisecond. I'm also allowing the Austin output file to be specified from the environment via the AUSTIN_FILE variable (note that this requires an import os at the top of the noxfile). This means that, if I want to profile the tests and save the results to tests.austin, all I have to do is invoke
pipx install nox # if not installed already
AUSTIN_FILE=tests.austin nox -rs profile
Once this completes, the profiling data will be sitting in tests.austin, ready to be analysed. With VS Code open on my local copy of pyWhat, I've used the Austin VS Code extension to visualise the data in the form of a flame graph and, by poking around, this is what caught my eye:
The suspect here is the chunky deepcopy frame stack, which is quite noticeable. The question, of course, is whether the deep copy is really needed. Clicking on the check frame takes us straight to the part of the code where the deepcopy is triggered. By inspecting the surrounding lines I couldn't really see the need for making deep copies of those objects. So with this PR I turned it back into a shallow copy (it was originally a shallow copy that was later turned into a deep copy), ran the tests and checked for the expected output. All was looking fine. In fact, things now looked much, much better!
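To give an idea of why this kind of change matters, here is a minimal sketch (not pyWhat's actual code; the data structure below is made up purely for illustration) comparing a deep copy of a nested structure against a shallow copy of the same object:
import copy
import timeit

# A made-up nested structure, standing in for the kind of object being copied.
data = {"patterns": [{"name": f"pattern-{i}", "regex": r"\d+"} for i in range(500)]}

deep = timeit.timeit(lambda: copy.deepcopy(data), number=100)
shallow = timeit.timeit(lambda: copy.copy(data), number=100)

print(f"deepcopy: {deep:.3f}s, shallow copy: {shallow:.3f}s")
A shallow copy only duplicates the outer container, so it is vastly cheaper; of course it is only safe if the nested objects are not mutated through the copy, which is why it is worth inspecting the surrounding code first.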
things now looked much, much better! Rerunning the profile session with the
change produced the following picture:
The deepcopy stacks have disappeared and the check frame is overall much slimmer! And so, just like that, a performance regression has been found and fixed in just a few minutes :).