Ruff my dirty code

17. 09. 2024 | Jakub Kadlčík | EN ruff python dev github

Static analysis tools have their limitations, but they still help us quickly discover many types of bugs. That is not a controversial take. What may be disputed is whether enabling them for large projects with a long-standing history is worth the effort, and I would say it most definitely is. In this article, we will take a look at how to report problems only for new code, lowering the barrier to entry to the minimum.

The dilemma

Creating a new project from scratch is so enjoyable, isn’t it? No technical debt, no backward compatibility, no compromises, all the code is beautifully formatted and brilliantly architected. We don’t really need to run any static analysis tools but we do it anyway just so that they can say that “All checks passed!” and that we are awesome. Obviously, I am being sarcastic, but the point is that enabling such tools for new projects is easy.

Now, let’s consider projects that have been developed over the span of decades. Everything is a mess. Running pylint, mypy, or ruff overwhelms you with hundreds or thousands of reports and leaves you with the following dilemma: Should you just abandon all hope and pretend that this never happened? Should you pollute your codebase with a bunch of # pylint: disable=foo comments? Or should you devote the next month of your life to rewriting all the problematic code, at the risk of introducing even more bugs in the process?

There is one more option that has worked quite well for our team for years now: running static analysis tools for the whole project but reporting only newly introduced problems.

Reporting only new problems

There is a tool called csdiff which takes two lists of defects (formatted errors from static analysis tools), compares them, and prints only the defects that newly appeared in the second list or, alternatively, the ones that are missing from it. These can be understood as newly added and fixed defects, respectively.
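To make the idea of comparing two defect lists more concrete, here is a minimal Python sketch. It is not how csdiff is implemented; the Defect type, the fingerprint helper, and the diff_defects function are hypothetical names made up for this illustration. The one assumption worth pointing out is that defects are matched without their line numbers, because unrelated edits shift code up and down within a file.

```python
from collections import Counter
from typing import NamedTuple


class Defect(NamedTuple):
    """A single report from a static analysis tool (hypothetical shape)."""
    path: str
    line: int
    checker: str
    message: str


def fingerprint(defect: Defect) -> tuple[str, str, str]:
    # Assumption: the line number is left out on purpose, so a defect that
    # merely moved within the file is not reported as a new one.
    return (defect.path, defect.checker, defect.message)


def diff_defects(old: list[Defect], new: list[Defect]) -> list[Defect]:
    """Return the defects from `new` that have no counterpart in `old`."""
    remaining = Counter(fingerprint(d) for d in old)
    added = []
    for defect in new:
        key = fingerprint(defect)
        if remaining[key] > 0:
            # Already known; consume one occurrence so that duplicates
            # of the same message are counted correctly.
            remaining[key] -= 1
        else:
            added.append(defect)
    return added


main_branch = [
    Defect("pkg/__init__.py", 10, "F401", "`os` imported but unused"),
]
feature_branch = [
    Defect("pkg/__init__.py", 12, "F401", "`os` imported but unused"),
    Defect("pkg/__init__.py", 23, "F821", "Undefined name `requests`"),
]

newly_added = diff_defects(main_branch, feature_branch)
# Swapping the arguments gives the defects that the new code fixed.
fixed = diff_defects(feature_branch, main_branch)
```

In this made-up example, only the F821 report ends up in newly_added, even though the F401 report moved from line 10 to line 12, and fixed stays empty.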

We created a tool called vcs-diff-lint which does the obvious thing. It runs pylint, mypy, and ruff for the main branch of your git repository, and then runs them again for your current branch. There we have our two lists of defects which get internally passed to csdiff. The output looks like this.

$ vcs-diff-lint
Error: RUFF_WARNING:
fedora_distro_aliases/__init__.py:23:20: F821[undefined-name]: Undefined name `requests`

Error: MYPY_ERROR:
fedora_distro_aliases/__init__.py:11: mypy[error]: "None" has no attribute "append"  [attr-defined]

We can clearly see that our code introduced two new errors. Sometimes it may be useful to also see how many existing errors our code fixed. In that case, use vcs-diff-lint --print-fixed-errors.

Please follow the installation instructions here.

GitHub Action

I don’t trust myself (or anyone else for that matter) to run the vcs-diff-lint tool manually for every proposed change. And neither should you. There is an easy-to-use GitHub action that runs the tool automatically for every pull request. It tries to be as user-friendly as possible and reports the problems as comments directly in your “Files changed” section.

Please follow the installation instructions here or take a look at our setup as an example.

Ruff support

Ruff is all the rage nowadays, and rightfully so. It checks our whole codebase in under a second (20 ms, actually) while mypy takes its sweet time and finishes around the one-minute mark. Up until recently, the vcs-diff-lint tool supported only pylint and mypy, but since the last release, ruff is supported as well. Please give it a try.

As a matter of fact, I am writing this article as a celebration of the new vcs-diff-lint release.

But what about diff-cover?

Speaking of differential static analysis, you may have already heard about diff-cover. It has many more contributors and GitHub stars, so why would I recommend trying vcs-diff-lint instead?