Mypy Flake Tests

Dec 5, 2018 07:02 · 1439 words · 7 minute read mypy flake8 pytest

Mypy and Flake8 testing

At work I am currently working on a Python project, and we decided to make it fully typed. Python allows you to annotate types, but it doesn’t check them on its own. You need a separate program to do that for you. Mypy is the most widely used one, so we picked it.
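A tiny illustration of what "doesn’t check them on its own" means in practice. The function name double is made up for this sketch; the point is only that the interpreter runs the mismatched call happily, while a type checker such as mypy is expected to complain about it.

```python
def double(value: int) -> int:
    """Annotated to take and return an int."""
    return value * 2


# The interpreter ignores the annotations, so passing a str works at runtime.
result = double("ab")
print(result)  # prints "abab"
```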

We also set up flake8. It combines three tools:

  • PyFlakes checks source code for errors like missing imports, unused imports and undefined names.
  • pycodestyle helps keep the code PEP 8 compliant.
  • Ned Batchelder’s McCabe script checks McCabe complexity. To be honest, we are not using it right now.
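To make the first bullet more concrete, here is a toy unused-import check built on the standard ast module. It only illustrates the kind of problem PyFlakes reports; it is not how PyFlakes is implemented, and the function name unused_imports is invented for this sketch.

```python
import ast
from typing import List


def unused_imports(source: str) -> List[str]:
    """Return imported names that never appear as a plain name in the source."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


print(unused_imports("import os\nimport sys\nprint(sys.argv)\n"))  # prints ['os']
```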

Integration with project

We wanted flake8 and mypy to integrate seamlessly with our development workflow. These tools could be run by pre-commit checks, by pyinvoke, or just by setting up the IDE properly. But each of those requires some setup: whenever a new developer joins the project, they would need to tinker with all these things.

Since we are using pytest, and the pytest-flake8 and pytest-mypy plugins exist, we decided to use them.

It looked promising: just run the tests and you know whether there are missing imports, typing errors or other warnings. We started to write the code.


After some time, our project grew to over 600 Python files excluding tests, and it’s just an MVP that will grow much more. We also run our tests on CircleCI, and it became obvious that there were some problems with them.

A single run took 4-5 minutes.

I like to work remotely and I use a MacBook Air with 8GB of RAM. Surprisingly, it’s enough to run Docker and PyCharm without much pain. Usually when I work on battery, I use CircleCI to run tests for me. But when I have no Internet connection, it’s painful to wait over 8 minutes for tests to finish, not to mention the overheating and battery drain.

Watching pytest run, it was obvious that the problem lay in these two plugins, pytest-flake8 and pytest-mypy. They took over 3 minutes, while our actual tests took less than 1.

Flake8 plugin

Looking at the source code of pytest-flake8, you can see it’s quite a straightforward plugin. Most of the functionality lives in a single file.

It has:

def pytest_addoption(parser):
    """Hook up additional options."""
    group = parser.getgroup("general")
    group.addoption(
        '--flake8', action='store_true',
        help="perform some flake8 sanity checks on .py files")
    parser.addini(
        "flake8-ignore", type="linelist",
        help="each line specifies a glob pattern and whitespace "
             "separated FLAKE8 errors or warnings which will be ignored, "
             "example: *.py W293")
    parser.addini(
        "flake8-max-line-length",
        help="maximum line length")

def pytest_configure(config):
    """Start a new session."""
    if config.option.flake8:
        config._flake8ignore = Ignorer(config.getini("flake8-ignore"))
        config._flake8maxlen = config.getini("flake8-max-line-length")

That exposes various flake8 settings, like flake8-max-line-length, through pytest config files.

def pytest_collect_file(path, parent):
    """Filter files down to which ones should be checked."""
    config = parent.config
    if config.option.flake8 and path.ext in config._flake8exts:
        flake8ignore = config._flake8ignore(path)
        if flake8ignore is not None:
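            return Flake8Item(
                path, parent, flake8ignore, config._flake8maxlen,
                # remaining arguments are cut off in this excerpt
            )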

That collects files in the project, based on the pytest configuration, to run flake8 on later.

class Flake8Item(pytest.Item, pytest.File):

    def __init__(self, path, parent, flake8ignore, maxlength,
                 maxcomplexity, showshource, statistics):
        super(Flake8Item, self).__init__(path, parent)
        self._nodeid += "::FLAKE8"

    def setup(self):
        if hasattr(self.config, "_flake8mtimes"):
            flake8mtimes = self.config._flake8mtimes
        else:
            flake8mtimes = {}
        self._flake8mtime = self.fspath.mtime()
        old = flake8mtimes.get(str(self.fspath), (0, []))
        if old == [self._flake8mtime, self.flake8ignore]:
            pytest.skip("file(s) previously passed FLAKE8 checks")

    def runtest(self):
        call = ...  # output-capturing helper; the expression is cut off in this excerpt
        found_errors, out, err = call(...)  # arguments are cut off as well
        if found_errors:
            raise Flake8Error(out, err)
        # update mtime only if test passed
        # otherwise failures would not be re-run next time
        if hasattr(self.config, "_flake8mtimes"):
            self.config._flake8mtimes[str(self.fspath)] = (self._flake8mtime,
                                                           self.flake8ignore)

This is the pytest Item that tells pytest how to run flake8 on a collected file. The plugin also has a couple of less important classes, such as a custom exception.
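The skip logic in setup() and runtest() boils down to remembering the modification time of every file that passed. Below is a standalone sketch of that idea using only the standard library. The helper names already_passed and record_pass are hypothetical, and a plain dict stands in for pytest’s cache.

```python
import os
import tempfile


def already_passed(path, ignore, cache):
    """Return True when the file is unchanged since its last passing check."""
    return cache.get(path) == [os.path.getmtime(path), ignore]


def record_pass(path, ignore, cache):
    """Remember the modification time of a file that just passed."""
    cache[path] = [os.path.getmtime(path), ignore]


# Small demonstration on a throwaway file.
cache = {}
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write("x = 1\n")
demo_path = handle.name

first_run_skipped = already_passed(demo_path, [], cache)   # nothing cached yet
record_pass(demo_path, [], cache)
second_run_skipped = already_passed(demo_path, [], cache)  # unchanged file
os.remove(demo_path)
print(first_run_skipped, second_run_skipped)  # prints "False True"
```

Note that this only helps when the cache survives between runs, which is not a given on a fresh CI container.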

Mypy plugin

Let’s look at pytest-mypy now. Like pytest-flake8, it keeps most of its code in a single file.

It has:

def pytest_addoption(parser):
    group = parser.getgroup('mypy')
    group.addoption(
        '--mypy', action='store_true',
        help='run mypy on .py files')
    group.addoption(
        '--mypy-ignore-missing-imports', action='store_true',
        help="suppresses error messages about imports that cannot be resolved ")

This supports mypy settings through pytest configuration.

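def pytest_collect_file(path, parent):
    config = parent.config
    mypy_config = []

    if config.option.mypy_ignore_missing_imports:
        mypy_config.append("--ignore-missing-imports")

    # the contents of any([...]) are cut off in this excerpt;
    # the two plugin options are the presumed conditions
    if path.ext == '.py' and any([
        config.option.mypy,
        config.option.mypy_ignore_missing_imports,
    ]):
        return MypyItem(path, parent, mypy_config)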

This collects files and creates a MypyItem for each of them.

class MypyItem(pytest.Item, pytest.File):
    def __init__(self, path, parent, config):
        super().__init__(path, parent)
        self.path = path
        self.mypy_config = config

    def reportinfo(self):
        """Produce a heading for the test report."""
        return self.fspath, None, ' '.join(['mypy',])

    def runtest(self):
        """Run mypy on the given file."""
        mypy_argv = [
            str(self.path), '--incremental',
        ]
        mypy_argv += self.mypy_config
        stdout, _, _ = mypy.api.run(mypy_argv)

        if stdout:
            raise MypyError(stdout)

And the MypyItem itself.

What’s wrong with them

Looking at both plugins it’s easy to find similarities. They have 4 main steps:

  1. Find configuration through pytest
  2. Collect files to test
  3. Run external program (mypy/flake8)
  4. Return results

The flow looks so simple that there seems to be nothing to improve. But the tests are very slow compared to running mypy/flake8 on their own.

$ time flake8 .
real    0m12.385s
user    0m8.000s
sys     0m1.270s

$ time mypy .
real    0m16.175s
user    0m9.140s
sys     0m2.160s

They should take around 30 seconds to finish, not over 3 minutes. That’s even without caching.

The issue is that they are run on each file separately.
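A rough model of why that matters: every invocation of a tool pays a fixed setup cost before it looks at any file. The FakeChecker class below is purely hypothetical and only counts invocations; it does not measure real mypy or flake8 behaviour. The figure of 600 files comes from the size of our codebase.

```python
class FakeChecker:
    """Stand-in for a tool like mypy or flake8; counts how often it starts up."""

    def __init__(self):
        self.startups = 0
        self.files_checked = 0

    def run(self, paths):
        self.startups += 1                # fixed cost paid on every invocation
        self.files_checked += len(paths)


files = [f"module_{i}.py" for i in range(600)]

per_file = FakeChecker()
for path in files:
    per_file.run([path])                  # one invocation per collected file

batched = FakeChecker()
batched.run(files)                        # one invocation for the whole tree

print(per_file.startups, batched.startups)  # prints "600 1"
```

Both variants check the same 600 files, but the per-file one pays the setup cost 600 times.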

Faster tests

The easiest way to address this issue was to create two simple tests that run mypy and flake8 themselves, but on whole directories.
It’s a primitive solution, but with some tweaks it works quite well.


I have created a new file under tests/ with the following content:

import configparser
import mypy.api


class MypyError(Exception):
    pass


def test_mypy():
    cfg = configparser.ConfigParser()
    cfg.read("setup.cfg")
    test_paths = cfg.get("tool:pytest", "testpaths").split()
    # mypy.api.run takes a list of command-line arguments
    stdout, _, exit_status = mypy.api.run(test_paths)

    if exit_status:
        raise MypyError(stdout)

All it does is:

  • read testpaths from the setup.cfg file that contains the py.test settings,
  • run mypy on testpaths through its Python API,
  • raise an exception carrying stdout if the run failed.
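The first step can be tried in isolation. This sketch feeds configparser a made-up setup.cfg snippet as a string; the testpaths value "core tests" is an assumption based on how we invoke py.test, since the real file is not shown here.

```python
import configparser

# Hypothetical setup.cfg content; the real file is not shown in this post.
SETUP_CFG = """
[tool:pytest]
testpaths = core tests
"""

cfg = configparser.ConfigParser()
cfg.read_string(SETUP_CFG)

test_paths = cfg.get("tool:pytest", "testpaths").split()
# fallback=None avoids an exception when the option is not set
max_line_length = cfg.get("tool:pytest", "flake8-max-line-length", fallback=None)

print(test_paths)       # prints ['core', 'tests']
print(max_line_length)  # prints None
```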


In a similar way, I created tests/test_flake8:

import configparser
import subprocess

class FlakeError(Exception):
    def __init__(self, message, *args, **kwargs):
        message = "Flake8 check Failed to successfully complete.\n" + message
        super().__init__(message, *args, **kwargs)

def test_flake():
    cfg = configparser.ConfigParser()
    cfg.read("setup.cfg")
    test_paths = cfg.get("tool:pytest", "testpaths")
    max_line_length = cfg.get("tool:pytest", "flake8-max-line-length", fallback=None)

    command = f"flake8 {test_paths}"
    if max_line_length:
        command = f"{command} --max-line-length={max_line_length}"

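    # the original call is cut off here; these subprocess.run arguments are a reconstruction
    result = subprocess.run(
        command.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True)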
    if result.returncode:
        raise FlakeError(result.stdout + result.stderr)

Unfortunately, at the moment flake8 only offers a legacy Python API.
I had to run it using subprocess, passing args from setup.cfg like before.
Here the exception prepends an additional message, but its only purpose is to align the output better.
Which brings us to…


This approach lacks the handy traceback provided by the plugins, which run on each file separately.
When a file fails the mypy/flake8 checks, the failure doesn’t point to that file but to tests/test_mypy or tests/test_flake8.
However, the error messages are clear and list all the information required.

For us the trade-off is worth it.
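If the missing per-file pointer ever becomes annoying, the combined report can be regrouped by file before it is raised. A minimal sketch, assuming flake8’s default output format of path:row:col: code message. The function name group_by_file and the sample paths are made up, and paths that contain colons themselves (such as Windows drive letters) are not handled.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_file(report: str) -> Dict[str, List[str]]:
    """Group flake8-style report lines by the file they refer to."""
    grouped = defaultdict(list)
    for line in report.splitlines():
        # Split on the first three colons only: path, row, column, message.
        parts = line.split(":", 3)
        if len(parts) == 4 and parts[1].isdigit() and parts[2].isdigit():
            path, row, col, message = parts
            grouped[path].append(f"{row}:{col} {message.strip()}")
    return dict(grouped)


SAMPLE = (
    "core/models.py:3:1: F401 'os' imported but unused\n"
    "core/models.py:10:80: E501 line too long (95 > 79 characters)\n"
    "tests/test_api.py:7:5: F841 local variable 'x' is assigned to but never used\n"
)

for path, problems in group_by_file(SAMPLE).items():
    print(path, len(problems))
```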


Now CircleCI runs were taking between 55 seconds and 1 minute 10 seconds.

More Improvements

After tweaking the tests, we looked briefly at what else we could improve in the whole CircleCI pipeline to make it faster.

Collecting Files

We want to run flake8 and mypy on our codebase and on our tests.
Our folder structure has the codebase and the tests next to each other in separate folders, so we used to run py.test specifying both directories like this:

py.test core tests

It had to be done this way to allow pytest to collect all files before running tests on them.

Since these two tests now set their own paths, we no longer need to pass the codebase directory to pytest and can run it on tests alone. It improves pytest collection time quite a lot.

CircleCI Caching

Another quite obvious thing to do was to cache Python libraries between runs. We had some trouble with that in the past but gave it another shot. We added restore_cache and save_cache steps like this:

    - checkout
    - restore_cache:
        key: deps1-{{ .Branch }}-{{ checksum "requirements.txt" }}-{{ checksum "" }}

    - run:
        name: install dependencies
        command: |
          python3 -m venv venv
          . venv/bin/activate
          pip install -r

    - save_cache:
        key: deps1-{{ .Branch }}-{{ checksum "requirements.txt" }}-{{ checksum "" }}
        paths:
          - ./venv

    - run:
        name: run tests
        command: |
          . venv/bin/activate
          pytest tests
        environment: *envs
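The cache key is what decides whether a saved venv gets reused: as long as the branch and the requirements files stay the same, the key stays the same. The sketch below only mimics that behaviour with a SHA-256 digest. It is not CircleCI’s actual checksum implementation, it covers a single requirements file, and the function name cache_key is made up.

```python
import hashlib


def cache_key(branch: str, requirements_text: str, prefix: str = "deps1") -> str:
    """Build a key that changes whenever the branch or the requirements change."""
    digest = hashlib.sha256(requirements_text.encode("utf-8")).hexdigest()
    return f"{prefix}-{branch}-{digest}"


old_key = cache_key("master", "pytest\nmypy\n")
same_key = cache_key("master", "pytest\nmypy\n")
new_key = cache_key("master", "pytest\nmypy\nflake8\n")

print(old_key == same_key)  # prints True: unchanged requirements reuse the cache
print(old_key == new_key)   # prints False: a new dependency invalidates it
```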

Caching mypy

We also looked at caching mypy between runs, but the time gained was equal to the overhead of creating and extracting the cache.


Most test runs now take around 43 seconds when they can restore the cache and up to 55 seconds when they can’t. That includes the whole CircleCI process. When run locally, the tests take ~17 seconds. It’s more acceptable now.

If you have any suggestions on how we could improve the speed of our tests, let me know!
