Code review as part of a research workflow

We have adopted a process of code review, in which research code is evaluated by other lab or project members before entering the primary code base. Here, I describe this approach, outline its advantages (and some challenges), and give some tips on its implementation.

Research projects in my lab involve a lead researcher (typically an honours student, PhD student, or postdoc) and myself as supervisor, potentially along with research assistants and collaborators. The key elements of the implementation of such projects are:

In typical research projects, active review and acceptance of the elements by multiple people on the project occurs only at the final stage, in which the project is reported. However, a review process would also seem to offer great potential benefits in the earlier stages, in which the project is developed and implemented.

Here, I will describe and reflect upon our adoption of such a code review workflow.

Repositories and code review

In our project organisation system, each of the project elements has a repository that is kept under version control using Git and hosted on GitHub under our lab's account. The main repository state, called master, is protected and cannot be changed directly. Instead, alterations are made within a specific branch, which can then be incorporated into master via a pull request. As part of the pull request, another member of the project is invited to review the proposed changes. The reviewer can add comments and questions and request changes, often leading to a set of iterations within the pull request. When the reviewer is satisfied with the changes, they mark them as accepted and merge them into the master state.
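
As a rough illustration of the sequence described above, here is a minimal Python sketch that models a pull request as a small set of states. It is not how GitHub implements pull requests, and it is not code from our lab; the class, method, branch, and person names are all hypothetical and chosen only to mirror the steps in the text (review, requested changes, revision, acceptance, merge).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class State(Enum):
    OPEN = "open"
    CHANGES_REQUESTED = "changes requested"
    APPROVED = "approved"
    MERGED = "merged"


@dataclass
class PullRequest:
    """Hypothetical model of one pull request against the protected master state."""

    branch: str
    author: str
    reviewer: str
    state: State = State.OPEN
    comments: List[str] = field(default_factory=list)

    def _require_reviewer(self, person: str) -> None:
        # Only the invited reviewer may comment, approve, or merge in this sketch.
        if person != self.reviewer:
            raise PermissionError(f"{person} is not the invited reviewer")

    def request_changes(self, person: str, comment: str) -> None:
        self._require_reviewer(person)
        self.comments.append(comment)
        self.state = State.CHANGES_REQUESTED

    def revise(self, person: str) -> None:
        # The author responds to the review, which reopens the request.
        if person != self.author:
            raise PermissionError(f"{person} is not the author")
        self.state = State.OPEN

    def approve(self, person: str) -> None:
        self._require_reviewer(person)
        self.state = State.APPROVED

    def merge(self, person: str) -> None:
        self._require_reviewer(person)
        # master is protected: nothing enters it without an approved review.
        if self.state is not State.APPROVED:
            raise PermissionError("changes must be approved before merging")
        self.state = State.MERGED


# One round of review on a hypothetical stimulus-creation branch.
pr = PullRequest(branch="create-stimuli", author="student", reviewer="supervisor")
pr.request_changes("supervisor", "Could the image size be a named constant?")
pr.revise("student")
pr.approve("supervisor")
pr.merge("supervisor")
print(pr.state.value)  # merged
```

The point of the sketch is the guard in `merge`: the only route into the master state passes through an approval from someone other than the author.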

For example, consider a situation in which the stimuli for an experiment are being created:

Advantages

This might seem like a complicated sequence of events, and one might wonder why anyone would introduce this complexity into their workflow. Here are what I consider to be a few key advantages.

Training

Students do not typically begin research with much experience in computer programming, and must acquire such skills during their research training. Code review is a great way to gradually boost skills and confidence in the necessary programming concepts. In the beginning, the student can serve as a reviewer of code written by others on the project. This allows them to see a direct connection between the requirements and goals of their specific research and how they are met using code, which is typically a much more productive and stimulating way of learning than following generic instructional material. Importantly, they retain a connection to the project and a feeling of ownership in a way that they may not if the code were not visible to them or their input on key implementation decisions were not sought.

As students advance in their skills, they can transition towards writing code and creating pull requests for their supervisor or others on the project to review. The review process gives them confidence that the responsibility for producing correct code is not theirs alone, and provides feedback that enables them to advance their skills rapidly. It also allows the supervisor to balance the students' need for autonomy and freedom in directing and implementing their projects against the supervisor's own responsibility for oversight of the project and its integrity.

Collaboration

Research projects are often done with outside collaborators, that is, with people who are not within the group of the lead researcher and the supervisor. These collaborators typically bring some specific expertise to a project, and often that expertise is technical. This can make it challenging to ensure that all of the project members agree on the details of a project.

Code review can help with this challenge by allowing the lead researcher and/or the supervisor to inspect the contributions of the collaborator before they enter the master code base. Even if the collaborator is not part of the code review process, it can still be useful for the within-lab researchers to introduce the code via a pull request—this can facilitate discussions around the suitability of the code and the understanding of its functionality.

Communication

Code is often the most fundamental source of information about a research project, and we are encouraged to make code available alongside publications about the associated research for inspection by the broader community. The usefulness of such code is likely to be related to how readily it can be comprehended.

Having a review process means that the code has been written with comprehension in mind, since the reviewer must be able to follow and understand the code in order to approve it. This is likely to improve the code's readability as a matter of course. Indeed, I have found that knowing that someone else will be trying to understand my code can have dramatic effects on its resulting structure, even more so when I know that a specific person will be looking at it in detail.

Disadvantages and challenges

It is important to note a few disadvantages and challenges in adopting a code review workflow:

Tips for implementing a code review process

On balance, I consider a code review process to be a beneficial addition to a research workflow. Here are a few tips and suggestions if you are considering implementing such a process:

Summary

Code review is a process in which other members of a research team examine and evaluate code before it is incorporated into the primary code base. I suggest that code review can provide significant advantages in research training, collaboration, and communication. Even after weighing its disadvantages and challenges, I think that code review is a worthwhile addition to a research workflow.

Acknowledgements

Thanks to the members of my lab for their willingness to apply a code review process. Thanks in particular to Lindsay Peterson, who gave a talk on this topic at ResBaz Sydney 2019—you can watch Lindsay's 5-minute 'lightning' talk on "Code review to enhance research training, collaboration, and communication" (begins at 27:30), which gives an overview of many of the points raised here.

Resources of interest