Developing a Semantics-Aware Pattern Detection Tool for the Tezos codebase
Blockchains are critical pieces of software that are used to manipulate significant
amounts of financial value (e.g., Tezos’ market cap is more than a billion according
to CoinMarketCap). As a consequence, the quality of Tezos’ codebase should only
increase over time.
The assessment of the quality of a change proposed by any developer to the Tezos
codebase is based upon two components: a Continuous Integration process based
on an ever-growing test-suite, and a Code Review process which involves at least
two engineers per proposal. Code reviews provide the opportunity to uncover issues
that are hard to detect by automatic means, such as logical errors, erroneous
assumptions by the developers, etc. Unfortunately, they are also used to ensure that
the “quality” of the Tezos codebase increases over time. Reviewers are expected to verify
that exceptions are correctly caught, occurrences of assert false are indeed unreachable, Lwt is not
We believe that, by adopting or developing automated tools to perform static analyses
on the codebase, we can reduce the amount of work required to perform a code
review, while increasing our confidence that certain classes of errors can
no longer go undetected.
Last year, we published an early version of ometrics. The goal of ometrics is to
implement “incremental static analyses”, where only the alerts related to a relevant
subset of the codebase (that is, the one changed in a Merge Request) are reported
to the developers.
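
The filtering idea behind incremental analysis can be illustrated with a minimal OCaml sketch. It does not reflect the actual ometrics implementation: the alert and hunk types and the filter_alerts function are hypothetical names introduced here, and a change is assumed to be described as a list of inclusive line ranges per file.

```ocaml
(* Hypothetical types for illustration; neither is taken from ometrics. *)
type alert = { file : string; line : int; message : string }

(* A changed region of a file, as an inclusive line range (assumed shape). *)
type hunk = { path : string; first : int; last : int }

(* An alert is relevant when it falls inside at least one changed hunk. *)
let is_relevant hunks alert =
  List.exists
    (fun h ->
      h.path = alert.file && h.first <= alert.line && alert.line <= h.last)
    hunks

(* Keep only the alerts that concern the changed subset of the codebase. *)
let filter_alerts hunks alerts = List.filter (is_relevant hunks) alerts

(* Usage: only the alert located in a changed line range is reported. *)
let () =
  let hunks = [ { path = "src/a.ml"; first = 10; last = 20 } ] in
  let alerts =
    [ { file = "src/a.ml"; line = 12; message = "in changed code" };
      { file = "src/a.ml"; line = 42; message = "in untouched code" } ]
  in
  List.iter
    (fun a -> Printf.printf "%s:%d: %s\n" a.file a.line a.message)
    (filter_alerts hunks alerts)
```

With this design the analysis itself may still run on the whole codebase; only the reporting step is restricted to the lines touched by the change.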
We wish to extend ometrics to detect known erroneous or error-prone patterns that
we want to forbid in the codebase. The expected interface is akin to what
semgrep offers, based on rules developers can write themselves, but tailored to
the specific needs of Tezos.
• The first step of the internship will be to provide a tool that handles all the semgrep
rules currently used in the Tezos codebase (but with ometrics’ capability to filter
• The second step is to explore how to provide more semantics-aware rules.
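
As a rough illustration of what a syntactic rule could look like, the following OCaml sketch matches patterns on a toy expression type. Everything in it is an assumption made for the example: the expr type stands in for the real OCaml AST, the rule record and the scan function are hypothetical, and the Obj.magic rule is only an illustrative pattern, not one taken from the Tezos rule set.

```ocaml
(* Toy expression type standing in for the OCaml AST; an assumption made
   to keep the sketch self-contained. *)
type expr =
  | Var of string
  | Bool of bool
  | Assert of expr
  | Apply of expr * expr list
  | Seq of expr * expr

(* A hypothetical rule: an identifier and a predicate on a single node. *)
type rule = { id : string; matches : expr -> bool }

(* Flags the literal expression "assert false". *)
let assert_false =
  { id = "assert-false";
    matches = (function Assert (Bool false) -> true | _ -> false) }

(* Illustrative rule only: flags any application of Obj.magic. *)
let obj_magic =
  { id = "obj-magic";
    matches = (function Apply (Var "Obj.magic", _) -> true | _ -> false) }

(* Visit every node of [e] and collect the ids of the rules that match,
   parents before children and left to right. *)
let rec scan rules e =
  let here =
    List.filter_map (fun r -> if r.matches e then Some r.id else None) rules
  in
  let below =
    match e with
    | Var _ | Bool _ -> []
    | Assert e1 -> scan rules e1
    | Apply (f, args) -> List.concat_map (scan rules) (f :: args)
    | Seq (a, b) -> scan rules a @ scan rules b
  in
  here @ below

(* Usage: report the rules triggered by a small program. *)
let () =
  let prog = Seq (Apply (Var "Obj.magic", [ Var "x" ]), Assert (Bool false)) in
  List.iter print_endline (scan [ assert_false; obj_magic ] prog)
```

Such rules look only at the shape of the syntax tree. A semantics-aware rule would additionally need information such as types or name resolution, which this sketch does not model.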
The result of the internship will be validated by its capability to handle a large
codebase like Tezos.
The successful applicant should have a good knowledge of the OCaml programming
language. Knowledge of the tools used by developers to collaboratively develop
Octez would be a nice bonus. This includes git, GitLab CI, dune, and opam, among
others. A minimal background in programming language theory is required, as the
applicant will be confronted with the OCaml intermediate representation (AST).
You will work at the Nomadic Labs’ offices in Paris.
As a participant in a large-scale open-source project, you will have to quickly learn to
use collaborative tools (Git, merge requests, issues, GitLab, continuous integration,
documentation) and to communicate about your work. The final results might be
presented at an international conference or workshop.
You will have a designated advisor at Nomadic Labs and will be expected to work
independently and to propose thoroughly considered solutions to the different problems you
will have to solve. You will be encouraged to seek advice from members of the team.
All material produced (essays, documentation, code, etc.) will be released under an
open-source license (e.g., MIT or CC).