Managing binary package repositories

In Packaging for Arch Linux I described the ins and outs of binary repository management and some of the issues that come with the tooling currently used by Arch Linux.

In this article I will highlight the work on new tooling and its features.

Since my last write-up on this topic, the project formerly known as arch-repo-management has been renamed to repod (as in repo-d) and has just seen its first minor release. 🎉

You can find its documentation at https://repod.archlinux.page.

Please note that repod 0.1.0 is still alpha-grade software and should not yet be used to actually manage binary package repositories!

However, it is already possible to do a few things with the software, and if you are able to test it or are interested in helping to develop it, that is very much welcome!

On Arch Linux you can install it using pacman:

pacman -S repod

On other distributions (even on macOS!) you may install it using pip (repod is available on PyPI) until your respective package management system makes the software available to you on a system level:

pip install --user repod

Working on repod

Since 2021 I have been working, on and off, on what is now repod. The project is written in typed Python and is extensively tested using pytest.

Work first began after Arch Conf 2019, at which a working group had looked into improving the workflows currently employed by the distribution and pushing for tooling that would allow moving away from an svn monorepo based approach to a deconstructed git setup.

A proof of concept (PoC) to mimic the behavior of dbscripts had been created, but after 2019 this work lay dormant.

Over the course of 2021 I spent time transforming parts of the PoC into a Python project following best practices for development (e.g. type hints following PEP 484, 100% test coverage, data validation using pydantic models), exposing the first features in scripts.

In 2022 more work has been done to extend validation and transform the project into a package-based setup for easier handling and extension in the future.

Concepts of repod

Contrary to dbscripts, repod follows a paradigm in which it is largely decoupled from the source repository of the binary packages it maintains and aims at becoming a self-contained service.

Package files and their signatures are consumed, relevant metadata is extracted and transferred to a management repository, which is where the state of each binary repository is kept.
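To make the metadata extraction step more tangible, here is a minimal sketch and not repod's actual implementation. It assumes the text of a package's .PKGINFO file (plain `key = value` lines) has already been read from the package archive. The function name `parse_pkginfo`, the subset of repeatable keys and the JSON output are assumptions made purely for illustration.

```python
import json
from typing import Any

# Keys that may appear several times in a .PKGINFO file.
# This is only a subset, chosen for illustration.
MULTI_VALUE_KEYS = {"depend", "optdepend", "license", "provides", "conflict", "replaces"}


def parse_pkginfo(text: str) -> dict[str, Any]:
    """Parse the 'key = value' lines of a .PKGINFO file into a dictionary."""
    metadata: dict[str, Any] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as "# Generated by makepkg".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" = ")
        if key in MULTI_VALUE_KEYS:
            # Repeatable keys are collected into a list, in file order.
            metadata.setdefault(key, []).append(value)
        else:
            metadata[key] = value
    return metadata


# Hypothetical example data, not taken from a real package.
EXAMPLE_PKGINFO = """\
# Generated by makepkg
pkgname = example
pkgbase = example
pkgver = 1.0.0-1
pkgdesc = An example package
arch = any
depend = python
depend = bash
"""

metadata = parse_pkginfo(EXAMPLE_PKGINFO)
# Serialize the extracted metadata, e.g. for storage in a management repository.
print(json.dumps(metadata, indent=2, sort_keys=True))
```

A real implementation would additionally validate each field (repod uses pydantic models for this) and handle the package signature, both of which are left out here.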

Available packages in a given binary repository are exposed to pacman via sync database files. The management repository contents can be (transparently and reproducibly) transformed into sync database files and vice versa.
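The round trip between management repository data and sync database files could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not repod's code: it only covers the `%KEY%` section layout used by `desc` files in a sync database, the dictionary layout of `entry` is hypothetical, and the function names `to_desc` and `from_desc` are made up for this example.

```python
def to_desc(entry: dict[str, list[str]]) -> str:
    """Render a metadata entry in the '%KEY%' section layout of a 'desc' file."""
    sections = []
    for key, values in entry.items():
        # Each section is a header line followed by one value per line.
        sections.append("\n".join([f"%{key.upper()}%", *values]))
    # Every section, including the last one, is followed by an empty line.
    return "\n\n".join(sections) + "\n\n"


def from_desc(text: str) -> dict[str, list[str]]:
    """Parse 'desc' style text back into a metadata entry."""
    entry: dict[str, list[str]] = {}
    for section in text.strip().split("\n\n"):
        header, *values = section.splitlines()
        entry[header.strip("%").lower()] = values
    return entry


# Hypothetical entry as it might be kept in a management repository.
entry = {
    "name": ["example"],
    "version": ["1.0.0-1"],
    "depends": ["python", "bash"],
}

desc_text = to_desc(entry)
# Converting back and forth yields identical data in both directions.
assert from_desc(desc_text) == entry
assert to_desc(from_desc(desc_text)) == desc_text
```

Because both functions preserve the order of keys and values, the conversion is reproducible: the same input always yields byte-identical output, which is the property the transformation described above relies on.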

The above functionality is exposed on the command line via repod-file.

Upcoming work

As mentioned above, repod is not yet stable and is still missing quite a few features. The following topics (and more) will be worked on in the next milestones (not necessarily in this order):

  • file handling (moving package files from staging areas to actual repositories)

  • signature validation (PGP signature validation of package files)

  • handling of debug packages

  • consolidation of the management repository schema

  • improved logging throughout the project

  • git backend to transparently expose changes to the management repository

  • caching for management repository state (e.g. to allow fast searches)

  • API to interact with a repod instance over the wire in an authenticated fashion

  • client-side tooling to interact with repod's API

There is still a lot of work to be done. If you have a background in Python development and are interested in working on a project that is very close to the distribution, that will likely improve the workflow of many people, and that makes binary repository management more transparent to the user, have a look at the project's contribution guidelines and do not hesitate to reach out!