Best friends for building a Python package

I’ve recently published a Python package. At least for me, deciding to publish a project makes me more careful and thorough with the code. I immediately switch to the opposite side and start questioning and criticizing every piece of it. I have to say this improves the code a lot. For example, in machine learning projects, I start thinking about how easily my structure could be adapted to other data and models. Publishing your code is definitely something you want to try!

“If I have seen further it is by standing on the shoulders of giants.”
–Isaac Newton

There are already fantastic tools and pipelines which help you build a robust, flexible, clean and beautiful project. In this post, I would like to introduce those friends of mine to you.

venv

Python environments can drive one crazy. There are lots of nice tools, but personally I prefer venv, which is part of Python's standard library. First you create a virtual environment with

$ python3.9 -m venv .venv
$ source .venv/bin/activate

This creates a .venv folder to store your environment. You can gitignore it by adding .venv/ to .gitignore. Now you can start managing your packages.
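
Since venv is an importable module, the same step can also be done from Python. Here is a minimal sketch, assuming a throwaway directory and with_pip=False to keep it quick. The helper names create_env and in_virtualenv are my own, not part of venv.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir and return the path of its config file."""
    # with_pip=False skips bootstrapping pip, which keeps this sketch fast;
    # the command-line call installs pip by default.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir / "pyvenv.cfg"


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / ".venv")
        print("environment created:", cfg.exists())
    print("running inside a virtual environment:", in_virtualenv())
```

The in_virtualenv check is handy at the top of a setup script if you want to avoid installing packages into the system Python by accident.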

pip + pip-tools

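pip is the standard package installer for Python and ships with most Python installations (it is not part of the standard library itself). It helps you manage your dependencies. My practice is to use it together with pip-tools.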

$ python -m pip install --upgrade pip setuptools pip-tools

First you need a .in file specifying the packages you need, e.g.

# requirements/train.in
torch 
seaborn
...

Based on this file, pip-tools resolves the full list of packages you need, including transitive dependencies, each pinned to an exact version. You can write them to a .txt file.

$ python -m piptools compile requirements/train.in --output-file requirements/train.txt

Now you have all the packages you need in the .txt file, e.g.

#
# This file is autogenerated by pip-compile with Python 3.9
# by the following command:
#
#    pip-compile --output-file=requirements/train.txt requirements/train.in
#
absl-py==2.1.0
    # via ml-collections
annotated-types==0.6.0
    # via pydantic
anyio==4.3.0
...

Finally, we just need to install these packages with pip:

$ python -m pip install -r requirements/train.txt
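
If you want to double-check that the environment really matches the compiled file, something like the following could work. It is only a sketch: parse_pins, installed_version and find_mismatches are hypothetical helpers of mine, and the pattern assumes plain name==version lines as in the excerpt above (no extras, hashes or URLs).

```python
import re
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Matches lines such as "anyio==4.3.0"; header comments and indented
# "# via ..." lines do not match because they do not start with a name.
PIN_PATTERN = re.compile(r"^([A-Za-z0-9._-]+)==([^\s;#]+)")


def parse_pins(text: str) -> Dict[str, str]:
    """Map each pinned package name (lower-cased) to its pinned version."""
    pins = {}
    for line in text.splitlines():
        match = PIN_PATTERN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for every package that differs."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    sample = """\
absl-py==2.1.0
    # via ml-collections
annotated-types==0.6.0
    # via pydantic
anyio==4.3.0
"""
    for name, (wanted, found) in find_mismatches(parse_pins(sample)).items():
        print(f"{name}: pinned {wanted}, installed {found}")
```

In practice you would read requirements/train.txt from disk instead of using the inline sample.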

If you also want to install your own package into the environment in editable mode, run

$ python -m pip install --editable .

Just keep in mind that this requires you to set up the package first, i.e. to have a pyproject.toml (or setup.py) in the project root.
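
A quick way to see whether the editable install worked is to ask Python where it would import the package from. This is a small sketch; module_location is a hypothetical helper, and "packagename" is a placeholder for your own top-level package. With an editable install, I would expect the reported path to point into your working copy rather than into site-packages.

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # "packagename" is a placeholder, not a real package.
    print(module_location("packagename"))
```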

Check the following blogs, which explain why I use the native tools:

folder tree

A good folder tree is also very useful for managing your project. Here is how I organize my stuff.

projectname/
├── .venv/
├── dist/
│   ├── projectname-1.0.0-py3-none-any.whl
│   └── projectname-1.0.0.tar.gz
├── notebooks/
│   └── notebook.ipynb
├── scripts/
│   └── script.py
├── data/
│   └── data.npy
├── src/
│   └── packagename/
│       ├── __init__.py
│       └── subpackage/
│           ├── __init__.py
│           └── subpackage.py
├── tests/
│   └── test_subpackage.py
├── docs/
│   └── Makefile
├── .gitignore
├── Makefile
├── Dockerfile
├── requirements/
│   ├── dev-requirements.in
│   ├── dev-requirements.txt
│   ├── requirements.in
│   └── requirements.txt
├── pyproject.toml
├── MANIFEST.in
├── tox.ini
├── LICENSE
└── README.md

You can probably figure out what they are for just by googling their names. Anyway, I will explain the files later when we meet them.
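
If you start new projects often, the skeleton can be generated with a few lines of pathlib. The sketch below creates only the source-controlled parts of the tree (no .venv/ or dist/, which are produced by tools). The scaffold function and the two lists are my own illustration, not a standard tool.

```python
import tempfile
from pathlib import Path
from typing import List

# Paths relative to the project root; {package} is filled in by scaffold().
DIRECTORIES = ["notebooks", "scripts", "data", "tests", "docs", "requirements"]
FILES = [
    "src/{package}/__init__.py",
    "src/{package}/subpackage/__init__.py",
    "src/{package}/subpackage/subpackage.py",
    "tests/test_subpackage.py",
    "requirements/requirements.in",
    "requirements/dev-requirements.in",
    ".gitignore",
    "Makefile",
    "pyproject.toml",
    "tox.ini",
    "README.md",
]


def scaffold(root: Path, package: str) -> List[Path]:
    """Create the folder tree under root and return the files that were newly created."""
    created = []
    for name in DIRECTORIES:
        (root / name).mkdir(parents=True, exist_ok=True)
    for template in FILES:
        path = root / template.format(package=package)
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing work
            path.touch()
            created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp) / "projectname", "packagename"):
            print(path.relative_to(tmp))
```

Running it twice is harmless, since existing files are left untouched.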

VS Code, PEP 8, black, flake8, isort, Pylance

We now have the environment and can start coding! Here are some tools that help you write better code. I use VS Code with the following extensions installed.

pytest

After coding, you should always test your code before publishing. pytest makes this easy. Writing tests is straightforward if you follow the official tutorial, and after that you only need to run

$ python -m pytest

which outputs clean and clear test results:

============================= test session starts ==============================
platform darwin -- Python 3.9.13, pytest-8.2.2, pluggy-1.5.0
rootdir: /Users/hous/Github/NeuralHedge
configfile: pyproject.toml
plugins: anyio-4.4.0
collected 4 items                                                              

tests/test_data.py ..                                                    [ 50%]
tests/test_nn.py .                                                       [ 75%]
tests/test_utils.py .                                                    [100%]

=============================== warnings summary ===============================
tests/test_nn.py::test_network
  /Users/hous/Github/NeuralHedge/.venv/lib/python3.9/site-packages/torch/nn/modules/lazy.py:181: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment.
    warnings.warn('Lazy modules are a new feature under heavy development '

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================= 4 passed, 1 warning in 3.68s =========================
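
To give an idea of what pytest collects, here is a sketch of what a file like tests/test_subpackage.py could contain. The normalize function is a made-up example and is defined inline to keep the snippet self-contained. In a real project it would live in the package and be imported by the test file.

```python
# Hypothetical contents of tests/test_subpackage.py


def normalize(values):
    """Scale a list of numbers so that they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


# pytest collects every function whose name starts with "test_"
# and reports a failure when one of its assert statements is false.
def test_normalize_sums_to_one():
    result = normalize([1, 1, 2])
    assert abs(sum(result) - 1.0) < 1e-12


def test_normalize_keeps_proportions():
    assert normalize([1, 3]) == [0.25, 0.75]
```

Each dot in the pytest output above corresponds to one such test function passing.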

setuptools

Now you are ready to publish your code. I follow these pipelines:

Always check the latest tutorial! pyproject.toml is the standardized file for describing project metadata declaratively (the file itself was introduced with PEP 518, and the [project] metadata table with PEP 621), but many projects are still using the setup.py approach.
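
For orientation, a minimal pyproject.toml for the src layout could look like the string embedded below. It is wrapped in Python only so that it can be parsed and checked. The name, version and requires-python values are placeholders I made up, and the setuptools>=61 bound is my assumption for a setuptools version that understands the [project] table. tomllib is in the standard library from Python 3.11; on older versions the sketch falls back to the tomli backport, which has to be installed separately.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

# Placeholder metadata; adapt every value to your own project.
PYPROJECT = """
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "packagename"
version = "0.1.0"
requires-python = ">=3.8"
dependencies = []

[tool.setuptools.packages.find]
where = ["src"]
"""

config = tomllib.loads(PYPROJECT)

if __name__ == "__main__":
    print("project:", config["project"]["name"], config["project"]["version"])
    print("build backend:", config["build-system"]["build-backend"])
```

The last table tells setuptools to look for packages under src/, matching the folder tree.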

tox

With tox, you can even test your code in different environments. Simply write a configuration listing the environments and the test commands.

# tox.ini
[tox]
env_list = py38, py39

[testenv]
deps = pytest
commands = pytest tests

Then tox does everything for you:

$ tox
.pkg: _optional_hooks> python /Users/hous/Github/NeuralHedge/.venv/lib/python3.9/site-packages/pyproject_api/_backend.py True setuptools.build_meta
.pkg: get_requires_for_build_sdist> python /Users/hous/Github/NeuralHedge/.venv/lib/python3.9/site-packages/pyproject_api/_backend.py True setuptools.build_meta
.pkg: build_sdist> python /Users/hous/Github/NeuralHedge/.venv/lib/python3.9/site-packages/pyproject_api/_backend.py True setuptools.build_meta
py38: install_package> python -I -m pip install --force-reinstall --no-deps /Users/hous/Github/NeuralHedge/.tox/.tmp/package/4/neuralhedge-0.1.0.tar.gz
py38: OK ✔ in 4.98 seconds
py39: install_package> python -I -m pip install --force-reinstall --no-deps /Users/hous/Github/NeuralHedge/.tox/.tmp/package/5/neuralhedge-0.1.0.tar.gz
  py38: OK (4.98 seconds)
  py39: OK (2.94 seconds)
  congratulations :) (8.03 seconds)

Reference

This blog is greatly inspired by