No description
Find a file
2023-12-24 12:22:20 +00:00
.vscode Initial commit with project layout 2023-12-08 13:23:01 +00:00
bin/make Add Make target "git-clean" 2023-12-24 12:22:20 +00:00
docs Add first page to docs with Mermaid diagrams 2023-12-24 10:26:38 +00:00
src/mike_dag Fix whitespace 2023-12-22 20:10:44 +00:00
tests Rename package to "mike-dag" 2023-12-22 15:31:35 +00:00
.envrc Add local "bin" directory in PATH via direnv .envrc 2023-12-24 12:01:13 +00:00
.flake8 Solve linting issues 2023-12-17 21:37:37 +00:00
.gitignore Initial commit with project layout 2023-12-08 13:23:01 +00:00
CHANGELOG.md Add changelog 2023-12-24 10:19:15 +00:00
Makefile Add Makefile and MkDocs 2023-12-22 21:15:12 +00:00
mkdocs.yml Add first page to docs with Mermaid diagrams 2023-12-24 10:26:38 +00:00
pdm.lock Add first page to docs with Mermaid diagrams 2023-12-24 10:26:38 +00:00
pyproject.toml Add first page to docs with Mermaid diagrams 2023-12-24 10:26:38 +00:00
README.md Improve readme 2023-12-24 10:19:06 +00:00

Mike's DAG

A simple collection that implements a directed acyclic graph (DAG) in pure Python. Most suitable as a dependency graph.

Since trees are a special kind of DAG, this package can also be used to represent trees.

Introduction

The Dag class represent a directed acyclic graph (DAG), just like this one:

graph BT
    0
    1

    A --> 0
    B --> 1

    Q --> A
    Q --> B

    X --> Q
    Y --> Q

A DAG consists of nodes and edges. In our example the nodes are 0, 1, 2, A, B, Q, X, and Y. The edges are directed and, thus, represented by the arrows between the nodes. They point from child to parent. In our example the nodes A and B are parents of node Q.

We call nodes 0 and 1 roots of our DAG, because they do not have any parents. Similarly X and Y are leaves, because they have no children. Very often you'll want to start iterating from the roots or leaves: because with the Dag class it's simple and efficient to follow the edges in both directions.

Probably you are familiar with both terms from the context of tree data structures. Good to know: every tree is a DAG; DAGs are a generalization of trees. As a consequence, you could use the Dag class to handle trees.

What nodes and edges mean or represent is totally up to you. For example, if a DAG represents the dependencies between Python packages, then its nodes are packages and an arrow points to its requirements. In this example the DAG could be used to efficiently find all sub-, sub-, etc. requirements of a package recursively.

With the Dag class you can use any hashable object as a node. This is the same requirement as for built-in data type set. So you can easily build DAGs from / with virtually any object inside your problem domain.

Using only built-in data types, such a DAG could be stored in Python by a collection, e.g. a set, of nodes and edges. In fact the DAG directly supports importing and exporting from such simple collections. As a consequence there's only a minimal lock-in effect: you can get your data in and out of DAGs in very simple ways.

As the term "acyclic" implies, DAGs may not contain cycles. With the Dag class, the user is responsible to prevent cycles. That is, the algorithms simply assume that there are no cycles and could fail (or hang in an infinte loop) otherwise.

Here's Pyhton code that creates our example DAG by instantiating the Dag class in two different ways:

# Instantiate the example DAG with an adjancency dict.
# Every dict key is a node. The dict values are its parents.
dag = Dag({
    0: set(),
    1: set(),
    'A': {0},
    'B': {1},
    'Q': {'A', 'B'},
    'X': {'Q'},
    'Y': {'Q'},
    })

# Instantiate the example DAG as sets of nodes and edges:
dag = DAG(
    # The set of nodes.
    # Only needs to contain nodes that have no incoming or outgoing edge.
    {0, 1, 'A', 'B', 'Q', 'X', 'Y'},

    # The edges as tuples of form (child, parent).
    {
      ('A', 0),
      ('B', 1),
      ('Q', 'A'),
      ('Q', 'B'),
      ('X', 'Q'),
      ('Y', 'Q'),
    }
)

Note: In the mathematical or computer science literature on graph theory you may find alternative terms for the ones used by this package. Nodes are often called vertices. Parents are also named predecessors and, finally, the terms arc or connection are a synonym for edge.

Features

  • Requires only Python >=3.7. No external dependencies.

  • A DAG is very similar to a standard library set. Whenever reasonably possible, we treat the DAG like a set of its nodes and/or edges. Similarities include:

    • Implements the abc.Collection protocol.

    • Every hashable object can be a node (vertex) of the DAG. This is the same requirement as standard for dict or set.

    • The interface of the DAG, for example, the function naming schema, is very close to that of standard collection set.

    • DAGs can easily be exported to / imported from standard data types like dictionaries, sets, and lists.

  • DAGs can easily be exported to common string representations (DOT format, Markdown / Mermaid, and Make).

  • It's easy to iterate over the DAG's nodes and/or edges.

  • Users are responsible to maintain the graph free from cycles.

  • DAGs can be reversed efficiently.

  • The DAG can be used efficiently as a dependency graph. It's very efficient to find the "roots",that is, the nodes with not parents (predecessors). Dependencies and nodes can be added and removed efficiently.

Development

This section is meant for contributors. Users of the DAG package should ignore this section.

  • Obviously, you need Git and Python >=3.7.

  • Optional: GNU Make >=4.2 to run tasks comfortably. This is most likely pre-installed or installable via your operating system, if you are on a POSIX system, e.g. Linux.

  • Optional: Direnv to automatically setup the Python environment when entering the project directory. Here are the installation instructions.

  • The package contains editor settings for Visual Studio Code, which enable few low-level code format settings matching the coding guidelines. Otherwise the project does not use an autoformatter. Still, it adheres to PEP8 guidelines with a line length of 80 characters.

  • PDM, the Python Development Master to manage dependencies and builds. Dependencies are project settings are stored in top-level file pyproject.toml.

    If you do not already have installed PDM, a very simple and unobtrusive way to get it is via pipx command

    pipx install pdm
    

Coding Conventions and Coding Style

All text files are stored UTF-8 encoded with LF line endings independent of the conventions established by the operating system. Please ensure that your editor is configured properly; for Visual Studio Code these settings are provided.

All public interfaces are annotated with typing hints in order to make usage easier.

The code base adheres to PEP8 with few small deviations:

  • Line length is 80, even for docstrings.

  • Some imports may not be placed on the top of the file, espcially if they are only used in special cases and expensive to import.

  • Some multi-line docstrings may start in a new line after the initial quotes, which can improve readability due to better alignment. Example:

    """"
    This is a multi-line docstring that starts in a new line after the quotes
    
    Another ...
    ...paragraph.
    """
    

Setting-up the Development Environment

If you have direnv installed, consider giving permission to execute the .envrc file provided by the project. Due to this "magic", the development environment, including virtual Python environments, is automatically setup by simply entering the project directory. Depending on your direnv configuration, you may need to repeat this step, for every fresh clone and when the .envrc file is changed (which is rather rarely the case).

direnv allow

In a fresh clone make sure you have new a virtual environment and install the (editable) package and its (development) dependencies:

pdm install

Then you should be able to run all builds and test with a simple

make all

which is roughly equivalent to running individual targets

make lint
make typecheck
make test
make package
make docs

Running make without argument prints help.

Some but not all make targets are also provided as pdm scripts. For example, you can run

pdm lint
pdm test

On Microsoft Windows it's recommended to develop in a POSIX-like environment in order to get theses command line tools running comfortably. For example, the Windows Subsystem for Linux (WSL) or MSYS2 (or the "Git Bash for Windows") offer such an environment.

Development Dependencies

All development dependenceies are installed as project dependencies in group dev. The most important development dependencies are: