| .vscode | ||
| bin/make | ||
| docs | ||
| src/mike_dag | ||
| tests | ||
| .envrc | ||
| .flake8 | ||
| .gitignore | ||
| CHANGELOG.md | ||
| Makefile | ||
| mkdocs.yml | ||
| pdm.lock | ||
| pyproject.toml | ||
| README.md | ||
Mike's DAG
A simple collection that implements a directed acyclic graph (DAG) in pure Python. Most suitable as a dependency graph.
Since trees are a special kind of DAG, this package can also be used to represent trees.
Introduction
The Dag class represent a directed acyclic graph (DAG), just
like this one:
graph BT
0
1
A --> 0
B --> 1
Q --> A
Q --> B
X --> Q
Y --> Q
A DAG consists of nodes and edges. In our example the nodes are 0, 1,
2, A, B, Q, X, and Y. The edges are directed and, thus,
represented by the arrows between the nodes. They point from child to parent.
In our example the nodes A and B are parents of node Q.
We call nodes 0 and 1 roots of our DAG, because they do not have any
parents. Similarly X and Y are leaves, because they have no children. Very
often you'll want to start iterating from the roots or leaves: because with the
Dag class it's simple and efficient to follow the edges in both directions.
Probably you are familiar with both terms from the context of tree data
structures. Good to know: every tree is a DAG; DAGs are a generalization of
trees. As a consequence, you could use the Dag class to handle trees.
What nodes and edges mean or represent is totally up to you. For example, if a DAG represents the dependencies between Python packages, then its nodes are packages and an arrow points to its requirements. In this example the DAG could be used to efficiently find all sub-, sub-, etc. requirements of a package recursively.
With the Dag class you can use any hashable object as a node. This is the same
requirement as for built-in data type set. So you can easily build DAGs from /
with virtually any object inside your problem domain.
Using only built-in data types, such a DAG could be stored in Python by a
collection, e.g. a set, of nodes and edges. In fact the DAG directly
supports importing and exporting from such simple collections. As a consequence
there's only a minimal lock-in effect: you can get your data in and out of DAGs
in very simple ways.
As the term "acyclic" implies, DAGs may not contain cycles. With the Dag
class, the user is responsible to prevent cycles. That is, the algorithms simply
assume that there are no cycles and could fail (or hang in an infinte loop)
otherwise.
Here's Pyhton code that creates our example DAG by instantiating the Dag class
in two different ways:
# Instantiate the example DAG with an adjancency dict.
# Every dict key is a node. The dict values are its parents.
dag = Dag({
0: set(),
1: set(),
'A': {0},
'B': {1},
'Q': {'A', 'B'},
'X': {'Q'},
'Y': {'Q'},
})
# Instantiate the example DAG as sets of nodes and edges:
dag = DAG(
# The set of nodes.
# Only needs to contain nodes that have no incoming or outgoing edge.
{0, 1, 'A', 'B', 'Q', 'X', 'Y'},
# The edges as tuples of form (child, parent).
{
('A', 0),
('B', 1),
('Q', 'A'),
('Q', 'B'),
('X', 'Q'),
('Y', 'Q'),
}
)
Note: In the mathematical or computer science literature on graph theory you may find alternative terms for the ones used by this package. Nodes are often called vertices. Parents are also named predecessors and, finally, the terms arc or connection are a synonym for edge.
Features
-
Requires only Python >=3.7. No external dependencies.
-
A DAG is very similar to a standard library set. Whenever reasonably possible, we treat the DAG like a set of its nodes and/or edges. Similarities include:
-
Implements the
abc.Collectionprotocol. -
Every hashable object can be a node (vertex) of the DAG. This is the same requirement as standard for
dictorset. -
The interface of the DAG, for example, the function naming schema, is very close to that of standard collection
set. -
DAGs can easily be exported to / imported from standard data types like dictionaries, sets, and lists.
-
-
DAGs can easily be exported to common string representations (DOT format, Markdown / Mermaid, and Make).
-
It's easy to iterate over the DAG's nodes and/or edges.
-
Users are responsible to maintain the graph free from cycles.
-
DAGs can be reversed efficiently.
-
The DAG can be used efficiently as a dependency graph. It's very efficient to find the "roots",that is, the nodes with not parents (predecessors). Dependencies and nodes can be added and removed efficiently.
Development
This section is meant for contributors. Users of the DAG package should ignore this section.
Recommended Development Tooling
-
Optional: GNU Make >=4.2 to run tasks comfortably. This is most likely pre-installed or installable via your operating system, if you are on a POSIX system, e.g. Linux.
-
Optional: Direnv to automatically setup the Python environment when entering the project directory. Here are the installation instructions.
-
The package contains editor settings for Visual Studio Code, which enable few low-level code format settings matching the coding guidelines. Otherwise the project does not use an autoformatter. Still, it adheres to PEP8 guidelines with a line length of 80 characters.
-
PDM, the Python Development Master to manage dependencies and builds. Dependencies are project settings are stored in top-level file
pyproject.toml.If you do not already have installed PDM, a very simple and unobtrusive way to get it is via
pipxcommandpipx install pdm
Coding Conventions and Coding Style
All text files are stored UTF-8 encoded with LF line endings independent of the conventions established by the operating system. Please ensure that your editor is configured properly; for Visual Studio Code these settings are provided.
All public interfaces are annotated with typing hints in order to make usage easier.
The code base adheres to PEP8 with few small deviations:
-
Line length is 80, even for docstrings.
-
Some imports may not be placed on the top of the file, espcially if they are only used in special cases and expensive to import.
-
Some multi-line docstrings may start in a new line after the initial quotes, which can improve readability due to better alignment. Example:
"""" This is a multi-line docstring that starts in a new line after the quotes Another ... ...paragraph. """
Setting-up the Development Environment
If you have direnv installed, consider giving permission to execute the
.envrc file provided by the project. Due to this "magic", the development
environment, including virtual Python environments, is automatically setup by
simply entering the project directory. Depending on your direnv configuration,
you may need to repeat this step, for every fresh clone and when the .envrc
file is changed (which is rather rarely the case).
direnv allow
In a fresh clone make sure you have new a virtual environment and install the (editable) package and its (development) dependencies:
pdm install
Then you should be able to run all builds and test with a simple
make all
which is roughly equivalent to running individual targets
make lint
make typecheck
make test
make package
make docs
Running make without argument prints help.
Some but not all make targets are also provided as pdm scripts. For example,
you can run
pdm lint
pdm test
On Microsoft Windows it's recommended to develop in a POSIX-like environment in order to get theses command line tools running comfortably. For example, the Windows Subsystem for Linux (WSL) or MSYS2 (or the "Git Bash for Windows") offer such an environment.
Development Dependencies
All development dependenceies are installed as project dependencies in group
dev. The most important development dependencies are: