Welcome to check_systemd’s documentation!

check_systemd is a Nagios / Icinga monitoring plugin to check systemd. This Python script reports a degraded system to your monitoring solution. It can also be used to monitor individual systemd services (with the -u, --unit parameter) and timer units (with the -t, --dead-timers parameter).

To learn more about the project, please visit the repository on GitHub.

Monitoring scopes

  • units: State of units

  • timers: Timers

  • startup_time: Startup time

  • performance_data: Performance data

Data sources

  • D-Bus (dbus)

  • Command line interface (cli)

This plugin is based on a Python package named nagiosplugin. nagiosplugin has a fine-grained class model to separate concerns. A Nagios / Icinga plugin must perform three steps: data acquisition, evaluation and presentation. nagiosplugin provides one class for each of these steps: Resource, Context and Summary. check_systemd extends these three model classes with the following subclasses:

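  • Acquisition (Resource): UnitsResource, TimersResource, StartupTimeResource, PerformanceDataResource

  • Evaluation (Context): UnitsContext, TimersContext, StartupTimeContext, PerformanceDataContext

  • Presentation (Summary): SystemdSummary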
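The three steps can be illustrated without nagiosplugin at all. The following is only a minimal sketch using the standard library: the Metric dataclass, the three function names and the hard-coded unit data are hypothetical stand-ins, not the classes that nagiosplugin or check_systemd actually provide.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """Hypothetical stand-in for a measured value (not nagiosplugin.Metric)."""

    name: str
    value: str


def acquire() -> list:
    """Step 1, acquisition: collect raw data (hard-coded for this sketch)."""
    return [Metric("nginx.service", "active"), Metric("smartd.service", "failed")]


def evaluate(metric: Metric) -> int:
    """Step 2, evaluation: map one metric to an exit code (0 ok, 2 critical)."""
    return 2 if metric.value == "failed" else 0


def present(metrics: list) -> tuple:
    """Step 3, presentation: build the overall exit code and a status line."""
    failed = [m.name for m in metrics if evaluate(m) == 2]
    if failed:
        return 2, "CRITICAL - failed: " + ", ".join(failed)
    return 0, "OK - all units fine"


exit_code, status_line = present(acquire())
print(exit_code, status_line)
```

Keeping the steps apart like this is what allows each scope (units, timers, startup time, performance data) to bring its own acquisition and evaluation while sharing one presentation layer.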

check_systemd.ActiveState

From the D-Bus interface of systemd documentation:

ActiveState contains a state value that reflects whether the unit is currently active or not. The following states are currently defined:

  • active,

  • reloading,

  • inactive,

  • failed,

  • activating, and

  • deactivating.

active indicates that unit is active (obviously…).

reloading indicates that the unit is active and currently reloading its configuration.

inactive indicates that it is inactive and the previous run was successful or no previous run has taken place yet.

failed indicates that it is inactive and the previous run was not successful (more information about the reason for this is available on the unit type specific interfaces, for example for services in the Result property, see below).

activating indicates that the unit has previously been inactive but is currently in the process of entering an active state.

Conversely deactivating indicates that the unit is currently in the process of deactivation.

alias of Literal[‘active’, ‘reloading’, ‘inactive’, ‘failed’, ‘activating’, ‘deactivating’]
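Because ActiveState is a Literal alias, its allowed values can be read back at runtime with typing.get_args. The snippet below is a sketch that redefines the alias locally instead of importing it from check_systemd; the helper name is_active_state is hypothetical.

```python
from typing import Literal, get_args

# Local copy of the alias documented above, so the example is self-contained.
ActiveState = Literal[
    "active", "reloading", "inactive", "failed", "activating", "deactivating"
]


def is_active_state(value: str) -> bool:
    """Return True if value is one of the six ActiveState strings."""
    return value in get_args(ActiveState)


print(is_active_state("failed"))
print(is_active_state("running"))
```

Note that "running" is rejected here: it is a sub state, not an active state.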

check_systemd.SubState

From the D-Bus interface of systemd documentation:

SubState encodes states of the same state machine that ActiveState covers, but knows more fine-grained states that are unit-type-specific. Where ActiveState only covers six high-level states, SubState covers possibly many more low-level unit-type-specific states that are mapped to the six high-level states. Note that multiple low-level states might map to the same high-level state, but not vice versa. Not all high-level states have low-level counterparts on all unit types.

All sub states are listed in the file basic/unit-def.c of the systemd source code:

  • automount: dead, waiting, running, failed

  • device: dead, tentative, plugged

  • mount: dead, mounting, mounting-done, mounted, remounting, unmounting, remounting-sigterm, remounting-sigkill, unmounting-sigterm, unmounting-sigkill, failed, cleaning

  • path: dead, waiting, running, failed

  • scope: dead, running, abandoned, stop-sigterm, stop-sigkill, failed

  • service: dead, condition, start-pre, start, start-post, running, exited, reload, stop, stop-watchdog, stop-sigterm, stop-sigkill, stop-post, final-watchdog, final-sigterm, final-sigkill, failed, auto-restart, cleaning

  • slice: dead, active

  • socket: dead, start-pre, start-chown, start-post, listening, running, stop-pre, stop-pre-sigterm, stop-pre-sigkill, stop-post, final-sigterm, final-sigkill, failed, cleaning

  • swap: dead, activating, activating-done, active, deactivating, deactivating-sigterm, deactivating-sigkill, failed, cleaning

  • target: dead, active

  • timer: dead, waiting, running, elapsed, failed

alias of Literal[‘abandoned’, ‘activating-done’, ‘activating’, ‘active’, ‘auto-restart’, ‘cleaning’, ‘condition’, ‘deactivating-sigkill’, ‘deactivating-sigterm’, ‘deactivating’, ‘dead’, ‘elapsed’, ‘exited’, ‘failed’, ‘final-sigkill’, ‘final-sigterm’, ‘final-watchdog’, ‘listening’, ‘mounted’, ‘mounting-done’, ‘mounting’, ‘plugged’, ‘reload’, ‘remounting-sigkill’, ‘remounting-sigterm’, ‘remounting’, ‘running’, ‘start-chown’, ‘start-post’, ‘start-pre’, ‘start’, ‘stop-post’, ‘stop-pre-sigkill’, ‘stop-pre-sigterm’, ‘stop-pre’, ‘stop-sigkill’, ‘stop-sigterm’, ‘stop-watchdog’, ‘stop’, ‘tentative’, ‘unmounting-sigkill’, ‘unmounting-sigterm’, ‘unmounting’, ‘waiting’]
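Since sub states are specific to the unit type, a lookup keyed by unit type makes the relationship concrete. This is a sketch only: the SUB_STATES dictionary copies just four entries from the list above, and the helper is_valid_sub_state is hypothetical and not part of check_systemd.

```python
# Excerpt of the per-type sub states listed above (four unit types only).
SUB_STATES = {
    "automount": {"dead", "waiting", "running", "failed"},
    "device": {"dead", "tentative", "plugged"},
    "slice": {"dead", "active"},
    "timer": {"dead", "waiting", "running", "elapsed", "failed"},
}


def is_valid_sub_state(unit_name: str, sub_state: str) -> bool:
    """Check a sub state against the unit type taken from the name suffix."""
    unit_type = unit_name.rsplit(".", 1)[-1]
    # Unit types missing from the excerpt are treated as having no sub states.
    return sub_state in SUB_STATES.get(unit_type, set())


print(is_valid_sub_state("apt-daily.timer", "elapsed"))
print(is_valid_sub_state("system.slice", "elapsed"))
```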

check_systemd.LoadState

src/basic/unit-def.c#L95-L103

From the D-Bus interface of systemd documentation:

LoadState contains a state value that reflects whether the configuration file of this unit has been loaded. The following states are currently defined:

  • loaded,

  • error and

  • masked.

loaded indicates that the configuration was successfully loaded.

error indicates that the configuration failed to load, the LoadError field contains information about the cause of this failure.

masked indicates that the unit is currently masked out (i.e. symlinked to /dev/null or suchlike).

Note that the LoadState is fully orthogonal to the ActiveState (see below) as units without valid loaded configuration might be active (because configuration might have been reloaded at a time where a unit was already active).

alias of Literal[‘stub’, ‘loaded’, ‘not-found’, ‘bad-setting’, ‘error’, ‘merged’, ‘masked’]

class check_systemd.T

Type variable for UnitCache. It cannot be an inner TypeVar because of Pylance.

alias of TypeVar(‘T’)

class check_systemd.Logger[source]

Bases: object

A wrapper around the Python logging module with 3 debug logging levels.

  1. -d: info

  2. -dd: debug

  3. -ddd: verbose

set_level(level: int) None[source]
info(msg: str, *args: object) None[source]

Log on debug level 1: -d

debug(msg: str, *args: object) None[source]

Log on debug level 2: -dd

verbose(msg: str, *args: object) None[source]

Log on debug level 3: -ddd

show_levels() None[source]
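One way to get three debug levels out of the logging module is to register an extra level below DEBUG. The sketch below assumes this approach; the class name SketchLogger, the numeric value 5 for the verbose level, the is_enabled helper and the mapping of -d counts to thresholds are all assumptions and may differ from what check_systemd.Logger does internally.

```python
import logging

# Assumption of this sketch: a custom level below logging.DEBUG (10).
VERBOSE = 5
logging.addLevelName(VERBOSE, "VERBOSE")


class SketchLogger:
    """Hypothetical wrapper with three debug levels: -d, -dd, -ddd."""

    def __init__(self, name: str = "sketch") -> None:
        self._logger = logging.getLogger(name)

    def set_level(self, level: int) -> None:
        """Translate the number of -d flags into a logging threshold."""
        thresholds = {1: logging.INFO, 2: logging.DEBUG, 3: VERBOSE}
        # Any other value (for example 0) hides all three debug levels.
        self._logger.setLevel(thresholds.get(level, logging.WARNING))

    def info(self, msg: str, *args: object) -> None:
        self._logger.info(msg, *args)

    def debug(self, msg: str, *args: object) -> None:
        self._logger.debug(msg, *args)

    def verbose(self, msg: str, *args: object) -> None:
        self._logger.log(VERBOSE, msg, *args)

    def is_enabled(self, level: int) -> bool:
        return self._logger.isEnabledFor(level)


logger = SketchLogger()
logger.set_level(2)  # corresponds to -dd
logger.debug("unit %s has state %s", "nginx.service", "active")
```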
class check_systemd.Source[source]

Bases: object

class BaseUnit[source]

Bases: object

name: str

The name of the system unit, for example nginx.service. In the table printed by the command systemctl list-units, the column containing the unit names is titled “UNIT”.

class Unit(name: str, active_state: object | None = None, sub_state: object | None = None, load_state: object | None = None)[source]

Bases: BaseUnit

This class bundles all state-related information of a systemd unit in an object. This class is inherited by the class DbusUnit, where the attributes are overwritten by properties.

active_state: Literal['active', 'reloading', 'inactive', 'failed', 'activating', 'deactivating']
sub_state: Literal['abandoned', 'activating-done', 'activating', 'active', 'auto-restart', 'cleaning', 'condition', 'deactivating-sigkill', 'deactivating-sigterm', 'deactivating', 'dead', 'elapsed', 'exited', 'failed', 'final-sigkill', 'final-sigterm', 'final-watchdog', 'listening', 'mounted', 'mounting-done', 'mounting', 'plugged', 'reload', 'remounting-sigkill', 'remounting-sigterm', 'remounting', 'running', 'start-chown', 'start-post', 'start-pre', 'start', 'stop-post', 'stop-pre-sigkill', 'stop-pre-sigterm', 'stop-pre', 'stop-sigkill', 'stop-sigterm', 'stop-watchdog', 'stop', 'tentative', 'unmounting-sigkill', 'unmounting-sigterm', 'unmounting', 'waiting']
load_state: Literal['stub', 'loaded', 'not-found', 'bad-setting', 'error', 'merged', 'masked']
convert_to_exitcode() ServiceState[source]

Convert the different systemd states into a Nagios compatible exit code.

Returns:

A Nagios compatible exit code: 0, 1, 2, 3
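The four Nagios exit codes are 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). The following sketch shows one plausible conversion; the rule it applies (a failed active state or a load error is critical, everything else is ok) is an assumption made for illustration and may not cover everything the real method checks.

```python
# Nagios exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def convert_to_exitcode(active_state: str, load_state: str) -> int:
    """Map systemd states to a Nagios exit code (assumed rule, see lead-in)."""
    if load_state == "error" or active_state == "failed":
        return CRITICAL
    return OK


print(convert_to_exitcode("active", "loaded"))
print(convert_to_exitcode("failed", "loaded"))
```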

class Timer(name: str, last: int | None, next: int | None)[source]

Bases: BaseUnit

From the D-Bus documentation:

readonly t NextElapseUSecRealtime = …;
readonly t NextElapseUSecMonotonic = …;
readonly t LastTriggerUSec = …;
readonly t LastTriggerUSecMonotonic = …;

NextElapseUSecRealtime contains the next elapsation point on the CLOCK_REALTIME clock in microseconds since the epoch, or 0 if this timer event does not include at least one calendar event.

Similarly, NextElapseUSecMonotonic contains the next elapsation point on the CLOCK_MONOTONIC clock in microseconds since the epoch, or 0 if this timer event does not include at least one monotonic event.

https://github.com/systemd/systemd/blob/e0270bab43a4c37028ee32ae853037df22999767/src/systemctl/systemctl-list-units.c#L668-L671

TABLE_TIMESTAMP, t->next_elapse,
TABLE_TIMESTAMP_LEFT, t->next_elapse,
TABLE_TIMESTAMP, t->last_trigger.realtime,
TABLE_TIMESTAMP_RELATIVE_MONOTONIC, t->last_trigger.monotonic,

https://github.com/systemd/systemd/blob/e0270bab43a4c37028ee32ae853037df22999767/src/core/dbus-timer.c#L111

SD_BUS_PROPERTY("NextElapseUSecRealtime", "t", bus_property_get_usec, offsetof(Timer, next_elapse_realtime), SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
SD_BUS_PROPERTY("NextElapseUSecMonotonic", "t", property_get_next_elapse_monotonic, 0, SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
BUS_PROPERTY_DUAL_TIMESTAMP("LastTriggerUSec", offsetof(Timer, last_trigger), SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),

name: str

The name of the system unit, for example nginx.service. In the table printed by the command systemctl list-units, the column containing the unit names is titled “UNIT”.

last: int | None

Timestamp

next: int | None

Timestamp
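If a timestamp is a CLOCK_REALTIME value in microseconds since the epoch, as the D-Bus documentation quoted above describes for NextElapseUSecRealtime, it can be turned into a datetime as sketched here. Whether last and next hold exactly such values is an assumption, and the helper name usec_to_datetime is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def usec_to_datetime(usec: Optional[int]) -> Optional[datetime]:
    """Convert microseconds since the epoch to an aware UTC datetime.

    None and 0 both mean that no timestamp is available.
    """
    if not usec:
        return None
    return datetime.fromtimestamp(usec / 1_000_000, tz=timezone.utc)


print(usec_to_datetime(1_700_000_000_000_000))
print(usec_to_datetime(0))
```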

class NameFilter(unit_names: Sequence[str] = ())[source]

Bases: object

This class stores all system unit names (e.g. nginx.service or fstrim.timer) and provides an interface to filter the names by regular expressions.

static match(unit_name: str, regexes: str | Sequence[str]) bool[source]

Match multiple regular expressions against a unit name.

Parameters:
  • unit_name – The unit name to be matched.

  • regexes – A single regular expression (include='.*service') or a list of regular expressions (include=('.*service', '.*mount')).

Returns:

True if one regular expression matches
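The behaviour described above can be sketched with the re module. This version assumes that re.match is used, which anchors the pattern at the start of the unit name; the real implementation may use a different function.

```python
import re
from typing import Sequence, Union


def match(unit_name: str, regexes: Union[str, Sequence[str]]) -> bool:
    """Return True if at least one regular expression matches the unit name."""
    if isinstance(regexes, str):
        regexes = [regexes]
    # Assumption: re.match, so each pattern must match from the start.
    return any(re.match(regex, unit_name) for regex in regexes)


print(match("nginx.service", ".*service"))
print(match("nginx.service", (".*timer", ".*mount")))
```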

add(unit_name: str) None[source]

Add one unit name.

Parameters:

unit_name – The name of the unit, for example apt.timer.

get() set[str][source]

Get all stored unit names.

filter(include: str | Sequence[str] | None = None, exclude: str | Sequence[str] | None = None) Generator[str, None, None][source]

List all unit names or apply filters (include or exclude) to the list of unit names.

Parameters:
  • include – If the unit name matches the provided regular expression, it is included in the list of unit names. A single regular expression (include='.*service') or a list of regular expressions (include=('.*service', '.*mount')).

  • exclude – If the unit name matches the provided regular expression, it is excluded from the list of unit names. A single regular expression (exclude='.*service') or a list of regular expressions (exclude=('.*service', '.*mount')).
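The include and exclude semantics can be sketched as a generator over a set of names. The function filter_names and its helper are hypothetical stand-ins for the method above, and sorting the names is a choice of this sketch so that the output is deterministic.

```python
import re
from typing import Iterable, Iterator, Optional, Sequence, Union

Regexes = Union[str, Sequence[str]]


def _matches(unit_name: str, regexes: Regexes) -> bool:
    """True if any of the regular expressions matches the unit name."""
    if isinstance(regexes, str):
        regexes = [regexes]
    return any(re.match(regex, unit_name) for regex in regexes)


def filter_names(
    unit_names: Iterable[str],
    include: Optional[Regexes] = None,
    exclude: Optional[Regexes] = None,
) -> Iterator[str]:
    """Yield the names that pass the include filter and miss the exclude filter."""
    for name in sorted(unit_names):
        if include and not _matches(name, include):
            continue
        if exclude and _matches(name, exclude):
            continue
        yield name


names = {"nginx.service", "apt.timer", "boot.mount"}
print(list(filter_names(names, include=".*service")))
print(list(filter_names(names, exclude=(".*service", ".*mount"))))
```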

class Cache[source]

Bases: Generic[T]

This class is a container class for systemd units.

add(name: str, unit: T) None[source]
get(name: str | None = None) T | None[source]
filter(include: str | Sequence[str] | None = None, exclude: str | Sequence[str] | None = None) Generator[T, None, None][source]

List all units or apply filters (include or exclude) to the list of units.

Parameters:
  • include – If the unit name matches the provided regular expression, it is included in the list of unit names. A single regular expression (include='.*service') or a list of regular expressions (include=('.*service', '.*mount')).

  • exclude – If the unit name matches the provided regular expression, it is excluded from the list of unit names. A single regular expression (exclude='.*service') or a list of regular expressions (exclude=('.*service', '.*mount')).

property count: int
count_by_states(states: Sequence[str], include: str | Sequence[str] | None = None, exclude: str | Sequence[str] | None = None) dict[str, int][source]
static get_interface_name_from_unit_name(unit_name: str) str[source]
Parameters:

unit_name – for example apt-daily.service

Returns:

org.freedesktop.systemd1.Service

static get_interface_name_from_object_path(object_path: str) str[source]
Parameters:

object_path – for example /org/freedesktop/systemd1/unit/apt_2ddaily_2eservice

Returns:

org.freedesktop.systemd1.Service
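Both conversions can be sketched with a few lines of string handling. The object path in the example escapes "-" as _2d and "." as _2e, so the sketch decodes every "_XX" hexadecimal escape in the last path segment. The function names are hypothetical, and the real static methods may work differently.

```python
import re


def interface_name_from_unit_name(unit_name: str) -> str:
    """apt-daily.service -> org.freedesktop.systemd1.Service"""
    unit_type = unit_name.rsplit(".", 1)[-1]
    return "org.freedesktop.systemd1." + unit_type.capitalize()


def unit_name_from_object_path(object_path: str) -> str:
    """Decode the escaped unit name in the last segment of an object path."""
    escaped = object_path.rsplit("/", 1)[-1]
    # Each "_XX" is the hexadecimal code of one escaped character.
    return re.sub(
        r"_([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), escaped
    )


def interface_name_from_object_path(object_path: str) -> str:
    return interface_name_from_unit_name(unit_name_from_object_path(object_path))


path = "/org/freedesktop/systemd1/unit/apt_2ddaily_2eservice"
print(unit_name_from_object_path(path))
print(interface_name_from_object_path(path))
```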

static is_unit_type(unit_name_or_object_path: str, type_name: Literal['service', 'socket', 'target', 'device', 'mount', 'automount', 'timer', 'swap', 'path', 'slice', 'scope']) bool[source]
set_user(user: bool) None[source]
abstract get_unit(name: str) Unit[source]
property units: Cache[Unit]
abstract property startup_time: float | None
property timers: Cache[Timer]
class check_systemd.CliSource[source]

Bases: Source

class Table(stdout: str)[source]

Bases: object

This class reads the text tables that some systemd commands like systemctl list-units or systemctl list-timers produce.

header_row: str
column_lengths: list[int]
columns: list[str]
body_rows: list[str]
property row_count: int

The number of rows. Only the body rows are counted. The header row is not taken into account.

check_header(column_header: Sequence[str]) None[source]

Check if the specified column names are present in the header row of the text table. Raise an exception if not.

Parameters:

column_header – The expected column headers (for example ('UNIT', 'LOAD', 'ACTIVE'))

get_row(row_number: int) dict[str, str][source]

Retrieve a table row as a dictionary. The keys are taken from the header row. The first row number is 0.

Parameters:

row_number – The index number of the table row starting at 0.

list_rows() Generator[dict[str, str], None, None][source]

List all rows.
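Reading such a fixed-width text table can be sketched as follows: the column boundaries are taken from the header row and every body row is sliced at the same offsets. The sample text is made up, the function parse_table is hypothetical, and lowercasing the keys is a choice of this sketch rather than documented behaviour of the Table class.

```python
import re

# Made-up sample in the style of the systemctl list-units output.
SAMPLE = (
    "UNIT          LOAD   ACTIVE SUB     DESCRIPTION\n"
    "nginx.service loaded active running A web server\n"
    "apt.timer     loaded active waiting Daily apt activities\n"
)


def parse_table(stdout: str) -> list:
    """Turn a fixed-width text table into a list of row dictionaries."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    header_row, body_rows = lines[0], lines[1:]
    # Each column spans its header word plus the padding that follows it.
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+\s*", header_row)]
    rows = []
    for line in body_rows:
        row = {}
        for index, (start, end) in enumerate(spans):
            # The last column runs to the end of the line.
            stop = None if index == len(spans) - 1 else end
            key = header_row[start:end].strip().lower()
            row[key] = line[start:stop].strip()
        rows.append(row)
    return rows


rows = parse_table(SAMPLE)
print(rows[0])
```

Slicing by header offsets keeps multi-word cells such as the description intact, which a plain split on whitespace would not.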

get_unit(name: str) Unit[source]
property startup_time: float | None
class check_systemd.GiSource[source]

Bases: CliSource

Data source via D-Bus using the gi (GObject introspection) package.

TODO Inherit from DataSource if the full D-Bus API is implemented

This class holds the main entry point object of the D-Bus systemd API. See the section The Manager Object in the systemd D-Bus API.

class UnitTuple(name, description, load_state, active_state, sub_state, followed_by, unit_object_path, job_id, job_type, job_object_path)[source]

Bases: NamedTuple

name: str

The primary unit name as string, for example dbus.service

description: str

The human readable description string, for example D-Bus System Message Bus

load_state: Literal['stub', 'loaded', 'not-found', 'bad-setting', 'error', 'merged', 'masked']

The load state (i.e. whether the unit file has been loaded successfully), for example loaded

active_state: Literal['active', 'reloading', 'inactive', 'failed', 'activating', 'deactivating']

The active state (i.e. whether the unit is currently started or not), for example active

sub_state: Literal['abandoned', 'activating-done', 'activating', 'active', 'auto-restart', 'cleaning', 'condition', 'deactivating-sigkill', 'deactivating-sigterm', 'deactivating', 'dead', 'elapsed', 'exited', 'failed', 'final-sigkill', 'final-sigterm', 'final-watchdog', 'listening', 'mounted', 'mounting-done', 'mounting', 'plugged', 'reload', 'remounting-sigkill', 'remounting-sigterm', 'remounting', 'running', 'start-chown', 'start-post', 'start-pre', 'start', 'stop-post', 'stop-pre-sigkill', 'stop-pre-sigterm', 'stop-pre', 'stop-sigkill', 'stop-sigterm', 'stop-watchdog', 'stop', 'tentative', 'unmounting-sigkill', 'unmounting-sigterm', 'unmounting', 'waiting']

The sub state (a more fine-grained version of the active state that is specific to the unit type, which the active state is not), for example running

followed_by: str

A unit that is being followed in its state by this unit, if there is any, otherwise the empty string, for example ''

unit_object_path: str

The unit object path, for example /org/freedesktop/systemd1/unit/dbus_2eservice

job_id: str

If there is a job queued for the job unit, the numeric job id, 0 otherwise, for example 0

job_type: str

The job type as string, for example ''

job_object_path: str

The job object path, for example /
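A NamedTuple gives names to the ten positional values of one unit entry. The sketch below defines a local class with the field names documented above and fills it with the example values from the field descriptions. The class name UnitRow is hypothetical, and job_id is annotated as int here only because the example value is the number 0.

```python
from typing import NamedTuple


class UnitRow(NamedTuple):
    """Local stand-in with the ten fields described above."""

    name: str
    description: str
    load_state: str
    active_state: str
    sub_state: str
    followed_by: str
    unit_object_path: str
    job_id: int
    job_type: str
    job_object_path: str


# Example values taken from the field descriptions above.
raw = (
    "dbus.service",
    "D-Bus System Message Bus",
    "loaded",
    "active",
    "running",
    "",
    "/org/freedesktop/systemd1/unit/dbus_2eservice",
    0,
    "",
    "/",
)

unit = UnitRow._make(raw)
print(unit.name, unit.active_state, unit.sub_state)
```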

class Proxy(object_path: str, interface_name: str, user: bool = False)[source]

Bases: object

get(name: str) Any[source]
property object_path: str
property interface_name: str
class ManagerProxy(user: bool = False)[source]

Bases: Proxy

property default_target: str
property userspace_timestamp_monotonic: int
get_object_path(name: str) str[source]
property units: list[UnitTuple]
class UnitProxy(name: str | None = None, object_path: str | None = None, user: bool = False)[source]

Bases: Proxy

property active_state: str
property sub_state: str
property load_state: str
property active_enter_timestamp_monotonic: int
class TimerProxy(name: str | None = None, object_path: str | None = None, user: bool = False)[source]

Bases: UnitProxy

property last: int

Timestamp in microseconds

property next: int

Timestamp in microseconds

classmethod get_manager(user: bool = False) ManagerProxy[source]
property manager: ManagerProxy
property startup_time: float | None

src/analyze/analyze-time-data.c <https://github.com/systemd/systemd/blob/1f901c24530fb9b111126381a6ea101af8040e65/src/analyze/analyze-time-data.c#L141-L197>

class check_systemd.OptionContainer[source]

Bases: object

This class has the same attributes as the Namespace instance returned by the argparse package.

verbose: int
debug: int
ignore_inactive_state: bool
include_unit: str | None
include_type: list[str]
exclude_unit: list[str]
exclude_type: list[str]
expected_state: str | None
scope_timers: bool
timers_warning: int
timers_critical: int
scope_startup_time: bool
warning: int

-w, --warning

critical: int

-c, --critical

user: bool

--user

performance_data: bool
include: list[str]
exclude: list[str]
data_source: Literal['dbus', 'cli'] | None
check_systemd.opts

We make this variable global to be able to access the command line arguments everywhere in the plugin. The result of parse_args() is stored in this variable. It is an instance of the argparse.Namespace class. This variable is initialized in the main function. The variable is intentionally not named args to avoid confusion with *args (Non-Keyword Arguments).

exception check_systemd.CheckSystemdError[source]

Bases: Exception

Base class for exceptions in this module. All exceptions are caught by the decorator @nagiosplugin.guarded() on the main function and printed out nicely.

exception check_systemd.CheckSystemdRegexpError[source]

Bases: CheckSystemdError

Raised when an invalid regular expression is specified.
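A small exception hierarchy like this lets callers catch every plugin error through the base class. The sketch below uses hypothetical class names and a hypothetical compile_regexp helper to show how an invalid pattern reported by the re module could be re-raised as a more specific error.

```python
import re


class SketchError(Exception):
    """Hypothetical base class, standing in for CheckSystemdError."""


class SketchRegexpError(SketchError):
    """Hypothetical subclass, standing in for CheckSystemdRegexpError."""


def compile_regexp(pattern: str):
    """Compile a pattern and translate re.error into the plugin's own error."""
    try:
        return re.compile(pattern)
    except re.error as error:
        raise SketchRegexpError(
            f"Invalid regular expression: '{pattern}'"
        ) from error


try:
    compile_regexp("*service")  # "*" has nothing to repeat
except SketchError as error:
    print(error)
```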

class check_systemd.SystemdUnitTypesList(*args: str)[source]

Bases: MutableSequence[str]

unit_types: list[str]
insert(index: int, value: str) None[source]

S.insert(index, value) – insert value before index

convert_to_regexp()[source]
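Subclassing MutableSequence requires only five methods (__getitem__, __setitem__, __delitem__, __len__ and insert); append, extend and the rest come for free. The sketch below shows this with a hypothetical class UnitTypes. The pattern produced by convert_to_regexp is an assumption about what such a method might return, not the documented result.

```python
import re
from collections.abc import MutableSequence


class UnitTypes(MutableSequence):
    """Hypothetical list of unit types such as 'service' or 'timer'."""

    def __init__(self, *args: str) -> None:
        self.unit_types = list(args)

    def __getitem__(self, index):
        return self.unit_types[index]

    def __setitem__(self, index, value) -> None:
        self.unit_types[index] = value

    def __delitem__(self, index) -> None:
        del self.unit_types[index]

    def __len__(self) -> int:
        return len(self.unit_types)

    def insert(self, index: int, value: str) -> None:
        self.unit_types.insert(index, value)

    def convert_to_regexp(self) -> str:
        """Assumed format: match any unit name ending in one of the types."""
        alternatives = "|".join(re.escape(t) for t in self.unit_types)
        return r".*\.(" + alternatives + ")$"


types = UnitTypes("service", "timer")
types.append("mount")  # provided by MutableSequence on top of insert()
pattern = types.convert_to_regexp()
print(pattern)
```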
class check_systemd.UnitsResource(units: Cache[Unit])[source]

Bases: Resource

units: Cache[Unit]
probe() Generator[Metric, None, None][source]

Query system state and return metrics.

This is the only method called by the check controller. It should trigger all necessary actions and create metrics.

Returns:

list of Metric objects, or generator that emits Metric objects, or single Metric object

class check_systemd.UnitsContext[source]

Bases: Context

evaluate(metric: Metric, resource: Resource) Result[source]

Determines state of a given metric.

Parameters:
  • metric – associated metric that is to be evaluated

  • resource – resource that produced the associated metric (may optionally be consulted)

Returns:

Result

class check_systemd.TimersResource(source: Source)[source]

Bases: Resource

Resource that calls systemctl list-timers --all on the command line to get information about dead / inactive timers. There is one type of systemd “degradation” which is normally not detected: dead / inactive timers.

Parameters:

excludes (list) – A list of systemd unit names to exclude from the checks.

name
source: Source
probe() Generator[Metric, None, None][source]

Query system state and return metrics.

This is the only method called by the check controller. It should trigger all necessary actions and create metrics.

Returns:

list of Metric objects, or generator that emits Metric objects, or single Metric object

class check_systemd.TimersContext[source]

Bases: Context

evaluate(metric: Metric, resource: Resource)[source]

Determines state of a given metric.

Parameters:
  • metric – associated metric that is to be evaluated

  • resource – resource that produced the associated metric (may optionally be consulted)

Returns:

Result

class check_systemd.StartupTimeResource(source: Source)[source]

Bases: Resource

Resource that calls systemd-analyze on the command line to get information about the startup time.

src/analyze/analyze-time-data.c

probe() Generator[Metric, None, None][source]

Query system state and return metrics.

This is the only method called by the check controller. It should trigger all necessary actions and create metrics.

Returns:

list of Metric objects, or generator that emits Metric objects, or single Metric object

class check_systemd.StartupTimeContext[source]

Bases: ScalarContext

performance(metric: Metric, resource: Resource) Performance | None[source]

Derives performance data.

The metric’s attributes are combined with the local warning and critical ranges to get a fully populated Performance object.

Parameters:
  • metric – metric from which performance data are derived

  • resource – not used

Returns:

Performance object
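In the plugin output, performance data follows the Nagios convention label=value[UOM];warn;crit. The sketch below formats such a string by hand to show how a value and its thresholds are combined; in the plugin itself this is the job of the Performance object from nagiosplugin. The function name format_perfdata and the numbers in the example are illustrative assumptions.

```python
from typing import Optional


def format_perfdata(
    label: str,
    value: float,
    uom: str = "",
    warning: Optional[float] = None,
    critical: Optional[float] = None,
) -> str:
    """Build 'label=value[UOM];warn;crit' and drop empty trailing fields."""

    def fmt(number: Optional[float]) -> str:
        return "" if number is None else str(number)

    text = f"{label}={value}{uom};{fmt(warning)};{fmt(critical)}"
    return text.rstrip(";")


print(format_perfdata("startup_time", 12.345, "s", 60, 120))
print(format_perfdata("count_units", 386))
```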

class check_systemd.PerformanceDataResource(units: Cache[Unit])[source]

Bases: Resource

units: Cache[Unit]
probe() Generator[Metric, None, None][source]

Query system state and return metrics.

This is the only method called by the check controller. It should trigger all necessary actions and create metrics.

Returns:

list of Metric objects, or generator that emits Metric objects, or single Metric object

class check_systemd.PerformanceDataContext[source]

Bases: Context

performance(metric: Metric, resource: Resource) Performance[source]

Derives performance data from a given metric.

Parameters:
  • metric – associated metric from which performance data are derived

  • resource – resource that produced the associated metric (may optionally be consulted)

Returns:

Perfdata object

class check_systemd.SystemdSummary[source]

Bases: Summary

Format the different status lines. A subclass of nagiosplugin.Summary.

ok(results: Results) str[source]

Formats status line when overall state is ok.

Parameters:

results – Results container

Returns:

status line

problem(results: Results) str[source]

Formats status line when overall state is not ok.

Parameters:

results – Results container

Returns:

status line

verbose(results: Results) list[str][source]

Provides extra lines if verbose plugin execution is requested.

Parameters:

results – Results container

Returns:

list of strings

check_systemd.convert_to_regexp_list(regexp: Sequence[str] | None = None, unit_names: str | Sequence[str] | None = None, unit_types: Sequence[str] | None = None) set[str][source]
check_systemd.get_argparser() ArgumentParser[source]
check_systemd.normalize_argparser(opts: Namespace) OptionContainer[source]
check_systemd.main() None[source]

The main entry point of the monitoring plugin. First the command line arguments are read into the variable opts. The configuration of this opts object decides which instances of the Resource, Context and Summary subclasses are assembled in a list called tasks. This list is passed to the main class of the nagiosplugin library: the Check class.
