Building a dynamic Ansible inventory from Proxmox
Written by Emmanuel BENOÎT - Created on 2022-08-07
I am in the process of slowly migrating the VMs on my home servers from my old, custom configuration management scripts to a set of Ansible scripts. However, because I'm quite lazy, I don't want to have to update the playbook's inventories manually if I can avoid it. Ideally, I should only have to write a service's actual configuration, then run the playbook. The appropriate VMs should be selected automatically from the Proxmox cluster and used as the targets.
TL;DR version: trying to do as much of this as possible using only Ansible and YAML files results in a highly convoluted mess.
Target inventory structure
As far as inventory groups go, I want to have a somewhat nested hierarchy. First
and foremost, all VMs that are managed by Ansible will be found under the
managed group, which is organized with the following subgroups.
- by_network - a group that contains sub-groups corresponding to the various
  VLANs.
- by_environment - a group that contains sub-groups for the various
  environments I have (here at home that's dev and prod).
- by_failover_stack - VMs which are part of services that support HA will be
  added to sub-groups named fostack_{X} (where X is a number); other services
  will be added to the no_failover sub-group.
- svc_{service} - these groups should be generated automatically based on the
  VM's metadata. They correspond to all VMs implementing a specific service.
  They contain two types of sub-groups:
  - svin_{service}_{instance} - an instance of the service.
  - svcm_{service}_{component} - a component of the service. This will be used
    if a service requires multiple VMs running different parts of an
    application (e.g. database vs webserver). Components may be nested into
    other components with a maximal depth of 2.
For example, a pair of clusters implementing LDAP services (one for testing
purposes, one for actual use) might be organized as shown below:
managed
|- by_network
| |- net_dev -> [vm0, vm1, vm2, vm3, vm4]
| |- net_infra -> [vm5, vm6, vm7, vm8, vm9]
|- by_environment
| |- env_dev -> [vm0, vm1, vm2, vm3, vm4]
| |- env_prod -> [vm5, vm6, vm7, vm8, vm9]
|- by_failover_stack
| |- fostack_1 -> [vm0, vm2, vm5, vm7]
| |- fostack_2 -> [vm1, vm3, vm6, vm8]
| |- no_failover -> [vm4, vm9]
|- svc_ldap
| |- svin_ldap_dev -> [vm0, vm1, vm2, vm3, vm4]
| |- svin_ldap_prod -> [vm5, vm6, vm7, vm8, vm9]
| |- svcm_ldap_front -> [vm0, vm1, vm5, vm6]
| |- svcm_ldap_ldap
| | |- svcm_ldap_roldap -> [vm2, vm3, vm7, vm8]
| | |- svcm_ldap_rwldap -> [vm4, vm9]
Problems
There are two main problems that need to be solved here.
First, Proxmox has very limited support for VM metadata. It supports "tags",
which are a list of words with rather strong constraints on the allowed
characters; as far as I know, they can only be set or read through the API or
the command line. These would be insufficient for what I need to do anyway. In
addition, while I have no intention of managing my metadata manually, not
having it visible in the GUI at all is a bit of a pain.
On the Ansible side, while there is a community plugin which can read inventory
from Proxmox, and a core plugin that can construct groups from various
variables, the latter cannot construct empty groups (e.g. the svc_ldap group in
the example above), and there are limitations in the way facts can be generated
by both plugins - chiefly, none of the facts being generated may refer to
another such fact.
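A minimal sketch of that second limitation, using made-up variable names - a
composed fact cannot rely on another fact composed in the same plugin file,
which is why the implementation below is spread across several numbered
inventory files, each consuming the facts created by the previous ones:
compose:
  # Fine: only uses facts that already exist on the host.
  my_network_tag: proxmox__net0.tag | default("")
  # Does not work as intended: my_network_tag is being composed in this very
  # file, so it is not visible to this second expression yet.
  my_network_group: >-
    "net_" ~ my_network_tag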
Implementation
In the following sections, I will implement an inventory which generates the expected structure from the Proxmox inventory. I have created a repository on GitHub which contains the example, with each commit in the repository corresponding to the steps below.
Static structure
The first file of the inventory should generate the static parts of the
structure. This is done using a simple inventory file; that file must be read
before the rest, so we will name it 00-static-structure.yml.
all:
  children:
    managed:
      children:
        by_environment:
        by_failover_stack:
          children:
            no_failover:
        by_network:
Testing at this point using ansible-inventory --playbook-dir . --graph shows
the groups above plus the additional ungrouped group.
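The output should look roughly like this:
@all:
  |--@managed:
  |  |--@by_environment:
  |  |--@by_failover_stack:
  |  |  |--@no_failover:
  |  |--@by_network:
  |--@ungrouped: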
Fetching inventory from Proxmox
Now we need to fetch the list of VMs and their associated metadata from the
Proxmox cluster using the community.general.proxmox
plugin. We need Ansible to
run it right after loading the static groups, so its name will start with 01-
.
In addition, the plugin requires the name of the file to end with
.proxmox.yml
.
We will configure the plugin to fetch all facts and write them to variables with
the proxmox__
prefix. Similarly, groups generated by the plugin will use the
same prefix.
plugin: community.general.proxmox
url: https://proxmox.example.org:8006
validate_certs: false
user: test@pve
password: ...
want_facts: true
facts_prefix: proxmox__
group_prefix: proxmox__
want_proxmox_nodes_ansible_host: false
If the Ansible configuration is used to restrict the list of plugins that can parse the inventory, it should be modified as well:
[inventory]
enable_plugins = community.general.proxmox, yaml
And it might be necessary to install the requests Python module (pip install
requests in the same venv as Ansible should work).
Once this is done, and assuming the url, user and password are configured
appropriately, ansible-inventory should display both the static structure from
the section above and the VMs and groups that were fetched from the Proxmox
cluster:
@all:
  |--@managed:
  |  |--@by_environment:
  |  |--@by_failover_stack:
  |  |  |--@no_failover:
  |  |--@by_network:
  |--@proxmox__all_lxc:
  |--@proxmox__all_qemu:
  |  |--vm1
  |  |--vm2
  |  |--vm3
  |  ...
  |--@ungrouped:
In addition, using ansible-inventory --host to display the facts for a VM
should show a bunch of entries that correspond to the VM's settings:
{
    "proxmox__agent": "1",
    "proxmox__boot": {
        "order": "ide2;scsi0"
    },
    "proxmox__cores": 4,
    "proxmox__cpu": "kvm64",
    "proxmox__description": "something",
    // ...
    "proxmox__net0": {
        "bridge": "vmbr0",
        "firewall": "1",
        "tag": "16",
        "virtio": "12:23:34:45:56:67"
    },
    // ...
}
Storing metadata on the Proxmox cluster
As I mentioned in the introduction, VM tags are not sufficient for what we need
to do. However, each VM can have an arbitrary Markdown text associated with it.
This text can be seen in the "Notes" part of the Proxmox GUI.
One solution to the problem of storing arbitrary metadata would be to store it
as JSON directly in the notes. It can then be read from the
proxmox__description variable.
However, this approach is insufficient in two ways. First, the JSON itself is quite unreadable on the GUI, which renders the idea of having it visible there moot. Second, it makes adding human-readable notes impossible.
In order to solve that, we could surround the section of the notes that
contains the JSON with the Markdown code block marker. This is still not
enough, as it would prevent the notes from containing any other code block.
Instead, I chose to use the following structure:
(arbitrary Markdown here)
```ansible
{
"service": "ldap",
"instance": "dev",
"component": "ldap",
"subcomponent": "roldap",
"fostack": 1
}
```
(more Markdown here because why not)
Because of the ansible marker, it is possible to split the description right at
the start of the block, then use the unmodified marker to remove the end of the
description. The resulting string can then be parsed as JSON.
This can be achieved by adding a compose section to the Proxmox plugin
configuration.
compose:
  inv__data: >-
    ( ( proxmox__description | split( '```ansible' ) )[1]
      | split( '```' ) )[0]
      | from_json
When the notes contain a block that follows the right format, the plugin will
create an inv__data variable which will contain the parsed data. If the format
is incorrect, or if there is no description, or if the block contains invalid
JSON, the variable will simply not be defined (this is due to the Proxmox
inventory plugin's strict option defaulting to false).
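If the variable does not show up and the reason isn't obvious, the plugin's
strict option can be flipped temporarily in the Proxmox plugin's configuration
file, so that composition errors are reported instead of being silently
swallowed:
# Temporary, for debugging the notes format only; revert to the default
# (false) afterwards, since any VM without a valid metadata block would
# otherwise abort inventory parsing.
strict: true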
It is possible to use the ansible-inventory command to check for the variable
after having added a test block to one of the VMs:
{
    "inv__data": {
        "component": "ldap",
        "fostack": 1,
        "instance": "dev",
        "service": "ldap",
        "subcomponent": "roldap"
    },
    "proxmox__agent": "1",
    // ...
}
Computing facts
We now need to deduce a few things from the various data we gathered.
Copying metadata to top-level variables
Because the inv__data variable might be undefined, we will copy some of its
contents to separate variables to avoid having to write (inv__data|default({}))
for all accesses. This will be done in the 02-copy-metadata.yml inventory file,
using the constructed plugin. Since it is working in non-strict mode, the
various variables will not be generated if inv__data doesn't exist.
plugin: constructed
strict: false
compose:
  inv__component: inv__data.component
  inv__fostack: inv__data.fostack
  inv__instance: inv__data.instance
  inv__service: inv__data.service
  inv__subcomponent: inv__data.subcomponent
It will be necessary to enable the constructed plugin in the Ansible
configuration for this to work:
[inventory]
enable_plugins = constructed, community.general.proxmox, yaml
Should this VM be managed?
The next file, 03-check-managed.yml, will create an _inv__managed variable if
the metadata includes a service name and an instance name, and if the first
network interface is connected. When it is defined, this variable will always
contain an empty string. This allows us to use it while defining other
variables or groups. If it exists, adding its contents to some variable will
have no effect. If it doesn't, Jinja evaluation will fail, causing group or
variable creation to be skipped.
In order to do this, we need to use Jinja conditionals in addition to
expressions. The constructed plugin's compose block normally doesn't allow
that, but it is possible to do it anyway. In fact, Ansible simply prefixes the
expression with {{ and suffixes it with }}, so it is possible to terminate
these expressions and add conditionals.
Here is the 03-check-managed.yml file, which implements that.
plugin: constructed
strict: false
compose:
  _inv__managed: >-
    ( inv__instance and inv__service ) | ternary( '' , '' )
    }}{% if proxmox__net0.link_down | default("0") == "1"
    %}{{ this_variable_does_not_exist_and_so_inv_managed_will_not_be_created
    }}{% endif
    %}{{ ''
The first line of the definition relies on the fact that trying to use
inv__instance or inv__service in an expression will cause the variable to be
skipped if either of them is missing.
The second line exits the expression so a conditional can be used. It is however necessary to re-enter an expression and provide something valid, which is done by the last line.
Finally, the very long variable name in the expression references an undefined
variable, and will only be evaluated if the condition is met, causing the
definition of _inv__managed to be skipped.
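Since Ansible only wraps the configured value in an expression, the template
that actually gets evaluated for _inv__managed ends up being:
{{ ( inv__instance and inv__service ) | ternary( '' , '' )
}}{% if proxmox__net0.link_down | default("0") == "1"
%}{{ this_variable_does_not_exist_and_so_inv_managed_will_not_be_created
}}{% endif
%}{{ '' }}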
Basic groups
At this point, we are about ready to start adding our VMs to the network,
environment and failover stack groups. We will create a new inventory file
called 04-env-fo-net-groups.yml to handle that using the constructed plugin.
First, we will use a lookup table to determine which network the VM is on based
on its net0 interface's VLAN tag:
plugin: constructed
strict: false
compose:
  inv__network: >
    {
      "30": "infra",
      "31": "dmz",
      "32": "pubapps",
      "33": "intapps",
      "34": "users",
      "60": "dev",
    }[ proxmox__net0.tag | default("") ]
    | default( "unknown" )
    ~ _inv__managed
The last line uses the _inv__managed variable to prevent inv__network from
being defined if the VM should not be managed. Since _inv__managed normally
contains an empty string, using it has no other effect.
At that point, we can create the network-based group:
keyed_groups:
  - prefix: net
    key: inv__network
    parent_group: by_network
The environment can be computed by checking for an environment field in the
original metadata. Failing that, the VM will be assigned to the prod
environment if its instance name is prod, or to the dev environment if it
isn't. We also need to reference _inv__managed to prevent unmanaged VMs from
being added to the group.
compose:
  # ...
  inv__environment: >-
    inv__data.environment
    | default(
        ( inv__instance == "prod" ) | ternary( "prod", "dev" )
    )
    ~ _inv__managed
keyed_groups:
  # ...
  - prefix: env
    key: inv__environment
    parent_group: by_environment
The last basic group to generate is based on the HA stack the VM is a part of,
if any. Note the default("") used in the ternary to prevent it from referencing
an undefined variable.
compose:
  # ...
  _inv__fostack_group: >-
    ( inv__fostack is defined )
    | ternary(
        "fostack_" ~ inv__fostack | default("") ,
        "no_failover"
    )
    ~ _inv__managed
keyed_groups:
  # ...
  - prefix: ''
    key: _inv__fostack_group
    parent_group: by_failover_stack
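Running ansible-inventory --playbook-dir . --graph against the example cluster
from the beginning of this post should now produce something along these lines
(abbreviated, and assuming every VM carries the appropriate metadata block):
@all:
  |--@managed:
  |  |--@by_environment:
  |  |  |--@env_dev:
  |  |  |  |--vm0
  |  |  |  ...
  |  |  |--@env_prod:
  |  |  |  ...
  |  |--@by_failover_stack:
  |  |  |--@fostack_1:
  |  |  |  ...
  |  |  |--@fostack_2:
  |  |  |  ...
  |  |  |--@no_failover:
  |  |  |  ...
  |  |--@by_network:
  |  |  |--@net_dev:
  |  |  |  ...
  |  |  |--@net_infra:
  |  |  |  ...
  |--@proxmox__all_qemu:
  |  ...
  |--@ungrouped: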
Generating empty intermediary groups
At this point we need to start working on creating the intermediary groups for the service itself and for its components if that feature is being used.
The main problem is that these groups must be created empty - we don't want our VMs to be added directly to them as it would cause variable precedence problems when we try to use them for actual configuration.
Sadly, neither the constructed plugin, which we used above, nor the generator
plugin (documented here) can be used to generate the empty groups we need, as
both always add a host to the groups that are created. In addition, generator
doesn't process the layer names through Jinja. We need to write a custom plugin
to generate the groups we need.
Empty group generator
What we need is a relatively simple inventory plugin that will generate groups
with templated names and templated parents. It could be configured using a list
of groups, each described by a dictionary with a name entry containing a Jinja
template, and a parents key containing a list of Jinja templates (one for each
parent group). Each individual template could return an empty string; in the
name part it would cause the group to be skipped, and in the parents list it
would simply be ignored.
We will not cover the actual writing of the plugin here, but we will comment on
some parts of the code. It can be found in the repository's
inventory_plugins/group_creator.py file.
It starts with the "documentation", which Ansible uses to validate the plugin's
configuration data and to set defaults for the various options.
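The actual block is in the repository file; a rough sketch of its shape, with
the options matching the design above (wording and defaults here are merely
illustrative), could look like this:
# Illustrative sketch only - see inventory_plugins/group_creator.py for the
# real documentation block.
DOCUMENTATION = """
    name: group_creator
    short_description: Creates (possibly empty) groups from Jinja templates
    options:
        plugin:
            description: Token that ensures this is a configuration file for this plugin.
            required: true
            choices: [ group_creator ]
        groups:
            description: Groups to generate; each entry has a name template and a list of parents templates.
            type: list
            default: []
        strict:
            description: If true, invalid templates cause errors instead of being ignored.
            type: bool
            default: false
"""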
Following that, we define the plugin's class. Its main method, parse(), is
shown below:
def parse(self, inventory, loader, path, cache=False):
    super(InventoryModule, self).parse(inventory, loader, path, cache=cache)
    self._read_config_data(path)
    strict = self.get_option("strict")
    for host in inventory.hosts:
        host_vars = self.inventory.get_host(host).get_vars()
        for group in self.get_option("groups"):
            name = self._get_group_name(host, group['name'], host_vars, strict)
            if not name:
                continue
            self.inventory.add_group(name)
            for ptmpl in group.get("parents"):
                parent = self._get_group_name(host, ptmpl, host_vars, strict)
                if parent:
                    self.inventory.add_group(parent)
                    self.inventory.add_child(parent, name)
It goes through all known inventory hosts, and tries to generate groups based
on each of these hosts' facts. It then computes the parent groups' names,
ensures the parent groups exist, and registers the new group as a child. The
_get_group_name method simply applies the templates, either returning an empty
string or raising an exception if a problem occurs, depending on the strict
option's value.
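The real helper is in the repository as well; a minimal sketch of its shape,
assuming the plugin reuses the Templar instance that BaseInventoryPlugin sets
up as self.templar during parse(), could look like this:
    # Illustrative sketch only - AnsibleParserError comes from ansible.errors.
    def _get_group_name(self, host, template, host_vars, strict):
        try:
            # Render the group name template against this host's variables.
            self.templar.available_variables = host_vars
            return str(self.templar.template(template)).strip()
        except Exception as exc:
            if strict:
                raise AnsibleParserError(
                    "could not compute group name for host %s: %s" % (host, exc)
                )
            # In non-strict mode, an empty name simply means "skip this group".
            return ""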
The plugin must be added to the enabled plugins in the Ansible configuration as well.
[inventory]
enable_plugins = constructed, community.general.proxmox, group_creator, yaml
Note: at this point, testing with ansible-inventory really requires the
--playbook-dir . option, as the tool will not find the plugin otherwise.
Creating the groups
We can create a new file in the inventory for the intermediary groups' creation. For all Ansible-managed VMs, we must ensure that the service group exists. We must also create component groups if components are defined, and sub-component groups if both components and sub-components are defined.
Creating the service group is pretty straightforward:
plugin: group_creator
strict: true
groups:
  - name: >-
      {{ 'svc_' ~ inv__service ~ _inv__managed }}
    parents:
      - managed
Component groups are about as straightforward. If no components are defined for
the current service, groups will not be created as the inv__component variable
will fail to evaluate.
  - name: >-
      {{
        'svcm_' ~ inv__service
        ~ '_' ~ inv__component
        ~ _inv__managed
      }}
    parents:
      - 'svc_{{ inv__service }}'
Finally, if sub-components are in use, their groups must also be created. Doing it at this point will free us from having to specify the correct parent groups in the next step. Sub-component groups should be created if the VM is managed and has both a component and sub-component.
  - name: >-
      {{
        'svcm_' ~ inv__service
        ~ '_' ~ inv__subcomponent
        ~ _inv__managed
        ~ ( inv__component | ternary('','') )
      }}
    parents:
      - 'svcm_{{ inv__service }}_{{ inv__component }}'
Testing at that point should show the various groups that have been created. They should not contain any hosts.
@all:
  |--@managed:
  |  |--@svc_ldap:
  |  |  |--@svcm_ldap_front:
  |  |  |--@svcm_ldap_ldap:
  |  |  |  |--@svcm_ldap_roldap:
  |  |  |  |--@svcm_ldap_rwldap:
Assigning VMs to service groups
We can proceed with assigning VMs to service groups using the constructed
plugin. This is done in the 06-hosts-in-service-groups.yml file.
First we will add the hosts to instance groups under the main service groups.
As usual, _inv__managed ensures that we only create groups from VMs we actually
need to and can manage.
compose:
  _inv__instance_group: >-
    inv__service ~ '_' ~ inv__instance ~ _inv__managed
keyed_groups:
  - prefix: svin
    key: _inv__instance_group
    parent_group: "svc_{{ inv__service }}"
Next we need to add the VM to the group which corresponds to the component or
sub-component of the service. This should only be done if there is a component.
We do not need to specify a parent_group in the group definition as the
hierarchy has already been defined by the group creation.
compose:
  _inv__component_group: >-
    inv__service ~ '_' ~ inv__subcomponent | default( inv__component )
    ~ _inv__managed
keyed_groups:
  - prefix: svcm
    key: _inv__component_group
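After this last step, the service part of the graph for the example LDAP
clusters should roughly match the target structure described at the beginning
of this post:
@all:
  |--@managed:
  |  ...
  |  |--@svc_ldap:
  |  |  |--@svcm_ldap_front:
  |  |  |  |--vm0
  |  |  |  |--vm1
  |  |  |  |--vm5
  |  |  |  |--vm6
  |  |  |--@svcm_ldap_ldap:
  |  |  |  |--@svcm_ldap_roldap:
  |  |  |  |  |--vm2
  |  |  |  |  ...
  |  |  |  |--@svcm_ldap_rwldap:
  |  |  |  |  |--vm4
  |  |  |  |  |--vm9
  |  |  |--@svin_ldap_dev:
  |  |  |  |--vm0
  |  |  |  ...
  |  |  |--@svin_ldap_prod:
  |  |  |  ...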
Conclusion
This setup creates the structure we needed. However, achieving it is quite
convoluted (7 YAML files and a Python plugin), and it has to rely on quite a
few hacks and side effects - the "Jinja injection" used for values that Ansible
expects to be a single Jinja expression being particularly dirty. Given the
complexity involved, it would probably be worth replacing all steps following
the Proxmox fetch with a single Python plugin that handles the whole process.