Hiera hierarchies

Hiera looks up data by following a hierarchy — an ordered list of data sources.

Hierarchies are configured in a hiera.yaml configuration file. Each level of the hierarchy tells Hiera how to access some kind of data source. A hierarchy is usually organized like this:

---
version: 5
defaults:  # Used for any hierarchy level that omits these keys.
  datadir: data         # This path is relative to hiera.yaml's directory.
  data_hash: yaml_data  # Use the built-in YAML backend.

hierarchy:
  - name: "Per-node data"                   # Human-readable name.
    path: "nodes/%{trusted.certname}.yaml"  # File path, relative to datadir.
                                   # ^^^ IMPORTANT: include the file extension!

  - name: "Per-datacenter business group data" # Uses custom facts.
    path: "location/%{facts.whereami}/%{facts.group}.yaml"

  - name: "Global business group data"
    path: "groups/%{facts.group}.yaml"

  - name: "Per-datacenter secret data (encrypted)"
    lookup_key: eyaml_lookup_key   # Uses non-default backend.
    path: "secrets/%{facts.whereami}.eyaml"
    options:
      pkcs7_private_key: /etc/puppetlabs/puppet/eyaml/private_key.pkcs7.pem
      pkcs7_public_key:  /etc/puppetlabs/puppet/eyaml/public_key.pkcs7.pem

  - name: "Per-OS defaults"
    path: "os/%{facts.os.family}.yaml"

  - name: "Common data"
    path: "common.yaml"

In this example, every level configures the path to a YAML file on disk.

Hierarchies interpolate variables

Most levels of a hierarchy interpolate variables into their configuration:

path: "os/%{facts.os.family}.yaml"

The percent-and-braces %{variable} syntax is a Hiera interpolation token. It is similar to the Puppet language’s ${expression} interpolation tokens. Wherever you use an interpolation token, Hiera determines the variable’s value and inserts it into the hierarchy.

The facts.os.family token uses Hiera’s special key.subkey notation for accessing elements of hashes and arrays. It is equivalent to $facts['os']['family'] in the Puppet language, except that the dot notation produces an empty string instead of raising an error when part of the data is missing. Make sure that an empty interpolation does not end up matching an unintended path.
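The behavior described above can be modeled in a few lines of Python. This is only a simplified sketch, not Hiera’s implementation: the names dig and interpolate and the sample scope are hypothetical, and it handles plain variable tokens only. It shows how a missing fact silently turns into an empty path segment.

```python
import re

# Matches a %{...} token and captures the dotted key inside it.
TOKEN = re.compile(r"%\{([^}]+)\}")


def dig(scope, dotted_key):
    """Walk key.subkey notation; return '' if any part is missing."""
    value = scope
    for part in dotted_key.strip().split("."):
        if isinstance(value, dict) and part in value:
            value = value[part]
        elif isinstance(value, list) and part.isdigit() and int(part) < len(value):
            value = value[int(part)]
        else:
            return ""  # missing data yields an empty string, not an error
    return "" if value is None else str(value)


def interpolate(template, scope):
    """Replace every %{key.subkey} token in template with its value."""
    return TOKEN.sub(lambda match: dig(scope, match.group(1)), template)


# Hypothetical node data: facts.group is deliberately absent.
scope = {"facts": {"os": {"family": "Debian"}, "whereami": "belfast"}}

print(interpolate("os/%{facts.os.family}.yaml", scope))
# os/Debian.yaml
print(interpolate("location/%{facts.whereami}/%{facts.group}.yaml", scope))
# location/belfast/.yaml
```

The second result is the kind of unintended path to watch for: a file literally named .yaml in the location/belfast directory would match for every node that lacks the group fact.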

You can only interpolate values into certain parts of the config file. For more info, see the hiera.yaml format reference.

With node-specific variables, each node gets a customized set of paths to data. The hierarchy itself is the same for every node; only the interpolated paths differ.

Hiera searches the hierarchy in order

After Hiera replaces the variables to make a list of concrete data sources, it checks those data sources in the order they were written.

Generally, if a data source doesn’t exist, or doesn’t specify a value for the current key, Hiera skips it and moves on to the next source until it finds one that provides a value, then uses that value. This first-found behavior is the default merge strategy, but it does not always apply: Hiera can also use data from all data sources and merge the results.

Earlier data sources have priority over later ones. In the example above, the node-specific data has the highest priority, and can override data from any other level. Business group data is separated into local and global sources, with the local one overriding the global one. Common data used by all nodes always goes last.

That’s how Hiera’s “defaults, with overrides” approach to data works — you specify common data at lower levels of the hierarchy, and override it at higher levels for groups of nodes with special needs.
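The search order can be illustrated with a small Python model. This is a sketch under stated assumptions rather than Hiera’s own code: each data source is represented as a dictionary (or None when the file does not exist), the function names are hypothetical, and the host names and keys are made up for the example. It contrasts the first-found lookup with a merge that collects unique values from every source.

```python
def lookup_first(key, sources):
    """Return the value from the highest-priority source that defines key."""
    for data in sources:
        if data is None:      # the data source does not exist: skip it
            continue
        if key in data:       # first source with a value wins
            return data[key]
    raise KeyError(key)


def lookup_unique(key, sources):
    """Collect values from every source into one de-duplicated list."""
    merged = []
    for data in sources:
        if data is None or key not in data:
            continue
        value = data[key]
        for item in (value if isinstance(value, list) else [value]):
            if item not in merged:
                merged.append(item)
    return merged


# Ordered from highest to lowest priority, mirroring the example hierarchy.
sources = [
    None,                                                     # per-node file is absent
    {"ntp::servers": ["ntp1.belfast.example.com"]},           # per-datacenter group data
    {"profile::motd": "ops"},                                 # group data without the key
    {"ntp::servers": ["0.pool.ntp.org", "1.pool.ntp.org"]},   # common data
]

print(lookup_first("ntp::servers", sources))
# ['ntp1.belfast.example.com']
print(lookup_unique("ntp::servers", sources))
# ['ntp1.belfast.example.com', '0.pool.ntp.org', '1.pool.ntp.org']
```

With the first-found lookup the datacenter value overrides the common default entirely; with the merging lookup both contribute, and higher-priority values come first.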

Layered hierarchies

Hiera uses layers of data with a hiera.yaml for each layer.

Each layer can configure its own independent hierarchy. Before a lookup, Hiera combines them into a single super-hierarchy: global → environment → module.

There is a fourth layer, default_hierarchy, that can be used in a module’s hiera.yaml. It only comes into effect when there is no data for a key in any of the other regular hierarchies.

Assume the example above is an environment hierarchy (in the production environment). If we also had the following global hierarchy:

---
version: 5
hierarchy:
  - name: "Data exported from our old self-service config tool"
    path: "selfserve/%{trusted.certname}.json"
    data_hash: json_data
    datadir: data

And the NTP module had the following hierarchy for default data:

---
version: 5
hierarchy:
  - name: "OS values"
    path: "os/%{facts.os.name}.yaml"
  - name: "Common values"
    path: "common.yaml"
defaults:
  data_hash: yaml_data
  datadir: data

Then in a lookup for the ntp::servers key, thrush.example.com would use the following combined hierarchy:

  • <CODEDIR>/data/selfserve/thrush.example.com.json
  • <CODEDIR>/environments/production/data/nodes/thrush.example.com.yaml
  • <CODEDIR>/environments/production/data/location/belfast/ops.yaml
  • <CODEDIR>/environments/production/data/groups/ops.yaml
  • <CODEDIR>/environments/production/data/secrets/belfast.eyaml
  • <CODEDIR>/environments/production/data/os/Debian.yaml
  • <CODEDIR>/environments/production/data/common.yaml
  • <CODEDIR>/environments/production/modules/ntp/data/os/Ubuntu.yaml
  • <CODEDIR>/environments/production/modules/ntp/data/common.yaml

The combined hierarchy works the same way as the hierarchy of a single layer. Hiera skips data sources that are missing or have no value for the key, and either returns the first found value or merges all found values.
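As a rough Python model of the layering described above: the three regular layers are concatenated in global, environment, module order and searched first-found, and the module’s default_hierarchy is consulted only when none of them has a value. This is a simplified sketch, not Hiera’s implementation; the function names, file contents, and the fallback.yaml file are hypothetical, and merge strategies are ignored.

```python
def first_found(key, sources):
    """Return (path, value) from the first source that defines key, else None."""
    for path, data in sources:
        if data is not None and key in data:
            return path, data[key]
    return None


def layered_lookup(key, global_layer, environment_layer, module_layer,
                   default_hierarchy=()):
    """Search global, then environment, then module; fall back to defaults."""
    combined = [*global_layer, *environment_layer, *module_layer]
    hit = first_found(key, combined)
    if hit is None:
        # Only reached when no regular hierarchy has data for the key.
        hit = first_found(key, default_hierarchy)
    return hit


# Each source is (path, parsed data); None stands for a file that does not exist.
global_layer = [("data/selfserve/thrush.example.com.json", None)]
environment_layer = [
    ("environments/production/data/nodes/thrush.example.com.yaml", None),
    ("environments/production/data/common.yaml", {"profile::motd": "welcome"}),
]
module_layer = [
    ("modules/ntp/data/os/Ubuntu.yaml", {"ntp::servers": ["ntp.ubuntu.example.com"]}),
    ("modules/ntp/data/common.yaml", {"ntp::servers": ["pool.example.com"]}),
]
default_hierarchy = [
    ("modules/ntp/data/fallback.yaml", {"ntp::driftfile": "/var/lib/ntp/drift"}),
]

layers = (global_layer, environment_layer, module_layer, default_hierarchy)
print(layered_lookup("ntp::servers", *layers))
print(layered_lookup("ntp::driftfile", *layers))
```

In this model the ntp::servers lookup is answered by the module’s Ubuntu file because no higher layer defines the key, while ntp::driftfile is found only in the fallback data.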

By default, datadir refers to the directory named ‘data’ next to the hiera.yaml.
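The path resolution can be sketched in Python as follows. The helper name data_file is hypothetical, and /etc/puppetlabs/code is used as the codedir purely for illustration; the sketch only joins the directory of hiera.yaml, the datadir, and a level’s path.

```python
from pathlib import PurePosixPath


def data_file(hiera_yaml, path, datadir="data"):
    """Resolve a level's path relative to datadir, itself relative to hiera.yaml."""
    # An absolute datadir replaces the hiera.yaml directory when joined.
    return PurePosixPath(hiera_yaml).parent / datadir / path


env_config = "/etc/puppetlabs/code/environments/production/hiera.yaml"
print(data_file(env_config, "os/Debian.yaml"))
# /etc/puppetlabs/code/environments/production/data/os/Debian.yaml
```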

Tips for making a good hierarchy

  • Keep the hierarchy short. With fewer levels, the data files are easier to work with.
  • Use the roles and profiles method to manage less data in Hiera. Sorting hundreds of class parameters is easier than sorting thousands.

  • If the built-in facts don’t provide an easy way to represent differences in your infrastructure, make custom facts. For example, create a custom datacenter fact that is based on information particular to your network layout so that each datacenter is uniquely identifiable.

  • Give each environment – production, test, development – its own hierarchy.
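For the custom-fact tip above, one possible approach is an executable external fact: a script that prints key=value pairs, placed in Facter’s external facts directory (the exact directory depends on the installation, so check the Facter documentation). The sketch below is an assumption-laden example, not a recommended implementation: the host naming convention, the SITES table, and the function name are all hypothetical.

```python
#!/usr/bin/env python3
import socket

# Hypothetical convention: the host name prefix before the first "-" encodes the site.
SITES = {"bfs": "belfast", "pdx": "portland"}


def datacenter_for(hostname):
    """Map a host name such as bfs-web01.example.com to a datacenter name."""
    prefix = hostname.split(".")[0].split("-")[0].lower()
    # Fall back to a fixed word so the fact never interpolates as an empty string.
    return SITES.get(prefix, "unknown")


if __name__ == "__main__":
    # Executable external facts report their values as key=value lines on stdout.
    print(f"datacenter={datacenter_for(socket.gethostname())}")
```

A hierarchy level could then use a path such as "location/%{facts.datacenter}.yaml". Returning "unknown" instead of nothing keeps an unrecognized host from resolving to an unintended file.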

Related topics: codedir, confdir.