How can I parse a YAML file from a Linux shell script?

I wish to provide a structured configuration file which is as easy as possible for a non-technical user to edit (unfortunately it has to be a file) and so I wanted to use YAML. I can't find any way of parsing this from a Unix shell script however.

12 Solutions

#1

My use case may or may not be quite the same as what this original post was asking, but it's definitely similar.

I need to pull in some YAML as bash variables. The YAML will never be more than one level deep.

YAML looks like so:

KEY:                value
ANOTHER_KEY:        another_value
OH_MY_SO_MANY_KEYS: yet_another_value
LAST_KEY:           last_value

The output looks like this:

KEY="value"
ANOTHER_KEY="another_value"
OH_MY_SO_MANY_KEYS="yet_another_value"
LAST_KEY="last_value"

I achieved the output with this line:

sed -e 's/:[^:\/\/] */="/g;s/$/"/g;s/ *=/=/g' file.yaml > file.sh

s/:[^:\/\/] */="/g finds : together with the spaces that follow it and replaces them with =", while ignoring :// (for URLs)
s/$/"/g appends " to the end of each line
s/ *=/=/g removes all spaces before =
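
As a rough end-to-end sketch of this approach, the script below writes a flat YAML file, converts it, and sources the result. The temporary directory is only for illustration, and the sed expression is a variant that also swallows the padding spaces after the colon so that the values carry no leading blanks. Sourcing runs whatever the file contains, so this is only reasonable for files you trust.

```shell
#!/bin/bash
# Illustrative temp directory; the YAML content mirrors the sample above.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file.yaml" <<'EOF'
KEY:                value
ANOTHER_KEY:        another_value
LAST_KEY:           last_value
EOF

# Turn "KEY:   value" lines into KEY="value" assignments.
# The ' *' after the bracket expression consumes the padding spaces.
sed -e 's/:[^:\/\/] */="/g;s/$/"/g;s/ *=/=/g' "$tmpdir/file.yaml" > "$tmpdir/file.sh"

# Load the generated assignments into the current shell.
. "$tmpdir/file.sh"
echo "$KEY / $ANOTHER_KEY / $LAST_KEY"
rm -rf "$tmpdir"
```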

#2

Here is a bash-only parser that leverages sed and awk to parse simple yaml files:

function parse_yaml {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p"  $1 |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}

It understands files such as:

## global definitions
global:
  debug: yes
  verbose: no
  debugging:
    detailed: no
    header: "debugging started"

## output
output:
  file: "yes"

Which, when parsed using:

parse_yaml sample.yml

will output:

global_debug="yes"
global_verbose="no"
global_debugging_detailed="no"
global_debugging_header="debugging started"
output_file="yes"

it also understands yaml files, generated by ruby which may include ruby symbols, like:

---
:global:
  :debug: 'yes'
  :verbose: 'no'
  :debugging:
    :detailed: 'no'
    :header: debugging started
  :output: 'yes'

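and will output nearly the same as in the previous example; the only difference is the last line, which becomes global_output="yes" because :output: is nested under :global: here.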

typical use within a script is:

eval $(parse_yaml sample.yml)

parse_yaml accepts a prefix argument so that imported settings all have a common prefix (which will reduce the risk of namespace collisions).

parse_yaml sample.yml "CONF_"

yields:

CONF_global_debug="yes"
CONF_global_verbose="no"
CONF_global_debugging_detailed="no"
CONF_global_debugging_header="debugging started"
CONF_output_file="yes"

Note that previous settings in a file can be referred to by later settings:

## global definitions
global:
  debug: yes
  verbose: no
  debugging:
    detailed: no
    header: "debugging started"

## output
output:
  debug: $global_debug

Another nice usage is to parse a defaults file first and then the user settings, which works because the later settings override the earlier ones:

eval $(parse_yaml defaults.yml)
eval $(parse_yaml project.yml)
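
The defaults-then-overrides pattern can be tried out with the sketch below. It repeats parse_yaml from this answer (the only change is that "$1" is quoted) and uses two made-up files in a temporary directory. The samples stick to two-space indentation, since the parser derives the nesting level by dividing the indent width by two.

```shell
#!/bin/bash
# parse_yaml as given in this answer; only "$1" has been quoted.
parse_yaml() {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p"  "$1" |
   awk -F"$fs" '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}

# Hypothetical sample files, created only for this demonstration.
tmpdir=$(mktemp -d)
printf '%s\n' 'global:' '  debug: no' '  verbose: no' \
              'output:' '  file: "out.log"' > "$tmpdir/defaults.yml"
printf '%s\n' 'global:' '  debug: yes' > "$tmpdir/project.yml"

# Later assignments win, so project.yml overrides defaults.yml.
eval "$(parse_yaml "$tmpdir/defaults.yml")"
eval "$(parse_yaml "$tmpdir/project.yml")"

echo "debug=$global_debug verbose=$global_verbose file=$output_file"
rm -rf "$tmpdir"
```

Quoting the command substitution keeps each assignment on its own line when eval runs it.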

#3

I've written shyaml in Python for querying YAML from the shell command line.

Overview:

$ pip install shyaml      ## installation

Example YAML file (with complex features):

$ cat <<EOF > test.yaml
name: "MyName !!"
subvalue:
    how-much: 1.1
    things:
        - first
        - second
        - third
    other-things: [a, b, c]
    maintainer: "Valentin Lab"
    description: |
        Multiline description:
        Line 1
        Line 2
EOF

Basic query:

$ cat test.yaml | shyaml get-value subvalue.maintainer
Valentin Lab

More complex looping query on complex values:

$ cat test.yaml | shyaml values-0 | \
  while read -r -d $'\0' value; do
      echo "RECEIVED: '$value'"
  done
RECEIVED: '1.1'
RECEIVED: '- first
- second
- third'
RECEIVED: '2'
RECEIVED: 'Valentin Lab'
RECEIVED: 'Multiline description:
Line 1
Line 2'

A few key points:

all YAML types and syntax oddities are handled correctly, such as multiline values, quoted strings, inline sequences...
\0-padded output is available for robust manipulation of multiline entries.
simple dotted notation selects sub-values (e.g. subvalue.maintainer is a valid key).
sequences can be accessed by index (e.g. subvalue.things.-1 is the last element of the subvalue.things sequence).
all elements of a sequence or struct can be fetched in one go for use in bash loops.
you can output a whole subpart of a YAML file as YAML, which works well for further manipulation with shyaml.

More samples and documentation are available on the shyaml GitHub page or the shyaml PyPI page.

#4

It's possible to pass a small script to some interpreters, like Python. An easy way to do so using Ruby and its YAML library is the following:

$ RUBY_SCRIPT="data = YAML::load(STDIN.read); puts data['a']; puts data['b']"
$ echo -e '---\na: 1234\nb: 4321' | ruby -ryaml -e "$RUBY_SCRIPT"
1234
4321

where data is a hash (or array) holding the values from the YAML.

As a bonus, it'll parse Jekyll's front matter just fine.

ruby -ryaml -e "puts YAML::load(open(ARGV.first).read)['tags']" example.md

#5

Hard to say because it depends on what you want the parser to extract from your YAML document. For simple cases, you might be able to use grep, cut, awk etc. For more complex parsing you would need to use a full-blown parsing library such as Python's PyYAML or YAML::Perl.

#6

I just wrote a parser that I called Yay! (Yaml ain't Yamlesque!) which parses Yamlesque, a small subset of YAML. So, if you're looking for a 100% compliant YAML parser for Bash then this isn't it. However, to quote the OP, if you want a structured configuration file which is as easy as possible for a non-technical user to edit that is YAML-like, this may be of interest.

It's inspired by the earlier answer but writes associative arrays (yes, it requires Bash 4.x) instead of basic variables. It does so in a way that allows the data to be parsed without prior knowledge of the keys so that data-driven code can be written.

As well as the key/value array elements, each array has a keys array containing a list of key names, a children array containing names of child arrays and a parent key that refers to its parent.

This is an example of Yamlesque:

root_key1: this is value one
root_key2: "this is value two"

drink:
  state: liquid
  coffee:
    best_served: hot
    colour: brown
  orange_juice:
    best_served: cold
    colour: orange

food:
  state: solid
  apple_pie:
    best_served: warm

root_key_3: this is value three

Here is an example showing how to use it:

#!/bin/bash
# An example showing how to use Yay

. /usr/lib/yay

# helper to get array value at key
value() { eval echo \${$1[$2]}; }

# print a data collection
print_collection() {
  for k in $(value $1 keys)
  do
    echo "$2$k = $(value $1 $k)"
  done

  for c in $(value $1 children)
  do
    echo -e "$2$c\n$2{"
    print_collection $c "  $2"
    echo "$2}"
  done
}

yay example
print_collection example

which outputs:

root_key1 = this is value one
root_key2 = this is value two
root_key_3 = this is value three
example_drink
{
  state = liquid
  example_coffee
  {
    best_served = hot
    colour = brown
  }
  example_orange_juice
  {
    best_served = cold
    colour = orange
  }
}
example_food
{
  state = solid
  example_apple_pie
  {
    best_served = warm
  }
}
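
The value helper in the example script goes through eval. If you would rather avoid eval for lookups, one possible alternative is Bash indirect expansion, sketched below. The array here is filled in by hand to resemble what Yay would produce for the coffee collection; it is not the output of a real run.

```shell
#!/bin/bash
# Requires Bash 4+ for associative arrays.
# Hand-written stand-in for a collection loaded by yay.
declare -A example_coffee=([best_served]="hot" [colour]="brown" [keys]=" best_served colour")

# Look up one element of a named associative array without eval.
value() {
  local ref="$1[$2]"        # e.g. "example_coffee[colour]"
  printf '%s\n' "${!ref}"   # indirect expansion resolves the array element
}

for k in $(value example_coffee keys); do
  echo "$k = $(value example_coffee "$k")"
done
```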

And here is the parser:

yay_parse() {

   # find input file
   for f in "$1" "$1.yay" "$1.yml"
   do
     [[ -f "$f" ]] && input="$f" && break
   done
   [[ -z "$input" ]] && exit 1

   # use given dataset prefix or imply from file name
   [[ -n "$2" ]] && local prefix="$2" || {
     local prefix=$(basename "$input"); prefix=${prefix%.*}
   }

   echo "declare -g -A $prefix;"

   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -n -e "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
          -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$input" |
   awk -F$fs '{
      indent       = length($1)/2;
      key          = $2;
      value        = $3;

      # No prefix or parent for the top level (indent zero)
      root_prefix  = "'$prefix'_";
      if (indent ==0 ) {
        prefix = "";          parent_key = "'$prefix'";
      } else {
        prefix = root_prefix; parent_key = keys[indent-1];
      }

      keys[indent] = key;

      # remove keys left behind if prior row was indented more than this row
      for (i in keys) {if (i > indent) {delete keys[i]}}

      if (length(value) > 0) {
         # value
         printf("%s%s[%s]=\"%s\";\n", prefix, parent_key , key, value);
         printf("%s%s[keys]+=\" %s\";\n", prefix, parent_key , key);
      } else {
         # collection
         printf("%s%s[children]+=\" %s%s\";\n", prefix, parent_key , root_prefix, key);
         printf("declare -g -A %s%s;\n", root_prefix, key);
         printf("%s%s[parent]=\"%s%s\";\n", root_prefix, key, prefix, parent_key);
      }
   }'
}

# helper to load yay data file
yay() { eval $(yay_parse "$@"); }

There is some documentation in the linked source file and below is a short explanation of what the code does.

The yay_parse function first locates the input file or exits with an exit status of 1. Next, it determines the dataset prefix, either explicitly specified or derived from the file name.

It writes valid bash commands to its standard output that, if executed, define arrays representing the contents of the input data file. The first of these defines the top-level array:

echo "declare -g -A $prefix;"

Note that array declarations are associative (-A) which is a feature of Bash version 4. Declarations are also global (-g) so they can be executed in a function but be available to the global scope like the yay helper:

yay() { eval $(yay_parse "$@"); }
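
The effect of -g is easy to check in isolation. In this small sketch (the function and array names are invented for the demonstration) one function declares its array with -g and the other without, and only the first array survives the function call.

```shell
#!/bin/bash
# Requires Bash 4.2+ (declare -g).
make_global() {
  # -g makes the array visible after the function returns.
  eval 'declare -g -A settings; settings[colour]="brown";'
}
make_local() {
  # Without -g, declare inside a function creates a local variable.
  eval 'declare -A scratch; scratch[colour]="orange";'
}

make_global
make_local

echo "settings[colour]=${settings[colour]}"
declare -p scratch >/dev/null 2>&1 || echo "scratch is not visible outside make_local"
```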

The input data is initially processed with sed. It drops lines that don't match the Yamlesque format specification before delimiting the valid Yamlesque fields with an ASCII File Separator character and removing any double-quotes surrounding the value field.

 local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
 sed -n -e "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$input" |

The two expressions are similar; they differ only in that the first one picks out quoted values whereas the second one picks out unquoted ones.

The File Separator (decimal 28, hex 1C, octal 034) is used because, as a non-printable character, it is unlikely to be in the input data.

The result is piped into awk which processes its input one line at a time. It uses the FS character to assign each field to a variable:

indent       = length($1)/2;
key          = $2;
value        = $3;
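
To see the field splitting on its own, the sketch below builds the separator the same way the parser does and feeds awk a single hand-made record in the indent/key/value shape that the sed stage is described as producing. The split_record name and the sample record are made up for this illustration.

```shell
#!/bin/bash
# Build the ASCII File Separator (octal 034) as the parser does.
fs=$(echo @ | tr @ '\034')

# A hand-made record: four spaces of indent, then key and value.
record="    ${fs}colour${fs}brown"

# Split on the separator; colons or spaces inside a field would be harmless.
split_record() {
  awk -F"$fs" '{
    printf("indent=%d key=%s value=%s\n", length($1)/2, $2, $3)
  }'
}

printf '%s\n' "$record" | split_record
```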

All lines have an indent (possibly zero) and a key, but they don't all have a value. It computes an indent level for the line by dividing the length of the first field, which contains the leading whitespace, by two. The top-level items without any indent are at indent level zero.

Next, it works out what prefix to use for the current item. This is what gets added to a key name to make an array name. There's a root_prefix for the top-level array which is defined as the data set name and an underscore:

root_prefix  = "'$prefix'_";
if (indent ==0 ) {
  prefix = "";          parent_key = "'$prefix'";
} else {
  prefix = root_prefix; parent_key = keys[indent-1];
}

The parent_key is the key at the indent level above the current line's indent level and represents the collection that the current line is part of. The collection's key/value pairs will be stored in an array with its name defined as the concatenation of the prefix and parent_key.

For the top level (indent level zero) the data set prefix is used as the parent key so it has no prefix (it's set to ""). All other arrays are prefixed with the root prefix.

Next, the current key is inserted into an (awk-internal) array containing the keys. This array persists throughout the whole awk session and therefore contains keys inserted by prior lines. The key is inserted into the array using its indent as the array index.

keys[indent] = key;

Because this array contains keys from previous lines, any keys with an indent level greater than the current line's indent level are removed:

 for (i in keys) {if (i > indent) {delete keys[i]}}

This leaves the keys array containing the key-chain from the root at indent level 0 to the current line. It removes stale keys that remain when the prior line was indented deeper than the current line.
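
The key-chain bookkeeping can be watched in a stripped-down form. The following sketch is not the Yay parser itself: it reads two-space-indented input directly (no sed stage), tracks keys per indent level in the same way, and prints the dotted chain for each line. The key_chains name is hypothetical.

```shell
#!/bin/bash
# Print the full key chain for every line of two-space-indented, YAML-like input.
key_chains() {
  awk '{
    line = $0; sub(/^ */, "", line)              # strip leading spaces
    indent = (length($0) - length(line)) / 2     # indent level from the space count
    key = $1; sub(/:.*/, "", key)                # first word without colon and value
    keys[indent] = key
    # drop stale keys from deeper levels left behind by earlier lines
    for (i in keys) if (i + 0 > indent) delete keys[i]
    chain = keys[0]
    for (i = 1; i <= indent; i++) chain = chain "." keys[i]
    print chain
  }'
}

printf '%s\n' 'drink:' '  coffee:' '    colour: brown' 'food:' '  state: solid' | key_chains
```

After the food: line, the entries for coffee and colour are gone, so the last line resolves to food.state rather than picking up a leftover key.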

The final section outputs the bash commands: an input line without a value starts a new indent level (a collection in YAML parlance) and an input line with a value adds a key to the current collection.

The collection's name is the concatenation of the current line's prefix and parent_key.

When a key has a value, a key with that value is assigned to the current collection like this:

printf("%s%s[%s]=\"%s\";\n", prefix, parent_key , key, value);
printf("%s%s[keys]+=\" %s\";\n", prefix, parent_key , key);

The first statement outputs the command to assign the value to an associative array element named after the key and the second one outputs the command to add the key to the collection's space-delimited keys list:

<current_collection>[<key>]="<value>";
<current_collection>[keys]+=" <key>";

When a key doesn't have a value, a new collection is started like this:

printf("%s%s[children]+=\" %s%s\";\n", prefix, parent_key , root_prefix, key);
printf("declare -g -A %s%s;\n", root_prefix, key);

The first statement outputs the command to add the new collection to the current collection's space-delimited children list and the second one outputs the command to declare a new associative array for the new collection:

<current_collection>[children]+=" <new_collection>"
declare -g -A <new_collection>;

All of the output from yay_parse can be parsed as bash commands by the bash eval or source built-in commands.
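
As an illustration of that last step, the commands below are written by hand in the shape described above (they are not captured from an actual yay_parse run) and then handed to eval, after which the arrays can be read like any other Bash associative array.

```shell
#!/bin/bash
# Requires Bash 4.2+. Hand-written commands in the shape yay_parse is
# described as emitting; not captured from a real run.
generated='declare -g -A example;
example[root_key1]="this is value one";
example[keys]+=" root_key1";
example[children]+=" example_drink";
declare -g -A example_drink;
example_drink[parent]="example";
example_drink[state]="liquid";
example_drink[keys]+=" state";'

# Execute the generated commands in the current shell.
eval "$generated"

echo "keys:${example[keys]}"
echo "children:${example[children]}"
echo "drink state: ${example_drink[state]}"
```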

#7

perl -ne 'chomp; printf qq/%s="%s"\n/, split(/\s*:\s*/,$_,2)' file.yml > file.sh

#8

Another option is to convert the YAML to JSON, then use jq to interact with the JSON representation either to extract information from it or edit it.

I wrote a simple bash script that contains this glue; see the Y2J project on GitHub.

#9

I've found this Jshon tool to be the best one for this purpose, but in JSON world.

But I've found no trace on the internet of such a tool for YAML. You will (at least for now) have to use a Perl / Python / Ruby script to do that (as in previous answers).

#10

Given that Python3 and PyYAML are quite easy dependencies to meet nowadays, the following may help:

yaml() {
    python3 -c "import yaml;print(yaml.safe_load(open('$1'))$2)"
}

VALUE=$(yaml ~/my_yaml_file.yaml "['a_key']")

#11

You can also consider using Grunt (The JavaScript Task Runner). Can be easily integrated with shell. It supports reading YAML (grunt.file.readYAML) and JSON (grunt.file.readJSON) files.

This can be achieved by creating a task in Gruntfile.js (or Gruntfile.coffee), e.g.:

module.exports = function (grunt) {

    grunt.registerTask('foo', ['load_yml']);

    grunt.registerTask('load_yml', function () {
        var data = grunt.file.readYAML('foo.yml');
        Object.keys(data).forEach(function (g) {
          // ... switch (g) { case 'my_key':
        });
    });

};

Then, from the shell, simply run grunt foo (check grunt --help for available tasks).

Furthermore, you can implement exec:foo tasks (grunt-exec) with input variables passed from your task (foo: { cmd: 'echo bar <%= foo %>' }) in order to print the output in whatever format you want, then pipe it into another command.

There is also a tool similar to Grunt called gulp, which handles YAML through the additional gulp-yaml plugin.

Install via: npm install --save-dev gulp-yaml

Sample usage:

var yaml = require('gulp-yaml');

gulp.src('./src/*.yml')
  .pipe(yaml())
  .pipe(gulp.dest('./dist/'))

gulp.src('./src/*.yml')
  .pipe(yaml({ space: 2 }))
  .pipe(gulp.dest('./dist/'))

gulp.src('./src/*.yml')
  .pipe(yaml({ safe: true }))
  .pipe(gulp.dest('./dist/'))

For more options for dealing with the YAML format, check the YAML site for available projects, libraries and other resources that can help you parse it.

Other tools:

Jshon: parses, reads and creates JSON

#12

I know this is very specific, but I think my answer could be helpful for certain users.
If you have node and npm installed on your machine, you can use js-yaml.
First install:

npm i -g js-yaml
# or locally
npm i js-yaml

then in your bash script

#!/bin/bash
js-yaml your-yaml-file.yml

Also, if you are using jq, you can do something like this:

#!/bin/bash
json="$(js-yaml your-yaml-file.yml)"
aproperty="$(jq '.aproperty' <<< "$json")"
echo "$aproperty"

This works because js-yaml converts a YAML file to a JSON string, which you can then pass to any JSON parser on your Unix system.
