Why does the number of lines reported by wc differ from the number of records read by awk?

Date: 2022-06-26 16:07:17

When I count the number of lines in a file using wc:

cat ~/.account | wc -l

... the result is:

384

But when I use awk:

awk 'BEGIN {x = "1.02"; y = 0; } {x = x*2; y = y + 1} END {print x; print y}' ~/.account

... the result is:

8.03800926406447389928897056654e+115

385

Why is this?

1 answer

#1


What wc -l is doing

From man wc:

-l, --lines
    print the newline counts

wc -l counts the newline characters in its input, whereas awk splits its input into records that are separated by newline characters.

Consider this example:

$ echo 1 | wc -l
1
$ echo -n 1 | wc -l
0

The input for the first command (echo 1) is the string "1\n". With -n, echo prints the 1 without a trailing newline, so the input is just the string "1". wc -l counts the newline characters in its input: there is one in the first case and none in the second.
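The same counts can be reproduced with printf, which tends to behave more predictably than echo -n across shells. This is only a small sketch; count_newlines is a hypothetical helper name introduced for the illustration, not something from the original answer.

```shell
# count_newlines is a hypothetical helper used only for this illustration.
# It prints the number of newline characters found on standard input.
count_newlines() {
    wc -l | tr -d ' '   # some wc implementations pad the count with spaces
}

printf '1\n'  | count_newlines   # input "1\n": one newline
printf '1'    | count_newlines   # input "1": no newline
printf '1\n2' | count_newlines   # two lines of text, but only one newline
```

The last call is the interesting one: the input looks like two lines, yet only one newline character is present, so that is what wc -l reports.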

What AWK is doing

AWK divides its input into records, and each record into fields. This is an important part of the parsing magic that AWK does for us.

From The GNU AWK User's Guide (but referring to standard AWK):

Records are separated by a character called the record separator. By default, the record separator is the newline character. This is why records are, by default, single lines.

But if the input ends with this separator, see what happens:

$ echo 1 | awk 'END{print NR}'
1
$ echo -n 1 | awk 'END{print NR}'
1

(NR is a special variable for "the total number of input records read so far from all data files.")

There is only one record in each case, even in the first ("1\n"), which contains a newline character. Since there is nothing after the separator, it separates nothing. In other words, awk does not produce an empty record at the end when the input ends with the separator.
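To see how awk counts records for a few more inputs, a sketch along these lines may help. count_records is a hypothetical helper name chosen for the example, and the default record separator (newline) is assumed.

```shell
# count_records is a hypothetical helper used only for this illustration.
# It prints awk's record count (NR) for whatever arrives on standard input.
count_records() {
    awk 'END { print NR }'
}

printf 'a\nb\n' | count_records   # two records, last one terminated
printf 'a\nb'   | count_records   # still two records, last one unterminated
printf 'a\n\n'  | count_records   # two records; the second one is empty
printf ''       | count_records   # no input at all, so no records
```

A trailing newline never adds a record by itself; an empty record only appears when a separator is followed by another separator.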

If your input file does not end in a newline character, wc -l will report one fewer than awk's number of records (NR). That matches the counts in the question: 384 newlines, but 385 records.
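One way to check whether a given file is in this situation is to look at its last byte. The following is a rough sketch, not a definitive test: ends_with_newline is a hypothetical helper, and the temporary file stands in for a real file such as the one in the question.

```shell
# ends_with_newline is a hypothetical helper used only for this illustration.
# It succeeds when the file is non-empty and its last byte is a newline.
ends_with_newline() {
    # Command substitution strips trailing newlines, so a final "\n"
    # leaves an empty string behind.
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

tmp=$(mktemp)
printf 'one\ntwo' > "$tmp"    # sample data without a trailing newline

wc_lines=$(wc -l < "$tmp" | tr -d ' ')
awk_records=$(awk 'END { print NR }' "$tmp")
echo "wc -l: $wc_lines, awk NR: $awk_records"

if ends_with_newline "$tmp"; then
    echo "file ends with a newline"
else
    echo "file does not end with a newline"
fi

rm -f "$tmp"
```

For the sample data the two counts differ by one, and the check reports the missing final newline, which is the same pattern as the 384 versus 385 in the question.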
