Setting the default number format in awk

Time: 2023-01-14 15:00:33

I wanted to do a simple parsing of two files with ids and some corresponding numerical values. I didn't want awk to print numbers in scientific notation.

The file looks like this:

someid-1 860025 50.0401 4.00022
someid-2 384319 22.3614 1.78758
someid-3 52096 3.03118 0.242314
someid-4 43770 2.54674 0.203587
someid-5 33747 1.96355 0.156967
someid-6 20281 1.18004 0.0943328
someid-7 12231 0.711655 0.0568899
someid-8 10936 0.636306 0.0508665
someid-9 10224.8 0.594925 0.0475585
someid-10 10188.8 0.59283 0.047391

When I use print instead of printf:

awk 'BEGIN{FS=OFS="\t"} NR==FNR{x[$1]=$0;next} ($1 in x){split(x[$1],k,FS); print $1,k[2],k[3],k[4],$2,$3,$4}' OSCAo.txt dme_miRNA_PIWI_OSC.txt | sort -n -r -k 7 | head

I get this result:

dme-miR-iab-4-5p      0.333333  0.000016  0.000001  0.25       0.000605606  9.36543e-07
dme-miR-9c-5p     10987.300000  0.525413  0.048798  160.2      0.388072     0.000600137
dme-miR-9c-3p       731.986000  0.035003  0.003251  2.10714    0.00510439   7.89372e-06
dme-miR-9b-5p     30322.500000  1.450020  0.134670  595.067    1.4415       0.00222922
dme-miR-9b-3p      2628.280000  0.125684  0.011673  48         0.116276     0.000179816
dme-miR-9a-3p        10.365000  0.000496  0.000046  0.25       0.000605606  9.36543e-07
dme-miR-999-5p      103.433000  0.004946  0.000459  0.0769231  0.00018634   2.88167e-07
dme-miR-999-3p     1513.790000  0.072389  0.006723  28         0.0678278    0.000104893
dme-miR-998-5p      514.000000  0.024579  0.002283  73         0.176837     0.000273471
dme-miR-998-3p     3529.000000  0.168756  0.015673  42         0.101742     0.000157339

Notice the scientific notation in the last column.

I understand that printf with appropriate format modifier can do the job but the code becomes very lengthy. I have to write something like this:

awk 'BEGIN{FS=OFS="\t"} NR==FNR{x[$1]=$0;next} ($1 in x){split(x[$1],k,FS); printf "%s\t%3.6f\t%3.6f\t%3.6f\t%3.6f\t%3.6f\t%3.6f\n", $1,k[2],k[3],k[4],$2,$3,$4}' file1.txt file2.txt > fileout.txt

This becomes clumsy when I have to parse fileout with another similarly structured file.

Is there any way to specify a default numeric output format, so that strings are printed as strings but all numbers follow a particular format?

2 Answers

#1


3  

I think you misinterpreted the meaning of %3.6f. The number before the decimal point is the minimum field width, not the number of digits before the decimal point (see printf(3)).

So you should use %10.6f instead. This is easy to test in bash:

$ printf "%3.6f\n%3.6f\n%3.6f" 123.456 12.345 1.234
123.456000
12.345000
1.234000
$ printf "%10.6f\n%10.6f\n%10.6f" 123.456 12.345 1.234
123.456000
 12.345000
  1.234000

You can see that the latter aligns the numbers on the decimal point properly.

As sidharth c nadhan mentioned, you can use awk's built-in OFMT variable (see awk(1)). An example:

$ awk 'BEGIN{print 123.456; print 12.345; print 1.234}'
123.456
12.345
1.234
$ awk -vOFMT=%10.6f 'BEGIN{print 123.456; print 12.345; print 1.234}'
123.456000
 12.345000
  1.234000

As far as I can see from your example, the widest number could be something like 123456.1234567, so the format %15.7f should cover all values and produce a nice-looking table.

Unfortunately, this does not work if the number has no decimal point, or if it has one but the fractional part is zero (for example 123.0):

$ awk -vOFMT=%15.7f 'BEGIN{print 123.456;print 123;print 123.0;print 0.0+123.0}'
    123.4560000
123
123
123

I even tried gawk's strtonum() function, but integer values are still printed as integers, ignoring OFMT. See:

awk -vOFMT=%15.7f -vCONVFMT=%15.7f 'BEGIN{print 123.456; print strtonum(123); print strtonum(123.0)}'

It produces the same output as before.
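
The reason is that awk converts a number with an integral value to a string as an integer, and print uses OFMT only for non-integral values, whereas an explicit sprintf (or printf) applies the format to every number. A minimal sketch of the difference, where the shell variable name out is only for illustration:

```shell
# OFMT is used by print only for non-integral numbers;
# sprintf applies the format regardless of the value.
out=$(awk -v OFMT='%15.7f' 'BEGIN {
    print 123                     # integral value: OFMT is ignored
    print sprintf("%15.7f", 123)  # explicit format: always applied
}')
printf '%s\n' "$out"
```

The first line comes out as a bare 123, while the second is padded and carries seven decimals.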

So I think you have to use printf anyway. The script can be made a little shorter and more configurable:

awk -vf='\t'%15.7f 'NR==FNR{x[$1]=sprintf("%s"f f f,$1,$2,$3,$4);next}$1 in x{printf("%s"f f f"\n",x[$1],$2,$3,$4)}' file1.txt file2.txt

The script will not work properly if there are duplicate IDs in the first file. If there are none, the two rules can be swapped and the ;next can be left out.
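
If the goal is to avoid spelling out one format specifier per column, one option is to loop over the fields and reformat only those that look like numbers. The following is a minimal sketch under assumptions that are not in the original: the helper name fmtnum, the %.6f format and the one-line sample input are illustrative, and the numeric test is a simple regular expression, not a full number parser.

```shell
# fmtnum (hypothetical helper): rewrite every numeric-looking field with
# the format in fmt and leave all other fields (such as ids) untouched.
fmtnum() {
    awk -v fmt='%.6f' 'BEGIN { FS = OFS = "\t" }
    {
        for (i = 1; i <= NF; i++)
            # optional sign, digits with optional fraction, optional exponent
            if ($i ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/)
                $i = sprintf(fmt, $i)
        print
    }' "$@"
}

# Sample line taken from the question's output (tab-separated).
printf 'dme-miR-9a-3p\t0.25\t9.36543e-07\n' | fmtnum
```

Piping the joined output through such a filter keeps the join command itself free of format strings. Note that an id consisting only of digits would be reformatted too, so the loop may need to start at field 2.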

#2


0  

awk 'NR==FNR{x[$1]=$0;next} ($1 in x){split(x[$1],k,FS); printf "%s\t%9s\t%9s\t%9s\t%9s\t%9s\t%9s\n", $1,k[2],k[3],k[4],$2,$3,$4}' file1.txt file2.txt > fileout.txt
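
Note that %9s only pads each field to a minimum width of nine characters as a string. It does not convert the value, so a field that already contains scientific notation is printed unchanged. A minimal sketch comparing it with a numeric conversion (the sample value is taken from the question's output; the variable names and the %9.6f format are illustrative):

```shell
# %9s treats the value as text; %9.6f converts it to a number first.
as_string=$(awk 'BEGIN { printf "%9s", "9.36543e-07" }')
as_number=$(awk 'BEGIN { printf "%9.6f", "9.36543e-07" }')
printf '[%s]\n[%s]\n' "$as_string" "$as_number"
```

The brackets make the padding visible: the string form keeps its exponent, the numeric form is rounded to six decimals.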
