iconv any encoding to UTF-8

I am trying to point iconv at a directory so that all files in it are converted to UTF-8, regardless of their current encoding.

I am using this script, but you have to specify which encoding you are converting FROM. How can I make it autodetect the current encoding?

dir_iconv.sh

#!/bin/bash

ICONVBIN='/usr/bin/iconv' # path to iconv binary

if [ $# -lt 3 ]
then
    echo "$0 dir from_charset to_charset"
    exit 1
fi

# Quote every expansion so file names containing spaces survive
for f in "$1"/*
do
    if test -f "$f"
    then
        echo -e "\nConverting $f"
        /bin/mv "$f" "$f.old"
        "$ICONVBIN" -f "$2" -t "$3" "$f.old" > "$f"
    else
        echo -e "\nSkipping $f - not a regular file";
    fi
done

Terminal command:

sudo convert/dir_iconv.sh convert/books CURRENT_ENCODING utf8

6 solutions

#1

#2

You can get what you need using the standard GNU utilities file and awk. Example:

file -bi .xsession-errors gives me: "text/plain; charset=us-ascii"

so file -bi .xsession-errors |awk -F "=" '{print $2}' gives me "us-ascii"

I use it in scripts like so:

CHARSET="$(file -bi "$i"|awk -F "=" '{print $2}')"
if [ "$CHARSET" != utf-8 ]; then
    iconv -f "$CHARSET" -t utf8 "$i" -o outfile
fi
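
A variation on the snippet above, as a minimal sketch: it asks file for the encoding directly with --mime-encoding instead of parsing the -bi output with awk, and it converts through a temporary file so the original is only overwritten once iconv has succeeded. The helper name to_utf8 is hypothetical, and the set of values to skip (binary, unknown-8bit) is an assumption about what file reports for input it cannot classify as text.

```shell
#!/bin/bash
# Hypothetical helper: convert one file to UTF-8 in place.
to_utf8() {
    local src=$1 charset tmp
    # --mime-encoding prints only the charset, e.g. "iso-8859-1"
    charset=$(file -b --mime-encoding -- "$src") || return 1
    case $charset in
        utf-8|us-ascii)
            return 0 ;;                      # ASCII is already valid UTF-8
        binary|unknown-8bit)                 # assumption: nothing iconv can use
            echo "Skipping $src ($charset)" >&2
            return 0 ;;
    esac
    tmp=$(mktemp) || return 1
    # Convert into the temporary file; touch the original only on success.
    if iconv -f "$charset" -t UTF-8 -- "$src" > "$tmp"; then
        cat -- "$tmp" > "$src"               # keeps the original permissions
        rm -f -- "$tmp"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

It could be called in a loop such as for i in ./*; do [ -f "$i" ] && to_utf8 "$i"; done. Detection by file is a guess based on the bytes it sees, so it is worth keeping a backup of the directory before a bulk run.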

#3

Combining all of the above: go to the directory and create dir2utf8.sh:

#!/bin/bash
# Convert all files in the current directory to UTF-8

for f in *
do
    if test -f "$f"; then
        echo -e "\nConverting $f"
        CHARSET="$(file -bi "$f" | awk -F "=" '{print $2}')"
        if [ "$CHARSET" != utf-8 ]; then
            # Write to a temporary file first: letting iconv read and
            # write the same file can truncate it.
            iconv -f "$CHARSET" -t utf8 "$f" -o "$f.tmp" && mv "$f.tmp" "$f"
        fi
    else
        echo -e "\nSkipping $f - not a regular file"
    fi
done

#4

Here is my solution to convert all files in place:

#!/bin/bash

apt-get -y install recode uchardet > /dev/null
# "$1" is the directory to convert
find "$1" -type f | while IFS= read -r FFN
do
    encoding=$(uchardet "$FFN")
    echo "$FFN: $encoding"
    # Rewrite uchardet's "x-mac-" prefix to "mac" before passing it to recode
    enc=$(echo "$encoding" | sed 's#^x-mac-#mac#')
    recode "$enc..UTF-8" "$FFN"
done

https://gist.github.com/demofly/25f856a96c29b89baa32

Put it into convert-dir-to-utf8.sh and run:

bash convert-dir-to-utf8.sh /pat/to/my/trash/dir

Note that the sed call here is a workaround for the mac encodings. Many uncommon encodings need workarounds like this.

#5

Check out the tools available for data conversion on the Linux command line: https://www.debian.org/doc/manuals/debian-reference/ch11.en.html

Also, it is worth looking at the full list of encodings that iconv supports. Run iconv --list and you will find that its encoding names differ from the names returned by the uchardet tool (for example, x-mac-cyrillic in uchardet vs. mac-cyrillic in iconv).
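
The naming mismatch described here can be handled with a small mapping step before calling iconv. The following is only a sketch: normalize_charset and iconv_knows are hypothetical helper names, and the single x-mac- rule covers just the example given above; other detector names may need their own cases.

```shell
#!/bin/bash
# Hypothetical helper: turn a detector's charset name into the
# spelling used in the iconv example above.
normalize_charset() {
    local name
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $name in
        x-mac-*) name=${name#x-} ;;   # x-mac-cyrillic -> mac-cyrillic
    esac
    printf '%s\n' "$name"
}

# Hypothetical helper: succeed if the local iconv lists the given name.
# Separators in the list (commas, slashes, spaces) are turned into
# newlines so that each name can be matched as a whole line.
iconv_knows() {
    iconv --list | tr ',/ ' '\n\n\n' | grep -qixF -- "$1"
}
```

A caller might run enc=$(normalize_charset "$(uchardet "$f")") and then check iconv_knows "$enc" before converting, skipping the file with a warning when the name is not listed.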

#6

The enca command doesn't work for my Simplified Chinese text file with GB2312 encoding.

Instead, I use the following function to convert the text file for me. You could of course redirect the output into a file.

It requires chardet and iconv commands.

detection_cat () 
{
    # Detect the encoding of file $1 with chardet, then print it as UTF-8
    DET_OUT=$(chardet "$1");
    ENC=$(echo "$DET_OUT" | sed "s|^.*: \(.*\) (confid.*$|\1|");
    iconv -f "$ENC" -t UTF-8 "$1"
}
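
The sed expression above depends on the exact wording of the chardet output. As a hedged sketch, the parsing step can be isolated in its own function that accepts two layouts: the "(confidence: ...)" form that the sed above expects, and a "with confidence ..." form that I believe some chardet releases print. Both sample lines are assumptions, as is the helper name parse_chardet_encoding.

```shell
#!/bin/bash
# Hypothetical helper: read one line of chardet output on stdin and
# print only the encoding name.
# Assumed input layouts:
#   book.txt: GB2312 (confidence: 0.99)
#   book.txt: GB2312 with confidence 0.99
parse_chardet_encoding() {
    sed -E 's/^.*: ([^ ]+) (\(confid|with confid).*$/\1/'
}
```

With this helper, the ENC line in detection_cat could become ENC=$(echo "$DET_OUT" | parse_chardet_encoding). If the output matches neither layout, sed passes the line through unchanged, so iconv will reject it rather than convert with a wrong encoding.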
