如何从python中的字符串中删除ANSI转义序列?

时间:2021-08-07 03:07:04

This is my string:

这是我的字符串:

'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'

I was using code to retrieve the output from a SSH command and I want my string to only contain 'examplefile.zip'

我使用代码从SSH命令检索输出,我希望我的字符串只包含'examplefile.zip'

What I can use to remove the extra escape sequences?

我可以用什么来移除额外的转义序列?

5 个解决方案

#1


65  

Delete them with a regular expression:

用正则表达式删除它们:

import re

ansi_escape = re.compile(r'\x1B\[[0-?]*[ -/]*[@-~]')
ansi_escape.sub('', sometext)

Demo:

演示:

>>> import re
>>> ansi_escape = re.compile(r'\x1B\[[0-?]*[ -/]*[@-~]')
>>> sometext = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
>>> ansi_escape.sub('', sometext)
'ls\r\nexamplefile.zip\r\n'

(I've tidied up the escape sequence expression to follow the Wikipedia overview of ANSI escape codes, focusing on the CSI sequences, and ignoring the C1 codes as they are never used in today's UTF-8 world).

(我整理了转义序列的表达式,以跟随*对ANSI escape代码的概述,关注CSI序列,忽略了在今天的UTF-8世界中从未使用过的C1代码)。

#2


33  

The accepted answer to this question only considers color and font effects. There are a lot of sequences that do not end in 'm', such as cursor positioning, erasing, and scroll regions.

这个问题的答案只考虑颜色和字体效果。有很多序列不以“m”结尾,比如光标定位、擦除和滚动区域。

The complete regexp for Control Sequences (aka ANSI Escape Sequences) is

控制序列的完整regexp(即ANSI转义序列)是。

/(\x9B|\x1B\[)[0-?]*[ -\/]*[@-~]/

Refer to ECMA-48 Section 5.4 and ANSI escape code

参考ECMA-48第5.4节和ANSI转义代码。

#3


14  

Function

Based on Martijn Pieters♦'s answer with Jeff's regexp.

基于卡坦彼得♦回答与杰夫的regexp。

def escape_ansi(line):
    ansi_escape = re.compile(r'(\x9B|\x1B\[)[0-?]*[ -/]*[@-~]')
    return ansi_escape.sub('', line)

Test

def test_remove_ansi_escape_sequence(self):
    line = '\t\u001b[0;35mBlabla\u001b[0m                                  \u001b[0;36m172.18.0.2\u001b[0m'

    escaped_line = escape_ansi(line)

    self.assertEqual(escaped_line, '\tBlabla                                  172.18.0.2')

Testing

If you want to run it by yourself, use python3 (better unicode support, blablabla). Here is how the test file should be:

如果您想自己运行它,请使用python3(更好的unicode支持,blablabla)。下面是测试文件应该如何:

import unittest
import re

def escape_ansi(line):
    …

class TestStringMethods(unittest.TestCase):
    def test_remove_ansi_escape_sequence(self):
    …

if __name__ == '__main__':
    unittest.main()

#4


3  

The suggested regex didn't do the trick for me so I created one of my own. The following is a python regex that I created based on the spec found here

建议的regex并没有为我做这个,所以我创建了自己的一个。下面是我根据在这里找到的规范创建的python regex。

ansi_regex = r'\x1b(' \
             r'(\[\??\d+[hl])|' \
             r'([=<>a-kzNM78])|' \
             r'([\(\)][a-b0-2])|' \
             r'(\[\d{0,2}[ma-dgkjqi])|' \
             r'(\[\d+;\d+[hfy]?)|' \
             r'(\[;?[hf])|' \
             r'(#[3-68])|' \
             r'([01356]n)|' \
             r'(O[mlnp-z]?)|' \
             r'(/Z)|' \
             r'(\d+)|' \
             r'(\[\?\d;\d0c)|' \
             r'(\d;\dR))'
ansi_escape = re.compile(ansi_regex, flags=re.IGNORECASE)

I tested my regex on the following snippet (basically a copy paste from the ascii-table.com page)

我在下面的代码片段中测试了我的正则表达式(基本上是来自ascii-table.com页面的复制粘贴)

\x1b[20h    Set
\x1b[?1h    Set
\x1b[?3h    Set
\x1b[?4h    Set
\x1b[?5h    Set
\x1b[?6h    Set
\x1b[?7h    Set
\x1b[?8h    Set
\x1b[?9h    Set
\x1b[20l    Set
\x1b[?1l    Set
\x1b[?2l    Set
\x1b[?3l    Set
\x1b[?4l    Set
\x1b[?5l    Set
\x1b[?6l    Set
\x1b[?7l    Reset
\x1b[?8l    Reset
\x1b[?9l    Reset
\x1b=   Set
\x1b>   Set
\x1b(A  Set
\x1b)A  Set
\x1b(B  Set
\x1b)B  Set
\x1b(0  Set
\x1b)0  Set
\x1b(1  Set
\x1b)1  Set
\x1b(2  Set
\x1b)2  Set
\x1bN   Set
\x1bO   Set
\x1b[m  Turn
\x1b[0m Turn
\x1b[1m Turn
\x1b[2m Turn
\x1b[4m Turn
\x1b[5m Turn
\x1b[7m Turn
\x1b[8m Turn
\x1b[1;2    Set
\x1b[1A Move
\x1b[2B Move
\x1b[3C Move
\x1b[4D Move
\x1b[H  Move
\x1b[;H Move
\x1b[4;3H   Move
\x1b[f  Move
\x1b[;f Move
\x1b[1;2    Move
\x1bD   Move/scroll
\x1bM   Move/scroll
\x1bE   Move
\x1b7   Save
\x1b8   Restore
\x1bH   Set
\x1b[g  Clear
\x1b[0g Clear
\x1b[3g Clear
\x1b#3  Double-height
\x1b#4  Double-height
\x1b#5  Single
\x1b#6  Double
\x1b[K  Clear
\x1b[0K Clear
\x1b[1K Clear
\x1b[2K Clear
\x1b[J  Clear
\x1b[0J Clear
\x1b[1J Clear
\x1b[2J Clear
\x1b5n  Device
\x1b0n  Response:
\x1b3n  Response:
\x1b6n  Get
\x1b[c  Identify
\x1b[0c Identify
\x1b[?1;20c Response:
\x1bc   Reset
\x1b#8  Screen
\x1b[2;1y   Confidence
\x1b[2;2y   Confidence
\x1b[2;9y   Repeat
\x1b[2;10y  Repeat
\x1b[0q Turn
\x1b[1q Turn
\x1b[2q Turn
\x1b[3q Turn
\x1b[4q Turn
\x1b<   Enter/exit
\x1b=   Enter
\x1b>   Exit
\x1bF   Use
\x1bG   Use
\x1bA   Move
\x1bB   Move
\x1bC   Move
\x1bD   Move
\x1bH   Move
\x1b12  Move
\x1bI  
\x1bK  
\x1bJ  
\x1bZ  
\x1b/Z 
\x1bOP 
\x1bOQ 
\x1bOR 
\x1bOS 
\x1bA  
\x1bB  
\x1bC  
\x1bD  
\x1bOp 
\x1bOq 
\x1bOr 
\x1bOs 
\x1bOt 
\x1bOu 
\x1bOv 
\x1bOw 
\x1bOx 
\x1bOy 
\x1bOm 
\x1bOl 
\x1bOn 
\x1bOM 
\x1b[i 
\x1b[1i
\x1b[4i
\x1b[5i

Hopefully this will help others :)

希望这能帮助别人:)

#5


-1  

if you want to remove the \r\n bit, you can pass the string through this function (written by sarnold):

如果您想删除\r\n位,您可以通过这个函数传递字符串(由sarnold编写):

def stripEscape(string):
    """ Removes all escape sequences from the input string """
    delete = ""
    i=1
    while (i<0x20):
        delete += chr(i)
        i += 1
    t = string.translate(None, delete)
    return t

Careful though, this will lump together the text in front and behind the escape sequences. So, using Martijn's filtered string 'ls\r\nexamplefile.zip\r\n', you will get lsexamplefile.zip. Note the ls in front of the desired filename.

但是要小心,这将把文本放在前面和后面的转义序列后面。所以,使用Martijn的过滤字符串'ls\r\nexamplefile。你会得到lsexamplefile.zip。注意在需要的文件名前面的ls。

I would use the stripEscape function first to remove the escape sequences, then pass the output to Martijn's regular expression, which would avoid concatenating the unwanted bit.

我将首先使用stripEscape函数来删除转义序列,然后将输出传递给Martijn的正则表达式,这样可以避免将多余的部分连接起来。

#1


65  

Delete them with a regular expression:

用正则表达式删除它们:

import re

ansi_escape = re.compile(r'\x1B\[[0-?]*[ -/]*[@-~]')
ansi_escape.sub('', sometext)

Demo:

演示:

>>> import re
>>> ansi_escape = re.compile(r'\x1B\[[0-?]*[ -/]*[@-~]')
>>> sometext = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
>>> ansi_escape.sub('', sometext)
'ls\r\nexamplefile.zip\r\n'

(I've tidied up the escape sequence expression to follow the Wikipedia overview of ANSI escape codes, focusing on the CSI sequences, and ignoring the C1 codes as they are never used in today's UTF-8 world).

(我整理了转义序列的表达式,以跟随*对ANSI escape代码的概述,关注CSI序列,忽略了在今天的UTF-8世界中从未使用过的C1代码)。

#2


33  

The accepted answer to this question only considers color and font effects. There are a lot of sequences that do not end in 'm', such as cursor positioning, erasing, and scroll regions.

这个问题的答案只考虑颜色和字体效果。有很多序列不以“m”结尾,比如光标定位、擦除和滚动区域。

The complete regexp for Control Sequences (aka ANSI Escape Sequences) is

控制序列的完整regexp(即ANSI转义序列)是。

/(\x9B|\x1B\[)[0-?]*[ -\/]*[@-~]/

Refer to ECMA-48 Section 5.4 and ANSI escape code

参考ECMA-48第5.4节和ANSI转义代码。

#3


14  

Function

Based on Martijn Pieters♦'s answer with Jeff's regexp.

基于卡坦彼得♦回答与杰夫的regexp。

def escape_ansi(line):
    ansi_escape = re.compile(r'(\x9B|\x1B\[)[0-?]*[ -/]*[@-~]')
    return ansi_escape.sub('', line)

Test

def test_remove_ansi_escape_sequence(self):
    line = '\t\u001b[0;35mBlabla\u001b[0m                                  \u001b[0;36m172.18.0.2\u001b[0m'

    escaped_line = escape_ansi(line)

    self.assertEqual(escaped_line, '\tBlabla                                  172.18.0.2')

Testing

If you want to run it by yourself, use python3 (better unicode support, blablabla). Here is how the test file should be:

如果您想自己运行它,请使用python3(更好的unicode支持,blablabla)。下面是测试文件应该如何:

import unittest
import re

def escape_ansi(line):
    …

class TestStringMethods(unittest.TestCase):
    def test_remove_ansi_escape_sequence(self):
    …

if __name__ == '__main__':
    unittest.main()

#4


3  

The suggested regex didn't do the trick for me so I created one of my own. The following is a python regex that I created based on the spec found here

建议的regex并没有为我做这个,所以我创建了自己的一个。下面是我根据在这里找到的规范创建的python regex。

ansi_regex = r'\x1b(' \
             r'(\[\??\d+[hl])|' \
             r'([=<>a-kzNM78])|' \
             r'([\(\)][a-b0-2])|' \
             r'(\[\d{0,2}[ma-dgkjqi])|' \
             r'(\[\d+;\d+[hfy]?)|' \
             r'(\[;?[hf])|' \
             r'(#[3-68])|' \
             r'([01356]n)|' \
             r'(O[mlnp-z]?)|' \
             r'(/Z)|' \
             r'(\d+)|' \
             r'(\[\?\d;\d0c)|' \
             r'(\d;\dR))'
ansi_escape = re.compile(ansi_regex, flags=re.IGNORECASE)

I tested my regex on the following snippet (basically a copy paste from the ascii-table.com page)

我在下面的代码片段中测试了我的正则表达式(基本上是来自ascii-table.com页面的复制粘贴)

\x1b[20h    Set
\x1b[?1h    Set
\x1b[?3h    Set
\x1b[?4h    Set
\x1b[?5h    Set
\x1b[?6h    Set
\x1b[?7h    Set
\x1b[?8h    Set
\x1b[?9h    Set
\x1b[20l    Set
\x1b[?1l    Set
\x1b[?2l    Set
\x1b[?3l    Set
\x1b[?4l    Set
\x1b[?5l    Set
\x1b[?6l    Set
\x1b[?7l    Reset
\x1b[?8l    Reset
\x1b[?9l    Reset
\x1b=   Set
\x1b>   Set
\x1b(A  Set
\x1b)A  Set
\x1b(B  Set
\x1b)B  Set
\x1b(0  Set
\x1b)0  Set
\x1b(1  Set
\x1b)1  Set
\x1b(2  Set
\x1b)2  Set
\x1bN   Set
\x1bO   Set
\x1b[m  Turn
\x1b[0m Turn
\x1b[1m Turn
\x1b[2m Turn
\x1b[4m Turn
\x1b[5m Turn
\x1b[7m Turn
\x1b[8m Turn
\x1b[1;2    Set
\x1b[1A Move
\x1b[2B Move
\x1b[3C Move
\x1b[4D Move
\x1b[H  Move
\x1b[;H Move
\x1b[4;3H   Move
\x1b[f  Move
\x1b[;f Move
\x1b[1;2    Move
\x1bD   Move/scroll
\x1bM   Move/scroll
\x1bE   Move
\x1b7   Save
\x1b8   Restore
\x1bH   Set
\x1b[g  Clear
\x1b[0g Clear
\x1b[3g Clear
\x1b#3  Double-height
\x1b#4  Double-height
\x1b#5  Single
\x1b#6  Double
\x1b[K  Clear
\x1b[0K Clear
\x1b[1K Clear
\x1b[2K Clear
\x1b[J  Clear
\x1b[0J Clear
\x1b[1J Clear
\x1b[2J Clear
\x1b5n  Device
\x1b0n  Response:
\x1b3n  Response:
\x1b6n  Get
\x1b[c  Identify
\x1b[0c Identify
\x1b[?1;20c Response:
\x1bc   Reset
\x1b#8  Screen
\x1b[2;1y   Confidence
\x1b[2;2y   Confidence
\x1b[2;9y   Repeat
\x1b[2;10y  Repeat
\x1b[0q Turn
\x1b[1q Turn
\x1b[2q Turn
\x1b[3q Turn
\x1b[4q Turn
\x1b<   Enter/exit
\x1b=   Enter
\x1b>   Exit
\x1bF   Use
\x1bG   Use
\x1bA   Move
\x1bB   Move
\x1bC   Move
\x1bD   Move
\x1bH   Move
\x1b12  Move
\x1bI  
\x1bK  
\x1bJ  
\x1bZ  
\x1b/Z 
\x1bOP 
\x1bOQ 
\x1bOR 
\x1bOS 
\x1bA  
\x1bB  
\x1bC  
\x1bD  
\x1bOp 
\x1bOq 
\x1bOr 
\x1bOs 
\x1bOt 
\x1bOu 
\x1bOv 
\x1bOw 
\x1bOx 
\x1bOy 
\x1bOm 
\x1bOl 
\x1bOn 
\x1bOM 
\x1b[i 
\x1b[1i
\x1b[4i
\x1b[5i

Hopefully this will help others :)

希望这能帮助别人:)

#5


-1  

if you want to remove the \r\n bit, you can pass the string through this function (written by sarnold):

如果您想删除\r\n位,您可以通过这个函数传递字符串(由sarnold编写):

def stripEscape(string):
    """ Removes all escape sequences from the input string """
    delete = ""
    i=1
    while (i<0x20):
        delete += chr(i)
        i += 1
    t = string.translate(None, delete)
    return t

Careful though, this will lump together the text in front and behind the escape sequences. So, using Martijn's filtered string 'ls\r\nexamplefile.zip\r\n', you will get lsexamplefile.zip. Note the ls in front of the desired filename.

但是要小心,这将把文本放在前面和后面的转义序列后面。所以,使用Martijn的过滤字符串'ls\r\nexamplefile。你会得到lsexamplefile.zip。注意在需要的文件名前面的ls。

I would use the stripEscape function first to remove the escape sequences, then pass the output to Martijn's regular expression, which would avoid concatenating the unwanted bit.

我将首先使用stripEscape函数来删除转义序列,然后将输出传递给Martijn的正则表达式,这样可以避免将多余的部分连接起来。