How do I (un)escape a string in C/C++?

Time: 2022-01-03 22:27:10

Given a counted string (either an array of characters, or a wrapper like std::string), is there a "proper" way to escape and/or unescape it in C or C++, such that "special" characters (like the null character) become C-style-escaped and "normal" characters stay the way they are?

Or do I have to do it by hand?

2 solutions

#1


10  

This is a function to process a single character:

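#include <ctype.h>   /* isprint */
#include <stdio.h>   /* sprintf */
#include <string.h>  /* strcpy, strlen */

/*
** Writes the C-literal form of one character into buffer, null-terminated.
** Writes an empty string if the buffer is too small for the result.
** Does not generate hex character constants.
** Always generates triple-digit octal constants.
** Always prefers the simple escapes (\n, \t, ...) to octal.
** Escapes the question mark so that repeated use cannot produce trigraphs.
** Handling of 0x80..0xFF is locale-dependent (might be octal, might be literal).
*/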

void chr_cstrlit(unsigned char u, char *buffer, size_t buflen)
{
    if (buflen < 2)
        *buffer = '\0';
    else if (isprint(u) && u != '\'' && u != '\"' && u != '\\' && u != '\?')
        sprintf(buffer, "%c", u);
    else if (buflen < 3)
        *buffer = '\0';
    else
    {
        switch (u)
        {
        case '\a':  strcpy(buffer, "\\a"); break;
        case '\b':  strcpy(buffer, "\\b"); break;
        case '\f':  strcpy(buffer, "\\f"); break;
        case '\n':  strcpy(buffer, "\\n"); break;
        case '\r':  strcpy(buffer, "\\r"); break;
        case '\t':  strcpy(buffer, "\\t"); break;
        case '\v':  strcpy(buffer, "\\v"); break;
        case '\\':  strcpy(buffer, "\\\\"); break;
        case '\'':  strcpy(buffer, "\\'"); break;
        case '\"':  strcpy(buffer, "\\\""); break;
        case '\?':  strcpy(buffer, "\\\?"); break;
        default:
            if (buflen < 5)
                *buffer = '\0';
            else
                sprintf(buffer, "\\%03o", u);
            break;
        }
    }
}

And this is the code to handle a null-terminated string (using the function above):

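void str_cstrlit(const char *str, char *buffer, size_t buflen)
{
    unsigned char u;
    size_t len;

    if (buflen == 0)
        return;             /* No room even for the terminator */
    *buffer = '\0';         /* Empty input must still yield a terminated string */

    while ((u = (unsigned char)*str++) != '\0')
    {
        chr_cstrlit(u, buffer, buflen);
        if ((len = strlen(buffer)) == 0)
            return;         /* Out of space: the output is truncated here */
        buffer += len;
        buflen -= len;
    }
}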
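
The function above stops at the first null byte, so it cannot escape a counted string that contains embedded nulls, which is what the question asks about. As a minimal sketch, assuming C++17 is available, the same rules can be applied to every byte of a `std::string_view`. The name `escape_counted` is hypothetical and not part of the original answer.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper (not from the original answer): escapes every byte of
// a counted string, embedded null bytes included, with the same rules as
// chr_cstrlit: simple escapes first, three-digit octal for anything else
// that is not printable.
std::string escape_counted(std::string_view in)
{
    std::string out;
    out.reserve(in.size());
    for (unsigned char u : in)
    {
        switch (u)
        {
        case '\a': out += "\\a";  break;
        case '\b': out += "\\b";  break;
        case '\f': out += "\\f";  break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        case '\v': out += "\\v";  break;
        case '\\': out += "\\\\"; break;
        case '\'': out += "\\'";  break;
        case '\"': out += "\\\""; break;
        case '\?': out += "\\?";  break;
        default:
            if (std::isprint(u))
            {
                out += static_cast<char>(u);
            }
            else
            {
                // Backslash plus exactly three octal digits, so a digit that
                // follows in the input cannot be absorbed into the escape.
                char oct[5];
                std::snprintf(oct, sizeof oct, "\\%03o", static_cast<unsigned>(u));
                out += oct;
            }
            break;
        }
    }
    return out;
}
```

Because the length travels with the view, a call such as `escape_counted(std::string_view("a\0b\n", 4))` sees all four bytes and returns the eight characters `a\000b\n`.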

#2


0  

Rather than allocating a new buffer to hold the escaped string, I prefer to escape the string as I write it to a stream.

The following wrapper struct, with its stream insertion operator, makes for readable and concise code.

#include <iostream>

struct Escaped
{
    const char* str;

    friend inline std::ostream& operator<<(std::ostream& os, const Escaped& e)
    {
        for (const char* char_p = e.str; *char_p != '\0'; char_p++)
        {
            switch (*char_p)
            {
                case '\a':  os << "\\a"; break;
                case '\b':  os << "\\b"; break;
                case '\f':  os << "\\f"; break;
                case '\n':  os << "\\n"; break;
                case '\r':  os << "\\r"; break;
                case '\t':  os << "\\t"; break;
                case '\v':  os << "\\v"; break;
                case '\\':  os << "\\\\"; break;
                case '\'':  os << "\\'"; break;
                case '\"':  os << "\\\""; break;
                case '\?':  os << "\\\?"; break;
                // Everything else, including other control characters, is written unchanged.
                default:    os << *char_p; break;
            }
        }
        return os;
    }
};

int main()
{
    std::cout << Escaped{ "foo\n\tbar" } << std::endl;
}

This prints:

foo\n\tbar
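
Neither answer covers the unescaping half of the question. A minimal sketch of the reverse direction, assuming C++17 and handling only the simple escapes and octal escapes of up to three digits, might look like this. The name `unescape_c` is hypothetical, and hex (`\x`) and universal (`\u`) escapes are deliberately left out.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from the original answers): reverses the simple
// escapes and octal escapes. Hex and universal character escapes are not
// handled; an unknown escape keeps the character after the backslash.
std::string unescape_c(std::string_view in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] != '\\' || i + 1 == in.size())
        {
            out += in[i];   // ordinary character, or a lone trailing backslash
            continue;
        }
        const char c = in[++i];   // the character after the backslash
        switch (c)
        {
        case 'a': out += '\a'; break;
        case 'b': out += '\b'; break;
        case 'f': out += '\f'; break;
        case 'n': out += '\n'; break;
        case 'r': out += '\r'; break;
        case 't': out += '\t'; break;
        case 'v': out += '\v'; break;
        default:
            if (c >= '0' && c <= '7')
            {
                // Consume up to three octal digits, as a C literal would.
                unsigned value = static_cast<unsigned>(c - '0');
                for (int n = 1; n < 3 && i + 1 < in.size()
                                && in[i + 1] >= '0' && in[i + 1] <= '7'; ++n)
                {
                    value = value * 8 + static_cast<unsigned>(in[++i] - '0');
                }
                out += static_cast<char>(value & 0xFF);
            }
            else
            {
                out += c;   // covers \\ \' \" \? and unknown escapes
            }
            break;
        }
    }
    return out;
}
```

The result is returned as a `std::string` rather than a null-terminated buffer, so an input containing `\000` round-trips to a real embedded null byte without truncating the string.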
