Using the .data() member of a moved std::string doesn't work for small strings?

Time: 2022-04-27 13:17:42

Why does the following program print garbage instead of hello? Interestingly, if I replace hello with hello how are you, then it prints hello how are you.

#include <string>
#include <iostream>

class Buffer
{
public:
    Buffer(std::string s):
        _raw(const_cast<char*>(s.data())),
        _buffer(std::move(s))
    {    
    }

    void Print()
    {
        std::cout << _raw;
    }

private:
    char* _raw;
    std::string _buffer;
};    

int main()
{   
    Buffer b("hello");
    b.Print();
}

3 Solutions

#1


11  

From your question, you imply a class invariant of Buffer. A class invariant is a relationship between the data members of a class that is assumed to always be true. In your case the implied invariant is:

assert(_raw == _buffer.data());

Joachim Pileborg correctly describes why this invariant is not maintained in your Buffer(std::string s) constructor (upvoted).

It turns out that maintaining this invariant is surprisingly tricky. Therefore my very first recommendation is that you redesign Buffer such that this invariant is no longer needed. The simplest way to do that is to compute _raw on the fly whenever you need it, instead of storing it. For example:

void Print()
{
    std::cout << _buffer.data();
}
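
To make that recommendation concrete, here is a minimal sketch of the whole class with no stored pointer. The `raw()` accessor and the `explicit` constructor are my own additions for illustration, not part of your original code.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <utility>

// Sketch: no cached pointer, so there is no invariant to maintain.
class Buffer
{
public:
    explicit Buffer(std::string s)
        : _buffer(std::move(s))
    {
    }

    // Hypothetical accessor: computed on demand, so it always refers to
    // the storage that _buffer currently owns.
    const char* raw() const { return _buffer.data(); }

    void Print() const { std::cout << raw(); }

private:
    std::string _buffer;
};

// Copies and moves generated by the compiler are correct here,
// because there is nothing besides the string itself to keep in sync.
inline bool copy_is_safe()
{
    Buffer a("hello");
    Buffer b = a;
    assert(std::string(a.raw()) == "hello");
    return std::string(b.raw()) == "hello";
}
```

With this design the compiler-generated copy and move members do the right thing without any help.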

That being said, if you really need to store _raw and maintain this invariant:

assert(_raw == _buffer.data());

The following is the path you will need to go down...

Buffer(std::string s)
    : _buffer(std::move(s))
    , _raw(const_cast<char*>(_buffer.data()))
{
}

Reorder your initialization such that you first construct _buffer by moving into it, and then point into _buffer. Do not point into the local s which will be destructed as soon as this constructor completes.

A very subtle point here is that although I have reordered the initialization list in the constructor, I have not yet reordered the actual construction. To do that, I must reorder the data member declarations:

private:
    std::string _buffer;
    char* _raw;

It is this declaration order, not the order of the initialization list in the constructor, that determines which member is constructed first. Some compilers, with the appropriate warnings enabled, will warn you if the order of your constructor initialization list differs from the order in which the members will actually be constructed.
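
If you want to see this rule in action, the following sketch records the order in which two members are initialized. Everything in it (`Order`, `logged`, `init_log`, `observed_order`) is a hypothetical name made up for the demonstration. The initializer list deliberately names `second` first, so expect a warning from compilers that check for this (GCC and Clang call it `-Wreorder`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the names of members in the order they are initialized.
inline std::vector<std::string>& init_log()
{
    static std::vector<std::string> log;
    return log;
}

// Hypothetical helper: logs a name, then returns the value unchanged.
inline int logged(const char* name, int value)
{
    init_log().push_back(name);
    return value;
}

struct Order
{
    // The initializer list names second before first, but first is
    // declared before second, so first is still initialized first.
    Order() : second(logged("second", 2)), first(logged("first", 1)) {}

    int first;
    int second;
};

// Constructs one Order and returns the names in initialization order.
inline std::vector<std::string> observed_order()
{
    init_log().clear();
    Order o;
    assert(o.first == 1 && o.second == 2);  // values are unaffected by the order
    return init_log();
}
```

Calling `observed_order()` yields `first` followed by `second`, matching the declarations rather than the initializer list.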

Now your program will run as expected for any string input. However, we are just getting started: Buffer is still buggy because your invariant is still not maintained. The best way to demonstrate this is to assert your invariant in ~Buffer():

~Buffer()
{
    // assert is declared in <cassert>
    assert(_raw == _buffer.data());
}

As it stands (and without the user-declared ~Buffer() I just recommended), the compiler helpfully supplies you with four more signatures:

Buffer(const Buffer&) = default;
Buffer& operator=(const Buffer&) = default;
Buffer(Buffer&&) = default;
Buffer& operator=(Buffer&&) = default;

And the compiler breaks your invariant for every one of these signatures. If you add ~Buffer() as I suggested, the compiler will not supply the move members, but it will still supply the copy members, and still get them wrong (though that behavior has been deprecated). And even if the destructor did inhibit the copy members (as it might in a future standard), the code is still dangerous as under maintenance someone might "optimize" your code like so:

#ifndef NDEBUG
    ~Buffer()
    {
        assert(_raw == _buffer.data());
    }
#endif

in which case the compiler would supply the buggy copy and move members in release mode.
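
Here is a small sketch that shows the implicit copy constructor breaking the invariant. It assumes the reordered members from above and stores a `const char*` to avoid the `const_cast`. The `invariant_holds()` and `copy_breaks_invariant()` names are hypothetical, added only so the check can be made from outside the class.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Buffer
{
public:
    explicit Buffer(std::string s)
        : _buffer(std::move(s))
        , _raw(_buffer.data())
    {
    }

    // No user-declared copy or move members: the compiler supplies them
    // and copies _raw member-wise, so a copy's _raw keeps pointing into
    // the storage of the object it was copied from.

    // Hypothetical helper exposing the invariant for testing.
    bool invariant_holds() const { return _raw == _buffer.data(); }

private:
    std::string _buffer;   // declared first, so constructed first
    const char* _raw;
};

// Returns true when a compiler-generated copy ends up with a stale _raw.
inline bool copy_breaks_invariant()
{
    Buffer a("hello");
    assert(a.invariant_holds());   // the constructor established it
    Buffer b = a;                  // implicit copy constructor
    return !b.invariant_holds();
}
```

The copy `b` owns its own characters, yet its `_raw` still points at the characters owned by `a`, so `b` is left dangling as soon as `a` goes away.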

To fix the code you must re-establish your class invariant every time _buffer is constructed, and whenever outstanding pointers into it might be invalidated. For example:

Buffer(const Buffer& b)
    : _buffer(b._buffer)
    , _raw(const_cast<char*>(_buffer.data()))
{
}

Buffer& operator=(const Buffer& b)
{
    if (this != &b)
    {
        _buffer = b._buffer;
        _raw = const_cast<char*>(_buffer.data());
    }
    return *this;
}

If you add any members in the future which have the potential for invalidating _buffer.data(), you must remember to reset _raw. For example a set_string(std::string) member function would need this treatment.
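
Putting the pieces together, a sketch of a class that maintains the invariant everywhere might look like the following. The move members, `set_string`, `raw()` and `invariant_holds()` are illustrative additions rather than code from your question, and I store a `const char*` instead of using `const_cast`, on the assumption that callers only need to read through the pointer.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Buffer
{
public:
    explicit Buffer(std::string s)
        : _buffer(std::move(s)), _raw(_buffer.data()) {}

    Buffer(const Buffer& b)
        : _buffer(b._buffer), _raw(_buffer.data()) {}

    Buffer(Buffer&& b) noexcept
        : _buffer(std::move(b._buffer)), _raw(_buffer.data())
    {
        b._raw = b._buffer.data();   // the moved-from object must stay consistent too
    }

    Buffer& operator=(const Buffer& b)
    {
        _buffer = b._buffer;         // self-assignment is harmless here
        _raw = _buffer.data();
        return *this;
    }

    Buffer& operator=(Buffer&& b) noexcept
    {
        _buffer = std::move(b._buffer);
        _raw = _buffer.data();
        b._raw = b._buffer.data();
        return *this;
    }

    // Any member that can reallocate or replace _buffer must reset _raw.
    void set_string(std::string s)
    {
        _buffer = std::move(s);
        _raw = _buffer.data();
    }

    ~Buffer() { assert(_raw == _buffer.data()); }

    const char* raw() const { return _raw; }
    bool invariant_holds() const { return _raw == _buffer.data(); }

private:
    std::string _buffer;   // declared first, so constructed first
    const char* _raw;
};
```

That is a lot of hand-written code to protect one cached pointer, which is why computing the pointer on demand is usually the better trade.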

Though you did not directly ask, your question alludes to a very important point in class design: Be aware of your class invariants, and what it takes to maintain them. Corollary: Minimize the number of invariants you have to manually maintain. And test that your invariants actually are maintained.

#2


7  

The constructor takes its argument by value, and when the constructor returns that argument goes out of scope and the object s is destructed.

But you saved a pointer to that object's data, and once the object is destructed that pointer is no longer valid, leaving you with a dangling pointer and undefined behavior when you dereference it.
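
As for why the longer string appeared to work: many standard library implementations keep short strings inside the `std::string` object itself (the small-string optimization), while long strings live on the heap and a move just hands the heap block over to the new string. That is an implementation detail, not something the standard promises, and reading through the stale pointer is undefined behavior either way. The sketch below lets you probe what your own library does. `stores_inline` and `move_keeps_address` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// True when s.data() points into the bytes of the std::string object
// itself, which is what a small-string optimization would do.
inline bool stores_inline(const std::string& s)
{
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    std::less<const char*> before;   // total order even for unrelated pointers
    return !before(s.data(), begin) && before(s.data(), end);
}

// True when move-constructing from s leaves the characters at the
// address they had before the move.
inline bool move_keeps_address(std::string s)
{
    const std::size_t n = s.size();
    const char* before_move = s.data();
    std::string moved(std::move(s));
    assert(moved.size() == n);       // the contents were transferred
    return moved.data() == before_move;
}
```

On an implementation with the small-string optimization, you would expect `stores_inline` to report true for "hello" and `move_keeps_address` to report false for it, with both results flipping for a string of a few hundred characters.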

#3


1  

Buffer b("hello");

This creates a temporary string to pass to the constructor. When that string goes out of scope at the end of your constructor, you are left with a dangling _raw.

That means undefined behavior: by the time you call Print, _raw is pointing to memory that is no longer valid.
