std::string should crash but doesn't

Time: 2021-09-17 12:27:47

I have a class:

class A {
  public:
  string B;
};

and then some code:

A a1;
a1.B = "abc";

printf("%p.\n", a1.B.c_str());

A a2(a1);

printf("%p.\n", a2.B.c_str());

The c_str() results of both instances refer to the same place (this I understand: the copy constructor copied A bit by bit, the string internally stores its data in a char*, and the pointer got copied).

But the question is, why doesn't this code crash? a1 and a2 are stack variables; when they are destroyed, their B strings are destroyed too. Won't the internal char* of those strings (which point to the same memory location) get deleted twice? Isn't that a double delete, which should cause a crash? By the way, I disabled GCC optimizations, and valgrind doesn't show anything either.

4 answers

#1


13  

No, the pointer did not get copied. The copy constructor of std::string creates a new buffer and copies the data from the buffer of the other string.
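
To illustrate the point about the copy being a separate value, here is a minimal sketch built on the class A from the question. The function name original_after_modifying_copy is made up for this example. It only relies on the documented value semantics of std::string, so it should behave the same whether or not the library shares buffers internally.

```cpp
#include <cassert>
#include <string>

class A {
public:
    std::string B;
};

// Copies an A, modifies only the copy, and returns what the original holds afterwards.
std::string original_after_modifying_copy() {
    A a1;
    a1.B = "abc";

    A a2(a1);        // implicitly generated copy constructor; it calls std::string's copy constructor
    a2.B[0] = 'X';   // write through the copy only

    assert(a2.B == "Xbc");
    return a1.B;     // unchanged: the two strings behave as independent values
}
```

Calling original_after_modifying_copy() returns "abc", which is what you would expect if each string owns (or at least behaves as if it owns) its own data.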

Edit: the C++ standard used to allow copy-on-write semantics, which would share the pointer (and would require reference counting to go along with it), but this was disallowed starting with C++11. Apparently there were versions of GCC which did this.

#2


3  

For GCC 4.*

There is an internal counter in the string class that tracks the number of instances pointing to the buffer. When the counter drops to 0, the last instance is responsible for freeing the memory. This is the same behaviour as a shared pointer (Boost or C++11).

Moreover, when a string is modified while its buffer is shared, a new buffer is allocated first, so the modification does not affect the other instances sharing the original buffer.
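
The two paragraphs above can be sketched with a toy class. This is not the libstdc++ code: CowString, owners and set_char are hypothetical names, and std::shared_ptr stands in for the hand-written counter that the real implementation used. The use_count() check is also only safe in single-threaded code, so treat this as an illustration of the idea rather than a usable string class.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy class: copies share one buffer, and the first write detaches.
class CowString {
public:
    explicit CowString(std::string text)
        : buf_(std::make_shared<std::string>(std::move(text))) {}

    // The implicit copy constructor copies the shared_ptr, so the
    // count goes up and both objects point at the same buffer.
    const char* c_str() const { return buf_->c_str(); }
    long owners() const { return buf_.use_count(); }

    void set_char(std::size_t i, char c) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1) {
            // Someone else still uses this buffer: take a private copy first.
            buf_ = std::make_shared<std::string>(*buf_);
        }
        (*buf_)[i] = c;
    }

private:
    std::shared_ptr<std::string> buf_;  // freed automatically when the last owner goes away
};
```

With CowString a("abc"); CowString b(a); the two c_str() pointers are equal and owners() is 2, which matches what the question observed. After b.set_char(0, 'X') each object has its own buffer, and destroying both frees each buffer exactly once.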

#3


2  

should crash but doesn't

This statement should be taken with a grain of salt. C++ has no concept of "must crash". It has a concept of undefined behaviour, which may or may not result in crashes. Even so, your code has no undefined behaviour.

The c_str() results of both instances refer to the same place (this I understand: the copy constructor copied A bit by bit, the string internally stores its data in a char*, and the pointer got copied).

You are talking about the implementation of std::string. You must instead look at its interface in order to decide which operations are safe and which aren't.

Other than that, the implementation you are talking about, called copy-on-write or "COW", has not been allowed since C++11. GCC versions from 5 onwards no longer use it by default.

See GCC 5 Changes, New Features, and Fixes:

A new implementation of std::string is enabled by default, using the small string optimization instead of copy-on-write reference counting.

Small-string optimisation is the same technique that is used, for example, in the Visual C++ implementation of std::string. It works in a completely different way, so your understanding of how std::string works on the inside is no longer correct if you use a sufficiently new GCC version, and it has never been correct if you use Visual C++.
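
If you want to see the small-string optimisation on your own setup, a rough probe like the following may help. The helper name stored_inline is made up for this example, and the check relies on comparing addresses converted to std::uintptr_t, which works on mainstream platforms but is not something the standard promises. Whether a short string such as "abc" is stored inline, and how long a string may be before it moves to the heap, depends on the library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns true if the character data of s lives inside the std::string object itself.
bool stored_inline(const std::string& s) {
    assert(s.data() != nullptr);  // data() never returns a null pointer

    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto obj_end   = obj_begin + sizeof(std::string);
    const auto data      = reinterpret_cast<std::uintptr_t>(s.data());

    return data >= obj_begin && data < obj_end;
}
```

A string far longer than sizeof(std::string), for example std::string(1000, 'x'), cannot fit inside the object, so stored_inline returns false for it. For "abc", an implementation with small-string optimisation would be expected to return true, in which case there is no separate heap buffer that two strings could share in the first place.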

but the question is, why doesn't this code crash?

Because it uses std::string operations correctly according to the documentation of its interface and because your compiler is not completely broken.

You are basically asking why your compiler produces a working binary for correct code.

a1 and a2 are stack variables,

Yes (the correct term would be that the objects have "automatic storage duration").

when they are destroyed, their B strings are destroyed too. Won't the internal char* of those strings (which point to the same memory location) get deleted twice?

Your compiler's std::string implementation makes sure that this does not happen. Either it doesn't use COW at all, or the destructor contains code that checks whether other strings still share the buffer and frees it only when the last of them is destroyed.

If you are using an older GCC version, then you can just look at the source code of your std::string implementation to find out how exactly it's done. It's open source, after all -- but beware, for it might look a bit scary. For example, here's the destructor code for an older GCC version:

~basic_string()
{ _M_rep()->_M_dispose(this->get_allocator()); }

Then look at _M_dispose (in the same file) and you'll see that it's a very complicated implementation with various checks and synchronisations.
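
Stripped of those details, the core idea of a reference-counted release can be sketched roughly as follows. This is not the libstdc++ source: SharedRep, acquire and release are hypothetical names, and the real _M_dispose deals with allocators, an empty-string special case and other details that are left out here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a shared string representation.
struct SharedRep {
    std::atomic<int> refs{1};  // number of strings currently using this buffer
    char* data = nullptr;      // character buffer allocated with new[]
};

// Called when another string starts sharing the buffer (for example, on copy).
void acquire(SharedRep* rep) {
    assert(rep != nullptr);
    rep->refs.fetch_add(1, std::memory_order_relaxed);
}

// Called from each string's destructor. Returns true if this call freed the buffer.
bool release(SharedRep* rep) {
    assert(rep != nullptr);
    // fetch_sub returns the previous value; 1 means this was the last owner.
    if (rep->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete[] rep->data;
        delete rep;
        return true;
    }
    return false;
}
```

With two strings sharing one SharedRep, the first destructor only decrements the counter and the second one frees the memory, so the buffer is deleted once rather than twice.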

Also consider this:

If the sheer act of copying a std::string resulted in crashes, then the whole class would be completely pointless, wouldn't it?

#4


-2  

It doesn't crash because copying a string actually duplicates its contents, so both strings point to different memory locations holding the same data.
