Is it a bug that `thread_local` variables' destructors are not called when used with std::async in Visual Studio?

The following code

#include <chrono>
#include <iostream>
#include <future>
#include <thread>
#include <mutex>

std::mutex m;

struct Foo {
    Foo() {
        std::unique_lock<std::mutex> lock{m};
        std::cout <<"Foo Created in thread " <<std::this_thread::get_id() <<"\n";
    }

    ~Foo() {
        std::unique_lock<std::mutex> lock{m};
        std::cout <<"Foo Deleted in thread " <<std::this_thread::get_id() <<"\n";
    }

    void proveMyExistance() {
        std::unique_lock<std::mutex> lock{m};
        std::cout <<"Foo this = " << this <<"\n";
    }
};

int threadFunc() {
    static thread_local Foo some_thread_var;

    // Prove the variable initialized
    some_thread_var.proveMyExistance();

    // The thread runs for some time
    std::this_thread::sleep_for(std::chrono::milliseconds{100}); 

    return 1;
}

int main() {
    auto a1 = std::async(std::launch::async, threadFunc);
    auto a2 = std::async(std::launch::async, threadFunc);
    auto a3 = std::async(std::launch::async, threadFunc);

    a1.wait();
    a2.wait();
    a3.wait();

    std::this_thread::sleep_for(std::chrono::milliseconds{1000});        

    return 0;
}

Compiled and run with clang on macOS:

clang++ test.cpp -std=c++14 -pthread
./a.out

This is the result:

Foo Created in thread 0x70000d9f2000
Foo Created in thread 0x70000daf8000
Foo Created in thread 0x70000da75000
Foo this = 0x7fd871d00000
Foo this = 0x7fd871c02af0
Foo this = 0x7fd871e00000
Foo Deleted in thread 0x70000daf8000
Foo Deleted in thread 0x70000da75000
Foo Deleted in thread 0x70000d9f2000

Compiled and run in Visual Studio 2015 Update 3:

Foo Created in thread 7180
Foo this = 00000223B3344120
Foo Created in thread 8712
Foo this = 00000223B3346750
Foo Created in thread 11220
Foo this = 00000223B3347E60

The destructors are not called.

Is this a bug or some undefined grey zone?

P.S.

If the sleep at the end, std::this_thread::sleep_for(std::chrono::milliseconds{1000}), is not long enough, you may sometimes not see all 3 "Foo Deleted" messages.

When using std::thread instead of std::async, the destructors get called on both platforms, and all 3 "Foo Deleted" messages are always printed.

2 Answers

#1

Introductory Note: I have now learned a lot more about this and have therefore re-written my answer. Thanks to @super, @M.M and (latterly) @DavidHaim and @NoSenseEtAl for putting me on the right track.

tl;dr Microsoft's implementation of std::async is non-conformant, but they have their reasons and what they have done can actually be useful, once you understand it properly.

For those who don't want that, it is not too difficult to code up a drop-in replacement for std::async which works the same way on all platforms. I have posted one here.

Edit: Wow, how open MS are being these days, I like it, see: https://github.com/MicrosoftDocs/cpp-docs/issues/308

Let's begin at the beginning. cppreference has this to say (emphasis and strikethrough mine):

The template function async runs the function f asynchronously ( ~~potentially~~ optionally in a separate thread which may be part of a thread pool).

However, the C++ standard says this:

If launch::async is set in policy, [std::async] calls [the function f] as if in a new thread of execution ...

So which is correct? The two statements have very different semantics as the OP has discovered. Well of course the standard is correct, as both clang and gcc show, so why does the Windows implementation differ? And like so many things, it comes down to history.

The (oldish) link that M.M dredged up has this to say, amongst other things:

... Microsoft has its implementation of [std::async] in the form of PPL (Parallel Pattern Library) ... [and] I can understand the eagerness of those companies to bend the rules and make these libraries accessible through std::async, especially if they can dramatically improve performance...

... Microsoft wanted to change the semantics of std::async when called with launch_policy::async. I think this was pretty much ruled out in the ensuing discussion ... (rationale follows, if you want to know more then read the link, it's well worth it).

And PPL is based on Windows' built-in support for ThreadPools, so @super was right.

So what does the Windows thread pool do and what is it good for? Well, it's intended to manage frequently-scheduled, short-running tasks in an efficient way, so point 1 is: don't abuse it. But my simple tests show that if this is your use case, it can offer significant efficiencies. It does, essentially, two things:

It recycles threads, rather than having to always start a new one for each asynchronous task you launch.

It limits the total number of background threads it uses, after which a call to std::async will block until a thread becomes free. On my machine, this number is 768.

So knowing all that, we can now explain the OP's observations:

A new thread is created for each of the three tasks started by main() (because none of them terminates immediately).

Each of these three threads creates a new thread-local variable Foo some_thread_var.

These three tasks all run to completion but the threads they are running on remain in existence (sleeping).

The program then sleeps for a short while and then exits, leaving the 3 thread-local variables un-destructed.

I ran a number of tests and in addition to this I found a few key things:

When a thread is recycled, the thread-local variables are re-used. Specifically, they are not destroyed and then re-created (you have been warned!).

If all the asynchronous tasks complete and you wait long enough, the thread pool terminates all the associated threads and the thread-local variables are then destroyed. (No doubt the actual rules are more complex than that, but that's what I observed.)

As new asynchronous tasks are submitted, the thread pool limits the rate at which new threads are created, in the hope that one will become free before it needs to perform all that work (creating new threads is expensive). A call to std::async might therefore take a while to return (up to 300ms in my tests). In the meantime, it's just hanging around, hoping that its ship will come in. This behaviour is documented but I call it out here in case it takes you by surprise.

Conclusions:

Microsoft's implementation of std::async is non-conformant, but it is clearly designed with a specific purpose, and that purpose is to make good use of the Win32 ThreadPool API. You can beat them up for blatantly flouting the standard, but it's been this way for a long time and they probably have (important!) customers who rely on it. I will ask them to call this out in their documentation. Not doing that is criminal.

It is not safe to use thread_local variables in std::async tasks on Windows. Just don't do it, it will end in tears.

#2

Looks like just another of the many bugs in VC++. Consider this quote from n4750:

All variables declared with the thread_local keyword have thread storage duration. The storage for these entities shall last for the duration of the thread in which they are created. There is a distinct object or reference per thread, and use of the declared name refers to the entity associated with the current thread. A variable with thread storage duration shall be initialized before its first odr-use (6.2) and, if constructed, shall be destroyed on thread exit.

plus this:

If the implementation chooses the launch::async policy, — (5.3) a call to a waiting function on an asynchronous return object that shares the shared state created by this async call shall block until the associated thread has completed, as if joined, or else time out (33.3.2.5);

I could be wrong ("thread exit" vs. "thread completed"), but I think this means that thread_local variables need to be destroyed before the .wait() call unblocks.
