Long-running stored procedure without keeping the connection to an Azure database open

Date: 2021-11-26 12:50:38

We have a very long-running stored procedure doing ETL work to load data from a raw table into a star schema (fact and dimension tables) in an Azure database.

This stored procedure takes around 10 to 20 hours to run over 10 million rows (using a MERGE statement).

At the moment, we run the stored procedure from C# code (ADO.NET) with CommandTimeout = 0 (wait forever). But sometimes the connection is dropped, because the connection to the Azure database is unstable.

Is it possible to run the stored procedure at the database level without keeping the connection open the whole time, and have the stored procedure log its progress to a Progress table so we can track it?
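The progress-table idea could be sketched in T-SQL roughly like this (table, column, and step names are illustrative, not from any existing schema):

    CREATE TABLE dbo.EtlProgress (
        RunId    INT           NOT NULL,
        StepName NVARCHAR(100) NOT NULL,
        RowsDone BIGINT        NULL,
        LoggedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Inside the stored procedure, immediately after each step or batch:
    INSERT INTO dbo.EtlProgress (RunId, StepName, RowsDone)
    VALUES (@RunId, N'MERGE fact batch', @@ROWCOUNT);

Because the procedure writes its own progress rows, any client can poll dbo.EtlProgress over a separate short-lived connection instead of holding one open for the whole run.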

I have seen some recommendations:

  1. SQL Server Agent job: this seems not possible, as Azure SQL Database does not support Agent jobs at the moment.

  2. SqlCommand.BeginExecuteNonQuery: I am not 100% sure whether BeginExecuteNonQuery still keeps the connection open under the hood.

Is there any other way to do this?

2 Solutions

#1



You can use an Azure Automation runbook.

https://azure.microsoft.com/en-us/blog/azure-automation-your-sql-agent-in-the-cloud/?cdn=disable


https://gallery.technet.microsoft.com/scriptcenter/How-to-use-a-SQL-Command-be77f9d2


https://azure.microsoft.com/en-us/blog/azure-automation-runbook-management/?cdn=disable


http://www.sqlservercentral.com/articles/Azure+SQL+database/117804/

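Following the pattern in the links above, a minimal PowerShell runbook that runs the procedure server-side might look like this (the credential asset name and procedure name are placeholders, not taken from those articles):

    # Minimal sketch: run a long stored procedure from an Azure Automation runbook.
    $cred = Get-AutomationPSCredential -Name "SqlCredential"   # hypothetical asset name
    $connectionString = "Server=tcp:<servername>.database.windows.net,1433;" +
                        "Database=<databasename>;User ID=$($cred.UserName);" +
                        "Password=$($cred.GetNetworkCredential().Password);Encrypt=True"
    $conn = New-Object System.Data.SqlClient.SqlConnection($connectionString)
    $conn.Open()
    $cmd = $conn.CreateCommand()
    $cmd.CommandText = "EXEC dbo.LoadStarSchema"   # hypothetical procedure name
    $cmd.CommandTimeout = 0                        # no client-side timeout
    $cmd.ExecuteNonQuery() | Out-Null
    $conn.Close()

Note that the runbook itself still holds a connection while the procedure runs; the difference is that it runs inside Azure on a schedule, rather than from your own machine.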

Hope this helps.


Regards,


Alberto Morillo


#2



Azure Data Factory has a Stored Procedure activity which could do this. It has an optional timeout property in the policy section; if you leave it out, it defaults to infinite:

"policy": {
           "concurrency": 1,
           "retry": 3
           },

If you specify the timeout as 0 when creating the activity, you'll see it disappear when you provision the task in the portal. You could also try specifying the timeout as 1 day (24 hours), e.g. "timeout": "1.00:00:00", although I haven't tested that it times out correctly.
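For reference, the policy block above with that explicit one-day timeout added would look like this (a sketch extending the earlier snippet; I haven't verified the exact behaviour):

"policy": {
    "concurrency": 1,
    "retry": 3,
    "timeout": "1.00:00:00"
},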

You could also set the timeout to 0 in the connection string, although again I haven't tested this option, e.g.

{
  "name": "AzureSqlLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=0"
    }
  }
}

I would regard this as more straightforward than Azure Automation, but that's a personal choice. Maybe try both options and see which works best for you.

I agree with some of the other comments about the MERGE taking too long for that volume of records. I suspect either your table does not have appropriate indexing to support the MERGE, or you're running on too low a service tier. What service tier are you running on, e.g. Basic, Standard, or Premium (P1-P15)? Consider raising a separate question with the DDL of your table including indexes, some sample data, the MERGE statement, and the service tier; I'm sure that can go faster.

As a test / quick fix, you could always refactor the MERGE into the appropriate INSERT / UPDATE / DELETE statements - I bet it goes faster. Let us know.
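A sketch of that refactor, assuming a simple key/measure layout (table and column names are illustrative):

    -- 1. Update rows that already exist in the fact table
    UPDATE f
    SET    f.Amount = s.Amount
    FROM   dbo.FactSales AS f
    JOIN   dbo.RawSales  AS s ON s.SaleKey = f.SaleKey;

    -- 2. Insert rows that are new
    INSERT INTO dbo.FactSales (SaleKey, Amount)
    SELECT s.SaleKey, s.Amount
    FROM   dbo.RawSales AS s
    WHERE  NOT EXISTS (
        SELECT 1 FROM dbo.FactSales AS f WHERE f.SaleKey = s.SaleKey
    );

Separate statements also make it easy to batch each step and log row counts between them, which helps with the restartability point below.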

The connection between Azure Data Factory and the Azure database should be stable. If it isn't, you can raise support tickets. However, for cloud architecture (and really any architecture) you need to make good design decisions that allow for the possibility of things going wrong. That means, architecturally, you have to design for the possibility of the connection dropping or the job failing. For example, make sure your job is restartable from the point of failure, make sure the error reporting is good, etc.

Also, from experience, given your data volumes (which I regard as low), this job is taking far too long. There must be an issue with it or with the design. It is my strongest recommendation that you attempt to resolve this issue.
