Confusion between prepared statements and parameterized queries in Python

Time: 2022-01-19 00:10:28

As far as I understand, prepared statements are (mainly) a database feature that lets you separate parameters from the code that uses them. Example:

PREPARE fooplan (int, text, bool, numeric) AS
    INSERT INTO foo VALUES($1, $2, $3, $4);
EXECUTE fooplan(1, 'Hunter Valley', 't', 200.00);

A parameterized query replaces manual string interpolation, so instead of doing

cursor.execute("SELECT * FROM tablename WHERE fieldname = %s" % value)

we can do

cursor.execute("SELECT * FROM tablename WHERE fieldname = %s", [value])
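
As a minimal sketch of that contrast, the snippet below uses the standard-library sqlite3 module with an in-memory database. It is an illustration under assumptions, not the behaviour of any particular driver mentioned in the question: sqlite3 uses ? as its placeholder rather than %s, the table and column names are taken from the example above, and the stored row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (fieldname TEXT)")
conn.execute("INSERT INTO tablename VALUES (?)", ("O'Brien",))

value = "O'Brien"  # hypothetical value containing a quote character

# Manual interpolation: the quote inside the value ends the SQL string
# literal early, so the resulting statement is malformed.
try:
    conn.execute("SELECT * FROM tablename WHERE fieldname = '%s'" % value)
    interpolation_failed = False
except sqlite3.OperationalError:
    interpolation_failed = True

# Parameterized query: the value travels separately from the SQL text,
# so no quoting or escaping is needed.
rows = conn.execute(
    "SELECT * FROM tablename WHERE fieldname = ?", (value,)
).fetchall()
```

The interpolated form fails on a perfectly ordinary value, while the parameterized form finds the row without any escaping on the caller's side.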

Now, it seems that prepared statements are, for the most part, used in the database language, while parameterized queries are mainly used in the programming language connecting to the database, although I have seen exceptions to this rule.

The problem is that asking about the difference between a prepared statement and a parameterized query causes a lot of confusion. Their purpose is admittedly the same, but their methodology seems distinct. Yet there are sources indicating that the two are the same. MySQLdb and Psycopg2 seem to support parameterized queries but not prepared statements (e.g. here for MySQLdb and in the TODO list for postgres drivers or this answer in the sqlalchemy group). Actually, there is a gist implementing a psycopg2 cursor that supports prepared statements, with a minimal explanation of it. There is also a suggestion to subclass the cursor object in psycopg2 to provide prepared statements manually.

I would like to get an authoritative answer to the following questions:

  • Is there a meaningful difference between a prepared statement and a parameterized query? Does this matter in practice? If you use parameterized queries, do you need to worry about prepared statements?

  • If there is a difference, what is the current status of prepared statements in the Python ecosystem? Which database adapters support prepared statements?

3 answers

#1


9  

  • Prepared statement: A reference to a pre-interpreted query routine on the database, ready to accept parameters.

  • Parametrized query: A query made by your code in such a way that you pass values in alongside some SQL that has placeholders, usually ? or %s or something of that flavor.

The confusion here seems to stem from the (apparent) lack of distinction between the ability to directly get a prepared statement object and the ability to pass values into a 'parametrized query' method that acts very much like one... because it is one, or at least makes one for you.

For example: the C interface of the SQLite3 library has a lot of tools for working with prepared statement objects, but the Python API makes almost no mention of them. You can't prepare a statement and reuse it whenever you want. Instead, you can call executemany(sql, params) on a sqlite3 connection or cursor, which takes the SQL code, creates a prepared statement internally, then uses that statement in a loop to process each of the parameter tuples in the iterable you gave.
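
A minimal sketch of that call, assuming an in-memory database and a made-up two-column table foo (the row values are illustrative only). The module does not hand the prepared statement back to the caller, so the internal reuse described above is not directly observable from this code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, region TEXT)")

# One SQL string, many parameter tuples: the statement text is supplied
# once and executed for every tuple in the iterable.
params = [(1, "Hunter Valley"), (2, "Barossa"), (3, "Margaret River")]
conn.executemany("INSERT INTO foo VALUES (?, ?)", params)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM foo").fetchone()[0]
```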

Many other SQL libraries in Python behave the same way. Working with prepared statement objects can be a real pain and can lead to ambiguity, and in a language like Python, which leans towards clarity and ease over raw execution speed, they aren't really the greatest option. Essentially, if you find yourself having to make hundreds of thousands or millions of calls to a complex SQL query that gets re-interpreted every time, you should probably be doing things differently. Regardless, sometimes people wish they could have direct access to these objects, because if you keep the same prepared statement around, the database server won't have to keep interpreting the same SQL code over and over; most of the time this is approaching the problem from the wrong direction, and you will get much greater savings elsewhere or by restructuring your code.*

Perhaps more important in general is the way that prepared statements and parametrized queries keep your data sanitary and separate from your SQL code. This is vastly preferable to string formatting! You should think of parametrized queries and prepared statements, in one form or another, as the only way to pass variable data from your application into the database. If you build the SQL statement any other way, it will not only run significantly slower, but you will also be vulnerable to other problems, such as SQL injection.
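
To make the "other problems" concrete, here is a small sketch of SQL injection using sqlite3 and an in-memory database. The users table, its rows, and the malicious input are all hypothetical; the point is only that string formatting lets data change the meaning of the statement, while a placeholder does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

# Hypothetical attacker-controlled input.
malicious = "nobody' OR '1'='1"

# String formatting: the input becomes part of the SQL itself, turning the
# WHERE clause into  name = 'nobody' OR '1'='1'  which matches every row.
unsafe_sql = "SELECT name FROM users WHERE name = '%s'" % malicious
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Placeholder: the input is treated purely as a value to compare against,
# and no user is literally named "nobody' OR '1'='1".
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()
```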

*e.g., by producing the data that is to be fed into the DB in a generator function, then using executemany() to insert it all at once from the generator, rather than calling execute() each time you loop.
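
The footnote's restructuring could look roughly like the sketch below, again with sqlite3 and an in-memory database. The generator generate_rows and the squares table are invented for illustration; any iterable of parameter tuples would do.

```python
import sqlite3


def generate_rows(n):
    """Hypothetical data source: lazily yields one parameter tuple per row."""
    for i in range(n):
        yield (i, i * i)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE squares (n INTEGER, n_squared INTEGER)")

# A single executemany() call consumes the generator, instead of a Python
# loop that calls execute() once per row.
conn.executemany("INSERT INTO squares VALUES (?, ?)", generate_rows(1000))
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM squares").fetchone()[0]
largest = conn.execute("SELECT MAX(n_squared) FROM squares").fetchone()[0]
```

Because the rows are produced lazily, the full data set never has to be held in a list before it is inserted.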

tl;dr

A parametrized query is a single operation which generates a prepared statement internally, then passes in your parameters and executes.

#2


1  

First, your question shows very good preparation - well done.

I am not sure I am the person to provide an authoritative answer, but I will try to explain my understanding of the situation.

A prepared statement is an object created on the database server side as a result of a PREPARE statement, which turns the provided SQL statement into a sort of temporary procedure with parameters. A prepared statement has the lifetime of the current database session and is discarded when the session is over. The SQL statement DEALLOCATE allows destroying the prepared statement explicitly.

Database clients can use the SQL statement EXECUTE to execute the prepared statement by calling its name with parameters.

Parametrized statement is an alias for prepared statement, as a prepared statement usually has some parameters.

Parametrized query seems to be a less often used alias for the same thing (24 million Google hits for parametrized statement, 14 million hits for parametrized query). It is possible that some people use this term for another purpose.

Advantages of prepared statements are:

  • faster execution of the actual prepared statement call (not counting the time for PREPARE)
  • resistance to SQL injection attacks

Players in executing an SQL query

A real application will probably have the following participants:

  • application code
  • ORM package (e.g. sqlalchemy)
  • database driver
  • database server

From the application's point of view, it is not easy to know whether the code will really use a prepared statement on the database server, as any of the participants may lack support for prepared statements.

Conclusions

In application code, avoid building the SQL query string directly, as this is prone to SQL injection attacks. For this reason, it is recommended to use whatever the ORM provides for parametrized queries, even if it does not result in prepared statements being used on the database server side, as the ORM code can be optimized to prevent this sort of attack.

Decide whether a prepared statement is worthwhile for performance reasons. If you have a simple SQL query that is executed only a few times, it will not help; sometimes it will even slow down execution a bit.

The effect will be biggest for a complex query that is executed many times and has a relatively short execution time. In such a case, you may follow these steps:

  • Check that the database you are going to use supports the PREPARE statement. In most cases it will be present.
  • Check that the driver you use supports prepared statements and, if not, try to find another one that does.
  • Check support for this feature at the ORM package level. Sometimes it varies driver by driver (e.g. sqlalchemy states some limitations on prepared statements with MySQL due to how MySQL manages them).

If you are in search of a really authoritative answer, I would turn to the authors of sqlalchemy.

#3


0  

An SQL statement can't be executed immediately: the DBMS must interpret it before execution.

Prepared statements are statements that have already been interpreted; the DBMS substitutes the parameters and the query starts immediately. This is a feature of certain DBMSs, and it lets you achieve fast responses (comparable with stored procedures).

Parametrized statements are just a way of composing the query string in your programming language. Since it doesn't matter how the SQL string is formed, you get a slower response from the DBMS.

If you measure the time taken to execute the same query 3-4 times (a select with different conditions), you will see the same time on each run with parametrized queries, whereas with a prepared statement the time is smaller from the second execution onward (the first time, the DBMS has to interpret the script anyway).
