Validating a string in a SQL Server table: CLR function or T-SQL (question updated)

Time: 2020-12-08 09:34:09

I need to check whether a column value (string) in a SQL Server table starts with a lowercase letter and contains only '_', '-', digits and letters. I know I can use a SQL Server CLR function for that. However, I am trying to implement the validation as a scalar UDF and have made very little progress. I can use NOT LIKE, but I am not sure how to validate the string regardless of the order of its characters, or in other words, how to write a pattern for this in SQL. Am I better off using a SQL CLR function? Any help will be appreciated.

Thanks in advance

Thank you, everyone, for your comments. This morning I chose to go the CLR function way. For what I was trying to achieve, I created one CLR function that validates an input string and is called from a SQL UDF, and it works well.

To compare the performance of a T-SQL UDF that calls a SQL CLR function against a plain T-SQL UDF, I created a SQL CLR function that only checks whether the input string contains only lowercase letters (returning true if so, otherwise false) and called it from a UDF (IsLowerCaseCLR). I then created a regular T-SQL UDF (IsLowerCaseTSQL) that does the same thing using NOT LIKE. Finally, I created a table (Person) with the columns Name (varchar) and IsValid (bit) and populated it with names to test.
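
The post does not show the body of IsLowerCaseTSQL, so the following is only a sketch of what such a UDF might look like. The parameter name, the varchar(100) length, the binary collation and the handling of NULL or empty input are all assumptions, not the author's code.

```sql
-- Hypothetical sketch: only the function name comes from the post.
CREATE FUNCTION dbo.IsLowerCaseTSQL (@value varchar(100))  -- length is an assumption
RETURNS bit
AS
BEGIN
    -- Under a binary collation the range a-z matches only the ASCII
    -- lowercase letters, so any other character makes the input invalid.
    -- NULL or empty input is treated as invalid here (an assumption).
    RETURN CASE
               WHEN @value IS NULL OR @value = '' THEN 0
               WHEN @value COLLATE Latin1_General_BIN LIKE '%[^a-z]%' THEN 0
               ELSE 1
           END;
END;
```

The check is phrased as "contains no character outside the allowed set", which is what makes the order of characters irrelevant.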

Data: 1000 records with 'Ashish' as the value of the Name column, and 1000 records with 'ashish' as the value of the Name column.
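
A test table like the one described could be set up roughly as follows. This is a sketch: the dbo schema, the varchar(100) length and the use of sys.all_objects as a row source are assumptions made for illustration.

```sql
-- Hypothetical setup; table and column names come from the post,
-- the column length and nullability are assumptions.
CREATE TABLE dbo.Person
(
    Name    varchar(100) NOT NULL,
    IsValid bit          NULL
);

-- 1000 rows of each value. The cross join of a system catalog view is used
-- only as a convenient source of at least 1000 rows.
INSERT INTO dbo.Person (Name)
SELECT TOP (1000) 'Ashish'
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

INSERT INTO dbo.Person (Name)
SELECT TOP (1000) 'ashish'
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
```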

Then I ran the following:

UPDATE Person SET IsValid = 1 WHERE dbo.IsLowerCaseTSQL(Name) = 1

This updated 1000 records (setting IsValid = 1) and took less than a second.

I deleted all the data in the table and repopulated it with the same data. Then I ran the same update using the SQL CLR UDF, and this took 3 seconds!

When the update covers 5000 records, the regular UDF takes 0 seconds, compared with 16 seconds for the CLR UDF!
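
The post does not say how these timings were taken. One way to compare the two UDFs is SET STATISTICS TIME, sketched below. It assumes that dbo.Person, dbo.IsLowerCaseTSQL and dbo.IsLowerCaseCLR already exist and that both functions return bit.

```sql
-- Sketch of a timing comparison; elapsed and CPU times appear in the Messages output.
SET STATISTICS TIME ON;

UPDATE dbo.Person
SET IsValid = 1
WHERE dbo.IsLowerCaseTSQL(Name) = 1;

-- Reset the flag so the second run starts from the same state.
UPDATE dbo.Person SET IsValid = NULL;

UPDATE dbo.Person
SET IsValid = 1
WHERE dbo.IsLowerCaseCLR(Name) = 1;

SET STATISTICS TIME OFF;
```

The first CLR call in a session may include one-time assembly loading cost, so it is worth repeating each run a few times before comparing numbers.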

I am not very knowledgeable about T-SQL regular expressions; otherwise I could have tested my actual, more complex validation criteria. But I just wanted to know: even if I could have written that in T-SQL, would it have been faster than the SQL CLR function, considering the example above? Are we using SQL CLR only because it lets us implement much richer logic that would be difficult to write in regular SQL?

Sorry for this long post. I just want to hear from the experts. Please feel free to ask if anything here is unclear.

Thank you again for your time.

2 solutions

#1


4  

WHERE
    ASCII(LEFT(column, 1)) BETWEEN ASCII('a') AND ASCII('z')
    AND
    column COLLATE LATIN1_GENERAL_BIN NOT LIKE '%[^-_a-zA-Z0-9]%'

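You need the COLLATE clause because, under a typical default collation, ranges such as a-z also match accented characters (ä, à, ö, etc.); the binary collation excludes them.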
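
As a self-contained way to try the predicate, here is a sketch against a table variable. The variable @t, the column val and the sample values are made up for illustration. It also swaps the ASCII(LEFT(...)) test for an equivalent LIKE '[a-z]%' under the same binary collation, which is a variation and not the answer's exact code.

```sql
-- Hypothetical sample data; the comments state the intended outcome of the rule.
DECLARE @t TABLE (val varchar(50));

INSERT INTO @t (val)
VALUES ('abc_1-2'),  -- intended: valid
       ('a'),        -- intended: valid
       ('Abc'),      -- intended: invalid (starts with an uppercase letter)
       ('_abc'),     -- intended: invalid (starts with an underscore)
       ('a b');      -- intended: invalid (contains a space)

SELECT val,
       CASE
           -- First character must be an ASCII lowercase letter ...
           WHEN val COLLATE Latin1_General_BIN LIKE '[a-z]%'
           -- ... and no character may fall outside the allowed set.
            AND val COLLATE Latin1_General_BIN NOT LIKE '%[^-_a-zA-Z0-9]%'
           THEN 1
           ELSE 0
       END AS IsValid
FROM @t;
```

The hyphen is placed first inside the brackets so that it is read as a literal character and not as a range operator.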

#2


2  

CLR is faster than UDF - for this situation I would use CLR, since it lets me run regular expressions for the comparison. But PATINDEX supports a limited pattern syntax (the same wildcards as LIKE, not full regular expressions), so you could use:

WHERE PATINDEX('%[regex]%', t.column) > 0

...to return rows that match the pattern, because PATINDEX returns the starting position of the first match in the string it is testing. If the value is zero, the pattern does not occur in the string.
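
For the validation in the question, the same idea can be turned around: search for the first character that is not allowed and accept the row only when PATINDEX returns 0. The sketch below assumes a made-up table variable @t with a column val, and a binary collation so that the ranges cover only ASCII characters.

```sql
-- Hypothetical sample data for illustration.
DECLARE @t TABLE (val varchar(50));

INSERT INTO @t (val)
VALUES ('abc_1-2'), ('a b'), ('abc$');

-- Position of the first character outside the allowed set (0 when there is none).
SELECT val,
       PATINDEX('%[^-_a-zA-Z0-9]%', val COLLATE Latin1_General_BIN) AS FirstBadPosition
FROM @t;

-- Rows that contain only allowed characters.
SELECT val
FROM @t
WHERE PATINDEX('%[^-_a-zA-Z0-9]%', val COLLATE Latin1_General_BIN) = 0;
```

This covers only the allowed-character rule; the requirement that the first character be a lowercase letter needs a separate condition.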
