T-SQL删除所有非字符和非数字字符

时间:2022-02-05 07:54:14

Is there a smarter way to remove all special characters rather than having a series of about 15 nested replace statements?

是否有一种更聪明的方法来删除所有特殊字符,而不是使用一系列15个嵌套的替换语句?

The following works, but only handles three characters (ampersand, blank and period).

下面的工作,但只处理三个字符(&,空格和句号)。

select CustomerID, CustomerName, 
   Replace(Replace(Replace(CustomerName,'&',''),' ',''),'.','') as CustomerNameStripped
from Customer 

5 个解决方案

#1


14  

One flexible-ish way;

一个flexible-ish方式;

ALTER FUNCTION [dbo].[fnRemovePatternFromString](@BUFFER VARCHAR(MAX), @PATTERN VARCHAR(128)) RETURNS VARCHAR(MAX) AS
BEGIN
    DECLARE @POS INT = PATINDEX(@PATTERN, @BUFFER)
    WHILE @POS > 0 BEGIN
        SET @BUFFER = STUFF(@BUFFER, @POS, 1, '')
        SET @POS = PATINDEX(@PATTERN, @BUFFER)
    END
    RETURN @BUFFER
END

select dbo.fnRemovePatternFromString('cake & beer $3.99!?c', '%[$&.!?]%')

(No column name)
cake  beer 399c

#2


7  

Create a function:

创建一个函数:

CREATE FUNCTION dbo.StripNonAlphaNumerics
(
  @s VARCHAR(255)
)
RETURNS VARCHAR(255)
AS
BEGIN
  DECLARE @p INT = 1, @n VARCHAR(255) = '';
  WHILE @p <= LEN(@s)
  BEGIN
    IF SUBSTRING(@s, @p, 1) LIKE '[A-Za-z0-9]'
    BEGIN
      SET @n += SUBSTRING(@s, @p, 1);
    END 
    SET @p += 1;
  END
  RETURN(@n);
END
GO

Then:

然后:

SELECT Result = dbo.StripNonAlphaNumerics
('My Customer''s dog & #1 friend are dope, yo!');

Results:

结果:

Result
------
MyCustomersdog1friendaredopeyo

To make it more flexible, you could pass in the pattern you want to allow:

为了使它更灵活,您可以传入您希望允许的模式:

CREATE FUNCTION dbo.StripNonAlphaNumerics
(
  @s VARCHAR(255),
  @pattern VARCHAR(255)
)
RETURNS VARCHAR(255)
AS
BEGIN
  DECLARE @p INT = 1, @n VARCHAR(255) = '';
  WHILE @p <= LEN(@s)
  BEGIN
    IF SUBSTRING(@s, @p, 1) LIKE @pattern
    BEGIN
      SET @n += SUBSTRING(@s, @p, 1);
    END 
    SET @p += 1;
  END
  RETURN(@n);
END
GO

Then:

然后:

SELECT r = dbo.StripNonAlphaNumerics
('Bob''s dog & #1 friend are dope, yo!', '[A-Za-z0-9]');

Results:

结果:

r
------
Bobsdog1friendaredopeyo

#3


6  

I faced this problem several years ago, so I wrote a SQL function to do the trick. Here is the original article (was used to scrape text out of HTML). I have since updated the function, as follows:

几年前我遇到了这个问题,所以我编写了一个SQL函数来解决这个问题。这是原始文章(用于从HTML中提取文本)。我已经更新了功能如下:

IF (object_id('dbo.fn_CleanString') IS NOT NULL)
BEGIN
  PRINT 'Dropping: dbo.fn_CleanString'
  DROP function dbo.fn_CleanString
END
GO
PRINT 'Creating: dbo.fn_CleanString'
GO
CREATE FUNCTION dbo.fn_CleanString 
(
  @string varchar(8000)
) 
returns varchar(8000)
AS
BEGIN
---------------------------------------------------------------------------------------------------
-- Title:        CleanString
-- Date Created: March 26, 2011
-- Author:       William McEvoy
--               
-- Description:  This function removes special ascii characters from a string.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


declare @char        char(1),
        @len         int,
        @count       int,
        @newstring   varchar(8000),
        @replacement char(1)

select  @count       = 1,
        @len         = 0,
        @newstring   = '',
        @replacement = ' '



---------------------------------------------------------------------------------------------------
-- M A I N   P R O C E S S I N G
---------------------------------------------------------------------------------------------------


-- Remove Backspace characters
select @string = replace(@string,char(8),@replacement)

-- Remove Tabs
select @string = replace(@string,char(9),@replacement)

-- Remove line feed
select @string = replace(@string,char(10),@replacement)

-- Remove carriage return
select @string = replace(@string,char(13),@replacement)


-- Condense multiple spaces into a single space
-- This works by changing all double spaces to be OX where O = a space, and X = a special character
-- then all occurrences of XO are changed to O,
-- then all occurrences of X  are changed to nothing, leaving just the O which is actually a single space
select @string = replace(replace(replace(ltrim(rtrim(@string)),'  ', ' ' + char(7)),char(7)+' ',''),char(7),'')


--  Parse each character, remove non alpha-numeric

select @len = len(@string)

WHILE (@count <= @len)
BEGIN

  -- Examine the character
  select @char = substring(@string,@count,1)


  IF (@char like '[a-z]') or (@char like '[A-Z]') or (@char like '[0-9]')
    select @newstring = @newstring + @char
  ELSE
    select @newstring = @newstring + @replacement

  select @count = @count + 1

END


return @newstring
END

GO
IF (object_id('dbo.fn_CleanString') IS NOT NULL)
  PRINT 'Function created.'
ELSE
  PRINT 'Function NOT created.'
GO

#4


1  

If you can use SQL CLR you can use .NET regular expressions for this.

如果您可以使用SQL CLR,您可以使用。net正则表达式。

There is a third party (free) package that includes this and more - SQL Sharp .

有一个第三方(免费)包,包括这个和更多的SQL Sharp。

#5


1  

I know this is an old thread, but still, might be handy for others. Here's a quick and dirty (Which I've done inversely - stripping out non-numerics) - using a recursive CTE. What makes this one nice for me is that it's an inline function - so gets around the nasty RBAR effect of the usual scalar and table-valued functions. Adjust your filter as needs be to include or exclude whatever char types.

我知道这是一条老掉牙的线索,但对其他人来说还是很有用的。这里有一个快速的和肮脏的(我做过相反的-除去非数字)-使用一个递归的CTE。这个函数的优点在于它是一个内联函数,因此可以避开通常的标量函数和表值函数的RBAR效应。根据需要调整过滤器,以包含或排除任何char类型。

        Create Function fncV1_iStripAlphasFromData (
            @iString Varchar(max)
        )
        Returns 
        Table With Schemabinding
        As

            Return(

                with RawData as
                (
                    Select @iString as iString
                )
                ,
                Anchor as
                (

                    Select Case(IsNumeric (substring(iString, 1, 1))) when 1 then substring(iString, 1, 1) else '' End as oString, 2 as CharPos from RawData
                    UNION ALL
                    Select a.oString + Case(IsNumeric (substring(@iString, a.CharPos, 1))) when 1 then substring(@iString, a.CharPos, 1) else '' End, a.CharPos + 1
                    from RawData r
                    Inner Join Anchor a on a.CharPos <= len(rtrim(ltrim(@iString)))

                )

                Select top 1 oString from Anchor order by CharPos Desc

            )

Go

select * from dbo.fncV1_iStripAlphasFromData ('00000')
select * from dbo.fncV1_iStripAlphasFromData ('00A00')
select * from dbo.fncV1_iStripAlphasFromData ('12345ABC6789!&*0')

#1


14  

One flexible-ish way;

一个flexible-ish方式;

ALTER FUNCTION [dbo].[fnRemovePatternFromString](@BUFFER VARCHAR(MAX), @PATTERN VARCHAR(128)) RETURNS VARCHAR(MAX) AS
BEGIN
    DECLARE @POS INT = PATINDEX(@PATTERN, @BUFFER)
    WHILE @POS > 0 BEGIN
        SET @BUFFER = STUFF(@BUFFER, @POS, 1, '')
        SET @POS = PATINDEX(@PATTERN, @BUFFER)
    END
    RETURN @BUFFER
END

select dbo.fnRemovePatternFromString('cake & beer $3.99!?c', '%[$&.!?]%')

(No column name)
cake  beer 399c

#2


7  

Create a function:

创建一个函数:

CREATE FUNCTION dbo.StripNonAlphaNumerics
(
  @s VARCHAR(255)
)
RETURNS VARCHAR(255)
AS
BEGIN
  DECLARE @p INT = 1, @n VARCHAR(255) = '';
  WHILE @p <= LEN(@s)
  BEGIN
    IF SUBSTRING(@s, @p, 1) LIKE '[A-Za-z0-9]'
    BEGIN
      SET @n += SUBSTRING(@s, @p, 1);
    END 
    SET @p += 1;
  END
  RETURN(@n);
END
GO

Then:

然后:

SELECT Result = dbo.StripNonAlphaNumerics
('My Customer''s dog & #1 friend are dope, yo!');

Results:

结果:

Result
------
MyCustomersdog1friendaredopeyo

To make it more flexible, you could pass in the pattern you want to allow:

为了使它更灵活,您可以传入您希望允许的模式:

CREATE FUNCTION dbo.StripNonAlphaNumerics
(
  @s VARCHAR(255),
  @pattern VARCHAR(255)
)
RETURNS VARCHAR(255)
AS
BEGIN
  DECLARE @p INT = 1, @n VARCHAR(255) = '';
  WHILE @p <= LEN(@s)
  BEGIN
    IF SUBSTRING(@s, @p, 1) LIKE @pattern
    BEGIN
      SET @n += SUBSTRING(@s, @p, 1);
    END 
    SET @p += 1;
  END
  RETURN(@n);
END
GO

Then:

然后:

SELECT r = dbo.StripNonAlphaNumerics
('Bob''s dog & #1 friend are dope, yo!', '[A-Za-z0-9]');

Results:

结果:

r
------
Bobsdog1friendaredopeyo

#3


6  

I faced this problem several years ago, so I wrote a SQL function to do the trick. Here is the original article (was used to scrape text out of HTML). I have since updated the function, as follows:

几年前我遇到了这个问题,所以我编写了一个SQL函数来解决这个问题。这是原始文章(用于从HTML中提取文本)。我已经更新了功能如下:

IF (object_id('dbo.fn_CleanString') IS NOT NULL)
BEGIN
  PRINT 'Dropping: dbo.fn_CleanString'
  DROP function dbo.fn_CleanString
END
GO
PRINT 'Creating: dbo.fn_CleanString'
GO
CREATE FUNCTION dbo.fn_CleanString 
(
  @string varchar(8000)
) 
returns varchar(8000)
AS
BEGIN
---------------------------------------------------------------------------------------------------
-- Title:        CleanString
-- Date Created: March 26, 2011
-- Author:       William McEvoy
--               
-- Description:  This function removes special ascii characters from a string.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


declare @char        char(1),
        @len         int,
        @count       int,
        @newstring   varchar(8000),
        @replacement char(1)

select  @count       = 1,
        @len         = 0,
        @newstring   = '',
        @replacement = ' '



---------------------------------------------------------------------------------------------------
-- M A I N   P R O C E S S I N G
---------------------------------------------------------------------------------------------------


-- Remove Backspace characters
select @string = replace(@string,char(8),@replacement)

-- Remove Tabs
select @string = replace(@string,char(9),@replacement)

-- Remove line feed
select @string = replace(@string,char(10),@replacement)

-- Remove carriage return
select @string = replace(@string,char(13),@replacement)


-- Condense multiple spaces into a single space
-- This works by changing all double spaces to be OX where O = a space, and X = a special character
-- then all occurrences of XO are changed to O,
-- then all occurrences of X  are changed to nothing, leaving just the O which is actually a single space
select @string = replace(replace(replace(ltrim(rtrim(@string)),'  ', ' ' + char(7)),char(7)+' ',''),char(7),'')


--  Parse each character, remove non alpha-numeric

select @len = len(@string)

WHILE (@count <= @len)
BEGIN

  -- Examine the character
  select @char = substring(@string,@count,1)


  IF (@char like '[a-z]') or (@char like '[A-Z]') or (@char like '[0-9]')
    select @newstring = @newstring + @char
  ELSE
    select @newstring = @newstring + @replacement

  select @count = @count + 1

END


return @newstring
END

GO
IF (object_id('dbo.fn_CleanString') IS NOT NULL)
  PRINT 'Function created.'
ELSE
  PRINT 'Function NOT created.'
GO

#4


1  

If you can use SQL CLR you can use .NET regular expressions for this.

如果您可以使用SQL CLR,您可以使用。net正则表达式。

There is a third party (free) package that includes this and more - SQL Sharp .

有一个第三方(免费)包,包括这个和更多的SQL Sharp。

#5


1  

I know this is an old thread, but still, might be handy for others. Here's a quick and dirty (Which I've done inversely - stripping out non-numerics) - using a recursive CTE. What makes this one nice for me is that it's an inline function - so gets around the nasty RBAR effect of the usual scalar and table-valued functions. Adjust your filter as needs be to include or exclude whatever char types.

我知道这是一条老掉牙的线索,但对其他人来说还是很有用的。这里有一个快速的和肮脏的(我做过相反的-除去非数字)-使用一个递归的CTE。这个函数的优点在于它是一个内联函数,因此可以避开通常的标量函数和表值函数的RBAR效应。根据需要调整过滤器,以包含或排除任何char类型。

        Create Function fncV1_iStripAlphasFromData (
            @iString Varchar(max)
        )
        Returns 
        Table With Schemabinding
        As

            Return(

                with RawData as
                (
                    Select @iString as iString
                )
                ,
                Anchor as
                (

                    Select Case(IsNumeric (substring(iString, 1, 1))) when 1 then substring(iString, 1, 1) else '' End as oString, 2 as CharPos from RawData
                    UNION ALL
                    Select a.oString + Case(IsNumeric (substring(@iString, a.CharPos, 1))) when 1 then substring(@iString, a.CharPos, 1) else '' End, a.CharPos + 1
                    from RawData r
                    Inner Join Anchor a on a.CharPos <= len(rtrim(ltrim(@iString)))

                )

                Select top 1 oString from Anchor order by CharPos Desc

            )

Go

select * from dbo.fncV1_iStripAlphasFromData ('00000')
select * from dbo.fncV1_iStripAlphasFromData ('00A00')
select * from dbo.fncV1_iStripAlphasFromData ('12345ABC6789!&*0')