How do I remove all non-alphanumeric characters from a string in MySQL?

Time: 2021-04-09 09:33:29

I'm working on a routine that compares strings, but for better efficiency I need to remove all characters that are not letters or numbers.

I'm using multiple REPLACE functions now, but maybe there is a faster and nicer solution?

17 answers

#1


70  

None of these answers worked for me. I had to create my own function called alphanum which stripped the chars for me:

DROP FUNCTION IF EXISTS alphanum; 
DELIMITER | 
CREATE FUNCTION alphanum( str CHAR(255) ) RETURNS CHAR(255) DETERMINISTIC
BEGIN 
  DECLARE i, len SMALLINT DEFAULT 1; 
  DECLARE ret CHAR(255) DEFAULT ''; 
  DECLARE c CHAR(1); 
  SET len = CHAR_LENGTH( str ); 
  REPEAT 
    BEGIN 
      SET c = MID( str, i, 1 ); 
      IF c REGEXP '[[:alnum:]]' THEN 
        SET ret=CONCAT(ret,c); 
      END IF; 
      SET i = i + 1; 
    END; 
  UNTIL i > len END REPEAT; 
  RETURN ret; 
END | 
DELIMITER ; 

Now I can do:

select 'This works finally!', alphanum('This works finally!');

and I get:

+---------------------+---------------------------------+
| This works finally! | alphanum('This works finally!') |
+---------------------+---------------------------------+
| This works finally! | Thisworksfinally                |
+---------------------+---------------------------------+
1 row in set (0.00 sec)

Hurray!
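
For readers on newer servers, a built-in alternative may be worth trying before writing a loop. This is a minimal sketch that assumes MySQL 8.0 or later, where REGEXP_REPLACE() is available; it is not part of the function above, and the character class is written out explicitly so that only ASCII letters and digits are kept.

```sql
-- Assumes MySQL 8.0+ (REGEXP_REPLACE does not exist in 5.x).
-- Every character that is not 0-9, A-Z or a-z is replaced with an empty string.
SELECT REGEXP_REPLACE('This works finally!', '[^0-9A-Za-z]', '') AS stripped;
-- stripped: Thisworksfinally
```

On older versions the stored function above remains the portable route.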

#2


18  

From a performance point of view (and assuming that you read more often than you write), I think the best way would be to precalculate and store a stripped version of the column. That way you run the transform far less often.

You can then put an index on the new column and get the database to do the work for you.
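
The idea above could be sketched roughly as follows. The table `customers` and the columns `name` and `name_stripped` are hypothetical names chosen for illustration, and the sketch assumes the alphanum() function from the first answer has already been created.

```sql
-- Hypothetical schema: customers(name VARCHAR(255), ...).
-- Assumes alphanum() from answer #1 exists.
ALTER TABLE customers ADD COLUMN name_stripped VARCHAR(255);

-- One-off backfill of the stripped values.
UPDATE customers SET name_stripped = alphanum(name);

-- Index the stripped column so comparisons can use it.
CREATE INDEX idx_customers_name_stripped ON customers (name_stripped);

-- Keep the stripped copy in sync on writes.
CREATE TRIGGER customers_strip_bi BEFORE INSERT ON customers
  FOR EACH ROW SET NEW.name_stripped = alphanum(NEW.name);
CREATE TRIGGER customers_strip_bu BEFORE UPDATE ON customers
  FOR EACH ROW SET NEW.name_stripped = alphanum(NEW.name);

-- Lookups transform only the search term, not every row.
SELECT * FROM customers WHERE name_stripped = alphanum('Mc Donald');
```

The transform then runs once per write instead of once per row per query.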

#3


12  

SELECT teststring REGEXP '[[:alnum:]]+';

SELECT * FROM testtable WHERE test REGEXP '[[:alnum:]]+'; 

See: http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Scroll down to the section that says: [:character_class:]
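
Note that REGEXP by itself only tests for a match; it does not change the string. A small sketch of what the operator returns, using made-up literals:

```sql
-- REGEXP yields 1 when the pattern matches somewhere in the string, otherwise 0.
SELECT 'abc-123' REGEXP '[[:alnum:]]+'   AS contains_alnum,   -- 1
       '!!!'     REGEXP '[[:alnum:]]+'   AS no_alnum,         -- 0
       'abc-123' REGEXP '^[[:alnum:]]+$' AS only_alnum,       -- 0 (the hyphen breaks the anchored match)
       'abc123'  REGEXP '^[[:alnum:]]+$' AS only_alnum_clean; -- 1
```

The anchored form (`^...$`) is the one to use when checking whether a value is already purely alphanumeric.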

If you want to manipulate strings the fastest way will be to use a str_udf, see:
https://github.com/hholzgra/mysql-udf-regexp

#4


6  

Based on the answer by Ryan Shillington, modified to work with strings longer than 255 characters and preserving spaces from the original string.

Note that the result is converted to lowercase with lower(ret) at the end.

I used this to compare strings:

DROP FUNCTION IF EXISTS spacealphanum;
DELIMITER $$
CREATE FUNCTION `spacealphanum`( str TEXT ) RETURNS TEXT CHARSET utf8
BEGIN 
  DECLARE i, len SMALLINT DEFAULT 1; 
  DECLARE ret TEXT DEFAULT ''; 
  DECLARE c CHAR(1); 
  SET len = CHAR_LENGTH( str ); 
  REPEAT 
    BEGIN 
      SET c = MID( str, i, 1 ); 
      IF c REGEXP '[[:alnum:]]' THEN 
        SET ret=CONCAT(ret,c); 
      ELSEIF  c = ' ' THEN
          SET ret=CONCAT(ret," ");
      END IF; 
      SET i = i + 1; 
    END; 
  UNTIL i > len END REPEAT; 
  SET ret = lower(ret);
  RETURN ret; 
  END $$
  DELIMITER ;

#5


5  

The fastest way I was able to find (and use) is with convert().

From the documentation: CONVERT() with USING is used to convert data between different character sets.

Example:

convert(string USING ascii)

In your case, the right character set would have to be one you define yourself.

Note from the documentation: the USING form of CONVERT() is available as of MySQL 4.1.0.
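
A caveat worth checking before relying on this: converting to ascii does not delete anything. The sketch below uses a made-up literal and assumes a UTF-8 connection; characters that cannot be represented in the target character set typically come back as `?`, while ASCII punctuation such as `&` and `!` passes through unchanged.

```sql
-- Assumes the client connection uses utf8/utf8mb4 so the accented letters arrive intact.
-- Non-ASCII characters are substituted (usually with '?'), not removed,
-- and ASCII punctuation is left in place.
SELECT CONVERT('café & crème!' USING ascii) AS converted;
```

So this approach only helps with non-ASCII characters and would still need a second step for punctuation.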

#6


4  

Be careful: characters like ’ or » are considered alphabetic by MySQL. It is better to use something like:

IF c BETWEEN 'a' AND 'z' OR c BETWEEN 'A' AND 'Z' OR c BETWEEN '0' AND '9' OR c = '-' THEN
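
One thing to keep in mind with BETWEEN is that string comparison follows the column or connection collation. The following sketch, which assumes an accent-insensitive collation such as utf8mb4_general_ci, compares the collation-based test with a byte-wise one; the literal 'é' is only an illustration.

```sql
-- Under an accent-insensitive collation, 'é' sorts like 'e' and is expected
-- to fall between 'a' and 'z'; the byte-wise comparison is expected to reject it.
SELECT 'é' BETWEEN 'a' AND 'z'                 AS collation_compare,
       CAST('é' AS BINARY) BETWEEN 'a' AND 'z' AS byte_compare;
```

If only plain ASCII letters should pass, the byte-wise form (or an explicit binary collation) is the safer choice.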

#7


4  

I have written this UDF. However, it only trims special characters at the beginning of the string. It also converts the string to lower case. You can update this function if desired.

DELIMITER //

DROP FUNCTION IF EXISTS DELETE_DOUBLE_SPACES//

CREATE FUNCTION DELETE_DOUBLE_SPACES ( title VARCHAR(250) )
RETURNS VARCHAR(250) DETERMINISTIC
BEGIN
    DECLARE result VARCHAR(250);
    SET result = REPLACE( title, '  ', ' ' );
    WHILE (result <> title) DO 
        SET title = result;
        SET result = REPLACE( title, '  ', ' ' );
    END WHILE;
    RETURN result;
END//

DROP FUNCTION IF EXISTS LFILTER//

CREATE FUNCTION LFILTER ( title VARCHAR(250) )
RETURNS VARCHAR(250) DETERMINISTIC
BEGIN
    WHILE (1=1) DO
        IF(  ASCII(title) BETWEEN ASCII('a') AND ASCII('z')
            OR ASCII(title) BETWEEN ASCII('A') AND ASCII('Z')
            OR ASCII(title) BETWEEN ASCII('0') AND ASCII('9')
        ) THEN
            SET title = LOWER( title );
            SET title = REPLACE(
                REPLACE(
                    REPLACE(
                        title,
                        CHAR(10), ' '
                    ),
                    CHAR(13), ' '
                ) ,
                CHAR(9), ' '
            );
            SET title = DELETE_DOUBLE_SPACES( title );
            RETURN title;
        ELSE
            SET title = SUBSTRING( title, 2 );          
        END IF;
    END WHILE;
END//
DELIMITER ;

SELECT LFILTER(' !@#$%^&*()_+1a    b');

Alternatively, you could use regular expressions, but this requires installing a MySQL extension.

#8


2  

A straightforward, battle-tested solution for Latin and Cyrillic characters:

DELIMITER //

CREATE FUNCTION `remove_non_numeric_and_letters`(input TEXT)
  RETURNS TEXT
  BEGIN
    DECLARE output TEXT DEFAULT '';
    DECLARE iterator INT DEFAULT 1;
    WHILE iterator < (LENGTH(input) + 1) DO
      IF SUBSTRING(input, iterator, 1) IN
         ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я')
      THEN
        SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
      END IF;
      SET iterator = iterator + 1;
    END WHILE;
    RETURN output;
  END //

DELIMITER ;

Usage:

-- outputs "hello12356"
SELECT remove_non_numeric_and_letters('hello - 12356-привет ""]')
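
The function above loops up to LENGTH(input), which counts bytes rather than characters. A quick sketch of the difference, assuming a utf8 or utf8mb4 connection:

```sql
-- LENGTH() returns bytes, CHAR_LENGTH() returns characters.
-- Each Cyrillic letter occupies two bytes in UTF-8.
SELECT LENGTH('привет')      AS bytes,       -- 12 with a UTF-8 connection
       CHAR_LENGTH('привет') AS characters;  -- 6
```

The extra iterations are harmless here because SUBSTRING() past the end returns an empty string, but CHAR_LENGTH() would avoid the wasted passes.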

#9


1  

I had a similar problem with trying to match last names in our database that were slightly different. For example, sometimes people entered the same person's name as "McDonald" and also as "Mc Donald", or "St John" and "St. John".

Instead of trying to convert the MySQL data, I solved the problem by creating a function (in PHP) that takes a string and builds an alpha-only regular expression:

function alpha_only_regex($str) {
    $alpha_only = str_split(preg_replace('/[^A-Z]/i', '', $str));
    return '^[^a-zA-Z]*'.implode('[^a-zA-Z]*', $alpha_only).'[^a-zA-Z]*$';
}

Now I can search the database with a query like this:

$lastname_regex = alpha_only_regex($lastname);
$query = "SELECT * FROM my_table WHERE lastname REGEXP '$lastname_regex'";
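
To make the generated pattern concrete, here is roughly what it looks like for the last name "McDonald", written out by hand and tested against literals instead of the `my_table` column:

```sql
-- Pattern as built by alpha_only_regex('McDonald'): each letter may be
-- surrounded by any number of non-letters.
SET @re = '^[^a-zA-Z]*M[^a-zA-Z]*c[^a-zA-Z]*D[^a-zA-Z]*o[^a-zA-Z]*n[^a-zA-Z]*a[^a-zA-Z]*l[^a-zA-Z]*d[^a-zA-Z]*$';

SELECT 'McDonald'   REGEXP @re AS plain,       -- 1
       'Mc Donald'  REGEXP @re AS with_space,  -- 1
       'Mc. Donald' REGEXP @re AS with_dot,    -- 1
       'MacDonald'  REGEXP @re AS different;   -- 0 (extra letter)
```

Because the pattern cannot use an index, this suits occasional lookups better than large joins.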

#10


1  

So far, the only alternative approach that is less complicated than the other answers here is to determine the full set of special characters currently in use in the column, and then replace each of them in sequence, e.g.:

update pages set slug = lower(replace(replace(replace(replace(name, ' ', ''), '-', ''), '.', ''), '&', '')); # replacing just space, -, ., & only

This is only advisable on a known set of data; otherwise it is easy for some special characters to slip past, since this is a blacklist approach rather than a whitelist approach.
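
One way to check whether anything slipped past is to look for rows whose result still contains a character outside the expected set. This sketch reuses the `pages` table and its `name` and `slug` columns from the update statement above:

```sql
-- Rows listed here still contain something other than a-z or 0-9 in the slug,
-- which means the REPLACE chain is missing at least one character.
SELECT name, slug
FROM pages
WHERE slug REGEXP '[^0-9a-z]';
```

An empty result suggests the blacklist covers the current data, though new rows can still introduce characters it does not handle.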

Obviously, the simplest way is to pre-validate the data outside of SQL, given the lack of robust built-in whitelisting (e.g. via a regex replace).

#11


1  

This can be done with a regular expression replacer function I posted in another answer and have blogged about here. It may not be the most efficient solution possible and might look overkill for the job in hand - but like a Swiss army knife, it may come in useful for other reasons.

It can be seen in action removing all non-alphanumeric characters in this Rextester online demo.

SQL (excluding the function code for brevity):

SELECT txt,
       reg_replace(txt,
                   '[^a-zA-Z0-9]+',
                   '',
                   TRUE,
                   0,
                   0
                   ) AS `reg_replaced`
FROM test;

#12


0  

Probably a silly suggestion compared to others:

// Remove every character that is not an ASCII letter or digit.
$sortedString = preg_replace("/[^a-zA-Z0-9]/", "", $string);

#13


0  

I needed to get only alphabetic characters of a string in a procedure, and did:

SET @source = "whatever you want";
SET @target = '';
SET @i = 1;
SET @len = LENGTH(@source);
WHILE @i <= @len DO
    SET @char = SUBSTRING(@source, @i, 1);
    IF (ORD(@char) BETWEEN 65 AND 90) OR (ORD(@char) BETWEEN 97 AND 122) THEN
        SET @target = CONCAT(@target, @char);
    END IF;
    SET @i = @i + 1;
END WHILE;
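
WHILE is only valid inside a stored program, so the snippet above needs a wrapper to run. A minimal sketch follows; the procedure name `letters_only` and its parameters are hypothetical, and local variables stand in for the user variables.

```sql
DELIMITER //
DROP PROCEDURE IF EXISTS letters_only //
-- Hypothetical wrapper: copies only A-Z and a-z from source into target.
CREATE PROCEDURE letters_only(IN source TEXT, OUT target TEXT)
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE ch CHAR(1);
  SET target = '';
  WHILE i <= CHAR_LENGTH(source) DO
    SET ch = SUBSTRING(source, i, 1);
    -- ORD() of an ASCII letter is its code point: 65-90 or 97-122.
    IF (ORD(ch) BETWEEN 65 AND 90) OR (ORD(ch) BETWEEN 97 AND 122) THEN
      SET target = CONCAT(target, ch);
    END IF;
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

CALL letters_only('whatever you want', @target);
SELECT @target;  -- whateveryouwant
```

Digits are dropped as well, since only the two letter ranges are tested.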

#14


0  

I tried a few solutions but in the end used replace. My data set is part numbers and I know fairly well what to expect. But just for sanity, I used PHP to build the long query:

$dirty = array(' ', '-', '.', ',', ':', '?', '/', '!', '&', '@');
$query = 'part_no';
foreach ($dirty as $dirt) {
    $query = "replace($query,'$dirt','')";
}
echo $query;

This outputs something I used to get a headache from:

replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(part_no,' ',''),'-',''),'.',''),',',''),':',''),'?',''),'/',''),'!',''),'&',''),'@','')

#15


0  

If you are using PHP, then:

try{
$con = new PDO ("mysql:host=localhost;dbname=dbasename","root","");
}
catch(PDOException $e){
echo "error".$e->getMessage();
}

$select = $con->prepare("SELECT * FROM table");
$select->setFetchMode(PDO::FETCH_ASSOC);
$select->execute();

while($data=$select->fetch()){ 

$id = $data['id'];
$column = $data['column'];
$column = preg_replace("/[^a-zA-Z0-9]+/", " ", $column); //remove all special characters

$update = $con->prepare("UPDATE table SET column=:column WHERE id='$id'");
$update->bindParam(':column', $column );
$update->execute();

// echo $column."<br>";
} 

#16


0  

I needed to replace non-alphanumeric characters rather than remove them, so I created this based on Ryan Shillington's alphanum. It works for strings up to 255 characters in length.

DROP FUNCTION IF EXISTS alphanumreplace; 
DELIMITER | 
CREATE FUNCTION alphanumreplace( str CHAR(255), d CHAR(32) ) RETURNS CHAR(255) 
BEGIN 
  DECLARE i, len SMALLINT DEFAULT 1; 
  DECLARE ret CHAR(255) DEFAULT ''; 
  DECLARE c CHAR(1); 
  SET len = CHAR_LENGTH( str ); 
  REPEAT 
    BEGIN 
      SET c = MID( str, i, 1 ); 
      IF c REGEXP '[[:alnum:]]' THEN SET ret=CONCAT(ret,c); 
      ELSE SET ret=CONCAT(ret,d);
      END IF; 
      SET i = i + 1; 
    END; 
  UNTIL i > len END REPEAT; 
  RETURN ret; 
END | 
DELIMITER ; 

Example:

select 'hello world!',alphanum('hello world!'),alphanumreplace('hello world!','-');
+--------------+--------------------------+-------------------------------------+
| hello world! | alphanum('hello world!') | alphanumreplace('hello world!','-') |
+--------------+--------------------------+-------------------------------------+
| hello world! | helloworld               | hello-world-                        |
+--------------+--------------------------+-------------------------------------+

You'll need to add the alphanum function separately if you want it; I only include it here for the example.

#17


-1  

The alphanum function (the self-answered one) has a bug, but I don't know why. For the text "cas synt ls 75W140 1L" it returns "cassyntls75W1401"; the "L" at the end is somehow missing.

Now I use

delimiter //
DROP FUNCTION IF EXISTS alphanum //
CREATE FUNCTION alphanum(prm_strInput varchar(255))
RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE v_char VARCHAR(1);
  DECLARE v_parseStr VARCHAR(255) DEFAULT ' ';
WHILE (i <= LENGTH(prm_strInput) )  DO
  SET v_char = SUBSTR(prm_strInput,i,1);
  IF v_char REGEXP  '^[A-Za-z0-9]+$' THEN 
        SET v_parseStr = CONCAT(v_parseStr,v_char);  
  END IF;
  SET i = i + 1;
END WHILE;
RETURN trim(v_parseStr);
END //
DELIMITER ;

(found on google)
