Weird JS regex: /[A-z]/.test("\\"); // true

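This article is reposted from an English-language site about JavaScript. The author is quite funny: he first shows an example of JS making people go "WTF", then digs in hands-on and uncovers a little piece of hidden JS lore. Why is /[A-z]/.test("\\"); true? Can you work it out?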

When I use regular expressions and want to validate a range of letters, I can use a-z or A-Z. Even A-z seems to work fine. The problem shows up when I run some tests:

  /[A-Z]/.test("A"); // true
  /[A-Z]/.test("b"); // false
  /[A-Z]/.test("Z"); // true
  /[A-Z]/.test("z"); // false
  /[a-z]/.test("a"); // true
  /[a-z]/.test("A"); // false
  /[a-z]/.test("z"); // true
  /[a-z]/.test("Z"); // false

The weird thing comes when I do this test:

  /[A-z]/.test("A"); // true
  /[A-z]/.test("a"); // true
  /[A-z]/.test("Z"); // true
  /[A-z]/.test("z"); // true
  /[A-z]/.test("m"); // true
  /[A-z]/.test("D"); // true
  /[A-z]/.test("\\"); // true WTF?

It's supposed to accept only letters from A to Z and a to z. Can someone explain this?

— @byoigres
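To see why, it helps to look at the character codes involved. The minimal sketch below (the codes object is just an illustrative name, not from the thread) shows that the backslash's code, 92, falls between the codes of A (65) and z (122), and that numeric comparison is what a class range like [A-z] checks when no i flag is set.

```javascript
// Char codes of the range endpoints, the letters around the gap, and the backslash
const codes = {
  A: "A".charCodeAt(0),          // 65
  Z: "Z".charCodeAt(0),          // 90
  backslash: "\\".charCodeAt(0), // 92
  a: "a".charCodeAt(0),          // 97
  z: "z".charCodeAt(0),          // 122
};

// A range [lo-hi] accepts any character whose code lies between lo and hi
const backslashInRange = codes.A <= codes.backslash && codes.backslash <= codes.z;

console.log(backslashInRange); // true
```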

I had a look into this with the following code:

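  var re = /[A-z]/g, s = (function () {
    var f = String.fromCharCode;
    // Each bind() appends one more argument, so f() finally calls
    // String.fromCharCode(0, 1, ..., 5999): a string of every char code below 6000
    for (var i = 0; i < 6000; i++) f = f.bind(0, i);
    return f();
  })(), q, z = [];
  // Collect every character the global regex matches in that string
  while ((q = re.exec(s)) != null) z.push(q[0]);
  z // in a console, this last expression displays the matches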

It returns

  ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
  "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "\\", "]", "^",
  "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
  "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
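The same list can be produced with a more direct probe. This is a sketch rather than the poster's code: it builds the characters for codes 0 to 5999 with Array.from and keeps the ones the class accepts.

```javascript
// Every character with code 0..5999, the same span as the probe in the answer
const all = Array.from({ length: 6000 }, (_, i) => String.fromCharCode(i));

// Keep only the characters the class [A-z] accepts
const matched = all.filter((ch) => /[A-z]/.test(ch));

console.log(matched.join("")); // ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
```

The regex inside filter() deliberately has no g flag: test() on a global regex remembers lastIndex between calls, which would make a per-character check miss matches.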

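It is likely, I think, that A-z literally means "any character between A and z" in Unicode code-point order, or at least charCode order. (This is in fact standard ECMAScript behavior: a class range compares character values, which are UTF-16 code units, or code points when the u flag is set.) It also allows statements like /[ -y]/g: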

  var re = /[ -y]/g, s = (function () {
    var f = String.fromCharCode;
    for (var i = 0; i < 6000; i++) f = f.bind(0, i);
    return f();
  })(), q, z = [];
  while ((q = re.exec(s)) != null) z.push(q[0]);
  z

Which returns

  [" ", "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".",
  "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=",
  ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L",
  "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[",
  "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
  "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y"]
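The "code-point order, or at least charCode order" guess can be checked mechanically. The sketch below uses two hypothetical helpers, inRange and countMismatches, that compare each of the 65,536 UTF-16 code units numerically against the range endpoints and count how often that disagrees with the real regex. For both ranges from the answer the count comes out as zero.

```javascript
// Hypothetical helper: emulate a class range [lo-hi] by comparing
// UTF-16 code unit values numerically
function inRange(ch, lo, hi) {
  const c = ch.charCodeAt(0);
  return lo.charCodeAt(0) <= c && c <= hi.charCodeAt(0);
}

// Hypothetical helper: count code units where the regex and the emulation disagree
function countMismatches(re, lo, hi) {
  let mismatches = 0;
  for (let i = 0; i <= 0xffff; i++) {
    const ch = String.fromCharCode(i);
    if (re.test(ch) !== inRange(ch, lo, hi)) mismatches++;
  }
  return mismatches;
}

console.log(countMismatches(/[A-z]/, "A", "z")); // 0
console.log(countMismatches(/[ -y]/, " ", "y")); // 0
```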

This probably has some security implications: if you're using [A-z] to sanitise something, you'll also accept the characters [ \ ] ^ _ and `.

A very interesting find!

— zemnmez
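The sanitisation risk raised in the answer is easy to reproduce. In this sketch the two validator names are hypothetical; the point is that spelling out both ranges, [A-Za-z], rejects the six punctuation characters that [A-z] lets through.

```javascript
// Hypothetical validators for a "letters only" field, written two ways
const looseLetters = /^[A-z]+$/;     // also accepts [ \ ] ^ _ `
const strictLetters = /^[A-Za-z]+$/; // ASCII letters only

const samples = ["Hello", "a_b", "x\\y", "[admin]", "back`tick"];
for (const s of samples) {
  // Prints each sample followed by the loose and strict verdicts
  console.log(JSON.stringify(s), looseLetters.test(s), strictLetters.test(s));
}
```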

wtfjs is free software. Get the source on GitHub.

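End of the original article. I already knew that A-z covers both A-Z and a-z, because I remember that in ASCII the uppercase letters come before the lowercase ones, so A-z includes both. What I had never looked into was why

/[A-z]/.test("\\");

is also true, but after reading this article it makes sense. In the ASCII table, Z and a are not adjacent: there are six common characters between them: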

"[", "\\", "]", "^", "_", "`" (char codes 91 to 96, sitting between Z at 90 and a at 97)
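Those six characters can be listed straight from their codes. The helper name charsBetween is made up for this sketch; it returns every character whose code lies strictly between the two endpoints.

```javascript
// Hypothetical helper: characters whose codes fall strictly between two endpoints
function charsBetween(from, to) {
  const out = [];
  for (let c = from.charCodeAt(0) + 1; c < to.charCodeAt(0); c++) {
    out.push(String.fromCharCode(c));
  }
  return out;
}

console.log(charsBetween("Z", "a").join(" ")); // [ \ ] ^ _ `
```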

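Look closely and you'll also see that 9 and A are not adjacent either (the seven characters : ; < = > ? @ sit between them), so the following is also true: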

/[1-z]/.test("@"); // true
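To see exactly what /[1-z]/ lets through besides digits and letters, this sketch scans the 128 ASCII codes and keeps the characters the range accepts but [1-9A-Za-z] does not. Both gaps show up: the one between 9 and A and the one between Z and a.

```javascript
// Every ASCII character /[1-z]/ accepts that is not a digit 1-9 or a letter
const extras = [];
for (let i = 0; i <= 0x7f; i++) {
  const ch = String.fromCharCode(i);
  if (/[1-z]/.test(ch) && !/[1-9A-Za-z]/.test(ch)) extras.push(ch);
}

console.log(extras.join("")); // :;<=>?@[\]^_`
```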

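In short, a range inside [] in a JS regex matches a contiguous run of character codes in ASCII-table order (strictly speaking, by UTF-16 code unit value, which coincides with ASCII for these characters). Learned something new today.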
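The "ASCII order" intuition is really code-unit order, which matters once characters outside the Basic Multilingual Plane appear. As a sketch (assuming a modern engine with u-flag support), a range between two emoji fails to compile without the u flag, because the pattern is read as UTF-16 code units and the resulting range runs backwards; with the u flag the endpoints are whole code points and the range works.

```javascript
// Range endpoints outside the BMP: U+1F600..U+1F602
const source = "[\u{1F600}-\u{1F602}]";

// Without u, the class is read as code units: \uD83D, then \uDE00-\uD83D
// (out of order, so the constructor throws), then \uDE02
let withoutU;
try {
  new RegExp(source);
  withoutU = "compiled";
} catch (e) {
  withoutU = e.name;
}

// With u, each emoji is a single code point and the range is valid
const withU = new RegExp(source, "u").test("\u{1F601}");

console.log(withoutU, withU); // SyntaxError true
```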

秒客网
