Regex Help - Match Any URL Parameters and Values Not in a List

Time: 2021-10-31 08:14:46

Thank you for looking at this!


I am trying to build a regex, usable in JavaScript, that matches all URL parameters (and their values) that are not in my predefined list. Example:


Raw URL:


My List of Known Parameters:



Resulting (Cleaned up) URL:



Ultimately, I want to capture a cleaned-up version of any URL with only the parameters and values I need. There are tons of parameters on my website that are meaningless to me and only get in the way. One solution I found required a lookbehind, but I don't think JavaScript supports those.


Thank you so much for the help!!!


Solution Based on Feedback Below:


var pageURL = window.location.pathname + window.location.search;
var knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
// Step 1: match every name=value pair whose name is not a known parameter.
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
// Step 2: capture the first remaining pair so its separator can be rewritten as "?".
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
var cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');

1 Answer



Negative matches are tricky and require zero-width negative lookaheads.


This will find the unknown parameters and strip them out of the URL. (Update 2: it no longer keeps unknown parameters whose names merely start with the name of a known parameter.)


step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"

However, if the first parameter gets stripped out, your first remaining parameter will be preceded by a & instead of a ?, and you will need to replace that too:


clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
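To see both steps on a concrete input, here is a minimal sketch. The URL is an invented example (the question's original sample URL was not preserved), and only knownParamA and knownParamB are treated as known.

```javascript
// Hypothetical input, made up for illustration.
const url = '/folder/index.html?junk=1&knownParamA=1234&other=zzz&knownParamB=1234';

// Step 1: drop every "name=value" pair whose name is not a known parameter.
const step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');
console.log(step1); // "/folder/index.html&knownParamA=1234&knownParamB=1234"

// Step 2: the first surviving pair now starts with "&", so rewrite it to "?".
const clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');
console.log(clean); // "/folder/index.html?knownParamA=1234&knownParamB=1234"
```

The intermediate value shows why the second replace is needed: removing the leading "?junk=1" takes the "?" with it.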

You can chain these together, of course:


clean = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '').
  replace(/[?&]([^=]+=[^&]*)/, '?$1');

Update: I have included user3842539's expansion of the code, as it's easier to read here than in a comment.


var pageURL = window.location.pathname + window.location.search;
var knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
var cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');
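If the known names live in an array rather than a hand-written alternation, the same two regexes can be generated from it. The sketch below is one possible way to do that and is not part of the original answer: escapeRegExp, makeUrlCleaner and the item.id parameter are hypothetical names chosen for illustration. Escaping matters because a name containing a character such as "." would otherwise be read as regex syntax.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a URL cleaner from an array of known parameter names.
function makeUrlCleaner(knownParams) {
  const names = knownParams.map(escapeRegExp).join('|');
  // Same pattern as above, with the alternation generated from the array.
  const stripUnknown = new RegExp('[?&](?!(?:' + names + ')(?==))[^=]+=[^&]*', 'gi');
  const fixLeadingSeparator = /[?&]([^=]+=[^&]*)/;
  return function (url) {
    return url.replace(stripUnknown, '').replace(fixLeadingSeparator, '?$1');
  };
}

const cleanUrl = makeUrlCleaner(['knownParamA', 'knownParamB', 'item.id']);

console.log(cleanUrl('/p?utm=x&knownParamB=2')); // "/p?knownParamB=2"
// "itemXid" is removed because the "." in "item.id" was escaped.
console.log(cleanUrl('/p?itemXid=1&item.id=2')); // "/p?item.id=2"
// Nothing known remains, so the query string disappears entirely.
console.log(cleanUrl('/p?utm=x'));               // "/p"
```

Like the original pattern, this sketch assumes every parameter is written as name=value.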

To help you interpret these regexes:


  • [?&] = either ? or &
  • (...) = capturing group
  • (?!...) = not followed by a match for this group (negative lookahead)
  • (?:...) = non-capturing group
  • (?=...) = followed by a match for this group (positive lookahead)
  • = = a literal =
  • [^=] = any character other than =
  • + = one or more times
  • [^&] = any character other than &
  • * = zero or more times
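The two lookahead constructs in the list are the least familiar ones, so here is a small sketch of how they behave on their own. The strings and the unknownName variable are made up for illustration.

```javascript
// (?=...) succeeds only if the group matches next, without consuming it.
console.log(/foo(?==)/.test('foo=1'));    // true
console.log(/foo(?==)/.test('foobar=1')); // false: "foo" is followed by "b", not "="

// The lookahead is zero-width: the "=" is checked but is not part of the match.
console.log('foo=1'.match(/foo(?==)/)[0]); // "foo"

// (?!...) succeeds only if the group does NOT match next.
// Anchored with ^ so the check applies at the start of the name.
const unknownName = /^(?!(?:knownParamA|knownParamB)(?==))[^=]+/;
console.log(unknownName.test('knownParamA=1'));  // false: a known name followed by "="
console.log(unknownName.test('knownParamAB=1')); // true: only starts with a known name
console.log(unknownName.test('other=1'));        // true
```

The inner (?==) is what stops a name such as knownParamAB from being treated as known.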

Outside the regex body,


  • The g flag means 'all matches' (as opposed to only the first)
  • The i flag means 'case-insensitive'
  • In the replacement string, $1 means 'captured group 1'
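The effect of the flags and of $1 is easiest to see side by side. This is a small sketch with an invented query string, not taken from the original answer.

```javascript
// Hypothetical query string, made up for illustration.
const query = '?a=1&a=2&A=3';

// No flags: only the first match is replaced.
const first = query.replace(/a=\d/, 'X');
console.log(first); // "?X&a=2&A=3"

// g: every case-sensitive match is replaced.
const all = query.replace(/a=\d/g, 'X');
console.log(all); // "?X&X&A=3"

// gi: every match is replaced, ignoring case.
const allAnyCase = query.replace(/a=\d/gi, 'X');
console.log(allAnyCase); // "?X&X&X"

// $1 re-inserts whatever the first capturing group matched.
const withQuestionMark = '&a=1'.replace(/[?&]([^=]+=[^&]*)/, '?$1');
console.log(withQuestionMark); // "?a=1"
```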


