How to encode a URL to avoid special characters in Java? [duplicate]

Time: 2022-10-06 12:13:50

This question already has an answer here:

I need Java code to encode a URL so that special characters such as spaces, % and & (etc.) are escaped.

6 solutions

#1


64  

URL construction is tricky because different parts of the URL have different rules for what characters are allowed: for example, the plus sign is reserved in the query component of a URL because it represents a space, but in the path component of the URL, a plus sign has no special meaning and spaces are encoded as "%20".
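A minimal sketch of that difference, assuming Java 10+ (for the URLEncoder.encode(String, Charset) overload); the class and method names are illustrative only. Form encoding turns the space into + and escapes the literal plus, while java.net.URI treats the same text as a path, producing %20 for the space and leaving the plus alone:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SpaceEncodingDemo {

    // Form (query-style) encoding: space -> '+', literal '+' -> "%2B"
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+ overload
    }

    // Path encoding via java.net.URI: space -> "%20", '+' stays a literal plus
    static String pathEncode(String path) {
        try {
            return new URI(null, null, path, null, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a b+c"));  // a+b%2Bc
        System.out.println(pathEncode("/a b+c")); // /a%20b+c
    }
}
```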

RFC 2396 explains (in section 2.4.2) that a complete URL is always in its encoded form: you take the strings for the individual components (scheme, authority, path, etc.), encode each according to its own rules, and then combine them into the complete URL string. Trying to build a complete unencoded URL string and then encode it separately leads to subtle bugs, like spaces in the path being incorrectly changed to plus signs (which an RFC-compliant server will interpret as real plus signs, not encoded spaces).
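To illustrate the point, here is a sketch (hypothetical class name, placeholder host example.com, Java 10+) that contrasts encoding a finished URL string in one go with passing the components separately to the URI constructor:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ComponentEncodingDemo {

    // Wrong: encode the whole URL string at once; the ':' and '/' delimiters get escaped too
    static String encodeWholeString(String url) {
        return URLEncoder.encode(url, StandardCharsets.UTF_8);
    }

    // Better: pass each component separately and let URI apply that component's rules
    static String encodeByComponent(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeWholeString("http://example.com/a b"));      // http%3A%2F%2Fexample.com%2Fa+b
        System.out.println(encodeByComponent("http", "example.com", "/a b")); // http://example.com/a%20b
    }
}
```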

In Java, the correct way to build a URL is with the URI class. Use one of the multi-argument constructors that take the URL components as separate strings, and it will escape each component according to that component's rules. The toASCIIString() method gives you a properly escaped and encoded string that you can send to a server. To decode a URL, construct a URI object using the single-string constructor and then use the accessor methods (such as getPath()) to retrieve the decoded components.
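A small round-trip sketch of that advice, with a placeholder host, path and query (the class and method names are hypothetical): the five-argument constructor URI(scheme, authority, path, query, fragment) builds the encoded string, and URI.create plus getPath() reads the decoded path back. One caveat: characters such as & and = are legal in the query component, so the constructor leaves them as they are; a query value that itself contains them still needs separate encoding.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriRoundTrip {

    // Build: components are passed unencoded; URI quotes illegal characters per component
    static String build(String host, String path, String query) {
        try {
            return new URI("https", host, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decode: parse the encoded string with the single-string factory, then read a decoded component
    static String decodedPath(String encoded) {
        return URI.create(encoded).getPath();
    }

    public static void main(String[] args) {
        String encoded = build("example.com", "/docs/my file.txt", "q=a b");
        System.out.println(encoded);              // https://example.com/docs/my%20file.txt?q=a%20b
        System.out.println(decodedPath(encoded)); // /docs/my file.txt
    }
}
```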

Don't use the URLEncoder class! Despite the name, that class actually does HTML form encoding, not URL encoding. It's not correct to concatenate unencoded strings to make an "unencoded" URL and then pass it through a URLEncoder. Doing so will result in problems (particularly the aforementioned one regarding spaces and plus signs in the path).

#2


10  

This is a duplicate of the question below, where you can find more detailed information and discussion about this issue:

HTTP URL Address Encoding in Java

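import java.nio.charset.StandardCharsets;

public class URLParamEncoder {

    public static String encode(String input) {
        StringBuilder resultStr = new StringBuilder();
        // Work on UTF-8 bytes so non-ASCII characters become valid %XX sequences
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int ch = b & 0xFF; // unsigned byte value 0..255
            if (isUnsafe(ch)) {
                resultStr.append('%');
                resultStr.append(toHex(ch / 16));
                resultStr.append(toHex(ch % 16));
            } else {
                resultStr.append((char) ch);
            }
        }
        return resultStr.toString();
    }

    private static char toHex(int ch) {
        return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
    }

    private static boolean isUnsafe(int ch) {
        // Non-ASCII bytes, DEL and ASCII control characters are always escaped
        if (ch >= 127 || ch < 0x20)
            return true;
        // Reserved and otherwise unsafe ASCII characters
        return " %$&+,/:;=?@<>#\"{}|\\^`[]".indexOf(ch) >= 0;
    }

}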
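As a hedged alternative to the blacklist in isUnsafe, the sketch below uses a whitelist: it keeps only the RFC 3986 unreserved characters (letters, digits, -, ., _ and ~) and percent-encodes every other UTF-8 byte. The class name is hypothetical. Because it also escapes reserved characters such as / and &, it suits a single path segment or parameter value rather than a whole URL.

```java
import java.nio.charset.StandardCharsets;

public class UnreservedEncoder {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Keep RFC 3986 "unreserved" characters; percent-encode every other UTF-8 byte
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned 0..255
            if (isUnreserved(v)) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }

    // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    public static void main(String[] args) {
        System.out.println(encode("a b&c=\u00e4~")); // a%20b%26c%3D%C3%A4~
    }
}
```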

#3


5  

If you don't want to do it manually, use the Apache Commons Codec library. The class you are looking for is org.apache.commons.codec.net.URLCodec:

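// Note: URLCodec applies www-form-urlencoded rules, so ':' and '/' are escaped too
// encode(String) is an instance method and throws org.apache.commons.codec.EncoderException
final String url = "http://www.google.com?....";
final String urlSafe = new org.apache.commons.codec.net.URLCodec().encode(url);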

#4


1  

I would echo what Wyzard wrote, but add that:

  • for query parameters, HTML form encoding is often exactly what the server expects; outside of these, URLEncoder indeed should not be used

  • the most recent URI spec is RFC 3986, so you should refer to that as a primary source
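Following the first point, a sketch of building a query string by form-encoding each key and value separately before joining them with = and &. It assumes Java 10+ and uses a hypothetical class name and a placeholder URL:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryStringBuilder {

    // Form-encode each key and value on its own, then join the pairs with '&'
    static String toQuery(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "fish & chips");
        params.put("lang", "en");
        // https://example.com/search?q=fish+%26+chips&lang=en
        System.out.println("https://example.com/search?" + toQuery(params));
    }
}
```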

I wrote a blog post a while back about this subject: Java: safe character handling and URL building

#5


1  

I also spent quite some time on this issue, so here is my solution:

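import java.net.URI;
import java.net.URL;
import java.net.URLDecoder;

String urlString2Decode = "http://www.test.com/äüö/path with blanks/";
// Decode first so already-escaped input is not escaped twice (note: this also turns '+' into ' ')
String decodedURL = URLDecoder.decode(urlString2Decode, "UTF-8");
URL url = new URL(decodedURL);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String encodedURLAsString = uri.toASCIIString(); // spaces -> %20, äüö -> UTF-8 %XX sequences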

#6


-1  

Here is my solution which is pretty easy:

Instead of encoding the URL itself, I encoded the parameters I was passing. The parameter was user input, and the user could enter any unexpected string of special characters, so this worked fine for me :)

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

String review = "User input"; /* user input that will be passed as a parameter in the URL */
try {
    // URLEncoder already encodes spaces as '+', so no extra replace(" ", "+") is needed
    review = URLEncoder.encode(review, "UTF-8");
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
String url = "www.test.com/test.php" + "?user_review=" + review;
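On Java 10 and later, the Charset overloads of URLEncoder and URLDecoder declare no checked exception, so the try/catch can be dropped. This sketch (hypothetical class and method names) also shows that a form-decoding server would recover the original input:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ReviewParamRoundTrip {

    // Java 10+: this overload declares no checked exception, so no try/catch is needed
    static String encodeParam(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // What a form-decoding server (or URLDecoder) recovers from the encoded value
    static String decodeParam(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String review = "Great, 5/5 & would buy again!";
        String encoded = encodeParam(review); // Great%2C+5%2F5+%26+would+buy+again%21
        System.out.println("www.test.com/test.php?user_review=" + encoded);
        System.out.println(decodeParam(encoded).equals(review)); // true
    }
}
```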
