Character Encoding in Tomcat

Date: 2021-10-12 22:41:54

Default character encoding of the request or response body

  If a character encoding is not specified, the Servlet specification requires that an encoding of ISO-8859-1 is used.

  For JSP pages, the request character encoding is handled the same way. The default response charset is ISO-8859-1 for JSP pages in standard syntax, but UTF-8 for pages in XML syntax.
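
  To make the practical effect of the ISO-8859-1 default concrete, here is a minimal sketch in plain Java, outside of any servlet container. The class and method names (ResponseCharsetDemo, roundTrip, encodedLength) are made up for illustration. It encodes a string to bytes and decodes it again with the same charset, which is roughly what happens between a response writer and a client that agree on the charset.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of Tomcat or the Servlet API.
class ResponseCharsetDemo {

    // Encode the text to bytes and decode it again with the same charset.
    // Characters the charset cannot represent are replaced with '?'.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    // Number of bytes the text occupies in the given charset.
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        // "cafe" with e-acute, a space, then two CJK characters
        String text = "caf\u00e9 \u65e5\u672c";

        // ISO-8859-1 keeps the e-acute but loses the CJK characters.
        System.out.println(roundTrip(text, "ISO-8859-1").equals(text)); // false
        // UTF-8 preserves everything.
        System.out.println(roundTrip(text, "UTF-8").equals(text));      // true

        System.out.println(encodedLength(text, "ISO-8859-1")); // 7
        System.out.println(encodedLength(text, "UTF-8"));      // 12
    }
}
```

  The point of the sketch is only that ISO-8859-1 silently degrades anything outside Latin-1, which is why the default is often not what an application wants.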

Default encoding for GET

  Many browsers are starting to offer (default) options of encoding URIs using UTF-8 instead of ISO-8859-1.

  HTML 4.0 recommends the use of UTF-8 to encode the query string.

Default Encoding for POST

  The servlet specification defines ISO-8859-1 as the default character set for HTTP request and response bodies.

Change how GET parameters are interpreted

  In Tomcat 7 and earlier, ISO-8859-1 is the default character encoding for the entire URL, including the query string ("GET parameters"). Tomcat 8 and later default to UTF-8 unless strict servlet compliance is enabled.

  There are two ways to specify how GET parameters are interpreted:

    Set the URIEncoding attribute on the <Connector> element in server.xml to something specific (e.g. URIEncoding="UTF-8").

    Set the useBodyEncodingForURI attribute on the <Connector> element in server.xml to true. This will cause the Connector to use the request body's encoding for GET parameters.
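
  The difference the URIEncoding setting makes can be reproduced with the JDK alone. The following is a sketch, not Tomcat's actual decoding code: it uses java.net.URLDecoder to decode the same percent-encoded query value once as UTF-8 and once as ISO-8859-1. The class name UriEncodingDemo and the sample value are assumptions chosen for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it only imitates the effect of the connector setting.
class UriEncodingDemo {

    // Percent-decode a raw query value, interpreting the bytes in the given charset.
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A browser that encodes URIs as UTF-8 sends e-acute as the two bytes C3 A9.
        String raw = "caf%C3%A9";

        // Decoded as UTF-8, the two bytes become one character.
        System.out.println(decode(raw, "UTF-8").length());      // 4
        // Decoded as ISO-8859-1, each byte becomes its own character (mojibake).
        System.out.println(decode(raw, "ISO-8859-1").length()); // 5
    }
}
```

  If the browser and the connector disagree in this way, the application receives five characters where the user typed four, and no later call on the request can repair it.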

Change how POST parameters are interpreted

  POST requests should specify the encoding of the parameters and values they send. Since many clients fail to set an explicit encoding, the default is used (ISO-8859-1).

  In many cases this is not the preferred interpretation so one can employ a javax.servlet.Filter to set request encodings. Writing such a filter is trivial. Furthermore Tomcat already comes with such an example filter. Please take a look at:

5.x

 webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
 webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java

6.x

 webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java

5.5.36+, 6.0.36+, 7.x

Since 7.0.20 the filter has been a first-class citizen: it was moved from the examples into core Tomcat and is available to any web application without the need to compile and bundle it separately. See the Tomcat documentation for the list of provided filters. The class name is:

 org.apache.catalina.filters.SetCharacterEncodingFilter

Note:

  The request encoding setting takes effect only if it is applied before the parameters are parsed. Once parsing has happened, there is no way back.

  Parameter parsing is triggered by the first method call that asks for a parameter name or value.

  Make sure that the filter is positioned before any other filters that ask for request parameters.

  The positioning depends on the order of the filter-mapping declarations in the WEB-INF/web.xml file, though since the Servlet 3.0 specification there are additional options to control the order.

  To check the actual order you can throw an Exception from your page and check its stack trace for filter names.
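
  Why ordering matters can be shown with a toy model. The class below is hypothetical and is not how Tomcat implements requests; it only imitates the behaviour described in the notes above: parameters are parsed lazily on first access, with whatever encoding is set at that moment, and the result is cached.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a servlet request; an illustration only, not Tomcat code.
class LazyParamRequest {
    private final String rawBody;
    private String encoding = "ISO-8859-1"; // default when nothing is specified
    private Map<String, String> params;     // stays null until first access

    LazyParamRequest(String rawBody) {
        this.rawBody = rawBody;
    }

    void setCharacterEncoding(String encoding) {
        this.encoding = encoding;
    }

    // The first call parses the body; later calls reuse the cached result.
    String getParameter(String name) {
        if (params == null) {
            params = parse();
        }
        return params.get(name);
    }

    private Map<String, String> parse() {
        Map<String, String> result = new HashMap<>();
        for (String pair : rawBody.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip fragments without a value
            }
            try {
                result.put(URLDecoder.decode(pair.substring(0, eq), encoding),
                           URLDecoder.decode(pair.substring(eq + 1), encoding));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Encoding set before the first parameter access: decoded as UTF-8.
        LazyParamRequest early = new LazyParamRequest("mydata=caf%C3%A9");
        early.setCharacterEncoding("UTF-8");
        System.out.println(early.getParameter("mydata").length()); // 4

        // Something reads a parameter first, so parsing uses the default.
        LazyParamRequest late = new LazyParamRequest("mydata=caf%C3%A9");
        late.getParameter("anything");      // triggers parsing as ISO-8859-1
        late.setCharacterEncoding("UTF-8"); // too late, the result is cached
        System.out.println(late.getParameter("mydata").length());  // 5
    }
}
```

  In the second case, the early getParameter call plays the role of a filter or valve that touches request parameters before the encoding filter runs.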

What can you recommend to just make everything work? (How to use UTF-8 everywhere)

  Using UTF-8 as your character encoding for everything is a safe bet. This should work for pretty much every situation.

  In order to completely switch to using UTF-8, you need to make the following changes:

    1. Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP Connector, AJP Connector.

    2. Use a character encoding filter with the default encoding set to UTF-8.

    3. Change all your JSPs to include the charset name in their contentType.

      For example, use <%@page contentType="text/html; charset=UTF-8" %> for the usual JSP pages and <jsp:directive.page contentType="text/html; charset=UTF-8" /> for the pages in XML syntax (aka JSP Documents).

    4. Change all your servlets to set the response content type and to specify UTF-8 as the charset in it.

      Use response.setContentType("text/html; charset=UTF-8") or response.setCharacterEncoding("UTF-8").

    5. Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate.

    6. Disable any valves or filters that may read request parameters before your character encoding filter or JSP page has a chance to set the encoding to UTF-8. For more information see http://www.mail-archive.com/users@tomcat.apache.org/msg21117.html.

How can I test if my configuration will work correctly?

  The following sample JSP should work on a clean Tomcat install for any input. If you set URIEncoding="UTF-8" on the connector, it will also work with method="GET".

<%@ page contentType="text/html; charset=UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Character encoding test page</title>
</head>
<body>
<p>Data posted to this form was:
<%
request.setCharacterEncoding("UTF-8");
out.print(request.getParameter("mydata"));
%> </p>
<form method="POST" action="index.jsp">
<input type="text" name="mydata">
<input type="submit" value="Submit" />
<input type="reset" value="Reset" />
</form>
</body>
</html>

How can I send higher (non-ASCII) characters in my HTTP headers?

  You have to encode them in some way before you insert them into a header. URL-encoding (each byte written as % followed by its two hexadecimal digits) is a good choice.
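
  One way to do this with the JDK is sketched below. The class name HeaderValueCodec is made up, and the choice of UTF-8 as the byte encoding before percent-escaping is an assumption; whatever reads the header has to apply the same convention. Note that java.net.URLEncoder targets HTML form data and writes a space as "+", so the sketch rewrites that to "%20".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper; sender and receiver must agree on this convention.
class HeaderValueCodec {

    // Percent-encode the value so that only ASCII characters remain.
    static String encode(String value) {
        try {
            // URLEncoder emits '+' for a space; use %20 so the value is unambiguous.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Reverse of encode(): turn the %XX sequences back into characters.
    static String decode(String headerValue) {
        try {
            return URLDecoder.decode(headerValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 au lait"; // "cafe" with e-acute
        String encoded = encode(original);
        System.out.println(encoded);                           // caf%C3%A9%20au%20lait
        System.out.println(decode(encoded).equals(original));  // true
    }
}
```

  The encoded value could then be passed to something like response.setHeader, with the receiving side calling the matching decode step.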

ref:  http://wiki.apache.org/tomcat/FAQ/CharacterEncoding