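Garbled Special Characters in Servlet Output: Why the Percent Sign Displays Incorrectly, with Solutions, Encoding Techniques, and Fixes for Common Errors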

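Introduction

In Java Web development, Servlets are a core technology and routinely produce text output. Special characters such as the percent sign (%) are a frequent source of garbled or unexpected output, which hurts the user experience and can break functionality. This article analyzes why special characters in Servlet output get garbled, with particular attention to the percent sign, and presents practical solutions, encoding techniques, and fixes for common mistakes.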

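Servlet Character Encoding Basics

Character Encoding Concepts

A character encoding maps each character of a character set to a concrete representation (for example a bit pattern, a sequence of natural numbers, octets, or electrical pulses) so that text can be stored in a computer and transmitted over a network. Common character encodings include ASCII, ISO-8859-1, GB2312, and UTF-8.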
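As a small illustration of the paragraph above, the following sketch prints the bytes that three of these encodings produce for the percent sign and for one Chinese character. The class name EncodingBytesDemo and the helper hex are invented for this example, and the source is kept ASCII-only (the Chinese character is written as the escape \u6298) so that it compiles regardless of the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBytesDemo {
    // Returns the encoded bytes of text as space-separated uppercase hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '%' is an ASCII character: the same single byte in all three encodings.
        System.out.println(hex("%", StandardCharsets.US_ASCII));        // 25
        System.out.println(hex("%", StandardCharsets.ISO_8859_1));      // 25
        System.out.println(hex("%", StandardCharsets.UTF_8));           // 25
        // The CJK character U+6298 needs three bytes in UTF-8 ...
        System.out.println(hex("\u6298", StandardCharsets.UTF_8));      // E6 8A 98
        // ... and has no ISO-8859-1 representation, so Java substitutes '?' (0x3F).
        System.out.println(hex("\u6298", StandardCharsets.ISO_8859_1)); // 3F
    }
}
```

The percent sign is therefore never garbled by a charset mismatch on its own; what gets garbled is the non-ASCII text around it.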

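Encodings Involved in a Servlet

A Servlet application mainly deals with the following encodings:

Request encoding: the encoding used to decode the request body sent by the client. Set it with request.setCharacterEncoding(), which must be called before any request parameter is read.
Response encoding: the encoding the server uses for the response body. Set it with response.setCharacterEncoding(), before calling response.getWriter().
JVM default encoding: the default encoding of the Java Virtual Machine, available through System.getProperty("file.encoding").
Container encoding: the encoding used by the Web container (such as Tomcat), usually set in its configuration files.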
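Because the JVM default encoding differs between machines, a common precaution is to pass the charset explicitly whenever text is converted to bytes. The sketch below contrasts the two; the class name DefaultCharsetDemo and the helper utf8Length are made up for illustration, and the first three printed values depend on the environment.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Byte length of text in an explicitly chosen charset; independent of JVM settings.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "\u6298\u6263\u7387: 50%" is the Chinese sample text "discount rate: 50%".
        String sample = "\u6298\u6263\u7387: 50%";
        // Environment dependent: varies with OS, locale, and Java version.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("default bytes   = " + sample.getBytes().length);
        // Always 14: three CJK characters at 3 bytes each plus 5 ASCII bytes.
        System.out.println("UTF-8 bytes     = " + utf8Length(sample));
    }
}
```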

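How a Servlet Processes Characters

When a Servlet outputs text, the data typically passes through these steps:

Obtain the string data (from a database, a file, or user input).
Process the string in memory (Java strings use UTF-16 internally).
Write the string to the response writer, which converts it to bytes in the response encoding.
The container sends the bytes to the client, and the browser decodes them using the charset declared in the Content-Type header (or its own guess if none is declared).

If the encoding is wrong at any of these steps, characters can be displayed incorrectly.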
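The last two steps can be simulated without a container. The following sketch is only a model under that assumption: ResponseEncodingDemo and roundTrip are hypothetical names, and a real container performs the encoding inside the response writer rather than through String.getBytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ResponseEncodingDemo {
    // Encodes text as the server would (serverCharset) and decodes the
    // resulting bytes as the browser would (browserCharset).
    static String roundTrip(String text, Charset serverCharset, Charset browserCharset) {
        byte[] wire = text.getBytes(serverCharset);
        return new String(wire, browserCharset);
    }

    public static void main(String[] args) {
        // "\u6298\u6263\u7387: 50%" is the Chinese sample text "discount rate: 50%".
        String text = "\u6298\u6263\u7387: 50%";

        // Matching charsets: the text arrives intact.
        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(text));         // true

        // Mismatch: the Chinese characters are garbled, but ": 50%" is ASCII and survives.
        String garbled = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(text));      // false
        System.out.println(garbled.endsWith(": 50%")); // true
    }
}
```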

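Why the Percent Sign Displays Incorrectly

The percent sign (%) has a special meaning in Web applications: it is the escape character of URL encoding (percent-encoding), where it introduces the encoded form of a special character. A space, for example, is encoded as "%20". The percent sign has no special meaning in an HTML response body, and a Servlet can print it there unchanged. Problems arise when a string containing % travels through a URL or a form parameter, where the % may be interpreted as the start of an escape sequence.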

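Cause 1: Automatic URL Decoding

The Servlet container automatically URL-decodes request URLs and parameters. If a percent sign in such a value was not encoded as "%25", the container tries to interpret the two characters after it as a hexadecimal number, so the value the Servlet later reads and prints differs from what was intended.

For example, if a parameter value contains "%20", the container decodes it to a space, and the Servlet receives a space rather than the literal text "%20".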
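The decoding rule can be tried out with java.net.URLDecoder, which applies the same percent-decoding scheme to a string. This is an approximation of what a container does with request parameters, not the container's own code; PercentDecodingDemo and its methods are names chosen for this sketch, and the Charset overload of decode requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class PercentDecodingDemo {
    // Percent-decodes a raw query-string value as UTF-8.
    static String decode(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    // True when raw contains a '%' that is not followed by two hex digits.
    static boolean isMalformed(String raw) {
        try {
            decode(raw);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("50%20off"));   // 50 off
        System.out.println(decode("50%25"));      // 50%
        System.out.println(isMalformed("50%"));   // true
        System.out.println(isMalformed("50%zz")); // true
    }
}
```

How a container reacts to a malformed value (rejecting the request, dropping the parameter, or something else) depends on the container and its configuration.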

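Cause 2: Incorrect Response Content Type

If the Content-Type of the response does not declare the right character encoding, the response may be encoded or parsed with the wrong charset, and special characters are displayed incorrectly.

For example, if the header is Content-Type: text/html with no charset, the writer returned by response.getWriter() uses ISO-8859-1 by default. That encoding cannot represent Chinese and many other characters, so they are sent as "?". The percent sign itself is an ASCII character and is not affected; the text around it is.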

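Cause 3: Repeated Encoding or Decoding

Sometimes a string gets encoded or decoded more than once, and the original characters are mangled. For example, a percent sign is URL-encoded and then HTML-encoded as well, or the other way around.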
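A short sketch of what such layering does to the string "50%". The class LayeredEncodingDemo and the helper htmlEscapePercent are invented for this illustration; URLEncoder.encode with a Charset argument requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class LayeredEncodingDemo {
    static String urlEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Replaces each '%' with its decimal HTML character reference.
    static String htmlEscapePercent(String value) {
        return value.replace("%", "&#37;");
    }

    public static void main(String[] args) {
        String original = "50%";
        // URL-encoding twice escapes the '%' produced by the first pass again.
        System.out.println(urlEncode(original));            // 50%25
        System.out.println(urlEncode(urlEncode(original))); // 50%2525
        // URL-encoding and then HTML-escaping text meant for the page body:
        // a browser would render this as "50%25" instead of "50%".
        System.out.println(htmlEscapePercent(urlEncode(original))); // 50&#37;25
    }
}
```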

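Cause 4: Inconsistent Encodings Across Components

In a complex Web application, different components (Servlets, filters, JSPs, JavaScript, and so on) may use different encodings, so data is converted incorrectly as it passes between them.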

Solutions

Solution 1: Set the Response Encoding Correctly

A Servlet should set the character encoding of the response explicitly, so that the browser decodes the content with the right charset.

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Set the content type and encoding before obtaining the writer
    response.setContentType("text/html;charset=UTF-8");
    // Or set them separately:
    // response.setContentType("text/html");
    // response.setCharacterEncoding("UTF-8");

    PrintWriter out = response.getWriter();
    out.println("<html>");
    out.println("<head><title>Special character test</title></head>");
    out.println("<body>");
    out.println("Percent sign examples: %20 %25 %40");
    out.println("</body>");
    out.println("</html>");
}

Solution 2: URL-Encode the Percent Sign

To pass a percent sign in a URL or in form data, URL-encode the value so that the percent sign becomes "%25".

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    String originalString = "折扣率: 50%"; // "Discount rate: 50%"

    // URL-encode the whole value, including the percent sign (java.net.URLEncoder)
    String encodedString = URLEncoder.encode(originalString, "UTF-8");

    out.println("<html>");
    out.println("<head><title>URL encoding example</title></head>");
    out.println("<body>");
    out.println("Original string: " + originalString + "<br>");
    out.println("URL-encoded: " + encodedString + "<br>");
    // Use the encoded string in a URL
    out.println("<a href='nextPage?data=" + encodedString + "'>Link</a>");
    out.println("</body>");
    out.println("</html>");
}
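To make the result concrete, this standalone sketch prints what URLEncoder produces for the sample string used in the Servlet above. UrlEncoderOutputDemo is a name invented for the example, and the Charset overload assumes Java 10 or later (the Servlet code uses the older String overload, which behaves the same for UTF-8).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UrlEncoderOutputDemo {
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Chinese sample text "discount rate: 50%", written with escapes
        // so that this file stays ASCII-only.
        System.out.println(encode("\u6298\u6263\u7387: 50%"));
        // prints %E6%8A%98%E6%89%A3%E7%8E%87%3A+50%25

        // A literal "%20" in the data is itself escaped, so it survives decoding.
        System.out.println(encode("%20")); // %2520
    }
}
```

Note that URLEncoder produces the application/x-www-form-urlencoded form, in which a space becomes "+" rather than "%20". That suits query-string parameters, which is how the Servlet above uses it.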

On the receiving side, request.getParameter() already returns the URL-decoded value, so it must not be passed through URLDecoder again. A second decode would throw an IllegalArgumentException for a value that ends in a percent sign, such as "折扣率: 50%".

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    // The container has already URL-decoded the parameter
    String data = request.getParameter("data");

    out.println("<html>");
    out.println("<head><title>URL decoding example</title></head>");
    out.println("<body>");
    // In production code, HTML-escape user input before printing it
    out.println("Received parameter: " + data + "<br>");
    out.println("</body>");
    out.println("</html>");
}
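The effect of decoding a parameter a second time can be reproduced outside a container. In this sketch the first call to decode stands in for the decoding the container is assumed to have done before getParameter() returns; DoubleDecodingDemo is a hypothetical class name, and the Charset overload of URLDecoder.decode requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DoubleDecodingDemo {
    static String decode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The text "1+1" as it appears percent-encoded in a query string.
        String onTheWire = "1%2B1";
        // First decode: the value the Servlet would receive.
        String parameter = decode(onTheWire);
        System.out.println(parameter);         // 1+1
        // Second, unnecessary decode: '+' is silently turned into a space.
        System.out.println(decode(parameter)); // 1 1

        // A decoded value that ends in '%' cannot be decoded again at all.
        try {
            decode(decode("50%25"));
        } catch (IllegalArgumentException e) {
            System.out.println("second decode failed");
        }
    }
}
```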

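Solution 3: Use HTML Entity Encoding

In HTML content, special characters can be written as HTML entities. The numeric character references for the percent sign are &#37; (decimal) and &#x25; (hexadecimal). HTML does not require the percent sign to be escaped, so this step is optional for %, unlike for <, >, and &.

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    String originalString = "折扣率: 50%"; // "Discount rate: 50%"

    // Replace the percent sign with its HTML character reference
    String htmlEncodedString = originalString.replace("%", "&#37;");

    out.println("<html>");
    out.println("<head><title>HTML entity encoding example</title></head>");
    out.println("<body>");
    out.println("Original string: " + originalString + "<br>");
    out.println("HTML entity encoded: " + htmlEncodedString + "<br>");
    out.println("</body>");
    out.println("</html>");
}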

Solution 4: Handle It in JavaScript

To process a string containing a percent sign in client-side JavaScript, use JavaScript's own encoding functions.

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    out.println("<html>");
    out.println("<head><title>JavaScript encoding example</title></head>");
    out.println("<body>");
    out.println("<div id='display'></div>");
    out.println("<script>");
    out.println("var str = '折扣率: 50%';");
    out.println("// Encode with encodeURIComponent");
    out.println("var encodedStr = encodeURIComponent(str);");
    out.println("document.getElementById('display').innerHTML = 'Original string: ' + str + '<br>';");
    out.println("document.getElementById('display').innerHTML += 'Encoded: ' + encodedStr + '<br>';");
    out.println("// Decode with decodeURIComponent");
    out.println("var decodedStr = decodeURIComponent(encodedStr);");
    out.println("document.getElementById('display').innerHTML += 'Decoded: ' + decodedStr + '<br>';");
    out.println("</script>");
    out.println("</body>");
    out.println("</html>");
}

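Solution 5: Set the Encoding Centrally with a Filter

To avoid setting the encoding in every Servlet, a filter can set the request and response encoding in one place.

import java.io.IOException;
import javax.servlet.*;  // jakarta.servlet.* on Jakarta EE 9+
import javax.servlet.annotation.WebFilter;

@WebFilter("/*")
public class EncodingFilter implements Filter {
    private String encoding = "UTF-8";

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        String encodingParam = filterConfig.getInitParameter("encoding");
        if (encodingParam != null) {
            encoding = encodingParam;
        }
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        request.setCharacterEncoding(encoding);
        response.setCharacterEncoding(encoding);
        // No content type is forced here: the filter is mapped to /* and
        // therefore also runs for CSS, JSON, images, and other non-HTML responses.
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // Cleanup code
    }
}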

Alternatively, declare the filter in web.xml. Use either the @WebFilter annotation or the web.xml declaration, not both, otherwise the filter is registered twice:

<filter>
    <filter-name>EncodingFilter</filter-name>
    <filter-class>com.example.EncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>EncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

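Encoding Techniques

Technique 1: Use UTF-8 Everywhere

Using UTF-8 consistently throughout the application avoids most encoding problems. UTF-8 is a variable-length encoding of Unicode (one to four bytes per character) that can represent every Unicode character and therefore practically all of the world's written languages.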
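The variable length is easy to observe. In this minimal sketch (Utf8LengthDemo is an invented name), the characters are written as Unicode escapes so that the source file itself is plain ASCII:

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Bytes("%"));            // 1 (ASCII)
        System.out.println(utf8Bytes("\u00E9"));       // 2 (Latin small letter e with acute)
        System.out.println(utf8Bytes("\u6298"));       // 3 (a CJK character)
        System.out.println(utf8Bytes("\uD83D\uDE00")); // 4 (an emoji, U+1F600)
    }
}
```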

// In the Servlet: use UTF-8 for both request and response
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
response.setContentType("text/html;charset=UTF-8");

// In the database connection: use UTF-8
String url = "jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8";
Connection conn = DriverManager.getConnection(url, username, password);

In the JSP page, set UTF-8 as well:

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

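Technique 2: Use try-with-resources (Java 7+) for Streams

When writing to an output stream, try-with-resources guarantees that the stream is closed and prevents resource leaks. For the writer obtained from response.getWriter() this is optional, because the container closes the response when the request completes; closing the writer earlier simply commits the response at that point.

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");

    // try-with-resources closes the PrintWriter when the block ends
    try (PrintWriter out = response.getWriter()) {
        out.println("<html>");
        out.println("<head><title>Resource management example</title></head>");
        out.println("<body>");
        out.println("折扣率: 50%");
        out.println("</body>");
        out.println("</html>");
    }
}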

Technique 3: Use StringBuilder for Heavy String Concatenation

When a lot of text containing special characters has to be assembled, especially in a loop, StringBuilder avoids creating a new String object on every concatenation.

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    StringBuilder sb = new StringBuilder();
    sb.append("<html>");
    sb.append("<head><title>StringBuilder example</title></head>");
    sb.append("<body>");

    // Append many lines that contain a percent sign
    for (int i = 0; i < 100; i++) {
        sb.append("折扣率: ").append(i).append("%<br>");
    }

    sb.append("</body>");
    sb.append("</html>");
    out.println(sb);
}

Technique 4: Put the Encoding Logic in a Utility Class

A utility class that wraps the common encoding operations improves code reuse.

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class EncodingUtils {
    private static final String DEFAULT_ENCODING = "UTF-8";

    /** URL-encodes a string. */
    public static String urlEncode(String str) {
        try {
            return URLEncoder.encode(str, DEFAULT_ENCODING);
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Encoding failed: " + e.getMessage(), e);
        }
    }

    /** URL-decodes a string. */
    public static String urlDecode(String str) {
        try {
            return URLDecoder.decode(str, DEFAULT_ENCODING);
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Decoding failed: " + e.getMessage(), e);
        }
    }

    /** Escapes a string for use in HTML content. */
    public static String htmlEncode(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            switch (c) {
                case '%':  sb.append("&#37;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break; // &apos; is not defined in HTML 4
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Escapes only the percent sign, choosing the form by context.
     * For a complete URL parameter value, use urlEncode instead, because
     * other characters (spaces, non-ASCII text) need encoding as well.
     */
    public static String handlePercentSign(String str, boolean forUrl) {
        if (str == null) {
            return null;
        }
        if (forUrl) {
            // For URLs or parameters
            return str.replace("%", "%25");
        } else {
            // For HTML content
            return str.replace("%", "&#37;");
        }
    }
}

Using the utility class in a Servlet:

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    String discount = "折扣率: 50%"; // "Discount rate: 50%"

    out.println("<html>");
    out.println("<head><title>Utility class example</title></head>");
    out.println("<body>");

    // In HTML content
    out.println("HTML content: " + EncodingUtils.htmlEncode(discount) + "<br>");

    // In a URL: encode the whole value, not only the percent sign
    String urlParam = EncodingUtils.urlEncode(discount);
    out.println("<a href='nextPage?discount=" + urlParam + "'>Link</a><br>");

    // Show the URL-encoded form as text
    out.println("URL-encoded: " + urlParam + "<br>");

    out.println("</body>");
    out.println("</html>");
}

Technique 5: Handle Special Characters with JSTL and EL

In JSP pages, JSTL and EL expressions can take care of special characters, so that no escaping has to be done by hand in Java code.

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<%@ taglib prefix="fn" uri="http://java.sun.com/jsp/jstl/functions" %>
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>JSTL example</title>
</head>
<body>
    <%
        // Set an attribute that contains a percent sign
        request.setAttribute("discount", "折扣率: 50%");
    %>

    <!-- c:out escapes XML/HTML special characters automatically -->
    <p>Using c:out: <c:out value="${discount}" /></p>

    <!-- fn:escapeXml performs the same escaping as a function -->
    <p>Using fn:escapeXml: ${fn:escapeXml(discount)}</p>

    <!-- In a URL: c:param URL-encodes the parameter value -->
    <c:url value="nextPage" var="nextUrl">
        <c:param name="discount" value="${discount}" />
    </c:url>
    <p><a href="${nextUrl}">Link</a></p>
</body>
</html>

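Common Errors and How to Fix Them

Error 1: Response Encoding Not Set

Problem: the character encoding of the response is not set, so the default encoding is used and special characters come out garbled.

Incorrect code:

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // No encoding is set
    PrintWriter out = response.getWriter();
    out.println("折扣率: 50%"); // the Chinese text may be garbled
}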

Fix: always set the character encoding of the response.

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Set the response encoding
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();
    out.println("折扣率: 50%"); // displayed correctly
}

Error 2: Confusing URL Encoding with HTML Encoding

Problem: URL encoding is applied to HTML content, or HTML encoding is applied to URL parameters, and the characters are displayed incorrectly.

Incorrect code:

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    String discount = "折扣率: 50%";

    // Wrong: URL encoding used for HTML content
    String wrongEncoded = URLEncoder.encode(discount, "UTF-8");

    out.println("<html>");
    out.println("<head><title>Incorrect example</title></head>");
    out.println("<body>");
    out.println(wrongEncoded); // shows the encoded string instead of the original text
    out.println("</body>");
    out.println("</html>");
}

Fix: choose the encoding that matches the context.

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    String discount = "折扣率: 50%";

    // Correct: HTML entity encoding for HTML content
    String htmlEncoded = discount.replace("%", "&#37;");

    // Correct: URL encoding for URL parameters
    String urlEncoded = URLEncoder.encode(discount, "UTF-8");

    out.println("<html>");
    out.println("<head><title>Correct example</title></head>");
    out.println("<body>");
    out.println("HTML content: " + htmlEncoded + "<br>");
    out.println("<a href='nextPage?discount=" + urlEncoded + "'>Link</a>");
    out.println("</body>");
    out.println("</html>");
}

Error 3: Repeated Encoding or Decoding

Problem: a string is encoded or decoded more than once, and the original content is mangled.

Incorrect code:

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    String discount = "折扣率: 50%";

    // Wrong: URL-encoded twice
    String encodedOnce = URLEncoder.encode(discount, "UTF-8");
    String encodedTwice = URLEncoder.encode(encodedOnce, "UTF-8");

    out.println("<html>");
    out.println("<head><title>Double encoding error</title></head>");
    out.println("<body>");
    out.println("Encoded once: " + encodedOnce + "<br>");
    out.println("Encoded twice: " + encodedTwice + "<br>"); // wrong result
    out.println("</body>");
    out.println("</html>");
}

Fix: make sure every string is encoded exactly once and decoded exactly once.

protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    String discount = "折扣率: 50%";

    // Correct: encode once
    String encoded = URLEncoder.encode(discount, "UTF-8");

    // Simulate the single decode on the receiving side
    String decoded = URLDecoder.decode(encoded, "UTF-8");

    out.println("<html>");
    out.println("<head><title>Correct encoding and decoding</title></head>");
    out.println("<body>");
    out.println("Original string: " + discount + "<br>");
    out.println("Encoded: " + encoded + "<br>");
    out.println("Decoded: " + decoded + "<br>");
    out.println("</body>");
    out.println("</html>");
}

Error 4: Inconsistent Encoding Settings

Problem: different parts of the application use different encodings, so data is converted incorrectly as it is passed along.

Incorrect code:

// The Servlet declares UTF-8
protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    String discount = "折扣率: 50%";

    // Wrong: ISO-8859-1 cannot represent the Chinese characters
    byte[] bytes = discount.getBytes("ISO-8859-1");
    String wrongString = new String(bytes, "ISO-8859-1");

    out.println("<html>");
    out.println("<head><title>Inconsistent encoding error</title></head>");
    out.println("<body>");
    out.println(wrongString); // the Chinese characters appear as "?"
    out.println("</body>");
    out.println("</html>");
}

Fix: use UTF-8 consistently throughout the application.

// The Servlet declares UTF-8
protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();

    String discount = "折扣率: 50%";

    // Correct: UTF-8 for both directions of the conversion
    byte[] bytes = discount.getBytes("UTF-8");
    String correctString = new String(bytes, "UTF-8");

    out.println("<html>");
    out.println("<head><title>Consistent encoding</title></head>");
    out.println("<body>");
    out.println(correctString); // displayed correctly
    out.println("</body>");
    out.println("</html>");
}

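Error 5: Ignoring the Container Configuration

Problem: the encoding configuration of the Web container (such as Tomcat) is ignored, so request parameters are decoded incorrectly.

Fix: set URIEncoding in the container's configuration.

For Tomcat, set it on the Connector in server.xml (Tomcat 8 and later already default to UTF-8 for this attribute; Tomcat 7 defaults to ISO-8859-1):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />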

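URIEncoding is an attribute of the Connector only. A Parameter element in context.xml, as in the following snippet, merely defines a context initialization parameter named URIEncoding and does not change how request URIs are decoded:

<Context>
    <Parameter name="URIEncoding" value="UTF-8" override="false"/>
</Context>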
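To see why the charset used for URI decoding matters, the sketch below decodes the same raw query-string value with two different charsets. URLDecoder stands in for the container's decoder here, which is an assumption made for illustration; UriEncodingDemo and decodeWith are invented names, and the Charset overload of URLDecoder.decode requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UriEncodingDemo {
    // Decodes a raw query-string value with the given charset, standing in
    // for the charset the container uses to decode request URIs.
    static String decodeWith(String raw, Charset uriEncoding) {
        return URLDecoder.decode(raw, uriEncoding);
    }

    public static void main(String[] args) {
        // The CJK character U+6298, a space, and "50%", as a UTF-8 client sends them.
        String raw = "%E6%8A%98+50%25";

        String utf8 = decodeWith(raw, StandardCharsets.UTF_8);
        String latin1 = decodeWith(raw, StandardCharsets.ISO_8859_1);

        System.out.println(utf8.equals("\u6298 50%")); // true
        // Three garbled characters instead of one, followed by the intact " 50%".
        System.out.println(latin1.length());           // 7
        System.out.println(latin1.endsWith(" 50%"));   // true
    }
}
```

With either charset the "%25" escape decodes to a percent sign; only the non-ASCII part of the parameter depends on the setting.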

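Summary

When handling special characters in a Servlet, the percent sign in particular, keep the following points in mind:

Use one encoding: use UTF-8 consistently across the application, including Servlets, JSPs, database connections, and the container configuration.
Set the response encoding: always set the character encoding of the response in the Servlet, so that the browser decodes the content correctly.
Distinguish the kinds of encoding: choose the encoding by context, URL encoding for URL parameters and HTML entity encoding for HTML content.
Avoid repeated encoding: encode each string exactly once and decode it exactly once.
Use a utility class: wrap the common encoding operations in a utility class for reuse and consistency.
Use a filter: set the request and response encoding centrally in a filter instead of repeating it in every Servlet.
Check the container configuration: set the correct encoding in the Web container's configuration so that request parameters are decoded correctly.

Following these principles and techniques resolves garbled special characters in Servlet output, including incorrect display of the percent sign, and improves the stability and user experience of the Web application.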