Sunday, June 17, 2012

How to Enable UTF-8 Support on Tomcat

Overview:
This is a brief article on enabling support for displaying UTF-8 characters (e.g., Japanese text) on the JSP/HTML pages of a web application running on Tomcat. We can achieve this in THREE easy steps.

To get UTF-8 working with Java and Tomcat on Linux or Windows, the following is required:
  1. Update Tomcat's server.xml
  2. Define a javax.servlet.Filter and Update the Web Application's web.xml
  3. Enable UTF-8 encoding on JSP/HTML
Update Tomcat's server.xml
This handles the GET request URL. With this configuration, the Connector uses UTF-8 to decode the URI of every incoming request, including the query-string parameters of GET requests.

 <Connector   
         . . .  
         URIEncoding="UTF-8"/>  

 http://localhost:8080/foo-app/get?foo_param=こんにちは世界  

e.g.: request.getParameter("foo_param") // the value is decoded as UTF-8, so you get the original string back as it is ("こんにちは世界").
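
To see why the decoding charset matters, here is a minimal standalone sketch (not Tomcat code) that percent-encodes the sample value as UTF-8, the way a browser typically would, and then decodes it with two different charsets. The class name UriEncodingDemo and its helper methods are made up for illustration; only java.net.URLEncoder and java.net.URLDecoder are real JDK APIs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Tomcat or the servlet API.
class UriEncodingDemo {

    // Percent-encodes a value using UTF-8 bytes.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Decodes a percent-encoded value, interpreting the bytes with the given charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "こんにちは世界" written as Unicode escapes so the example does not
        // depend on the encoding of this source file.
        String original = "\u3053\u3093\u306B\u3061\u306F\u4E16\u754C";
        String onTheWire = encodeUtf8(original);

        System.out.println(onTheWire);
        System.out.println(decode(onTheWire, "UTF-8").equals(original));      // true
        System.out.println(decode(onTheWire, "ISO-8859-1").equals(original)); // false
    }
}
```

Decoding with ISO-8859-1 turns every UTF-8 byte into a separate character, which is the kind of garbled text you get when the Connector decodes the URI with the wrong charset.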

IMPORTANT NOTE: This change has NO effect on parameters sent in the body of a POST request.

Define a javax.servlet.Filter and Update the Web Application's web.xml
Now we need to make our web application handle all requests and responses as UTF-8. This way, POST requests are covered as well. For this purpose, we define a character set filter that applies UTF-8 encoding to every request and response, as follows.

 package org.fazlan.tomcat.ext.filter;
  
 import javax.servlet.Filter;  
 import javax.servlet.FilterChain;  
 import javax.servlet.FilterConfig;  
 import javax.servlet.ServletException;  
 import javax.servlet.ServletRequest;  
 import javax.servlet.ServletResponse;  
 import java.io.IOException;  

 /**  
  * Filter that makes the web application handle all requests and responses as UTF-8 by default.  
  * If the browser has not set an encoding on the request, the filter sets it to the configured  
  * encoding (UTF-8 unless overridden by the init parameter).  
  */  
 public class CharacterSetFilter implements Filter {  

   private static final String UTF8 = "UTF-8";  
   private static final String CONTENT_TYPE = "text/html; charset=UTF-8";  
   private String encoding;  

   @Override  
   public void init(FilterConfig config) throws ServletException {  
     // The name must match the <param-name> declared for this filter in web.xml  
     encoding = config.getInitParameter("requestEncoding");  
     if (encoding == null) {  
       encoding = UTF8;  
     }  
   }  

   @Override  
   public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {  
     // Honour the client-specified character encoding  
     if (null == request.getCharacterEncoding()) {  
       request.setCharacterEncoding(encoding);  
     }  
     // Set the default response content type and encoding  
     response.setContentType(CONTENT_TYPE);  
     response.setCharacterEncoding(UTF8);  
     chain.doFilter(request, response);  
   }  

   @Override  
   public void destroy() {  
   }  
 }  

The filter ensures that if the browser has not set an encoding on the request, UTF-8 is used as the default. It also sets UTF-8 as the default response encoding.
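
The fallback rule in doFilter can be illustrated outside a servlet container. The sketch below is a stand-in under stated assumptions, not the filter itself: RequestEncodingDemo, resolve and decodeBody are hypothetical names, and a plain byte array plays the role of a POST body.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class that mimics the filter's "use the client's encoding,
// otherwise fall back to a default" rule on a raw byte array.
class RequestEncodingDemo {

    // Returns the charset declared by the client, or the fallback if none was declared.
    static Charset resolve(String declaredByClient, Charset fallback) {
        return declaredByClient != null ? Charset.forName(declaredByClient) : fallback;
    }

    // Decodes the body bytes with the resolved charset.
    static String decodeBody(byte[] body, String declaredByClient, Charset fallback) {
        return new String(body, resolve(declaredByClient, fallback));
    }

    public static void main(String[] args) {
        String original = "\u3053\u3093\u306B\u3061\u306F"; // "こんにちは"
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        // No encoding declared, ISO-8859-1 fallback: the text comes out garbled.
        System.out.println(decodeBody(body, null, StandardCharsets.ISO_8859_1).equals(original)); // false

        // No encoding declared, UTF-8 fallback (what the filter configures): text is intact.
        System.out.println(decodeBody(body, null, StandardCharsets.UTF_8).equals(original)); // true

        // An encoding declared by the client always wins over the fallback.
        System.out.println(decodeBody(body, "UTF-8", StandardCharsets.ISO_8859_1).equals(original)); // true
    }
}
```

The first case corresponds to a request that reaches the application without the filter, since ISO-8859-1 is the servlet specification's traditional default for request bodies that declare no encoding.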

Now, we need to add this to our web application's web.xml to make it work.
 . . .
 <filter>  
   <filter-name>CharacterSetFilter</filter-name>  
   <filter-class>org.fazlan.tomcat.ext.filter.CharacterSetFilter</filter-class>  
   <init-param>  
     <param-name>requestEncoding</param-name>  
     <param-value>UTF-8</param-value>  
   </init-param>  
 </filter>  
 <filter-mapping>  
   <filter-name>CharacterSetFilter</filter-name>  
   <url-pattern>/*</url-pattern>  
 </filter-mapping>  
 . . .

Enable UTF-8 encoding on JSP/HTML
JSP Pages
Every JSP page that needs to render UTF-8 content must have the following page directive at the top of the page.

 <%@ page contentType="text/html;charset=UTF-8" language="java" pageEncoding="UTF-8" %>  

HTML Pages 
Every HTML page that needs to render UTF-8 content must have the following in its header section.

 <?xml version="1.0" encoding="UTF-8"?>  
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">  
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">  
 <head>  
 <meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8" />  
 ...  
 </head>  
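
These declarations only help if the page files are actually saved as UTF-8. As a rough sanity check, the following sketch uses the JDK's strict CharsetDecoder to test whether a file's bytes are well-formed UTF-8. The class Utf8Check and the method isValidUtf8 are illustrative names, not part of Tomcat or the servlet API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for checking that a JSP/HTML file is well-formed UTF-8.
class Utf8Check {

    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(content) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

Note that a pure ASCII file also passes this check, so a positive result means the file is compatible with a UTF-8 declaration, not that it was deliberately saved as UTF-8.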

Summary: 
The article looked at how to support UTF-8 content in your web application deployed on Tomcat.
