Published on Aug 24 2012 in Java Tomcat

To avoid Java/Tomcat Unicode issues after moving to a new environment, verify the locale settings, especially LC_ALL.

After migrating a complete Tomcat-based site as a cPanel tarball to another host, we lost the ability to download files with Unicode characters in their names. These were mostly static resources such as images.

Every attempt to access such a file resulted in a 404 Not Found error and a corresponding entry in localhost_access_log:

GET /images/396596_%E5%BC%A0%E6%97%A5%E6%B4%B2%E5%8C%97%E9%A9%AC.jpg HTTP/1.1" 404 1174

Access to a file in the same directory whose name contained only ASCII characters was successful.

Everything worked on the old host, and Tomcat had been copied rather than freshly set up, so this was not the common problem of a missing URIEncoding="UTF-8" attribute on the connector in Tomcat's server.xml, which produces similar symptoms. In the end we compared all Java system properties, sorted, from both hosts (diff -Nu oldhost newhost) and found that file.encoding and sun.jnu.encoding differed. Because these properties are derived from the locale in the system environment (the LC_ALL variable in particular), we checked the locale on the problematic account:

    user@tomcat [~]# locale

Well, that's not what we expected. Here is a quick JSP test (list.jsp, see the code below) that lists all files in the ROOT/images directory with links, so that we could quickly test accessibility. It also displays file.encoding and sun.jnu.encoding. On the new host it showed the bad values:

    396596_.jpg /home/tomcat/tomcat/webapps/ROOT/images/396596_.jpg --> exists? false

Accessing the image link results in the Tomcat message below. Note the series of EF BF BD bytes, which is the UTF-8 encoding of U+FFFD, the Unicode replacement character.

    type Status report
    description The requested resource (/396596_%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD.jpg) is not available.
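The repeated %EF%BF%BD triplets can be reproduced in isolation. The sketch below is only an illustration under an assumption: that the misconfigured JVM decoded the UTF-8 file name bytes as US-ASCII, which is what a POSIX/C locale would imply. The class name ReplacementCharDemo and the method mangle are hypothetical and not part of the original setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Decode UTF-8 file name bytes with the wrong charset (assumed: US-ASCII),
    // then percent-encode the resulting string the way a URL would carry it.
    static String mangle(String name) {
        byte[] raw = name.getBytes(StandardCharsets.UTF_8);
        // Every byte >= 0x80 is invalid in US-ASCII and becomes U+FFFD
        String decoded = new String(raw, StandardCharsets.US_ASCII);
        try {
            return URLEncoder.encode(decoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+5F20 is the first character of the file name from the access log;
        // its UTF-8 form is the three bytes E5 BC A0.
        System.out.println(mangle("\u5f20.jpg"));
        // prints %EF%BF%BD%EF%BF%BD%EF%BF%BD.jpg
    }
}
```

Each of the three bytes of the character turns into one replacement character, so a five-character Chinese name yields fifteen %EF%BF%BD groups, which matches the count in the Tomcat message.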

Appending -Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 to JAVA_OPTS does not help. You can verify the effective values with java ShowSystemProperties | grep encoding (code below). The working solution is to add export LC_ALL="en_US.UTF-8" to the environment (e.g. in ~/.bashrc), log in again or re-read the environment, check the locale output, and restart Tomcat. Setting LANG instead of LC_ALL seems to work fine too. Now list.jsp reports:

    396596_.jpg /home/tomcat/tomcat/webapps/ROOT/images/396596_.jpg --> exists? true

and the image displays correctly when the link is clicked.
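To confirm after a restart that the JVM really picked up the new locale, a small standalone check can print the relevant environment variables next to the encoding properties. This is a minimal sketch, not part of the original setup: the class name EncodingCheck and the helper isUtf8 are hypothetical, and it has to be run from the same account and environment that starts Tomcat to be meaningful.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    // True when the given charset name (or one of its aliases) resolves to UTF-8
    static boolean isUtf8(String name) {
        if (name == null) {
            return false;
        }
        try {
            return Charset.forName(name).equals(StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names
            return false;
        }
    }

    public static void main(String[] args) {
        // Locale variables as seen by this JVM process (null if unset)
        for (String var : new String[] {"LC_ALL", "LANG"}) {
            System.out.println(var + "=" + System.getenv(var));
        }
        // Encoding properties the JVM derived at startup
        for (String prop : new String[] {"file.encoding", "sun.jnu.encoding"}) {
            String value = System.getProperty(prop);
            System.out.println(prop + "=" + value + (isUtf8(value) ? "" : "  <-- not UTF-8"));
        }
    }
}
```

Running it once before and once after exporting LC_ALL makes the difference between the two environments visible without deploying anything to Tomcat.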

Contents of list.jsp:

<%@page import="java.io.*" %>
<%@page contentType="text/html;charset=UTF-8"%>
<%
// Resolve the on-disk location of the webapp's images directory
ServletContext servletContext = getServletContext();
String contextPath = servletContext.getRealPath(File.separator);
contextPath = contextPath + "/images";
// Show the encodings the JVM picked up from the environment
out.println("file.encoding=" + System.getProperty("file.encoding") + "<br/>");
out.println("sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding") + "<br/>");
out.println("file.encoding.pkg=" + System.getProperty("file.encoding.pkg") + "<br/>");
File f = new File(contextPath);
String[] children = f.list();
if (children != null) {
    for (int i = 0; i < children.length; i++) {
        String filename = children[i];
        // Link to each file and report whether the JVM can find it on disk
        out.print("<a href='" + filename + "'>" + filename + "</a>&nbsp;");
        File cf = new File(contextPath + "/" + filename);
        out.println(cf.getAbsolutePath() + " --> <b>exists?</b> " + cf.exists() + "<br/>");
    }
}
%>

Contents of ShowSystemProperties.java:

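import java.util.Properties;

public class ShowSystemProperties {
    public static void main(String[] args) {
        // Get all system properties
        Properties props = System.getProperties();
        // Print one key=value pair per line so the output can be sorted, diffed or grepped
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}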