How locale settings can break Unicode/UTF-8 in Java/Tomcat

To avoid Java/Tomcat Unicode issues after moving to a new environment, verify the locale settings, especially LC_ALL.

After migrating a complete Tomcat-based site as a cPanel tarball to another host, we lost the ability to download files with Unicode characters in their names. These were mostly static resources (images).

Every attempt to access such a file resulted in a 404 "Not found" error and a corresponding entry in localhost_access_log:

GET /images/396596_%E5%BC%A0%E6%97%A5%E6%B4%B2%E5%8C%97%E9%A9%AC.jpg HTTP/1.1" 404 1174

Access to a file in the same directory whose name contained only ASCII characters was successful.
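
The percent-encoded path in the log entry is ordinary UTF-8, so the request itself was well formed. As a quick sanity check, a minimal sketch like the following decodes it; the class and method names are invented for this illustration and are not part of the site's code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeRequestPath {
    // Decode a percent-encoded path segment, interpreting the bytes as UTF-8.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String logged = "396596_%E5%BC%A0%E6%97%A5%E6%B4%B2%E5%8C%97%E9%A9%AC.jpg";
        String decoded = decodeUtf8(logged);
        // Print non-ASCII characters as code points so the result stays
        // readable even on a terminal that only handles ASCII.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            sb.append(c < 128 ? String.valueOf(c) : String.format("<U+%04X>", (int) c));
        }
        System.out.println(sb);
    }
}
```

The five decoded code points are the five Chinese characters of the file name, which points at the server side rather than at the browser or the link.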

As everything was working on the old host and Tomcat was just copied (not freshly set up), this was not the common issue of a missing URIEncoding="UTF-8" connector attribute in Tomcat's server.xml, which produces similar symptoms. In the end we resorted to comparing all (sorted) Java system properties from both hosts (diff -Nu oldhost newhost) and discovered that file.encoding and sun.jnu.encoding differed. The JVM derives these properties at startup from the locale in the process environment (LC_ALL, which takes precedence over LC_CTYPE and LANG), so we checked the locale on the problematic account:

user@tomcat [~]# locale
LANG=
LC_CTYPE="POSIX"
...
LC_IDENTIFICATION="POSIX"
LC_ALL=
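
For reference, POSIX resolves the character-type locale from the first non-empty value of LC_ALL, LC_CTYPE and LANG, and falls back to POSIX when none is set. The sketch below only mirrors that lookup order in Java for illustration; the JVM does the real lookup in native code, and the class and method names here are hypothetical.

```java
import java.util.Map;

public class EffectiveCtype {
    // Return the first non-empty value of LC_ALL, LC_CTYPE, LANG (in that
    // order), or "POSIX" if none of them is set.
    static String effectiveCtype(Map<String, String> env) {
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "POSIX";
    }

    public static void main(String[] args) {
        // Uses the environment of the current process, e.g. the Tomcat user's shell.
        System.out.println("effective LC_CTYPE: " + effectiveCtype(System.getenv()));
    }
}
```

With all three variables empty, as in the output above, the result is POSIX, which implies a plain ASCII character set.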

That output, with LANG and LC_ALL empty and every category falling back to POSIX, was not what we expected. A quick JSP test (list.jsp, see the code below) lists all files in the ROOT/images directory with links so that we could quickly test accessibility. It also displays file.encoding and sun.jnu.encoding, and on the new host it showed the bad values:

file.encoding=ANSI_X3.4-1968
sun.jnu.encoding=ANSI_X3.4-1968
sun.io.unicode.encoding=UnicodeLittle
file.encoding.pkg=sun.io
396596_���������������.jpg /home/tomcat/tomcat/webapps/ROOT/images/396596_���������������.jpg --> exists? false

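Accessing the image link results in the Tomcat message below. Note the repeated %EF%BF%BD sequences: each one is the UTF-8 encoding of U+FFFD, the Unicode replacement character, which means the original bytes of the file name had already been lost.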

HTTP Status 404 - /396596_%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD.jpg
type Status report
description The requested resource (/396596_%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD.jpg) is not available.
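
A plausible explanation for those sequences can be reproduced in a few lines. The sketch assumes that the JVM decoded the UTF-8 bytes of the file name as US-ASCII (the charset that ANSI_X3.4-1968 names); the class and method names are made up for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Take the UTF-8 bytes of a name and decode them as US-ASCII.
    // Every byte >= 0x80 is invalid in ASCII and becomes U+FFFD.
    static String misreadAsAscii(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    // Percent-encode a string using UTF-8, as a link in a UTF-8 page would be.
    static String percentEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // The five non-ASCII characters of the file name, written as escapes.
        String name = "\u5F20\u65E5\u6D32\u5317\u9A6C";
        String garbled = misreadAsAscii(name);
        System.out.println(garbled.length()); // 15: one U+FFFD per UTF-8 byte
        System.out.println(percentEncode(garbled));
    }
}
```

Each of the five characters takes three bytes in UTF-8, so the misread name consists of 15 replacement characters, and each of those is percent-encoded as %EF%BF%BD.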

Appending "-Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8" to JAVA_OPTS does not help. You can also check the values with 'java ShowSystemProperties | grep encoding' (code below). The working solution is to add
export LC_ALL="en_US.UTF-8"
to the environment (e.g. in ~/.bashrc), log in again or re-read the environment, check the locale output and restart Tomcat. Setting LANG instead of LC_ALL seems to work fine too. Now list.jsp reports:

file.encoding=UTF-8
sun.jnu.encoding=UTF-8
sun.io.unicode.encoding=UnicodeLittle
file.encoding.pkg=sun.io
396596_张日洲北马.jpg /home/tomcat/tomcat/webapps/ROOT/images/396596_张日洲北马.jpg --> exists? true

and the image displays correctly when the link is clicked.
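
To confirm from plain Java that the platform encoding can now represent such file names, a small check along these lines may help. It is only a sketch: sun.jnu.encoding is an internal property that may be missing on some JVMs, so the code falls back to the default charset, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class FileNameEncodingCheck {
    // True if every character of fileName can be encoded in the named charset.
    static boolean canRepresent(String charsetName, String fileName) {
        return Charset.forName(charsetName).newEncoder().canEncode(fileName);
    }

    public static void main(String[] args) {
        String jnu = System.getProperty("sun.jnu.encoding");
        if (jnu == null) {
            // Internal property not available: use the default charset instead.
            jnu = Charset.defaultCharset().name();
        }
        // The sample file name, with the Chinese characters written as escapes.
        String sample = "396596_\u5F20\u65E5\u6D32\u5317\u9A6C.jpg";
        System.out.println("file name encoding: " + jnu);
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("can represent sample name: " + canRepresent(jnu, sample));
    }
}
```

Run it as the same user and with the same environment as Tomcat; a result of false means the locale fix has not reached that process.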


Contents of list.jsp:

<%@page import="java.io.*" %>
<%@page contentType="text/html;charset=UTF-8"%>
<%
// Resolve the on-disk location of the images directory of this web application
ServletContext servletContext = getServletContext();
String contextPath = servletContext.getRealPath(File.separator);
contextPath = contextPath + "/images";

// Show the encoding-related system properties the JVM picked up at startup
out.println("file.encoding=" + System.getProperty("file.encoding") + "<br/>");
out.println("sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding") + "<br/>");
out.println("sun.io.unicode.encoding=" + System.getProperty("sun.io.unicode.encoding") + "<br/>");
out.println("file.encoding.pkg=" + System.getProperty("file.encoding.pkg") + "<br/>");

// List every file with a link and check whether Java can find it again by name
File f = new File(contextPath);
String[] children = f.list();
if (children != null) {
    for (int i = 0; i < children.length; i++) {
        String filename = children[i];
        out.print("<a href='" + filename + "'>" + filename + "</a>&nbsp;");
        File cf = new File(contextPath + "/" + filename);
        out.println(cf.getAbsolutePath() + " --> <b>exists?</b> " + cf.exists() + "<br/>");
    }
}
%>

Contents of ShowSystemProperties.java:

import java.util.Properties;

public class ShowSystemProperties {
    public static void main(String[] args) {
        // Get all system properties and print them to standard output
        Properties props = System.getProperties();
        props.list(System.out);
    }
}
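
For the diff-based comparison between hosts described earlier, sorted output is easier to work with. The following variant is a sketch of one way to produce it; the class name is invented here and it is not the exact tool we used.

```java
import java.util.Map;
import java.util.TreeMap;

public class ShowSortedSystemProperties {
    // Copy all system properties into a map ordered by property name.
    static TreeMap<String, String> sortedProperties() {
        TreeMap<String, String> sorted = new TreeMap<String, String>();
        for (String name : System.getProperties().stringPropertyNames()) {
            sorted.put(name, System.getProperty(name));
        }
        return sorted;
    }

    public static void main(String[] args) {
        // One key=value pair per line, in a stable order suitable for diff.
        for (Map.Entry<String, String> e : sortedProperties().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Redirect the output to a file on each host and compare the two files with diff -Nu. Unlike Properties.list(), which shortens long values, this prints every value in full.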
