Sunday, March 18, 2012

Alphabets Conversion

In one of our company's recent projects, we were supposed to implement support for both the Latin and the Cyrillic alphabet. This may sound like an easy task, since Java has very good support for multilingual applications, but what about languages that are written in more than one alphabet?
When I say supporting multiple alphabets, I mean more than showing messages from different properties files. Users must be able to choose between different keyboard layouts, and content saved in the database must also be transformed from one alphabet to the other.
To meet these requirements, we made a few decisions. We decided to save data in Latin form and transform it to Cyrillic when needed. This also means that all user input is transformed to the Latin alphabet.
I almost forgot to mention the technologies used in this project. It is a web application, developed with the following technologies:

  • JSP and servlets for the front end
  • Spring and Hibernate for the back end
  • Tomcat 6 as the servlet container

In the next few sections I will explain how we accomplished this task. First we implemented a character converter, then the transformation of user parameters, and finally the conversion for the front end. So I will start with character conversion.

Character conversion

This section covers the conversion between alphabets. The application uses UTF-8 encoding, so the whole conversion comes down to mapping characters from one part of the Unicode table to another.
The Basic Latin and Latin Extended-A Unicode blocks contain all the characters needed for the Latin alphabet (in our case Serbian Latin, whose letters č, ć, đ, š and ž come from Latin Extended-A), and for Cyrillic we used the Cyrillic block, U+0400 to U+04FF.
The basic idea is to create mappings between the Latin and Cyrillic representations of the same letters. During this process we need to take care of special cases, such as the digraphs "Lj", "Nj" and "Dž", which are written as two characters in Latin but correspond to single Cyrillic letters (Љ, Њ and Џ).
The next code block shows the method for converting characters.
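As an illustration of the mapping idea, here is a minimal sketch, not the project's code, of a Serbian Latin/Cyrillic converter. The class SerbianConverter and its methods toCyrillic and toLatin are hypothetical names, the mappings are hard-coded instead of read from properties, and the source file is assumed to be compiled as UTF-8. The Latin-to-Cyrillic direction tries a two-character digraph before a single character, so "Lj", "Nj" and "Dž" become single letters.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a Serbian Latin <-> Cyrillic converter (UTF-8 source assumed).
final class SerbianConverter {
    // Serbian alphabet in Cyrillic order; "Lj", "Nj" and "Dž" are single letters.
    private static final String[] LATIN = {
        "A", "B", "V", "G", "D", "Đ", "E", "Ž", "Z", "I", "J", "K", "L", "Lj", "M",
        "N", "Nj", "O", "P", "R", "S", "T", "Ć", "U", "F", "H", "C", "Č", "Dž", "Š"
    };
    private static final String CYRILLIC = "АБВГДЂЕЖЗИЈКЛЉМНЊОПРСТЋУФХЦЧЏШ";

    private static final Map<String, String> LAT_TO_CYR = new HashMap<>();
    private static final Map<String, String> CYR_TO_LAT = new HashMap<>();

    static {
        for (int i = 0; i < LATIN.length; i++) {
            String upper = String.valueOf(CYRILLIC.charAt(i));
            String lower = upper.toLowerCase(Locale.ROOT);
            String lat = LATIN[i];
            LAT_TO_CYR.put(lat, upper);                          // "Lj" -> "Љ"
            LAT_TO_CYR.put(lat.toUpperCase(Locale.ROOT), upper); // "LJ" -> "Љ"
            LAT_TO_CYR.put(lat.toLowerCase(Locale.ROOT), lower); // "lj" -> "љ"
            CYR_TO_LAT.put(upper, lat);
            CYR_TO_LAT.put(lower, lat.toLowerCase(Locale.ROOT));
        }
    }

    private SerbianConverter() {}

    static String toCyrillic(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Try a two-character digraph first; only digraph keys have length 2.
            if (i + 1 < text.length()) {
                String digraph = LAT_TO_CYR.get(text.substring(i, i + 2));
                if (digraph != null) {
                    out.append(digraph);
                    i += 2;
                    continue;
                }
            }
            // Fall back to a single character; unknown characters are copied as-is.
            String c = text.substring(i, i + 1);
            out.append(LAT_TO_CYR.getOrDefault(c, c));
            i++;
        }
        return out.toString();
    }

    static String toLatin(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            String c = text.substring(i, i + 1);
            out.append(CYR_TO_LAT.getOrDefault(c, c)); // "Љ" -> "Lj", digits stay as-is
        }
        return out.toString();
    }
}
```

For example, toCyrillic("Ljubljana") gives "Љубљана" and toLatin("Његош") gives "Njegoš". Greedy digraph matching is not always right: in "nadživeti" the "d" and "ž" belong to different parts of the word (Cyrillic "надживети"), but the sketch produces "наџивети", so a production converter needs an exception list for such words. All-caps Cyrillic such as "ЉУБАВ" also comes back as "LjUBAV".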


This snippet shows how the conversion is implemented.
The next snippet shows how the converter is initialized.
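A minimal sketch of such an initialization, assuming the properties map each Latin letter to its Cyrillic counterpart (for example Lj=Љ) and list only one Latin spelling per Cyrillic letter, so that the reverse map is unambiguous. The class AlphabetMappings and its load helper are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical: builds the two lookup maps from properties of the form "Lj=Љ".
final class AlphabetMappings {
    final Map<String, String> cyrToLat; // Cyrillic -> Latin
    final Map<String, String> latToCyr; // Latin -> Cyrillic

    AlphabetMappings(Properties props) {
        Map<String, String> toCyr = new HashMap<>();
        Map<String, String> toLat = new HashMap<>();
        for (String latin : props.stringPropertyNames()) {
            String cyrillic = props.getProperty(latin);
            toCyr.put(latin, cyrillic);
            // Reverse entry; assumes one Latin spelling per Cyrillic letter.
            toLat.put(cyrillic, latin);
        }
        this.cyrToLat = Collections.unmodifiableMap(toLat);
        this.latToCyr = Collections.unmodifiableMap(toCyr);
    }

    // Loads through a Reader so non-ASCII keys and values survive intact.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

When reading a real file, wrap the stream in an InputStreamReader with StandardCharsets.UTF_8 (or keep the file in ASCII with \uXXXX escapes), because Properties.load(InputStream) decodes the bytes as ISO 8859-1.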


From this code you can see that the converter takes a Properties object and creates two maps used for conversion: the first for converting from Cyrillic to Latin, and the second for converting from Latin to Cyrillic.
Finally, I will show an additional helper method that makes conversion easier by handling different kinds of objects. Its main purpose is to transform Lists and Maps that contain Strings.
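A sketch of such a helper, assuming it walks Strings, String arrays (the value type of a servlet parameter map), Lists and Map values, leaves map keys and all other objects untouched, and returns new collections instead of modifying its input. ObjectConverter is a hypothetical name, and the conversion is passed in as a UnaryOperator<String> so the example stands alone; in practice it would be one of the converter's conversion methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper: applies a String conversion to nested Strings, arrays, Lists and Maps.
final class ObjectConverter {
    private ObjectConverter() {}

    static Object convert(Object value, UnaryOperator<String> converter) {
        if (value instanceof String) {
            return converter.apply((String) value);
        }
        if (value instanceof String[]) {
            String[] source = (String[]) value;
            String[] result = new String[source.length];
            for (int i = 0; i < source.length; i++) {
                result[i] = source[i] == null ? null : converter.apply(source[i]);
            }
            return result;
        }
        if (value instanceof List) {
            List<Object> result = new ArrayList<>();
            for (Object element : (List<?>) value) {
                result.add(convert(element, converter)); // recurse into nested values
            }
            return result;
        }
        if (value instanceof Map) {
            Map<Object, Object> result = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                // Keys (e.g. parameter names) are kept; only values are converted.
                result.put(entry.getKey(), convert(entry.getValue(), converter));
            }
            return result;
        }
        return value; // null, numbers, dates, ... are returned unchanged
    }
}
```

Returning copies keeps the helper safe for read-only inputs such as a servlet parameter map, at the cost of allocating new collections on every call.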


Since we use Spring, here is the bean definition for the converter.


That is all about character conversion. Next, I will show how to integrate it into the web application.


Integrating converter in web application


As I already explained, everything the user types needs to be converted to the corresponding alphabet. Since this is a web application, the character converter can be integrated either by intercepting the user request and wrapping it in a custom request, or by using aspects. We decided on aspects and used AspectJ to do the conversion for us.
Here is the doFilter method of CharacterConversionFilter:


As you can see, it just creates a new wrapper. The wrapper itself does nothing special; we create it only because we need a class whose methods can serve as join points for the aspects.
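For comparison, the first option mentioned above, a wrapper that converts parameters itself without any aspects, is a plain decorator. In servlet code it is usually built by extending javax.servlet.http.HttpServletRequestWrapper and overriding getParameter; to keep this sketch runnable without the Servlet API, it uses a hypothetical ParameterSource interface instead, and the conversion is passed in as a UnaryOperator<String>.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the part of a request we care about.
interface ParameterSource {
    String getParameter(String name);
}

// Decorator: delegates every lookup and converts the value on the way out.
final class ConvertingParameterSource implements ParameterSource {
    private final ParameterSource delegate;
    private final UnaryOperator<String> converter;

    ConvertingParameterSource(ParameterSource delegate, UnaryOperator<String> converter) {
        this.delegate = delegate;
        this.converter = converter;
    }

    @Override
    public String getParameter(String name) {
        String value = delegate.getParameter(name);
        // Missing parameters stay null, as in the servlet API.
        return value == null ? null : converter.apply(value);
    }
}
```

The trade-off is that every accessor (getParameter, getParameterValues, getParameterMap) must be overridden in the wrapper, whereas with aspects the conversion logic lives outside the request class.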


Now is a good place to show the aspect used for character conversion.


Method "convertCharacters" converts values that are shown to the user.
Method "convertCharactersToLat" converts values to the Latin alphabet before they are persisted in the database.
Method "convertParametersToLat" converts parameters to the Latin alphabet.
Method "procedWithControllerInvocation" converts the results of controller calls to the corresponding alphabet.
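The around-advice idea behind procedWithControllerInvocation (proceed with the real call, then convert its result) can be sketched without AspectJ using a JDK dynamic proxy. This is a swapped-in technique, not AspectJ: a proxy only intercepts calls made through an interface, while AspectJ weaving can advise class methods directly. ConvertingProxy and wrap are hypothetical names, and the converter is passed in as a UnaryOperator<String>.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.UnaryOperator;

// Hypothetical: mimics an around advice that converts String results of interface calls.
final class ConvertingProxy {
    private ConvertingProxy() {}

    static <T> T wrap(Class<T> iface, T target, UnaryOperator<String> converter) {
        InvocationHandler handler = (proxyInstance, method, args) -> {
            Object result;
            try {
                result = method.invoke(target, args); // "proceed" with the real call
            } catch (InvocationTargetException e) {
                throw e.getCause();                  // rethrow what the target threw
            }
            // Convert only String results; everything else passes through unchanged.
            return result instanceof String ? converter.apply((String) result) : result;
        };
        return iface.cast(Proxy.newProxyInstance(
                ConvertingProxy.class.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

In an AspectJ around advice the same shape appears as a call to ProceedingJoinPoint.proceed() followed by the conversion of its return value.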
When you decide to use aspects, you can weave them in two ways: at compile time or at load time. We decided on load-time weaving.
I will use the rest of this post to explain how to configure Tomcat, Spring and AspectJ to work together.
To get load-time weaving to work, you need to define a new class loader for your application. You don't need to write one yourself; you just tell Tomcat to use an existing one that is part of the Spring instrumentation library (TomcatInstrumentableClassLoader).


We also need to provide an aop.xml file, which tells AspectJ what needs to be woven and which aspects to use.


Finally, we need to tell Spring to use AspectJ load-time weaving.


As I already mentioned, we use Spring, and Spring comes with good support for AspectJ. The Spring 3.0 documentation has more details on how to configure Spring with AspectJ.
That's all. After these few Tomcat and Spring configuration steps, you are ready to try the character converter.
The advantage of using AspectJ for this problem is that the conversion code has no impact on the rest of the application code. The converter can easily be removed from the application without changing any existing code.