Code-Generation Techniques for Java

    September 11, 2003

Working in Java either means writing a little bit of complex code or writing a lot of gruntwork code. J2EE is a prime example; implementing the persistence for a single database table takes five classes and two interfaces using EJBs, and almost all of the classes are clerical work. We have to write them, but we don’t have to do it by hand. Code-generation techniques can make building high-quality EJB code a breeze.

Will code generation revolutionize computing and change the way we develop forever? Yes, but it will take a while. Software engineering has always concentrated on increasing our level of abstraction. In the beginning, we hand-wrote machine code; then we created assemblers and macro assemblers. After that, we created Fortran and compiled our code into assembler. Then came structured programming, and after that, object-oriented programming. With each step, we have increased our level of abstraction and, thus, our ability to create higher quality applications with more functionality, more quickly.

What is Code Generation?
What is this panacea for developers called code generation? Code generation is the technique of writing and using programs that build application and system code. To understand code generation, you need to understand what goes in and what comes out. What goes in is the design for the code in a declarative form: “I need two tables named book and author with these fields.” What comes out is one or more target files. It could be Java code, deployment descriptors, SQL, documentation, or any type of controlled output.

Figure 1 shows the basic form of today’s code generators:

Figure 1. The process of code generation

The components can change slightly between the different models, but the song remains the same. The code generator reads in the design, then uses a set of templates to build output code that implements the design. The separation between code generation logic in the generator and output formatting in the templates is akin to the separation between business logic and user interfaces in web applications.
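
To make the split between generator logic and templates concrete, here is a minimal sketch in Java. It is not taken from any particular tool: the TableDesign class, the SimpleGenerator class, and the two format-string templates are all hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SimpleGenerator {
    // Output formatting lives in the templates, not in the generator logic.
    static final String CLASS_TEMPLATE = "public class %s {\n%s}\n";
    static final String FIELD_TEMPLATE = "    private %s %s;\n";

    // Generator logic: read the design, then fill in the templates.
    static String generate(TableDesign design) {
        StringBuilder fields = new StringBuilder();
        for (Map.Entry<String, String> f : design.fields.entrySet()) {
            fields.append(String.format(FIELD_TEMPLATE, f.getValue(), f.getKey()));
        }
        return String.format(CLASS_TEMPLATE, capitalize(design.name), fields);
    }

    static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "long");
        fields.put("title", "String");
        System.out.print(generate(new TableDesign("book", fields)));
    }
}

// Hypothetical design model: a table name plus field names mapped to Java types.
final class TableDesign {
    final String name;
    final Map<String, String> fields;

    TableDesign(String name, Map<String, String> fields) {
        this.name = name;
        this.fields = fields;
    }
}
```

Changing the look of the output means editing only the two template strings; changing how the design is interpreted means editing only generate().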

Code generators are not wizards. Wizards are passive generators. They write code once, and then it’s up to you to maintain the code forever. Code generators are active. They continually maintain code over multiple generation cycles. As the designs change, the input to the generator changes, and new code is created to match the design. This is a key advantage – when have you been on a project where the requirements don’t change?

What Are the Benefits?
Before we get into specific examples of code generators for Java, let’s make sure we have the end goals firmly in mind. One way to approach this is to think about the qualities we want in an optimal generator.

  • Quality: We want the output code to be at least as good as what we would have written by hand. Thankfully, the template-based approach of today’s generators builds code that is easy to read and debug. Because of the active nature of the generator, bugs found in the output code can be fixed in the template. Code can then be re-generated to fix that bug across the board.
  • Consistency: The code should use consistent class, method, and argument names. This is also an area where generators excel because, after all, this is a program writing your code.
  • Productivity: It should be faster to generate the code than to write it by hand. This is the first benefit that most people think of when it comes to generation. Strangely, you may not achieve this on the first generation cycle. Thankfully, the real productivity value comes later, as you re-generate the code base to match changing requirements; at this point you will blow the hand-coding process out of the water in terms of productivity.
  • Abstraction: We should be able to specify the design in an abstract form, free of implementation details. That way we can re-target the generator at a later date if we want to move to another technology platform.

Now that we understand the benefits that we want, and how those are addressed by code generation techniques in general, we should understand what we expect to use code generation for in the Java context.

What We Expect the Generator to Handle
The output files of a generator are called the target files. There are several generation targets within the Java enterprise application stack. Figure 2 shows the stack:

Figure 2. J2EE generation targets

All four of these elements of the stack are potential generation targets, but some are more common than others. From the bottom to the top:

  • Database: Given Java’s object-persistence approaches to database work, there isn’t much call for direct generation of SQL for database code or stored procedures. However, if this is your architecture, you can use the custom approaches listed below to generate the required code.
  • Persistence: Database persistence code is the most common generation target in the Java environment. All of the generators I refer to in the sections that follow build persistence code. Why? It’s generally redundant grunt code. Generated database-persistence code also is an excellent foundation for a solid application, because it is consistent and relatively bug-free.
  • Business Logic and User Interfaces: Only MDA and custom generators build production business logic and user interfaces. The critical factor in generating this code is building on top of a stable, predictable platform, ideally a generated persistence layer.

It’s obvious that code generation is powerful and can build useful code, but does it have drawbacks?

What to Look Out For
Code generation is not without pitfalls and detractors. One of the most common complaints is that code that was once active is now being hand-modified and thus cannot be re-generated. One trick is never to check the generated source into the code base. This ensures that engineers will always be required to use the generator as part of the compilation process. This keeps the generator alive and keeps engineers from modifying the output code.

Another problem is that engineers who have been around since the early 90s liken code generators to Computer-Aided Software Engineering (CASE) tools. The comparison is mistaken because code generators are developed bottom-up by engineers for engineers. CASE tools were developed as a top-down replacement for programming languages and for engineers.

There are more reasons that engineers are skeptical about generation. Some issues are technical and others are cultural. Sometimes it comes down to simple job preservation. These tend to be situation-specific and boil down to simple issues: trust, teamwork, and education. In order to successfully deploy a generator, the team must trust the tool. They must feel that they have some control over the tool and its implementation. They also need to know how the tool is used both at a basic level (e.g., How do I run it?), and at a specific level (e.g., How do I specify when I need a table with a compound primary key?).

Perhaps the biggest drawback of code generation is that it falls to the implementer of the tool to ensure successful adoption within the team. If you put a copy of the code generator on the server and expect that people will immediately understand its use and the compelling value, then you are sure to fail. Education and empathy are key.

Given an understanding of which Java application components we can generate and what we have to look out for, let’s talk about the generators that build them.

Code-Driven Approach: XDoclet
The most popular code generator for Java is XDoclet, and for good reason. It’s easy and pragmatic, and it fits a need. XDoclet builds database-persistence beans to match the requirements specified in special JavaDoc comments within the Java entity bean code. We call this the “code driven approach” because it uses source code as the design input source.
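
The idea of using marked-up source as the design input can be sketched in a few lines of plain Java. This is not XDoclet and does not use XDoclet's tag vocabulary or engine: the @gen.expose tag, the TagDrivenGenerator class, and the regular expression are assumptions made up for this illustration, and a real tool would parse the source properly rather than pattern-match it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagDrivenGenerator {
    // Matches a doc comment that contains the hypothetical @gen.expose tag,
    // followed by a public method signature up to its opening brace.
    static final Pattern EXPOSED = Pattern.compile(
        "/\\*\\*[^/]*?@gen\\.expose[^/]*?\\*/\\s*public\\s+([^{;]+?)\\s*\\{");

    // Collects the signatures of all methods tagged for exposure.
    static List<String> exposedSignatures(String source) {
        List<String> signatures = new ArrayList<>();
        Matcher m = EXPOSED.matcher(source);
        while (m.find()) {
            signatures.add(m.group(1).trim());
        }
        return signatures;
    }

    // Builds an interface declaring every tagged method.
    static String generateInterface(String name, String source) {
        StringBuilder out = new StringBuilder("public interface " + name + " {\n");
        for (String signature : exposedSignatures(source)) {
            out.append("    ").append(signature).append(";\n");
        }
        return out.append("}\n").toString();
    }

    public static void main(String[] args) {
        String source =
            "public class BookBean {\n"
          + "    /** @gen.expose */\n"
          + "    public String getTitle() { return title; }\n"
          + "    /** Internal helper, not exposed. */\n"
          + "    public void reindex() { }\n"
          + "}\n";
        System.out.print(generateInterface("Book", source));
    }
}
```

Only getTitle() ends up in the generated interface, because it is the only method whose comment carries the tag. The design decision (what to expose) lives in the source file itself.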

Given a single entity bean with some markup, XDoclet will create the session beans, interfaces, and data access object required to complete the functional set. It’s a pretty sweet deal for someone looking to get some work done quickly without having to go to the effort required by other code generation solutions. Figure 3 illustrates how XDoclet relates to the application stack:

Figure 3. XDoclet and the application stack

XDoclet has grown beyond just bean generation. It now acts as a generic code-generation platform for solutions that use JavaDoc markup as a source for design information. There are XDoclet modules for all types of outputs and you can easily create your own.

XDoclet’s only drawback is its level of abstraction. Because the design is described in JavaDoc tags embedded in the code, code and design are bound tightly together with implementation specifics. Given this binding, it would be difficult to use XDoclet markup to generate complete code in a different language (e.g., C#).

VDoclet is an XDoclet clone that uses Velocity as the template language.

Model-Driven Approach: Custom
The alternative to the code-driven approach is to build code from an abstract model of the design. This model-driven approach comes in two flavors: MDA and custom. We will start with the custom approach and then get into MDA.

Using tools like XSLT, Velocity, and Jostraca, we can build textual output from an input specification. We can use these tools to build code by specifying a model of the code as input, using the template to specify the code.

  • XSLT: The design is specified as XML, and XSLT templates are used to create any number of output files. Generally, there is one system entity (e.g., class or database table) per XML input file.
  • Velocity and Jostraca: The templates read the design specification directly and then output code to match that specification.
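
As a rough illustration of the XSLT variant, the sketch below runs a stylesheet against an XML design using the javax.xml.transform API that ships with the JDK. The XML vocabulary (table and field elements with name and type attributes), the stylesheet, and the XsltGenerator class are assumptions invented for this example, not a standard design format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltGenerator {
    // Hypothetical design file: one system entity per XML document.
    static final String DESIGN =
        "<table name='Book'>"
      + "<field name='id' type='long'/>"
      + "<field name='title' type='String'/>"
      + "</table>";

    // XSLT template that renders the design as Java source (text output).
    static final String TEMPLATE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/table'>"
      + "<xsl:text>public class </xsl:text><xsl:value-of select='@name'/>"
      + "<xsl:text> {&#10;</xsl:text>"
      + "<xsl:for-each select='field'>"
      + "<xsl:text>    private </xsl:text><xsl:value-of select='@type'/>"
      + "<xsl:text> </xsl:text><xsl:value-of select='@name'/>"
      + "<xsl:text>;&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "<xsl:text>}&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the design and returns the generated text.
    static String generate(String designXml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(designXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Generation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(generate(DESIGN, TEMPLATE));
    }
}
```

A second stylesheet run over the same DESIGN could emit SQL or documentation instead, which is what makes the XML file the single source of the design.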

The advantage over the code-driven approach is that while today these templates build EJBs, they could easily build JDO classes tomorrow, or C# the day after that. Keeping the model abstract makes portability a reality.

One downside to this approach is that each template is completely self-contained. There is no central code generator that is responsible for the interpretation of the design. This means that one template could interpret a date as just a date stamp, while another could interpret it as a date and time stamp. This is akin to the problems experienced with two-tier application servers where the business logic is not properly factored away from the display.
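
One way to soften this problem, sketched here under assumptions rather than prescribed by any of the tools above, is to put the interpretation of design types in a single shared helper that every template calls. The DesignTypes class, its DesignType enum, and the specific Java and SQL type choices are hypothetical.

```java
class DesignTypes {
    // The abstract types a design may use.
    enum DesignType { DATE, DATETIME, TEXT }

    // Single place where a design type is interpreted for Java output.
    static String javaType(DesignType type) {
        switch (type) {
            case DATE: return "java.time.LocalDate";
            case DATETIME: return "java.time.LocalDateTime";
            default: return "String";
        }
    }

    // Single place where a design type is interpreted for SQL output.
    static String sqlType(DesignType type) {
        switch (type) {
            case DATE: return "DATE";
            case DATETIME: return "TIMESTAMP";
            default: return "VARCHAR(255)";
        }
    }

    // Two tiny "templates" that both defer to the shared interpretation.
    static String javaField(String name, DesignType type) {
        return "private " + javaType(type) + " " + name + ";";
    }

    static String sqlColumn(String name, DesignType type) {
        return name + " " + sqlType(type);
    }

    public static void main(String[] args) {
        System.out.println(javaField("published", DesignType.DATE));
        System.out.println(sqlColumn("published", DesignType.DATE));
    }
}
```

Because neither template decides for itself what a date means, the Java field and the SQL column cannot drift apart.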

Another downside is that you are building a custom solution that will require team education. However, given the developing and fluid state of code-generation solutions today, even if you go with an existing solution, you will not often find engineers with extensive generation experience.

Model-Driven Approach: MDA
Model-Driven Architecture (MDA) is the Object Management Group (OMG) three-letter-acronym (TLA) initiative for code generation. I’m only slightly kidding; there are several TLA standards within MDA. The central idea is simple: turn a model in UML (Unified Modeling Language) into code (no, that’s not a four-letter acronym).

Figure 4 shows the flow of an MDA generator:

Figure 4. The flow of an MDA generator

We start with the Platform Independent Model (PIM), created in a UML editing tool, like Poseidon for UML from Gentleware. (The PIM can be in an exported XML format called XMI.) A Platform-Specific Model (PSM) is then created using a transformation. Templates are applied to the PSM to create the output code.

It’s easier to understand the difference between a PIM and a PSM in context. The PIM specifies the application business logic, for example, a table named book with these five fields. The PSM is a model of the implementation on a particular platform. In the EJB world, this is the set of UML models of the entity and session beans required to implement the book table.

The separation between the PIM and the PSM creates a well-factored generator that properly separates the design from the implementation specifics.
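
The two-step flow can be sketched in plain Java. This is only an illustration of the PIM-to-PSM-to-code shape, not how any MDA tool or the XMI format actually works: the Pim and PsmElement classes, the toEjbPsm transformation, and the generated class names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MdaPipeline {
    // Platform-independent model (PIM): an entity and its fields, nothing else.
    static final class Pim {
        final String entity;
        final List<String> fields;

        Pim(String entity, List<String> fields) {
            this.entity = entity;
            this.fields = fields;
        }
    }

    // Platform-specific model (PSM) element: one artifact the platform needs.
    static final class PsmElement {
        final String kind;
        final String className;
        final List<String> fields;

        PsmElement(String kind, String className, List<String> fields) {
            this.kind = kind;
            this.className = className;
            this.fields = fields;
        }
    }

    // Transformation step: PIM -> PSM for a hypothetical EJB-style platform.
    static List<PsmElement> toEjbPsm(Pim pim) {
        List<PsmElement> psm = new ArrayList<>();
        psm.add(new PsmElement("entity bean", pim.entity + "Bean", pim.fields));
        psm.add(new PsmElement("session bean", pim.entity + "ManagerBean", pim.fields));
        return psm;
    }

    // Template step: PSM element -> source text (a deliberately tiny skeleton).
    static String render(PsmElement element) {
        return "// " + element.kind + " for fields: "
             + String.join(", ", element.fields) + "\n"
             + "public abstract class " + element.className + " {\n}\n";
    }

    public static void main(String[] args) {
        Pim book = new Pim("Book", Arrays.asList("id", "title", "author"));
        for (PsmElement element : toEjbPsm(book)) {
            System.out.print(render(element));
        }
    }
}
```

Retargeting to another platform would mean writing a different transformation in place of toEjbPsm() while leaving the Pim untouched.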

Some of the more popular MDA solutions are:

  • OptimalJ, an MDA generation tool from Compuware. They have just released a new version and started a major marketing push, including a study done by the Middleware Company where two teams developed the same application, one with MDA and one without. The MDA group finished 30% faster, even though they had to learn OptimalJ first. Impressive stuff.
  • AndroMDA, an open source MDA generator that reads XMI files and uses cartridges to build the various types of Java code. Changing persistence mechanisms, for example, is merely a matter of changing a cartridge.
  • MDE, a pragmatic MDA tool that goes directly from the PIM to code using a series of customizable generation components.
  • ArcStyler, an MDA generator from Germany that can build both Java and .NET code from the same design. The architect of ArcStyler is also the author of Convergent Architecture (Richard Hubert, Wiley, 2001), a book that integrates MDA into an entire design philosophy.

The value of MDA is that it is a set of standards that we can agree on and then work to improve. At the moment, though, MDA has some issues. First, the standards are more conceptual than standard, and are subject to interpretation. Second, UML is not complete enough to create business logic or to create an efficient SQL schema, so it must be hinted at the PIM level. At the coding level, the current crop of MDA generators uses “safe zones” in the code. These are specially marked sections of the code that are preserved between generation cycles. In this way you can extend the code directly to implement business logic that UML cannot specify.
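
The mechanics of preserving a safe zone can be sketched as a merge between the existing file and the freshly generated one. The marker syntax used here ("// SAFE-BEGIN name" and "// SAFE-END") and the SafeZoneMerger class are assumptions for illustration; each real generator defines its own markers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SafeZoneMerger {
    // Hypothetical markers: "// SAFE-BEGIN <name>" ... "// SAFE-END".
    static final Pattern ZONE = Pattern.compile(
        "(// SAFE-BEGIN (\\w+)\\n)(.*?)(// SAFE-END)", Pattern.DOTALL);

    // Collects the hand-written body of every named zone in the existing file.
    static Map<String, String> extractZones(String existing) {
        Map<String, String> zones = new HashMap<>();
        Matcher m = ZONE.matcher(existing);
        while (m.find()) {
            zones.put(m.group(2), m.group(3));
        }
        return zones;
    }

    // Copies preserved zone bodies into the freshly generated output.
    static String merge(String freshlyGenerated, String existing) {
        Map<String, String> zones = extractZones(existing);
        Matcher m = ZONE.matcher(freshlyGenerated);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Keep the generated default when no hand-written body exists yet.
            String body = zones.getOrDefault(m.group(2), m.group(3));
            m.appendReplacement(out,
                Matcher.quoteReplacement(m.group(1) + body + m.group(4)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String existing =
            "class Book {\n// SAFE-BEGIN validate\n  check();\n// SAFE-END\n}\n";
        String fresh =
            "class Book {\n  int pages;\n// SAFE-BEGIN validate\n// SAFE-END\n}\n";
        System.out.print(merge(fresh, existing));
    }
}
```

Everything outside the markers comes from the new generation cycle, while the hand-written check() call inside the zone survives the regeneration.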

Code generation is another link in the evolutionary chain of increasing abstraction. With it, you will quickly produce higher quality code, and thus be able to respond to changing requirements with ease. This is the true power of modern code generation.

Code Generation Resources
Outside of the direct links to the various generators, there are some more general online resources for code generation, as well as some good books on generation:

  • My book, Code Generation in Action (Jack Herrington, Manning, 2003) covers a wide variety of code generation approaches at a practical level.
  • XDoclet in Action (Craig Walls and Norman Richards, Manning, 2003) covers every aspect of XDoclet in depth.
  • The Pragmatic Programmer (Dave Thomas and Andrew Hunt, Addison-Wesley, 1999) has several sections on active code generation.
  • MDA Explained (Anneke Kleppe, et al, Addison-Wesley, 2003) covers the MDA standards in a clear and succinct manner with the aid of a practical example.
  • Model Driven Architecture (David S. Frankel, Wiley, 2003) covers the MDA standards and provides an overview of MDA and code generation throughout the development lifecycle.

*Reprinted with the permission of the O’Reilly Network.

Jack Herrington is the author of Code Generation in Action and the editor of the Code Generation Network. He is a software engineer with over twenty years of experience on numerous platforms and languages. He lives with his wife, Lori, and daughter, Megan, in Union City, California.