Measuring Application Performance

    February 12, 2002

When considering a benchmark to determine the overall performance of a Java application, bear in mind that bytecode execution, native code execution, and graphics each play a role. Their impact varies with the nature of the specific application: what the application does, how much of it is bytecode versus native code, and how much use it makes of graphics. How well a JVM performs for a given application depends on how that application's mix of these three functional areas maps onto the JVM's capabilities. Given these variables, the best way to benchmark a JVM is against your own application. Since that's not possible before the application has been written, you must find the benchmarks that are most relevant to the application you intend to write.

Sorting through Java benchmarks to find the ones that are relevant for embedded applications can be confusing. SpecJVM98, for example, provides a relatively complete set of benchmarks that test diverse aspects of the JVM. That sounds good, but SpecJVM98 runs in a client/server environment and requires a minimum of 48 MB of RAM on the client side for the JVM, which rules it out for most embedded applications. In addition, it can't be used with precompiled classes.

Other benchmarks have different pitfalls. VolanoMark, for example, is a chat server implementation and is therefore relevant only for benchmarking applications with the same set of requirements as chat servers. The JMark benchmark assumes that the application includes the applet viewer and a full implementation of Java's Abstract Window Toolkit (AWT). This benchmark can be irrelevant for the many embedded applications that have no graphics, or that have limited graphics that don't require full AWT support, such as devices running a PersonalJava minimal-AWT implementation.

Embedded CaffeineMark (ECM) is the embedded version of the CaffeineMark benchmark from Pendragon Software, with the graphics tests removed. It is easy to run on any embedded JVM, since it requires support for basic Java core classes only and doesn't require a large amount of memory. More importantly, there's a high correlation between good scores on this benchmark and improved bytecode performance in embedded applications.

To get the most meaningful results from ECM, you must use exactly the same hardware when testing different JVMs. You must also pay attention to implementation differences among the JVMs you're testing. If, for example, you're comparing a JVM with a JIT compiler against a JVM without one, it's important to run the JVM that has the JIT with the java -nojit option on the command line to ensure an apples-to-apples comparison.

ECM will typically make any JVM that uses compilation look good, no matter the type of compilation, because it includes a very small set of classes and always repeats the same small set of instructions. Dynamic compilers simply cache the complete translation of the Java code in RAM and execute subsequent iterations of the tests in native code. Ahead-of-time compilers can easily optimize the loops and algorithms used in ECM, too.
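To see why a small, repetitive kernel flatters a compiling JVM, consider the following sketch. It is not ECM's actual source: the class name TinyKernelBenchmark, the sieve size, and the one-second run length are all illustrative assumptions. It repeats one short loop for a fixed time and counts completed iterations, which is the kind of workload a dynamic compiler can translate once and then run from cached native code.

```java
// Hypothetical stand-in for a small-kernel benchmark; not taken from ECM.
class TinyKernelBenchmark {
    // Written to after every kernel run so the work cannot be optimized away.
    static volatile int sink;

    // Counts the primes below `limit` with a sieve of Eratosthenes.
    static int countPrimes(int limit) {
        boolean[] composite = new boolean[Math.max(limit, 0)];
        int count = 0;
        for (int i = 2; i < limit; i++) {
            if (!composite[i]) {
                count++;
                // long arithmetic avoids overflow of i * i for large limits
                for (long j = (long) i * i; j < limit; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        return count;
    }

    // Repeats the same kernel for roughly `millis` milliseconds and
    // returns how many complete iterations were executed.
    static long iterationsIn(long millis, int limit) {
        long deadline = System.currentTimeMillis() + millis;
        long iterations = 0;
        do {
            sink = countPrimes(limit);
            iterations++;
        } while (System.currentTimeMillis() < deadline);
        return iterations;
    }

    public static void main(String[] args) {
        long iterations = iterationsIn(1000, 8192);
        System.out.println("Sieve iterations in about one second: " + iterations);
    }
}
```

Because the whole program is a handful of methods executed over and over, the score mostly reflects how fast the JVM runs one hot loop, not how it behaves on a large and varied code base.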

Although the industry abounds with other Java benchmarks, like Java Grande, SciMark, jBYTEmark, Dhrystone benchmark in Java, and UCSD Benchmarks for Java, there is no “ultimate” benchmark that can give you certainty about Java and JVM performance in embedded applications. The best strategy is to identify a suite of benchmarks that seem most relevant to your application and use the combined results of those benchmarks to help predict Java performance in a particular system environment.
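No particular formula for combining results is prescribed here. One common approach, offered purely as an assumption, is to express each benchmark result as a ratio against a baseline JVM and take a weighted geometric mean, with the weights reflecting how closely each benchmark resembles your application. The class name, the weights, and the sample ratios below are hypothetical.

```java
// Hypothetical helper for combining several benchmark results into one figure.
class CombinedScore {
    // ratios[i]  : candidate score divided by baseline score on benchmark i (> 0)
    // weights[i] : relevance of benchmark i to your application (>= 0)
    static double weightedGeometricMean(double[] ratios, double[] weights) {
        if (ratios.length == 0 || ratios.length != weights.length) {
            throw new IllegalArgumentException("need one weight per ratio");
        }
        double logSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < ratios.length; i++) {
            if (ratios[i] <= 0.0 || weights[i] < 0.0) {
                throw new IllegalArgumentException("ratios must be > 0, weights >= 0");
            }
            logSum += weights[i] * Math.log(ratios[i]);
            weightSum += weights[i];
        }
        if (weightSum == 0.0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        return Math.exp(logSum / weightSum);
    }

    public static void main(String[] args) {
        // Made-up ratios for two benchmarks, weighted equally.
        double[] ratios = {2.0, 8.0};
        double[] weights = {1.0, 1.0};
        System.out.println("Combined ratio: " + weightedGeometricMean(ratios, weights));
    }
}
```

A geometric mean is used in this sketch because a single very large ratio on one small benchmark then pulls the combined figure up less than it would in an arithmetic mean.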

Furthermore, the existing benchmarks may not measure other aspects of your application code. Tuning Java applications to meet performance goals may require addressing many program functions besides bytecode execution. Some of those functions occur within the JVM: thread management, synchronization, method-to-method calls, class resolution, object allocation and heap management (including garbage collection), calls to native methods, bytecode verification, and exception handling. Because few if any benchmarks address such functions, it falls to you to conduct an in-depth study of a JVM's internals to understand how its design may affect crucial aspects of your application. Writing special programs that exercise critical aspects of a JVM can help you evaluate it for the application. If, for example, your application uses a heavy mix of Java and C code, you can benefit by writing a program that tests native method call performance. Other functions, including native code execution and such factors as network latency, may occur outside the JVM.
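As a minimal sketch of such a special-purpose program, the class below times three of the JVM-internal functions listed above: uncontended synchronization, object allocation, and exception handling. The class name JvmFunctionProbes, the iteration count, and the 64-byte allocation size are assumptions chosen for illustration. It uses System.nanoTime, which older embedded class libraries may lack; System.currentTimeMillis could be substituted there. A native method call probe would follow the same pattern but also needs a C library loaded through JNI, so it is left out to keep the example self-contained.

```java
// Hypothetical micro-probes for functions that run inside the JVM.
class JvmFunctionProbes {
    private static final Object LOCK = new Object();
    private static int counter;

    // Synchronization: enters and leaves an uncontended monitor n times.
    static int synchronizedIncrements(int n) {
        counter = 0;
        for (int i = 0; i < n; i++) {
            synchronized (LOCK) {
                counter++;
            }
        }
        return counter;
    }

    // Object allocation: creates n short-lived 64-byte arrays,
    // which also gives the garbage collector work to do.
    static long allocations(int n) {
        long totalBytes = 0;
        for (int i = 0; i < n; i++) {
            byte[] block = new byte[64];
            block[0] = (byte) i;
            totalBytes += block.length;
        }
        return totalBytes;
    }

    // Exception handling: throws and catches n exceptions.
    static int exceptions(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new IllegalStateException("probe");
            } catch (IllegalStateException e) {
                caught++;
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // arbitrary; scale down on slow targets
        long t0 = System.nanoTime();
        synchronizedIncrements(n);
        long t1 = System.nanoTime();
        allocations(n);
        long t2 = System.nanoTime();
        exceptions(n);
        long t3 = System.nanoTime();
        System.out.println("synchronization ns/op: " + (t1 - t0) / (double) n);
        System.out.println("allocation      ns/op: " + (t2 - t1) / (double) n);
        System.out.println("exception       ns/op: " + (t3 - t2) / (double) n);
    }
}
```

Each probe returns a value derived from its loop so that a compiler has less opportunity to discard the work. The absolute numbers mean little on their own; they are mainly useful for comparing two JVMs on identical hardware.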

Graphics Performance
What if your application includes graphics? To start, two major factors affect graphics performance in Java applications: Does the application's graphics display driver use graphics coprocessor hardware acceleration? Is the application configured with a lightweight (faster) or a heavyweight (slower) implementation of the Abstract Window Toolkit? (See the figure.) In addition, as with any other high-level Java service, graphics performance is affected by the way that the graphics services integrate with lower-level native libraries.

Wind River's Personal JWorks includes a good benchmark for evaluating graphics performance in embedded systems. The benchmark targets the PersonalJava AWT with a set of 39 tests of images, buttons, scrolling, text, and basic 2-D graphics.

Real-World Performance
Finally, you need to consider the performance of your CPU. To help you identify CPU-bound performance, you should supplement simple benchmarks by running real-world applications that exercise large amounts of different, complex Java code. Such test code must meet a number of requirements: It should contain a large number of classes that reflect an estimate of the real application (20-plus is a good ballpark). It must also be large (thousands of lines, at least) and have no file system access and no graphics. Some existing programs meet all those criteria. The GNU regular expression package, regexp, for example, comprises about 3,000 lines of code and more than 21 classes, providing a large number of expressions to parse and match. Another program, the Bean Shell interpreter running a simple prime number sieve, has 70 classes and several thousand lines of code. JavaCodeCompact, Sun's PersonalJava ROMizing tool, also would make a good test program.
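The shape of such a test can be sketched as follows. This is only a harness outline under stated assumptions: it uses the standard java.util.regex classes as a stand-in for the GNU regexp package, and the class name, patterns, inputs, and repeat count are made up for illustration. A real test would substitute a much larger code base. Like the programs described above, it keeps all data in memory and touches neither the file system nor graphics.

```java
import java.util.regex.Pattern;

// Hypothetical in-memory workload; java.util.regex stands in for GNU regexp.
class InMemoryRegexWorkload {
    static final String[] PATTERNS = {
        "[a-z]+@[a-z]+\\.com",
        "\\d{3}-\\d{4}",
        "^(GET|POST) /\\S*",
        "(ab)+c"
    };

    static final String[] INPUTS = {
        "mail bob@example.com today",
        "call 555-1234",
        "GET /index.html HTTP/1.0",
        "xxababcxx",
        "no match here"
    };

    // Compiles every pattern and counts how many (pattern, input)
    // pairs contain at least one match.
    static int countMatches() {
        int matches = 0;
        for (String pattern : PATTERNS) {
            Pattern compiled = Pattern.compile(pattern);
            for (String input : INPUTS) {
                if (compiled.matcher(input).find()) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int repeats = 10_000; // arbitrary; tune to the target's speed
        long start = System.currentTimeMillis();
        int total = 0;
        for (int i = 0; i < repeats; i++) {
            total += countMatches();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("matches: " + total + ", elapsed ms: " + elapsed);
    }
}
```

Running the same harness on each candidate JVM, on identical hardware, gives elapsed times that can be compared directly.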

The results of running these programs as test cases illustrate the wide variance in the meaning of benchmark scores. For example, a JVM using a JIT compiler may run Embedded CaffeineMark up to 30 times faster than when the -nojit option is turned on (thus running in pure interpretation mode), but the same JVM runs the Bean Shell and regexp tests only about one and a half times faster when using the JIT compiler. (The apparently impressive thirtyfold speedup on a simple benchmark like Embedded CaffeineMark is achieved through caching techniques that the compiler uses on the small amount of code and classes in ECM.) The difference in results clearly demonstrates that high benchmark scores may not translate into a commensurate level of performance improvement in real-world applications.

In fact, SpecJVM98 and JMark yield results that most closely approximate those for real-world applications. They do suffer, though, from the limitations discussed above. In particular, the usefulness of the former in the embedded space depends greatly on your ability to overcome the problems associated with its test infrastructure requirements.

Vincent Perrier is Product Marketing Manager, Java Platforms, for the Wind River Platforms Business Unit in Alameda, CA. He can be found at