XML Parsing and Handling Guidelines

If I had a quarter for every time I’ve had to convert a big bulk of XML-formatted data into some other sort of datatype, well then, I’d have a few dollars. However, these few and far between projects have brought a disproportionate share of headaches. I’m glad I’m not the only one. Waldo Jaquith wrote about running into some of the same XML gotchas. Although Waldo breaks the issues down into four concepts (encoding errors, changing realities, inconsistencies, and missing data), let’s look at the checkpoints that nearly all architects of XML (and other data formats) can use to eliminate a high percentage of the long-term problems.

Embarrassing XML story? Share it with us in the comments.

Validation

My third grade class had a daily mad minute math worksheet to complete. Those who correctly answered all the math problems in a minute received a silver star on the chart. The one who finished first with all correct answers received a gold star. At the end of each week, the student with the most gold stars got to drink a soda at lunch. I didn’t validate my answers, and thus, I never got a soda. Likewise, our tendency to complete projects in a timely fashion often leads to cutting corners and/or prematurely accepting a project as complete. Although every good textbook says testing and validation are crucial elements of computer programming, these aspects, unfortunately, are often overlooked in the real world, where testing and validation equal time, time equals money, and so on. However, if your project generates any type of reusable data format, then validation should become absolutely mandatory. If your program creates improperly formatted XML, then any requirement regarding exporting and/or sharing of data is most certainly not met.

There are some great validation tools for XML, so don’t be shy about using them. Another, easier test is to simply export your data and try to import it again using a standard library, like SimpleXML for PHP.
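
As a rough illustration of that export-then-import test, here is a minimal sketch in Python, using the standard library's xml.etree.ElementTree in place of SimpleXML. The function names and the record layout are made up for the example. Note that it only checks that the output is well formed and survives a round trip; it does not validate against a schema.

```python
import xml.etree.ElementTree as ET


def export_records(records):
    """Serialize a list of flat dicts to an XML string (hypothetical export step)."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            # ElementTree escapes characters such as & and < when serializing.
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def is_well_formed(xml_text):
    """Return True if a standard parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def round_trip(records):
    """Export the records, parse the result, and rebuild the dicts (values as strings)."""
    root = ET.fromstring(export_records(records))
    return [{child.tag: child.text for child in item} for item in root]


if __name__ == "__main__":
    data = [{"title": "Tom & Jerry", "year": 1940}]
    print(is_well_formed(export_records(data)))
    print(round_trip(data))
    # Hand-built XML with a raw ampersand is rejected by the parser.
    print(is_well_formed("<title>Tom & Jerry</title>"))
```

If the re-imported data does not match what you exported, or the parser refuses the file outright, you have found the problem before anyone downstream does.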

Common Sense Architecture

Not all data can be easily modeled. However, you should always take the extra time to find the best possible way to represent a given dataset. Failure to do so creates inadequacies when your modeled data has to be used elsewhere. Changes to your data structure? Don’t add an expansion to only the third floor of the building – there’ll be nothing underneath to support it! Instead, look for ways to expand that make sense. Sometimes, that means adding to or rebuilding the foundation. Although lives are not always at risk based on how you model a given dataset, try to treat it as if lives are on the line. I can promise you that if I ever have to deal with your poorly modeled data, your life may, in fact, be on the line. If you’re building a structure for your family to live in, and it doesn’t seem safe to add to or transform the structure, then don’t do it that way. All these analogies and metaphors dance around this: don’t be lazy. Use common sense both when initially modeling your data and when making changes to it, not only from your perspective and the end user’s perspective, but also from the perspective of anyone who might have to use the raw data model.

Jaquith ran into this problem when trying to normalize different states’ laws.

There are at least a few states that violate their standard state code structure. They might structure their code by dividing it into titles, each title into chapters, and each chapter into sections — except, sometimes, when chapters are called “articles.” Why do they do this? I have no idea.
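
One way to absorb that kind of inconsistency is to decide on a canonical hierarchy up front and map every label onto it in one place. The sketch below is only an assumption about how that might look: the mapping table, the function names, and the sample identifiers are hypothetical, and treating "article" as equivalent to "chapter" is a modeling decision you would have to confirm for each state.

```python
# Hypothetical mapping from the labels found in source documents
# to a single canonical hierarchy: title > chapter > section.
CANONICAL_LEVELS = {
    "title": "title",
    "chapter": "chapter",
    "article": "chapter",  # assumption: "article" is used where "chapter" is expected
    "section": "section",
}


def normalize_level(label):
    """Map a source label to its canonical level, failing loudly on surprises."""
    key = label.strip().lower()
    if key not in CANONICAL_LEVELS:
        raise ValueError(f"unknown structural unit: {label!r}")
    return CANONICAL_LEVELS[key]


def normalize_path(path):
    """Normalize a list of (label, identifier) pairs describing one law's location."""
    return [(normalize_level(label), identifier) for label, identifier in path]


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(normalize_path([("Title", "12"), ("Article", "4"), ("Section", "12-345")]))
```

Raising an error on an unknown label, instead of guessing, means a new quirk in the source data shows up as a failure at import time and not as quietly mis-modeled data.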

Validate and use common sense, and you’ll save yourself, myself, and programmers around the world like Waldo a ton of headaches.

About Michael Marr
$hobbies = array( 'husband', 'father', 'poker player', 'basketball fan', 'nerd' );
$bio = 'Michael enjoys being a ' . implode(', ', $hobbies);
echo preg_replace('/(.*), ([A-Za-z]*)$/', '$1, and $2', $bio);
Twitter: @mikemarr33     |   Google: Google+