Enterprise Resource Planning

ERP Journal on Ulitzer

XML for C++ Developers

To newcomers to the XML world, it might seem as if XML and Java are somehow joined at the hip. There are certainly synergies between the two technologies, largely because they've come of age at the same time. Consequently, many of the new developments in XML are first implemented in Java, and we're now seeing new Java developments leveraging the standardization of XML. In the real world, however, most new code is still written in C++ and often involves interaction with existing applications.

XML-Enabling Existing Applications
Many organizations now face the task of XML-enabling existing applications as quickly as possible. Some of these projects are trying to achieve better application integration, while others are just trying to achieve buzzword compliance. Whatever the motivation, you'll find that most of the information you read in journals and online sources tends to be directed toward enabling existing applications for XML with minimal changes to existing source code.

It's important to keep in mind that when I use terms like old or legacy, I don't necessarily mean those half-million lines of 20-year-old COBOL in the accounting system back-end. For many developers the "old" code that now requires an XML interface is more likely the ERP system that finally got rolled out last summer. Odds are, the application and associated integration components are written in an object-oriented language like C++ or Java.

A variety of techniques are being used to XML-enable existing code, but the common thread among all of them is the translation of XML messages into some other format that's already understood by the application. For instance, to XML-enable your ERP system you might employ an adapter that translates XML documents to SAP's iDoc format (see Figure 1).

Another approach might be to translate XML messages into EDI transactions using a middleware broker (see Figure 2). The fundamental goal of these approaches is to meld XML capabilities onto existing applications without changing the applications themselves. If packaged products aren't available to perform the translation, you can always develop your own conversion logic using open standards and tools such as DOM, SAX, XSL and XT.

In this first wave of the XML-enabling of existing applications, Java has carved out a significant market. You'll find that most of the XML brokering products are implemented in Java. At their cores are data format filters that convert XML to and from other data formats, including other dialects of XML.

In the short term, data format translations offer a quick solution to providing XML support in existing applications. In the long term, however, they have the potential to significantly add to the complexity of the overall system. That translates into increased cost of ownership and decreased performance.

Integration vs Translation
If the first generation of XML proliferation is to XML-enable existing applications, then the second must be to add XML support to applications that produce and consume XML data, not translate and forward it. This is a logical next step since adding integral XML support to these applications makes many of the data translation issues disappear. Figure 3 depicts a fully XML-enabled enterprise. In this environment XML is not only the data format used for communication with external entities via the Internet, it's also the native format for enterprise application interoperability.

What does it mean to add integral XML support? In layman's terms it means anticipating and designing for XML rather than adding XML as an extension of the application after the fact. It means propagating schema data types through to the application rather than doing data type conversions during an XML import/export process. And it means mirroring the XML document structure into the class structure of the application in a friendly and object-oriented way.

It's in meeting these types of requirements that standards like DOM and SAX start to fall short. These are great tools if you're doing XML data translation because they provide a generic interface to any XML document. They're independent of the problem domain of whatever application will ultimately process the data encapsulated in the XML document. But if you're adding integral XML support to an application, the last thing you want for your application data is a generic API. It's much more productive to have an object-oriented class structure that directly represents the data and methods defining your problem domain. These classes can then have methods for generating or parsing compliant XML documents. With this type of solution you can spend time solving your business problems instead of dealing with the intricacies of detailed XML programming.

Shortcomings of DOM
Okay, so I've made some accusations against DOM (and SAX). Now I'll try to back up my assault on this standard. Before I do, let me say for the record that the only reason I'm not also describing the advantages of DOM is because I consider them to be well documented and readily available in publications like this one and through online resources. There's plenty of information floating around that will help you understand where DOM is appropriate, but not much that will help you figure out where it isn't.

By far the most significant shortcoming of DOM, one that makes life hard for C++ programmers, is that it fails to leverage the underlying principle of XML: "meaning, not markup." XML tags not only delineate data elements, they also provide an organization for the data by logically encapsulating related and derivative data items. XML schema designers invest a lot of time and brainpower in their designs, and mapping the XML document structure to C++ OOP concepts lets you reuse this intellectual capital.

DOM, on the other hand, provides a generic API that has no meaning in relation to any specific document structure. Its API is designed with XML document-processing tasks in mind, such as validation or translation. Also, DOM doesn't allow you to access XML data elements using the data type information available in XML schema standards. All XML data is presented to the calling program as string data, which makes DOM more susceptible to cast errors, constraint errors and simple typos when comparing string literals.
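To make the data typing point concrete, here's a minimal sketch in C++. It doesn't use any real DOM parser: GenericElement is a hypothetical stand-in for a string-only node, OrderLine is a hypothetical schema-derived class, and the quantity attribute is assumed purely for illustration.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a generic, DOM-style element:
// every value is handed to the caller as a string.
struct GenericElement {
    std::map<std::string, std::string> attributes;
};

// With the string-only interface, each caller converts and validates by hand.
// at() throws std::out_of_range if the attribute name is missing or mistyped;
// std::stoi throws std::invalid_argument if the text does not start with a number.
int quantityFromGeneric(const GenericElement& item) {
    return std::stoi(item.attributes.at("quantity"));
}

// Hypothetical schema-derived class: the quantity is an int,
// and its constraint is checked in exactly one place.
class OrderLine {
public:
    void setQuantity(int quantity) {
        if (quantity <= 0) {
            throw std::invalid_argument("quantity must be positive");
        }
        quantity_ = quantity;
    }
    int quantity() const { return quantity_; }

private:
    int quantity_ = 1;
};
```

With the string-only interface, a mistyped attribute name or non-numeric text surfaces only at runtime, as an exception at whichever call site happens to do the conversion. The typed class lets the compiler reject a non-integer argument and keeps the constraint check in a single setter.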

The schema for a simplistic purchase order in Listing 1 provides an example of these limitations. Listing 2 gives the code necessary to populate a DOM structure for generating valid XML for the purchase order schema in Listing 1.

Notice first that most of the code in Listing 2 deals with XML-isms, such as creating and manipulating nodes, setting attributes and managing child lists. These interfaces aren't relevant to the problem domain (i.e., issuing a purchase order) and greatly increase the complexity of the code.

Since the DOM interfaces aren't related to the XML document structure, you'll notice the use of string literals throughout Listing 2. For example, the following code creates the Seller element:

DOM_Node Seller = PODoc.createElement("Seller");

The element tag is set with the string literal "Seller" and is therefore susceptible to typos in the code. Compiling the code from Listing 2 provides virtually no error detection in the context of the purchase order DTD that defines the intended XML output.
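The same hazard is easy to reproduce without a real DOM parser. In the sketch below, GenericNode is a hypothetical stand-in for a generic node API, and the PurchaseOrder root tag is assumed for illustration. The misspelled tag compiles without complaint and goes straight into the output.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic node, standing in for a DOM-style API.
// Tags are plain strings, so the compiler cannot check them.
class GenericNode {
public:
    explicit GenericNode(std::string tag) : tag_(std::move(tag)) {}

    // Creates a child element with the given tag and returns a reference to it.
    GenericNode& appendChild(const std::string& tag) {
        children_.push_back(std::unique_ptr<GenericNode>(new GenericNode(tag)));
        return *children_.back();
    }

    // Serializes this node and its children, without any validation.
    std::string toXml() const {
        std::string out = "<" + tag_ + ">";
        for (const auto& child : children_) {
            out += child->toXml();
        }
        return out + "</" + tag_ + ">";
    }

private:
    std::string tag_;
    std::vector<std::unique_ptr<GenericNode>> children_;
};

// Builds an order whose Seller tag is misspelled.
std::string buildOrderWithTypo() {
    GenericNode order("PurchaseOrder");
    order.appendChild("Seler");  // typo: should be "Seller"
    return order.toXml();
}
```

Nothing in the build flags the bad tag. The error shows up only if a validating parser later checks the generated document against the DTD.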

Finally, the DOM parser used in Listing 2 provides its own class for Unicode support, namely DOMString. While it's a pretty powerful class, it doesn't integrate well with string classes from other frameworks, such as std::string from the STL or CString from MFC.

C++ Classes for XML
Implementing a DTD or schema with C++ classes eliminates most of the problems associated with DOM usability. C++ classes can map directly to the XML document structure, thereby providing a context-relevant interface to the XML data. By deriving the C++ class hierarchy from the XML structure, you reuse the work of the schema designers and provide a common language between the schema designers and the developers. Of course, for this to work well, the schema designers need to consult with the programmers to make sure their schemas lend themselves to proper OOP concepts such as inheritance, encapsulation, strict data typing and more. My experience is that a good schema design makes for a good class hierarchy and vice versa, so having these people collaborate is a win for everyone.
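As a rough sketch of the idea, assume a purchase order with a Seller element and a list of items. The class names, the Name and Item elements, and the sku and quantity attributes below are hypothetical and aren't taken from Listing 1.

```cpp
#include <string>
#include <utility>
#include <vector>

// Escapes the characters that are significant in XML text and attribute values.
std::string escapeXml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Each class mirrors one element of the (assumed) document structure.
class Seller {
public:
    explicit Seller(std::string name) : name_(std::move(name)) {}
    std::string toXml() const {
        return "<Seller><Name>" + escapeXml(name_) + "</Name></Seller>";
    }

private:
    std::string name_;
};

class LineItem {
public:
    LineItem(std::string sku, int quantity)
        : sku_(std::move(sku)), quantity_(quantity) {}
    std::string toXml() const {
        return "<Item sku=\"" + escapeXml(sku_) + "\" quantity=\"" +
               std::to_string(quantity_) + "\"/>";
    }

private:
    std::string sku_;
    int quantity_;  // typed as an integer rather than carried as text
};

class PurchaseOrder {
public:
    explicit PurchaseOrder(Seller seller) : seller_(std::move(seller)) {}
    void addItem(LineItem item) { items_.push_back(std::move(item)); }
    std::string toXml() const {
        std::string out = "<PurchaseOrder>" + seller_.toXml();
        for (const LineItem& item : items_) {
            out += item.toXml();
        }
        return out + "</PurchaseOrder>";
    }

private:
    Seller seller_;
    std::vector<LineItem> items_;
};
```

The calling code now deals only with sellers, items and orders. Tag names live in one place per class, so a misspelled element can't creep in at a call site, and passing text where a quantity is expected fails at compile time.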

Another advantage of the C++ classes approach is easier integration with the rest of the application. For example, most C++ DOM parsers include their own libraries for handling code page translation for internationalization. While these libraries may perform well, most likely they're not the same libraries you may be using in the rest of your application. If you're coding for Windows NT, for instance, you're likely using the native NLS APIs provided by Windows NT. Trying to use the native APIs in your main application, while simultaneously using a third-party library in the DOM parser, can cause integration problems if the same set of code pages isn't supported by each package. Integration is also easier with C++ classes that are written using the same application framework (such as MFC or STL libraries) as the rest of your application.

Since DOM is defined as an interface, not an implementation, the ability to do derivations of DOM classes varies greatly by implementation. In general, derivation of DOM classes is difficult because the DOM API was designed with parsing XML documents in mind, resulting in many read-only interfaces. If you have application-specific data associated with XML elements, such as a database key, DOM doesn't make it easy to associate this data with DOM nodes. With schema-derived C++ classes, however, you can quickly derive classes from the schema-derived base classes and encapsulate your application data.
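Here is a minimal sketch of that kind of derivation. Seller stands in for a hypothetical schema-derived base class, and StoredSeller and its databaseKey member are invented for this example. XML escaping is left out to keep the sketch short.

```cpp
#include <string>
#include <utility>

// Hypothetical schema-derived base class: knows only what the schema defines.
class Seller {
public:
    explicit Seller(std::string name) : name_(std::move(name)) {}
    virtual ~Seller() = default;

    const std::string& name() const { return name_; }
    std::string toXml() const { return "<Seller>" + name_ + "</Seller>"; }

private:
    std::string name_;
};

// Application-specific derivation: carries a database key alongside
// the schema data. The key never appears in the generated XML.
class StoredSeller : public Seller {
public:
    StoredSeller(long databaseKey, std::string name)
        : Seller(std::move(name)), databaseKey_(databaseKey) {}

    long databaseKey() const { return databaseKey_; }

private:
    long databaseKey_;
};
```

Code that only generates XML can keep working with the Seller base class, while persistence code uses StoredSeller to get at the key.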

Compare Listing 3, which generates XML output via schema-derived C++ classes, to Listing 2, which uses DOM. You'll see that the code in Listing 3 is more readable and better encapsulated, and has better data type checking (both at compile time and at runtime) than the DOM code in Listing 2. Most important, interfaces are specific to the problem domain of the application rather than XML-ese.

As I mentioned above, the quality of C++ classes derived from an XML schema is directly proportional to the quality of the schema itself. You can do things in a schema that will wreak havoc on the classes. The good news is that these pitfalls are also bad schema design in themselves, irrespective of the C++ integration.

It's important that you make use of the new draft XML schema standard as soon as it's practical to do so. C++ classes derived from DTD files lack data typing and other advanced features available in XML schema. If you're working in a Windows-only environment, then the schema standard supported by the MSXML parser from Microsoft, commonly known as XML-Data Reduced (XDR), is a viable alternative until the new XML schema standard matures. As the XML schema standard evolves with the introduction of custom data types, data constraints and inheritance, the mapping of schemas to C++ classes will be more complete.

In this article we've discussed the benefits of creating custom C++ classes for XML generation as opposed to generic interfaces such as DOM or SAX. C++ classes derived from the XML schema definition provide a more object-oriented approach and will make your code easier to understand and maintain. Your code won't be littered with XML logic that's tangential to the business problems you're attempting to solve.

The focus of this article has been on generating XML data from C++ classes, not parsing XML input data. My next article will describe a unique methodology for parsing XML data in C++ classes, which will provide all of the object-oriented benefits described here and increased performance compared to traditional XML parsers.

More Stories By Ken Blackwell

Ken Blackwell is the chief technical officer of Bristol Technology, Inc., where he oversees product architecture and research in XML, middleware and transaction analysis technologies.
