ERP Journal on Ulitzer

J2EE Journal: Article

Integrating Enterprise Information on Demand with XQuery, Part 1

Since the dawn of the database era more than three decades ago, enterprises have been amassing an ever-increasing volume of information - both current and historical - about their operations. For the past two of those three decades, the database world has struggled with the problem of somehow integrating information that natively resides in multiple database systems or other information sources (Landers and Rosenberg).

The IT world knows this problem today as the enterprise information integration (EII) problem: enterprise applications need to be able to easily access and combine information about a given business entity from a distributed and highly varied collection of information sources. Relevant sources include various relational database systems (RDBMSs); packaged applications from vendors such as Siebel, PeopleSoft, SAP, and others; "homegrown" proprietary systems; and an increasing number of data sources that are starting to speak XML, such as XML files and Web services.

During the past two decades, a number of research and commercial systems have been built in attempts to solve the EII problem. These systems have been known by a variety of names: heterogeneous distributed database systems, multi-database systems, federated database systems, data integration systems, and now enterprise information integration systems. But the problem itself has persisted and remains a very real one.

Solutions to the data integration problem involve choosing a common data model into which all the existing data sources are (virtually) mapped, then using a query language designed to work with that model to extract the desired data from the set of mapped data sources. Many data models and languages have been invented and/or tried over the years - including relational (SQL), functional, logical, object-oriented (ODMG/OQL), and semi-structured approaches - but each has fallen short. The two biggest impediments to their success have been the challenge of naturally mapping the data from all the sources of interest into the chosen model and the lack of industry consensus on an appropriate and acceptable model into which to map the data.

Fortunately, the XML age is upon us, and with it has come a set of technologies that are uniquely suited to solving the EII problem. Much as the simplicity of HTML and HTTP led to their rapid adoption, which in turn fueled the growth of the Internet, the simplicity of XML is making it the generally accepted format for data interchange and application integration in the IT world today. In its wake, the XML Schema standard is gaining traction as the way to describe enterprise data for integration purposes. These trends are due to the simplicity and flexibility of XML: it is straightforward to express data from almost any enterprise data source in XML form without having to commit an "unnatural act." For similar reasons, Web services (based on XML, SOAP, and WSDL) are fast becoming the standard way for applications to interact point to point, either synchronously or asynchronously (Curbera). It follows from these trends that a query language for XML, one capable of querying and reshaping XML data as well as invoking functions such as Web services, would provide an ideal foundation for solving the EII problem.

Enter XQuery, the emerging XML query language being produced by the W3C XML Query working group. In this article, we provide an introduction to XQuery and explain how it enables true enterprise information integration - allowing not just database data, but also information from applications, Web services, messages, XML files, and other data sources, to be integrated into coherent reusable views and then used to meet the query demands of enterprise applications.

XQuery: A Query Language for XML
The development of the SQL language for querying and manipulating relational data was a major force in ushering in the database age in the late 1970s and early 1980s. The goal of the W3C XML Query working group has been to design a similarly high-level, declarative query language for XML data.

Why not SQL?
It's natural to ask why SQL, or a SQL derivative, isn't the right solution to the problem of querying XML. The answer is that there are just too many differences between XML data and relational data to make SQL a good candidate for this task:

  • Relational tables are flat, whereas XML data tends to be hierarchically structured, often several levels deep.
  • Relational tables are highly uniform, while XML data tends to be more highly variable. Structural variations, typing variations, and missing data are more the norm than the exception with XML data.
  • Relational data is naturally unordered, while order often has an important meaning in XML data (particularly for document data!).
  • Tables have relatively static schemas that can be difficult to evolve, while XML Schemas tend to be more extensible, and the self-describing nature of XML blurs the data/meta-data distinction. Moreover, XML data may or may not have an associated schema, while relational data cannot exist in the absence of a schema.
  • Finally, in the XML world, textual information can be intermixed freely with structured (i.e., tagged) information.

For these reasons, the W3C has been designing a new query language, XQuery, tailored to the unique needs of manipulating XML data. Although XQuery is still a work in progress, it is nearing completion at the time of this writing and is likely to become an official W3C Recommendation in late 2003.

XQuery basics
At the heart of the semantics of XQuery, and also of XPath 2.0, lies the XQuery data model. Just as the relational model laid the foundation for SQL, the XQuery data model lays the foundation for XQuery. Because XML data is naturally ordered, the data model is based on the notion of ordered trees. Central to it is the notion of a sequence: XQuery expressions consume and produce ordered sequences of atomic values (based on the primitive types of XML Schema) and/or XML nodes (element, attribute, text, document, and so on).
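To make the notion of a sequence concrete, here is a minimal Python sketch (Python is used purely for illustration; it is not part of XQuery) of two properties of the XQuery data model: sequences are ordered and never nested, so (1, (2, 3), ()) is the same as the flat sequence (1, 2, 3), and a single item is indistinguishable from a one-item sequence. The helper name seq is hypothetical, and XML nodes are represented by xml.etree.ElementTree elements.

```python
import xml.etree.ElementTree as ET
from typing import Any


def seq(*items: Any) -> tuple:
    """Build an XQuery-style sequence: ordered, flat, mixing atomic values and nodes.

    Nested tuples (sub-sequences) are spliced in place, mirroring the rule that
    XQuery sequences never contain other sequences; the empty tuple stands for ().
    """
    result: list = []
    for item in items:
        if isinstance(item, tuple):
            result.extend(seq(*item))  # splice a sub-sequence into the outer one
        else:
            result.append(item)  # an atomic value or an XML node
    return tuple(result)


name = ET.fromstring("<NAME>John Doe</NAME>")

print(seq(1, (2, 3), ()))                      # (1, 2, 3)
print(seq("John Doe") == seq(("John Doe",)))   # True: an item equals its singleton sequence
mixed = seq(1.3, name, ("a", ()))
print(len(mixed), mixed[1].tag)                # 3 NAME
```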

XQuery is a functional, side-effect-free language. As in many other functional languages, a program (in XQuery's case, a query) consists of a prologue and a body, where the body is an expression. The result of a query is the result of evaluating its body expression in the environment defined by its prologue. Expressions in XQuery can be simple expressions such as primitive constants (e.g., "John Doe" or 1.3), variables, arithmetic expressions, function calls, or path expressions (familiar to users of XPath). They can also be combined into more complex expressions via operators, functions, and syntactic constructs including FLWOR expressions (discussed shortly), typeswitch expressions, and node constructors.

The XQuery language is rich enough to support navigation within an XML input document, the combining of data from multiple XML inputs, and the generation of new XML structures from one or more XML inputs. To generate new XML structures, XQuery takes a JSP-like approach: a subset of XML syntax is itself part of the language, and XQuery expressions embedded in these literal XML structures are evaluated at runtime and replaced by their results. Curly braces mark the switch from literal XML to an embedded query expression.
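As an illustration, an XQuery element constructor such as <CUST id="{$c/CUSTID}">{$c/NAME/text()}</CUST> mixes literal XML with expressions in curly braces. The Python sketch below imitates how such a constructor is evaluated, using xml.etree.ElementTree: the literal parts become fixed tags, and each braced expression is computed from the input node and substituted into the new node. The element names CUSTID and NAME and the helper construct_cust are hypothetical.

```python
import xml.etree.ElementTree as ET


def construct_cust(c: ET.Element) -> ET.Element:
    """Python analogue of the XQuery constructor
    <CUST id="{$c/CUSTID}">{$c/NAME/text()}</CUST>
    Literal markup is fixed; the braced parts are evaluated against the input node $c.
    """
    out = ET.Element("CUST")                 # literal start tag
    out.set("id", c.findtext("CUSTID"))      # {$c/CUSTID} inside an attribute value
    out.text = c.findtext("NAME")            # {$c/NAME/text()} inside element content
    return out


row = ET.fromstring("<ROW><CUSTID>C1</CUSTID><NAME>John Doe</NAME></ROW>")
print(ET.tostring(construct_cust(row), encoding="unicode"))
# <CUST id="C1">John Doe</CUST>
```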

From the standpoint of the EII problem, the most important expression in XQuery is the FLWOR (pronounced "flower") expression, which is roughly analogous to SELECT-FROM-WHERE-ORDER BY queries in SQL. The components of a FLWOR expression are:

  • A for clause that generates one or more value sequences, binding the values to query variables. The for clause in XQuery plays a role similar to the FROM clause in SQL.
  • A let clause that binds a temporary variable to the result of a query expression. Unlike for, let binds the entire result sequence at once rather than iterating over it. The XQuery let clause is similar to the temporary views supported by some dialects of SQL.
  • A where clause that contains Boolean predicates that restrict the variable bindings produced by the for and let clauses. The where clause in XQuery serves the same purpose as the WHERE clause in SQL.
  • An order by clause that contains a list of expressions that dictate the order of the FLWOR expression's XML output. XQuery's order by clause is directly analogous to SQL's ORDER BY clause.
  • A return clause that specifies the query's desired XML output. The XQuery return clause is analogous to the SELECT clause in SQL, but the structures that it can specify are much richer than those expressible in SQL. (For example, this is where XQuery's JSP-like XML node construction syntax can be found.)
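To show how the five clauses cooperate, here is a Python sketch that performs, by hand, the same steps as the FLWOR expression quoted in its docstring, over a small invented CUSTS document. Apart from CUST, the element names (NAME, STATE, CARD, NCARDS) are hypothetical, and Python is used only to make the evaluation order explicit.

```python
import xml.etree.ElementTree as ET

# Invented sample input; DOC is the CUSTS root element of the document $doc.
DOC = ET.fromstring(
    "<CUSTS>"
    "<CUST><NAME>Wu</NAME><STATE>CA</STATE><CARD/><CARD/></CUST>"
    "<CUST><NAME>Adams</NAME><STATE>NY</STATE><CARD/></CUST>"
    "<CUST><NAME>Baker</NAME><STATE>CA</STATE></CUST>"
    "</CUSTS>"
)


def flwor(custs: ET.Element) -> list:
    """Evaluates, step by step, the XQuery expression

        for $c in $doc/CUSTS/CUST
        let $cards := $c/CARD
        where $c/STATE = "CA"
        order by $c/NAME
        return <CUST>{$c/NAME}<NCARDS>{count($cards)}</NCARDS></CUST>
    """
    bindings = []
    for c in custs.findall("CUST"):        # for: one binding of $c per CUST element
        cards = c.findall("CARD")          # let: $cards is bound to a whole sequence
        if c.findtext("STATE") == "CA":    # where: keep only qualifying bindings
            bindings.append((c, cards))
    bindings.sort(key=lambda b: b[0].findtext("NAME"))  # order by $c/NAME
    results = []
    for c, cards in bindings:              # return: construct one new node per binding
        out = ET.Element("CUST")
        ET.SubElement(out, "NAME").text = c.findtext("NAME")
        ET.SubElement(out, "NCARDS").text = str(len(cards))  # count($cards)
        results.append(out)
    return results


for node in flwor(DOC):
    print(ET.tostring(node, encoding="unicode"))
# <CUST><NAME>Baker</NAME><NCARDS>0</NCARDS></CUST>
# <CUST><NAME>Wu</NAME><NCARDS>2</NCARDS></CUST>
```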

For data handling, XQuery has the richness of SQL and more. It includes support for subqueries, union, intersection, difference, aggregate functions, sorting, existential and universal quantification, conditional expressions, user-defined functions (which may even be recursive), and static and dynamic typing, in addition to various constructs that support document manipulation (e.g., query primitives for order-related operations). The main thing XQuery lacks relative to SQL today is support for updates; XQuery 1.0 is strictly a functional data access language, with update support targeted for a later revision of the standard.
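Several of these constructs have close Python counterparts, which may make them easier to picture. In the sketch below, the XQuery expressions shown in the comments (quantifiers, aggregates, and a conditional) are paired with Python equivalents; the order totals are invented sample data.

```python
# Invented sample data: totals of one customer's orders.
order_totals = [120.0, 35.5, 80.0]

# some $t in $totals satisfies $t > 100   (existential quantification)
has_large_order = any(t > 100 for t in order_totals)

# every $t in $totals satisfies $t > 10   (universal quantification)
all_above_ten = all(t > 10 for t in order_totals)

# count($totals), sum($totals), avg($totals)   (aggregate functions)
n = len(order_totals)
total = sum(order_totals)
average = total / n if n else None  # in XQuery, avg(()) returns the empty sequence

# if ($total > 200) then "gold" else "standard"   (conditional expression)
tier = "gold" if total > 200 else "standard"

print(has_large_order, all_above_ten, n, total, average, tier)
# True True 3 235.5 78.5 gold
```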

Using XQuery for Enterprise Information Integration
To show how XQuery can be applied to solve the EII problem, as well as illustrate the power of some of the main constructs of the language, let's consider a simple yet illustrative business scenario. A large consumer electronics retailer wants to organize its IT infrastructure to make its staff more productive and its business more effective. The retailer has both in-store customers and online customers, and it both sells and services home entertainment systems, computers, and other consumer electronic devices. To encourage customer loyalty, consumers receive reward points for their purchases.

The bottom layer of Figure 1 shows what the electronics retailer's IT infrastructure looks like today. Its customer relationship management (CRM) data, such as information about customers and credit cards, is stored in an RDBMS. Order management is handled through an ERP system (SAP), so order information is available via a J2EE Connector Architecture (J2EE-CA) adapter developed to access the ERP system's API. The adapter API provides calls like getOpenOrders( ), which takes a customer ID as input and returns a list of that customer's open orders. Service data is also stored in an RDBMS, but in a different one from the one holding the customer data. Finally, the electronics retailer uses an external service for performing customer credit checks. The external credit service provides a getCreditRating( ) Web service call that takes a social security number (formatted differently than in the electronics retailer's RDBMS) and runs a credit check on the specified individual.

The electronics retailer's line-of-business managers have asked the company's IT department to create customer portals for three different sets of users. The three desired portals and their data requirements are:

  • An online customer self-service portal that will be directly accessed by customers via the Internet. This customer self-service portal should show the customer's profile information, registered credit cards, orders, and service cases, but it should not show the customer's credit rating information.
  • A credit approval portal that will only be accessed by credit approval personnel. This credit approval portal should show the customer's basic profile information, registered credit cards, and credit rating information.
  • An internal product service portal that will be used by clerks in the electronics retailer's service department. This service portal should show just the customer's basic profile information and service case information.

All three portals require information about the same core business entity, namely the customer. However, each line-of-business manager wants a different view of the customer. The electronics retailer's IT department wanted a solution that would enable rapid development, make the initial data integration effort highly reusable, and keep subsequent maintenance low. Fortunately, their data architects realized that they could achieve these goals by creating a single, integrated base view of the customer and then creating three application-specific views on top of it. This way, the data integration effort is spent on creating the base view, and the application-specific views are then easily created without concern for disparate data models, differing data source APIs, or other integration snafus. Later on, changes in underlying data source schemas can be handled by maintaining the base view; the application-specific views are shielded from most such changes.

With XQuery, the solution sketched above can be implemented by treating each of the enterprise's data sources as a virtual XML document or as a set of functions. XQuery can then stitch the distributed customer information together into a comprehensive, reusable base view. That is, given appropriate default XML views of the enterprise's data sources, the base view can be defined in XQuery in a way that respects the hierarchical nature of the data. As indicated in Figure 1, the relational data sources can be exposed using simple default XML Schemas, while the other sources (SAP and the credit-checking Web service) can be exposed to XQuery as callable XQuery functions with appropriate signatures. The middle of Figure 1 sketches the desired "single view of customer": all data about customers is made available for easy querying from various applications. The developers of these applications then simply work against this unified view, an XML view of customers in which each customer has some basic data, some credit rating information, an associated set of credit cards, a set of open orders (each with its line item details nested inside), and a set of service cases.
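One way to picture a "default XML view" of a relational source is to map each row of a table to an element named after the table, with one child element per column. The Python sketch below does this for an in-memory SQLite table using only the standard library. The table layout (CUSTID, NAME, SSN), the wrapper element CUST_ROWS, and the helper default_xml_view are hypothetical; the article does not specify the exact default schema an EII product would generate.

```python
import sqlite3
import xml.etree.ElementTree as ET


def default_xml_view(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Expose a relational table as a virtual XML document.

    Each row becomes an element named after the table; each column becomes a
    child element named after the column. SQL NULLs are simply omitted, one
    common way to represent missing data in XML.
    """
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cur.description]
    root = ET.Element(f"{table}_ROWS")
    for row in cur:
        row_elem = ET.SubElement(root, table)
        for col, value in zip(columns, row):
            if value is not None:
                ET.SubElement(row_elem, col).text = str(value)
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUST (CUSTID TEXT, NAME TEXT, SSN TEXT)")
conn.executemany("INSERT INTO CUST VALUES (?, ?, ?)",
                 [("C1", "John Doe", "123456789"), ("C2", "Jane Roe", None)])
conn.commit()

view = default_xml_view(conn, "CUST")
print(ET.tostring(view, encoding="unicode"))
```

Non-relational sources such as the SAP adapter or the credit Web service do not fit this row-to-element mapping as naturally, which is why the article exposes them as callable functions instead.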

Listing 1 shows in full detail how XQuery can be used to define the desired single view of customer. The XQuery shown in the listing defines a single well-formed XML document with the top-level element CUSTPROFILE. The outermost FLWOR expression uses the variable $Cust to "iterate" logically over all of the customers in the CRM database's CUST table. Its let clause binds a second variable, $CredInfo, to the result of calling the credit Web service's getCreditRating( ) method. Note that this call handles the disparate social security number formats by reformatting the value being passed to the Web service. The top-level return clause is where most of the action is, as this is where the desired result is defined and shaped. For each customer, the view contains a CUST element with the basic customer data at the top level and a nested CREDITINFO element with the customer's credit rating from the Web service. Each CUST element also has a CREDITCARDS element containing a subelement for each of the customer's credit cards, computed via a correlated FLWOR subquery (much like a nested query in SQL), and the view query has similar subqueries for computing the sets of ORDERS and CASES for each customer. In the case of ORDERS, notice that the subquery's for clause ranges over the result of calling the getOpenOrders( ) method of the ERP application adapter. Like the Web service call, this method appears to the view definer as just another callable XQuery function.
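The structure described above can be approximated in Python to show how the pieces fit together. The sketch below builds a CUSTPROFILE document with one CUST element per customer and nested CREDITINFO, CREDITCARDS, ORDERS, and CASES elements. Only those element names come from the text; the in-memory tables, column names, SSN formats, and the stubs get_credit_rating and get_open_orders (standing in for the credit Web service and the adapter's getOpenOrders( ) call) are hypothetical, and this is not a translation of Listing 1 itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the sources in Figure 1 (all names invented).
CRM_CUST = [  # CRM RDBMS, CUST table; SSNs stored without dashes
    {"CUSTID": "C1", "NAME": "John Doe", "SSN": "123456789"},
    {"CUSTID": "C2", "NAME": "Jane Roe", "SSN": "987654321"},
]
CRM_CREDITCARD = [  # CRM RDBMS, credit card table
    {"CUSTID": "C1", "CARDNO": "4111-0001"},
    {"CUSTID": "C1", "CARDNO": "5500-0002"},
]
SERVICE_CASE = [  # separate service RDBMS
    {"CUSTID": "C2", "CASEID": "S9", "STATUS": "Open"},
]


def get_credit_rating(ssn: str) -> dict:
    """Stub for the external credit Web service; assumed to expect 'ddd-dd-dddd'."""
    if len(ssn) != 11 or ssn[3] != "-" or ssn[6] != "-":
        raise ValueError(f"credit service expects ddd-dd-dddd, got {ssn!r}")
    return {"SSN": ssn, "RATING": "700"}


def get_open_orders(custid: str) -> list:
    """Stub for the ERP adapter's getOpenOrders( ) call."""
    orders = {"C1": [{"ORDERID": "O7", "LINES": [("TV", 1), ("DVD", 2)]}]}
    return orders.get(custid, [])


def reformat_ssn(ssn: str) -> str:
    """Bridge the format mismatch: '123456789' -> '123-45-6789'."""
    return f"{ssn[:3]}-{ssn[3:5]}-{ssn[5:]}"


def add_fields(parent: ET.Element, tag: str, record: dict) -> ET.Element:
    """Append <tag> to parent, with one child element per string-valued field."""
    elem = ET.SubElement(parent, tag)
    for key, value in record.items():
        if isinstance(value, str):
            ET.SubElement(elem, key).text = value
    return elem


def customer_profile_view() -> ET.Element:
    root = ET.Element("CUSTPROFILE")
    for cust in CRM_CUST:                                         # for $Cust in CUST
        cred_info = get_credit_rating(reformat_ssn(cust["SSN"]))  # let $CredInfo := ...
        c = add_fields(root, "CUST", cust)                        # basic customer data
        add_fields(c, "CREDITINFO", cred_info)
        cards = ET.SubElement(c, "CREDITCARDS")                   # correlated subquery
        for card in CRM_CREDITCARD:
            if card["CUSTID"] == cust["CUSTID"]:
                add_fields(cards, "CREDITCARD", card)
        orders = ET.SubElement(c, "ORDERS")                       # ranges over the adapter call
        for order in get_open_orders(cust["CUSTID"]):
            o = add_fields(orders, "ORDER", order)
            for item, qty in order["LINES"]:                      # line items nested inside
                add_fields(o, "LINEITEM", {"ITEM": item, "QTY": str(qty)})
        cases = ET.SubElement(c, "CASES")                         # subquery on the service DB
        for case in SERVICE_CASE:
            if case["CUSTID"] == cust["CUSTID"]:
                add_fields(cases, "CASE", case)
    return root


view = customer_profile_view()
for c in view.findall("CUST"):
    print(c.findtext("CUSTID"), c.findtext("CREDITINFO/SSN"),
          len(c.findall("CREDITCARDS/CREDITCARD")),
          len(c.findall("ORDERS/ORDER/LINEITEM")),
          len(c.findall("CASES/CASE")))
# C1 123-45-6789 2 2 0
# C2 987-65-4321 0 0 1
```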

As shown at the top of Figure 1, three different queries are to be written against the base customer view. One is CustomerSelfServiceQuery, a parameterized query that, given a customer ID, returns the information that the customer is allowed to see through the customer self-service portal. This query returns everything known about the customer except for the CREDITINFO element. Another is CreditPersonnelQuery, for use by the personnel who handle credit approval requests. This query also takes a customer ID and returns customer information; however, it omits ORDERS and CASES, as they are not relevant for credit department use. The third query in Figure 1, ServicePersonnelQuery, is for use by the service department. This query takes a customer ID and returns basic information about the customer plus the set of open service cases for the customer. Listing 2 shows how simple it is to write the third query given the centralized customer view provided by Listing 1.
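The following Python sketch approximates what ServicePersonnelQuery computes against the base view: given a customer ID, it copies the customer's basic fields and only those service cases whose status is "Open". The small base-view document is inlined so that the example stands alone. Its field names (CUSTID, NAME, CASEID, STATUS), the choice of "basic" fields, and the helper service_personnel_query are hypothetical, and the XQuery in the docstring is an illustrative reconstruction, not Listing 2.

```python
import copy
import xml.etree.ElementTree as ET

# A tiny, hypothetical instance of the base customer view (CUSTPROFILE).
BASE_VIEW = ET.fromstring(
    "<CUSTPROFILE>"
    "<CUST><CUSTID>C1</CUSTID><NAME>John Doe</NAME>"
    "<CREDITINFO><RATING>700</RATING></CREDITINFO>"
    "<CREDITCARDS/><ORDERS/>"
    "<CASES><CASE><CASEID>S1</CASEID><STATUS>Open</STATUS></CASE>"
    "<CASE><CASEID>S2</CASEID><STATUS>Closed</STATUS></CASE></CASES>"
    "</CUST>"
    "</CUSTPROFILE>"
)

BASIC_FIELDS = ("CUSTID", "NAME")  # assumed "basic profile" fields


def service_personnel_query(view: ET.Element, custid: str) -> list:
    """Return basic info plus open service cases for one customer, roughly:

        for $c in $view/CUST
        where $c/CUSTID = $custid
        return <CUST>{$c/CUSTID, $c/NAME}
                 <CASES>{$c/CASES/CASE[STATUS = "Open"]}</CASES></CUST>
    """
    results = []
    for c in view.findall("CUST"):
        if c.findtext("CUSTID") != custid:      # where $c/CUSTID = $custid
            continue
        out = ET.Element("CUST")
        for field in BASIC_FIELDS:
            src = c.find(field)
            if src is not None:
                out.append(copy.deepcopy(src))  # constructors copy the selected nodes
        cases = ET.SubElement(out, "CASES")
        for case in c.findall("CASES/CASE"):
            if case.findtext("STATUS") == "Open":
                cases.append(copy.deepcopy(case))
        results.append(out)
    return results


for node in service_personnel_query(BASE_VIEW, "C1"):
    print(ET.tostring(node, encoding="unicode"))
```

Note how little the query needs to know: none of the source-specific details (SSN formats, adapter calls, two separate RDBMSs) appear in it, because they were resolved once in the base view.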

This example, while it uses very simple data sources and schemas for clarity, illustrates several important benefits of an XQuery-based EII solution. One benefit is that the data integration problem for a given business entity needs to be solved only once, when defining the centralized view. The view can then be leveraged across multiple applications, and the queries or further views for those applications are vastly simplified (as shown by Listing 2). Another benefit is that XML and XQuery provide a very natural basis for defining centralized views of real enterprise data. They make it simple to capture the hierarchical structure of the data, particularly for data that lives within applications (as opposed to flat RDBMS tables). XQuery also provides the power to deal with complications like key mismatches, either by calling a function to transform a key, as is done in the Web service call in Listing 1, or by incorporating a key mapping table or service into the base view query.
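The two ways of handling key mismatches can be sketched in a few lines of Python. Here to_dashed_ssn plays the role of a key-transforming function like the one applied in the Web service call, and CUSTID_TO_SAP plays the role of a key mapping table for keys that share no common format. Both names, the SSN formats, and the ID values are hypothetical.

```python
import re
from typing import Optional


# Approach 1: a function that transforms one key format into another.
def to_dashed_ssn(ssn: str) -> str:
    """Convert a 9-digit SSN such as '123456789' into '123-45-6789'."""
    digits = re.sub(r"\D", "", ssn)          # tolerate stray separators
    if len(digits) != 9:
        raise ValueError(f"not a 9-digit SSN: {ssn!r}")
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


# Approach 2: a mapping table relating keys that cannot be computed from
# each other, e.g. CRM customer IDs to the ERP system's customer numbers.
CUSTID_TO_SAP = {"C1": "0000100231", "C2": "0000100877"}


def sap_customer_number(custid: str) -> Optional[str]:
    """Look up the ERP key for a CRM customer ID (None if unmapped)."""
    return CUSTID_TO_SAP.get(custid)


print(to_dashed_ssn("123456789"))     # 123-45-6789
print(to_dashed_ssn("123 45 6789"))   # 123-45-6789
print(sap_customer_number("C2"))      # 0000100877
```

A transformation function works when one key can be computed from the other; a mapping table (or mapping service) is needed when the correspondence is arbitrary and must be stored.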

Importantly, these benefits need not come at the expense of performance. When the XQuery-based EII system processes a query like the one in Listing 2, it expands the query's view reference inline, as RDBMSs have done with views for decades. The result is a query that involves only the base data sources, and in which predicates such as the customer ID parameter and the "Open" case status constant can be pushed all the way down to the appropriate data sources. Also, only those base sources that actually contain data needed for the query (in Listing 2's case, the two RDBMSs) become involved in processing the query at runtime.
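The effect of view unfolding can be illustrated with a toy Python model in which every data source counts how often it is called. The naive evaluation materializes the whole base view first, touching every source for every customer. The unfolded version substitutes the view definition into the query, keeps only the parts the query uses, and pushes the customer ID and the "Open" status down into the SQL sent to the two databases. The schema and all function names are hypothetical, and a real EII engine performs this rewriting on its query plan automatically rather than by hand.

```python
import sqlite3
from collections import Counter

calls = Counter()  # how often each source is touched

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE CUST (CUSTID TEXT PRIMARY KEY, NAME TEXT)")
crm.executemany("INSERT INTO CUST VALUES (?, ?)",
                [("C1", "John Doe"), ("C2", "Jane Roe"), ("C3", "Ann Lee")])

svc = sqlite3.connect(":memory:")
svc.execute("CREATE TABLE SERVICECASE (CASEID TEXT, CUSTID TEXT, STATUS TEXT)")
svc.executemany("INSERT INTO SERVICECASE VALUES (?, ?, ?)",
                [("S1", "C1", "Open"), ("S2", "C1", "Closed"), ("S3", "C2", "Open")])


def crm_query(sql, params=()):
    calls["crm"] += 1
    return crm.execute(sql, params).fetchall()


def svc_query(sql, params=()):
    calls["service_db"] += 1
    return svc.execute(sql, params).fetchall()


def get_credit_rating(custid):   # stand-in for the remote credit Web service
    calls["credit_ws"] += 1
    return "700"


def get_open_orders(custid):     # stand-in for the ERP adapter
    calls["erp"] += 1
    return []


def naive(custid):
    """Materialize the full base view, then filter it."""
    view = []
    for cid, name in crm_query("SELECT CUSTID, NAME FROM CUST"):
        view.append({
            "CUSTID": cid, "NAME": name,
            "CREDITINFO": get_credit_rating(cid),
            "ORDERS": get_open_orders(cid),
            "CASES": svc_query("SELECT CASEID, STATUS FROM SERVICECASE "
                               "WHERE CUSTID = ?", (cid,)),
        })
    return [{"CUSTID": c["CUSTID"], "NAME": c["NAME"],
             "CASES": [case for case, status in c["CASES"] if status == "Open"]}
            for c in view if c["CUSTID"] == custid]


def unfolded(custid):
    """Same result after inlining the view and pushing predicates to the sources."""
    result = []
    for cid, name in crm_query("SELECT CUSTID, NAME FROM CUST WHERE CUSTID = ?",
                               (custid,)):
        cases = svc_query("SELECT CASEID FROM SERVICECASE "
                          "WHERE CUSTID = ? AND STATUS = 'Open'", (cid,))
        result.append({"CUSTID": cid, "NAME": name, "CASES": [c for (c,) in cases]})
    return result


calls.clear()
a = naive("C1")
naive_calls = dict(calls)
calls.clear()
b = unfolded("C1")
unfolded_calls = dict(calls)
print(a == b)            # True
print(naive_calls)       # {'crm': 1, 'credit_ws': 3, 'erp': 3, 'service_db': 3}
print(unfolded_calls)    # {'crm': 1, 'service_db': 1}
```

Both functions return the same answer, but the unfolded one never calls the credit service or the ERP adapter and issues one selective query per database, which mirrors the claim that only the two RDBMSs take part in evaluating Listing 2.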

In Part 1 of this article we have introduced the EII problem, provided a brief overview of XQuery, and explained XQuery's role in solving the EII problem. In Part 2, we will complete the picture by discussing two related technologies, EAI and ETL, and explaining how they relate to EII and XQuery. We will also describe an EII customer scenario and explain how Liquid Data for WebLogic, an XQuery-based EII product from BEA, was used to tackle the data integration problems that this customer faced.

References

  • Landers, T., and Rosenberg, R. "An Overview of Multibase." Proceedings of the 2nd International Symposium on Distributed Data Bases, Berlin, Germany. North-Holland Publishing Co., September 1982.
  • Curbera, F., et al. "Unraveling the Web Services Web: An Introduction to SOAP, WSDL, and UDDI." IEEE Internet Computing 6(2), March-April 2002.
  • W3C. "XQuery 1.0: An XML Query Language." www.w3.org/TR/xquery