Enterprise Resource Planning

ERP Journal on Ulitzer



The worst case would occur if separability is minimized. The Transaction Processing Performance Council's analytical processing test, TPC-H, provides a benchmark that tracks this. Results are not linearly comparable across database sizes, but reasonable estimates suggest that the workload increases by about 3.3 times per transaction as the database grows from 300 GB to 1 TB. On that basis, an eight-processor Proliant with Microsoft Windows 2000 and SQL Server, which achieved 1,506 QphH on the 300-GB test, could be expected to produce about 456 QphH on the 1-TB test.

Sun didn't report a 300-GB test, but posts a QphH score of 4,735 for a 24 processor 6800 on the 1 TB test -- making that machine seem about 10 times faster than the Proliant. On a CPU basis alone this doesn't make sense because the Proliant offers 31 percent of the raw cycles of the Sun 6800. What made the difference was the 9.6-GB-per-second data exchange rate for the UltraSPARC CPUs in the 6800 versus the 2.4-GB-per-second for the Proliant.
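The arithmetic behind this comparison can be reproduced in a few lines. The sketch below is illustrative only: it uses the figures quoted above, treats the 3.3x scaling factor as the estimate it is, and the function and variable names are assumptions of this example rather than part of any benchmark tooling.

```python
# Figures quoted in the text; the scaling factor is an estimate, not a TPC rule.
PROLIANT_QPHH_300GB = 1506
SCALE_FACTOR_300GB_TO_1TB = 3.3
SUN_6800_QPHH_1TB = 4735


def estimate_1tb_qphh(qphh_300gb, scale_factor=SCALE_FACTOR_300GB_TO_1TB):
    """Estimate a 1-TB QphH score from a 300-GB result."""
    return qphh_300gb / scale_factor


proliant_1tb = estimate_1tb_qphh(PROLIANT_QPHH_300GB)
speed_ratio = SUN_6800_QPHH_1TB / proliant_1tb
bandwidth_ratio = 9.6 / 2.4  # GB per second, Sun 6800 versus Proliant

print(f"Estimated Proliant score at 1 TB: {proliant_1tb:.0f} QphH")
print(f"Sun 6800 advantage on the 1-TB test: {speed_ratio:.1f}x")
print(f"Data exchange rate advantage: {bandwidth_ratio:.0f}x")
```

The estimate is only as good as the scaling factor, so the ratio should be read as an order of magnitude rather than a measured result.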

Relative Power Ratings on 1-TB QphH benchmark


Since the reality is likely to fall between these two extreme cases we can conclude that the Solaris-based solution is likely to be feasible with presently available hardware -- and by implication that real costs two years from now will be lower than, and solution times shorter than, would be the case if we needed that capability today. We can, furthermore, conclude that the decision on trying to reap the cost savings available from the use of a distributed GRID-type solution to running Solver can be postponed until just before we scale up to meet the 100-airplane level -- by which time better data on algorithmic performance with respect to the problem as formulated will also be available.

Hidden risk of a big-box solution

There's a hidden danger to the airline in a big-machine solution. The Starfire choice invites downstream mismanagement because it makes it easier for the board to eventually choose a completely unsuitable CIO who will then destroy operational efficiency by doing all the things experience has taught him to do -- like hierarchical staffing, stovepipe decision making, isolation of users from technical staff, and the imposition of rigid operational controls. These methods are appropriate to an MVS/XA environment but wholly counterproductive in a Unix one and will, if forcefully applied, first raise costs, then freeze adaptation to external change, and, eventually, kill the airline's ability to compete.

That risk comes about because machines like the Starfire 15K are key pieces in Sun's metamorphosis from Unix guerrilla to data center gorilla. To get the dollars available from mainframe managers to whom a $5 million machine looks cheap, but who demand that it replicate all of their favorite VM/370 facilities, Sun has added things that use resources pointlessly but enable these people to treat the machines as if they were cheaper, faster mainframes. As a result, the board will eventually be looking at resumes from people who are clueless about Unix, user relationships, or making money, but claim expertise with the Starfire along with 25 years or more of "progressively more senior experience" in airline data centers.

This is a much bigger issue than most people believe. Management methods are not independent of technology; the right organizational posture for an MVS-based operation is radically different from the right structure for a Unix-based solution. Resource-limited environments require careful management and control of user access to services. Unix-based infrastructures simply don't face those limits and so benefit from staffing strategies, leadership, and working relationships with the user community that are anathema in traditional data shops. Please see my Unix Guide to Defenestration for a detailed discussion of what this means and how it works.

The core Solver application and its relationship to both the revenue cycle and operational systems constitutes the largest single component of the real-time airline operating system we're considering, but it isn't the only critical piece. SafetyJet will be sold in part on its merit as a safe airline. Part of the safety factor applies to passenger apprehension about hijacks and other malicious action affecting operations. The security systems are, therefore, both extremely sensitive and mission-critical.

The crew security plan functionally requires Solaris -- it's one reason each operating office will have a Sun V880 (or 280R) local computer. That's because Solaris lets us use Java cards with SunRays to make user identification both easier and more certain. That capability means that each local center will have locally booting SunRays that run their X environments on the local processor. That machine, in turn, will connect those X-servers to the client end running in the data centers. It is the need to make this process foolproof (and secure against man-in-the-middle attacks originating within SafetyJet) that makes the strongest argument for using the simplest possible processing architecture.

One of the things that this makes easy to implement, for example, is crew vouching and verification. The enabling SunRay feature here is that the user's card identifies a session history and is independent of terminal or location. Thus, a driver can pull his card from a SunRay in Omaha and resume exactly the same session when he plugs in again in Denver an hour later. Add biometric smarts to the card (in this case, a temperature-sensitive embedded fingerprint reader is envisaged) to uniquely identify its owner and the system becomes both hassle-free to its users and reasonably secure.

"Reasonably secure" is not, of course, quite good enough for people who are being given charge of 192 passengers and 80,000 pounds of jet fuel, so all team members with flight line responsibilities follow check-in procedures that require them to vouch for the identity of the other people checking in with them.

The security application used for this is one of only two applications in the airline that are not automatically mirrored between the two data centers. In both cases, the front-end machines used to run the SunRays randomly switch sessions between the two data centers. As a result, it is essentially impossible to capture a SafetyJet by stealth. A well-designed ground assault can succeed, but the operations control center will be instantly alerted to the problem and the jet won't get off the ground even if the hijackers have trained people on board, because the external wheel locks require that the crew chief get unlock codes from the operations center.
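The vouching step in the check-in procedure described above can be sketched as a simple rule check. This is a minimal illustration, not the security application itself: the rule that each member needs at least one vouch from a different crew member, and every name in the code, are assumptions made for the example.

```python
def unvouched_members(crew, vouches, required=1):
    """Return crew members who still lack enough vouches from colleagues.

    crew     -- set of crew member IDs checking in together
    vouches  -- iterable of (voucher_id, subject_id) pairs
    required -- hypothetical minimum number of distinct vouchers per member
    """
    vouchers_for = {member: set() for member in crew}
    for voucher, subject in vouches:
        # Ignore vouches involving outsiders and attempts to vouch for oneself.
        if voucher in crew and subject in crew and voucher != subject:
            vouchers_for[subject].add(voucher)
    return {m for m, seen in vouchers_for.items() if len(seen) < required}


# Hypothetical three-person flight crew.
crew = {"captain", "first_officer", "crew_chief"}
vouches = [
    ("captain", "first_officer"),
    ("first_officer", "captain"),
    ("crew_chief", "crew_chief"),  # self-vouch, ignored
]
missing = unvouched_members(crew, vouches)
print("Check-in blocked for:", sorted(missing))
```

In this example the crew chief has only vouched for himself, so check-in would stay blocked until a colleague confirms his identity.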

The other redundant application involves the passenger identification side of security and the protection is as much against official (and internal) misuse of data as it is against coordinated digital and physical attacks.

Each workstation used for pre-departure passenger check-in has both document and portrait cameras. When travel papers are required (e.g., identification for transborder flights) the departure clerk requests the document to copy key information from it into an on-screen form and, to facilitate that, puts it face up in a pre-determined position. As soon as the first field is entered on screen, the computer grabs images of both the document and the passenger for transmission to the customs service in the destination country.

Both images are also stored in our database and matched to passenger information with database access cross-reported to audit services in the two data centers as a control on misuse of the information.

The ramp-up processes

The systems development plan is based on four phases:
  1. Initial ramp-up and application development is expected to take about eight months and run in parallel with regulatory and financial negotiations.
  2. The 10-airplane operation is expected to prove the SafetyJet marketing concept and run from six to nine months.
  3. Airline ramp-up to 100-airplane operation will take about a year but systems ramp-up to that level must precede the addition of airplanes and destinations by at least a month.
  4. Team projections on full operation are to be based on a five-year financial planning horizon from the start of ramp-up.
At the end of the phase-three ramp-up period (18 to 24 months from "go"), we want to have the following systems in place:
  1. An Office of the CIO to control and coordinate all IT-related activities across all three companies.
  2. About 65 IT staff split about equally between two fully redundant data centers. These will be co-located with the main operating offices for the Canadian and American airlines; i.e., most probably in Winnipeg and Minneapolis.
  3. Each data center will have a major transactions processor, around 10 terabytes of data storage, and the primary Solver, but whether the OLTP and Solver operations run on one big machine or on a mid-range machine coupled with a GRID-based Solver solution will not be decided until the hardware is actually needed.
  4. The data centers will be mutually redundant in real-time despite their separation distance. To ensure this, we will use Informix ODS with Tuxedo and route almost all transactions to both centers.
  5. Each local operating office will need 40 or so SunRay 17-inch workstations, eight to 10 printers, various scanners, radio-based local networks, eight to 10 of the 21-inch NCDs, and perhaps a dozen PDAs along with a V880 or comparable host.
  6. The distributed call center will grow to about 100 people and use 21-inch NCD smart displays with Sony video cameras in the homes of its staffers. These will have high-speed VPN connections to the local operating office host for boot and applications (OpenOffice.org) hosting purposes but open windows to their designated primary data center for reservations access.
  7. The major data centers will be connected via dedicated fiber with backup agreements in place for Internet backbone use in emergencies while all offices outside the two headquarters will be independently connected to both data centers through two separately contracted international data carriers with no common links on their routes to the centers.
  8. Almost all administrative desktops will be NC900 or similar smart displays with 21-inch screens although some, mainly in advertising, will be Macintosh G4 workstations. All passenger or crew processing stations will use 17-inch SunRay smart displays.
  9. Everyone in the company will have on-line access to Web pages (done with the statware QC E-server) showing real-time quality control charts on key indicators including average service times for the call centers, web sales, and departure desks, on-time performance, and revenue. Critical charts are displayed, live, 24 x 7, on 21-inch flat screens hanging in the entry areas of all executive offices.
  10. Full video conferencing gear -- to be used for all internal telephony and based on next generation SunForum with the real-time video streamer -- will be installed on all smart displays.
  11. Each operations control center will have redundant Ultra 60 or comparable workstations driving high intensity projectors that do nothing but graphically display the real-time GPS-reported position of all aircraft against a map overlay. If feasible, an on-demand facility will be added to allow display of real-time bus positions by metropolitan area.

(Part 2 of this story)

More Stories By Paul Murphy

Paul Murphy wrote and published 'The Unix Guide to Defenestration'. Murphy is a 20-year veteran of the IT consulting industry.


Virtual case study: How an airline can find efficiency with Unix

How can an airline fly today? By going way off scale with Unix. The story of SafetyJet International

(LinuxWorld) -- Editor's note: The following scenario is that of a consultant brought in early in the business planning cycle for a new airline. Unlike the other articles in this series, this example does not reflect the author's real-world consulting experience. Murphy worked for an airline, but nothing on this scale. SafetyJet is a wholly imaginary company constructed solely to illustrate three outstanding Unix characteristics:

  1. Massive scalability.
  2. Very high reliability.
  3. The ability to use maintenance-free smart displays.

Smart display explained
A typical smart display has a powerful graphics engine and runs Java/OS, Windows, or X clients, often concurrently.

The image shown is of a 21-inch NCD900. Companies like Sun, IBM, and Thinknic make others. Typical MTBF ratings are in the 300,000-hour range and there are neither moving parts nor user-accessible OS components to cause failures.

NCD smart display

In this case, the problems are real, the proposed solutions arguable, and the consulting assignment described is not based on real events.

A group of investors has reacted to recent upheaval in the airline industry by commissioning development of a business plan for a new airline. Their terms of reference to the design team, of whom I represent the IT end of things, are:

  1. The marketing message, and so the business design, will focus on safety, reliability, and ease of use.
  2. Their idea of starting small involves around ten aircraft and 1,000 people at start-up, but with very rapid growth to 100 airplanes and 10,000 employees serving every major center in the US and Canada.
  3. They expect the present state of the industry will let them wrest significant regulatory concessions from the authorities in the United States, Canada, and at various airports. In particular, they expect to sidestep existing regulation on fare structures, freight shipment, and passenger loading procedures at airports.
  4. They expect the regulatory authorities to be willing to bend as a side effect of current airline problems and public concern -- but recognize that some changes cannot easily be made. Cabotage rules (governing the movement of passengers within a country on international flights) are set by international treaty and can't be easily waived or changed. As a result, SafetyJet will legally be a resource company, SafetyJet International, that charters flights from two operating companies: Canadian SafetyJet International Charter Ltd. and American SafetyJet International Charter Inc.

On this basis, our team has agreed that:

  1. This airline will be an all-Boeing production with only 757 and 767 equipment.
  2. Preference will be given, everywhere in the company, to hiring servicemen and servicewomen. Military training and current Reserve or Guard certification will be mandatory for all drivers and senior officers. Crews will be assigned as units with, where possible, same shift returns to their home bases.
  3. The initial scope of operation will be defined, not by market surveys, but by the combination of airport agreement to proposed procedures and the availability of cheap, longer-term landing and takeoff slots as other airlines go under.
  4. SafetyJet will focus only on its own business with its customers. We will not forward checked luggage, make third party reservations, or pay commission agents.
  5. SafetyJet's aircraft will park and load in the open. Passengers will be responsible for luggage loaded onto, and taken off, aircraft loading dollies. Ground crew will externally release aircraft wheel brakes on pre-departure clearance.
  6. SafetyJet will provide transportation between destinations, not between airports. SafetyJet buses will shuttle most passengers between the aircraft and pickup points such as downtown hotels or the main passenger concourse as part of the ticket price.
  7. Where possible SafetyJet will fly into, and out of, secondary airports in major areas; using, for example, the airport at Hamilton instead of Toronto's Pearson airport as a way of avoiding congestion and related delays in or around the Toronto area.
  8. SafetyJet will be sold to the public as a fair pricing airline, not as a discount airline. Taxes will be reported as line items on ticket sales. Nominal fares will be set to yield an approximate 20 percent allocatable margin with a 33 percent revenue passenger load per flight, but actual fares will be adjusted, at time of departure, to reflect the actual load. For example, if a flight is expected to cost the airline about $22,500 inclusive, then the nominal pre-tax ticket cost would be $421.87 [= 1.2 x 22500/64, where 64 passengers is one third of 192 seats] but that price would be reduced to a pre-tax $183.67 when 147 people board the flight [= 1.2 x 22500/147].
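The fair-pricing rule in the last item above reduces to one formula: fare = (1 + margin) x flight cost / paying passengers. The sketch below works through the article's own numbers. It assumes the 33 percent load is read as one third of the 192 seats, or 64 passengers, since that is the reading that reproduces the $421.87 figure; the function and variable names are illustrative.

```python
def fare(flight_cost, passengers, margin=0.20):
    """Pre-tax fare that returns `margin` over cost when `passengers` pay it."""
    if passengers <= 0:
        raise ValueError("passengers must be positive")
    return (1 + margin) * flight_cost / passengers


SEATS = 192
FLIGHT_COST = 22_500
nominal_passengers = SEATS // 3  # a one-third load: 64 passengers

nominal = fare(FLIGHT_COST, nominal_passengers)
adjusted = fare(FLIGHT_COST, 147)

print(f"Nominal fare at a one-third load: ${nominal:.2f}")
print(f"Adjusted fare with 147 aboard:    ${adjusted:.2f}")
```

Because the fare is inversely proportional to the load, every additional passenger lowers the price for everyone on the flight while the airline's margin on the flight stays fixed.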

My job is to design and approximate the cost of an information architecture to support their vision.

Basic issues

The airline business, perhaps more than any other industry, is about large numbers. Some examples:

  1. It costs about $80 million to buy and configure one new Boeing 757-200 for mid-range operation with 192 revenue seats.
  2. Fully loaded, that 757-200 will cruise at 580 MPH while eating fuel at about 1.8 cents per passenger mile -- that's about one third the fuel cost of the average car moving at 60 MPH but adds up, for 192 passengers at about 9.7 miles per minute, to just over $2,000 per hour.
  3. A simple three-legged flight from Hamilton airport to Minneapolis, to Chicago, and back to Hamilton must comply with approximately 90,000 individual regulations enforced at levels ranging from international aviation treaties to local ordinances.
  4. An extremely efficient, mid-size, airline operation with highly motivated staff will typically require about 100 people per operating aircraft.
  5. Regulation and taxation often combine to produce bizarre results. The fuel tax rebate system, for example, combines with other federal, state and provincial taxes to make some routes markedly cheaper than others. It also produces arbitrage opportunities for airlines willing to treat fuel as paid-in freight for some cycles.
  6. Airport taxes, ticket surcharges, and related compliance costs represent, for profitable operation, the death of a thousand cuts. Depending on the specific airport and the jurisdictions it operates under, an airline can face 15 or more separate levies for each transit and up to 10 more that apply to each passenger carried. Some surcharges imposed by taxation or other authority are added as flat fees rather than being calculated as a percentage of ticket price. As a result, they disproportionally affect short haul routes where the customer might already be tempted to use alternate modes and so pull people off airlines and onto highways.
  7. Combined with sales taxes these add-on charges can often reach 45 percent of the ticket price on short haul flights and one third or more of the total on medium and longer-range services.

Death and taxes
In Canada, for example, the federal government has announced a $12 per ticket "travel safety improvement" tax that perfectly illustrates this problem.

There are several high-volume, limited-distance routes in Canada. The Edmonton to Calgary distance, for example, is about 180 miles downtown to downtown but only about 145 miles airport to airport. The local discount carrier's one-way ticket costs from $54 to about $95. Inclusive of parking and cab fares a typical day trip costs from $160 to about $220 before the new tax adds about 20 percent to the ticket price and 15 percent to the lowest net cost.

For most people, most of the time, the end-to-end trip takes about three hours -- but that same average person can, especially if he remembers the inevitable police speed traps around Red Deer, make that same end-to-end trip by car in about 3.5 hours. At a re-imbursement rate of 27 cents per kilometer and $20 for destination parking, driving costs about $177. This compares to $184 for the lowest net cost for air transportation after the new tax is added.

Since most people feel safer and more in control of their own schedules in their cars than in an airliner the effect of the new tax will be to push people onto the highway -- thereby penalizing the airline industry and raising the overall rate of death, injury, and property loss among people making the trip.
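Two of the figures above, the hourly fuel bill for a full 757-200 and the Edmonton to Calgary drive-versus-fly comparison, can be checked with a short calculation. This is a rough sketch under stated assumptions: the $12 tax is taken to apply to each of the two one-way tickets in a day trip, the mile-to-kilometer conversion is exact rather than rounded, and the function names are invented for the example.

```python
KM_PER_MILE = 1.609344


def fuel_cost_per_hour(passengers, speed_mph, cost_per_passenger_mile):
    """Hourly fuel cost for a full aircraft at cruise."""
    return passengers * speed_mph * cost_per_passenger_mile


def driving_cost(one_way_miles, rate_per_km, parking):
    """Round-trip driving cost at a per-kilometer reimbursement rate."""
    round_trip_km = 2 * one_way_miles * KM_PER_MILE
    return round_trip_km * rate_per_km + parking


def flying_cost(day_trip_cost, tax_per_ticket, tickets=2):
    """Day-trip cost by air once the per-ticket tax is added."""
    return day_trip_cost + tax_per_ticket * tickets


fuel = fuel_cost_per_hour(192, 580, 0.018)
drive = driving_cost(180, 0.27, 20)
fly = flying_cost(160, 12)

print(f"Fuel per hour: ${fuel:,.2f}")
print(f"Drive Edmonton-Calgary and back: ${drive:.2f}")
print(f"Cheapest day trip by air after tax: ${fly:.2f}")
print(f"Tax as a share of the lowest net cost: {(fly - 160) / 160:.0%}")
```

With the exact conversion the driving figure lands within a dollar of the $177 quoted, and the air figure matches the $184 quoted, so driving comes out cheaper once the tax is added.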

Airport authorities pose the biggest challenge to SafetyJet's business plan because they control access to their local markets and can invoke literally thousands of regulations to smother almost any change initiative.

Our plans will arouse their hostility because bussing passengers around airport delays reduces both their revenues and their control. This, I'm assured by the experts responsible for the project, will be the primary regulatory battleground on which the airline's potential for success or failure is going to hang.

Within this context the major IT challenges are:

  1. Managing the revenue cycle -- from customer interest to commitment, through ticket issuance, luggage tracking, and pricing adjustment, to making and tracking compliance payments.
  2. Resource scheduling -- everything it takes to initiate and complete a flight leg.
  3. System wide security -- including making sure that crew members are who they say they are, that all systems are clear, that traveler information is available to the authorities and connecting airlines, and that all financial and related information is fully and accurately reported.

None of this is very difficult when you have one or two small airplanes that zip back and forth on domestic routes of four hours or less. However, the resource-scheduling problem gets exponentially more complex as you scale up. By the time you get to eight airplanes, 45 daily flights, and 120 flight-crew, the problem has exceeded human capacity. At 100 airplanes, inefficiencies in the solutions used can add up to 5 percent or more of total operating costs -- more than the bottom line -- and it just gets worse as you get larger.
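A crude count shows why the problem outruns human planners so quickly. The sketch below is an illustration, not a scheduling model: it simply counts the ways each daily flight could be given to any airplane, ignoring timing, maintenance, and crew rules, all of which a real Solver must respect. Only the 8-airplane, 45-flight case comes from the text; the smaller cases are hypothetical.

```python
def naive_assignment_count(airplanes, daily_flights):
    """Number of ways to hand each flight to any one of the airplanes."""
    return airplanes ** daily_flights


# (airplanes, daily flights); only the last pair is taken from the text.
cases = [(1, 6), (2, 11), (8, 45)]

for airplanes, flights in cases:
    count = naive_assignment_count(airplanes, flights)
    print(f"{airplanes} airplanes, {flights} flights: "
          f"about 10^{len(str(count)) - 1} possible assignments")
```

Real constraints rule out most of these combinations, but the search space still grows as a power of the number of flights, which is why algorithmic improvements matter more than raw hardware.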

When airlines started to experience this complexity, in the 1920s, computers had not been invented and people simply did the best they could with manual means. By 1970, airlines had invested heavily in the use of computers for reservations, but computerization, outside of military logistics planning, had yet to make inroads into scheduling. By 1980 that had changed, with major airlines investing in Cray and other supercomputer gear to attack this problem. The operations research groups created within airlines to do this were not, of course, up to solving the entire problem and so concentrated on specific subsets where they hoped to have the greatest short-term impact on profitability. As a result, organizational structures and technical disciplines evolved around easily identifiable problem subsets. These are mainly maintenance scheduling, flight scheduling, crew rostering, and pairings (matching inbound and outbound routes to bring crews back to home bases).

Operations research explained
A NASA paper by John Usher of Mississippi State University provides a clear and easily understood problem description. The more mathematically inclined may want to check out informs.org or start with the MetaNeos project site at the Argonne National laboratories.

The classic text in the field of optimization is Harvey Wagner's Principles of Operations Research (Prentice Hall, New Jersey, 1969) although many people will find Claude McMillan's Mathematical Programming (Wiley, New York; 1970) rather easier to follow.

There has been tremendous progress in both the theory and practice behind the computation of actual solutions to various scheduling models. A problem run on a 300-MHz Sun UltraSPARC IIi takes about 83 hours to solve using one of the best available 1990 "Solvers" (CPLEX 1). The same problem with the same hardware now completes in less than 3 minutes using the latest CPLEX release. For larger problems susceptible to the best modern algorithms, the improvement on identical hardware over the decade is approximately 4,000 times.

Coupled with improvements in hardware, those algorithm gains make it possible to solve problems that were once considered unthinkably complex -- including the original integrated problem that, because it could not be solved, led to the segmented approach still deeply embedded in most airline organizations today.

Solutions

The most fundamental problems to be addressed by the technology solution are:

  1. Reliability
  2. Response speed
  3. Accuracy

    In this context, security is an aspect of reliability, and completeness an aspect of all three main requirements: reliability, speed, and accuracy.

Decisions

Initially, I expected to be able to break the IT components down into major sections each of which could then be dealt with separately through purchase and deployment of one or more commercial packages. Those beliefs, however, turned out to be extremely naive.

I had assumed, for example, that we could buy or license so called "revenue cycle" software and resource allocation software. That's the way most airlines do it but, on closer review, the two sets of problems turn out to be complementary and so logically part of one system.

The revenue cycle starts when a passenger makes a service request and ends when that service has been delivered and all consequent liabilities have been satisfied. The resource allocation process controls how that service is delivered. Basically this is just a very-large-scale ERP problem, but breaking these things up into the widely separated islands of automation needed to address them with 1970s solutions guarantees that attempts to fit them back together produce unnecessary inefficiencies.

What gradually dawned on me as we toured airline data centers and talked to both sellers and users of this software is the extent to which airline data processing traditions hold airline operators hostage. On the surface, the problems these guys deal with are huge and the solutions, experientially evolved over decades, hold together well enough. Look a bit deeper, however, and several things stand out:

  1. Data centers are often enormously expensive. Several data center operations we were toured through by hopeful vendors had staffs numbering into the thousands running hundreds of separate applications on warehouses full of fully configured mainframe gear.
  2. The defining application for the industry does not deal with reservations or resource allocation. The defining application is middleware. Everywhere we went, people pointed proudly to their MQ-Series or related implementations as switching data between half a dozen or more production MVS/XA applications and talked about using "connectivity glue" and "data marts" as if those related magically to the profitability of their airlines.
  3. The relative stability of airline data processing through bankruptcies, mergers, out-sourcing, and re-integration is striking. Many of the people we talked to had 25 years or more of experience in the industry and seemed to have done exactly the same things, with the same applications and gear, whether the data center they worked at was currently owned by an airline, an out-sourcer, or a bankruptcy trustee.
  4. Unix, although used extensively by those concerned with crew scheduling and related compute intensive work, is not widely accepted in the industry. Most of the data centers maintained a strong "them" and "us" relationship with the Unix users in scheduling and optimization.
  5. All of the available software embedded a 1950s-style view of the travel industry in which:

    • A knowledgeable travel agent mediates the relationship between airline and passenger.
    • Passengers generally have complicated, multi-modal itineraries.

    SafetyJet isn't in the business of fulfilling travel agent orders, nor do we need to route transcontinental passengers through a half dozen short hops. We work, instead, directly for the passenger and conceptualize the airline operation as nothing more than a fast link in a bus service between major downtown points of arrival and departure anywhere in Canada or the US. To SafetyJet, the fundamental service issue isn't fulfilling flight orders but getting passengers from where they are to where they want to go.

As a result, we eventually decided to recommend custom development instead of licensing in order to:

  1. Get a cheaper, easier to operate, fully integrated system.
  2. Embed a business model reflecting SafetyJet's operational plans.

That decision wasn't easy to make and will be even harder to defend if, or when, the bosses get the regulatory agreements they need to proceed and all of this suddenly acquires reality.

The key computing workloads to be addressed within the integrated database framework are:

  1. The on-line transactions processing systems, mainly for revenue cycle, maintenance, and operations (scheduling). The core OLTP problem is quite small. In full operation, SafetyJet will sell fewer than 100,000 seats per day and require something less than a million daily database transactions outside the Solver. Twenty years ago, that was a lot; today it's easily within range for a PC server.
  2. The security systems.
  3. The financial systems including tax compliance.
  4. The management information systems (including operational support systems).
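The claim that the core OLTP load is small is easy to check by converting the daily volume into a per-second rate. The sketch below uses the one-million-transactions figure from the first item above; the peak-to-average multiplier is a hypothetical planning assumption, not a number from the business plan.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PEAK_TO_AVERAGE = 10  # hypothetical allowance for busy periods


def average_tps(daily_transactions):
    """Average transactions per second over a 24-hour day."""
    return daily_transactions / SECONDS_PER_DAY


avg = average_tps(1_000_000)
peak = avg * PEAK_TO_AVERAGE
per_seat = 1_000_000 / 100_000  # transactions per seat sold, at the stated limits

print(f"Average load: {avg:.1f} transactions per second")
print(f"Assumed peak: {peak:.0f} transactions per second")
print(f"Transactions per seat sold: {per_seat:.0f}")
```

Even with a generous peak allowance, the rate stays in the low hundreds of transactions per second, which supports treating the Solver, not OLTP, as the sizing problem.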

There are, in addition, a range of smaller, limited-purpose systems that interact with the database framework at one remove -- i.e., with a filtration step. These include:

  1. EDI and related regulatory, competitor, and supplier links.
  2. Document management and training systems.
  3. Fuel management system.
  4. Audit services.
  5. The HR system.

The most basic of the ideas underlying the software development project is that the passenger is part of the optimization equation. Flying an airplane from one place to another, and all the organizational complexity it takes to deliver that flight, is a means to an end, not an end in itself. The airline's business is about moving people, not airplanes. In effect, the optimizer will direct minute-by-minute operations in an attempt to minimize cost while picking up, and delivering both people and freight according to a defined schedule.

In that context, our fundamental passenger story is:

  1. A passenger wants to be in some location at least until some specified date and time.
  2. He or she wants to be somewhere else at a later time and date.

The airline's job is to make that happen with a minimum of risk, effort, or complexity within the period set by the customer.

For example:

  1. A customer (or representative) checks the Web site (or phones the call center, or appears at a departure desk), chooses start/end points and a latest arrival time.
  2. The system offers one or more candidate itineraries including downtown pickup/drop off points and shows both nominal and average actual prices (before and after taxes).
  3. Customer chooses one, proffers a credit card (or contract number) for payment.
  4. Space is reserved on the buses and aircraft serving his route.
  5. Payment is cleared for the nominal total cost.
  6. Customer may be asked about luggage, food, allergy, load time, or security issues and appropriate resources allocated.
  7. Confirmations are issued and are printed, e-mailed, faxed, or held for pickup at a departure desk.
  8. Customer is offered links to third parties such as hotels and car rental agencies in the point, or points, of destination.
  9. Resources are cleared by service delivery or cancellation and re-assigned by customer-initiated change.
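The passenger story and the first two booking steps above suggest a simple data shape: a request carrying two places and a time window, and a filter that offers only the itineraries fitting inside that window. The sketch below is one possible representation; the class names, field names, and sample data are all hypothetical and are not drawn from any SafetyJet design document.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TravelRequest:
    origin: str                   # where the passenger is now
    destination: str              # where the passenger wants to be
    earliest_departure: datetime  # passenger stays put at least until this time
    latest_arrival: datetime      # passenger must have arrived by this time


@dataclass(frozen=True)
class Itinerary:
    pickup_point: str
    dropoff_point: str
    depart: datetime
    arrive: datetime
    nominal_price: float          # pre-tax nominal fare


def candidate_itineraries(request, itineraries):
    """Return itineraries that fit the customer's window, cheapest first."""
    fits = [
        it for it in itineraries
        if it.depart >= request.earliest_departure
        and it.arrive <= request.latest_arrival
    ]
    return sorted(fits, key=lambda it: it.nominal_price)


# Hypothetical example data.
request = TravelRequest(
    origin="Hamilton",
    destination="Minneapolis",
    earliest_departure=datetime(2002, 3, 4, 7, 0),
    latest_arrival=datetime(2002, 3, 4, 13, 0),
)
options = [
    Itinerary("Downtown hotel A", "Downtown hotel B",
              datetime(2002, 3, 4, 8, 0), datetime(2002, 3, 4, 11, 30), 421.87),
    Itinerary("Downtown hotel A", "Downtown hotel B",
              datetime(2002, 3, 4, 6, 0), datetime(2002, 3, 4, 9, 30), 390.00),
    Itinerary("Downtown hotel A", "Downtown hotel B",
              datetime(2002, 3, 4, 10, 0), datetime(2002, 3, 4, 13, 30), 400.00),
]
offers = candidate_itineraries(request, options)
print(f"{len(offers)} of {len(options)} itineraries fit the customer's window")
```

Keeping the request expressed as places and times, rather than as flight numbers, matches the idea that the airline sells arrival at a destination and leaves the choice of buses and aircraft to the optimizer.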

In a perfect world, this would be easy but, in the real world, complications can include:

  1. Passengers rarely want to get somewhere at 3:00 AM. Instead, they generally want to arrive during evenings or early mornings. That demand pattern imposes serious constraints on resource scheduling because the easiest way to meet it, having aircraft sit idle for a few hours between flights, is an airline's way of hemorrhaging cash. It costs about $25 per minute just to own the airplane. To make money, an airplane has to be in the air moving passengers.
  2. The cost of missing a 3-minute window
    The single most common cause of delay is another airline's inability to meet its landing or take-off slot commitments. In some cases that can be due to an airline using its control of specific airport traffic patterns to keep a competitor out of that airport. In most cases, delays are unavoidable side effects of "the system" at work.

    In theory, the Real-Time Airline Operating System (RTAOS) design can mitigate the overall impact of third-party delay.

    For example, a 25-minute take-off delay in a United flight from New Orleans to Denver can intersect a 3-minute Denver landing window for a SafetyJet flight arriving from Winnipeg -- potentially delaying landing by 15 minutes and causing two crews, and possibly 384 passengers, to wait.

    This has obvious direct costs to the airline for things like fuel and maintenance but may also have the indirect effect of requiring that the incoming crew take a four-hour rest period -- because, without it, they would be over-time on arrival in Winnipeg. That means a different crew has to be sent -- putting three crews off-schedule and two out of place.

    If alerted early enough, Operations can burn slightly more fuel to get that aircraft into United's nominal slot -- avoiding the problem and saving money for both airlines.

  3. To keep crews functioning as integrated teams, build relationships between cabin crew and frequent flyers, and ease flight crew recruiting and retention, we need to ensure that they, to the maximum extent possible, start and end their shifts in the same place. Scale makes this much easier. It is virtually impossible to achieve with few planes and relatively long routes but quite easy to do if the airplanes are interchangeable from a crew qualification perspective and the airline has a dense mix of short and medium to long range flights.
  4. Things will go wrong every day for weather, mechanical, or human reasons. When that happens passenger schedules are affected and airline costs usually rise but flight revenues do not.
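
To put the $25-per-minute ownership figure from the first complication in perspective, the arithmetic can be sketched as below. The three-hour idle period is an assumption chosen to stand in for "a few hours"; the function name is illustrative.

```python
OWNERSHIP_COST_PER_MINUTE = 25  # dollars; the "about $25 per minute" figure from the text


def idle_cost(idle_minutes: int, cost_per_minute: int = OWNERSHIP_COST_PER_MINUTE) -> int:
    """Ownership cost, in dollars, of an aircraft sitting on the ground."""
    return idle_minutes * cost_per_minute


# Assumed idle period: three hours between flights.
three_hours = idle_cost(3 * 60)
print(three_hours)  # 4500
```

That cost is incurred per aircraft for every idle gap, before counting any lost revenue.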

The obvious best solution, if technically feasible, to the cost trade-offs implicit in these processes is to integrate consideration of passenger (and freight) issues along with all other operating parameters in the dynamic programming model used for resource allocation.

If:

  1. The model produced results in near real-time;
  2. The airline was able to continuously track all air and ground vehicles;
  3. Good information on slot change by other airport users was quickly available to the operations center;
  4. Operations desk change orders had near real-time effect on the information provided at departure desks, on buses, and on-board the aircraft;
  5. The airline could generally expect good regulatory cooperation on minor flight plan change.

Then, operations should be able to maintain an overall schedule optimized in terms of passenger needs while giving up a minimum in cycle time, fuel, or other operational penalties.

Overall optimization for passengers does not, of course, mean individual optimization. The occasional passenger may find herself temporarily re-routed to Alaska or stranded in Saskatoon but the system would continually adjust itself to produce the best possible result for the majority of passengers the majority of the time.

As it happens, that's also the best possible result for the airline because minimizing passenger waiting times and ground travel distances is usually the same as keeping the fastest gear, i.e., the airplanes, busy earning money.

The critical design question is therefore clear: can the scheduling problem be formulated in such a way as to be sufficiently inclusive to generate usable results and yet solve in near real-time, particularly if we define the latter as "generally less than one minute"?

Formulating the scheduling problem requires considerable expertise and an immense amount of data -- both of which should be available. Since there is no compelling reason to believe that the problem cannot be properly formulated we'll assume that it can be and concentrate on options for solving it.

The actual problem size is difficult to predict, but inclusion of passenger concerns will add less complexity than might be expected because many passenger constraints are linearly dependent -- meaning that a full linear program might have 100 million rows and 200 million columns but the subset of interest will usually be at least an order of magnitude smaller on each dimension.

There are some givens in solving this. For example, the use of the Informix database with Tuxedo is a given in view of the reliability requirements for the transactions environment and the consequent need to keep the two data centers fully synchronized. Since this requirement also amounts to a Solaris specification for the primary transactions processing and database hosting jobs, the real architectural issue for the Solver lies between:

  1. Putting the scheduling problem, along with everything else that fits within the integrated database framework on a single machine in each data center.
  2. Putting the Solver on a Linux or Solaris GRID as a kind of co-processor to the main OLTP host.

It is clear that the single machine approach would work with today's computing gear for a ten-airplane operation -- what SafetyJet will be at start-up. The real question, however, is what happens eight months or a year later, when second-round financing enables explosive growth to the 100-airplane level. It would be suicidal to build the greatest piece of airline software ever, only to have to abandon it as unworkable if growth drives the problem size past hardware capabilities.

What I need to predict, therefore, is whether or not the problem can be run, with a target solution time of one minute or less, on hardware available about two years from now.

The best guide to that is, of course, performance today but I don't have good information on the distribution of solution times as a function of problem complexity on this scale. It is relatively easy to predict how long it will take, for any given set of hardware, to load the problem and to run pre-solve (collapsing redundant row and column information) preparatory to invoking barrier or other algorithms to produce the optimal solution. Beyond that, however, the actual time-to-solution depends far more on the applicability of the algorithms used to the specific data set attempted than on the hardware.
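
As a rough illustration of what "collapsing redundant row information" means, the sketch below drops equality constraints that are scalar multiples of a row already seen. It is a toy, not how any particular solver implements pre-solve: real pre-solve routines do far more, and the function names and sample data here are assumptions made for the example.

```python
from fractions import Fraction


def normalize(row, rhs):
    """Scale the constraint row . x = rhs so its first nonzero coefficient is 1."""
    pivot = next((c for c in row if c != 0), None)
    if pivot is None:  # all-zero row: nothing to scale
        return tuple(Fraction(0) for _ in row), Fraction(rhs)
    return tuple(Fraction(c) / pivot for c in row), Fraction(rhs) / pivot


def drop_redundant_rows(rows, rhs):
    """Keep one representative from each group of proportional equality rows."""
    seen = set()
    kept_rows, kept_rhs = [], []
    for row, b in zip(rows, rhs):
        key = normalize(row, b)
        if key not in seen:
            seen.add(key)
            kept_rows.append(row)
            kept_rhs.append(b)
    return kept_rows, kept_rhs


# Four equality constraints, two of which merely restate the others.
rows = [[1, 2, 0], [2, 4, 0], [0, 1, 1], [0, 3, 3]]
rhs = [4, 8, 5, 15]
small_rows, small_rhs = drop_redundant_rows(rows, rhs)
print(small_rows, small_rhs)  # [[1, 2, 0], [0, 1, 1]] [4, 5]
```

The sketch handles equality constraints only; inequalities would need care with sign when scaling by a negative pivot.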

The easy approach to scaling up is to add machines. There are several research projects aimed at making use of thousands of individual machines, and even an off-the-shelf product such as Sun's GRID engine can be used to push the problem out across a network of cooperating machines. Our requirement for near real-time answers means, however, that large-scale, Internet-based compute sharing isn't going to be viable because:

  1. We can't be assured of predictable compute times.
  2. We can't be assured about computation accuracy or system availability in the presence of incompetence or hostile attack.

For us, therefore, a distributed approach means a Linux or Solaris/Intel compute farm with racks of dedicated processors.

Although the actual compute time under either hardware scenario depends more on relationships in the data for a specific problem than on its size, we know that the primary predictor of relative efficiency between the two solutions is the number of iterations that require information from outside the local compute block to proceed. The more linkage has to be accounted for, the better the single-machine approach will look. Unfortunately, however, experience with smaller problems, on the order of a million rows, does not translate directly to problems with 40 million or more rows and so we won't know the answer to this until we run actual trials.
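
The shape of that trade-off can be shown with a toy cost model. Everything in it is an assumption made for illustration: the formula, the parameter names, and all the numbers are invented, and none of them are measurements of any real machine.

```python
def solve_time(iterations, nonlocal_fraction, compute_per_iter, exchange_cost):
    """Toy estimate: every iteration computes; a fraction also waits on remote data."""
    return iterations * (compute_per_iter + nonlocal_fraction * exchange_cost)


# Hypothetical parameters in arbitrary time units.
ITERATIONS = 1_000
farm = dict(compute_per_iter=1.0, exchange_cost=20.0)   # fast in aggregate, slow interconnect
single = dict(compute_per_iter=2.0, exchange_cost=1.0)  # slower compute, fast shared memory

for fraction in (0.0, 0.05, 0.25):
    t_farm = solve_time(ITERATIONS, fraction, **farm)
    t_single = solve_time(ITERATIONS, fraction, **single)
    print(f"non-local fraction {fraction:.2f}: farm {t_farm:.0f}, single machine {t_single:.0f}")
```

With these made-up numbers the compute farm wins when no iterations need outside data and loses badly once a quarter of them do. Where the crossover falls for the real problem is exactly what the trials would have to establish.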

A benchmark that may describe a best case uses a real-world fluid dynamics program. This has reasonable complexity and scale along with known high separability between components and should, therefore, mark the upper limit on the effectiveness of a distributed processing solution. The benchmark data posted at Fluent.com may, therefore, provide a useful guide for our decision.

Although the numbers as presented need some interpretation, the results show that, for the largest problem benchmarked:

  1. We can assign a 1.4-GHz machine running Windows 2000 a relative score of 1.
  2. We then see that a network of 128 dedicated Linux machines, each running a 1-GHz P3, gets a relative score of 34.
  3. A single Sun Starfire running 72 CPUs at 900-MHz gets a relative score of 42.

In the time since this particular test was run, two significant changes have been made to the Starfire:

  1. Updated on-board microcode has improved memory management, including cache consistency, across multiple boards.
  2. Sun released its new MediaLib -- and pushed those results backward into the standard compilers to make far better use of the SIMD/VIS instruction set for linear algebra and related matrix operations than ever before on SPARC systems. Early results, on SunBlade workstations, indicate up to a 40 percent improvement on computationally intensive tasks -- like CPLEX or the fluid dynamics model benchmarked.

Since both of these address critical components of our problem, we can reasonably speculate that a newly configured Starfire 15K with all 106 possible CPUs installed should score around 65. This is about what you'd get if you replaced each of the gigahertz P3 boxes in the IBM xSeries cluster with a dual-CPU Xeon box running at about 1.7 GHz with a gigabyte of RAM.
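
The relative scores quoted above can be reduced to per-processor figures, and the 106-CPU estimate compared with a naive linear extrapolation. This is only one way to read the numbers; the variable names are illustrative, and the article does not say how the figure of 65 was actually derived.

```python
cluster_score, cluster_machines = 34, 128  # Linux cluster, one 1-GHz P3 per machine
starfire_score, starfire_cpus = 42, 72     # Sun Starfire, 900-MHz CPUs

per_machine_cluster = cluster_score / cluster_machines  # about 0.27
per_cpu_starfire = starfire_score / starfire_cpus       # about 0.58

# Naive linear extrapolation to a fully populated 106-CPU machine.
linear_106 = starfire_score * 106 / starfire_cpus       # about 61.8

# Gain that the estimate of 65 implies on top of linear scaling.
implied_extra = 65 / linear_106 - 1                     # about 5 percent

print(f"{per_machine_cluster:.2f} {per_cpu_starfire:.2f} {linear_106:.1f} {implied_extra:.1%}")
```

On this reading, the estimate of 65 sits only about 5 percent above straight-line scaling, well short of the "up to 40 percent" best case cited for the new libraries.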

Large Model Fluid Dynamic Computation -- Relative Power Ratings

Windows 2000 on 1.4-GHz P4 = 1
IBM xSeries Myrinet Linux cluster, 128 machines each with 1-GHz P3 = 34
Sun 15K, 72 CPUs [Actual] = 42
Sun 15K, 106 CPUs [Estimated; includes effect of MediaLib code] = 65

Data from: http://www.fluent.com/software/fluent/fl5bench/fullres.htm on 10/12/01

1-TB QphH Benchmark -- Relative Power Ratings

Windows 2000 on 8-processor Proliant [Estimated] = 456
Sun 6800, 24 CPUs = 4,735
Sun 15K, 72 CPUs [Estimated] = 10,735

Data from: http://www.tpc.org/ on 10/12/01
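
The Proliant estimate in the QphH figures follows from the scaling assumption stated earlier, that workload per transaction grows about 3.3 times between the 300-GB and 1-TB tests. A short sketch of that arithmetic, with illustrative variable names, is given below. The Sun 15K estimate is not derived in the text and is not reproduced here.

```python
proliant_300gb_qphh = 1506  # reported 300-GB result for the eight-processor Proliant
workload_growth = 3.3       # estimated workload increase from 300 GB to 1 TB

proliant_1tb_qphh = proliant_300gb_qphh / workload_growth  # about 456

sun_6800_1tb_qphh = 4735    # reported 1-TB result for the 24-processor Sun 6800
speed_ratio = sun_6800_1tb_qphh / proliant_1tb_qphh        # about 10.4

# Data exchange rates quoted for the two machines, in GB per second.
bandwidth_ratio = 9.6 / 2.4                                # about 4

print(f"{proliant_1tb_qphh:.0f} {speed_ratio:.1f} {bandwidth_ratio:.1f}")
```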
