The IT industry is good at inventing terms, but not good at defining them. If definitions are provided, they're often informal and may even be incompatible with the definitions used in formal research. Attempts to produce dictionaries of IT terms often suffer from the difficulty of keeping up with the public relations spin doctors' ability to innovate. This article considers the different ways some vendors use an important IT term: "Workflow."
Uses and abuses of the workflow term abound and definitions vary among vendors, practitioners, and standards bodies. Generally, though, there are three kinds of workflow:
- Stateless workflow or micro-flow
- Stateful, short-running workflow or macro-flow
- Human workflow
Micro-flow is a key part of any integration offering; the other two are better thought of as applications. We won't confuse the issue by exploring the relationship between these kinds of workflow and what business management gurus refer to as a business process.
Stateless workflow has many aliases. IBM, in the product formerly known as MQSI, calls it message flow. Gartner calls it micro-flow. iWay calls it process flow, but a better name might be touchpoint flow. This kind of flow can be represented either by an XML document or by a boxes and lines diagram.
There are few general vocabularies for this kind of flow; it's not yet fashionable.
One example of such a workflow vocabulary is XML pipeline. Unfortunately, this hasn't caught on with vendors, even though it represents a completely standards-based approach to the problem of writing micro-flow.
The key thing about a boxes and lines diagram is that the whole flow occurs as a single unit of work; that is, every action (the boxes) has either happened or been rolled back when the flow has completed. The thread running the flow has to block and wait for actions to finish so the action is all synchronous (even with an asynchronous transport such as the IBM product formerly known as MQSeries).
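The single-unit-of-work behavior can be sketched in a few lines. This is an illustrative model, not any vendor's API: `run_flow` walks the boxes synchronously, and if any action fails, every completed action is undone before the error propagates, so the flow as a whole either happens or doesn't.

```python
class FlowError(Exception):
    """Raised by an action when it cannot complete."""
    pass

def run_flow(actions, document):
    """Run every action in order as one unit of work.
    The calling thread blocks on each action (synchronous execution);
    on failure, completed actions are rolled back in reverse order."""
    completed = []
    try:
        for action in actions:
            document = action.apply(document)  # blocks until the action finishes
            completed.append(action)
    except FlowError:
        for action in reversed(completed):     # undo everything already done
            action.undo(document)
        raise
    return document

class Uppercase:
    """A toy action: transform the document to upper case."""
    def apply(self, doc):
        return doc.upper()
    def undo(self, doc):
        pass  # nothing persistent to undo in this toy action

print(run_flow([Uppercase()], "order#42"))  # ORDER#42
```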
There are three important scenarios for micro-flow:
- Complex adapters
- Touchpoints
- Legacy conversations
An example of micro-flow would be a boxes and lines representation of an adapter for printing incoming messages formatted either as Extensible Stylesheet Language Formatting Objects (XSL-FO) or as Adobe's Portable Document Format (PDF).
Some of the boxes have different shapes to indicate that they do different things. Each box represents something the flow can execute. MQSI calls these nodes; iWay calls them agents or executables; others call them activities or tasks. Some of the boxes are for flow control, such as a switch agent. Other agents process the document associated with the flow, such as the transform agent, which converts XSL-FO or PDF to PCL or PostScript (PS) format. There can be several documents associated with the flow if the received message is a multi-part document. The flow described here is simplified: for the most part, it only shows the actions for when things go right. A more realistic example would include all the things that could go wrong (for instance, if the document is neither XSL-FO nor PDF).
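The switch-and-transform shape of this print adapter can be sketched as ordinary code. The function names and the format sniffing below are illustrative assumptions; real transform agents would do far more than prefix the payload.

```python
def detect_format(message):
    """Crude format sniff: PDF files start with '%PDF'; XSL-FO is XML."""
    if message.startswith(b"%PDF"):
        return "pdf"
    if message.lstrip().startswith(b"<"):
        return "xsl-fo"
    return "unknown"

def transform_pdf_to_pcl(message):
    return b"PCL:" + message  # stand-in for a real transform agent

def transform_fo_to_pcl(message):
    return b"PCL:" + message  # stand-in for a real transform agent

def print_adapter_flow(message):
    """The switch agent routes the document to the right transform agent;
    the 'things go wrong' branch raises instead of printing garbage."""
    fmt = detect_format(message)
    if fmt == "pdf":
        return transform_pdf_to_pcl(message)
    elif fmt == "xsl-fo":
        return transform_fo_to_pcl(message)
    else:
        raise ValueError("document is neither XSL-FO nor PDF")
```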
The role of a touchpoint is to convert a coarse-grained business document (such as an OAG business object document, or BOD) into a series of finer-grained interactions with an application system (such as SAP R/3). This is needed because the application system wasn't necessarily written to handle the BOD. Here's how the touchpoint can be written as a flow for Oracle applications:
1. The UpdateInventoryBalance document is received at the listener, which triggers the execution of the global document, UpdateInventoryBalance.
2. The inventory adjustment transformation enriches the document by adding SQL statements to look up additional information required by the open transaction interface but not available from the OAG document, UpdateInventoryBalance.
3. The enrichment is executed by running a request against the Oracle database.
4. The next inventory adjustment transformation maps the enriched document into the appropriate columns of the Oracle confirmation and interface tables.
5. The Oracle application adapter stores the specified data into the appropriate tables and logs the response to these table inserts. Oracle's concurrent managers maintain the update and post the component issue into its system. After processing, the transactions are either removed from the interface tables or updated and marked as an error.
6. The ConfirmBOD processing takes over at this point. In one possible implementation, it will monitor a confirmation table and wait for completed transactions. Once complete, it will emit an OAG ConfirmBOD document back to the manufacturing system.
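Stripped of the Oracle machinery, the six steps above reduce to: receive, enrich by lookup, map to interface-table columns, insert, confirm. Here is a minimal sketch; the dictionary "database", the table and column names, and the document fields are all illustrative assumptions, not the real Oracle open-interface schema.

```python
def touchpoint_flow(doc, db):
    """Illustrative touchpoint: one coarse-grained UpdateInventoryBalance
    document becomes an enrichment lookup, an interface-table insert,
    and an outgoing ConfirmBOD."""
    item = doc["item_id"]
    # Steps 2-3: enrich with data the OAG document doesn't carry
    # (stands in for the SQL lookup against the Oracle database).
    org_id = db["item_orgs"][item]
    # Step 4: map the enriched document onto interface-table columns.
    row = {"ITEM_ID": item, "ORG_ID": org_id, "QTY": doc["quantity"]}
    # Step 5: insert into the open interface table; in the real system,
    # Oracle's concurrent managers would pick this row up and post it.
    db["transactions_interface"].append(row)
    # Step 6: emit a ConfirmBOD back to the caller once the work is posted.
    return {"ConfirmBOD": {"item": item, "status": "POSTED"}}

db = {"item_orgs": {"WIDGET-1": 204}, "transactions_interface": []}
confirm = touchpoint_flow({"item_id": "WIDGET-1", "quantity": 10}, db)
```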
Flows are often needed when converting legacy applications to services. These applications were often written as a series of interactions with the user. Each of these interactions is a fine-grained transaction, and the set of transactions is called a 'pseudo-conversation' because it represents a conversation with the user. (There are two names for a set of transactions representing a business unit of work: CICS calls it a pseudo-conversation, and differentiates a conversation from a pseudo-conversation; IMS calls it a conversation.)
To make this into a business unit of work represented by a business document, it's necessary to call the set of transactions in the pseudo-conversation in the right order and with the right data. What if one of the later transactions fails after the previous ones have been committed? That was always a possibility when the business unit of work was being managed by a person rather than implemented as a flow, so one approach is to do what the human would have done, which is to cancel the business unit of work. This is similar to canceling a multi-step wizard on a PC.
Another approach, which can be used with CICS programs, is to invoke them using what IBM refers to as 'extended calls'. When called from, for example, an adapter, a CICS program normally commits before returning. By using extended calls, the logical unit of work is kept until a call is made that explicitly commits the set of calls or rolls them back. This alternate approach is shown in Figure 1. The main difference in this approach (as contrasted with the IMS approach) is that success is emitted first and then the transaction is committed. If both the messaging and CICS calls are transactional, then this ensures that the entire transaction is atomic.
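The extended-call idea can be modeled as a small unit-of-work object. To be clear, these are not real CICS APIs; the sketch only shows the contract: individual calls leave their updates pending, and the caller decides whether the whole set commits or backs out.

```python
class ExtendedUnitOfWork:
    """Sketch of 'extended calls': instead of each program committing on
    return, updates stay pending until the caller commits or backs out
    the whole logical unit of work."""
    def __init__(self):
        self.pending = []    # updates made but not yet permanent
        self.committed = []  # updates made permanent as a set

    def call(self, program, data):
        # A normal call would commit here; an extended call does not.
        self.pending.append((program, data))

    def commit(self):
        # Make every pending update permanent together.
        self.committed.extend(self.pending)
        self.pending.clear()

    def backout(self):
        # Discard every pending update together.
        self.pending.clear()

uow = ExtendedUnitOfWork()
uow.call("TXN1", "screen 1 data")
uow.call("TXN2", "screen 2 data")
uow.commit()  # both calls become permanent as one unit of work
```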
Micro-flow is a crucial part of what must occur to create transactional services out of existing legacy or application system capabilities. What about the other two kinds of flow? Macro-flow has also been given lots of other names such as 'business process workflow' or 'asynchronous workflow'. Unlike micro-flow, macro-flow isn't a single unit of work but rather a set of units of work. Each of these units of work involves an execution of one or more activities in a macro-flow instance, which then waits for the next message intended for it. Typically, when an activity completes, the state of the macro-flow instance (i.e., which activity should be executed next) must be stored somewhere so it can be executed when the next message for the macro-flow instance arrives. As there can be many instances of the same macro-flow running simultaneously, there must be some method of deciding which instance the next message is for. Such methods are called message correlation.
Figure 2 shows the general process of message correlation. The steps in message correlation are:
1. The message is read from the queue. This is a destructive read and starts a unit of work.
2. A worker thread is started that looks in the message for a correlation identifier to see if there already exists a macro-flow instance for this message.
3. The worker thread executes the next activity in the macro-flow.
4. At the end of the activity, the thread sends an output message. This is an update to the output queue and so is part of the unit of work, too. However, the message isn't physically sent until the unit of work commits.
5. Finally, the new state of the macro-flow is written to the process data store. This will contain such information as a correlation identifier, the next activity to execute, and business data that needs to be remembered. As this is an update, it's also part of the unit of work. If a different resource manager is used for the data store and the queue (usually the case), this implies that the unit of work must be two-phase to guarantee the integrity of the macro-flow instance.
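The correlation steps above can be sketched as a single message handler. Everything here is simplified and illustrative: the in-memory dictionary stands in for the process data store, and the hard-coded activity chain stands in for a real flow definition; a real engine would make the read, the send, and the state write one two-phase unit of work.

```python
import uuid

process_store = {}  # correlation id -> saved macro-flow state

def handle_message(message):
    """One unit of work in a toy macro-flow engine (steps 1-5, simplified)."""
    # Step 2: look for a correlation identifier and an existing instance.
    corr_id = message.get("correlation_id")
    if corr_id is None or corr_id not in process_store:
        # No instance yet: start a new macro-flow at its first activity.
        corr_id = corr_id or str(uuid.uuid4())
        state = {"next_activity": "check_credit", "data": {}}
    else:
        state = process_store[corr_id]
    # Step 3: execute the next activity (here, just record the message).
    activity = state["next_activity"]
    state["data"][activity] = message["body"]
    # Decide which activity runs when the next message arrives.
    state["next_activity"] = {"check_credit": "execute_order",
                              "execute_order": "book_trade"}.get(activity)
    # Step 5: persist the new state so the instance can sleep until then.
    process_store[corr_id] = state
    return corr_id
```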
There are two approaches to identifying messages for correlation. The first is to use a header with a correlation identifier. This is the approach used by Electronic Business XML (ebXML) in its Business Process Specification Schema (BPSS). The other approach is to correlate business data in the message with state in the process. This is the approach taken by Business Process Execution Language for Web Services (BPEL4WS). Both are XML standards for writing the execution specification for business processes. Standard identifiers are more often associated with business protocols (such as collaborations between trading partners in the case of BPSS), and business identifiers with multiparty interactions.
There are some important differences between micro-flow and macro-flow:
- Macro-flow needs a database in which to remember state but micro-flow doesn't. This has implications for scaling as all the servers running a macro-flow need to share that store for read and write.
- The macro-flow needs, in general and on completion of each activity, to do a two-phase commit between the queue manager and the process data manager to ensure that recovery is automatic in the event of failure. This has implications for latency as well as scaling of the log manager.
- Macro-flows cannot use the techniques for backing out changes in the middle of the flow that were used in micro-flows. Instead, new compensating transactions have to be written to remove prior steps when a subsequent step fails. These have to be built into the definition of the process, which is why they tend to be rather more complex than micro-flows.
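The compensating-transaction pattern can be illustrated with a pair of functions per step: one to do the work and one to undo its business effect. This is a minimal sketch of the idea (the stock-trading names are taken from the example that follows), not a full compensation framework.

```python
def run_macro_flow(steps):
    """Each step is a (do, compensate) pair. Completed steps have already
    committed, so if a later step fails we cannot roll them back; instead,
    we run their compensating transactions in reverse order."""
    done = []
    try:
        for do, compensate in steps:
            do()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        raise

log = []

def book_trade():
    raise RuntimeError("booking failed")  # simulate a later step failing

steps = [
    (lambda: log.append("execute"), lambda: log.append("reverse execute")),
    (book_trade, lambda: None),
]
try:
    run_macro_flow(steps)
except RuntimeError:
    pass
# log now holds ["execute", "reverse execute"]
```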
Figure 3 shows an example of a macro-flow, the process for trading stocks.
The stock trading process is an example of managing order flow between institutions that want to trade (such as asset managers or insurance companies) and a brokerage (such as Merrill Lynch). There are many parties to the process and multiple things happen simultaneously. The compensation scenarios for this are also extremely complex. Usually, to manage the complexity, the back office processing is separated from the front and middle office trading.
This process illustrates an important capability of a macro-flow: the ability to execute two or more activities simultaneously (in this case, the two checks for credit risk and desk limits). If both checks are passed, then the order executes. What isn't shown is what happens if the checks fail, if the order cannot be executed, or if the trade cannot be booked. Figure 4 shows what the process looks like when these are added in. This is still too simple. What if the 'reverse execute' action fails or takes multiple executions? However, it's easy to see that macro-flow is as much about what happens when things go wrong as it is about the expected flow. There's one compensating transaction here: the reverse execute. It's the key to being able to run this flow.
Another example of a real business process is the accounts receivable process for credit card processing, which is shown in Figure 4. This process, again, only shows the process when all goes right. To make this an executable process, it would have to be enormously enhanced with all the compensation steps necessary to address what happens when things go wrong.
The final kind of flow is one that involves human interaction. Consider the stock order described previously. The processing shown was for the so-called 'straight-through' case where nothing goes wrong. However, in some cases (typically expected to be less than 10 percent in a good trading process), things do have wrinkles. For instance, the party ordering the transaction may have failed a credit rating. The desk limits for credit risk may have been exceeded. The exchange may not be able to fill the order. For each of these cases, it may be necessary for a human to review the order. The new process, including the human interaction, is shown in Figure 5.
There's only one new activity here: the review order. If the credit risk or desk limit checks fail, a manager is asked to review the order. If the manager agrees to the order, the process continues with execution; otherwise, the order is rejected.
Another example is the way Dun & Bradstreet fulfills requests for company credit ratings. If the request is made using an electronic data interchange (EDI) message or a batch of data, then a process is run to match the information for the company to be rated to a DUNS number in the database. If the number doesn't exist, then a new one is allocated. If a match is found, then that number is used. Both these cases are straight through, but sometimes more than one number is a possible match. In these cases, the set of matches is sent to a person to 'browse and review'.
Now, this process, usually called a human interaction, can take a long time. Whereas a micro-flow might execute in a fraction of a second and a macro-flow might execute in a few seconds, browse and review can take weeks. Perhaps the person allocated to the task is away or indolent. This is a difference of three or four orders of magnitude in the time that the dirty data might be in the data store and therefore capable of introducing unintended consequences.
Compensating transactions are a reasonable approach for backing out flow steps if the risk of unintended consequences resulting from the existence of dirty data is relatively low. Compensating transactions are often reasonable for macro-flows because macro-flows run fast enough that the dirty data is unlikely to be used inappropriately. However, they're just as often unreasonable for steps taken before a human interaction. For example, if the interaction takes place a day later, it may no longer be possible to back out the data for the prior steps. They will already have been integrated into the organization by overnight processing.
This leads to another big difference between macro-flow and human workflow. Because a human workflow can last weeks, there'll be many more of them outstanding at any time. One company that used human workflow for changing the supplier of a utility to a customer averaged more than 100,000 flows at any time. This leads to special problems in releasing new versions of the flow (each of the 100,000 instances has to be converted to the new flow or the two flows must run in parallel) and in migrating from release to release of the flow product.
The final major difference between macro-flow and human workflow is that there's now a need to be able to 'call' humans. Usually, this requires an activity manager, which can store messages for people until they log on. This kind of application needs to be able to escalate actions within departments and reallocate actions. It also needs to be able to link to the applications that enable the person to complete the action.
John Schlesinger is director of integration solutions for iWay Software. He has worked on middleware development since 1985, both at IBM where he worked on the CICS development team, and at iWay Software, where he worked on EDA/SQL and is now responsible for the iWay Business Services Engine.