Monday, October 20, 2008

The Anatomy of a Service (Part II)

With my last post, we began the journey into the anatomy of a service by restating the definition of a service in the context of SOA. In this post, we're going to dig a bit deeper into the elements that make up a service. These elements are illustrated below.


Note that the illustration above doesn't imply any kind of layering. There are three types of service elements depicted. The yellow elements relate to the service domain. This is the logic, information model and state that drive the behaviour of the service relative to the responsibilities it holds in the overall architecture.

The green elements relate to the service boundary. The service contract describes the messages managed by the service and the endpoints through which the messages are exchanged.

The blue elements are service resources. Note that the service contains human resources. This is because service logic may in part be executed by people. This is explained in more detail here. This is a very important point. Most discussions around SOA tend to describe services only as IT constructs. A service consumer neither knows nor cares whether the service functionality is provided by people at a keyboard or by software running on a server.

In addition to human resources, a service will also very likely contain software and hardware infrastructure. The hardware infrastructure refers to the physical hardware on which the service operates - e.g. servers, SANs and communication infrastructure.

The software infrastructure refers to software that supports the operation of the service, but doesn't implement the logic defined within the service domain. So for example, this software would include operating systems, application platforms and communication software.

The distinction between software infrastructure and service domain logic is an important one. Software infrastructure can be reused between services, whereas logic related to the service domain should not be. Domain logic should be implemented and deployed only once in one service. Why? Because otherwise we have low cohesion - a single concern addressed by multiple services - and that leads to coupling between services.

So why can we reuse software infrastructure between services? This is because such software is generic. It doesn't pertain to any specific service domain. Take UI rendering logic for instance. The logic required for rendering a window doesn't relate to any specific domain addressed by any service in your business.

Other examples of software infrastructure include rules engines and workflow systems. However the rules and workflows themselves would constitute service domain logic.

An interesting example is a CRM package. A CRM package comes with quite a lot of what would seem to be domain logic. For example, a CRM may be deployed in support of the Sales function of a business without a great deal of customisation. Here, a large number of the native application features will directly contribute to the service domain.

A CRM package does not however need to deal only with customers. Most CRM packages can be highly customised to hold custom entities with complex relationships. Custom logic can be added to the CRM package to implement specific business rules around these entities. In this case the CRM package is being leveraged as an application platform, and as such is not implementing the domain logic of the service.

Okay, last but not least, we need to discuss the service information model and state. Pretty much every service will have state. That state conforms to the service information model, and the service domain logic executes against that information model. To that end, the service information model, state and domain logic are all inextricably linked.

Note that any state leveraged by the software infrastructure is considered part of the infrastructure itself and not part of the service state in the above illustration.

Just like service domain logic, the service state and information model should not be shared between services as it introduces coupling. Services should share data only by way of message exchange, although this is not always possible when transforming an enterprise full of legacy applications to an SOA - at least initially; but more on that in a future post.

So everything in blue in the above illustration is reusable between services. The same people can participate in many services, and the same software and hardware infrastructure is reusable between services. Everything else should not be directly reused as it introduces coupling and reduces cohesion. Of course, domain logic within one service can be reused by other services through exchanging messages with that service.

Okay well that sums up the basic elements of a service. Stay tuned for my next post in this series!

Thursday, October 16, 2008

The Anatomy of a Service (Part I)

It occurred to me that to date I've been heavily focussed on defining SOA, techniques for defining service boundaries, contracts and responsibilities, and the various flavours of SOA that we see in the wild, without giving much attention to what goes on inside the service boundary. So I thought it appropriate to begin a series of blog posts on the anatomy of a service.

So let us begin by restating the definition of a service. A service (in the context of SOA) is an autonomous, coarse-grained unit of logic that exposes functionality by way of exchanging messages conforming to its service contract with service consumers via its endpoints.

The service contract describes the syntax (not semantics) of messages exchanged via each service endpoint, as well as the means by which messages are carried between each endpoint and service consumers. Each endpoint is located and uniquely identified by its address.
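
To put that in concrete terms for .NET readers, a service contract along these lines might be sketched in WCF roughly as follows. This is just an illustrative sketch - the type and member names (OrderPlacedEvent, IOrderEvents) are invented, and each endpoint's address and binding would be configured separately:

    // An illustrative WCF-style service contract. The contract
    // describes the syntax of the messages exchanged; it says nothing
    // about the provider's implementation.
    using System.Runtime.Serialization;
    using System.ServiceModel;

    [DataContract]
    public class OrderPlacedEvent
    {
        [DataMember] public string OrderId { get; set; }
        [DataMember] public decimal Total { get; set; }
    }

    [ServiceContract]
    public interface IOrderEvents
    {
        // A one-way message exchange: the contract specifies the
        // message, not the behaviour behind it.
        [OperationContract(IsOneWay = true)]
        void OrderPlaced(OrderPlacedEvent message);
    }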

A service provider may also consume other services, and a service consumer may in fact also be a service provider. As such, the terms service provider and service consumer describe roles in a specific interaction. The rules governing communication with a service are described by its policy.

What do we mean when we say a service is autonomous? Well we mean a few things actually. Firstly, services are in control of their own state. Services are not instantiated by their consumers. A service exists as a single autonomous entity.

Secondly, we mean that services may be under different ownership domains. Those parties responsible for the service and its management may very likely be different to those responsible for other services. Services may be versioned and deployed independently of each other.

And thirdly, we mean that services should not depend upon the availability of other services in order to function without failure - even if only in some limited capacity.

I'd also like to point out that there is a view that a service contract should not be devoid of semantic definition - the thinking being that a service contract should also describe the semantics of the messages exchanged by the service, as well as state the expected behaviour of the service.

Okay so now we've defined a service, what does a service look like? Well that's the beauty of SOA - because service implementations are encapsulated within their service boundaries, consumers don't (or shouldn't anyway) have any visibility or knowledge of the service provider's implementation.

But that doesn't really help us much as architects who need to design these things. That's the reason for this series of blog posts. We as architects need to understand the different ways to go about designing services - not just their boundaries and responsibilities, but their implementations as well.

Over the next few posts I intend to go through various different flavours of service anatomy in order to bring some clarity to the various options that exist for implementing services. So stay tuned!

Wednesday, October 1, 2008

SOA, EDA and MPI.NET

I had another good question from Miguel, this time referring to my post on SOA and EDA where he asked whether distributed systems built using MPI.NET conform to the SOA or EDA style of architecture.

MPI.NET is a platform for building a single distributed application where different parts of the application run on different nodes within a cluster of machines. Often, the same program runs on different nodes, but takes on different roles on each node. The role each program instance takes on each node can be determined by a unique rank assigned by the MPI environment to each MPI process.

Messages can then be passed between different processes to coordinate their activities. Often the program detects if it is rank zero and if so takes on the "root" responsibility, handing data and control messages to the other nodes, coordinating their activities and aggregating their results.
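
A minimal sketch of this rank zero pattern might look something like the following. I'm assuming the MPI.NET API here (Environment, Communicator.world and the generic Send/Receive methods) - treat it as illustrative rather than production code:

    // Sketch of the rank zero "root" pattern: the same program runs on
    // every node, and each process decides its role from its rank.
    using System;
    using MPI;

    class RankedProgram
    {
        static void Main(string[] args)
        {
            // MPI.NET assigns each process a unique rank within the
            // world communicator.
            using (new MPI.Environment(ref args))
            {
                Intracommunicator comm = Communicator.world;

                if (comm.Rank == 0)
                {
                    // The root hands work out to the other nodes...
                    for (int dest = 1; dest < comm.Size; dest++)
                        comm.Send("work item " + dest, dest, 0);

                    // ...then aggregates their results.
                    for (int source = 1; source < comm.Size; source++)
                        Console.WriteLine(comm.Receive<string>(source, 0));
                }
                else
                {
                    // Every other rank acts as a worker.
                    string work = comm.Receive<string>(0, 0);
                    comm.Send("result for (" + work + ")", 0, 0);
                }
            }
        }
    }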

The message exchange patterns offered by MPI.NET are point-to-point, all-to-one (gathering data), one-to-all (broadcast) and all-to-all. These patterns differ from the message exchange pattern inherent to EDA: publish-subscribe.

With publish-subscribe, systems subscribe to specific topics or topic specifications. Messages published onto a given topic are routed to all subscribers that have subscribed to a matching topic specification. Alternatively, subscriptions may be defined based on message content-based rules.

With MPI.NET, there are no subscriptions as such. Messages may be broadcast to every node, and each node then decides whether the message is relevant. This is a subtle difference, and perhaps one might argue it represents a highly simplified form of EDA.

With regard to SOA, it is extremely bad practice for services to share the same business rules or data representation, as it couples services too tightly. The fundamental principle of MPI.NET is that the same logic can/will run on multiple nodes. The purpose is quite different from that of SOA.

With MPI.NET, you are distributing execution of the same logic across multiple servers for the purpose of increasing performance. With SOA each service is responsible for its own logic, encapsulated behind its service boundary.

Furthermore, the number of nodes allocated in an MPI.NET application is dynamic by design. With SOA, each service has specific rules and responsibilities. The number of services cannot be dynamically increased or decreased.

Certainly within a service the number of nodes allocated to handle messages could be dynamically adjusted based on load. However, these nodes are not directly addressable. They are not visible to the consumers of the service.

Another difference is that MPI.NET nodes do not have the concept of endpoints. One cannot deliver messages to an MPI.NET node across a variety of different bindings via different endpoints.

MPI.NET should be seen as a way of implementing a single service internally across multiple machines for the purpose of improving performance.

I would say therefore that MPI.NET is not a platform for implementing SOA. It could be considered as supporting the EDA style of architecture, but only in a very simplistic sense.

Sunday, September 28, 2008

Business Agility

Business agility (along with business-IT alignment) is often touted as one of the key benefits of SOA. The problem is that more often than not no explanation is given for what business agility actually is, why it is important, or how SOA contributes to achieving it.

Firstly, it is important to note that business agility is a relative goal as opposed to an absolute one. Even the most agile business can strive for greater agility.

So what is business agility? Business agility is the degree to which an organisation can effectively innovate and respond to market forces.

In any given undertaking within an organisation, there are the traditional project management constraints of time, resource, scope and quality. For a fixed scope, if we wish to decrease time to delivery, we must increase resource or decrease quality. For fixed resource, if we wish to decrease time to delivery, we must decrease scope or quality.

Note that scope refers to the amount of work that needs to be done to deliver the agreed outcome. But as always, there is more than one way to skin a cat. The business objectives targeted by the project can be met in any number of different ways. A talented Solution Architect can design a solution to a business problem that requires less work to deliver.

Really what we are talking about here is contrasting business value with effort. If an organisation has a highly complex IT architecture that is very tightly coupled with a large number of interdependencies, then a change will take more effort to achieve, but deliver the same business value.

In order to have an agile organisation, we must have the ability to enact change with less effort. SOA helps simplify the IT architecture of an organisation by making systems more loosely coupled. Of course, there are varying degrees of loose coupling with different SOA implementations.

The SOA architectural style reduces coupling through message based interactions conforming to explicitly defined service contracts that encapsulate the implementation details of services away from consumers. This gives us the ability to make changes to service implementations without impacting their consumers.

We reduce coupling further by designing our services with appropriate granularity and cohesion, based on publish-subscribe messaging with stable service contracts.

The IT architecture of an organisation however is only one piece of the business agility puzzle. Once a business need has been identified, the business requirements must be extracted. The speed, efficiency and accuracy with which this activity is performed also contribute to business agility. There is very little business value in delivering a solution that doesn't meet the identified business objectives.

Furthermore once we have extracted our business requirements, assuming they are accurate, we need to produce a solution architecture that meets those requirements. So an organisation's proficiency in Solution Architecture is also significant here.

We also want to ensure that as each solution is delivered, the complexity of the business and IT architecture is not adversely affected. As such, we need the proper architecture governance processes in place to protect the broader enterprise architecture.

It is important to strike the right balance with governance however. Inefficient governance processes slow the rate at which projects can be delivered, thus harming business agility. Insufficient governance however will result in the gradual increase of architectural complexity, thus harming business agility in the long run.

Simply developing an effective solution architecture is still not enough. We must of course then go about implementing the solution. Here, the effectiveness of the organisation's project and programme management function is relevant, as well as the proficiency of the people that build and deploy the solution.

And finally once the solution is ready to deploy, there must be effective change management processes in place to ensure that affected workers are properly trained and informed to support the change. If the solution is not embraced by the people it affects, then the change will be ineffective.

So why all the fuss about business agility? Simply put, an organisation that doesn't respond to its changing environment will eventually become uncompetitive. Granted, some industries are more volatile than others; however, all business environments change over time. Those businesses that are able to embrace this change in order to generate competitive advantage will be more successful.

Innovation is also a key consideration in generating competitive advantage. The ability to take an idea and turn it into reality in less time, with less cost and lower risk certainly generates competitive advantage.

So in conclusion, SOA as a style of architecture contributes towards business agility through reduction of enterprise architectural complexity, but by no means guarantees it.

Friday, September 26, 2008

SOA and EDA

So far I've posted a large amount of material on SOA, pushing very heavily for an event driven approach - with specific attention to business services, where business-relevant events are surfaced as event messages published over a service bus.

There has been an ongoing discussion in the public forum around the relationship between SOA and EDA (Event Driven Architecture). Are they in fact separate architectural styles? Are they separate concerns? Do they complement each other? Does SOA subsume EDA?

SOA and EDA are in fact separate architectural styles. However, they describe different concerns of an architecture. They each bring their own benefits. As such, it is possible (and in fact good practice) for the two styles to overlap. An architecture can indeed be both service oriented and event driven. Likewise it is possible for an architecture to be service oriented but not event driven, or event driven and not service oriented. This is illustrated below.


An architecture consisting of services where no endpoint has a publish-subscribe binding conforms to the service oriented style of architecture, but is not event driven. An architecture consisting of messages published over topics, but with no notion of services, is event driven but not service oriented. For example, EAI can be achieved by having a number of applications publishing and subscribing over a set of topics, without explicitly defining any services, endpoints or service contracts.

Where SOA and EDA come together, a topic corresponds to a specific endpoint of a specific service. A topic cannot be shared between endpoints or services, and messages published onto a topic must conform to an explicitly defined service contract. SOA brings benefits to EDA and vice versa. Therefore we get the best result when the two architectural styles are combined.
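
To illustrate the combination, here's a rough sketch using a hypothetical service bus abstraction - IServiceBus, its methods, the topic name and the event type are all invented for this example. The topic corresponds to exactly one endpoint of the Sales service, and every message published to it conforms to the Sales service contract:

    // Sketch of SOA and EDA combined: one topic per service endpoint,
    // published messages conforming to the service contract.
    using System;

    public interface IServiceBus
    {
        void Publish<T>(string topic, T message);
        void Subscribe<T>(string topic, Action<T> handler);
    }

    public class PolicySoldEvent
    {
        public string PolicyNumber { get; set; }
        public decimal Premium { get; set; }
    }

    public class SalesService
    {
        private readonly IServiceBus bus;

        public SalesService(IServiceBus bus) { this.bus = bus; }

        public void CompleteSale(string policyNumber, decimal premium)
        {
            // Event driven: announce what happened, prescribing
            // nothing about what subscribers (if any) do with it.
            bus.Publish("Sales.PolicySold", new PolicySoldEvent
            {
                PolicyNumber = policyNumber,
                Premium = premium
            });
        }
    }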

Wednesday, September 17, 2008

Service Composition (continued...)

In my last post we talked about service composition in SOA. Miguel posted a good question about whether composition in SOA is analogous to composition in Component Oriented Programming. This really got me thinking about whether composition actually has any real meaning in SOA at all.

One of the key differences between components in Component Oriented Programming and services in SOA is that components must be instantiated. A client must instantiate a component before the component is used. Furthermore, multiple instances of a component can and usually do exist at the same time.

Composition makes a lot of sense in Component Oriented Programming because we have one or more child components supporting a parent component. Every new instance of the parent component contains new instances of the child components.

Without instancing, composition wouldn't really make a great deal of sense in Component Oriented Programming. Just because a component uses or references another component doesn't make those components a composite. What makes a set of components a composite is that the lifetime of the child components is inextricably tied to that of the parent component. The children are created with the parent and destroyed when the parent is destroyed. The children have no purpose other than to meet the needs of the parent.

SOA is a different ballgame however. There is one and only one instance of every service. All services, for all intents and purposes, are always running (except for the odd moments of downtime) until those services are retired. So what is a composite in SOA terms? If I have a service A which consumes services B and C, is A a composite of B and C? I would say no. Otherwise all the services in our enterprise would simply form one big composite.

I would suggest that composition in SOA refers to the bringing together of lower level services in order to support one or more higher level services. It seems we need to have services at different levels in order for composition to make sense in SOA. So if services B and C are lower level services that form part of the implementation of service A, then I would say that services B and C have been composed to form service A.

As I mentioned in my last post, with all the hype around Layered Service Models, where lower level services (like task and entity services) are composed to form higher level process services, we see service composition taking centre stage in SOA discussions far more than it really should.

As I've previously discussed, the type of reuse central to Layered Service Models (functional reuse as opposed to reuse of events) doesn't really work very well in practice. Functions exposed at the service level generally are too coarsely grained to be reused in different contexts. As such, the notion of having a registry of lower level services that can be composed together in support of higher level services is somewhat flawed.

We may have lower level services such as integration services, UI services or B2B services that support a higher level business service, but those lower level services should not be reused between the higher level services. As such, in this case the service composition is really just an implementation detail of the higher level services and as such not really architecturally noteworthy.

So I say again, service composition in SOA is only really a noteworthy concept if you're pursuing a Layered Service Model which as I've said before is not ideal.

Friday, September 12, 2008

Service Composition

Quite often we hear of service composition as one of the key benefits of the service oriented style of architecture. With service composition, we are referring to the creation of new services by wiring up existing services in new ways to deliver new value.

According to the standard reading material, service composition is best achieved through orchestration. This is usually achieved through the use of middleware such as an ESB or integration broker. A workflow orchestrates the invocations of a number of services in order to achieve some particular outcome.

As such, we find that a service may comprise some of its own logic (including the orchestration logic), as well as a number of lower level services.
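
As a rough illustration, here's what such an orchestration might look like if expressed as plain code rather than in a middleware workflow designer. The lower level service proxy interfaces are hypothetical names invented for this sketch:

    // Illustrative orchestration logic coordinating lower level
    // services (in practice this would typically live in an ESB or
    // workflow engine).
    using System;

    public interface ICreditService { bool Check(string customerId, decimal total); }
    public interface IInventoryService { void Reserve(string orderId); }
    public interface IShippingService { void Schedule(string orderId); }

    public class PlaceOrderOrchestration
    {
        private readonly ICreditService credit;
        private readonly IInventoryService inventory;
        private readonly IShippingService shipping;

        public PlaceOrderOrchestration(ICreditService credit,
            IInventoryService inventory, IShippingService shipping)
        {
            this.credit = credit;
            this.inventory = inventory;
            this.shipping = shipping;
        }

        public void Execute(string orderId, string customerId, decimal total)
        {
            // The orchestration invokes a number of services in
            // sequence to achieve a particular outcome.
            if (!credit.Check(customerId, total))
                throw new InvalidOperationException("Credit check failed.");

            inventory.Reserve(orderId);
            shipping.Schedule(orderId);
        }
    }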

This concept of service composition is central to the Layered Service Model where lower level task and entity services are composed into higher level process services. As you will have gleaned from my previous posts on the topic, I'm not a supporter of the Layered Service Model approach. One of the issues of this approach is that lower level services are reused by many higher level services, resulting in high coupling between those higher level services.

In the past, I have highly recommended that people pursue what I call a Self-Contained Process-Centric Service Model. Here, services are centred around autonomous business functions such as Sales, Marketing and Billing. What I probably didn't emphasise about this approach however is that it refers only to the top-level service model. Each business service may be composed of lower level services.

Note that these lower level services are not business services. They are services that serve a particular function in support of the business service. They are an implementation detail of the business service. In fact, they are really just distributed components or integration points. However, if the means of communication with the distributed component or integration point is the exchange of messages via endpoints in conformance with a service contract, then we technically have a service by the strict definition of SOA.

In a greenfields implementation where the software supporting each business service is built from scratch, there may be no lower level services involved. That is, there is no service composition. Alternatively, the service software may be implemented as a smart client application that interacts with some back end components via the exchange of messages in conformance with a service contract. Thus, those back end components technically expose a service to the smart client application.

Say in support of this business service, the back end components also interact with a Web service provided by another organisation. Now we have another service added to the mix. Here we can see services being composed in support of the larger business service.

Where a business service is supported by a number of legacy applications we see even more composition. When the business service receives a message, it must invoke one or more of these legacy applications appropriately. This may be achieved by invoking Web services exposed by these applications, thus constituting further service composition.

So we see that services certainly can be composed in support of larger services. However I would hesitate to name this as a benefit of the SOA style of architecture. It is merely an implementation choice of any given service. Service composition has only come to be viewed by many as a key benefit of SOA due to the reuse promised (falsely in my opinion) by the Layered Service Model approach.

Friday, September 5, 2008

Federated Identity Session Slides

Thank you to all those who attended the .NET Community of Practice session on Thursday evening. For those who missed it, you can download the presentation slides here.

Tuesday, August 19, 2008

Federated Identity Management in a Service Oriented World

Join me as I present to the Perth .NET Community of Practice on Federated Identity Management in a Service Oriented World. The session synopsis is below:

Gone are the good old days of siloed applications that identify users with a simple username/password combination stored in the application database. In today’s world of Internet based e-commerce where secure transactions occur over insecure open networks and in a service oriented world of composite applications where identity must be shared between systems hosted by different organisations on disparate platforms; in a world where increasing numbers of businesses are turning to hosting their applications in the cloud, and where users from partner organisations need to be securely granted access to enterprise resources, architects are turning to an increasingly complex array of security solutions to solve their identity woes.

How do we as mere mortals make sense of PKI, Kerberos, SAML, and a plethora of WS-* standards aimed at addressing these concerns? This session will provide a clear and practical description of how to apply today’s security technologies in order to effectively manage and share identity across applications, service and organisational boundaries.


Details below:

DATE: Thursday, September 4, 5:30pm
VENUE: Excom, Level 2, 23 Barrack Street, Perth
COST: Free. All Welcome

SOA and Platform Independence

Quite often I hear platform independence touted not only as a key benefit of SOA, but as a defining characteristic. Let me begin by saying this is simply not the case. Platform independence in the context of SOA has two connotations:

1. Services can be hosted on any platform (Windows, Linux, .NET, Java, etc)
2. Services are interoperable regardless of the platforms on which they are hosted

Firstly, let us examine what a service is from a technical standpoint. A service is an autonomous, coarse-grained unit of logic with which external parties communicate by way of exchanging explicitly defined messages via its endpoints. The messages and endpoints are described by the service contract. Consumers must conform to the policies stipulated by a service provider in order to consume the service.

Based on the above definition, any platform that can host a process that can exchange messages over a network is capable of hosting a service. This is not a miracle of SOA, it is simply the miracle of distributed computing which was around long before the emergence of SOA.

The second thing to note about the above definition is that it does not mandate any specific transport or encoding of messages. Messages do not have to be encoded in XML. They do not have to be transported over HTTP. Service contracts do not have to be specified in WSDL. Services do not have to be natively interoperable.

Now it is true that Web service technologies greatly improve service interoperability between platforms. But services in our SOA do not have to be Web services. Even Web services by today's definition do not mandate that messages are transported over HTTP. It is quite acceptable for a Web service to be exposed over a JMS or MSMQ transport. JMS and MSMQ are not natively interoperable.
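
As a sketch of this point using WCF, the same contract can be exposed over both HTTP and MSMQ simply by adding endpoints with different bindings - the transport is an endpoint concern, not part of the service definition. The contract and service names here are illustrative, and a real MSMQ endpoint would also need an appropriately configured queue:

    // One contract, two transports; consumers pick an endpoint.
    using System;
    using System.ServiceModel;

    [ServiceContract]
    public interface IOrderService
    {
        [OperationContract(IsOneWay = true)]
        void SubmitOrder(string orderId);
    }

    public class OrderService : IOrderService
    {
        public void SubmitOrder(string orderId) { /* domain logic here */ }
    }

    class HostProgram
    {
        static void Main()
        {
            using (var host = new ServiceHost(typeof(OrderService)))
            {
                host.AddServiceEndpoint(typeof(IOrderService),
                    new BasicHttpBinding(), "http://localhost:8080/orders");
                host.AddServiceEndpoint(typeof(IOrderService),
                    new NetMsmqBinding(), "net.msmq://localhost/private/orders");

                host.Open();
                Console.ReadLine();
            }
        }
    }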

It is also true that an intermediary such as an ESB can provide protocol translation which can make services that exchange messages with incompatible transports and encodings interoperable. But again, an ESB is not a prerequisite for SOA.

I guess the commonly held view that SOA and platform independence go hand in hand has emerged from the association between SOA and Web services. But once again just to set the record straight, SOA does not mandate platform independence.

SOA does however mandate that implementation details of a service are encapsulated behind the service contract. As such, SOA certainly makes platform independence easier to achieve. We just need to overcome any incompatibilities in message transport and encoding.

Sunday, August 17, 2008

Business-IT Alignment

We hear the term (or should I call it buzz phrase?), business-IT alignment thrown around a lot these days - especially in SOA and Enterprise Architecture circles. In fact business-IT alignment is commonly named as one of the key benefits of SOA (along with business agility, which is something I'll discuss in more detail in a future post).

So what is business-IT alignment, and why is it important? What is all the fuss about? Simply put, business-IT alignment is the extent to which IT investments are made in accordance with business strategy. In the other direction, it is also the extent to which new technologies can be harnessed in order to gain competitive advantage. Furthermore, it is also the extent to which the enterprise IT architecture aligns with and supports the enterprise business architecture.

A model for business-IT alignment is illustrated below.



The elements of this model are:

Business Strategy

Johnson and Scholes in their book Exploring Corporate Strategy define strategy as "the direction and scope of an organisation over the long-term: which achieves advantage for the organisation through its configuration of resources within a challenging environment, to meet the needs of markets and to fulfil stakeholder expectations."

Expanding on this, a business strategy identifies:

  • The long term strategic objectives of the business
  • The markets in which the business should compete and the business functions required to do so
  • The ways in which those business functions can be performed in such a way to gain competitive advantage
  • The resources (people, skills, IP, technology, finance, etc) necessary in order to support the business functions and compete effectively
  • The external/environmental factors that influence the business's ability to compete effectively
  • The concerns, objectives and expectations of the various stakeholders (both internal and external) that have the power to influence the success of the business

IT Strategy

As with business strategy, IT strategy involves identifying the long term view of the IT function of the business. It identifies the direction and scope of the IT function over the long term in order to achieve advantage for the organisation through its configuration of IT resources within a challenging environment to meet the needs of the business and to fulfil stakeholder expectations.

Business Architecture

Business Architecture is the process of defining the business functions, processes, capabilities, services, roles and reporting relationships that make up a business. Here, we are referring to Business Architecture applied at the enterprise level, rather than the solution level.

IT Architecture

IT Architecture is often expressed as Information Architecture, Application Architecture and Technology Architecture. I'll describe these architecture domains in more detail in future posts. In short, IT Architecture is the process of organising IT and information assets to support the business architecture and IT strategy.



Businesses that have strong business-IT alignment tend to have the following in common:

  • IT investments can be directly linked to specific strategic business objectives.
  • The business drives all major IT initiatives in conjunction and cooperation with IT.
  • The business has an explicitly defined IT strategy which is directly linked to the business strategy.
  • IT is generally seen as an investment rather than an expense.
  • IT has direct representation in the executive leadership team and as such is present during business planning and strategy sessions.

Most businesses today leverage IT only as a support mechanism. IT is viewed as a cost centre - a means to an end, rather than a strategy enabler. This is evidenced by the fact that many organisations do not have IT representation in the executive leadership team.

We see this commonly where the CFO belongs to the executive leadership team, and the CIO reports to the CFO. In some organisations, the IT function is buried even further down in the hierarchy. Business strategy is formulated without input from or support of IT. IT is then involved as late as possible in the process only as a means of executing business strategy, rather than being involved in its formulation.

As a result of this ethos, IT has little to no visibility of the business strategy. That is, IT has no visibility of the long term business plan or objectives. Consequently, IT is limited to adding value reactively rather than proactively. The IT function is directed to deliver systems with specific tactical scope, which although being part of a broader strategic vision doesn't give IT any visibility of that vision.

This results in the delivery of IT solutions that meet all the tactical requirements, but do not necessarily align with the long term strategic objectives of the organisation.

Furthermore, many organisations do not engage in Enterprise Business Architecture practices. As such, there is no enterprise-wide functional view of the business. Usually, there is only a structural view in the form of an organisation chart. We usually find various business processes that have been documented with varying levels of detail, scope and accuracy, but no view as to how those processes fit into the bigger picture or how those processes support the business strategy.

As such, IT does not have the necessary information in order to formulate an effective enterprise IT architecture. This only further forces IT down the road of acting tactically rather than strategically. If IT has no view of the enterprise business architecture, then it cannot structure its IT assets optimally so that IT systems have high cohesion and loose coupling.

A result of delivering IT systems with only a tactical focus over an extended period of time is an enterprise IT architecture that is an unstructured mess with duplication of functionality (often manifested as many applications performing similar functions across the enterprise), suboptimal distribution of data, data locked in application silos, large numbers of complex and brittle interdependencies between systems, and poor performance and reliability of business critical IT systems.

Consequently we see an ever increasing percentage of IT budget being spent on maintaining the status quo, rather than being spent on investing in new capabilities that deliver new value to the enterprise. Clearly this position is not sustainable in a competitive market.

One of the key ingredients in achieving business-IT alignment is an Enterprise Architecture program. Businesses are already seeing significant returns from this investment, and time is running out for businesses that have not yet commenced engaging in this practice.

Wednesday, August 6, 2008

How Many Services?

One of the questions I get asked rather frequently in my travels is how many services are appropriate in any given architecture? Well this is one of those unfortunate situations where the answer is, it depends. However there are some good rules of thumb that can lead you to determine whether you're on the right track in terms of defining your service model.

In the field, the number of services we see in any given enterprise tends to vary wildly. In some organisations there is only one service - the überservice. Other organisations have stated they have well over 10,000 services. But how good are these architectures? As I've previously stated just because an architecture is service oriented doesn't make it good.

We've discussed überservices before and they are clearly not a good thing. So we definitely want more than one service. But 10,000? How on earth did they achieve that figure? Well at the end of the day the number of services you'll end up with is largely determined by the granularity of your services. Fine grained services will result in more services than coarse grained services. We've discussed service granularity before too.

Ultimately the number of services you end up with will largely be determined by the flavour of SOA you pursue. If you take the JBOWS approach (not recommended), then you could end up with any number of services. You're building services in a completely uncontrolled way that doesn't conform to any particular architecture. Usually the longer you follow this path, the more services you'll end up with. Hopefully you stop before you reach 10,000.

If you go down the road of Service Oriented Integration then you'll end up with the same number of services as you have applications you are trying to integrate. If you have 400 applications in your enterprise, you'll end up with 400 services. Again, I wouldn't recommend this approach.

Layered service models have to take the prize in this competition. Due to the incredibly fine granularity of services in this model you can literally end up with thousands of services! This is yet another reason why I wouldn't recommend this approach.

And finally we have self-contained process-centric service models. With this approach services are centred around process-centric cohesive business capabilities. The number of services you will end up with will depend on the overall complexity of your business model. But about 10-20 services is a good rule of thumb. Any more than 30 or 40 and I'd be sounding the alarm.

Another thing to note is that the size of the organisation (in terms of staffing levels) is not a major factor in determining the appropriate number of services. The number of services will grow as the complexity of the business model grows. Keep in mind though that sometimes the complexity of the business must grow in order to cater for increased staffing levels.

Tuesday, August 5, 2008

Consumer Driven Contracts

A while back a reader sent me an email asking about consumer driven contracts and how they fit into my stance on SOA. I'd intended to share my thoughts on the approach with the rest of my readers but alas it slipped my mind. Better late than never though, so I thought I'd take the opportunity to discuss the concept now.

Consumer driven contracts as an approach is intended to provide inputs into the service provider's contract design to ensure relevance, as well as provide constraints on how a service provider's contract can evolve over time such that compatibility is maintained with consumers.

However the intention should always be that the service provider contract evolves in such a way as to better express the concepts of the provider. We don't want to reduce the semantic fidelity of communications with our services.

I would say that as long as you're sticking mainly to publish-subscribe messaging (that is, avoiding command messages), then the needs of the consumer are considerably less relevant. Consumers are only informed of events as they happen. Command messages produce a subtle form of coupling where the service consumer instructs the service provider what to do.

Services that publish events do not prescribe or assume what subscribed services will do (if anything) once a notification message is received. An event message describes only an event that occurred within a service. The subscribers should really have no influence on this. As such, they should not be influencing the schema of the event messages published by the service provider.

So when using publish-subscribe messaging, I would suggest that consumer contracts are useful only in providing constraints on what can be changed in the service provider's contract in order to maintain compatibility with its consumers.
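
One way such a constraint can be made concrete is as an executable test the provider runs before publishing a contract change. The following sketch assumes an NUnit-style test; the event name, field names and sample document are all invented for illustration:

    // A consumer contract made executable: the Billing service's
    // expectations of the Sales event schema, expressed as a test the
    // Sales team can run before changing the contract.
    using System.Xml.Linq;
    using NUnit.Framework;

    [TestFixture]
    public class BillingExpectationsOfSalesContract
    {
        [Test]
        public void PolicySoldEventStillCarriesTheFieldsBillingUses()
        {
            // In practice this sample message would be generated from
            // the provider's current schema.
            var message = XElement.Parse(
                @"<PolicySold>
                    <PolicyNumber>P-1001</PolicyNumber>
                    <Premium>250.00</Premium>
                  </PolicySold>");

            // Billing asserts only the parts it actually consumes; the
            // provider remains free to evolve everything else.
            Assert.IsNotNull(message.Element("PolicyNumber"));
            Assert.IsNotNull(message.Element("Premium"));
        }
    }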

That being said, service contracts tend to be quite stable with process-centric self-contained service models - so I've not personally found consumer driven contracts worth the effort.

Consumer driven contracts really come into their own with data-centric contracts and centralised data architectures. When we always need to go and get our data from elsewhere, and all our service consumers use that data for different purposes, consumers have a substantial dependency on the service provider's schema. They will likely need to influence the provider's schema quite regularly. That in turn will affect the other consumers.

This is yet another reason to pursue process-centric service contracts with decentralised data architectures. It keeps your service contracts stable by virtue of minimising dependencies between service providers and consumers. You can read more about service contract stability here.

Monday, August 4, 2008

The Service Oriented Car Wash

I recently had the pleasure of discovering that The Open Group have an SOA Working Group, the mission of which is "to develop and foster common understanding of SOA in order to facilitate alignment between the business and information technology communities." This is a very significant step in coming to a consensus on the definition of SOA in both the business and IT domains.

The group is putting together an SOA ontology, which will hopefully go a long way towards creating a standard definition for the main elements of SOA and their relationships to each other. The draft ontology in its current form makes reference to an example business scenario based on an imaginary car washing business. The scenario is outlined below.

"Joe has a one-man business. He stands on a street corner with a sponge, a bucket of water, and a sign saying "Car Wash $5". A customer drives up to him and asks him to wash the car. Joe asks the customer for five dollars. The customer gives him five dollars. Joe washes the car, then says, "That's all done now," and the customer drives away."

The Open Group at present states that this example identifies a single service, which embodies a single repeatable business activity performed by Joe - washing the car. The customer is identified as the service consumer. The customer "consumes" the service when he or she pays the five dollars for the car to be washed.

There are however some issues with this model. Firstly, the service identified in the model consists only of a single business activity. Services modelled this way are far too finely grained, thus resulting in low cohesion and high coupling between services. A business service represents an entire cohesive business function, which may contain many processes and activities.

Secondly, the service model completely omits the Sales function. Although we are looking at a one-man business, there are two business functions here (each of which translates to a business service). There are two different processes at play here – Sell Car Wash, and Wash Car. The Sell Car Wash process involves two roles – the Customer and the Salesperson. The Wash Car process involves only one role – the Car Washer. Joe just happens to hold two roles – the Salesperson and Car Washer roles.

As the model described by The Open Group consists of only one process, it is unclear whether their model follows an event driven paradigm. Modelling the business architecture around business services and events provides for a far more loosely coupled and maintainable business architecture than other approaches.

The example business scenario calls for a single end-to-end business process spanning two business services. This is best represented as two separately defined business processes connected by a single business event, which we might call Sale Completed. This is the end event of the Sell Car Wash process and the start event of the Wash Car process. This is illustrated in the EPC diagram below.


Modelling our services this way makes our Car Wash service a consumer of our Sales service. The customer is merely a participant in the Sales business service. He or she is not a service consumer.

Now although we could indeed model our business architecture as outlined by The Open Group, to do so would produce a Business Architecture with business processes spanning multiple business areas, thus having low cohesion. This would produce a more unstable business architecture description and provide a poorer footing for our IT architecture.

Thursday, July 31, 2008

Value Chain Analysis

Recently we discussed how business services are realised in terms of business process, people, information, data, applications, and technology. A business service can be seen as a cohesive business capability that does not share process, data or business rules with any other capability.

The interface (or service contract) between business services is expressed in terms of business events. Business services are not dependent on how other business services are fulfilled, only the business events they raise. Consequently, the business architecture description is very stable. We can update process implementations without impacting any other processes so long as the high level business events are unaffected.

Our technical service components are then aligned with our business services and business events, extending this stability and loose coupling to our IT architecture.

So business capabilities are the key ingredient in determining our service model. But how do we go about identifying business capabilities? Do we simply think about what a business does at a high level and decompose? Or is there a better way?

A while back Nick Malik wrote about the relationship between capability modelling and process modelling. He posed the question as to whether these two approaches conflict, or whether they are in fact complementary. If you've been reading my recent posts on the matter you would probably guess that I feel the two approaches are very much complementary.

Nick suggests that there are two main approaches one may take in identifying business capabilities. One is process centric where capabilities are defined around process areas within an organisation. The other suggested approach is to align business capabilities with organisational structure. In this approach, capabilities mirror the structure of a business (such as divisions, departments, business units and teams), rather than its processes.

As you've probably gleaned from my previous posts on business architecture modelling, I very much favour the process centric approach. In fact, the premise of modelling business services around capabilities relies on the capabilities being process-centric. As I mentioned above, the interface between business services is expressed in terms of business events. Business events are raised by business processes.

If a single business process were to feature in multiple capabilities (as would be the case if we modelled our capabilities around organisational structure rather than process areas), then multiple capabilities would expose the same business events. Moreover, updating a single business process would potentially involve updating the implementation of multiple business capabilities.

So when using business capabilities as a means of identifying candidate business services we must take a process-centric approach. How then do we produce a process-centric capability model of our enterprise? The answer lies with Value Chain Analysis.

Value Chain Analysis involves identifying the top level process areas (or functions) of an organisation and mapping them as a value chain. The value chain illustrates the series of top-level processes an organisation uses to take inputs from the market, transform those inputs, and then deliver value added products and/or services back to the market at a profit. These processes are synergistic in nature. The total cost of these processes is less than the value added by the organisation, thus justifying the profit margin.

A value chain is modelled in terms of primary and support activities. Primary activities are those that feature in the value chain itself. They are the processes responsible for engaging with the market and directly transforming inputs into outputs. Support activities support the primary activities.

The value chain for a typical insurance organisation is illustrated below.


There are seven primary and seven support activities in the above example. Each activity contains a number of processes that support it. Note that not all processes within a primary activity necessarily contribute directly to the value chain. A primary activity need only contain at least one process that does.

An organisation may also have multiple value chains. Multiple value chains may arise where the organisation serves more than one market. Different markets may be served with fundamentally different processes within the organisation.

For example if we were to take two companies supplying completely different products to completely different markets and a merger were to occur, then we would end up with one organisation with two value chains. Support activities in one value chain may be primary activities in another.

The value chain depicts an end-to-end process that executes left to right. In the above example, Marketing sits at the beginning of the value chain because the business must determine what products it is going to offer before doing anything else. After determining the product mix and pricing strategy, Risk Modelling must occur to determine the means by which premium will be calculated for each product.

Armed with this information, actual pricing (rate charts) can be determined for each product. Note that this occurs in the Marketing activity. So we have transitioned back from Risk Modelling to Marketing. Feedback loops are always going to occur within value chains. The point is that the first process that had to occur (determining the product mix) lay within the Marketing activity, and thus Marketing sits before Risk Modelling in the value chain.

Now the business is in a position to start selling its insurance products to customers. This is the Sales activity. The Sales activity involves quotations, proposals, risk assessment and commission calculation. Commissions are paid to all parties involved in the distribution channel.

Once a policy has been sold, it must be written. This is a Policy Administration activity. Once the policy has been written, the customer must then be billed. Once the premium has been paid by the customer, the customer may at some point register a claim. Note that this activity is optional. A customer may never register a claim. The Customer Service function is then responsible for serving the needs of the customer until his or her policy expires.

A value chain gives us a functional model of our organisation. That is, it models the functions an organisation performs without consideration for how they are performed. This is what gives the model its stability.

Later on as we drill down into the lower level processes supporting each activity in the value chain, we start getting into the implementation details. We start modelling the exact sequence of actions performed and the roles that are responsible for those actions.

It is through defining these lower level processes that we in fact are able to define the roles within the organisation and the responsibilities and KPIs for each role. Armed with this, we are then in a position to determine the organisational structure - that is, the reporting relationships between the defined roles.

This is illustrated below.


As lower level processes change within an organisation, so may the roles and responsibilities and by extension the organisational structure. This is further evidence of why we should model capabilities around business functions rather than organisational structure. It provides a considerably more stable model.

This activity of modelling the value chain has forced us to consider the entire business in a holistic and process centric way. As such it is an excellent tool for identifying our top level process centric business capabilities that we can then use to define our business service model.

Saturday, July 19, 2008

How Did You Get Started in Software Development?

Rhys Campbell tagged me recently asking how I got started with developing software. Although I don't personally write software these days, I've certainly done my share of it in the past and I currently manage Change Corporation's body of knowledge in the area.

So let's get started:

How old were you when you started programming?

I can't really remember - it was so long ago. But I think I was about 16 years old. I started writing "demos", which were effectively programs that did various visual effects (like flying through tunnels and what have you). The demos were entered into demo competitions. Very nerdy really.

I also wrote a music player application that played music stored in the S3M format. If anyone can remember, that was the Scream Tracker file format. The player was written in 80386 assembler and natively managed Expanded Memory (if anyone can remember that from the days of the 286 computer), hardware interrupts, and direct memory access (DMA). The player did real time digital mixing.

What was your first programming language?

I first started writing programs in C, but soon moved into 80386 assembler, and C++. Eventually I got into Visual Objects, Visual Basic, Java and then C#. I've also dabbled in Smalltalk and Eiffel.

What was the first real program you wrote?

I suppose this question depends on the definition of "real". The first commercial application I wrote was a consignment management system, I think for a freight company. It was so long ago I can't really remember - but it was written in Visual Objects.

If you knew then what you know now would you have started programming?

Well I'm not really programming at all these days, so I guess if I knew then what I know now then I'd be doing Enterprise Architecture. Although it would be hard to get businesses to take me seriously in that capacity as a 16 year old. If I knew I would eventually end up doing Enterprise Architecture, then yes I would indeed have started programming as it eventually got me to where I wanted to be.

What’s the most fun you’ve ever had... programming?

Hard to say. There's always been good times and bad. I can't say that I've ever found commercial programming fun exactly. I've always found it rewarding and enjoyable. I would say that the demo programs I wrote many years ago were fun to write. I wasn't being paid for it, so they must have been. :-)

Sunday, July 13, 2008

Implementing Business Services

In my last post we discussed business capabilities and how they form the basis of how we identify candidate business services. But what is the physical manifestation of a business service? Once we've determined the boundaries of our business services and their responsibilities, how do we go about implementing them?

A business service is realised as:

  • a set of business processes and business rules
  • the people that perform those processes
  • the applications leveraged by those people in support of their roles in those business processes
  • the information leveraged by decision makers within those business processes, as well as the information forming the inputs and outputs of those processes
  • the semantics and structure of data held within the service
  • the technology platforms on which the IT systems supporting the business service execute

Business processes in different services coordinate their activities via business events. These business events define the interface (or service contracts) between business services.

A business service may contain no IT systems whatsoever. Where there are IT systems supporting a business service, they may not support the interface with the rest of the business at all. A business event may be realised by a simple telephone call or a memo sent between people participating in different business services.

Where IT systems do support the interface with the rest of the business, we have software components in different business services exchanging data. In the SOA style of architecture, this data exchange is performed by way of exchanging explicitly defined messages in conformance with a service contract.

Where we design our software components around business services (rather than taking the JBOWS, Service Oriented Integration or Layered Service Model approach), business events are realised as event messages published onto our service bus. The technical service contract then consists of the schema of the messages published by the service onto one or more topics.
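
To make this concrete, here is a minimal C# sketch of what such an event message and its publication mechanism might look like. Both the OrderPlacedEvent class and the IServiceBus interface are hypothetical illustrations rather than any particular product's API; strictly speaking, the technical contract is the message schema (e.g. an XSD), and a class is just a convenient way of expressing it.

using System;

// A hypothetical event message contract. The schema of this message
// forms part of the service's technical contract.
public class OrderPlacedEvent
{
    public Guid OrderId { get; set; }
    public string CustomerNumber { get; set; }
    public decimal TotalAmount { get; set; }
    public DateTime PlacedAtUtc { get; set; }
}

// A hypothetical bus abstraction - any messaging product could sit
// behind this interface.
public interface IServiceBus
{
    // Publishes the event message onto a topic for subscribed services.
    void Publish<TEvent>(TEvent businessEvent);
}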

Inside the boundary of our business service, we may have one or more software components that exchange data via messages. For example, a smart client application supporting a business service may interface with one or more back end components using SOAP based messages.

These interactions that occur behind the service boundary are an implementation detail of the service itself. These design aspects are local to the business service and as such do not influence the business service contract or play a part in the broader architecture.

In conclusion, the way in which business services are manifested in terms of IT systems depends on the flavour of SOA chosen. By aligning our technical service contracts with business services and business events we achieve considerably better structural alignment between the business and the IT systems that support it.

Tuesday, July 8, 2008

Business Capabilities

In my last post, I mentioned that we can achieve very stable service contracts by aligning our service boundaries with business capabilities. But this raises the question: what is a business capability? In short, a business capability is something that an organisation does that contributes in some way towards the overall function performed by the organisation.

The advantage of business capabilities is their remarkable level of stability. If we take a typical insurance organisation, it will likely have sales, marketing, policy administration, claims management, risk assessment, billing, payments, customer service, human resource management, rate management, document management, channel management, commissions management, compliance, IT support and human task management capabilities. In fact, any insurance organisation will very likely have many of these capabilities.

This means that any organisation operating within the same vertical industry will have a remarkably similar capability map. This is a result of the fact that what these organisations do and the services they offer are fundamentally the same. What differs from one organisation to the next is how these capabilities are implemented.

That is, the business processes, applications, data, technologies and human resources (including their roles, skills and knowledge) that support each capability will differ from one organisation to the next; but the nature of the capabilities themselves, the value they deliver, and their core responsibilities will not vary.

This means that whether a capability is implemented as a series of completely manual processes executed by employees, whether the capability is completely or partially outsourced, or whether the capability is fully automated with IT systems does not change the nature of the capability itself. Other capabilities within the organisation are concerned only with the fact that the capability is performed, not how it is performed.

As such we can change the implementation of a business capability without affecting any other capabilities supporting the organisation. This makes for a considerably more stable business architecture description. One problem commonly faced by business architects is that the organisation continues to change and adapt during the process of mapping out the business processes.

This is in part due to the fact that process maps in a typical business architecture tend to span multiple business capabilities. This significantly increases the number of process maps that contain implementation details of a single business capability. Consequently when we change the implementation of a business capability, there is a larger number of process maps that need to be updated.

By limiting the scope of our process maps to a single business capability, we substantially reduce the number of process maps affected by a change in the implementation of a capability. As a result, our business architecture description is more stable and by extension more manageable.

Business capabilities are hierarchical in nature. We can decompose a business capability into smaller capabilities that support the broader capability. We only decompose a capability into smaller capabilities where there is sufficient distinctiveness between those sub-capabilities to warrant separate and distinct process maps to describe those sub-capabilities.

Of course there are business processes that span multiple business capabilities. However, those processes are realised as event-driven process chains. Each business process that executes within a business capability has the ability to raise business-relevant events. Business processes executing within other capabilities subscribe to these events in order to trigger their own execution.

As such, end-to-end business processes are realised implicitly by virtue of the event publications and subscriptions between processes defined within each business capability. Note that this notion of business event publication and subscription is not a technology concern. It is simply a stable means of describing how business processes execute within an enterprise.

Now, this is a really nice way of modelling a business architecture. But what is the relevance to SOA? Well, when we apply the SOA style of architecture to the business architecture domain we are concerned with business services. A business service is modelled around a business capability that has an appropriate level of coupling and cohesion.

If we choose a business capability that addresses too many unrelated concerns, then we will have low cohesion. If we choose a business capability that is too finely grained, we will have tight coupling between services as related concerns will be distributed across multiple business services.

So here we see that business capability mapping is an extremely valuable tool for deriving the business service model for an organisation. It is important to note that this entire discussion falls within the business architecture domain. The technical components that support a business service fall into the domain of the application and technology architecture.

We find that the business service model is an incredibly valuable tool for achieving alignment between the business and IT domains. This is because this single model has equal relevance in both domains.

Monday, July 7, 2008

Service Contract Stability

SOA as a style of architecture reduces coupling between services by mandating that services have no knowledge of each other's implementations. Services communicate only by way of exchanging messages that conform to service contracts. As such, service consumers are dependent only on a service provider's service contract, not its implementation.

However if we do not take particular care in designing our service boundaries and responsibilities, and by extension our service contracts, then we run the risk that those contracts may themselves be highly dependent on a service's implementation. One such example of this is white box services.

Consequently, we want to align our service contracts with concepts that are very stable. With a JBOWS approach, services are exposed somewhat arbitrarily. They do not contribute towards any defined broader architecture. Therefore service contracts will have an arbitrary level of stability.

Next on the list is Service Oriented Integration. With this approach, service contracts are centred around applications. They are expressed in terms of the application with which we are integrating. Consequently if the scope of the application changes, or the application itself is exchanged for another (such as replacing Onyx CRM with Microsoft CRM), the service contract is very likely to change.

So what about layered service models? In this case, we have atomic business tasks, business processes and data stores abstracted as services. These concepts are not at all stable. Businesses very often make changes in business processes that in turn require changes in how atomic business activities are defined and how data is represented. With this approach we are very likely to find ourselves changing our service contracts as business processes are updated.

But what about business capabilities? Business capabilities are by their very nature incredibly stable. Although a retail organisation may make regular changes as to how it goes about inventory management, the fact is that it will always have an inventory management capability. Moreover, other capabilities within the enterprise don't care how inventory management is performed. They only care that it is performed.

Consequently, aligning our service boundaries (and by extension our service contracts) with business capabilities gives us an incredible amount of stability. This is the basis of the self-contained process-centric service model introduced in an earlier post.

By defining our services around business capabilities we achieve greater business agility by way of being able to update how business functions are performed without influencing other services in our enterprise.

Sunday, July 6, 2008

Transactional Services (continued...)

In my last post we discussed the concept of transactional services and how they ensure business actions can be executed atomically within a service by enrolling local queue and/or topic resources into a distributed transaction. What I did not explain however is why queues and topics are essential for supporting transactional services.

Queues and topics are necessary to support transactional services because both types of transport queue messages while they await delivery. Although the transport may not store messages durably, they are still stored somewhere until they are delivered. Neither the sender nor receiver holds onto the message during its transport.

So, would it be possible to implement transactional services using a non-queued transport? For example, let us assume we wanted to use WS-AtomicTransaction (WS-AT) over an HTTP transport. As it turns out, it is not possible to achieve this without spanning the distributed transaction across your service boundaries. As we all know, spanning transactions across service boundaries is not a good idea as it hurts performance and reliability.

The reason for this is that with queued transports, messages are delivered to the receiver using a pull-based mechanism. Messages are first pushed to the receiving queue, after which they wait to be read by the receiver at a later time. This means that the sending service can complete and commit its transaction without concerning itself with the success or failure of the receiving service.

Moreover, queued transports will cache the outbound message locally, making the availability of the receiver irrelevant to committing the local transaction. Without this mechanism for storing messages as they are transported between services, the successful transport of the message, as well as the successful execution of the operation at the receiver, become preconditions for committing the local transaction at the sender.

Consequently, the sender must wait for a response from the receiver before committing the transaction - during which time local resources will likely be locked as part of the local transaction.
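
To illustrate, here is a sketch using MSMQ via .NET's System.Messaging (the queue path is illustrative, and the queue is assumed to have been created as a transactional queue). The send is enrolled in a local MSMQ transaction, and the commit does not depend on the receiver in any way.

using System.Messaging;

public class OrderEventSender
{
    public void SendOrderShipped(object orderShippedMessage)
    {
        // Illustrative queue path; in practice this would be configured.
        using (var queue = new MessageQueue(@".\private$\shipping_events"))
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();

            // The message is written to the local outgoing queue and MSMQ
            // takes responsibility for delivering it later. Committing
            // here does not wait on the receiver in any way.
            queue.Send(orderShippedMessage, tx);

            tx.Commit();
        }
    }
}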

So although transactional services are an incredibly powerful tool in service design, they do unfortunately require the use of a queued transport.

Tuesday, June 17, 2008

Transactional Services

In the realm of systems that manage a persistent store of some kind (for example a database, a queue, a topic or a file system), a transaction is defined as being an atomic (indivisible) unit of work. The transaction manager ensures that all data manipulation operations that occur as part of a transaction complete successfully or not at all.

Consider the classic example of transferring money from one account to another. This operation involves reducing the balance of one account as well as increasing the balance of another. It is critical that either both operations succeed or both fail. For only one to succeed would leave the persistent store in an inconsistent state.

Transaction managers also provide features to manage concurrency - that is multiple simultaneous transactions occurring in parallel. Without such mechanisms in place, data inconsistency may result from effects such as race conditions.

A race condition describes a flaw in a system where the sequence or timing of two or more concurrent transactions causes the data held in the persistent store to enter an inconsistent state. Considering our money transfer example again, a race condition could occur if two transactions simultaneously read the balance of the first account, deduct the transfer amount and then update the account balance.

So for example, let's say the account has a balance of $100 and we wish to transfer $10 to another account. Both transactions first read the current balance ($100), deduct the transfer amount ($10), and then update the balance to $90. Obviously the balance should be $80 after both transfer operations complete.

Transaction managers prevent these conditions from occurring by enforcing isolation between transactions. This is most often achieved through the application of locks. In our money transfer example, the first transaction will apply an "update lock" to the account balance which will prevent the second transaction from reading the balance until the first transaction has completed.
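
Alongside explicit locking, another way of avoiding this particular lost update is to perform the deduction as a single atomic statement, so no other transaction can slip in between the read and the write. A sketch, assuming an illustrative SQL Server table named Account with AccountId and Balance columns:

using System.Data.SqlClient;

public class AccountRepository
{
    // Deducts the amount in one atomic UPDATE. The read-modify-write
    // cycle happens inside the database engine under its own locks, so
    // two concurrent deductions cannot overwrite each other.
    public void Deduct(SqlConnection connection, int accountId, decimal amount)
    {
        using (var command = new SqlCommand(
            "UPDATE Account SET Balance = Balance - @Amount " +
            "WHERE AccountId = @AccountId", connection))
        {
            command.Parameters.AddWithValue("@Amount", amount);
            command.Parameters.AddWithValue("@AccountId", accountId);
            command.ExecuteNonQuery();
        }
    }
}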

The final property enforced by a transaction manager is durability, which ensures that data is not lost or left in an inconsistent state as a result of a system failure. When a transaction manager starts up again after a failure, all incomplete transactions are rolled back. The transaction manager ensures that all successfully completed transactions are committed to durable storage.

These properties of atomicity, consistency, isolation and durability are abbreviated as ACID properties. Sometimes it is necessary for these properties to be enforced across two or more transactional persistent stores. This is achieved by enrolling the transaction managers of these stores in a single distributed transaction.

A two-phase commit algorithm is often used to support distributed transactions. However, as this approach may involve locking resources in the various persistent stores involved in the distributed transaction (in order to preserve the ACID properties), it is not appropriate to be used across service boundaries.

Services are autonomous and as such cannot be relied upon to complete operations within a reasonable period of time. We cannot allow the resources of one service to be locked while waiting for another service to signal whether it has successfully or unsuccessfully completed its operation.

That being said, distributed transactions are extremely useful within the service boundary. Consider a service that persists its state in a database, receives messages off one or more queues and/or topics, as well as sends and/or publishes messages.

Quite often, a service will perform some updates in one or more databases, and then send or publish one or more messages in response to receiving a message from a queue or topic. If a failure occurs anywhere during this process, we want to ensure that the inbound message is not lost, all database updates are rolled back, and no outbound messages escape.

This is achieved by way of enrolling the queue or topic from which the inbound message was read, any databases where the service performed updates, as well as any queues or topics onto which messages were sent or published during the operation into a single distributed transaction.

In the event of a failure, any message read off a queue or topic is placed back onto that queue or topic, any outbound messages are erased, and all database updates are rolled back.
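
Here is a sketch of what this can look like in .NET, where a TransactionScope escalates to a distributed transaction once both MSMQ and SQL Server resources are enlisted. The queue handling, connection string and database work are illustrative placeholders.

using System.Data.SqlClient;
using System.Messaging;
using System.Transactions;

public class TransactionalMessageProcessor
{
    // Receives a message, updates the database and sends an outbound
    // message, all within one distributed transaction.
    public void ProcessNextMessage(MessageQueue inboundQueue,
                                   MessageQueue outboundQueue,
                                   string connectionString)
    {
        using (var scope = new TransactionScope())
        {
            // The receive enlists in the ambient transaction; on
            // rollback, the message is placed back on the queue.
            Message inbound = inboundQueue.Receive(
                MessageQueueTransactionType.Automatic);

            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open(); // auto-enlists in the same transaction
                // ... perform database updates in response to 'inbound' ...
            }

            // The outbound message only escapes if the whole
            // transaction commits.
            outboundQueue.Send("Done", MessageQueueTransactionType.Automatic);

            scope.Complete();
        }
    }
}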

So this gives us a great deal of robustness when it comes to handling failures that occur as part of a single operation within a service. But what about workflows that occur across services? If one part of a workflow fails, we very likely will need to take appropriate action in other services involved in the workflow. This is known as compensation logic.

Transaction managers deal with failures by rolling back changes that occur during a failed transaction. At the cross-service level however this action would not always be appropriate. Consider a Shipping service responsible for mailing order items to customers.

If an action performed by another service as part of this workflow fails, we wouldn't want the Shipping service to erase all record of the shipment. The package has already been physically shipped. We can't roll that back!

As a result of this, we manually implement logic within services to compensate for failures within other services as part of the same logical workflow. The appropriate compensation logic more often than not is a matter for the business to decide.

The logic will often be different for every service and every scenario, so it must be explicitly defined in the business requirements. Different compensation logic may also be necessary as a result of different failure conditions.
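
As a sketch of what compensation logic might look like, consider a hypothetical handler in the Shipping service reacting to a payment failure elsewhere in the workflow. All of the names here are illustrative; the point is that nothing is rolled back - the service decides what compensating action the business requires.

public class Shipment { /* details elided */ }

public class PaymentFailedNotification
{
    public string OrderNumber { get; set; }
}

public interface IShipmentRepository
{
    Shipment FindByOrderNumber(string orderNumber);
    void CancelPending(string orderNumber);
    void StartRecallProcess(Shipment shipment);
}

public class PaymentFailedHandler
{
    private readonly IShipmentRepository shipments;

    public PaymentFailedHandler(IShipmentRepository shipments)
    {
        this.shipments = shipments;
    }

    public void Handle(PaymentFailedNotification notification)
    {
        Shipment shipment = shipments.FindByOrderNumber(notification.OrderNumber);

        if (shipment == null)
        {
            // Nothing has shipped yet, so we can simply cancel the
            // pending shipment request.
            shipments.CancelPending(notification.OrderNumber);
        }
        else
        {
            // The package is already in the mail - we can't roll that
            // back. Instead we kick off whatever recovery process the
            // business has decided upon, e.g. requesting a return.
            shipments.StartRecallProcess(shipment);
        }
    }
}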

The need for manual compensation logic is considerably reduced with a self-contained process-centric service model. This flavour of SOA means that services hold all data they need to service any request locally. As such, all data updates are local to the service and can be protected by a distributed transaction inside the service boundary.

So, ACID transactions are a fantastic tool to be leveraged within the service boundary to help us build services that are robust and tolerate failures. They should not however be applied across services. Here, we must rely on manual compensation logic.

Monday, June 16, 2008

Reliable Messaging

Recently, we discussed the use of idempotent messages as a strategy for achieving reliable message delivery over an unreliable transport such as HTTP. The goal of idempotent messages is to eliminate any side effect from receipt of duplicate messages so that the sending party can retransmit messages for which no receipt acknowledgement was received without fear of the retransmitted message causing problems at the receiver.

The problem with relying upon idempotent messages of course is the effort involved in writing the retransmission logic for every endpoint sending messages reliably over an unreliable channel. It can also in some cases take quite a lot of effort to design and write systems that compensate for operations that are not naturally idempotent such that they become idempotent.

As such, we want to leverage reliable transports where possible so that we prevent idempotence and retransmission concerns from leaking into our application logic. Reliable transports handle retries and eliminate duplicate messages for us as part of the communication infrastructure.

Furthermore in situations where it is possible that messages arrive out of order (perhaps as a result of being routed by one or more intermediaries), reliable transports are capable of reordering messages such that they are delivered in the order in which they were sent.

A problem with reliable transports however is that they tend to be platform specific (such as MSMQ, available only on the Windows platform). Fortunately a standard reliable messaging specification has been defined, WS-ReliableMessaging (WS-RM). This specification falls under the WS-* group of specifications.

The catch though is that it is left up to the WS-RM implementer to decide what kinds of delivery assurances the WS-RM stack supports and will enforce. For example, the number of attempts a sending party makes to send a message before giving up is a matter of configuration at the sender. What the messaging infrastructure does with messages that fail to be delivered is out of scope for the WS-RM specification.

Whether the receiving party holds out-of-order messages aside such that they can be dispatched to message handlers in order is a matter of how the receiving WS-RM stack is implemented and/or configured. It makes no difference to the messages that are transmitted over the wire.

The same applies for whether messages are placed in a durable store at the sender before being sent, or whether they are placed in a durable store at the receiver before being dispatched to the message handler. This makes sense if you think about it. There is no way that a service provider could enforce that its consumers store messages durably before forwarding them onto the provider.

The best we can achieve is that the service provider and its consumers are able to make claims about delivery assurances. This is achieved with WS-Policy assertions. Although WS-Policy assertions have been defined for some delivery assurances, none have yet been defined to make claims about durable messaging.

So we need to be aware when using WS-RM that either endpoint may or may not be storing messages durably. This means that if a service provider or consumer process crashes, a message could potentially be lost.

Microsoft WCF does not support durable messaging with WS-RM at all. Durable messaging with WCF is achievable only by using the MSMQ transport. In my opinion, this severely limits the usefulness of WCF's WS-RM implementation.
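
By way of example, here is a minimal sketch of both options in WCF code - a WS-RM reliable session over HTTP, and MSMQ where durability matters. Addresses, the service contract and the timeout value are assumed.

using System;
using System.ServiceModel;

public static class ReliableBindings
{
    // WS-RM over HTTP: reliable and ordered in transit, but not
    // durable - a process crash can still lose in-flight messages.
    public static WSHttpBinding CreateReliableHttpBinding()
    {
        var binding = new WSHttpBinding();
        binding.ReliableSession.Enabled = true;
        binding.ReliableSession.Ordered = true;
        binding.ReliableSession.InactivityTimeout = TimeSpan.FromMinutes(10);
        return binding;
    }

    // MSMQ: durable, exactly-once messaging, at the cost of being
    // Windows-specific.
    public static NetMsmqBinding CreateDurableMsmqBinding()
    {
        var binding = new NetMsmqBinding();
        binding.Durable = true;
        binding.ExactlyOnce = true;
        return binding;
    }
}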

Another limitation of WS-RM is that it is not at present universally supported by all SOAP stacks as it is a relatively new specification. Where it is supported, there are no guarantees of what delivery assurances are enforced by the interacting parties.

That being said, where reliable messaging is required between services on disparate platforms and WS-RM is available, it certainly beats a raw HTTP transport.

Thursday, June 12, 2008

Outsourcing Business Capabilities (continued...)

Continuing my recent post on outsourcing business capabilities to third parties, I wanted to extend the example to include a Shipping service and a Billing service which outsources its billing function to PayPal.

If you recall from last time, our online sales channel (part of the Sales service) was outsourced to eBay. When a customer places an order on eBay, eBay needs to inform us of the order details. We achieve this by setting up a local Web service which is invoked by eBay whenever an order is placed.

eBay however does not guarantee delivery of this notification message. They do, however, provide a Web service we can interrogate in order to retrieve order details on demand. This involves a synchronous request-reply message exchange over an HTTP transport.

PayPal also provides a notification mechanism whereby PayPal invokes a Web service hosted by our organisation. A notification is sent whenever a payment is processed. Unlike eBay, PayPal does guarantee notification delivery.

Unfortunately, HTTP is not a guaranteed message delivery transport. Connection failures may occur. As a result, PayPal will keep sending the notification message until it receives confirmation that the message was successfully processed in the response from our Web service operation.

If our response message back to PayPal is lost somewhere along the way, we'll end up receiving duplicate notification messages. So we need to make sure that the service logic that handles the notification is idempotent.

We want to abstract away the details of these Web service interactions from other services in our enterprise. As far as the online sales channel is concerned, this is achieved by way of publishing a NewSaleNotification message within our organisation when we receive a sale notification from eBay via our Web service. The Sales service also stores a record of the sale in its database.

The eBay notification however is not guaranteed to arrive, so we're going to have to find a way of dealing with that. Let's deal with that later though and look to the Billing service. PayPal sends us a notification which I'll assume will contain the order number and a payment number.

When we receive a payment notification from PayPal, we save the payment in the Billing service database and then publish a PaymentReceivedNotification. In order to protect ourselves from duplicate notifications, before publishing our PaymentReceivedNotification we first check whether we already have a payment with the given payment number in the database. If it is already present, then we disregard the message.

The Sales service needs to be subscribed to the "payment received" event. When it receives notification of this event, it checks to see if an order with the given order number is in its database. If not, it makes a request to eBay using the eBay Web service to retrieve the order details and then saves the order in the database. The service then stores a record of the payment against the order and raises an OrderPaidNotification.
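
Here is a sketch of that handler logic in the Sales service. All of the types are hypothetical stand-ins, with the supporting declarations reduced to the bare minimum.

public class Order { /* our own order representation; details elided */ }

public class PaymentReceivedNotification
{
    public string OrderNumber { get; set; }
    public string PaymentNumber { get; set; }
}

public class OrderPaidNotification
{
    public Order Order { get; set; }
    public string PaymentNumber { get; set; }
}

public interface IOrderRepository
{
    Order FindByOrderNumber(string orderNumber);
    void Save(Order order);
    void RecordPayment(Order order, string paymentNumber);
}

public interface IEbayOrderClient
{
    Order GetOrder(string orderNumber);
}

public interface IServiceBus
{
    void Publish<TEvent>(TEvent businessEvent);
}

public class PaymentReceivedHandler
{
    private readonly IOrderRepository orders;
    private readonly IEbayOrderClient ebay;
    private readonly IServiceBus bus;

    public PaymentReceivedHandler(IOrderRepository orders,
                                  IEbayOrderClient ebay,
                                  IServiceBus bus)
    {
        this.orders = orders;
        this.ebay = ebay;
        this.bus = bus;
    }

    public void Handle(PaymentReceivedNotification notification)
    {
        var order = orders.FindByOrderNumber(notification.OrderNumber);

        if (order == null)
        {
            // The eBay sale notification never arrived, so pull the
            // order details on demand via eBay's Web service.
            order = ebay.GetOrder(notification.OrderNumber);
            orders.Save(order);
        }

        // Record the payment against the order and raise the event for
        // downstream services (e.g. Shipping) to react to.
        orders.RecordPayment(order, notification.PaymentNumber);
        bus.Publish(new OrderPaidNotification
        {
            Order = order,
            PaymentNumber = notification.PaymentNumber
        });
    }
}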

By virtue of the "order paid" event, we have abstracted away all the complexities associated with compensating for inadequate service level agreements of third parties. We can then subscribe the Shipping service to the "order paid" event (which would contain the full order and payment details) so that it can arrange shipment of the order once it has been paid.

Note that with this architecture the contracts of the Sales and Billing services are devoid of any details concerning the third party organisations eBay and PayPal. This means we can replace these suppliers without impacting the remainder of our architecture.

It also means we decouple our other services from the Web service contracts exposed by third party organisations. This is very important as we will have little to no influence on whether or how often these contracts change. The abstraction layer limits the impact of these changes to the boundary of the service interacting with the third party.

Tuesday, June 10, 2008

Idempotent Messages

I know in my last post I said we'd be continuing our outsourcing example. However before doing so I need to explain the concept of idempotent messages (you'll understand why when you read my next post).

Idempotence is actually not so much a property of the message as a property of how the message is handled by the receiving service. A message is idempotent if the service operation that processes it yields the same result regardless of the number of times the message is received.

Some operations are idempotent by nature, whereas others require special treatment in order to become idempotent. Read only operations by their very nature are idempotent because they don't have any lasting effect. An "update customer" operation is idempotent because no matter how many times you update the customer with the same information, it yields the same result.

Operations such as "transfer $100 from account X to account Y" however are not idempotent. If the same message is replayed 10 times, then $1,000 will be transferred over 10 transactions. In these cases we need a mechanism to detect the duplicate messages and ignore them.

In some cases duplicates are easy to detect. For example, if we receive a ShipOrderRequest message containing an order number and store the order number in the Shipping service database, then all we need to do when receiving a ShipOrderRequest message is check the Shipping database for the given order number and if found, disregard the request.

Some scenarios require a bit more effort from the service consumer. Consider the account transfer operation described above. In this case, there is nothing in the message to identify that we have already processed that message. We cannot differentiate between a duplicate and another legitimate request to transfer $100 between the same accounts.

In such cases what we do is require that the service consumer place a unique message ID in each request message. A GUID works well for this. The receiving service can then store the message ID against the resultant account transfer transaction record in the database. Before processing a message, the receiving service checks the transaction table to see if the given message ID is already present. If so the request message is discarded.
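
A sketch of that duplicate detection check, assuming an illustrative AccountTransfer table keyed by the message ID. In practice the check, the transfer itself and the insert of the transaction record would all occur within a single database transaction.

using System;
using System.Data.SqlClient;

public class AccountTransferHandler
{
    // Handles a transfer request idempotently: if the message ID has
    // been seen before, the request is a duplicate and is discarded.
    public void Handle(Guid messageId, SqlConnection connection)
    {
        using (var check = new SqlCommand(
            "SELECT COUNT(*) FROM AccountTransfer WHERE MessageId = @MessageId",
            connection))
        {
            check.Parameters.AddWithValue("@MessageId", messageId);

            if ((int)check.ExecuteScalar() > 0)
                return; // duplicate message - disregard it
        }

        // ... perform the transfer and insert an AccountTransfer row
        // recording the message ID ...
    }
}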

So why go to all this effort? Under what circumstances do we need idempotent messages? Well, in our discussions to date I have assumed the use of a transactional guaranteed message delivery transport (such as MSMQ). Such transports handle the detection and removal of duplicate messages as part of the messaging infrastructure.

Furthermore a transactional transport allows us to remove a message from a queue or topic as part of a broader distributed transaction. This means that the message is not lost if the service fails to process it. A failure results in the message being placed back on the queue or topic. I'll cover transactional services in more detail in a future post.

However such transports are not always available. For example when integrating with third party organisations, we generally tend to rely on Web services over an HTTP transport. HTTP does not guarantee delivery.

The problem with this is that when a failure occurs (e.g. the connection fails), the service consumer cannot determine whether or not the request message was actually successfully delivered and processed. Now for some situations, losing a message isn't very important. For example if someone is sending us weather updates every minute, it may not matter if we lose one because there'll be another along shortly.

However for other situations, we require a guaranteed message delivery service level agreement. This is only achievable over an unreliable transport if the consumer resends the message over and over until it receives confirmation from the service provider in the form of a response message that the original message has been successfully processed.
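
Here is a sketch of that consumer-side retransmission loop. The actual send and acknowledgement mechanics are assumed to live behind the delegate passed in; the retry cap and delay are illustrative.

using System;
using System.Threading;

public class RetryingSender
{
    // Resends the request until the provider confirms it was processed,
    // up to a maximum number of attempts.
    public void SendReliably(Func<bool> trySendAndAwaitAck,
                             TimeSpan retryDelay, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                if (trySendAndAwaitAck())
                    return; // provider confirmed processing
            }
            catch (Exception)
            {
                // Connection failure - we can't tell whether the message
                // got through, so we retransmit. This is exactly why the
                // receiver must be idempotent.
            }

            Thread.Sleep(retryDelay);
        }

        throw new TimeoutException("Message delivery could not be confirmed.");
    }
}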

Now this is fine if the message is lost en route to the service provider. But what if the message was successfully processed and the confirmation response message is lost on its way back to the service consumer? The consumer will resend the request message and the service provider will receive and process the message twice.

When the operation performed by the service provider in response to receiving this message is not naturally idempotent, the service provider must detect the duplicate message and disregard it.

Of course this is a lot of extra effort to go to when implementing your service logic. So use transactional guaranteed delivery transports where available and appropriate. They'll save you a lot of time.

Friday, June 6, 2008

Outsourcing Business Capabilities

One of the commonly cited benefits of SOA is that it gives organisations greater flexibility in outsourcing business capabilities. This is by virtue of the fact that organisations can leverage Web services as a foundation technology for B2B communications across organisational boundaries.

However one common misconception that exists is that a Web service interface that sits at the organisational boundary coincides with the boundary of a business service. This is in fact often not the case.

Consider an online retail business that sells products via an online store. Let's assume that the business also accepts orders by mail (either by snail mail or email) and telephone. The Sales service would include the online store Web application, as well as some kind of internal application leveraged by call centre operators that process orders by mail and telephone.

A possible Sales service architecture is illustrated below.


So what happens if at some point the business decides it can get better value from outsourcing its online sales channel to eBay? Well clearly the entire Sales service has not been outsourced. We end up with an architecture similar to that illustrated below.


Here we have a single service spanning organisational boundaries. The interaction between eBay and the components still hosted in house occurs inside the Sales service, but across organisational boundaries. The service contract of the Sales service remains unchanged.

No other service in our business need know that the online sales channel has been outsourced. Just as importantly, no service is dependent specifically on eBay. If at some point we decided to replace eBay with another provider, this would constitute only a change in the implementation of our Sales service.

Moreover, if we decided to branch out and leverage multiple third party online sales channels, this too would involve only a change in the implementation of our Sales service.

Just because eBay exposes a Web service interface as an integration point for retail businesses doesn't mean that we should expose that interface directly to our other services in our enterprise. eBay's Web service interface is designed as a point of integration. No more, no less.

Business capabilities are unique to each business. Before outsourcing to eBay, our organisation had its own distinctive sales processes that evolved independently of sales processes in other organisations. Although there may be similarities, there will always be subtle differences.

Furthermore, our organisation will wish to retain the ability to tailor and evolve its sales processes as it sees fit. The business certainly won't appreciate terms being dictated by eBay.

We also want to be able to control the service level agreements (SLAs) (such as performance and reliability) upheld by our Sales service. eBay's Web service interface is based on synchronous request-reply interactions over an unreliable network (the Internet), effectively meaning there are no guarantees of availability or performance. Obviously we cannot expose other services within our enterprise to such poor SLAs.

Something else to consider is that eBay's Web service interface is potentially subject to change. We need to shield our other services from such potential changes. As such, we certainly don't want to couple our services directly to eBay's Web service contract.

Furthermore, what if we wish to outsource our online sales channel to a business that isn't quite so technologically savvy, providing an interface in the form of CSV files transferred via FTP? We certainly can't directly expose that as a service within our enterprise. What if we wish to partner with an organisation that offers only a REST based interface?

The point I am making is that people get tempted to directly expose Web services (whether of COTS applications or of partner organisations) to other services within their organisations, simply because they are Web services. Do not do this. The provision of Web services is entirely coincidental and solely a result of the need for interoperability.

What is needed is a layer of abstraction between the partner organisation's Web service interface and the service contract exposed within our enterprise. This layer of abstraction gives our organisation the flexibility to have control over its sales processes and SLAs.
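
To round the example off, here is a sketch of what this layer of abstraction might look like in code - all names hypothetical. Our other services depend on a contract we own, and the eBay-specific mapping is buried behind it.

// Our own contract - the rest of the enterprise sees only this.
public interface IOnlineSalesChannel
{
    Order GetOrder(string orderNumber);
}

// eBay-specific adapter. If we switch providers, only this class (and
// its FTP- or REST-flavoured siblings) needs to change.
public class EbaySalesChannel : IOnlineSalesChannel
{
    public Order GetOrder(string orderNumber)
    {
        // Call eBay's Web service here, then map its response onto our
        // own Order type so eBay's schema never leaks past this class.
        throw new System.NotImplementedException("Sketch only.");
    }
}

public class Order { /* our own order representation; details elided */ }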

More on this example in my next post, so stay tuned!