Wednesday, February 20, 2008

CRUD is Bad

We keep hearing that CRUD interfaces (that is, interfaces with create/read/update/delete operations) are considered bad practice (an anti-pattern) in the context of SOA. But why? And how do we go about designing our service interfaces to avoid CRUD?

Let's start with the why. Services have both data and business logic. For reasons of encapsulation and loose coupling between services, we want to keep our business logic near the data upon which it operates. If our service contract permits direct manipulation of the data held within the service, this means that the business logic can leak outside the service boundary. It also means that the business logic inside the service boundary can be bypassed by direct manipulation of the service's data. All bad.

The same holds true in traditional OOP. Other objects cannot directly change an object's state; a state change is achieved by passing messages to (calling methods on) the object. This helps enforce loose coupling and high cohesion.
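As a minimal sketch of that OOP idea (Python is used purely for illustration; the `Account` class and its withdrawal rule are hypothetical and not taken from any real system), the object below only changes state when a caller invokes a method, so the rule stays next to the data it protects:

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal would overdraw the account."""


class Account:
    """Hypothetical example class; callers change state only via methods."""

    def __init__(self, balance):
        self._balance = balance  # private by convention

    @property
    def balance(self):
        # Read-only view: there is deliberately no setter.
        return self._balance

    def withdraw(self, amount):
        # The business rule sits next to the data it guards.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self._balance:
            raise InsufficientFunds("balance too low")
        self._balance -= amount


account = Account(balance=100)
account.withdraw(30)
```

A CRUD-style service contract is roughly the equivalent of giving every caller a public setter for `balance`.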

Even more compelling is the issue of updating multiple entities as part of a single logical atomic operation. CRUD interfaces usually will have create/read/update/delete operations for each entity housed by the service. But what if you want to update two different entities where either both updates succeed or neither?

You have the following options:

  • Use a distributed transaction
  • Implement compensation logic to handle failures yourself
  • Create new create/update/delete operations for specific combinations of entities

None of these options is satisfactory. The first option may not even be possible if the service stack doesn't support distributed transactions (e.g. ASMX, WSE). And even if it does support transactions, cross-service transactions are incredibly bad practice, because they allow services to lock each other's records, which severely hurts service autonomy.

The second option is certainly not an easy task to do properly, and takes a lot of additional effort. And the third option isn't really practical. There are too many combinations of entities that may need to be updated in a single transaction, and it would take a lot of additional effort to implement them all.
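To give a feel for what the second option involves, here is a rough sketch of consumer-side compensation. Everything in it is an assumption made for illustration: `CrudService` is an in-memory stand-in for a remote CRUD service, and the `fail_on` set exists only to simulate a failed update.

```python
class CrudService:
    """Hypothetical in-memory stand-in for a remote CRUD service."""

    def __init__(self):
        self.records = {}
        self.fail_on = set()  # keys whose updates fail (demo only)

    def read(self, key):
        return dict(self.records[key])

    def update(self, key, values):
        if key in self.fail_on:
            raise RuntimeError("update of %s failed" % key)
        self.records[key] = dict(values)


def update_both(service, first_key, first_values, second_key, second_values):
    """Consumer-side 'both or neither' update using compensation."""
    previous = service.read(first_key)  # remember the old state for the undo
    service.update(first_key, first_values)
    try:
        service.update(second_key, second_values)
    except Exception:
        # Compensate by restoring the first entity, then re-raise.
        service.update(first_key, previous)
        raise


service = CrudService()
service.records["reservation-1"] = {"status": "active"}
service.records["room-12"] = {"available": False}
service.fail_on.add("room-12")  # simulate the second update failing

try:
    update_both(service,
                "reservation-1", {"status": "cancelled"},
                "room-12", {"available": True})
except RuntimeError:
    pass  # the first update has been undone by the compensation step
```

Even this small version has gaps: the compensating update can itself fail, and other consumers can observe the intermediate state before it is undone.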

Lastly, if a service must go to other services to pick up the data it is going to operate on, this means synchronous request/reply message exchanges between services. These are bad news because they are slow (the calling service blocks until the reply arrives) and introduce temporal coupling between services (the service with the data must be available at the time the service without the data needs it).

So hopefully this is enough to convince you that CRUD is bad. But how do we design our services to avoid CRUD? Well, firstly we decentralise our data! This way all the data a service needs to operate on as part of a single logical operation is held locally within the service. Secondly, we make our service operations task centric, rather than data centric. The operations should be more like "make reservation" and "cancel reservation" rather than "retrieve reservation" and "update reservation". Udi Dahan has recently made a couple of posts discussing this very point.
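A minimal sketch of what a task-centric contract might look like, assuming a hypothetical in-memory `ReservationService` (the class, method names and room numbers are made up for illustration). The point to notice is what is absent: there is no generic "update reservation" operation for a consumer to call.

```python
import itertools


class ReservationError(Exception):
    """Raised when a requested task breaks a business rule."""


class ReservationService:
    """Hypothetical task-centric contract with no generic update."""

    def __init__(self, rooms):
        self._available = set(rooms)
        self._reservations = {}  # reservation id -> {"room", "status"}
        self._ids = itertools.count(1)

    def make_reservation(self, room):
        if room not in self._available:
            raise ReservationError("room %s is not available" % room)
        self._available.remove(room)
        reservation_id = next(self._ids)
        self._reservations[reservation_id] = {"room": room, "status": "reserved"}
        return reservation_id

    def cancel_reservation(self, reservation_id):
        reservation = self._reservations.get(reservation_id)
        if reservation is None or reservation["status"] != "reserved":
            raise ReservationError("no active reservation %s" % reservation_id)
        # The service, not the consumer, decides what cancelling means.
        reservation["status"] = "cancelled"
        self._available.add(reservation["room"])

    def is_available(self, room):
        return room in self._available

    def status_of(self, reservation_id):
        return self._reservations[reservation_id]["status"]


service = ReservationService(rooms=["101", "102"])
booking = service.make_reservation("101")
service.cancel_reservation(booking)
```

Because the data lives inside the service, each operation can apply its rules and touch whatever it needs locally.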

A final point I'll make on this is that CRUD operations are fine inside the service boundary, so for example what you might see between a smart client and the service back end. But this point will be discussed in more detail in future posts. Stay tuned!

17 comments:

Anonymous said...

Totally Agree! One of the largest applications here in the WA government sector (not mentioning any names) has an elaborate hand-made web services layer which basically exposes CRUD to any client layer. This means that the emphasis is on the consuming client to implement and enforce business rules. Much better to expose services that implement and validate the business rules, and the client calls those business operations.

Bill said...

It seems from the situation you've described, that this is poor design within the service boundary, not at the service boundary itself.

These CRUD operations are exposed for use by the client application, which is internal to the service.

So as long as those service interfaces are not exposed to and leveraged by other services, it isn't really an SOA concern.

You are correct though in that it is better in general for the service back end to implement and enforce the business rules, rather than the client.

I'll be doing a series of posts next on defining services and the service boundary that should make things a bit clearer.

Unknown said...

I agree with your technical explanation around the subject but I would like to add that CRUD operations are simply stated: component-level interactions. Services should only expose an interface that makes sense in the business domain.

Bill said...

Yes, I agree. Although one could make sense of CRUD operations in the business domain, we need to ensure that our public service interfaces are process centric rather than data centric.

Services themselves embody one or more cohesive business processes. With CRUD interfaces, we aren't accessing process level logic, we are just accessing data.

Anonymous said...

Hi Bill,

I am really enjoying your blogs.

I agree with what you are saying, however there must be a point where the client needs to retrieve data in a non-task centric way.

For example there could be a service that implements "Pay Contracted Service" that results in a series of invoices being generated.

At some point in time I am going to need to present a list of the invoices to the client so that they can determine what Contracted Services have been paid.

I understand that you could simply present this via the Datawarehouse, however sometimes this information is needed in next to real-time for subsequent task service processing.

Would it not be fair to present an operation like "Retrieve Invoices for Client" and provide control of the SLA around this operation?

Very keen to hear your thoughts.

Kind Regards,
Russ.

Bill said...

Hi Russ,

Glad to hear you're enjoying the blog!

Indeed, there are numerous instances where you need to present information to a user via a UI.

And absolutely, you wouldn't want to rely on a BI service for that purpose.

The thing to realise is that applications (tools leveraged by users) are an orthogonal concern to services. You can read more about that here.

So if a user is interacting with some kind of smart client application, and that is communicating with some kind of back end component; then the user, the smart client application and the back end component all actually sit behind the service boundary.

As such, any communication between the UI and the back end component is not considered to be at the service boundary. The back end component (although its functionality may be exposed as a Web service), does not directly participate in your overall SOA.

As such, synchronous request-reply data centric operations are okay. Just not at the service boundary.

You still want to have task-centric operations exposed by the back end component however for update operations. You need your updates to be atomic, and it's not a good idea to use distributed transactions between your smart client application and back end component.

So for example you might have a "cancel policy" operation, rather than update the policy state to cancelled locally in the smart client application and then send an "update policy" message to the back end component.
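As a rough sketch of the difference (the message shape, field names and `PolicyBackEnd` class below are hypothetical, chosen only to illustrate the idea), the smart client sends a message that states the task, and the back end component decides what that means for the policy's data:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CancelPolicy:
    """Hypothetical task-centric message: what the user wants, and why."""
    policy_id: str
    reason: str
    effective: date


class PolicyBackEnd:
    """Hypothetical back end component that owns the policy data."""

    def __init__(self):
        self._policies = {}

    def add_policy(self, policy_id):
        self._policies[policy_id] = {"state": "active"}

    def handle_cancel(self, command):
        policy = self._policies[command.policy_id]
        if policy["state"] != "active":
            raise ValueError("only active policies can be cancelled")
        # All related changes are applied together, inside the back end.
        policy["state"] = "cancelled"
        policy["cancelled_on"] = command.effective
        policy["cancellation_reason"] = command.reason

    def state_of(self, policy_id):
        return self._policies[policy_id]["state"]


back_end = PolicyBackEnd()
back_end.add_policy("P-1")
back_end.handle_cancel(
    CancelPolicy(policy_id="P-1", reason="customer request",
                 effective=date(2008, 3, 1)))
```

With an "update policy" message, the client would have to know which fields to change and in what combination, and the back end could only accept or reject the resulting data.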

Hope that helps!
Bill

Anonymous said...

Hi Bill,

A very interesting perspective indeed.

I followed your view that tools leveraged by users are within the service boundary. However, I have some conceptual problems with this (my problem not yours!!).

Take the Risk Assessment capability. What happens if they outsource part of this function (just the assessment part)?

Say that I have a Payment Service. This service boundary is concerned with a Completed Contracted Service Event and it publishes Invoices (in various states). Internally the service has Claims, Invoices and other entities. I need the consumer (internal and external) to be able to access this data in a command (request response) way.

Am I actually going to provide the R of CRUD for these entities as a separate boundary?

Hopefully some of this made sense?

Thanks for forcing me to think of these issues before it was too late!

Kind Regards,
Russ.

Bill said...

Hi Russ,

In terms of the Risk Assessment example, the service exposes a risk assessment capability to the enterprise.

The rest of the enterprise does not care how that service is implemented. That is, they don't care if it is fully automated, a smart client application, a Web application, or outsourced to another organisation.

Taking the outsourcing option, you would either provide a Web application for external risk assessors to use, or you may provide a Web service interface for an application to leverage in the other organisation.

Taking the Web service interface option; the Web service, smart client application and risk assessors all fall into the Risk Assessment service, regardless of the fact that the risk assessors are part of a separate organisation.

So in this case, we have a single service spanning organisational boundaries.

The trick here is to identify service boundaries based on capabilities provided by those services to your enterprise, rather than on technical or physical boundaries such as organisational boundaries.

Another way the risk assessment capability could be outsourced is if the other organisation hosted the entire service (front-end and back-end elements).

Where various components are hosted is irrelevant in terms of defining a service.

Just as with the Risk Assessment service described above, you can expose request-response style endpoints to be used by external parties.

Those external parties participate in the Payment service, so the request-reply endpoint is not at the service boundary. It is a private endpoint of the service that happens to sit at your organisational boundary.

Regards,
Bill

Bill said...

Russ,

One other thing to note is that the other organisation will not perceive your Web service interface as a service boundary either.

They should provide an anti-corruption layer between your Web service interface, and the interface they expose internally within their organisation to directly participate in their SOA.

Their internal interface would be expressed in their domain terms, and would be based on asynchronous messaging rather than the synchronous request-reply interface provided at your end.
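A very small sketch of such an anti-corruption layer might look like the following. All of it is assumed for illustration: the external field names (`AppId`, `Score`), the internal `RiskAssessed` event and the score-to-band mapping are invented, and an in-process `queue.Queue` stands in for real asynchronous messaging infrastructure.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskAssessed:
    """Hypothetical internal event, in the consuming organisation's terms."""
    application_ref: str
    risk_band: str


# Hypothetical mapping from the external interface's codes to internal terms.
SCORE_TO_BAND = {1: "low", 2: "medium", 3: "high"}


def translate(external_reply):
    """Anti-corruption step: map external names and codes to internal ones."""
    return RiskAssessed(
        application_ref=external_reply["AppId"],
        risk_band=SCORE_TO_BAND[external_reply["Score"]],
    )


# Stand-in for the organisation's internal asynchronous messaging.
internal_bus = queue.Queue()


def on_external_reply(external_reply):
    # The synchronous reply is translated, then published internally.
    internal_bus.put(translate(external_reply))


on_external_reply({"AppId": "A-17", "Score": 3})
event = internal_bus.get_nowait()
```

Nothing downstream of `translate` sees the external field names, so a change to the external interface is absorbed in one place.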

And one other thing to note is that there may be a need for publish-subscribe communication between your two organisations.

For this you can develop your own WS-Eventing implementation, or potentially use Microsoft's Internet Service Bus (BizTalk Services) when that is released.

Unfortunately, standards-based publish-subscribe infrastructure is still relatively immature at this stage.

Regards,
Bill

Anonymous said...

Hi Bill,

Just discovered your blog and am going through some of the articles - all interesting stuff.

I'm currently working on my first large SOA implementation so lots of this is relatively new - but I have got considerable experience in EAI.

One of the things that struck me about this blog was that you finished by saying a solution was to design a service with 'make reservation' and 'cancel reservation' operations.
Forgive me if I'm being dumb but isn't that simply a rename of 'Create Reservation' and 'Delete Reservation', so actually isn't any different from CRUD at all?

Apologies if I've missed the point!!
Steve

Bill said...

Hi Steve,

The "make reservation" and "cancel reservation" operations are task-centric operations, as opposed to data-centric. This is a subtle but important distinction.

Take the "cancel reservation" operation for instance. This would very likely result in updating the status of the reservation to "cancelled". So in that sense, it is already different from a "delete reservation" CRUD operation.

Let's say hypothetically that we decided to implement an "update reservation" operation instead of a "cancel reservation" operation.

We would then be able to update any aspect of the reservation in any way we saw fit, completely bypassing the business logic of the service.

The business logic of the service would then leak out to consumers that would perform updates on service data based on their own rules.

Furthermore, the business rules of the service might be that when a reservation is cancelled, the availability schedule for the hotel room is updated.

With CRUD operations, the consumer must now perform two updates against the service - one for the reservation, and one for the hotel room availability schedule. Again, the business logic for this has leaked out to the consumer.

Furthermore, both updates would need to be made atomically, which means a distributed transaction between the consumer and provider, which is very bad practice.

With CRUD operations, you also lose the business context of the operation. The service knows an update occurred, but doesn't know why or as part of what business process.

And finally, your service contract becomes considerably less stable because updating the internal data representations will greatly impact the service contract.

Task centric operations relate to an atomic specific business-relevant operation. Data centric operations relate to entities. One task centric operation very likely will impact multiple entities.
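To illustrate that last point, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns and sample data are hypothetical. The single "cancel reservation" operation updates both the reservation and the room availability schedule inside one local transaction, so no distributed transaction with the consumer is needed:

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: two entities owned by the same service.
    conn.executescript("""
        CREATE TABLE reservation (
            id INTEGER PRIMARY KEY, room TEXT NOT NULL, status TEXT NOT NULL);
        CREATE TABLE room_availability (
            room TEXT PRIMARY KEY, available INTEGER NOT NULL);
    """)


def cancel_reservation(conn, reservation_id):
    """One task-centric operation, two entities, one local transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT room, status FROM reservation WHERE id = ?",
            (reservation_id,)).fetchone()
        if row is None or row[1] != "reserved":
            raise ValueError("no active reservation with that id")
        conn.execute(
            "UPDATE reservation SET status = 'cancelled' WHERE id = ?",
            (reservation_id,))
        conn.execute(
            "UPDATE room_availability SET available = 1 WHERE room = ?",
            (row[0],))


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO reservation VALUES (1, '12', 'reserved')")
conn.execute("INSERT INTO room_availability VALUES ('12', 0)")
conn.commit()

cancel_reservation(conn, 1)
```

The consumer sends one request, and either both rows change or neither does.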

Cheers,
Bill

Anonymous said...

Hey Bill,

Thanks for the clarification and for posting a response so quickly. What you've said makes a lot of sense now - told you I was new to this!!

Regards,
Steve

Bill said...

Always glad to help!

Unknown said...

Hi Bill,

I am very new to the whole concept of SOA architectures and best practices. Reading through some of your blogs has definitely opened my mind to new perspectives compared to other reading I've done.

My original plan was to implement an SOA architecture, and my first port of call was to create a data access web service. This service would essentially be a gateway to the database for a number of applications (yet to be web service orientated).

Though when thinking about an implementation for this service I thought a very generic approach would be required so all applications would be able to access data through this service.

From what I've read, this would essentially be a CRUD service, as the interface would be exposing methods to manipulate data in a data centric way.

I was wondering what approach I should be focussing on? Should each application have its own specific connection to the database with its own DAL rather than utilising a generic DAS that facilitates the needs of numerous applications?

Bill said...

Hi Leigh,

The first thing you need to do is segment your organisation's business architecture into process areas that have high cohesion and loose coupling. You can utilise value chain analysis as a starting point here.

You then define a business service to support each cohesive process area.

Within each business service, you will have a number of applications that people use in support of the activities they perform within the processes in the process area of the business service.

Where you are designing new applications, you want to make sure any data used in support of one business service is not directly visible to another.

That is, services should not share data except by way of exchanging messages.

Unfortunately with SOA transformation of organisations with existing legacy applications, this constraint is rarely possible to honour completely.

However, as the architecture evolves, application instances can be chosen/developed/deployed to align with the business service model.

So what this means is that you should not have a data service which is reused across multiple business services. A data service would be an implementation detail of a business service.

Personally, for reasons of performance and transactionality I would stay clear of data services altogether.

Cheers,
Bill

Colin Jack said...

I'm loving your posts but wondered if you'd thought of putting together a "recap" one that linked to some of this sort of content, to tell the overall story of how you approach SOA?

Anyway couple of questions:


"The "make reservation" and "cancel reservation" operations are task-centric operations, as opposed to data-centric. This is a subtle but important distinction."

I was wondering how you feel about REST style where you might model this using POST to create a Reservation resource and use PUT (with cancelled set to true) to cancel it?

I get that we need the business meaning and need to avoid situations where people bypass the business logic, but is the reservation itself not task oriented?



"Task centric operations relate to an atomic specific business-relevant operation. Data centric operations relate to entities. One task centric operation very likely will impact multiple entities."

I agree, but then updating/creating one resource could update other resources. For example placing a Reservation might also trigger the Customer to be seen as a preferred Customer (or whatever).


"The trick here is to identify service boundaries based on capabilities provided by those services to your enterprise, rather than on technical or physical boundaries such as organisational boundaries."

I was wondering how you personally go about this. Do you recommend an approach similar to the one Steve Jones discusses in his InfoQ book?


"The first thing you need to do is segment your organisation's business architecture into process areas that have high cohesion and loose coupling. You can utilise value chain analysis as a starting point here."

I guess this relates to my last question, but I was wondering if you could recommend any resources on this topic?

Anonymous said...

I've learnt so much from your blog, I'd love to see it compiled as a book.