Tuesday, July 30, 2013

Surely REST isn't the travelling salesman approach to design?

Occasionally I run across things on the web, this time tweeted by Mark Baker (@distobj), that I have to read several times.  The link he tweeted is to this page on a Nokia API and in particular this section...
Biggest advantages noticed is that the API itself becomes the documentation of the API
  • Without HATEOAS developers would get data, go to a documentation, and figure out how to build the next request
  • With HATEOAS developers learn what are the available next actions
Either of these would get you fired, I think.  I'm a big fan of documentation and I'm a big fan of design.  What I'm not a big fan of is people who treat the design process as a series of abstract steps where the only thing that matters is the next step.
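To make the distinction the bullets describe concrete, here is a rough sketch of the two styles of response; the resource, fields and link relations are invented purely for illustration and are not taken from the Nokia API.

# Illustrative only: hypothetical payloads, not the Nokia API the article describes.

# Without HATEOAS: the client gets bare data and must go to separate
# documentation to learn that the next call is POST /orders/42/payments.
plain_response = {
    "orderId": 42,
    "status": "awaiting_payment",
}

# With HATEOAS: the response itself advertises the available next actions
# as links, so the client discovers them at runtime.
hateoas_response = {
    "orderId": 42,
    "status": "awaiting_payment",
    "_links": {
        "self":    {"href": "/orders/42"},
        "payment": {"href": "/orders/42/payments", "method": "POST"},
        "cancel":  {"href": "/orders/42", "method": "DELETE"},
    },
}

def next_actions(response):
    """List the actions a HATEOAS-style response says are available."""
    return sorted(response.get("_links", {}).keys())

print(next_actions(plain_response))    # [] - nothing to go on without the docs
print(next_actions(hateoas_response))  # ['cancel', 'payment', 'self']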

Let's be clear: having clear documentation available is important.  Two things would be poor:
  1. Having to wait for someone to build an application before I can start writing a client for it
  2. Having documentation that is some sort of 'maze of twisty passages' where you only see one step ahead
This, to me, is at the heart of the death of design in IT: the lauding of everything as architecture, and the massive chasm that appears to be developing between that and writing the code.  I'm repeatedly seeing approaches that are code centric rather than design centric, and the entire history of IT tells us that this isn't the best way forwards.  Don't try the 'I just refactor' line on me as if that were an answer: spending one day thinking about a problem and planning out the solution (design) is worth five days of coding and twenty days of subsequent refactoring.

I'd really like a developer to be able to map out what they want to do, be able to read the documentation in one go and understand how they are going to deliver on the design.  I don't want a developer context switching between API, code and pseudo-design all the time and getting buried in the details.

This is part of my objection to REST: the lack of that up-front work before implementation.  If I have separate client and service teams I don't want to run a waterfall project of 'service finishes, then the client starts', and if those are two separate firms I want to be able to verify which one has screwed up, rather than a continual cycle of 'it's in the call response' and 'it was different yesterday'.  In other words, I'd like people to plan for outcomes.  This doesn't mean not using REST for implementation; it just means that the design stage is still important and that documentation of the interface is an exchangeable artefact.  Now if the answer is that you have a mock of the API up front and a webcrawler can extract the API documentation into a whole coherent model, then all well and good.
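That crawler idea is not far-fetched.  Here is a minimal sketch, assuming the API exposes link relations in its responses as in the earlier example; the resources are invented and the canned fetch function stands in for real HTTP calls.

# Sketch of 'a webcrawler extracts the API documentation': a breadth-first walk
# of the link relations an API exposes, building one map of resource -> actions.
from collections import deque

def fetch(href):
    # Hypothetical canned responses standing in for a live (or mocked) API.
    api = {
        "/": {"_links": {"orders": {"href": "/orders"}}},
        "/orders": {"_links": {"order": {"href": "/orders/42"}}},
        "/orders/42": {"_links": {"payment": {"href": "/orders/42/payments", "method": "POST"}}},
        "/orders/42/payments": {"_links": {}},
    }
    return api[href]

def crawl(root="/"):
    """Return {href: {relation: target}} for every reachable resource."""
    seen, queue, doc = set(), deque([root]), {}
    while queue:
        href = queue.popleft()
        if href in seen:
            continue
        seen.add(href)
        links = fetch(href).get("_links", {})
        doc[href] = {rel: link["href"] for rel, link in links.items()}
        queue.extend(link["href"] for link in links.values())
    return doc

print(crawl())  # a whole-API map a client team could design against up front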

Because the alternative is the travelling salesman problem.  If I don't know the route to get things done and am making a decision on the quickest way one node at a time, then I'm not really looking at the efficiency of the end-to-end implementation, just the easiest way to do the next step.
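A toy worked example makes the point; the distances are invented, but the pattern is the classic one: picking the cheapest next hop at each step can leave you with an expensive leg at the end that whole-route planning would have avoided.

# Greedy 'next step only' routing vs planning the whole tour. Distances invented.
from itertools import permutations

dist = {
    ("A", "B"): 1, ("A", "C"): 2, ("A", "D"): 4,
    ("B", "C"): 2, ("B", "D"): 3,
    ("C", "D"): 10,
}

def d(x, y):
    return dist.get((x, y), dist.get((y, x)))

def tour_length(order):
    route = ["A"] + list(order) + ["A"]          # round trip starting at A
    return sum(d(a, b) for a, b in zip(route, route[1:]))

def greedy_route():
    """Always take the nearest unvisited city next - the 'easiest next step'."""
    route, remaining = ["A"], {"B", "C", "D"}
    while remaining:
        nxt = min(remaining, key=lambda c: d(route[-1], c))
        route.append(nxt)
        remaining.remove(nxt)
    return route[1:]

print(tour_length(greedy_route()))                                  # 17: cheap steps, expensive tour
print(min(tour_length(p) for p in permutations(["B", "C", "D"])))   # 11: planning end-to-end wins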

This current trend of code centric thinking is retarding enterprise IT and impacting the maintainability of REST (and other) solutions.  This isn't a question of there being a magic new way of working that means design isn't important (there isn't); it's simply a question of design being continually undermined and discarded as an important part of the process.  Both of the scenarios outlined in the article are bad; neither represents good practice.  Choosing whether your manure sandwich is on a roll or a sub doesn't change the quality of the filling.

Think first, design first, publish first... then code.



Monday, July 15, 2013

Minimum on the wire, everything in the record

I've talked before about why large canonical models are a bad idea and how MDM makes SOA, BPM and a whole lot of things easier.  This philosophy of 'minimum on the wire' helps to create more robust infrastructures that don't suffer from a fragile base class problem and better match the local variations that organisations always see.

One of the things I really like about IT, however, is how it keeps throwing up new challenges, and how simple approaches make it easier to address those new challenges.  One of those is the whole Big Data challenge, which I've been looking at quite a bit in the last 12 months, and there is a new philosophy coming out of that which is 'store everything': not 'store everything in a big data warehouse' but 'store everything as raw data in Hadoop'.  There are really three sources of 'everything':
  1. Internal transactional systems
  2. External information and transactional systems
  3. Message-passing systems
So we now have a really interesting situation where you can minimise what is on the wire but store much more information.  By dropping the full MDM history and cross-reference into Hadoop you can say exactly what the information state was at the point in time when a message was passed across the bus.  In other words you have a full audit trail of both the request and the impact that the request had on the system.
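As a sketch of what that could look like (the structures and names here are invented, not a product API): the message on the wire carries only identifiers, while the full MDM state current at that moment is landed raw alongside it.

# Illustrative sketch only. Minimal message on the wire, full context in the raw store.
import json, time

raw_store = []  # stand-in for files landed in Hadoop/HDFS

def publish(message, mdm_snapshot):
    """Put a minimal message on the wire and land the full context raw."""
    envelope = {
        "ts": time.time(),
        "message": message,                 # minimum on the wire: just identifiers
        "mdm_state": mdm_snapshot,          # MDM history/x-ref as it stood right now
    }
    raw_store.append(json.dumps(envelope))  # in reality: append to HDFS, not a list
    return message                          # only the minimal message travels on

msg = {"event": "address_change", "customer_id": "MDM-0042"}
snapshot = {"MDM-0042": {"CRM": "C-9917", "ERP": "100234", "version": 7}}
publish(msg, snapshot)

# Later: what did the world look like when that message was sent?
print(json.loads(raw_store[0])["mdm_state"])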

One of the big advantages of Hadoop is that it doesn't require you to have that big canonical model, and again this is where MDM really kicks in with an advantage.  If you are looking for all transactions from a given customer you just take the system x-ref from MDM as your input, and then you can run a federated Map Reduce routine to get the system-specific information without having to go through a massive data-mapping exercise.
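A minimal sketch of that idea, in plain Python rather than real Hadoop code and with invented data: the x-ref supplies each system's local identifier, the map step tags matching records with the golden ID, and the reduce step gathers them with no canonical model in between.

# Federated 'map reduce' over system-shaped records, keyed by the MDM x-ref.
from collections import defaultdict

# MDM x-ref for one customer: golden ID -> local ID per system (invented data).
xref = {"MDM-0042": {"CRM": "C-9917", "ERP": "100234", "BILLING": "B-771"}}

# Raw, system-shaped records as they might sit in Hadoop (invented data).
raw = {
    "CRM":     [{"cust": "C-9917", "activity": "complaint"}],
    "ERP":     [{"customer_no": "100234", "order": 555}, {"customer_no": "999", "order": 556}],
    "BILLING": [{"acct": "B-771", "invoice": "INV-12"}],
}

id_field = {"CRM": "cust", "ERP": "customer_no", "BILLING": "acct"}  # per-system key names

def map_phase(golden_id):
    """Emit (golden_id, record) for every system record matching the x-ref."""
    for system, local_id in xref[golden_id].items():
        for rec in raw[system]:
            if rec[id_field[system]] == local_id:
                yield golden_id, {"system": system, **rec}

def reduce_phase(pairs):
    """Gather all records per golden ID."""
    out = defaultdict(list)
    for key, rec in pairs:
        out[key].append(rec)
    return dict(out)

print(reduce_phase(map_phase("MDM-0042")))  # every system's transactions for one customer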

Hadoop means there is even less reason to have lots flying about on the wire and even more justification for MDM being at the heart of a decent information approach.

Thursday, July 11, 2013

Google and Yahoo have it easy, or why Hadoop is only part of the story

We hear lots and lots of hype around Hadoop at the moment, and it is a great technology approach, but there is also lots of talk that because Google and Yahoo use it to manage their scale, the same approach is going to win in traditional enterprises and other big data areas.

Let's be clear: I'm not saying Hadoop isn't a good answer for managing large amounts of information; what I'm saying is that Hadoop is only part of the story, and arguably not the most important part.  I'm also saying that Google and Yahoo have a really simple problem they are attempting to fix; in comparison with large-scale enterprises and the industrial internet, they've got it easy.  Sure, they've got volume, but what is the challenge?
  1. Gazillions of URIs and unstructured web pages
  2. Performant search
  3. Serving ads related to that search
I'm putting aside Gmail and Google Apps for a moment as those aren't part of this Hadoop piece, but I'd argue they are, like Amazon, more appropriate reference points for enterprises looking at large scale.

So why do Google and Yahoo have it easy?

First off, while theirs is an unstructured data challenge, it means that data quality isn't a challenge they have to overcome.  If Google serves you up a page when you search for 'Steve Jones' and you see the biology professor, the Sex Pistols guitarist and the Welsh model when you are looking for another Steve Jones, you don't curse Google because it's the wrong person; you just start adding terms to try and find the right one.  If Google slaps the wrong Google+ profile on the results you just sigh and move on.  Google doesn't clean up the content.

Not worrying about data quality is just part of not having to worry about the master data and reference data challenge.  Google and Yahoo don't do any master data or reference data work; they can't, as their data sets are external.  This means they don't have to set up governance boards or change operational processes to take control of data, they don't need to get multiple stakeholders to agree on definitions, and no regulator will call them to account if a search result isn't quite right.

So the first reason they have it easy is that they don't need to get people to agree.

The next reason concerns something Google and Yahoo do know about, and that is performance, but here I'm not talking about search results, I'm talking about transactions: the need to have a confirmed result, boring old things like atomic transactions and, importantly, the need to get an answer back quickly.  Now clearly Google and Yahoo can do the speed part, but they have the wonderful advantage of not having to worry about the whole transactions piece.  Sure, they do email at great scale and they can custom-develop applications to within an inch of their lives... but that isn't the same as getting Siebel, SAP, an old Baan system and three different SOA and EAI technologies working together.  Again there is the governance challenge, and there is the 'not invented here' challenge that you can't ignore.  If SAP doesn't work the way you want... well, you could waste time customising it, but you are better off working with what SAP does instead.

The final reason that Google and Yahoo have it easy is talent and support.  Hadoop is great, but as I've said before companies have a Hadoop Hump problem, and this is completely different to the talent engines at Google and Yahoo.  Both pride themselves on the talent they hire, and that is great, but they also pay top whack and have interesting work to keep people engaged.  Enterprises just don't have that luxury, or more simply they don't see the value in hiring stellar developers and then having those stellar developers work in support.  When you are continually tuning and improving apps, as Google does, that makes sense; when you tend to deliver into production and hand over to a support team, it makes much less sense.

So there are certainly things that enterprises can learn from Google and Yahoo, but it isn't true to say that all enterprises will go the same way; enterprises have different challenges, and some of them are arguably significantly harder than system performance because they impact culture.  Hadoop is great, it's a good tool, but just because Google and Yahoo use it doesn't mean enterprises will adopt it in the same way, or indeed that the model taken by Google and Yahoo is appropriate.  We've already seen NoSQL become more SQL in the Hadoop world, and we'll continue to see more and more shifts away from the 'pure' Hadoop Map Reduce vision as enterprises leverage the economies of scale but do so to solve a different set of challenges and, crucially, a different culture and value network.

Google and Yahoo are greenfield companies, built from the ground up by IT folks.  They have it easy in comparison to the folks trying to marshal 20 business divisions, each with their own sales and manufacturing people, 40 ERPs and 100+ other systems badly connected around the world.