Wednesday, January 29, 2014

There is no Big Data without Fast Data

Which came first, Big Data or Fast Data?  If you go by the hype you'd think Hadoop and Big Data came first, with in-memory and fast following after.  The reality though is the other way around, and it comes down to a simple question:
Where do you think all that Big Data came from?
When you look around at the massive Big Data sources out there (Facebook, Twitter, sensor data, clickstream analysis etc) they don't create the data in massive systolic thumps.  They create it in lots and lots of little bits, a constant stream of information.  In other words it's fast data that creates big data.  The point is that historically, with these fast data sources, there was only one way to go: process it in real-time and just drop most of the data.   Take a RADAR for instance: it has a constant stream of analogue information coming in (unstructured data) which is processed, in real-time, and converted into plots and tracks.  These are then passed on to a RADAR display which shows them.

At no stage is this data 'at rest'; it's a continuous stream of information with the processing being done in real-time.  Very complex analytics, in fact, being done in real time.
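To make that concrete, here's a minimal sketch (the names and the moving-average 'track' are my own illustration, not a real RADAR algorithm): small events arrive one at a time, each is processed the moment it lands, and the 'big' dataset is nothing more than what that fast stream accumulates over time.

```python
import random
import time
from collections import deque

def sensor_stream():
    """Hypothetical fast-data source: an endless stream of small readings,
    never one big batch."""
    while True:
        yield {"ts": time.time(), "value": random.gauss(0.0, 1.0)}

def process(event, window):
    """Real-time step: keep a short sliding window and derive a 'track'
    (here just a moving average) before the data is ever 'at rest'."""
    window.append(event["value"])
    return sum(window) / len(window)

archive = []                    # this is where the 'Big Data' accumulates
window = deque(maxlen=100)      # the 'Fast' part only ever sees a small window

for i, event in enumerate(sensor_stream()):
    track = process(event, window)   # analytics happen as the event arrives
    archive.append(event)            # the stream is what builds the big store
    if i >= 10_000:                  # stop the demo after 10k events
        break

print(f"{len(archive)} small events -> one big dataset")
```

The old approach was to keep only `track` and drop the raw events; the Big Data shift is simply that `archive` is now cheap enough to keep.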

So why so much more hype around Big Data?  Well I've got a theory.  It's the sort of theory that explains why Oracle has TimesTen (an in-memory database type of thing) and Coherence (an in-memory database type of thing) and talks about them in two very different ways.  On the middleware side it's Coherence, and the talk is about distributed fast access to information, processing those events and making decisions.  TimesTen sits in the Data camp, so it's about really fast analytics... what you did on disk before you now do in memory.

The point is that these two worlds are collapsing, and the historical difference between a middleware-centric in-memory 'fast' processing solution and a new generation in-memory analytical solution is going to go away.  This means that data guys have to get used to really fast.  What do I mean by that?

Well, I used to work in 'real real-time', which doesn't always mean 'super fast'; it just means 'within a defined time ... but that defined time is normally pretty damned fast'.  I've also worked in what people in standard business consider these days to be real-time: sub-microsecond response times.  But that isn't the same for data folks; sometimes real-time means 'in an hour', 'in 15 minutes' or 'in 30 seconds', but rarely does it mean 'before the transaction completes'.

Fast Data is what gives us Big Data, and in the end it's going to be the ability to handle both the new statistical analytics of Big and the real-time adaptation of Fast that will differentiate businesses.  This presents a new challenge to us in IT, as it means we need to break down the barriers between the data guys and the middleware guys, and we need new approaches to architecture that do not force a separation between the analytical 'Big' and the reactive 'Fast' worlds.
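As a rough sketch of what 'not forcing a separation' might look like (the event shape, the threshold and the alert are all hypothetical), the same in-memory event handling can feed both worlds at once: a reactive decision taken per event, and the analytical store built from exactly the same stream.

```python
from collections import defaultdict

# Hypothetical unified pipeline: every event is handled once, in-memory,
# and feeds both the reactive 'Fast' path and the analytical 'Big' path.

analytical_store = []              # 'Big': everything kept for statistical work
counts_by_user = defaultdict(int)  # 'Fast': state used to react immediately

def alert(event):
    print(f"real-time decision for user {event['user']}")

def on_event(event):
    # Reactive path: decide before the transaction completes
    counts_by_user[event["user"]] += 1
    if counts_by_user[event["user"]] > 3:
        alert(event)

    # Analytical path: the same event lands in the big store
    analytical_store.append(event)

# Simulated clickstream: lots of little bits, no batch boundary
for i in range(10):
    on_event({"user": f"u{i % 3}", "action": "click"})

print(f"analytical store now holds {len(analytical_store)} events")
```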
