Big Data – My Thoughts

Big Data is changing the way management makes decisions. No doubt. Senior management across organizations is rightly emphasizing Evidence-Based Management (EBM). Businesses, entrepreneurs and individuals are jumping onto the bandwagon for their share of the Big Data pie. Big Data is bringing businesses closer to customers and is greatly enhancing the customer's interaction experience. Agreed. But let us pause to think: at what cost, and with what trade-offs? All this hoopla around Big Data may be more than it is worth, and we should definitely give it a second thought before hopping on.

Big Data, as we all know, is characterized by the three Vs: Volume, Variety and Velocity. Volume brings the challenges of the sheer amount of data available (growing at ~2 exabytes per day!) and of storing and retrieving it; the other two present challenges of their own. Variety, despite being the spice of life, complicates things here. Dividing data into structured and unstructured gives a simple classification formula, but it also introduces ambiguity about which kind of data goes into which bucket. Documents, spreadsheets, web pages, chats, emails, photos, videos and sensor data are some of the different sources that present challenges in classification, representation and, importantly, semantic interpretation. Velocity is quite high: the rate at which data is generated and updated is mind-boggling. Facebook, LinkedIn and other social sites process petabytes of data and collect terabytes of data every day. Factoring in two more Vs, Veracity and Value, is important if one wants a panoramic view of the challenges. The veracity of the data is questionable because of the quality of the data input and the interpretation applied when the data is classified and processed. The value of the data is in question too: what is the data really worth in the context in which it is used?

Isn’t Big Data just the next level of data analytics, wherein we churn or mine data and come up with meaningful trends? Under the name of analytical reporting, we have been trying to make sense of swathes of data. In fact, businesses today are collecting more data than they know what to do with. Have we not been doing data analysis and reporting for years now? Take our own bodies, for instance: every time I respond to a particular scenario, isn’t my brain mining the experiences and knowledge I have acquired to date and proposing an appropriate response (fight, flight or surrender, as the case may be)?

A Big problem behind the façade!

I read about supermarkets, banks and other institutions with a significant level of customer interaction monitoring the purchases, visits and transactions of their customers. They use this information to ‘predict’ customer behavior and try to create the ideal customer experience. Big Data, to me, sounds like Big Brother, and I have this uneasy feeling that I am being watched and monitored and that my actions are being recorded: every visit and every transaction. At this rate, this is going to lead to a ‘constructed reality’ as in ‘The Truman Show’, with routine taking over the dynamism that we enjoy. And privacy isn’t really the biggest concern.

How prepared are we for Big Data? Looking beyond the success stories extolling the virtues of Big Data, we have quite a number of candidates on the wrong side. Mitt Romney’s fiasco on Election Day, where a breakdown in the ‘Orca’ voter identification system cost him dearly in the race for the top spot, is a classic example. This and the associated statistics go to show that having the data is not nearly enough. In fact, we need:

  • Readily accessible data of the scale one can call “big”
  • Analytical system with a rules engine that can sift through the vast data store and pull meaningful data
  • Availability of ‘Data Scientists’ who are the ‘brains’ behind the rules
  • High performance infrastructure in tune with information processing requirements
  • Contextual interpretation of data
  • Security standards and clear limits on data storage and sharing
  • Cultural shift that encourages de-siloing the data pockets and pooling the information to come up with organization-level recommendations and changes
  • Management capable of making data-driven decisions which may contradict their gut instincts fuelled by years of work experience
  • And most importantly, reliable sources of data

Why are we worried about the reliability of data? Data gets to be big as a result of consolidation from various sources, depending on the industry using it: data from sensors, data from assembly line systems, data from online browsing sequences, data from e-mails and social networks, to name a few. If the sensors or the systems are incorrectly calibrated, we have a wrong decision on our hands. A simple mismatch between metric and imperial units cost NASA its $125 million Mars Climate Orbiter. Garbage In, Garbage Out (GIGO) is a very likely scenario here, and one well worth avoiding. Social networks are more representative of the younger generation who actually voice their thoughts. Is that a worthy representation on which to base a decision that affects the entire populace? A data source with a pronounced bias towards an issue or a problem will come to represent the overall ‘consensus’. Doesn’t data become susceptible to manipulation, which organizations or individuals can use to their favor? After all, who can verify the claims without putting in a significant amount of effort?
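To make the GIGO point concrete, here is a minimal Python sketch of how a unit mismatch can slip through when readings from different sources are pooled. Everything in it is made up for illustration: the `Impulse` class, the unit labels and the two readings are hypothetical, and this is in no way NASA's actual software. Only the conversion factor between pound-force seconds and newton-seconds is a real constant.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A reading that carries its unit along with its value (hypothetical type)."""
    value: float
    unit: str  # either "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        # Normalize to a single unit; refuse anything we do not recognize.
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown unit: {self.unit!r}")


def total_impulse(readings):
    """Sum readings only after converting each one to newton-seconds."""
    return sum(r.to_newton_seconds() for r in readings)


# Two made-up readings from two sources that use different unit systems.
readings = [Impulse(10.0, "N*s"), Impulse(10.0, "lbf*s")]

naive_total = sum(r.value for r in readings)  # ignores units: garbage in
checked_total = total_impulse(readings)       # converts before adding

print(f"naive total:   {naive_total:.2f}")
print(f"checked total: {checked_total:.2f} N*s")
```

The naive sum looks perfectly plausible, which is exactly the problem: nothing downstream can tell that it is wrong. Carrying the unit with each value and rejecting unknown units is one simple way to fail loudly instead of silently.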

Ash Mahmud, ex-head of CRM at Groupon, has correctly remarked: “A business running without accurate data is running blind”.

Security of the collected data will remain a bugbear. Recent incidents, such as the breach at Target in which the records of 70+ million customers were stolen and which led to a sales decline of about 2.5% across its stores, will remind us time and again of the aftershocks of security compromises.

Also, today we all want to use Big Data, but the techniques necessary for extracting useful data are rarely taught, even in statistics courses. In fact, in a survey conducted by SAS of around 750 senior executives from different sectors, 41% of the respondents said that they did not have the right skills in the organization to process data. Processing the available data has been touted as the most difficult aspect of a Big Data project. No wonder that a significant amount of collected data remains unused. So we should be prepared for a lot of new entrants masquerading as ‘data specialists’, who will promise ‘insight’ into the way you work that is very likely to be far from the truth.

Even if we manage to get all the prerequisites in place, unless we change the decision-making culture of our organizations, we will only be spending a lot more on a technology we don’t fully utilize. Companies need to switch to data-driven decision making, which is easier said than done amidst the rigid process-driven models we currently employ. Currently, most important decisions are taken by people high up in the organization or by expensive outsiders, who decide based on their experience and intuition even when these conflict with conclusions drawn from data.

The biggest misconception that has arisen with the advent of Big Data is the belief that our reliance on vision and on an understanding of human behavior is going to diminish. The truth, however, is that we will still need business leaders to spot opportunities and ask the right questions of the data. The successful companies of the future will be only those that are able to combine the two.

In a nutshell, there are still several challenges we need to overcome in order to harness the power of Big Data. So I’d be skeptical about joining the bandwagon, and I hope I have given you enough reasons to take a second look. Big Data could very well take the route of the ERPs of the past: Pyrrhic victories before getting it right.

[Source: My whitepaper that I wrote almost a decade ago – still relevant!]