Wednesday 19 December 2012

The “BIG” tripwire and the “half-life” of data


Over my working career a number of prefixes have been hyped by consultants and suppliers as the next big thing. In the 80s it seemed that everything that was good was prefixed with “global”. In the 90s the prefix was “hyper”, and in the 00s we just added “e”. There was, and still is, some merit in these amplifier words, but most have proved to be double-edged. They are still used, but more judiciously.

It seems that the prefix for the 10s may well be “big”. We have heard that scale is good and that some banks have been viewed more favourably post-crisis when it was judged that they were too “big” to fail. More accurately, they are probably too big to be allowed to fail, as the consequences would be unthinkable.

We also hear a lot about “big data”, or should that be “BIIIIIIIIIIIIG data”? This seems to be the latest thing the IT industry is telling us we should be thinking about and turning into value for our businesses. To my cynical eye it does not seem to be anything new, but rather a marketing tag line.

I was reflecting on this word “big” and find that I am more inclined to use it of banks that are too big to run or too big to fix. HSBC is far from alone, but its much publicised fine for money laundering comes to mind. From listening to knowledgeable radio commentators, and a little personal insight, HSBC appears to have a few issues.

HSBC by its own claim had a good crisis, mainly because it was far more fragmented and less joined up than other banks of similar size and reach. Its model for growth up to the crisis could be typified as buying distressed banks in many countries, closing them on Friday under one name and reopening them on Monday under the HSBC name, logo and letterhead, with a high-level management layer but limited infrastructural integration. In the intervening years there have been limited attempts to effect further integration. This fragmented nature has been typified by the enormous number of “Global heads of this or that”; roles that were often built around an individual rather than on sound organisational design.

Reading about the money-laundering trouble, it seems HSBC struggled to embed the UK ethics of know your customer (KYC) and anti-money laundering (AML) in Mexico, and that raises the hairs on the back of my neck. I do feel a good deal of sympathy with those currently charged with standardising and ensuring proper KYC and AML across something like 89 (from recollection) countries and cultures. Is this actually too big to fix? And is HSBC alone? Personally I suspect the answers are “Yes” and “No” respectively.

My other thoughts about “big” relate to the data question and start almost 20 years ago. That was when I accepted the challenge to re-engineer and build a global reference data function for an investment bank. The organisational and process architects had done their stuff in some rarefied atmosphere and decided that, in order to give the single investment bank back office a high “straight through processing” (STP) capability, the reference data function needed to have available a complete universe of relevant tradable (note tradable, not just traded) instruments, maintained at 100% accuracy and available immediately. That was my initial brief.

It soon became apparent that the cost of sourcing a complete tradable universe, or at least as much of it as was available, would be extortionate. Adding the extra data for new issues and gaps would be substantial, and the process for maintaining it all at 100% accuracy would be horrendous. And that was all before building the capability to distribute it to the various systems that would need it… immediately. This was nonsense when in truth our traders and brokers only traded a small, moving percentage of the instrument universe. I use the word moving because the percentage might stay the same but the components could and would change.

In resolving this I led a complete rethink about the approach to be taken, the processes required and the management of the service to end-users. It will be no surprise that we did not provide “big” data in the end, but rather what was needed in a quick and efficient way. The business could not have carried the cost implicit in the original design.

I have seen similar, more recent attempts to supply this nirvana of instrument data stumble on many of the same issues. In these cases, even twenty years on, “big” is probably still too big.

The cost of maintaining big data crossed my mind again the other day when I was listening to a talk by Gerry Pennell, CIO of the 2012 London Olympics. He was asked if LOCOG had done anything with the “big data” it had collected on all the visitors to the games and users of their applications and websites. I forget the actual numbers, but he quoted the data collected and stored during the games in petabytes (1 petabyte = 1,000 terabytes, and so on up the scale). His answer was “no”, primarily because the duration of the games was so short and the focus was on the smooth operation of core systems.

What he did say was that there were efforts to leverage the ticketing data with other sporting bodies in order to target interested parties for future events. But here we encounter the issue of data degradation, i.e. data becomes less accurate and less useful over time. This is not because some computer gremlin goes in and changes things, but because people change: they move address, they change their phone number, they get married, they even die. As time passes, more of the data you collected at a point in time loses its value.

In science there is the concept of the half-life of a radioactive element: the time it takes for half of the starting material to decay. I am sure there is, or should be, a similar concept for data. The measure of a half, or 50%, is probably too high, but it would be interesting to know or estimate how long it takes before 10% of the data collected at a point in time is inaccurate. Is it a year? Or ten years? That would change my perception of the data and my ambition for it.
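As a back-of-the-envelope sketch of my own (the five-year half-life is an invented figure, and I am assuming the decay is exponential like radioactivity, which real contact data may well not be), the arithmetic would look something like this:

import math

# Fraction of records still accurate after t years, assuming each record
# decays independently with a fixed half-life (a big assumption):
#   accurate(t) = 0.5 ** (t / half_life_years)
# Solving 1 - fraction_stale = 0.5 ** (t / half_life_years) for t gives:

def years_until_stale(fraction_stale, half_life_years):
    return half_life_years * math.log(1.0 - fraction_stale) / math.log(0.5)

# If contact data had, say, a five-year half-life (purely illustrative),
# 10% of it would already be wrong well inside the first year:
print(years_until_stale(0.10, 5))   # roughly 0.76 years, about nine months

On those purely illustrative assumptions the answer comes out in months rather than years, which rather reinforces the point about tempering one’s ambition for the data.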

I have living evidence of this in my wife’s work. She is employed by a company that supplies mailing lists of people with commercial real estate for sale or to let. Clients buy a set of details that meet criteria of location, size and function, but more than that, the contacts are guaranteed to be good: someone like my wife calls up every name on the initial list to confirm, correct or delete every name, address and email. Only this way is the data of real value.

So what? My advice is that when you feel you might be seduced by anything prefixed in marketing speak with the term “BIG”, think hard about who is promoting it and why, then be very clear whether it is really in your interest to embrace the idea, or whether you might better achieve the same result some other way.


If something I have said has made you think, made you angry or simply left you confused, please leave a comment and let me know.