Big Data and its Challenges

As the world moves digitally towards an immense volume of data, businesses are constantly looking for feasible and practical ways to analyze it so that this flood of information can be used meaningfully for growth and development. Data is being collected at an unprecedented pace, and it comes from a gamut of sources, available as soon as it is generated. Big Data is a broad term for initiatives and technologies that involve massive, diverse and continuously changing data. It can change the way organizations do business, gain insight, deal with their customers and make decisions by offering synergy with, and an extension to, existing processes. Big data is also changing the way businesses approach product development, human resources and operations. It is touching every aspect of society, including retail, mobile services, life sciences, financial services and physical sciences. It can be touted as the biggest opportunity, as well as the biggest challenge, for the statistical sciences, because if the numbers are crunched accurately, Big Data can offer huge rewards.

Companies may know the types of results they are seeking, but these might be difficult to obtain, or significant data mining might be required to obtain specific answers. For statisticians, the challenge is dealing with data that is not only big but also very different. They need to deal with the “look-everywhere effect” and extract meaningful information from a huge haystack of data. Additionally, there are challenges with the algorithms, as they often do not scale up as expected and can get extremely slow when gigabyte-scale datasets are involved. To improve speed and theoretical accuracy, these algorithms need to be improved, or new algorithms need to be designed. Such algorithms must be capable of handling next-generation functional data and should be able to look through data for hidden relationships and patterns.

Another challenge is the analysis of too many correlations, several of which can be bogus even though they appear statistically significant, and the sheer magnitude of big data can amplify such errors. Additionally, big data is quite efficient at detecting subtle correlations; however, it is left to the judgment of the user to decide which correlations are meaningful, and this may not always be an easy task. Statistical analysis cannot be a wholesale replacement for scientific inquiry, and users must start with a basic understanding of the data. Also, once users understand how a big data system works, it can easily be gamed. A good example is “spamdexing” or “Google bombing”, where companies artificially elevate a website’s search placement. At times, the results of the analysis may not be intentionally gamed, but they can still be less robust than expected. Most big data comes from the web, which is itself big data, and this increases the chances of reinforcing errors.
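The point about bogus correlations can be made concrete with a small simulation. The sketch below is only an illustration, not a method from any particular study: it generates columns of pure random noise and counts how many pairs would still pass a naive 5% significance check. The function names `pearson` and `count_spurious`, the sample size of 50, and the rough cut-off of 1.96 divided by the square root of the sample size are all assumptions chosen for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_features, n_samples=50, seed=0):
    """Count pairs of pure-noise columns that pass a naive significance cut.

    Assumption: |r| > 1.96 / sqrt(n_samples) is used as a rough stand-in
    for a two-sided 5% test of "no correlation".
    """
    rng = random.Random(seed)
    # Every column is independent Gaussian noise, so any "significant"
    # correlation found below is spurious by construction.
    data = [[rng.gauss(0, 1) for _ in range(n_samples)]
            for _ in range(n_features)]
    threshold = 1.96 / math.sqrt(n_samples)
    pairs = 0
    hits = 0
    for i in range(n_features):
        for j in range(i + 1, n_features):
            pairs += 1
            if abs(pearson(data[i], data[j])) > threshold:
                hits += 1
    return pairs, hits


if __name__ == "__main__":
    for k in (5, 20, 80):
        pairs, hits = count_spurious(k)
        print(f"{k} noise columns: {hits} of {pairs} pairs look 'significant'")
```

Because the number of pairs grows roughly with the square of the number of columns, the count of false alarms grows with it even though the false-alarm rate per pair stays about the same. That is the sense in which the size of the data amplifies the error.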

Undoubtedly, big data is a valuable tool, and it has made a critical impact in a select few realms. However, while it has proved its worth in analysing common things, it falls short in the analysis of less commonly used information and does not live up to the perceived hype. Big data is here to stay; however, it is not a silver bullet, and we need to be realistic about its potential and limitations.

Bloodless Revolution: Data Mass Transforming the Masses

A cell phone in your pocket, a tab in your backpack, and an IT system in your back office, and off you are to unleash your own little revolution.

Data Revolution is the way ahead. Read it again. And you will discover a Data-driven revolution is actually a double whammy.

In the world of Version 1.0 and 2.0, data defined the world and knocked on our doorsteps with silos of info. But with the advent of social media and Version 3.0, all those tons of data started to get knocked around, here, there and everywhere. That’s how data got smaller and smaller as it got bigger and bigger, with giant servers serving up all the data to all the people, and all in 140 characters. (You can go ahead and tweet this gyan.) That’s the revolution, which is happening not once but in countless ways, and in ways we cannot even fathom.

Data revolution? What’s the top-of-the-mind recall it gives? The Arab Spring in, of all places, Egypt was shaped largely by social media, so much so that it is also described as a Twitter Revolution. The Arab Spring shook up the world’s richest sheiks through three million tweets and thousands of blog posts and YouTube videos.

Take a dipstick poll anywhere, any time among any people about the likeliest place where a bloodless revolution fueled by data could happen, and you will find that the Middle East would hardly find a mention in the poll. Right? Wrong. The Iranian Revolution for democracy is attributed to the Twitterati data warriors. It has even inspired calls to nominate Twitter for the Nobel Peace Prize.

For the pink press the Data Revolution is all about remaking big business, but that is the only visible part of the shakeup. What is invisible – or lesser known – is how it is transforming ordinary lives of ordinary people.

The Encyclopaedia Britannica, the Bible of all Data, started eons back in 1768, is now history, felled by the revolutionary power of data. The PC era first pushed it to the precipice, and then the CD-ROM and an endless stream of me-too, data-devouring Wikipedias sent it to rest.

Who started The Huffington Post? Not Rupert Murdoch, but citizen journalists. What about Linux and Asterisk? They are powered by, well, collaborative communities. How these little unknowns morphed into a kind of Big Boys Redux is what the data revolution is all about.

This data revolution thing may be bloodless but is still scary – not so much for us, but for the big boys actually. If an iconic 250-year-old company can be sent packing by an upstart, imagine the mind-boggling extent to which the playing field has been leveled.

You can’t get more disruptive than this. Or maybe, the only way to get more disruptive is to get less disruptive…

Big data is not only disrupting our lives by the hour, but it is also redefining the rules of engagement globally and locally. Businesses don’t have to play by rules writ in stone anymore. Example: Facebook, which has one in ten people the world over logged in to it, has no website of its own!

It might seem silly at first cut, but the original revolutionary, Karl Marx, may well have embraced this data-driven revolution as his own playbook. A cell phone in your pocket, a tab in your backpack, and an IT system in your back office, and off you are to unleash your own little revolution.

Looking ahead, we may have to find new ways to grapple with this data overload. Indigestion can afflict the mind too. While the walls between us have collapsed and have been replaced by “Windows” through which data can flit in and out, we have to run with what we have. In theory all this data is only making it that much more difficult to make sense of all this change, but it could still work in practice. As one anonymous wise man has said, “Wikipedia only works in practice. In theory, it can never work.”