Big Data and Data Visualization

In recent years, there has been a dramatic rise in unstructured data from sources such as social media, videos and photos, and businesses are looking for relationships between data that can be viewed from multiple perspectives. This evolution in the way data is produced, processed and analysed is bringing drastic changes to the world around us.

Big data is a term describing large volumes of structured and unstructured data that can be analysed to gain business insights. According to Gartner, big data is a high-volume, high-velocity and high-variety information asset that demands cost-effective, innovative forms of information processing for enhanced insight and decision making. In simpler terms, big data is lots of data produced rapidly in many different forms. This rapidly growing data could be related to online videos, customer transactional histories, social media interactions, traffic logs, cell phones, flip computers, tablets, cloud computing, the Internet of Things, sensors and so on, and global traffic is expected to reach more than 100 trillion gigabytes by 2025. Here is a hint of what happens in approximately one minute on the internet, and the generated data continues to grow exponentially:

This huge volume of data needs to be parsed to discover useful threads that can uncover endless opportunities and can be combined with innovative ideas to decrease costs, improve overall customer satisfaction, increase revenue, and provide customer-tailored offerings. The data requires quick analysis, and the information must be displayed in a meaningful way. It can be analysed for time reductions, cost reductions, smart decision making, optimized offerings or new product development.

Big Data focuses on finding hidden trends, threads or patterns that might not be immediately or easily visible. The interpretations bring out insights that would otherwise be impossible to observe using traditional methods. This requires the latest technologies and skill sets to analyse the flow of information and draw results and conclusions. High-powered analytics enable businesses to determine root causes of issues, defects and failures in real time, recalculate complete risk portfolios in just minutes, detect fraud, and so on. NASA, the U.S. Government, and organisations like Wal-Mart and Amazon are using Big Data to recognize possibilities that can help them capitalize on the gains.

However, this huge volume of rapidly generated big data cannot be handled using traditional reporting processes. To reap maximum benefits, data analytics needs to be done in real time instead of through batch processing, which fails to capture big data’s immediacy. Another challenge in handling big data is the increased availability of mobile devices. This requires decentralization of reports and adoption of a cost-effective, faster and more democratized business intelligence model to improve collaboration and speed up insights.
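
The contrast between batch and real-time processing can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `RunningStats` class, the `batch_mean` helper and the transaction amounts are hypothetical and do not come from any particular analytics product. The batch function needs the complete data set before it can answer, while the running statistics are updated as each event arrives.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean and maximum that are updated one event at a time."""
    count: int = 0
    mean: float = 0.0
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        self.count += 1
        # Incremental mean: earlier events do not need to be stored.
        self.mean += (value - self.mean) / self.count
        self.maximum = max(self.maximum, value)


def batch_mean(values):
    """Batch style: the answer is available only once all data has arrived."""
    return sum(values) / len(values)


# Hypothetical stream of transaction amounts (illustrative values only).
events = [12.0, 7.5, 30.0, 18.5]

stats = RunningStats()
for amount in events:
    stats.update(amount)
    # An up-to-date summary is available after every single event.
    print(f"after {stats.count} events: mean={stats.mean:.2f}, max={stats.maximum:.2f}")
```

Once the stream ends, both approaches agree on the mean. The difference is that the streaming version could already have been queried at any point along the way.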

Data Visualization Tools

To make sense of boring raw data and observe interesting patterns, organisations use visualization tools that help them visualize all their data in minutes. Data visualization places data in a visual context, revealing trends, patterns and correlations that might go undetected if the data were presented only as text, which helps organisations understand its significance. This visual material can help companies eliminate loss-making products and increase revenue by minimizing waste. Data visualization can help identify areas that require attention or improvement, help understand product placement, clarify factors influencing customer behaviour and predict sales volume.

Some of these tools are aimed at developers and require coding, while others are software products that do not require coding. Here are some of the commonly used data visualization tools:

1. D3.js (Data-Driven Documents) uses CSS, HTML and SVG to render diagrams and charts. The tool is open source, looks good, is packed with helpful features and is rich in interactivity.
2. FusionCharts has an exhaustive collection of maps (965) and charts (90) that work across all platforms and devices, and it supports browsers starting from IE6. It supports XML and JSON data formats, and can export charts in JPEG, PNG and PDF. For inspiration, there is a good collection of live demos and business dashboards. Although the tool is somewhat expensive, it has beautiful interactions and is highly customizable.
3. Chart.js is an open-source library that supports bar, line, polar, pie, radar and doughnut chart types. The tool is good for smaller hobby projects.
4. Highcharts offers a good range of maps and charts right out of the box. It also offers a separate, feature-rich package called Highstock for stock charts. The tool is free for personal and non-commercial use, and users can export charts in JPG, PNG, PDF and SVG formats.
5. Google Charts can render charts in SVG/HTML5. It offers cross-browser compatibility and cross-platform portability to Android devices and iPhones.
6. Datawrapper is commonly used by non-developers to make interactive charts. The tool is easy to use and can generate effective graphics.
7. Tableau Public is one of the most commonly used visualization tools, as it supports a variety of maps, graphs, charts and other graphics. The tool is free and can be easily embedded in any webpage.
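
Several of the tools listed above render charts as SVG. The minimal Python sketch below shows that idea in its simplest form: turning a handful of numbers into SVG rectangles. It is not how any of the listed libraries is implemented. The function name `bar_chart_svg`, its parameters and the quarterly sales figures are hypothetical, and the sketch assumes all values are positive.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data, bar_width=40, gap=10, height=120):
    """Return an SVG document (as a string) with one bar per (label, value) pair.

    Assumes every value is positive; the tallest bar fills the full height.
    """
    largest = max(value for _, value in data)
    label_space = 20  # room under the bars for the text labels
    width = len(data) * (bar_width + gap) + gap
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height + label_space}">'
    ]
    for index, (label, value) in enumerate(data):
        # Scale each value relative to the largest one.
        bar_height = round(height * value / largest)
        x = gap + index * (bar_width + gap)
        y = height - bar_height  # SVG's y axis points downwards
        parts.append(
            f'<rect x="{x}" y="{y}" width="{bar_width}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="{x + bar_width / 2}" y="{height + 15}" '
            f'text-anchor="middle" font-size="10">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical quarterly sales figures (illustrative values only).
sales = [("Q1", 120), ("Q2", 180), ("Q3", 90), ("Q4", 240)]
svg = bar_chart_svg(sales)
print(svg)
```

Saving the returned string to a file with an `.svg` extension and opening it in a browser should display the chart. The dedicated tools add the axes, interactivity and export options that this sketch leaves out.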

Raw, Timeline JS, Infogram, plotly, and ChartBlocks are some additional data visualization tools. Excel, CSV/JSON, Google Chart API, Flot, Raphael, and D3 are some of the entry-level tools that are good for quickly exploring data or creating visualizations for internal use.

At the other end of the spectrum, there are professional data visualization tools that have expensive subscriptions. There are a few free alternatives as well, with strong communities and support. Some of these tools include R, Weka, and Gephi.

These visualization tools focus on the front end of big data, enabling businesses to explore the information and gain a deeper understanding by interacting directly with the data. Apache Hadoop, on the other hand, is open-source software associated with Big Data that supports back-end concerns such as processing and storage. Hadoop is available in several distributions, from vendors such as MapR, Hortonworks, Cloudera and Amazon. Google BigQuery is a cloud-based service.

Businesses seek the most cost-effective ways to increase profitability by managing the volume, velocity and variety of data and turning that data into valuable information to better understand their business, customers and marketplace. However, volume, velocity and variety are no longer sufficient to describe the challenges of big data, so more terms such as variability, veracity, value and visualization have been added, broadening the scope of big data. Big Data is exploding with innovative approaches and forward thinking, and organisations can exploit this opportunity to gain market advantage and increase profitability.

Big Data and its Challenges

As the world progresses digitally towards immense amounts of data, businesses are constantly looking for a feasible and practical way to analyze the information so that this flood of data can be used in a meaningful manner for growth and development. Data is being collected at an unprecedented pace, and it is coming from a whole gamut of sources, available as soon as it is generated. Big Data is a broad term covering initiatives and technologies that involve massive, diverse and continuously changing data. It can change the way organizations do business, gain insight, deal with their customers and make decisions by offering a synergy with, and an extension to, existing processes. Big data is also changing the way businesses approach product development, human resources and operations. It is touching every aspect of society, including retail, mobile services, life sciences, financial services and physical sciences. It can be touted as the biggest opportunity, as well as the biggest challenge, for the statistical sciences, because if the numbers are crunched accurately, Big Data can offer huge rewards.

Companies may know the types of results they are seeking, but these might be difficult to obtain, or significant data mining might be required to obtain specific answers. For statisticians, the challenge is dealing with data that is not only big but also very different. They need to deal with the “look-everywhere effect” and extract meaningful information from a huge haystack of data. Additionally, there are challenges with the algorithms, as they often do not scale up as expected and can get extremely slow when gigabyte-scale datasets are involved. To improve speed and theoretical accuracy, these algorithms need to be improved, or new algorithms need to be designed. The algorithms must be capable of handling next-generation functional data and should be able to look through data for hidden relationships and patterns.

Another challenge is the analysis of too many correlations, several of which may be bogus yet appear statistically significant, and the magnitude of big data can amplify such errors. Additionally, big data is quite efficient at detecting subtle correlations; however, it is left to the user to decide which correlations are meaningful, and this may not always be an easy task. Statistical analysis cannot be a wholesale replacement for scientific inquiry, and users must start with a basic understanding of the data. Also, once users gain an understanding of big data, it can easily be gamed. Good examples are “spamdexing” and “Google bombing”, where companies artificially elevate a website’s search placement. At times, the results of the analysis may not be intentionally gamed, but they can be less robust than expected. Most big data comes from the web, which is itself big data, and this increases the chances of reinforcing errors.
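
The point about bogus correlations can be illustrated with a small simulation. The Python sketch below is an illustration under stated assumptions, not a statistical procedure taken from any particular source: the helper names, the sample size, the threshold of 0.4 and the random seed are all hypothetical choices. Every series is pure random noise, so any strong correlation between two of them is spurious by construction. As the number of variables grows, the number of pairs grows much faster, and so does the number of pairs that happen to look correlated.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_strong_pairs(num_variables, num_samples=30, threshold=0.4, seed=0):
    """Generate unrelated random series and count pairs with |r| > threshold.

    Returns (number of strongly correlated pairs, total number of pairs).
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(num_samples)]
        for _ in range(num_variables)
    ]
    strong = sum(
        1 for a, b in combinations(series, 2) if abs(pearson(a, b)) > threshold
    )
    total = num_variables * (num_variables - 1) // 2
    return strong, total


for num_variables in (10, 50, 200):
    strong, total = count_strong_pairs(num_variables)
    print(f"{num_variables} variables: {strong} of {total} pairs exceed the threshold")
```

None of the pairs reported by this sketch reflects a real relationship, which is why a correlation found by scanning a very large data set needs to be checked against an understanding of the data before it is trusted.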

Undoubtedly, big data is a valuable tool, and it has made a critical impact in a select few realms. However, it has proved its worth mainly in analysing common things; it falls short in the analysis of less commonly used information and has not lived up to the perceived hype. Big data is likely here to stay; however, it is not a silver bullet, and we need to be realistic about its potential and limitations.

Bloodless Revolution: Data Mass transforming the Masses

A cell phone in your pocket, a tab in your backpack, and an IT system in your back office, and off you are to unleash your own little revolution.

Data Revolution is the way ahead. Read it again. And you will discover a Data-driven revolution is actually a double whammy.

In the world of Version 1.0 and 2.0, data defined the world and knocked on our doorsteps with silos of info. But with the advent of social media and Version 3.0, all those tons of data started to get knocked around, here, there and everywhere. That’s how data got smaller and smaller as it got bigger and bigger, with giant servers serving up all the data to all the people, and all in 140 characters. (You can go ahead and tweet this gyan.) That’s the revolution which is happening not once but in almost countless ways, and in ways we cannot even fathom.

Data revolution? What’s the top-of-mind recall it gives? The Arab Spring in, of all places, Egypt was shaped largely by social media, so much so that it is also described as a Twitter Revolution. The Arab Spring shook up the world’s richest sheiks through three million tweets and thousands of blog posts and YouTube videos.

Take a dipstick poll anywhere, any time among any people about the likeliest place where a bloodless revolution fueled by data could happen, and you will find that the Middle East would hardly find a mention in the poll. Right? Wrong. The Iranian Revolution for democracy is attributed to the Twitterati data warriors. It has even inspired calls to nominate Twitter for the Nobel Peace Prize.

For the pink press the Data Revolution is all about remaking big business, but that is the only visible part of the shakeup. What is invisible – or lesser known – is how it is transforming ordinary lives of ordinary people.

The Encyclopaedia Britannica, the Bible of all data, started eons back in 1768, is now history, felled by the revolutionary power of data. The PC era first pushed it to the precipice, and then the CD-ROM and the endless, me-too, data-devouring Wikipedias sent it to rest.

Who started The Huffington Post? Not Rupert Murdoch, but citizen journalists. What about Linux and Asterisk? They are powered by, well, collaborative communities. How these little unknowns morphed into a kind of Big Boys Redux is what the data revolution is all about.

This data revolution thing may be bloodless but is still scary – not so much for us, but for the big boys actually. If an iconic 250-year-old company can be sent packing by an upstart, imagine the mind-boggling extent to which the playing field has been leveled.

You can’t get more disruptive than this. Or maybe, the only way to get more disruptive is to get less disruptive…

Big data is not only disrupting our lives by the hour, but it is also redefining the rules of engagement globally and locally. Businesses don’t have to play by rules writ in stone anymore. Example: Facebook, which has one in ten persons the world over logged in to it, has no website of its own!

It might seem silly at first cut, but the original revolutionary, Karl Marx, may well have embraced this data-driven revolution as his own playbook. A cell phone in your pocket, a tab in your backpack, and an IT system in your back office, and off you are to unleash your own little revolution.

Looking ahead, we may have to find new ways to grapple with this data overload. Indigestion can afflict the mind too. While the walls between us have collapsed and have been replaced by “Windows” through which data can flit in and out, we have to run with what we have. In theory all this data is only making it that much more difficult to make sense of all this change, but it could still work in practice. As one anonymous wise man has said, “Wikipedia only works in practice. In theory, it can never work.”