Big Data and its Challenges

As the world progresses digitally towards an immense amount of data, businesses are constantly looking for feasible, practical ways to analyse this information so that the flood of data can be put to meaningful use for growth and development. Data is being collected at an unprecedented pace, from a whole gamut of sources, and is available as soon as it is generated. Big Data is a broad term covering initiatives and technologies that deal with massive, diverse and continuously changing data. It is changing the way organisations do business, gain insight, deal with their customers and make decisions, by offering a synergy with and an extension to existing processes. Big data is also changing the way businesses approach product development, human resources and operations. It touches every aspect of society, including retail, mobile services, life sciences, financial services and the physical sciences. It can be touted as both the biggest opportunity and the biggest challenge for the statistical sciences, because if the numbers are crunched accurately, Big Data can offer huge rewards.

Companies may know the types of result they are seeking, but these can be difficult to obtain, or significant data mining may be required to get specific answers. For statisticians, the challenge is dealing with data that is not only big but also very different: they must contend with the “look-everywhere effect” and extract meaningful information from a huge haystack of data. There are also challenges with the algorithms themselves, which often do not scale up as expected and can become extremely slow once gigabyte-scale datasets are involved. To gain speed and theoretical accuracy, existing algorithms need to be refined or new ones designed. These algorithms must be capable of handling next-generation functional data, and should be able to look through the data for hidden relationships and patterns.
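One common remedy when an algorithm does not scale is to redesign it as a single-pass streaming computation, so that each record is seen once and then discarded. A minimal sketch (Welford's online algorithm for the mean and variance, used here purely as an illustration, not as anything prescribed by the text):

```python
class StreamingStats:
    """Single-pass mean/variance (Welford's algorithm).

    Each record is seen once and discarded, so memory stays O(1)
    no matter how large the stream is."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of everything seen so far."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```

Because memory stays constant regardless of stream length, the same code handles a thousand records or a billion.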

Another challenge is the analysis of too many correlations: several may appear statistically significant yet be bogus, and the sheer magnitude of big data can amplify such errors. Big data is quite efficient at detecting subtle correlations, but it is left to the user to judge which correlations are meaningful, and this is not always an easy task. Statistical analysis cannot be a wholesale replacement for scientific inquiry, and users must start with a basic understanding of the data. Moreover, once users understand the data, the analysis can easily be gamed. A good example is “spamdexing” or “Google bombing”, where companies artificially elevate a website's search placement. At other times the results are not intentionally gamed, but are simply less robust than expected. Much big data comes from the web, which is itself big data, and this increases the chances of errors being reinforced.
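The look-everywhere problem is easy to demonstrate with a small simulation: below, every feature is pure noise, yet dozens still clear a naive 5% significance cut, while a Bonferroni correction (one standard multiple-testing remedy, chosen here just for illustration; the normal approximation to the critical value is likewise a simplification) removes almost all of them.

```python
import random
from statistics import NormalDist

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def false_positives(n_features=1000, n_samples=200, alpha=0.05, seed=7):
    """Count pure-noise features that look 'significantly' correlated
    with a pure-noise target, with and without Bonferroni correction."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]

    def r_crit(a: float) -> float:
        # Normal-approximation critical |r| for a two-tailed test.
        return NormalDist().inv_cdf(1 - a / 2) / n_samples ** 0.5

    naive_cut, bonf_cut = r_crit(alpha), r_crit(alpha / n_features)
    naive = bonf = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        r = abs(pearson(feature, target))
        naive += r > naive_cut   # "significant" at alpha = 0.05
        bonf += r > bonf_cut     # "significant" after correction
    return naive, bonf
```

Roughly 5% of the noise features pass the naive cut by chance alone, which is exactly the kind of bogus correlation the paragraph above warns about.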

Undoubtedly, big data is a valuable tool, and it has made a critical impact in a select few realms. However, it has proved its worth mainly in analysing common things, falling short in the analysis of less common information and not living up to the hype. Big data is here to stay, but it is not a silver bullet, and we need to be realistic about its potential and limitations.

DevOps – A Collaborative Approach

There are many different opinions about what the definition of DevOps encompasses. In very broad terms, DevOps was born to improve the agility of IT service delivery, and it facilitates collaboration, communication and integration between IT operations and software developers. A DevOps environment consists of a cross-functional team including QA, developers, business analysts, DBAs, operations engineers and so on. Incorporating DevOps helps companies get more done and deploy code more frequently.

Businesses these days face some common problems. After an application is delivered, businesses become sceptical of change, usually because the software, and the platform it sits on, are vulnerable and brittle. Software is risky, prone to errors and unpredictable. Introducing new features or fixing application problems takes a long time, mainly due to bureaucratic change-management systems. Deployment is also risky: no one is completely confident whether the software will actually work in the live environment, whether the code will cope with the load, or whether it will behave as expected. The product is usually pushed out, and teams simply hope everything works. More often than not, problems start manifesting after the project goes live. Developers write the code on one system; it is tested on a completely different system and deployed on entirely different machines, causing incompatibility issues due to differing properties files. If the business units are siloed, issues get passed between different teams, and there can be siloisation within teams as well. If the silos are not in the same office, or even the same city, this leads to a “them vs us” mentality, making people even more sceptical.

The DevOps approach aims to run businesses in a more productive and profitable manner by building teams and software that resolve these issues. The problems above are addressed by people with multidisciplinary skill sets who are happy to roll up their sleeves for a multidimensional role; they make connections and bridge gaps, with tremendous impact on the business. This builds a cross-disciplinary approach within teams, with maximum reliability across different departments, leading to faster time to market, happier clients, better availability and reliability, and more focussed team energy. The goals of the DevOps approach are spread across the complete delivery pipeline, improving deployment frequency. DevOps promotes sets of methods and processes for collaboration and communication between product development, quality assurance and IT operations. It encourages understanding the domain for which software is being written, developing communication skills, and a conscious passion for, and sensitivity to, ensuring that the business succeeds.

In a non-DevOps environment, the operations team's performance is measured by the stability of the system, whereas the development team is gauged on delivered features. In a DevOps environment, a single team is responsible for both system stability and delivering new features. There is continuous integration, shared code, automated deploys and test-driven techniques. Problems in the application code, configuration or infrastructure get exposed earlier, mainly because software is not simply thrown over to Operations once the coding is over. Change sets are smaller, making problems less complex, and because team members do not have to wait for another team to find and fix a problem, resolution times are much faster.
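The test-driven side of this workflow can be sketched with a toy example (the function and tests below are invented for illustration, not taken from any particular project); run on every commit in a CI pipeline, such a suite surfaces a bad change set minutes after it is made, rather than after it reaches operations:

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Apply a percentage discount, rejecting invalid input early."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_basic_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_invalid_percent_is_rejected(self):
        # Written before the guard clause existed, this test fails
        # at commit time, exposing the defect long before production.
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)
```

A CI server would run this with `python -m unittest`, failing the build and blocking the deploy whenever an assertion breaks.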

Additionally, in a typical IT environment, people have to wait for other machines, other people, or software updates. Employees often get stuck resolving the same issues over and over again, which becomes quite frustrating. It is essential for organisations to remove the ungratifying parts of their employees' jobs so that they can add more value, making the organisation more productive and profitable. Standardized production environments and automated deployments are the main aspects of DevOps that make deployments predictable, freeing resources from mundane tasks. This software development method acknowledges and utilizes the interdependence of IT operations, software development and quality assurance to help companies create new products faster while improving operational performance.

There are several technical and business benefits to this collaboration across roles, including continuous software delivery, faster problem resolution, reduced problem complexity, more stable operating environments, faster feature delivery, and more time to add value rather than fix and maintain. The DevOps movement has yet to reach its full potential, but the statistics show it is not just a fleeting fad. It promises a paradigm shift, a significant revolution in the software industry that blurs the boundaries between development and operations.

Amazon Web Services

Amazon Web Services (AWS) is a cloud computing platform from Amazon that provides a wide array of cloud services to customers. AWS offers several cloud options, including Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), Amazon SimpleDB, Amazon Virtual Private Cloud (Amazon VPC) and Amazon WorkSpaces. Together, this group of remote computing services makes up a cloud-computing platform delivered over the internet. Most of these services are not directly exposed to end users; Amazon S3 and Amazon EC2 are two of the most central and well-known offerings.

Amazon first introduced Amazon Web Services in 2006 to enable the use of its online services via REST, HTTP or SOAP protocols. On 28 July 2013, AWS launched two Amazon CloudFront edge locations in India, joining 42 edge locations worldwide and making the services global, fast, flexible and cost-effective. Amazon Web Services is now a $6 billion-a-year cloud-computing monster, and is considered Amazon's most valuable asset. AWS generated sales of $1.57bn in the first quarter of 2015, and the firm's total revenue for the quarter rose by 15% to $22.7bn, a much stronger increase than expected. AWS provides cloud computing services to several household names, including Spotify, Dropbox, Uber, Netflix, the CIA and Samsung.
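Those REST and SOAP endpoints are authenticated with signed requests rather than a transmitted password. As a rough sketch of the key-derivation step of AWS Signature Version 4 (the credential below is the placeholder value from AWS's documentation, and the string to sign is a simplified stand-in, not a real canonical request):

```python
import hashlib
import hmac

def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sigv4_signature(secret_key: str, date_stamp: str, region: str,
                    service: str, string_to_sign: str) -> str:
    """Sign a request with a key scoped to one date, region and service.

    The secret key itself never goes over the wire; it only seeds
    this chain of HMAC-SHA256 derivations."""
    k_date = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _sign(k_date, region)
    k_service = _sign(k_region, service)
    k_signing = _sign(k_service, "aws4_request")
    return hmac.new(k_signing, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Placeholder credential and a simplified string to sign:
sig = sigv4_signature("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                      "20150830", "us-east-1", "s3",
                      "example-string-to-sign")
```

Because the derived key is scoped, a leaked signature cannot be replayed against a different service, region or day.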

AWS services include:

Compute – Virtual Servers, Containers, Event-driven Compute Functions, Auto Scaling and Load Balancing

Storage and Content Delivery – Object Storage, Block Storage, Archive Storage, File System Storage, and CDN

Databases – Relational, Caching and NoSQL

Networking – Virtual Private Cloud, DNS and Direct Connections

Administration and Security – Identity Management, Access Control, Key Storage and Management, Usage and Resource Auditing, Monitoring and Logs, and Service Catalogue.

Enterprise IT applications from AWS include Desktop Virtualization, Email and Calendaring, and Document Sharing and Feedback. Engineered for the most demanding requirements, AWS offers the following advantages:

Secure – AWS's cloud security infrastructure is one of the most secure and flexible cloud computing environments, providing a highly reliable and extremely scalable platform on which data and applications can be deployed securely and quickly. Data and applications are protected not only by highly secure infrastructure and facilities, but also by extensive security and network-monitoring systems that provide critical measures such as password brute-force detection and distributed denial-of-service (DDoS) protection on AWS accounts. Additional security measures include built-in firewalls, secure access, private subnets, multi-factor authentication, unique users, dedicated connection options, encrypted data storage, centralized key management, security logs, perfect forward secrecy and so on.

Compliant – AWS's Cloud Compliance ensures robust controls to maintain data protection and security in the cloud. Because systems are built on top of AWS cloud infrastructure, compliance responsibilities are shared between AWS and the customer.

Private, isolated resources – Enterprises adopting cloud computing require secure data and applications, increased agility and reduced costs. Some organisations, however, must also consider unique requirements for regulatory compliance and resource isolation. AWS offers private network, private compute, private storage, enterprise governance and private encryption resources for a completely private cloud experience.

Hybrid – AWS offers solutions and tools to integrate an organisation's existing resources with the AWS cloud, extending and enhancing its capabilities and accelerating the adoption of cloud computing. A hybrid architecture offers a wide range of options and involves the integration of storage, networking, management and access control. The capabilities include integrated networking, integrated cloud backups, integrated access control, and integrated resource management and workload migration.

Amazon Web Services has changed the game in several ways. You pay only for what you use, and you can scale up or down within a few minutes based on demand. Servers can be added in different parts of the world to provide faster service to customers. Additionally, Amazon periodically adds features and services to its existing offerings, making it a preferred choice for organisations across domains; a recent addition is Amazon Machine Learning. However, with IBM, Google and Microsoft emerging to grab some of the market share, it remains to be seen whether Amazon can keep its margins high over the long term.