Our world is a potential treasure trove for data scientists and analysts, who can comb through massive amounts of data for new insights, research breakthroughs, undetected fraud or other yet-to-be-discovered uses. But it also presents a problem for traditional relational databases and analytics tools, which were not built to handle data at the volume now being created. Another challenge is the mix of sources and formats, which include XML, log files, objects, text, binary data and more.
"We have a lot of data in structured databases, traditional relational databases now, but we have data coming in from so many sources that trying to categorize that, classify it and get it entered into a traditional database is beyond the scope of our capabilities," said Jack Collins, director of the Advanced Biomedical Computing Center at the Frederick National Laboratory for Cancer Research. "Computer technology is growing rapidly, but the number of [full-time equivalent positions] that we have to work with this is not growing. We have to find a different way."
You can't have a conversation about Big Data for very long without talking about the elephant: Hadoop.
Hadoop is an open source software platform, managed by the Apache Software Foundation, for storing and managing vast amounts of data cheaply and efficiently.
But what makes it special?
Hadoop is more than just a faster, cheaper database and analytics tool. In some cases, the Hadoop framework lets users query datasets in previously unimaginable ways.
Basically, it's a way of storing enormous data sets across distributed clusters of servers and then running "distributed" analysis applications in each cluster.
Here's how Apache describes it:
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
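To make the idea of "simple programming models" concrete, here is a minimal sketch of the map/shuffle/reduce pattern that Hadoop is best known for (MapReduce). It is an illustration only: it runs in a single Python process and does not use any Hadoop API. The "nodes" are plain lists, and the function names (`split_into_blocks`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def split_into_blocks(records, num_nodes):
    """Spread records round-robin across num_nodes simulated nodes."""
    blocks = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        blocks[i % num_nodes].append(record)
    return blocks


def map_phase(block):
    """Runs 'locally' on one node: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values collected for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, num_nodes=3):
    """Count words by mapping over each block, then shuffling and reducing."""
    blocks = split_into_blocks(records, num_nodes)
    mapped = chain.from_iterable(map_phase(block) for block in blocks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical log lines standing in for a much larger dataset.
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(logs))
```

The point of the pattern is that `map_phase` only ever sees the block stored on its own node, so the work can be spread over as many machines as there are blocks. In a real cluster the framework, not the application code, handles distributing the blocks and re-running work that was lost when a machine fails.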
Introducing Apache Hadoop: The Modern Data Operating System
But what is Big Data?
Big data is a popular term used to describe the exponential growth, availability and use of information, both structured and unstructured. Much has been written on the big data trend and how it can serve as the basis for innovation, differentiation and growth.
In this video, Antony Wildey from Oracle Retail explains what Big Data is, and why effective management of data is vital to retailers in gaining actionable insight into how to improve their business. It includes how Oracle can help businesses to use data from social networking sites such as Facebook and Twitter, and use sentiment analysis seamlessly to provide insight on product demand.
Fine Arts graduate and programmer by passion. When I have a spare moment I retouch photos, edit videos and design things. The rest of the time I write for MA-NO WEB DESIGN AND DEVELOPMENT.