Big Data and Hadoop: an explanation

by Janeth Kent | Date: 24-05-2013 | Tags: big data, hadoop, databases, apache, oracle

Our data-rich world is a potential treasure trove for data scientists and analysts, who can comb through massive amounts of data for new insights, research breakthroughs, undetected fraud and other uses not yet discovered. But it also presents a problem for traditional relational databases and analytics tools, which were not built to handle the volume of data now being created. Another challenge is the mix of sources and formats, which include XML, log files, objects, text, binary and more.

"We have a lot of data in structured databases, traditional relational databases now, but we have data coming in from so many sources that trying to categorize that, classify it and get it entered into a traditional database is beyond the scope of our capabilities," said Jack Collins, director of the Advanced Biomedical Computing Center at the Frederick National Laboratory for Cancer Research. "Computer technology is growing rapidly, but the number of [full-time equivalent positions] that we have to work with this is not growing. We have to find a different way."

You can't have a conversation about Big Data for very long without talking about the elephant: Hadoop.

Hadoop is an open source software platform, managed by the Apache Software Foundation, for storing and managing vast amounts of data cheaply and efficiently.

But what makes it special?

Hadoop is more than just a faster, cheaper database and analytics tool. In some cases, the Hadoop framework lets users query datasets in previously unimaginable ways.

Basically, it is a way of storing enormous data sets across clusters of servers and then running analysis applications that are themselves distributed across each cluster.
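To make the idea concrete, here is a minimal sketch in plain Python of the "split the data, analyse each piece where it lives, then merge the results" pattern. It does not use Hadoop or any of its APIs: the `NODES` list, the sample log lines and the function names are all made up for illustration, and the "cluster" is just a list of in-memory lists.

```python
from collections import Counter
from functools import reduce

# Hypothetical stand-in for a cluster: each inner list is the block of
# records stored on one node. Names and data are illustrative only.
NODES = [
    ["error disk full", "login ok"],
    ["login ok", "error timeout"],
    ["login failed", "error disk full"],
]


def map_block(lines):
    """Count the words in one node's local block of records."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge(left, right):
    """Combine two partial word counts into one."""
    return left + right


def word_count(nodes):
    """Run one counting task per node, then merge the partial results."""
    partials = [map_block(block) for block in nodes]
    return reduce(merge, partials, Counter())


if __name__ == "__main__":
    totals = word_count(NODES)
    print(totals["error"], totals["login"])
```

In a real cluster each `map_block` call would run on the machine that holds that block, so only the small partial counts travel over the network rather than the raw data.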

Here's how Apache describes Hadoop:

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
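The phrase "detect and handle failures at the application layer" can be illustrated with a small sketch. This is not Hadoop code: it assumes a hypothetical placement map (`REPLICAS`) in which every block is stored on several nodes, plus a simulated set of failed nodes (`FAILED_NODES`). The reader simply moves on to the next copy when a node is down.

```python
class NodeDown(Exception):
    """Raised when a simulated node cannot serve a request."""


# Hypothetical cluster state: the nodes that are currently failing.
FAILED_NODES = {"node-a"}

# Hypothetical placement map: each block is stored on several nodes.
REPLICAS = {
    "block-1": ["node-a", "node-b", "node-c"],
    "block-2": ["node-b", "node-c", "node-a"],
}


def read_from(node, block_id):
    """Simulate reading a block from a single, possibly failed, node."""
    if node in FAILED_NODES:
        raise NodeDown(node)
    return f"{block_id}@{node}"


def read_block(block_id):
    """Try each replica in turn, skipping nodes that are down."""
    for node in REPLICAS[block_id]:
        try:
            return read_from(node, block_id)
        except NodeDown:
            continue  # failure detected in software; try the next copy
    raise RuntimeError(f"all replicas of {block_id} are unavailable")


if __name__ == "__main__":
    print(read_block("block-1"))
```

The point of the design is that no single machine has to be reliable: as long as one copy of a block is reachable, the read succeeds, which is why such a system can run on inexpensive hardware.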

Introducing Apache Hadoop: The Modern Data Operating System

But what is BIG DATA?

Big data is a popular term used to describe the exponential growth, availability and use of information, both structured and unstructured. Much has been written on the big data trend and how it can serve as the basis for innovation, differentiation and growth.

In this video, Antony Wildey from Oracle Retail explains what Big Data is, and why effective management of data is vital to retailers in gaining actionable insight into how to improve their business. It includes how Oracle can help businesses to use data from social networking sites such as Facebook and Twitter, and use sentiment analysis seamlessly to provide insight on product demand.

Janeth Kent

Fine Arts graduate and programmer by passion. When I have some spare time I retouch photos, edit videos and design things. The rest of the time I write for MA-NO WEB DESIGN AND DEVELOPMENT.
