Big Data and Hadoop: an explanation

by Janeth Kent. Date: 24-05-2013. Tags: big data, hadoop, databases, apache, oracle

Our world is a potential treasure trove for data scientists and analysts, who can comb through massive amounts of data for new insights, research breakthroughs, undetected fraud or other yet-to-be-discovered purposes. But it also presents a problem for traditional relational databases and analytics tools, which were not built to handle the volume of data now being created. Another challenge is the mix of sources and formats, which include XML, log files, objects, text, binary and more.

"We have a lot of data in structured databases, traditional relational databases now, but we have data coming in from so many sources that trying to categorize that, classify it and get it entered into a traditional database is beyond the scope of our capabilities," said Jack Collins, director of the Advanced Biomedical Computing Center at the Frederick National Laboratory for Cancer Research. "Computer technology is growing rapidly, but the number of [full-time equivalent positions] that we have to work with this is not growing. We have to find a different way."

You can't have a conversation about Big Data for very long without talking about the elephant: Hadoop.

Hadoop is an open source software platform, managed by the Apache Software Foundation, for storing and managing vast amounts of data cheaply and efficiently.

But what makes it special?

Hadoop is more than just a faster, cheaper database and analytics tool. In some cases, the Hadoop framework lets users query datasets in previously unimaginable ways.

Basically, it's a way of storing enormous data sets across distributed clusters of servers and then running "distributed" analysis applications in each cluster.

Here's how Apache describes it:

"The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly available service on top of a cluster of computers, each of which may be prone to failures."
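To make the idea of "simple programming models" more concrete, here is a minimal sketch of the map, shuffle and reduce pattern classically associated with Hadoop, simulated inside a single Python process. It is not Hadoop's API and does not run on a cluster: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are assumptions made up for illustration, and each string in `chunks` merely stands in for a block of data that a real cluster would hold on a separate machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over all chunks (hypothetical helper)."""
    # Each chunk stands in for a data block stored on a different node.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample_chunks = ["big data big insights", "hadoop stores big data"]
    print(word_count(sample_chunks))
```

In a real deployment the map step would run next to the data on each machine and the framework would handle the grouping and any failed tasks; the sketch only shows the shape of the computation.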

Video: Introducing Apache Hadoop: The Modern Data Operating System

But what is Big Data?

Big data is a popular term used to describe the exponential growth, availability and use of information, both structured and unstructured. Much has been written on the big data trend and how it can serve as the basis for innovation, differentiation and growth.

In this video, Antony Wildey from Oracle Retail explains what Big Data is, and why effective management of data is vital to retailers in gaining actionable insight into how to improve their business. He also covers how Oracle can help businesses use data from social networking sites such as Facebook and Twitter, applying sentiment analysis to gain insight into product demand.

