Apache Spark Addresses Data Lake Challenges

By Paula Bernier October 12, 2018



Many organizations have taken the plunge with data lakes, but many of these efforts are still just treading water. The good news is that Apache Spark can help businesses realize the promise of big data lakes.

McKinsey & Co. notes that data lakes are appealing because “data are loaded in ‘raw’ formats rather than reconfigured as they enter company systems.” That means they can be used for more than just basic capture. However, the firm adds, integrating data lakes with other elements of the technology architecture, setting use rules, and finding appropriate talent can be challenges.

Meanwhile, author and consultant Dan Woods in a January article for Forbes has this to say about the subject: “With the data lake, while companies can store massive amounts and varieties of data, they have been unable to effectively manage that data and allow a large number of people with moderate expertise levels to explore the data, come up with useful queries, extract the signal through some regular production process that becomes part of the way a business runs.

Some companies, like Netflix, have managed to operationalize a data lake …. But in most other businesses, the data lake got stuck at the proof of concept stage. That is why in general, the data lake is now in need of salvation. The point of saving the data lake is to understand how we go from having a repository of data with signals to operationalizing that information to provide value to the business.”

One of the challenges is that many companies with data lakes rely on expensive, proprietary solutions for data ingestion, integration, and transformation. But now many of these same companies are dipping a toe in the water with Apache Spark. And that’s a good thing, because Apache Spark is a robust distributed computing framework that handles end-to-end analytics, data processing, and machine learning requirements.

In a webinar next week, Anand Venugopal, AVP and business head at StreamAnalytix, will offer more details on why Apache Spark is the answer.

Venugopal will present cloud-based IoT use cases involving event time, late-arriving data, and watermarks. He’ll talk about Python-based predictive analytics running on Spark in cloud environments. And he’ll offer information on visual interactive development of Apache Spark Structured Streaming pipelines.
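For readers unfamiliar with those streaming terms, here is a rough sketch in plain Python of how a watermark decides whether a late-arriving IoT reading still counts. It is a simplified model, not Spark's API and not anything taken from the webinar: the window size, the lateness threshold, the function names, and the sample readings are all assumptions made for illustration, and the watermark here advances on every event rather than between micro-batches.

```python
from collections import defaultdict

# Illustrative settings (assumptions, not from the article or from Spark defaults).
WINDOW_SECONDS = 60      # size of each tumbling event-time window
ALLOWED_LATENESS = 30    # how far the watermark trails the newest event time


def window_start(event_time):
    """Return the start of the tumbling window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def count_with_watermark(events):
    """Count events per event-time window, dropping events that arrive too late.

    events: iterable of (event_time_seconds, device_id) in ARRIVAL order.
    Returns (counts keyed by window start, list of dropped events).
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None  # newest event time seen so far

    for event_time, device_id in events:
        # The watermark is computed from events seen before this one.
        watermark = None if max_event_time is None else max_event_time - ALLOWED_LATENESS
        start = window_start(event_time)

        # An event is too late when its whole window already lies behind the watermark.
        if watermark is not None and start + WINDOW_SECONDS <= watermark:
            dropped.append((event_time, device_id))
        else:
            counts[start] += 1

        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time

    return dict(counts), dropped


# Hypothetical sensor readings, listed in the order they arrive.
events = [(10, "a"), (70, "b"), (50, "a"), (130, "c"), (20, "b")]
counts, dropped = count_with_watermark(events)
print(counts)
print(dropped)
```

In this sample, the reading stamped at 50 seconds arrives out of order but is still counted, while the reading stamped at 20 seconds arrives after the watermark has passed its window and is dropped. Spark Structured Streaming expresses the same idea declaratively on a streaming DataFrame instead of with a hand-written loop.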

He’ll also talk about using Apache Spark with on-premises data lakes. That conversation will explore advanced monitoring of Spark pipelines in on-premises environments. He’ll also discuss data quality and ETL with Apache Spark using pre-built operators in those environments.

Around this time last year, Ian Pointer wrote for InfoWorld: “From its humble beginnings in the AMPLab at U.C. Berkeley in 2009, Apache Spark has become one of the key big data distributed processing frameworks in the world. Spark can be deployed in a variety of ways, provides native bindings for the Java, Scala, Python, and R programming languages, and supports SQL, streaming data, machine learning, and graph processing. You’ll find it used by banks, telecommunications companies, games companies, governments, and all of the major tech giants such as Apple, Facebook, IBM, and Microsoft.”


Executive Editor, TMC
