Apache Spark Addresses Data Lake Challenges

By Paula Bernier October 12, 2018



Many organizations have taken the plunge with data lakes, but many of these efforts are still just treading water. The good news is that Apache Spark can help businesses realize the promise of big data lakes.

McKinsey & Co. notes that data lakes are appealing because “data are loaded in ‘raw’ formats rather than reconfigured as they enter company systems.” That means they can be used for more than just basic capture. However, the firm adds, integrating data lakes with other elements of the technology architecture, setting use rules, and finding appropriate talent can be challenges.

Meanwhile, author and consultant Dan Woods, in a January article for Forbes, has this to say about the subject: “With the data lake, while companies can store massive amounts and varieties of data, they have been unable to effectively manage that data and allow a large number of people with moderate expertise levels to explore the data, come up with useful queries, extract the signal through some regular production process that becomes part of the way a business runs.

“Some companies, like Netflix, have managed to operationalize a data lake …. But in most other businesses, the data lake got stuck at the proof of concept stage. That is why in general, the data lake is now in need of salvation. The point of saving the data lake is to understand how we go from having a repository of data with signals to operationalizing that information to provide value to the business.”

One of the challenges is that many companies with data lakes rely on expensive, proprietary solutions for data ingestion, integration, and transformation. But now many of these same organizations are dipping a toe in the water with Apache Spark. And that’s a good thing, because Apache Spark is a powerful distributed computing framework that handles end-to-end analytics, data processing, and machine learning requirements.

In a webinar next week, Anand Venugopal, AVP and business head at StreamAnalytix, will offer more details on why Apache Spark is the answer.

Venugopal will present cloud-based IoT use cases involving event time, late-arriving data, and watermarks. He’ll talk about Python-based predictive analytics running on Spark in cloud environments. And he’ll cover visual, interactive development of Apache Spark Structured Streaming pipelines.

He’ll also talk about using Apache Spark with on-premises data lakes. That discussion will explore advanced monitoring of Spark pipelines, as well as data quality and ETL with Apache Spark using pre-built operators, all in on-premises environments.
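For readers new to the streaming terms on the webinar agenda, here is a minimal sketch in plain Python of how event time, late arrival, and a watermark relate to one another. It is a toy illustration, not Spark code and not anything taken from the webinar: the `WatermarkFilter` class, the sensor names, and the 10-second lateness allowance are all hypothetical. Spark Structured Streaming exposes a comparable idea through `DataFrame.withWatermark`, though its mechanism differs in detail (for instance, it advances the watermark between batches of data rather than after every event).

```python
from dataclasses import dataclass, field


@dataclass
class WatermarkFilter:
    """Toy event-time filter (hypothetical; not Spark's implementation)."""

    allowed_lateness: int        # seconds an event may lag the newest event time seen
    max_event_time: int = 0      # newest event time observed so far
    accepted: list = field(default_factory=list)
    dropped: list = field(default_factory=list)

    @property
    def watermark(self) -> int:
        # Events stamped earlier than this are treated as too late.
        return self.max_event_time - self.allowed_lateness

    def process(self, sensor_id: str, event_time: int) -> bool:
        """Accept the reading unless its event time is behind the watermark."""
        if event_time < self.watermark:
            self.dropped.append((sensor_id, event_time))
            return False
        self.accepted.append((sensor_id, event_time))
        self.max_event_time = max(self.max_event_time, event_time)
        return True


def demo():
    # Readings arrive out of order; the number is the time the sensor
    # recorded the reading (event time), not the time it reached the system.
    wf = WatermarkFilter(allowed_lateness=10)
    readings = [("a", 100), ("b", 105), ("a", 97), ("c", 120), ("b", 108), ("a", 111)]
    results = [wf.process(sensor, ts) for sensor, ts in readings]
    return wf, results


if __name__ == "__main__":
    wf, results = demo()
    print("accepted:", wf.accepted)
    print("dropped: ", wf.dropped)
```

In this sketch, the reading stamped 97 arrives after one stamped 105 but is still accepted because it falls within the allowance. The reading stamped 108 arrives after the watermark has moved to 110, so it is dropped.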

Around this time last year, Ian Pointer wrote for InfoWorld: “From its humble beginnings in the AMPLab at U.C. Berkeley in 2009, Apache Spark has become one of the key big data distributed processing frameworks in the world. Spark can be deployed in a variety of ways, provides native bindings for the Java, Scala, Python, and R programming languages, and supports SQL, streaming data, machine learning, and graph processing. You’ll find it used by banks, telecommunications companies, games companies, governments, and all of the major tech giants such as Apple, Facebook, IBM, and Microsoft.”


Executive Editor, TMC
