Study Learn Grow
Learning Path: Data Science With Apache Spark 2

Overview

Description
Apache Spark is one of the most widely used engines for large-scale data processing, and it is fast. Its tools are equally useful to application developers and data scientists.

This Learning Path begins with an introduction to Apache Spark. We first cover the basics of Spark, introduce SparkR, then look at the charting and plotting features of Python in conjunction with Spark data processing, and finally Spark's data processing libraries. We then develop a real-world Spark application. Next, we enable you to become comfortable and confident working with Spark for data science by exploring Spark's data science libraries on a dataset of tweets.

Begin your journey into fast, large-scale, and distributed data processing using Spark with this Learning Path.

About the Authors

Rajanarayanan Thottuvaikkatumana

Rajanarayanan Thottuvaikkatumana, Raj, is a seasoned technologist with more than 23 years of software development experience at various multinational companies. He has lived and worked in India, Singapore, and the USA, and is presently based in the UK. His experience includes architecting, designing, and developing software applications. He has worked on various technologies including major databases, application development platforms, web technologies, and big data technologies. Since 2000, he has been working mainly in Java-related technologies, and does heavy-duty server-side programming in Java and Scala. He has worked on highly concurrent, highly distributed, and high-transaction-volume systems. Currently he is building a next-generation Hadoop YARN-based data processing platform and an application suite built with Spark using Scala.

Raj holds a master's degree in Mathematics and a master's degree in Computer Information Systems, and has many certifications in ITIL and cloud computing. He is the author of Cassandra Design Patterns - Second Edition, published by Packt.

When not working on the assignments his day job demands, Raj is an avid listener of classical music and watches a lot of tennis.

Eric Charles

Eric Charles has 10 years’ experience in the field of Data Science and is the founder of Datalayer (http://datalayer.io/docker), a social network for Data Scientists. He is passionate about using software and mathematics to help companies get insights from data.

His typical day includes building efficient processing with advanced machine learning algorithms, easy SQL, streaming and graph analytics. He also focuses a lot on visualization and result sharing.

He is passionate about open source and is an active Apache Member. He regularly gives talks to corporate clients and at open source events. He can be reached on Twitter at @echarles.

Basic knowledge
Requires basic knowledge of either Python or R

Course Information

What will you learn
Get to know the fundamentals of Spark 2.0 and the Spark programming model using Scala and Python
Know how to use Spark SQL and DataFrames using Scala and Python
Get an introduction to Spark programming using R
Perform Spark data processing, charting, and plotting using Python
Get acquainted with Spark stream processing using Scala and Python
Be introduced to machine learning with Spark using Scala and Python
Get started with graph processing with Spark using Scala
Develop a complete Spark application
Understand the Spark programming model and its ecosystem of data science packages
Obtain and clean data before processing it
Understand Spark's machine learning algorithms well enough to build a simple pipeline
Work with interactive visualization packages in Spark
Apply data mining techniques to the available datasets
Build a recommendation engine

Application developers, data scientists, and big data architects interested in harnessing the data processing power of Apache Spark will find this course very useful. Because the Spark implementations are shown in Scala and Python, some programming knowledge of these languages is needed. This course is for anyone who wants to work with Spark on large and complex datasets. A basic knowledge of statistics and computational mathematics is expected.
Using real-world use cases that cover the main features of Spark, this course offers an easy introduction to the framework. This practical, hands-on course covers the fundamentals of Spark needed to get to grips with data science, working through a single dataset. It is the next step for those who are comfortable with Spark programming and want to apply Spark to data science.

• Lifetime Access to Each Course
• Certificate on Completion of Course
• No Extra Charges Or Admin Fees
• Easy Access to Courses
• High-Priority After-Sales Support
• Big Discounts on Individual Courses

Course Specifications

IT and Computing courses are available to study on our learning platform. 

Adult education is the non-credential activity of gaining skills and improved education. 

Online education is electronically supported learning that relies on the Internet for teacher/student interaction. 

A short course is a learning programme that gives you combined content or specific skills training in a short period of time. Short courses often lean towards the more practical side of things and have less theory than a university course – this gives you a more hands-on experience within your field of interest.

Course duration is 24 hours.
