Learning Spark, 2nd Edition

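Holden Karau is a software development engineer at Databricks and is active in the open source community. She is also the author of Fast Data Processing with Spark.

Andy Konwinski is a co-founder of Databricks, a technical expert on the Apache Spark project, and a co-creator of the Apache Mesos project.

Patrick Wendell is a co-founder of Databricks and a technical expert on the Apache Spark project. He also maintains several subsystems of Spark's core engine.

Matei Zaharia is the CTO of Databricks, the creator of the Apache Spark project, and a vice president at the Apache Software Foundation.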

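Publisher: O'Reilly Media

Author: Tathagata Das

Producer:

Pages: 300

Translator:

Publication date: 2020-1-10

Price: USD 35.99

Binding: Paperback

ISBN: 9781492050049

Series:

Tags:

Spark
Computer science
Distributed
Software engineering
Data analysis
Big data
BigData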

Data is getting bigger, arriving faster, and coming in varied formats—and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads efficiently? Enter Apache Spark.

Updated to emphasize new features in Spark 2.x, this second edition shows data engineers and scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine-learning algorithms. Through discourse, code snippets, and notebooks, you’ll be able to:

Learn the high-level DataFrame and Dataset APIs in Python, SQL, Scala, or Java

Peek under the hood of the Spark SQL engine to understand Spark transformations and performance

Inspect, tune, and debug your Spark operations with Spark configurations and Spark UI

Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka

Perform analytics on batch and streaming data using Structured Streaming

Build reliable data pipelines with open source Delta Lake and Spark

Develop machine learning pipelines with MLlib and productionize models using MLflow

Use Koalas, the open source pandas-style framework, together with Spark for data transformation and feature engineering