
Title: Spark 2.0, high level concept

Author: look_w    Time: 2019-2-19 14:56

Entry points and basic abstractions

For Spark base
Main entry point: SparkContext
Basic abstraction: RDD

For Spark SQL
Main entry point: SparkSession
Basic abstraction: DataFrame

For Spark Streaming
Main entry point: StreamingContext
Basic abstraction: DStream

For Spark ML
Main entry point:
Core Classes

    Spark base

    pyspark.SparkContext
    Main entry point for Spark functionality.

    pyspark.RDD
    A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.

    Spark Streaming

    pyspark.streaming.StreamingContext
    Main entry point for Spark Streaming functionality.

    pyspark.streaming.DStream
    A Discretized Stream (DStream), the basic abstraction in Spark Streaming.

    Spark SQL and DataFrame

    pyspark.sql.SparkSession
    Main entry point for DataFrame and SQL functionality (pyspark.sql.SQLContext is kept for backward compatibility).

    pyspark.sql.DataFrame
    A distributed collection of data grouped into named columns.

Spark running modes
Locally
Cluster
Setup and run/submit jobs
Locally
Setup
Spark shell and job submission

    # local[2]: run Spark locally with 2 worker threads
    ./bin/spark-shell --master local[2]
    OR
    ./bin/pyspark --master local[2]

Submit a job

    # 10 = number of partitions (parallel tasks) used to estimate pi
    ./bin/spark-submit examples/src/main/python/pi.py 10

    OR
    ./bin/spark-submit examples/src/main/r/dataframe.R

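Spark standalone cluster
Spark YARN cluster

Useful spark-shell (Scala REPL) commands

    :paste    enter paste mode for multi-line code (Ctrl-D to finish)
    :help     list all REPL commands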

In the Spark 2.0 spark-shell, the Spark context is available as sc and the Spark session as spark (the Spark 1.x shell exposed a SQL context as sqlContext instead).

Read CSV files into a DataFrame, then save the DataFrame as a Parquet file. In Spark 1.x this required the external spark-csv package; Spark 2.0 ships a built-in CSV data source.

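    // Spark 2.0: the CSV source is built in (the spark-csv package is not needed)
    val df = spark.read
          .option("header", "true")          // first line holds the column names
          .option("inferSchema", "true")     // infer column types (extra pass over the data)
          .option("mode", "DROPMALFORMED")   // drop rows that cannot be parsed
          .csv("/home/myuser/data/log/*.csv")
    // saveAsParquetFile was removed in Spark 2.0; write through DataFrameWriter instead
    df.write.parquet("/home/myuser/data.parquet")

    // Read a Parquet file back and inspect its column types and rows
    val df_1 = spark.read.parquet("/Users/user_name/Work/tmp/sample.parquet")
    df_1.dtypes
    df_1.show()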




Welcome to the Electronic Technology Forum, China's professional learning and exchange community for electronics engineers (ECCN Technical Forum, http://bbs.eccn.com/)