Add note on DataFrame recommendation over RDD

Donne Martin 2016-02-21 06:21:59 -05:00
parent 34889ce7c8
commit 138cd1054e

@@ -458,6 +458,8 @@
"source": [
"## RDDs\n",
"\n",
"Note: RDDs are included for completeness. DataFrames, introduced in Spark 1.3, are recommended over RDDs. Check out the [DataFrames announcement](https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html) for more info.\n",
"\n",
"Resilient Distributed Datasets (RDDs) are the fundamental unit of data in Spark. RDDs can be created from a file, from data in memory, or from another RDD. RDDs are immutable.\n",
"\n",
"There are two types of RDD operations:\n",