mirror of https://github.com/donnemartin/data-science-ipython-notebooks.git (synced 2024-03-22 13:30:56 +08:00)
Add note on DataFrame recommendation over RDD
parent 34889ce7c8
commit 138cd1054e
@@ -458,6 +458,8 @@
 "source": [
 "## RDDs\n",
 "\n",
+"Note: RDDs are included for completeness. In Spark 1.3, DataFrames were introduced which are recommended over RDDs. Check out the [DataFrames announcement](https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html) for more info.\n",
+"\n",
 "Resilient Distributed Datasets (RDDs) are the fundamental unit of data in Spark. RDDs can be created from a file, from data in memory, or from another RDD. RDDs are immutable.\n",
 "\n",
 "There are two types of RDD operations:\n",