diff --git a/spark/spark.ipynb b/spark/spark.ipynb
index 3cfe1dd..e84259e 100644
--- a/spark/spark.ipynb
+++ b/spark/spark.ipynb
@@ -458,6 +458,8 @@
  "source": [
  "## RDDs\n",
  "\n",
+ "Note: RDDs are included for completeness. In Spark 1.3, DataFrames were introduced which are recommended over RDDs. Check out the [DataFrames announcement](https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html) for more info.\n",
+ "\n",
  "Resilient Distributed Datasets (RDDs) are the fundamental unit of data in Spark. RDDs can be created from a file, from data in memory, or from another RDD. RDDs are immutable.\n",
  "\n",
  "There are two types of RDD operations:\n",