Added more info about the spark shell, context, and RDDs.

This commit is contained in:
Donne Martin 2015-06-09 17:57:18 -04:00
parent f60e5042b4
commit 2d1f1e8c36


@@ -33,7 +33,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Start the pyspark shell:" "Start the pyspark shell (REPL):"
] ]
}, },
{ {
@@ -44,14 +44,14 @@
}, },
"outputs": [], "outputs": [],
"source": [ "source": [
"pyspark" "!pyspark"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"View the spark context:" "View the spark context, the main entry point to the Spark API:"
] ]
}, },
{ {
@@ -69,7 +69,13 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## RDDs" "## RDDs\n",
"\n",
"Resilient Distributed Datasets (RDDs) are the fundamental unit of data in Spark. RDDs can be created from a file, from data in memory, or from another RDD. RDDs are immutable.\n",
"\n",
"There are two types of RDD operations:\n",
"* Actions: Return values; data in an RDD is not processed until an action is performed\n",
"* Transformations: Define a new RDD based on the current one\n"
] ]
}, },
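The action/transformation split above comes down to lazy evaluation: a transformation only records what to compute, and nothing runs until an action asks for values. As a rough stdlib analogy (not actual Spark code), Python 3's built-in `map` behaves the same way:

```python
# Rough analogy for RDD laziness using Python's lazy map():
# a "transformation" defines the computation, an "action" triggers it.
nums = [1, 2, 3, 4]

doubled = map(lambda x: x * 2, nums)  # "transformation": nothing computed yet
result = list(doubled)                # "action": evaluation happens here

print(result)  # [2, 4, 6, 8]
```

In actual PySpark the equivalent would be `rdd.map(...)` (transformation) followed by `rdd.collect()` (action).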
{ {
@@ -1284,7 +1290,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython2", "pygments_lexer": "ipython2",
"version": "2.7.9" "version": "2.7.10"
} }
}, },
"nbformat": 4, "nbformat": 4,