mirror of
https://github.com/donnemartin/data-science-ipython-notebooks.git
synced 2024-03-22 13:30:56 +08:00
Added more info about the spark shell, context, and RDDs.
This commit is contained in:
parent
f60e5042b4
commit
2d1f1e8c36
|
@ -33,7 +33,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Start the pyspark shell:"
|
||||
"Start the pyspark shell (REPL):"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -44,14 +44,14 @@
|
|||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pyspark"
|
||||
"!pyspark"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"View the spark context:"
|
||||
"View the spark context, the main entry point to the Spark API:"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -69,7 +69,13 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## RDDs"
|
||||
"## RDDs\n",
|
||||
"\n",
|
||||
"Resilient Distributed Datasets (RDDs) are the fundamental unit of data in Spark. RDDs can be created from a file, from data in memory, or from another RDD. RDDs are immutable.\n",
|
||||
"\n",
|
||||
"There are two types of RDD operations:\n",
|
||||
"* Actions: Returns values, data is not processed in an RDD until an action is preformed\n",
|
||||
"* Transformations: Defines a new RDD based on the current\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -1284,7 +1290,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.9"
|
||||
"version": "2.7.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
Loading…
Reference in New Issue
Block a user