{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory Data Analysis \n",
"Revision 4"
]
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How can we draw predictions out of our data if we do not know what is actually inside it? That is the reason for this exploration. Digging into the data a little gives us a good sense of what it contains, which helps with model selection and with understanding the features. We did a bit of \"exploratory data analysis\" in the previous chapter; here we go into more detail. "
]
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Shape of the data: how many instances?"
]
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_samples, n_features = iris.data.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"150"
|
||
]
|
||
},
|
||
"execution_count": 34,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"n_samples"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"4"
|
||
]
|
||
},
|
||
"execution_count": 35,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"n_features"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 36,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Shape of data: (150, 4)\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(\"Shape of data:\", iris['data'].shape)"
|
||
]
|
||
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No labels are missing: the target array has exactly one entry per sample "
]
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 37,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"True"
|
||
]
|
||
},
|
||
"execution_count": 37,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(iris.target) == n_samples"
|
||
]
|
||
},
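The comparison above confirms that every sample has a label, but it does not look inside the feature values themselves. A minimal sketch of a direct check for missing (NaN) entries follows. It assumes only NumPy, and to stay self-contained it uses the first five records copied from the `iris.data` output further down in this notebook; the names `X_sample` and `X_broken` are illustrative. With the full dataset loaded, the same call could be applied to `iris.data`.

```python
import numpy as np

# First five records, copied from the iris.data output in this notebook.
X_sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])

# np.isnan flags every NaN entry; summing the boolean mask counts them.
n_missing = int(np.isnan(X_sample).sum())
print("Missing values:", n_missing)

# For contrast: a copy with one value deliberately blanked out.
X_broken = X_sample.copy()
X_broken[2, 1] = np.nan
n_missing_broken = int(np.isnan(X_broken).sum())
print("Missing values after blanking one entry:", n_missing_broken)
```

Counting NaN entries complements the length comparison: one checks that the labels line up with the samples, the other checks the measurements.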
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"source": [
|
||
"<img src=\"assets/data5.png\">"
|
||
]
|
||
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature names \n",
"\n",
"We saw the names of the four features in the picture above. Let's look at them in our dataset object. Using dot notation on iris, we access the value stored under a \"key\". feature_names is one of the keys listed by iris.keys(), and it can be read either as an attribute (iris.feature_names) or as a dictionary key (iris['feature_names'])."
]
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"['sepal length (cm)',\n",
|
||
" 'sepal width (cm)',\n",
|
||
" 'petal length (cm)',\n",
|
||
" 'petal width (cm)']"
|
||
]
|
||
},
|
||
"execution_count": 38,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"iris.feature_names"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 39,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(iris['feature_names'])"
|
||
]
|
||
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The target: what do we want to predict?\n",
"\n",
"There are several ways to display it, but print gives the cleanest formatting. "
]
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array(['setosa', 'versicolor', 'virginica'],\n",
|
||
" dtype='<U10')"
|
||
]
|
||
},
|
||
"execution_count": 40,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"iris.target_names"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 41,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"['setosa' 'versicolor' 'virginica']\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(iris.target_names)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 42,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"['setosa', 'versicolor', 'virginica']"
|
||
]
|
||
},
|
||
"execution_count": 42,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"list(iris.target_names)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 43,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Target names: ['setosa' 'versicolor' 'virginica']\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(\"Target names:\", iris['target_names'])"
|
||
]
|
||
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is inside the data array and the target array?\n",
"\n",
"Here we are working with arrays. Each row of iris.data holds the four measurements side by side, in the order given by iris.feature_names: 1. sepal length, 2. sepal width, 3. petal length, 4. petal width. First we look at the first record, then at all the records. "
]
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 44,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([ 5.1, 3.5, 1.4, 0.2])"
|
||
]
|
||
},
|
||
"execution_count": 44,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"iris.data[0]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 45,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([[ 5.1, 3.5, 1.4, 0.2],\n",
|
||
" [ 4.9, 3. , 1.4, 0.2],\n",
|
||
" [ 4.7, 3.2, 1.3, 0.2],\n",
|
||
" [ 4.6, 3.1, 1.5, 0.2],\n",
|
||
" [ 5. , 3.6, 1.4, 0.2],\n",
|
||
" [ 5.4, 3.9, 1.7, 0.4],\n",
|
||
" [ 4.6, 3.4, 1.4, 0.3],\n",
|
||
" [ 5. , 3.4, 1.5, 0.2],\n",
|
||
" [ 4.4, 2.9, 1.4, 0.2],\n",
|
||
" [ 4.9, 3.1, 1.5, 0.1],\n",
|
||
" [ 5.4, 3.7, 1.5, 0.2],\n",
|
||
" [ 4.8, 3.4, 1.6, 0.2],\n",
|
||
" [ 4.8, 3. , 1.4, 0.1],\n",
|
||
" [ 4.3, 3. , 1.1, 0.1],\n",
|
||
" [ 5.8, 4. , 1.2, 0.2],\n",
|
||
" [ 5.7, 4.4, 1.5, 0.4],\n",
|
||
" [ 5.4, 3.9, 1.3, 0.4],\n",
|
||
" [ 5.1, 3.5, 1.4, 0.3],\n",
|
||
" [ 5.7, 3.8, 1.7, 0.3],\n",
|
||
" [ 5.1, 3.8, 1.5, 0.3],\n",
|
||
" [ 5.4, 3.4, 1.7, 0.2],\n",
|
||
" [ 5.1, 3.7, 1.5, 0.4],\n",
|
||
" [ 4.6, 3.6, 1. , 0.2],\n",
|
||
" [ 5.1, 3.3, 1.7, 0.5],\n",
|
||
" [ 4.8, 3.4, 1.9, 0.2],\n",
|
||
" [ 5. , 3. , 1.6, 0.2],\n",
|
||
" [ 5. , 3.4, 1.6, 0.4],\n",
|
||
" [ 5.2, 3.5, 1.5, 0.2],\n",
|
||
" [ 5.2, 3.4, 1.4, 0.2],\n",
|
||
" [ 4.7, 3.2, 1.6, 0.2],\n",
|
||
" [ 4.8, 3.1, 1.6, 0.2],\n",
|
||
" [ 5.4, 3.4, 1.5, 0.4],\n",
|
||
" [ 5.2, 4.1, 1.5, 0.1],\n",
|
||
" [ 5.5, 4.2, 1.4, 0.2],\n",
|
||
" [ 4.9, 3.1, 1.5, 0.1],\n",
|
||
" [ 5. , 3.2, 1.2, 0.2],\n",
|
||
" [ 5.5, 3.5, 1.3, 0.2],\n",
|
||
" [ 4.9, 3.1, 1.5, 0.1],\n",
|
||
" [ 4.4, 3. , 1.3, 0.2],\n",
|
||
" [ 5.1, 3.4, 1.5, 0.2],\n",
|
||
" [ 5. , 3.5, 1.3, 0.3],\n",
|
||
" [ 4.5, 2.3, 1.3, 0.3],\n",
|
||
" [ 4.4, 3.2, 1.3, 0.2],\n",
|
||
" [ 5. , 3.5, 1.6, 0.6],\n",
|
||
" [ 5.1, 3.8, 1.9, 0.4],\n",
|
||
" [ 4.8, 3. , 1.4, 0.3],\n",
|
||
" [ 5.1, 3.8, 1.6, 0.2],\n",
|
||
" [ 4.6, 3.2, 1.4, 0.2],\n",
|
||
" [ 5.3, 3.7, 1.5, 0.2],\n",
|
||
" [ 5. , 3.3, 1.4, 0.2],\n",
|
||
" [ 7. , 3.2, 4.7, 1.4],\n",
|
||
" [ 6.4, 3.2, 4.5, 1.5],\n",
|
||
" [ 6.9, 3.1, 4.9, 1.5],\n",
|
||
" [ 5.5, 2.3, 4. , 1.3],\n",
|
||
" [ 6.5, 2.8, 4.6, 1.5],\n",
|
||
" [ 5.7, 2.8, 4.5, 1.3],\n",
|
||
" [ 6.3, 3.3, 4.7, 1.6],\n",
|
||
" [ 4.9, 2.4, 3.3, 1. ],\n",
|
||
" [ 6.6, 2.9, 4.6, 1.3],\n",
|
||
" [ 5.2, 2.7, 3.9, 1.4],\n",
|
||
" [ 5. , 2. , 3.5, 1. ],\n",
|
||
" [ 5.9, 3. , 4.2, 1.5],\n",
|
||
" [ 6. , 2.2, 4. , 1. ],\n",
|
||
" [ 6.1, 2.9, 4.7, 1.4],\n",
|
||
" [ 5.6, 2.9, 3.6, 1.3],\n",
|
||
" [ 6.7, 3.1, 4.4, 1.4],\n",
|
||
" [ 5.6, 3. , 4.5, 1.5],\n",
|
||
" [ 5.8, 2.7, 4.1, 1. ],\n",
|
||
" [ 6.2, 2.2, 4.5, 1.5],\n",
|
||
" [ 5.6, 2.5, 3.9, 1.1],\n",
|
||
" [ 5.9, 3.2, 4.8, 1.8],\n",
|
||
" [ 6.1, 2.8, 4. , 1.3],\n",
|
||
" [ 6.3, 2.5, 4.9, 1.5],\n",
|
||
" [ 6.1, 2.8, 4.7, 1.2],\n",
|
||
" [ 6.4, 2.9, 4.3, 1.3],\n",
|
||
" [ 6.6, 3. , 4.4, 1.4],\n",
|
||
" [ 6.8, 2.8, 4.8, 1.4],\n",
|
||
" [ 6.7, 3. , 5. , 1.7],\n",
|
||
" [ 6. , 2.9, 4.5, 1.5],\n",
|
||
" [ 5.7, 2.6, 3.5, 1. ],\n",
|
||
" [ 5.5, 2.4, 3.8, 1.1],\n",
|
||
" [ 5.5, 2.4, 3.7, 1. ],\n",
|
||
" [ 5.8, 2.7, 3.9, 1.2],\n",
|
||
" [ 6. , 2.7, 5.1, 1.6],\n",
|
||
" [ 5.4, 3. , 4.5, 1.5],\n",
|
||
" [ 6. , 3.4, 4.5, 1.6],\n",
|
||
" [ 6.7, 3.1, 4.7, 1.5],\n",
|
||
" [ 6.3, 2.3, 4.4, 1.3],\n",
|
||
" [ 5.6, 3. , 4.1, 1.3],\n",
|
||
" [ 5.5, 2.5, 4. , 1.3],\n",
|
||
" [ 5.5, 2.6, 4.4, 1.2],\n",
|
||
" [ 6.1, 3. , 4.6, 1.4],\n",
|
||
" [ 5.8, 2.6, 4. , 1.2],\n",
|
||
" [ 5. , 2.3, 3.3, 1. ],\n",
|
||
" [ 5.6, 2.7, 4.2, 1.3],\n",
|
||
" [ 5.7, 3. , 4.2, 1.2],\n",
|
||
" [ 5.7, 2.9, 4.2, 1.3],\n",
|
||
" [ 6.2, 2.9, 4.3, 1.3],\n",
|
||
" [ 5.1, 2.5, 3. , 1.1],\n",
|
||
" [ 5.7, 2.8, 4.1, 1.3],\n",
|
||
" [ 6.3, 3.3, 6. , 2.5],\n",
|
||
" [ 5.8, 2.7, 5.1, 1.9],\n",
|
||
" [ 7.1, 3. , 5.9, 2.1],\n",
|
||
" [ 6.3, 2.9, 5.6, 1.8],\n",
|
||
" [ 6.5, 3. , 5.8, 2.2],\n",
|
||
" [ 7.6, 3. , 6.6, 2.1],\n",
|
||
" [ 4.9, 2.5, 4.5, 1.7],\n",
|
||
" [ 7.3, 2.9, 6.3, 1.8],\n",
|
||
" [ 6.7, 2.5, 5.8, 1.8],\n",
|
||
" [ 7.2, 3.6, 6.1, 2.5],\n",
|
||
" [ 6.5, 3.2, 5.1, 2. ],\n",
|
||
" [ 6.4, 2.7, 5.3, 1.9],\n",
|
||
" [ 6.8, 3. , 5.5, 2.1],\n",
|
||
" [ 5.7, 2.5, 5. , 2. ],\n",
|
||
" [ 5.8, 2.8, 5.1, 2.4],\n",
|
||
" [ 6.4, 3.2, 5.3, 2.3],\n",
|
||
" [ 6.5, 3. , 5.5, 1.8],\n",
|
||
" [ 7.7, 3.8, 6.7, 2.2],\n",
|
||
" [ 7.7, 2.6, 6.9, 2.3],\n",
|
||
" [ 6. , 2.2, 5. , 1.5],\n",
|
||
" [ 6.9, 3.2, 5.7, 2.3],\n",
|
||
" [ 5.6, 2.8, 4.9, 2. ],\n",
|
||
" [ 7.7, 2.8, 6.7, 2. ],\n",
|
||
" [ 6.3, 2.7, 4.9, 1.8],\n",
|
||
" [ 6.7, 3.3, 5.7, 2.1],\n",
|
||
" [ 7.2, 3.2, 6. , 1.8],\n",
|
||
" [ 6.2, 2.8, 4.8, 1.8],\n",
|
||
" [ 6.1, 3. , 4.9, 1.8],\n",
|
||
" [ 6.4, 2.8, 5.6, 2.1],\n",
|
||
" [ 7.2, 3. , 5.8, 1.6],\n",
|
||
" [ 7.4, 2.8, 6.1, 1.9],\n",
|
||
" [ 7.9, 3.8, 6.4, 2. ],\n",
|
||
" [ 6.4, 2.8, 5.6, 2.2],\n",
|
||
" [ 6.3, 2.8, 5.1, 1.5],\n",
|
||
" [ 6.1, 2.6, 5.6, 1.4],\n",
|
||
" [ 7.7, 3. , 6.1, 2.3],\n",
|
||
" [ 6.3, 3.4, 5.6, 2.4],\n",
|
||
" [ 6.4, 3.1, 5.5, 1.8],\n",
|
||
" [ 6. , 3. , 4.8, 1.8],\n",
|
||
" [ 6.9, 3.1, 5.4, 2.1],\n",
|
||
" [ 6.7, 3.1, 5.6, 2.4],\n",
|
||
" [ 6.9, 3.1, 5.1, 2.3],\n",
|
||
" [ 5.8, 2.7, 5.1, 1.9],\n",
|
||
" [ 6.8, 3.2, 5.9, 2.3],\n",
|
||
" [ 6.7, 3.3, 5.7, 2.5],\n",
|
||
" [ 6.7, 3. , 5.2, 2.3],\n",
|
||
" [ 6.3, 2.5, 5. , 1.9],\n",
|
||
" [ 6.5, 3. , 5.2, 2. ],\n",
|
||
" [ 6.2, 3.4, 5.4, 2.3],\n",
|
||
" [ 5.9, 3. , 5.1, 1.8]])"
|
||
]
|
||
},
|
||
"execution_count": 45,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"iris.data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 46,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
|
||
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
|
||
" 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
|
||
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
|
||
" 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
|
||
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
|
||
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])"
|
||
]
|
||
},
|
||
"execution_count": 46,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"iris.target"
|
||
]
|
||
},
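The integers in `iris.target` are indices into `iris.target_names`. A small sketch of how the two arrays fit together is shown here. It assumes only NumPy and rebuilds both arrays by hand from the outputs above (three names, and 50 labels each of 0, 1 and 2) so that it runs on its own; the names `species` and `counts` are illustrative.

```python
import numpy as np

# Rebuilt from the outputs above: three class names and 150 integer labels.
target_names = np.array(["setosa", "versicolor", "virginica"])
target = np.repeat([0, 1, 2], 50)

# Indexing the names array with the label array turns each integer
# into its species name.
species = target_names[target]

# np.bincount counts how many times each integer label occurs.
counts = np.bincount(target)

for name, count in zip(target_names, counts):
    print(name, count)
```

Keeping the labels as integers and translating them to names only for display matches the way the dataset object stores them.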
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we wanted to know what kind of container holds our \"features\" and our \"response\", i.e. the \"target\". You guessed it: a \"NumPy array\"."
]
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 47,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<class 'numpy.ndarray'>\n",
|
||
"<class 'numpy.ndarray'>\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(type(iris.data))\n",
|
||
"print(type(iris.target))"
|
||
]
|
||
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the shape of the feature matrix? (1st dimension = number of observations, 2nd = number of features)"
]
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 48,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"(150, 4)\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(iris.data.shape)"
|
||
]
|
||
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the shape of the target vector? (its only dimension = number of labels, i.e. targets or responses)"
]
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 49,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"(150,)\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(iris.target.shape)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 50,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Shape of target: (150,)\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(\"Shape of target:\", iris['target'].shape)"
|
||
]
|
||
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Rules for handling data in scikit-learn \n",
"\n",
"1. \"Features\" and \"response\" are two separate objects \n",
"(in our case, the \"features\" and the \"response\", i.e. the \"target\", are separate objects)\n",
"\n",
"2. Both \"features\" and \"response\" must be numeric \n",
"(ours are both numeric; their dimensions are 150 x 4 and 150 x 1, the latter stored as a 1-D array of shape (150,))\n",
"\n",
"3. Both \"features\" and \"response\" must be \"NumPy arrays\" \n",
"(both of ours are already \"NumPy arrays\"; any other dataset we need must also be loaded into \"NumPy arrays\")\n",
"\n",
"4. Both \"features\" and \"response\" must have specific shapes \n",
"\n",
"* 150 x 4 -> the full dataset (feature matrix) \n",
"* 150 x 1 for the target \n",
"* 4 x 1 for the features of a single record \n",
"* We can rearrange any array as needed. For example, np.tile(a, [4, 1]) takes the array a and repeats it 4 times along the first axis and once along the second. Note that np.tile repeats the data; to change only the shape, use reshape. "
]
},
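The last bullet mentions `np.tile(a, [4, 1])`. The sketch below is one way to see what that call does and how it differs from `reshape`. It assumes only NumPy, takes `a` to be the first record shown earlier, and rebuilds the label vector by hand; the variable names are illustrative.

```python
import numpy as np

# The first record from iris.data: a 1-D array of shape (4,).
a = np.array([5.1, 3.5, 1.4, 0.2])

# np.tile repeats the array: 4 copies along the first axis, 1 along the second.
tiled = np.tile(a, [4, 1])
print(tiled.shape)

# reshape keeps the same 4 values and only changes how they are laid out.
column = a.reshape(4, 1)
print(column.shape)

# The same idea turns a 1-D label vector of shape (150,) into a 150 x 1 column.
y = np.repeat([0, 1, 2], 50)
y_column = y.reshape(-1, 1)
print(y.shape, y_column.shape)
```

`np.tile` grows the array by copying it, while `reshape` never changes the number of elements, so `reshape` is usually the one to reach for when only the dimensions need to change.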
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 51,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Store the feature matrix in capital \"X\"; remember f(x)=y? x is the input, y is the output \n",
"X = iris.data\n",
"\n",
"# Store the response vector in \"y\" \n",
"y = iris.target"
|
||
]
|
||
},
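The four rules listed above can be turned into a quick sanity check before `X` and `y` are handed to a model. The following is only a sketch: `check_xy` is a hypothetical helper written for this illustration and not part of scikit-learn, and it follows this notebook's rules literally even where scikit-learn itself may accept other input types. The demo rows are the first setosa, versicolor and virginica records from the output above.

```python
import numpy as np

def check_xy(X, y):
    """Hypothetical helper: check X and y against the notebook's four rules
    and return (n_samples, n_features)."""
    for name, arr in (("X", X), ("y", y)):
        # Rule 3: both must be NumPy arrays.
        if not isinstance(arr, np.ndarray):
            raise TypeError(f"{name} must be a NumPy array")
        # Rule 2: both must be numeric.
        if not np.issubdtype(arr.dtype, np.number):
            raise TypeError(f"{name} must be numeric")
    # Rule 4: X is (n_samples, n_features), y is (n_samples,).
    if X.ndim != 2:
        raise ValueError("X must be 2-D: (n_samples, n_features)")
    if y.ndim != 1:
        raise ValueError("y must be 1-D: (n_samples,)")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of samples")
    return X.shape

# Rule 1: features and response live in two separate objects.
X_demo = np.array([[5.1, 3.5, 1.4, 0.2],
                   [7.0, 3.2, 4.7, 1.4],
                   [6.3, 3.3, 6.0, 2.5]])
y_demo = np.array([0, 1, 2])

print(check_xy(X_demo, y_demo))
```

With the full dataset loaded, the same call on `X` and `y` would be expected to report 150 samples and 4 features.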
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 52,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([[ 5.1, 3.5, 1.4, 0.2],\n",
|
||
" [ 4.9, 3. , 1.4, 0.2],\n",
|
||
" [ 4.7, 3.2, 1.3, 0.2],\n",
|
||
" [ 4.6, 3.1, 1.5, 0.2],\n",
|
||
" [ 5. , 3.6, 1.4, 0.2],\n",
|
||
" [ 5.4, 3.9, 1.7, 0.4],\n",
|
||
" [ 4.6, 3.4, 1.4, 0.3],\n",
|
||
" [ 5. , 3.4, 1.5, 0.2],\n",
|
||
" [ 4.4, 2.9, 1.4, 0.2],\n",
|
||
" [ 4.9, 3.1, 1.5, 0.1],\n",
|
||
" [ 5.4, 3.7, 1.5, 0.2],\n",
|
||
" [ 4.8, 3.4, 1.6, 0.2],\n",
|
||
" [ 4.8, 3. , 1.4, 0.1],\n",
|
||
" [ 4.3, 3. , 1.1, 0.1],\n",
|
||
" [ 5.8, 4. , 1.2, 0.2],\n",
|
||
" [ 5.7, 4.4, 1.5, 0.4],\n",
|
||
" [ 5.4, 3.9, 1.3, 0.4],\n",
|
||
" [ 5.1, 3.5, 1.4, 0.3],\n",
|
||
" [ 5.7, 3.8, 1.7, 0.3],\n",
|
||
" [ 5.1, 3.8, 1.5, 0.3],\n",
|
||
" [ 5.4, 3.4, 1.7, 0.2],\n",
|
||
" [ 5.1, 3.7, 1.5, 0.4],\n",
|
||
" [ 4.6, 3.6, 1. , 0.2],\n",
|
||
" [ 5.1, 3.3, 1.7, 0.5],\n",
|
||
" [ 4.8, 3.4, 1.9, 0.2],\n",
|
||
" [ 5. , 3. , 1.6, 0.2],\n",
|
||
" [ 5. , 3.4, 1.6, 0.4],\n",
|
||
" [ 5.2, 3.5, 1.5, 0.2],\n",
|
||
" [ 5.2, 3.4, 1.4, 0.2],\n",
|
||
" [ 4.7, 3.2, 1.6, 0.2],\n",
|
||
" [ 4.8, 3.1, 1.6, 0.2],\n",
|
||
" [ 5.4, 3.4, 1.5, 0.4],\n",
|
||
" [ 5.2, 4.1, 1.5, 0.1],\n",
|
||
" [ 5.5, 4.2, 1.4, 0.2],\n",
|
||
" [ 4.9, 3.1, 1.5, 0.1],\n",
|
||
" [ 5. , 3.2, 1.2, 0.2],\n",
|
||
" [ 5.5, 3.5, 1.3, 0.2],\n",
|
||
" [ 4.9, 3.1, 1.5, 0.1],\n",
|
||
" [ 4.4, 3. , 1.3, 0.2],\n",
|
||
" [ 5.1, 3.4, 1.5, 0.2],\n",
|
||
" [ 5. , 3.5, 1.3, 0.3],\n",
|
||
" [ 4.5, 2.3, 1.3, 0.3],\n",
|
||
" [ 4.4, 3.2, 1.3, 0.2],\n",
|
||
" [ 5. , 3.5, 1.6, 0.6],\n",
|
||
" [ 5.1, 3.8, 1.9, 0.4],\n",
|
||
" [ 4.8, 3. , 1.4, 0.3],\n",
|
||
" [ 5.1, 3.8, 1.6, 0.2],\n",
|
||
" [ 4.6, 3.2, 1.4, 0.2],\n",
|
||
" [ 5.3, 3.7, 1.5, 0.2],\n",
|
||
" [ 5. , 3.3, 1.4, 0.2],\n",
|
||
" [ 7. , 3.2, 4.7, 1.4],\n",
|
||
" [ 6.4, 3.2, 4.5, 1.5],\n",
|
||
" [ 6.9, 3.1, 4.9, 1.5],\n",
|
||
" [ 5.5, 2.3, 4. , 1.3],\n",
|
||
" [ 6.5, 2.8, 4.6, 1.5],\n",
|
||
" [ 5.7, 2.8, 4.5, 1.3],\n",
|
||
" [ 6.3, 3.3, 4.7, 1.6],\n",
|
||
" [ 4.9, 2.4, 3.3, 1. ],\n",
|
||
" [ 6.6, 2.9, 4.6, 1.3],\n",
|
||
" [ 5.2, 2.7, 3.9, 1.4],\n",
|
||
" [ 5. , 2. , 3.5, 1. ],\n",
|
||
" [ 5.9, 3. , 4.2, 1.5],\n",
|
||
" [ 6. , 2.2, 4. , 1. ],\n",
|
||
" [ 6.1, 2.9, 4.7, 1.4],\n",
|
||
" [ 5.6, 2.9, 3.6, 1.3],\n",
|
||
" [ 6.7, 3.1, 4.4, 1.4],\n",
|
||
" [ 5.6, 3. , 4.5, 1.5],\n",
|
||
" [ 5.8, 2.7, 4.1, 1. ],\n",
|
||
" [ 6.2, 2.2, 4.5, 1.5],\n",
|
||
" [ 5.6, 2.5, 3.9, 1.1],\n",
|
||
" [ 5.9, 3.2, 4.8, 1.8],\n",
|
||
" [ 6.1, 2.8, 4. , 1.3],\n",
|
||
" [ 6.3, 2.5, 4.9, 1.5],\n",
|
||
" [ 6.1, 2.8, 4.7, 1.2],\n",
|
||
" [ 6.4, 2.9, 4.3, 1.3],\n",
|
||
" [ 6.6, 3. , 4.4, 1.4],\n",
|
||
" [ 6.8, 2.8, 4.8, 1.4],\n",
|
||
" [ 6.7, 3. , 5. , 1.7],\n",
|
||
" [ 6. , 2.9, 4.5, 1.5],\n",
|
||
" [ 5.7, 2.6, 3.5, 1. ],\n",
|
||
" [ 5.5, 2.4, 3.8, 1.1],\n",
|
||
" [ 5.5, 2.4, 3.7, 1. ],\n",
|
||
" [ 5.8, 2.7, 3.9, 1.2],\n",
|
||
" [ 6. , 2.7, 5.1, 1.6],\n",
|
||
" [ 5.4, 3. , 4.5, 1.5],\n",
|
||
" [ 6. , 3.4, 4.5, 1.6],\n",
|
||
" [ 6.7, 3.1, 4.7, 1.5],\n",
|
||
" [ 6.3, 2.3, 4.4, 1.3],\n",
|
||
" [ 5.6, 3. , 4.1, 1.3],\n",
|
||
" [ 5.5, 2.5, 4. , 1.3],\n",
|
||
" [ 5.5, 2.6, 4.4, 1.2],\n",
|
||
" [ 6.1, 3. , 4.6, 1.4],\n",
|
||
" [ 5.8, 2.6, 4. , 1.2],\n",
|
||
" [ 5. , 2.3, 3.3, 1. ],\n",
|
||
" [ 5.6, 2.7, 4.2, 1.3],\n",
|
||
" [ 5.7, 3. , 4.2, 1.2],\n",
|
||
" [ 5.7, 2.9, 4.2, 1.3],\n",
|
||
" [ 6.2, 2.9, 4.3, 1.3],\n",
|
||
" [ 5.1, 2.5, 3. , 1.1],\n",
|
||
" [ 5.7, 2.8, 4.1, 1.3],\n",
|
||
" [ 6.3, 3.3, 6. , 2.5],\n",
|
||
" [ 5.8, 2.7, 5.1, 1.9],\n",
|
||
" [ 7.1, 3. , 5.9, 2.1],\n",
|
||
" [ 6.3, 2.9, 5.6, 1.8],\n",
|
||
" [ 6.5, 3. , 5.8, 2.2],\n",
|
||
" [ 7.6, 3. , 6.6, 2.1],\n",
|
||
" [ 4.9, 2.5, 4.5, 1.7],\n",
|
||
" [ 7.3, 2.9, 6.3, 1.8],\n",
|
||
" [ 6.7, 2.5, 5.8, 1.8],\n",
|
||
" [ 7.2, 3.6, 6.1, 2.5],\n",
|
||
" [ 6.5, 3.2, 5.1, 2. ],\n",
|
||
" [ 6.4, 2.7, 5.3, 1.9],\n",
|
||
" [ 6.8, 3. , 5.5, 2.1],\n",
|
||
" [ 5.7, 2.5, 5. , 2. ],\n",
|
||
" [ 5.8, 2.8, 5.1, 2.4],\n",
|
||
" [ 6.4, 3.2, 5.3, 2.3],\n",
|
||
" [ 6.5, 3. , 5.5, 1.8],\n",
|
||
" [ 7.7, 3.8, 6.7, 2.2],\n",
|
||
" [ 7.7, 2.6, 6.9, 2.3],\n",
|
||
" [ 6. , 2.2, 5. , 1.5],\n",
|
||
" [ 6.9, 3.2, 5.7, 2.3],\n",
|
||
" [ 5.6, 2.8, 4.9, 2. ],\n",
|
||
" [ 7.7, 2.8, 6.7, 2. ],\n",
|
||
" [ 6.3, 2.7, 4.9, 1.8],\n",
|
||
" [ 6.7, 3.3, 5.7, 2.1],\n",
|
||
" [ 7.2, 3.2, 6. , 1.8],\n",
|
||
" [ 6.2, 2.8, 4.8, 1.8],\n",
|
||
" [ 6.1, 3. , 4.9, 1.8],\n",
|
||
" [ 6.4, 2.8, 5.6, 2.1],\n",
|
||
" [ 7.2, 3. , 5.8, 1.6],\n",
|
||
" [ 7.4, 2.8, 6.1, 1.9],\n",
|
||
" [ 7.9, 3.8, 6.4, 2. ],\n",
|
||
" [ 6.4, 2.8, 5.6, 2.2],\n",
|
||
" [ 6.3, 2.8, 5.1, 1.5],\n",
|
||
" [ 6.1, 2.6, 5.6, 1.4],\n",
|
||
" [ 7.7, 3. , 6.1, 2.3],\n",
|
||
" [ 6.3, 3.4, 5.6, 2.4],\n",
|
||
" [ 6.4, 3.1, 5.5, 1.8],\n",
|
||
" [ 6. , 3. , 4.8, 1.8],\n",
|
||
" [ 6.9, 3.1, 5.4, 2.1],\n",
|
||
" [ 6.7, 3.1, 5.6, 2.4],\n",
|
||
" [ 6.9, 3.1, 5.1, 2.3],\n",
|
||
" [ 5.8, 2.7, 5.1, 1.9],\n",
|
||
" [ 6.8, 3.2, 5.9, 2.3],\n",
|
||
" [ 6.7, 3.3, 5.7, 2.5],\n",
|
||
" [ 6.7, 3. , 5.2, 2.3],\n",
|
||
" [ 6.3, 2.5, 5. , 1.9],\n",
|
||
" [ 6.5, 3. , 5.2, 2. ],\n",
|
||
" [ 6.2, 3.4, 5.4, 2.3],\n",
|
||
" [ 5.9, 3. , 5.1, 1.8]])"
|
||
]
|
||
},
|
||
"execution_count": 52,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"X"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 53,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
|
||
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
|
||
" 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
|
||
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
|
||
" 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
|
||
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
|
||
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])"
|
||
]
|
||
},
|
||
"execution_count": 53,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"y"
|
||
]
|
||
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}