Iris-Species

This commit is contained in:
Md Imam Ahasan 2021-11-05 02:15:02 +06:00 committed by GitHub
parent 5b3c00d462
commit 1fe550b7cd
14 changed files with 8045 additions and 0 deletions

@ -0,0 +1,450 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# First Model and Prediction in Code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The complete Jupyter notebook is available at this link: \n",
"https://github.com/raqueeb/ml-python/blob/master/1st-model.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The complete code for the estimator workflow is here. If anything is unclear, go back to the \"Steps of the Estimator Workflow\" chapter."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember these steps; you will need them every time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's import a few libraries."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import the datasets module."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn import datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the iris dataset."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"iris = datasets.load_iris()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Store the feature matrix in capital \"X\" and the response vector in \"y\"."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X = iris.data\n",
"y = iris.target"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import the classifier."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.neighbors import KNeighborsClassifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our number of neighbors is 1."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"knn = KNeighborsClassifier(n_neighbors=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Train the first model."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
" metric_params=None, n_jobs=1, n_neighbors=1, p=2,\n",
" weights='uniform')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"knn.fit(X, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first prediction, using data from outside our dataset."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"knn.predict([[3, 5, 4, 2]])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Predicted target name: ['virginica']\n"
]
}
],
"source": [
"print(\"Predicted target name:\",\n",
" iris['target_names'][knn.predict([[3, 5, 4, 2]])])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or we can do it this way; build it however you like."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X_new.shape: (1, 4)\n"
]
}
],
"source": [
"X_new = np.array([[3, 5, 4, 2]])\n",
"print(\"X_new.shape:\", X_new.shape)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mypredict = knn.predict(X_new)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Prediction: [2]\n",
"Predicted target name: ['virginica']\n"
]
}
],
"source": [
"print(\"Prediction:\", mypredict)\n",
"print(\"Predicted target name:\",\n",
" iris['target_names'][mypredict])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Are the steps clear so far? Now, what would we do if we had two \"out-of-sample\" observations to work with?"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 1])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]\n",
"knn.predict(X_new)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we used the \"K-nearest neighbors\" classifier. What if we want 3 neighbors? Then, in the same setup as before, we change the parameter to n_neighbors=3."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"knn = KNeighborsClassifier(n_neighbors=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fit the model to the data."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
" metric_params=None, n_jobs=1, n_neighbors=3, p=2,\n",
" weights='uniform')"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"knn.fit(X, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Predict the same values as before."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 1])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"knn.predict(X_new)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See what happened? The predicted values changed. It would certainly help to know which value of n_neighbors makes the model work best. Finding that out is what this whole story is about."
]
},
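{
"cell_type": "markdown",
"metadata": {},
"source": [
"One rough way to check, sketched here under assumptions that are not part of the original notebook: hold out part of the data with train_test_split and compare the test accuracy for a few values of n_neighbors. The 30% test size, random_state=42, the candidate values (1, 3, 5, 7), and the names iris_demo, X_train, X_test, y_train, y_test, and knn_k are all illustrative choices. Separate names are used so that X, y, knn, and X_new from the cells above stay untouched."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"# Hold out 30% of the rows; stratify keeps the three species balanced\n",
"iris_demo = load_iris()\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
"    iris_demo.data, iris_demo.target,\n",
"    test_size=0.3, random_state=42, stratify=iris_demo.target)\n",
"\n",
"# Fit one model per candidate k and report accuracy on the held-out rows\n",
"for k in (1, 3, 5, 7):\n",
"    knn_k = KNeighborsClassifier(n_neighbors=k)\n",
"    knn_k.fit(X_train, y_train)\n",
"    print(\"n_neighbors =\", k, \"test accuracy =\", knn_k.score(X_test, y_test))"
]
},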
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How about trying a different classifier? \"Linear regression\" works on continuous values (for example, our age or salary). Logistic regression, by contrast, handles categories (yes or no, or three kinds of flower), so it can be used here. We will do the same thing as before. As the model we will use LogisticRegression, imported from the sklearn.linear_model module. We create an instance of the LogisticRegression() class and assign it to the lr object; name this object whatever you like."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 0])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"lr = LogisticRegression()\n",
"lr.fit(X, y)\n",
"lr.predict(X_new)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See what happened? The prediction changed again. This means the classifiers we used work in quite different ways, which is why their outcomes differ. Which classifier suits which task will become clear gradually."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long

@ -0,0 +1,182 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Age</th>\n",
" <th>Location</th>\n",
" <th>Name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>24</td>\n",
" <td>রাজশাহী</td>\n",
" <td>জসিম</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>13</td>\n",
" <td>ঢাকা</td>\n",
" <td>করিম</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>53</td>\n",
" <td>রংপুর</td>\n",
" <td>মিতা</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33</td>\n",
" <td>কুষ্টিয়া</td>\n",
" <td>অন্তরা</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Age Location Name\n",
"0 24 রাজশাহী জসিম\n",
"1 13 ঢাকা করিম\n",
"2 53 রংপুর মিতা\n",
"3 33 কুষ্টিয়া অন্তরা"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# create a simple dataset of people\n",
"data = {'Name': [\"জসিম\", \"করিম\", \"মিতা\", \"অন্তরা\"],\n",
" 'Location' : [\"রাজশাহী\", \"ঢাকা\", \"রংপুর\", \"কুষ্টিয়া\"],\n",
" 'Age' : [24, 13, 53, 33]\n",
" }\n",
"\n",
"frame = pd.DataFrame(data)\n",
"# A DataFrame is a pleasure to look at: it reads just like an Excel sheet\n",
"display(frame)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's run a quick query."
]
},
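{
"cell_type": "markdown",
"metadata": {},
"source": [
"The filter in the next cell relies on boolean indexing. As a small self-contained illustration (the DataFrame ages and the name age_mask are hypothetical stand-ins, not part of the original notebook), a comparison such as ages.Age > 30 first produces a boolean Series with one True/False value per row, and indexing with that Series keeps only the True rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for the DataFrame built above, with the same ages\n",
"ages = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'], 'Age': [24, 13, 53, 33]})\n",
"\n",
"# The comparison yields a boolean Series, one value per row\n",
"age_mask = ages.Age > 30\n",
"print(age_mask)\n",
"\n",
"# Indexing with the mask keeps only the rows where it is True\n",
"print(ages[age_mask])\n",
"\n",
"# Conditions combine with & (and) and | (or); the parentheses are required\n",
"print(ages[(ages.Age > 30) & (ages.Age < 50)])"
]
},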
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Age</th>\n",
" <th>Location</th>\n",
" <th>Name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>53</td>\n",
" <td>রংপুর</td>\n",
" <td>মিতা</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33</td>\n",
" <td>কুষ্টিয়া</td>\n",
" <td>অন্তরা</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Age Location Name\n",
"2 53 রংপুর মিতা\n",
"3 33 কুষ্টিয়া অন্তরা"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(frame[frame.Age > 30])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,202 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "dimension-feature.ipynb",
"version": "0.3.2",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/raqueeb/ml-python/blob/master/dimension_feature.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"metadata": {
"id": "JUpH47tG97Kb",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Dimensionality Reduction, Feature Selection, Feature Importance\n",
"\n",
"In real-world scenarios, any machine learning problem gets harder when the number of features grows large. Having many features is not the problem in itself; the problem is that not every feature improves the model's performance. Training a model on so many features is also very time-consuming. And if we train the model but do not get the outcome we expected, the whole model is in trouble. This problem is called \"the curse of dimensionality\", that is, the danger of too many dimensions. That is why we need the features that contribute the most to model performance. \n",
"\n",
"How do we get out of this problem? Put simply: feature selection and feature importance. I am not going into other approaches, because this is a book of basic concepts. If we can select the useful features properly according to their importance, much of the trouble goes away. Take the iris dataset, for example, with its four features. (In real-world problems, working with millions of features is becoming commonplace.) Let's see which of these four features actually boost our model's performance. \n",
"\n",
"Remember the picture of the decision tree? The important features tend to sit near the root node of a decision tree. At depths 0, 1, and 2, petal length dominates. Averaging over the depths makes this clear. Alongside that, let's see what the feature_importances_ attribute does. \n"
]
},
{
"metadata": {
"id": "lrdRtIsFE99V",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "5987f195-22c2-4749-c2ed-286316a9adc6"
},
"cell_type": "code",
"source": [
"from sklearn.datasets import load_iris\n",
"iris = load_iris()\n",
"X, y = iris.data, iris.target\n",
"X.shape"
],
"execution_count": 1,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(150, 4)"
]
},
"metadata": {
"tags": []
},
"execution_count": 1
}
]
},
{
"metadata": {
"id": "dpVHgr88GqgG",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "56f2bf03-b719-4a67-b42e-c89aecd440fe"
},
"cell_type": "code",
"source": [
"from sklearn.tree import DecisionTreeClassifier\n",
"tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)\n",
"tree_clf.fit(X, y)\n",
"tree_clf.feature_importances_ "
],
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([0. , 0. , 0.56199095, 0.43800905])"
]
},
"metadata": {
"tags": []
},
"execution_count": 2
}
]
},
{
"metadata": {
"id": "FkREKL0ZICu_",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Clearly, two of the four features are important. Can we find out which two?"
]
},
{
"metadata": {
"id": "G9h12FEAIkYL",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 85
},
"outputId": "bec34dbe-ddce-4d37-999c-c99e28ede257"
},
"cell_type": "code",
"source": [
"for name, score in zip(iris[\"feature_names\"], tree_clf.feature_importances_):\n",
" print(name, score)"
],
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": [
"sepal length (cm) 0.0\n",
"sepal width (cm) 0.0\n",
"petal length (cm) 0.5619909502262443\n",
"petal width (cm) 0.4380090497737556\n"
],
"name": "stdout"
}
]
},
{
"metadata": {
"id": "uIV6_W3uIvDI",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The last two. Of these, petal length has the higher score."
]
},
{
"metadata": {
"id": "bTdaApw0HuFR",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "4a391bdd-a3b3-4e88-ea2a-5a3f451a493f"
},
"cell_type": "code",
"source": [
"from sklearn.feature_selection import SelectFromModel\n",
"model = SelectFromModel(tree_clf, prefit=True)\n",
"X_new = model.transform(X)\n",
"X_new.shape "
],
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(150, 2)"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
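{
"metadata": {},
"cell_type": "markdown",
"source": [
"To check which columns SelectFromModel kept, one option is its get_support() method, which returns a boolean mask over the input features. The sketch below refits its own tree so that it runs by itself; the names iris_demo, tree_demo, and selector are illustrative and not part of the original notebook. When no threshold is passed, a tree-based estimator is cut at the mean importance, which is 0.25 for four features, so the two sepal features with zero importance fall below it."
]
},
{
"metadata": {},
"cell_type": "code",
"source": [
"from sklearn.datasets import load_iris\n",
"from sklearn.feature_selection import SelectFromModel\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Refit the same kind of tree under separate, illustrative names\n",
"iris_demo = load_iris()\n",
"tree_demo = DecisionTreeClassifier(max_depth=2, random_state=42)\n",
"tree_demo.fit(iris_demo.data, iris_demo.target)\n",
"\n",
"# prefit=True reuses the already fitted tree instead of fitting again\n",
"selector = SelectFromModel(tree_demo, prefit=True)\n",
"\n",
"# get_support() returns True for each feature that survives the threshold\n",
"for name, keep in zip(iris_demo.feature_names, selector.get_support()):\n",
"    print(name, \"kept\" if keep else \"dropped\")"
],
"execution_count": null,
"outputs": []
},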
{
"metadata": {
"id": "MaXc85SeJdw8",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"In the end, only two of the four are the features we really need. We would not normally reduce a dataset that has just four features; we tried it here only as an experiment."
]
}
]
}

@ -0,0 +1,832 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory Data Analysis \n",
"Revision"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we do not know what is actually inside our data, how can we get predictions out of it? That is the reason for this exploration. Digging into the data a little gives us many insights that help with model selection and with understanding the features. We did some \"exploratory data analysis\" in the previous chapter, but here we spell it out a bit more."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Shape of the data: how many instances?"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"n_samples, n_features = iris.data.shape"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"150"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"n_samples"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"n_features"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of data: (150, 4)\n"
]
}
],
"source": [
"print(\"Shape of data:\", iris['data'].shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No data is missing: the target has exactly one label per sample."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(iris.target) == n_samples"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"<img src=\"assets/data5.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature names\n",
"\n",
"We saw the names of the four features in the picture above. Let's look at them in our dataset object. Using dot notation after iris, we access one of its keys. feature_names is one of the keys listed by iris.keys(), and it is also available as an attribute."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['sepal length (cm)',\n",
" 'sepal width (cm)',\n",
" 'petal length (cm)',\n",
" 'petal width (cm)']"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.feature_names"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n"
]
}
],
"source": [
"print(iris['feature_names'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The target: what do we want to predict?\n",
"\n",
"There are many ways to display it, but print gives cleaner formatting."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['setosa', 'versicolor', 'virginica'],\n",
" dtype='<U10')"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.target_names"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['setosa' 'versicolor' 'virginica']\n"
]
}
],
"source": [
"print(iris.target_names)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['setosa', 'versicolor', 'virginica']"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(iris.target_names)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Target names: ['setosa' 'versicolor' 'virginica']\n"
]
}
],
"source": [
"print(\"Target names:\", iris['target_names'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is inside the data array and the target array?\n",
"\n",
"We are working with arrays here. In iris.data the four measurements (1. sepal length, 2. sepal width, 3. petal length, 4. petal width) are stored side by side. Let's look at the first record first, and then at all the records."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 5.1, 3.5, 1.4, 0.2])"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.data[0]"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 5.1, 3.5, 1.4, 0.2],\n",
" [ 4.9, 3. , 1.4, 0.2],\n",
" [ 4.7, 3.2, 1.3, 0.2],\n",
" [ 4.6, 3.1, 1.5, 0.2],\n",
" [ 5. , 3.6, 1.4, 0.2],\n",
" [ 5.4, 3.9, 1.7, 0.4],\n",
" [ 4.6, 3.4, 1.4, 0.3],\n",
" [ 5. , 3.4, 1.5, 0.2],\n",
" [ 4.4, 2.9, 1.4, 0.2],\n",
" [ 4.9, 3.1, 1.5, 0.1],\n",
" [ 5.4, 3.7, 1.5, 0.2],\n",
" [ 4.8, 3.4, 1.6, 0.2],\n",
" [ 4.8, 3. , 1.4, 0.1],\n",
" [ 4.3, 3. , 1.1, 0.1],\n",
" [ 5.8, 4. , 1.2, 0.2],\n",
" [ 5.7, 4.4, 1.5, 0.4],\n",
" [ 5.4, 3.9, 1.3, 0.4],\n",
" [ 5.1, 3.5, 1.4, 0.3],\n",
" [ 5.7, 3.8, 1.7, 0.3],\n",
" [ 5.1, 3.8, 1.5, 0.3],\n",
" [ 5.4, 3.4, 1.7, 0.2],\n",
" [ 5.1, 3.7, 1.5, 0.4],\n",
" [ 4.6, 3.6, 1. , 0.2],\n",
" [ 5.1, 3.3, 1.7, 0.5],\n",
" [ 4.8, 3.4, 1.9, 0.2],\n",
" [ 5. , 3. , 1.6, 0.2],\n",
" [ 5. , 3.4, 1.6, 0.4],\n",
" [ 5.2, 3.5, 1.5, 0.2],\n",
" [ 5.2, 3.4, 1.4, 0.2],\n",
" [ 4.7, 3.2, 1.6, 0.2],\n",
" [ 4.8, 3.1, 1.6, 0.2],\n",
" [ 5.4, 3.4, 1.5, 0.4],\n",
" [ 5.2, 4.1, 1.5, 0.1],\n",
" [ 5.5, 4.2, 1.4, 0.2],\n",
" [ 4.9, 3.1, 1.5, 0.1],\n",
" [ 5. , 3.2, 1.2, 0.2],\n",
" [ 5.5, 3.5, 1.3, 0.2],\n",
" [ 4.9, 3.1, 1.5, 0.1],\n",
" [ 4.4, 3. , 1.3, 0.2],\n",
" [ 5.1, 3.4, 1.5, 0.2],\n",
" [ 5. , 3.5, 1.3, 0.3],\n",
" [ 4.5, 2.3, 1.3, 0.3],\n",
" [ 4.4, 3.2, 1.3, 0.2],\n",
" [ 5. , 3.5, 1.6, 0.6],\n",
" [ 5.1, 3.8, 1.9, 0.4],\n",
" [ 4.8, 3. , 1.4, 0.3],\n",
" [ 5.1, 3.8, 1.6, 0.2],\n",
" [ 4.6, 3.2, 1.4, 0.2],\n",
" [ 5.3, 3.7, 1.5, 0.2],\n",
" [ 5. , 3.3, 1.4, 0.2],\n",
" [ 7. , 3.2, 4.7, 1.4],\n",
" [ 6.4, 3.2, 4.5, 1.5],\n",
" [ 6.9, 3.1, 4.9, 1.5],\n",
" [ 5.5, 2.3, 4. , 1.3],\n",
" [ 6.5, 2.8, 4.6, 1.5],\n",
" [ 5.7, 2.8, 4.5, 1.3],\n",
" [ 6.3, 3.3, 4.7, 1.6],\n",
" [ 4.9, 2.4, 3.3, 1. ],\n",
" [ 6.6, 2.9, 4.6, 1.3],\n",
" [ 5.2, 2.7, 3.9, 1.4],\n",
" [ 5. , 2. , 3.5, 1. ],\n",
" [ 5.9, 3. , 4.2, 1.5],\n",
" [ 6. , 2.2, 4. , 1. ],\n",
" [ 6.1, 2.9, 4.7, 1.4],\n",
" [ 5.6, 2.9, 3.6, 1.3],\n",
" [ 6.7, 3.1, 4.4, 1.4],\n",
" [ 5.6, 3. , 4.5, 1.5],\n",
" [ 5.8, 2.7, 4.1, 1. ],\n",
" [ 6.2, 2.2, 4.5, 1.5],\n",
" [ 5.6, 2.5, 3.9, 1.1],\n",
" [ 5.9, 3.2, 4.8, 1.8],\n",
" [ 6.1, 2.8, 4. , 1.3],\n",
" [ 6.3, 2.5, 4.9, 1.5],\n",
" [ 6.1, 2.8, 4.7, 1.2],\n",
" [ 6.4, 2.9, 4.3, 1.3],\n",
" [ 6.6, 3. , 4.4, 1.4],\n",
" [ 6.8, 2.8, 4.8, 1.4],\n",
" [ 6.7, 3. , 5. , 1.7],\n",
" [ 6. , 2.9, 4.5, 1.5],\n",
" [ 5.7, 2.6, 3.5, 1. ],\n",
" [ 5.5, 2.4, 3.8, 1.1],\n",
" [ 5.5, 2.4, 3.7, 1. ],\n",
" [ 5.8, 2.7, 3.9, 1.2],\n",
" [ 6. , 2.7, 5.1, 1.6],\n",
" [ 5.4, 3. , 4.5, 1.5],\n",
" [ 6. , 3.4, 4.5, 1.6],\n",
" [ 6.7, 3.1, 4.7, 1.5],\n",
" [ 6.3, 2.3, 4.4, 1.3],\n",
" [ 5.6, 3. , 4.1, 1.3],\n",
" [ 5.5, 2.5, 4. , 1.3],\n",
" [ 5.5, 2.6, 4.4, 1.2],\n",
" [ 6.1, 3. , 4.6, 1.4],\n",
" [ 5.8, 2.6, 4. , 1.2],\n",
" [ 5. , 2.3, 3.3, 1. ],\n",
" [ 5.6, 2.7, 4.2, 1.3],\n",
" [ 5.7, 3. , 4.2, 1.2],\n",
" [ 5.7, 2.9, 4.2, 1.3],\n",
" [ 6.2, 2.9, 4.3, 1.3],\n",
" [ 5.1, 2.5, 3. , 1.1],\n",
" [ 5.7, 2.8, 4.1, 1.3],\n",
" [ 6.3, 3.3, 6. , 2.5],\n",
" [ 5.8, 2.7, 5.1, 1.9],\n",
" [ 7.1, 3. , 5.9, 2.1],\n",
" [ 6.3, 2.9, 5.6, 1.8],\n",
" [ 6.5, 3. , 5.8, 2.2],\n",
" [ 7.6, 3. , 6.6, 2.1],\n",
" [ 4.9, 2.5, 4.5, 1.7],\n",
" [ 7.3, 2.9, 6.3, 1.8],\n",
" [ 6.7, 2.5, 5.8, 1.8],\n",
" [ 7.2, 3.6, 6.1, 2.5],\n",
" [ 6.5, 3.2, 5.1, 2. ],\n",
" [ 6.4, 2.7, 5.3, 1.9],\n",
" [ 6.8, 3. , 5.5, 2.1],\n",
" [ 5.7, 2.5, 5. , 2. ],\n",
" [ 5.8, 2.8, 5.1, 2.4],\n",
" [ 6.4, 3.2, 5.3, 2.3],\n",
" [ 6.5, 3. , 5.5, 1.8],\n",
" [ 7.7, 3.8, 6.7, 2.2],\n",
" [ 7.7, 2.6, 6.9, 2.3],\n",
" [ 6. , 2.2, 5. , 1.5],\n",
" [ 6.9, 3.2, 5.7, 2.3],\n",
" [ 5.6, 2.8, 4.9, 2. ],\n",
" [ 7.7, 2.8, 6.7, 2. ],\n",
" [ 6.3, 2.7, 4.9, 1.8],\n",
" [ 6.7, 3.3, 5.7, 2.1],\n",
" [ 7.2, 3.2, 6. , 1.8],\n",
" [ 6.2, 2.8, 4.8, 1.8],\n",
" [ 6.1, 3. , 4.9, 1.8],\n",
" [ 6.4, 2.8, 5.6, 2.1],\n",
" [ 7.2, 3. , 5.8, 1.6],\n",
" [ 7.4, 2.8, 6.1, 1.9],\n",
" [ 7.9, 3.8, 6.4, 2. ],\n",
" [ 6.4, 2.8, 5.6, 2.2],\n",
" [ 6.3, 2.8, 5.1, 1.5],\n",
" [ 6.1, 2.6, 5.6, 1.4],\n",
" [ 7.7, 3. , 6.1, 2.3],\n",
" [ 6.3, 3.4, 5.6, 2.4],\n",
" [ 6.4, 3.1, 5.5, 1.8],\n",
" [ 6. , 3. , 4.8, 1.8],\n",
" [ 6.9, 3.1, 5.4, 2.1],\n",
" [ 6.7, 3.1, 5.6, 2.4],\n",
" [ 6.9, 3.1, 5.1, 2.3],\n",
" [ 5.8, 2.7, 5.1, 1.9],\n",
" [ 6.8, 3.2, 5.9, 2.3],\n",
" [ 6.7, 3.3, 5.7, 2.5],\n",
" [ 6.7, 3. , 5.2, 2.3],\n",
" [ 6.3, 2.5, 5. , 1.9],\n",
" [ 6.5, 3. , 5.2, 2. ],\n",
" [ 6.2, 3.4, 5.4, 2.3],\n",
" [ 5.9, 3. , 5.1, 1.8]])"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.data"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.target"
]
},
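{
"cell_type": "markdown",
"metadata": {},
"source": [
"Raw arrays are hard to read at a glance. As an optional sketch (pandas is not used in the original cells of this notebook, and the names iris_demo and iris_df are illustrative), the same data can be wrapped in a DataFrame to get per-feature summary statistics and the number of rows per species."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.datasets import load_iris\n",
"\n",
"iris_demo = load_iris()\n",
"\n",
"# One column per feature, named after iris.feature_names\n",
"iris_df = pd.DataFrame(iris_demo.data, columns=iris_demo.feature_names)\n",
"\n",
"# Map the integer labels 0/1/2 to their species names\n",
"iris_df[\"species\"] = iris_demo.target_names[iris_demo.target]\n",
"\n",
"# Count, mean, spread, and quartiles for every numeric column\n",
"print(iris_df.describe())\n",
"\n",
"# Number of rows for each species\n",
"print(iris_df[\"species\"].value_counts())"
]
},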
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we ask what kind of container our \"features\" and \"response\" (that is, the \"target\") are stored in. You guessed it: NumPy arrays."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'numpy.ndarray'>\n",
"<class 'numpy.ndarray'>\n"
]
}
],
"source": [
"print(type(iris.data))\n",
"print(type(iris.target))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the shape of the feature matrix? (1st dimension = number of observations, 2nd = number of features)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(150, 4)\n"
]
}
],
"source": [
"print(iris.data.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the shape of the target vector? (its only dimension = the labels, i.e. the target or response)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(150,)\n"
]
}
],
"source": [
"print(iris.target.shape)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of target: (150,)\n"
]
}
],
"source": [
"print(\"Shape of target:\", iris['target'].shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Rules for handling data in scikit-learn \n",
"\n",
"1. \"Features\" and \"response\" are two separate objects \n",
"(as you can see here, the \"features\" and the \"response\", i.e. the \"target\", are separate objects)\n",
"\n",
"2. Both \"features\" and \"response\" must be numeric \n",
"(here both are numeric; their dimensions are (150 x 4) and (150 x 1), the latter stored as a 1-D array of shape (150,))\n",
"\n",
"3. Both \"features\" and \"response\" must be NumPy arrays. \n",
"(here both objects are already NumPy arrays; any other dataset we need must also be loaded into NumPy arrays)\n",
"\n",
"4. Both \"features\" and \"response\" must have a specific shape \n",
"\n",
"* 150 x 4 -> the whole dataset \n",
"* 150 x 1 for the target \n",
"* 4 x 1 for the features \n",
"* We can resize any matrix to suit our needs. For example, np.tile(a, [4, 1]) takes the matrix a and repeats it 4 times along the first dimension and once along the second. "
]
},
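{
"cell_type": "markdown",
"metadata": {},
"source": [
"The shape rules above can be checked directly. This is a minimal sketch with illustrative names (iris_demo, target_column, a) that are not part of the original notebook: the feature matrix is 2-D, the target as loaded is a 1-D vector of shape (150,), reshape(-1, 1) turns it into an explicit 150 x 1 column, and np.tile repeats an array along each axis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_iris\n",
"\n",
"iris_demo = load_iris()\n",
"print(iris_demo.data.shape)    # (150, 4): observations x features\n",
"print(iris_demo.target.shape)  # (150,): one label per observation\n",
"\n",
"# Turn the 1-D target into an explicit 150 x 1 column\n",
"target_column = iris_demo.target.reshape(-1, 1)\n",
"print(target_column.shape)     # (150, 1)\n",
"\n",
"# np.tile(a, [4, 1]) repeats a 4 times along rows and once along columns\n",
"a = np.array([[1, 2, 3]])\n",
"print(np.tile(a, [4, 1]).shape)  # (4, 3)"
]
},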
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Store the feature matrix in capital \"X\"; remember f(x) = y? x is the input, y is the output\n",
"X = iris.data\n",
"\n",
"# Store the response vector in \"y\"\n",
"y = iris.target"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 5.1, 3.5, 1.4, 0.2],\n",
" [ 4.9, 3. , 1.4, 0.2],\n",
" [ 4.7, 3.2, 1.3, 0.2],\n",
" [ 4.6, 3.1, 1.5, 0.2],\n",
" [ 5. , 3.6, 1.4, 0.2],\n",
" [ 5.4, 3.9, 1.7, 0.4],\n",
" [ 4.6, 3.4, 1.4, 0.3],\n",
" [ 5. , 3.4, 1.5, 0.2],\n",
" [ 4.4, 2.9, 1.4, 0.2],\n",
" [ 4.9, 3.1, 1.5, 0.1],\n",
" [ 5.4, 3.7, 1.5, 0.2],\n",
" [ 4.8, 3.4, 1.6, 0.2],\n",
" [ 4.8, 3. , 1.4, 0.1],\n",
" [ 4.3, 3. , 1.1, 0.1],\n",
" [ 5.8, 4. , 1.2, 0.2],\n",
" [ 5.7, 4.4, 1.5, 0.4],\n",
" [ 5.4, 3.9, 1.3, 0.4],\n",
" [ 5.1, 3.5, 1.4, 0.3],\n",
" [ 5.7, 3.8, 1.7, 0.3],\n",
" [ 5.1, 3.8, 1.5, 0.3],\n",
" [ 5.4, 3.4, 1.7, 0.2],\n",
" [ 5.1, 3.7, 1.5, 0.4],\n",
" [ 4.6, 3.6, 1. , 0.2],\n",
" [ 5.1, 3.3, 1.7, 0.5],\n",
" [ 4.8, 3.4, 1.9, 0.2],\n",
" [ 5. , 3. , 1.6, 0.2],\n",
" [ 5. , 3.4, 1.6, 0.4],\n",
" [ 5.2, 3.5, 1.5, 0.2],\n",
" [ 5.2, 3.4, 1.4, 0.2],\n",
" [ 4.7, 3.2, 1.6, 0.2],\n",
" [ 4.8, 3.1, 1.6, 0.2],\n",
" [ 5.4, 3.4, 1.5, 0.4],\n",
" [ 5.2, 4.1, 1.5, 0.1],\n",
" [ 5.5, 4.2, 1.4, 0.2],\n",
" [ 4.9, 3.1, 1.5, 0.1],\n",
" [ 5. , 3.2, 1.2, 0.2],\n",
" [ 5.5, 3.5, 1.3, 0.2],\n",
" [ 4.9, 3.1, 1.5, 0.1],\n",
" [ 4.4, 3. , 1.3, 0.2],\n",
" [ 5.1, 3.4, 1.5, 0.2],\n",
" [ 5. , 3.5, 1.3, 0.3],\n",
" [ 4.5, 2.3, 1.3, 0.3],\n",
" [ 4.4, 3.2, 1.3, 0.2],\n",
" [ 5. , 3.5, 1.6, 0.6],\n",
" [ 5.1, 3.8, 1.9, 0.4],\n",
" [ 4.8, 3. , 1.4, 0.3],\n",
" [ 5.1, 3.8, 1.6, 0.2],\n",
" [ 4.6, 3.2, 1.4, 0.2],\n",
" [ 5.3, 3.7, 1.5, 0.2],\n",
" [ 5. , 3.3, 1.4, 0.2],\n",
" [ 7. , 3.2, 4.7, 1.4],\n",
" [ 6.4, 3.2, 4.5, 1.5],\n",
" [ 6.9, 3.1, 4.9, 1.5],\n",
" [ 5.5, 2.3, 4. , 1.3],\n",
" [ 6.5, 2.8, 4.6, 1.5],\n",
" [ 5.7, 2.8, 4.5, 1.3],\n",
" [ 6.3, 3.3, 4.7, 1.6],\n",
" [ 4.9, 2.4, 3.3, 1. ],\n",
" [ 6.6, 2.9, 4.6, 1.3],\n",
" [ 5.2, 2.7, 3.9, 1.4],\n",
" [ 5. , 2. , 3.5, 1. ],\n",
" [ 5.9, 3. , 4.2, 1.5],\n",
" [ 6. , 2.2, 4. , 1. ],\n",
" [ 6.1, 2.9, 4.7, 1.4],\n",
" [ 5.6, 2.9, 3.6, 1.3],\n",
" [ 6.7, 3.1, 4.4, 1.4],\n",
" [ 5.6, 3. , 4.5, 1.5],\n",
" [ 5.8, 2.7, 4.1, 1. ],\n",
" [ 6.2, 2.2, 4.5, 1.5],\n",
" [ 5.6, 2.5, 3.9, 1.1],\n",
" [ 5.9, 3.2, 4.8, 1.8],\n",
" [ 6.1, 2.8, 4. , 1.3],\n",
" [ 6.3, 2.5, 4.9, 1.5],\n",
" [ 6.1, 2.8, 4.7, 1.2],\n",
" [ 6.4, 2.9, 4.3, 1.3],\n",
" [ 6.6, 3. , 4.4, 1.4],\n",
" [ 6.8, 2.8, 4.8, 1.4],\n",
" [ 6.7, 3. , 5. , 1.7],\n",
" [ 6. , 2.9, 4.5, 1.5],\n",
" [ 5.7, 2.6, 3.5, 1. ],\n",
" [ 5.5, 2.4, 3.8, 1.1],\n",
" [ 5.5, 2.4, 3.7, 1. ],\n",
" [ 5.8, 2.7, 3.9, 1.2],\n",
" [ 6. , 2.7, 5.1, 1.6],\n",
" [ 5.4, 3. , 4.5, 1.5],\n",
" [ 6. , 3.4, 4.5, 1.6],\n",
" [ 6.7, 3.1, 4.7, 1.5],\n",
" [ 6.3, 2.3, 4.4, 1.3],\n",
" [ 5.6, 3. , 4.1, 1.3],\n",
" [ 5.5, 2.5, 4. , 1.3],\n",
" [ 5.5, 2.6, 4.4, 1.2],\n",
" [ 6.1, 3. , 4.6, 1.4],\n",
" [ 5.8, 2.6, 4. , 1.2],\n",
" [ 5. , 2.3, 3.3, 1. ],\n",
" [ 5.6, 2.7, 4.2, 1.3],\n",
" [ 5.7, 3. , 4.2, 1.2],\n",
" [ 5.7, 2.9, 4.2, 1.3],\n",
" [ 6.2, 2.9, 4.3, 1.3],\n",
" [ 5.1, 2.5, 3. , 1.1],\n",
" [ 5.7, 2.8, 4.1, 1.3],\n",
" [ 6.3, 3.3, 6. , 2.5],\n",
" [ 5.8, 2.7, 5.1, 1.9],\n",
" [ 7.1, 3. , 5.9, 2.1],\n",
" [ 6.3, 2.9, 5.6, 1.8],\n",
" [ 6.5, 3. , 5.8, 2.2],\n",
" [ 7.6, 3. , 6.6, 2.1],\n",
" [ 4.9, 2.5, 4.5, 1.7],\n",
" [ 7.3, 2.9, 6.3, 1.8],\n",
" [ 6.7, 2.5, 5.8, 1.8],\n",
" [ 7.2, 3.6, 6.1, 2.5],\n",
" [ 6.5, 3.2, 5.1, 2. ],\n",
" [ 6.4, 2.7, 5.3, 1.9],\n",
" [ 6.8, 3. , 5.5, 2.1],\n",
" [ 5.7, 2.5, 5. , 2. ],\n",
" [ 5.8, 2.8, 5.1, 2.4],\n",
" [ 6.4, 3.2, 5.3, 2.3],\n",
" [ 6.5, 3. , 5.5, 1.8],\n",
" [ 7.7, 3.8, 6.7, 2.2],\n",
" [ 7.7, 2.6, 6.9, 2.3],\n",
" [ 6. , 2.2, 5. , 1.5],\n",
" [ 6.9, 3.2, 5.7, 2.3],\n",
" [ 5.6, 2.8, 4.9, 2. ],\n",
" [ 7.7, 2.8, 6.7, 2. ],\n",
" [ 6.3, 2.7, 4.9, 1.8],\n",
" [ 6.7, 3.3, 5.7, 2.1],\n",
" [ 7.2, 3.2, 6. , 1.8],\n",
" [ 6.2, 2.8, 4.8, 1.8],\n",
" [ 6.1, 3. , 4.9, 1.8],\n",
" [ 6.4, 2.8, 5.6, 2.1],\n",
" [ 7.2, 3. , 5.8, 1.6],\n",
" [ 7.4, 2.8, 6.1, 1.9],\n",
" [ 7.9, 3.8, 6.4, 2. ],\n",
" [ 6.4, 2.8, 5.6, 2.2],\n",
" [ 6.3, 2.8, 5.1, 1.5],\n",
" [ 6.1, 2.6, 5.6, 1.4],\n",
" [ 7.7, 3. , 6.1, 2.3],\n",
" [ 6.3, 3.4, 5.6, 2.4],\n",
" [ 6.4, 3.1, 5.5, 1.8],\n",
" [ 6. , 3. , 4.8, 1.8],\n",
" [ 6.9, 3.1, 5.4, 2.1],\n",
" [ 6.7, 3.1, 5.6, 2.4],\n",
" [ 6.9, 3.1, 5.1, 2.3],\n",
" [ 5.8, 2.7, 5.1, 1.9],\n",
" [ 6.8, 3.2, 5.9, 2.3],\n",
" [ 6.7, 3.3, 5.7, 2.5],\n",
" [ 6.7, 3. , 5.2, 2.3],\n",
" [ 6.3, 2.5, 5. , 1.9],\n",
" [ 6.5, 3. , 5.2, 2. ],\n",
" [ 6.2, 3.4, 5.4, 2.3],\n",
" [ 5.9, 3. , 5.1, 1.8]])"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@ -0,0 +1,351 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model Performance (Evaluation)\n",
"Link to the Jupyter notebook: https://github.com/raqueeb/ml-python/blob/master/model-evaluation1.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Earlier we discussed that we need to know which kind of model works best for our task. Alongside that, we need to work out which tuning parameters give the classifier its highest accuracy. And after training on our own data, how ready is the model for 'out-of-sample data' (data it was not trained on)?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Training and evaluating on the same dataset (to be avoided)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Train the model on the entire Iris dataset.\n",
"\n",
"2. Evaluate on the same dataset and see what accuracy we get."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Load the Iris dataset\n",
"from sklearn.datasets import load_iris\n",
"iris = load_iris()\n",
"\n",
"# Store the features in X and the response in y\n",
"X = iris.data\n",
"y = iris.target"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### If the \"K-Nearest Neighbors\" classifier uses 5 neighbors"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2,\n",
" 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
" 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Import KNeighborsClassifier as before\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"# Instantiate the model\n",
"knn = KNeighborsClassifier(n_neighbors=5)\n",
"# Fit the model so it learns the relationship between X and y\n",
"knn.fit(X, y)\n",
"# Predict the response values for the observations in X\n",
"knn.predict(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That is a lot of values, isn't it? Let's look at just the first five."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, 0, 0])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The first five predictions\n",
"knn.predict(X)[0:5]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"150"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Store the predicted response values in y_pred\n",
"y_pred = knn.predict(X)\n",
"\n",
"# How many items did we predict?\n",
"len(y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How accurate are the predictions? Note that this is an internal calculation over the entire dataset. Here we use the score method, passing it the features and the target responses."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.96666666666666667"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"knn.score(X, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short aside. At this point the model has made predictions for data whose answers we already know: the dataset ships with 150 target values (the answers) for its 150 records. The predicted answers from knn.predict(X) now have to be matched against the true answers. By machine learning convention, the predicted answers are called \"y_pred\". And where are the true answers stored? That's right, in \"y\". So how do we find the model's accuracy? Comparing \"y\" with \"y_pred\" tells us."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another aside. This comes from an answer by the Python machine learning expert Sebastian Raschka, taken from the site Quora. y_pred stores our class predictions. Two methods can be used to compute classification accuracy: the classifier's own score method, i.e. knn.score(X, y), and accuracy_score(y_true, y_pred). In the example below, y_true holds the true answers and y_pred1 holds the predictions. Only one of the 10 predicted values differs from y_true, which is why accuracy_score comes out at 90%."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.90000000000000002"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.metrics import accuracy_score\n",
"import numpy as np\n",
"y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])\n",
"y_pred1 = np.array([0, 0, 0, 1, 1, 1, 2, 2 , 2, 0])\n",
"accuracy_score(y_true, y_pred1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now for the accuracy on our Iris dataset. It is the percentage of our predictions (y_pred) that match the true values (y). Here we import the metrics module from sklearn, then pass y and y_pred to accuracy_score to get the classifier's performance, i.e. its accuracy."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.966666666667\n"
]
}
],
"source": [
"# compute classification accuracy for the KNN model\n",
"from sklearn import metrics\n",
"print(metrics.accuracy_score(y, y_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So is this number meaningless? Not quite. Because training and testing used the same dataset, we call it the \"training accuracy\"."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.96666666666666667"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"# print(\"Test set score: {:.2f}\".format(np.mean(y_pred == y)))\n",
"np.mean(y_pred == y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### If the \"K-Nearest Neighbors\" classifier uses 1 neighbor"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0\n"
]
}
],
"source": [
"from sklearn.neighbors import KNeighborsClassifier\n",
"knn = KNeighborsClassifier(n_neighbors=1)\n",
"knn.fit(X, y)\n",
"y_pred = knn.predict(X)\n",
"print(metrics.accuracy_score(y, y_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is food for thought here. An accuracy of 1 means the model predicted 100% correctly. That is like a leaked exam paper, and it is not what we want. We want a generalized model that achieves good accuracy on any new data. A model like this one is \"overfitting\" the training data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now an exercise. Can you tell what is wrong here? You have an internet browser in front of you; search and work out what I am getting at. We need to look at a new tool, the confusion matrix. Why do we need a confusion matrix? Remember that everything here is training data."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[50, 0, 0],\n",
" [ 0, 50, 0],\n",
" [ 0, 0, 50]], dtype=int64)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#import confusion_matrix\n",
"from sklearn.metrics import confusion_matrix\n",
"confusion_matrix(y,y_pred)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
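
The accuracy and confusion-matrix steps in the notebook above can be reproduced by hand. The following is a minimal sketch in plain Python, not scikit-learn's implementation; the helper names `accuracy` and `confusion` are hypothetical, and the data is the ten-value `y_true` / `y_pred1` example from the notebook.

```python
# Illustrative helpers (hypothetical names), not scikit-learn's own code.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def confusion(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# The ten-value example from the notebook: only the last prediction is wrong.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred1 = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]

acc = accuracy(y_true, y_pred1)
cm = confusion(y_true, y_pred1, n_classes=3)

print(acc)  # 0.9
for row in cm:
    print(row)
```

Off-diagonal entries are the mistakes: here the single 1 in the last row, first column is the class-2 sample that was predicted as class 0. An all-diagonal matrix, like the one the notebook shows for K=1 on training data, only says that the model reproduced the labels it was trained on.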


@ -0,0 +1,539 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Recommended evaluation procedure: train/test split\n",
"Link to the Jupyter notebook: https://github.com/raqueeb/ml-python/blob/master/model-evaluation2.ipynb\n",
"\n",
"Download it for your own use. Thanks to Kevin Markham of Data School for the idea."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the second proposal in the \"Concepts of model evaluation\" chapter.\n",
"\n",
"1. Split the entire dataset into two parts: (a) a training set and (b) a test set.\n",
"\n",
"2. Train the model on the \"training set\".\n",
"\n",
"3. Test the model on the \"test set\". That evaluates how well the model is doing.\n",
"\n",
"4. For convenience, scikit-learn provides a function called train_test_split that does this job. Knowing the convention is all we need."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before working with the Iris dataset, let's look at an example taken from the scikit-learn documentation. First, let me show you what X and y contain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0, 1],\n",
" [2, 3],\n",
" [4, 5],\n",
" [6, 7],\n",
" [8, 9]])"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"# No need to worry about how X and y are generated here\n",
"X, y = np.arange(10).reshape((5, 2)), range(5)\n",
"# Let's see what X contains\n",
"X"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"range(0, 5)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Now let's see what y contains\n",
"y"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0, 1, 2, 3, 4]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# That means the 5 numbers from 0 to 4; let's view them with list()\n",
"list(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now to the real work. Rather than splitting X and y by hand, we call the train_test_split function from scikit-learn's model_selection module. Unless told otherwise, it splits our 5 rows of data into a 75% training set and a 25% test set. With only 5 rows the test share is rounded up, so we get 3 training rows and 2 test rows."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look closely and you will see that the command below follows a scikit-learn convention. Everyone uses this style, and so will we; to begin with, we simply copy it. Before splitting the data into training and test sets, train_test_split shuffles it randomly, and random_state fixes the seed of that shuffle. Shuffling matters because, as you may remember, the target vector is ordered: all the 0s, then the 1s, then the 2s. Whatever value you choose for random_state, keep it the same throughout the exercise so the split is reproducible. Remember that X is split into X_train and X_test, and y into y_train and y_test."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see what X_train, X_test, y_train and y_test contain. Notice how the dataset has been split."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[2, 3],\n",
" [8, 9],\n",
" [4, 5]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 3 of the 5 records ended up here\n",
"X_train"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 4, 2]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The target vector must hold the targets of those same 3 records\n",
"y_train"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0, 1],\n",
" [6, 7]])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_test"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0, 3]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_test"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[[0, 1, 2], [3, 4]]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_test_split(y, shuffle=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See how the dataset was split? Now for the Iris dataset. We start as before, by populating the features and the target response."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# First, load the Iris dataset\n",
"from sklearn.datasets import load_iris\n",
"iris = load_iris()\n",
"\n",
"# The features and target response go into X and y\n",
"X = iris.data\n",
"y = iris.target"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(150, 4)\n",
"(150,)\n"
]
}
],
"source": [
"# Check the array shapes before running train_test_split\n",
"print(X.shape)\n",
"print(y.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Import the train_test_split function\n",
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What do we gain from this?\n",
"\n",
"1. The model can be trained and tested on different data.\n",
"\n",
"2. Because we know the 'response values' of the test set, we can measure the model's performance on it.\n",
"\n",
"3. Testing accuracy, measured on a separate dataset, is a better estimate than training accuracy of how well the model generalizes to new out-of-sample data.\n",
"\n",
"4. With the default settings, 75% of the records go to the training set and 25% to the test set. Here that is 112 training records and 38 test records."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(112, 4)\n",
"(38, 4)\n"
]
}
],
"source": [
"# Number of records in the new X objects\n",
"print(X_train.shape)\n",
"print(X_test.shape)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(112,)\n",
"(38,)\n"
]
}
],
"source": [
"# Number of records in the new y objects\n",
"print(y_train.shape)\n",
"print(y_test.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Suppose a persistent friend is not happy with the default settings and wants the training and test sets split 60-40. For that, you add test_size=0.4, meaning 40% of the data goes to the test set."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check the new split."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(60, 4)\n",
"(60,)\n"
]
}
],
"source": [
"# Number of records in the new test objects\n",
"print(X_test.shape)\n",
"print(y_test.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
" metric_params=None, n_jobs=1, n_neighbors=3, p=2,\n",
" weights='uniform')"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Import KNeighborsClassifier as before\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"# Instantiate the model\n",
"# with 3 neighbors for the \"K-Nearest Neighbors\" classifier\n",
"knn = KNeighborsClassifier(n_neighbors=3)\n",
"# Fit the model on X_train and y_train\n",
"knn.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.966666666667\n"
]
}
],
"source": [
"# Predict on the test set\n",
"y_pred = knn.predict(X_test)\n",
"# Compare the true response values (y_test)\n",
"# with the predicted response values (y_pred)\n",
"# Import metrics as before\n",
"from sklearn import metrics\n",
"print(metrics.accuracy_score(y_test, y_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### If the \"K-Nearest Neighbors\" classifier uses 5 neighbors"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.966666666667\n"
]
}
],
"source": [
"knn = KNeighborsClassifier(n_neighbors=5)\n",
"knn.fit(X_train, y_train)\n",
"y_pred = knn.predict(X_test)\n",
"print(metrics.accuracy_score(y_test, y_pred))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
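
The split sizes seen in the notebook above (3 and 2 rows out of 5; 112 and 38 out of 150) follow from shuffling the row indices and rounding the test share up. The sketch below imitates that behaviour in plain Python under stated assumptions: `simple_train_test_split` is a hypothetical helper, its shuffle does not reproduce scikit-learn's random order, and it is not a substitute for `train_test_split`.

```python
import math
import random


def simple_train_test_split(X, y, test_size=0.25, seed=0):
    """Shuffle row indices, then cut them into a training and a test part.

    Hypothetical helper for illustration only; the test count is rounded up,
    which matches the sizes shown in the notebook (e.g. 38 of 150).
    """
    n = len(X)
    n_test = math.ceil(test_size * n)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed -> same split
    train_idx, test_idx = indices[:n - n_test], indices[n - n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data shaped like the 5-row example: row i is [2*i, 2*i + 1], target i.
X = [[2 * i, 2 * i + 1] for i in range(5)]
y = list(range(5))

X_train, X_test, y_train, y_test = simple_train_test_split(X, y, seed=4)
print(len(X_train), len(X_test))  # 3 2

# Features and targets stay aligned because both use the same indices.
print(all(row[0] == 2 * target for row, target in zip(X_train, y_train)))

# With 150 rows the default share gives the 112 / 38 split.
sizes = simple_train_test_split(list(range(150)), list(range(150)))
print(len(sizes[0]), len(sizes[1]))  # 112 38
```

Fixing the seed plays the role that `random_state` plays in the notebook: rerunning with the same value yields the same partition, so accuracy figures for different K values are compared on identical test rows.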

File diff suppressed because one or more lines are too long


@ -0,0 +1,697 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"iris = sns.load_dataset(\"iris\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(iris)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5.4</td>\n",
" <td>3.9</td>\n",
" <td>1.7</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>4.6</td>\n",
" <td>3.4</td>\n",
" <td>1.4</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>5.0</td>\n",
" <td>3.4</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>4.4</td>\n",
" <td>2.9</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>4.9</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.1</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>5.4</td>\n",
" <td>3.7</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>4.8</td>\n",
" <td>3.4</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>4.8</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.1</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>4.3</td>\n",
" <td>3.0</td>\n",
" <td>1.1</td>\n",
" <td>0.1</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>5.8</td>\n",
" <td>4.0</td>\n",
" <td>1.2</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>5.7</td>\n",
" <td>4.4</td>\n",
" <td>1.5</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>5.4</td>\n",
" <td>3.9</td>\n",
" <td>1.3</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>5.7</td>\n",
" <td>3.8</td>\n",
" <td>1.7</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>5.1</td>\n",
" <td>3.8</td>\n",
" <td>1.5</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>5.4</td>\n",
" <td>3.4</td>\n",
" <td>1.7</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>5.1</td>\n",
" <td>3.7</td>\n",
" <td>1.5</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>4.6</td>\n",
" <td>3.6</td>\n",
" <td>1.0</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>5.1</td>\n",
" <td>3.3</td>\n",
" <td>1.7</td>\n",
" <td>0.5</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>4.8</td>\n",
" <td>3.4</td>\n",
" <td>1.9</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>5.0</td>\n",
" <td>3.0</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>5.0</td>\n",
" <td>3.4</td>\n",
" <td>1.6</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>5.2</td>\n",
" <td>3.5</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>5.2</td>\n",
" <td>3.4</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>120</th>\n",
" <td>6.9</td>\n",
" <td>3.2</td>\n",
" <td>5.7</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>121</th>\n",
" <td>5.6</td>\n",
" <td>2.8</td>\n",
" <td>4.9</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>122</th>\n",
" <td>7.7</td>\n",
" <td>2.8</td>\n",
" <td>6.7</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>123</th>\n",
" <td>6.3</td>\n",
" <td>2.7</td>\n",
" <td>4.9</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>124</th>\n",
" <td>6.7</td>\n",
" <td>3.3</td>\n",
" <td>5.7</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>125</th>\n",
" <td>7.2</td>\n",
" <td>3.2</td>\n",
" <td>6.0</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>126</th>\n",
" <td>6.2</td>\n",
" <td>2.8</td>\n",
" <td>4.8</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>127</th>\n",
" <td>6.1</td>\n",
" <td>3.0</td>\n",
" <td>4.9</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128</th>\n",
" <td>6.4</td>\n",
" <td>2.8</td>\n",
" <td>5.6</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>129</th>\n",
" <td>7.2</td>\n",
" <td>3.0</td>\n",
" <td>5.8</td>\n",
" <td>1.6</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130</th>\n",
" <td>7.4</td>\n",
" <td>2.8</td>\n",
" <td>6.1</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>131</th>\n",
" <td>7.9</td>\n",
" <td>3.8</td>\n",
" <td>6.4</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>132</th>\n",
" <td>6.4</td>\n",
" <td>2.8</td>\n",
" <td>5.6</td>\n",
" <td>2.2</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133</th>\n",
" <td>6.3</td>\n",
" <td>2.8</td>\n",
" <td>5.1</td>\n",
" <td>1.5</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>134</th>\n",
" <td>6.1</td>\n",
" <td>2.6</td>\n",
" <td>5.6</td>\n",
" <td>1.4</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>135</th>\n",
" <td>7.7</td>\n",
" <td>3.0</td>\n",
" <td>6.1</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>136</th>\n",
" <td>6.3</td>\n",
" <td>3.4</td>\n",
" <td>5.6</td>\n",
" <td>2.4</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>137</th>\n",
" <td>6.4</td>\n",
" <td>3.1</td>\n",
" <td>5.5</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>138</th>\n",
" <td>6.0</td>\n",
" <td>3.0</td>\n",
" <td>4.8</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139</th>\n",
" <td>6.9</td>\n",
" <td>3.1</td>\n",
" <td>5.4</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>140</th>\n",
" <td>6.7</td>\n",
" <td>3.1</td>\n",
" <td>5.6</td>\n",
" <td>2.4</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>141</th>\n",
" <td>6.9</td>\n",
" <td>3.1</td>\n",
" <td>5.1</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>142</th>\n",
" <td>5.8</td>\n",
" <td>2.7</td>\n",
" <td>5.1</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>143</th>\n",
" <td>6.8</td>\n",
" <td>3.2</td>\n",
" <td>5.9</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>144</th>\n",
" <td>6.7</td>\n",
" <td>3.3</td>\n",
" <td>5.7</td>\n",
" <td>2.5</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>6.7</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>6.3</td>\n",
" <td>2.5</td>\n",
" <td>5.0</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>6.2</td>\n",
" <td>3.4</td>\n",
" <td>5.4</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>150 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species\n",
"0 5.1 3.5 1.4 0.2 setosa\n",
"1 4.9 3.0 1.4 0.2 setosa\n",
"2 4.7 3.2 1.3 0.2 setosa\n",
"3 4.6 3.1 1.5 0.2 setosa\n",
"4 5.0 3.6 1.4 0.2 setosa\n",
"5 5.4 3.9 1.7 0.4 setosa\n",
"6 4.6 3.4 1.4 0.3 setosa\n",
"7 5.0 3.4 1.5 0.2 setosa\n",
"8 4.4 2.9 1.4 0.2 setosa\n",
"9 4.9 3.1 1.5 0.1 setosa\n",
"10 5.4 3.7 1.5 0.2 setosa\n",
"11 4.8 3.4 1.6 0.2 setosa\n",
"12 4.8 3.0 1.4 0.1 setosa\n",
"13 4.3 3.0 1.1 0.1 setosa\n",
"14 5.8 4.0 1.2 0.2 setosa\n",
"15 5.7 4.4 1.5 0.4 setosa\n",
"16 5.4 3.9 1.3 0.4 setosa\n",
"17 5.1 3.5 1.4 0.3 setosa\n",
"18 5.7 3.8 1.7 0.3 setosa\n",
"19 5.1 3.8 1.5 0.3 setosa\n",
"20 5.4 3.4 1.7 0.2 setosa\n",
"21 5.1 3.7 1.5 0.4 setosa\n",
"22 4.6 3.6 1.0 0.2 setosa\n",
"23 5.1 3.3 1.7 0.5 setosa\n",
"24 4.8 3.4 1.9 0.2 setosa\n",
"25 5.0 3.0 1.6 0.2 setosa\n",
"26 5.0 3.4 1.6 0.4 setosa\n",
"27 5.2 3.5 1.5 0.2 setosa\n",
"28 5.2 3.4 1.4 0.2 setosa\n",
"29 4.7 3.2 1.6 0.2 setosa\n",
".. ... ... ... ... ...\n",
"120 6.9 3.2 5.7 2.3 virginica\n",
"121 5.6 2.8 4.9 2.0 virginica\n",
"122 7.7 2.8 6.7 2.0 virginica\n",
"123 6.3 2.7 4.9 1.8 virginica\n",
"124 6.7 3.3 5.7 2.1 virginica\n",
"125 7.2 3.2 6.0 1.8 virginica\n",
"126 6.2 2.8 4.8 1.8 virginica\n",
"127 6.1 3.0 4.9 1.8 virginica\n",
"128 6.4 2.8 5.6 2.1 virginica\n",
"129 7.2 3.0 5.8 1.6 virginica\n",
"130 7.4 2.8 6.1 1.9 virginica\n",
"131 7.9 3.8 6.4 2.0 virginica\n",
"132 6.4 2.8 5.6 2.2 virginica\n",
"133 6.3 2.8 5.1 1.5 virginica\n",
"134 6.1 2.6 5.6 1.4 virginica\n",
"135 7.7 3.0 6.1 2.3 virginica\n",
"136 6.3 3.4 5.6 2.4 virginica\n",
"137 6.4 3.1 5.5 1.8 virginica\n",
"138 6.0 3.0 4.8 1.8 virginica\n",
"139 6.9 3.1 5.4 2.1 virginica\n",
"140 6.7 3.1 5.6 2.4 virginica\n",
"141 6.9 3.1 5.1 2.3 virginica\n",
"142 5.8 2.7 5.1 1.9 virginica\n",
"143 6.8 3.2 5.9 2.3 virginica\n",
"144 6.7 3.3 5.7 2.5 virginica\n",
"145 6.7 3.0 5.2 2.3 virginica\n",
"146 6.3 2.5 5.0 1.9 virginica\n",
"147 6.5 3.0 5.2 2.0 virginica\n",
"148 6.2 3.4 5.4 2.3 virginica\n",
"149 5.9 3.0 5.1 1.8 virginica\n",
"\n",
"[150 rows x 5 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,810 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.datasets import load_iris"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"iris = load_iris()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# np.c_ stacks the features and the target column-wise; the integer target is upcast to float\n",
"iris_dataframe = pd.DataFrame(data=np.c_[iris['data'], iris['target']],\n",
"                              columns=iris['feature_names'] + ['target'])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Map the integer class codes (0, 1, 2) to their species names\n",
"iris_dataframe['species'] = pd.Categorical.from_codes(iris.target,\n",
"                                                      iris.target_names)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>target</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5.4</td>\n",
" <td>3.9</td>\n",
" <td>1.7</td>\n",
" <td>0.4</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>4.6</td>\n",
" <td>3.4</td>\n",
" <td>1.4</td>\n",
" <td>0.3</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>5.0</td>\n",
" <td>3.4</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>4.4</td>\n",
" <td>2.9</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>4.9</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.1</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>5.4</td>\n",
" <td>3.7</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>4.8</td>\n",
" <td>3.4</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>4.8</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.1</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>4.3</td>\n",
" <td>3.0</td>\n",
" <td>1.1</td>\n",
" <td>0.1</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>5.8</td>\n",
" <td>4.0</td>\n",
" <td>1.2</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>5.7</td>\n",
" <td>4.4</td>\n",
" <td>1.5</td>\n",
" <td>0.4</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>5.4</td>\n",
" <td>3.9</td>\n",
" <td>1.3</td>\n",
" <td>0.4</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.3</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>5.7</td>\n",
" <td>3.8</td>\n",
" <td>1.7</td>\n",
" <td>0.3</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>5.1</td>\n",
" <td>3.8</td>\n",
" <td>1.5</td>\n",
" <td>0.3</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>5.4</td>\n",
" <td>3.4</td>\n",
" <td>1.7</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>5.1</td>\n",
" <td>3.7</td>\n",
" <td>1.5</td>\n",
" <td>0.4</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>4.6</td>\n",
" <td>3.6</td>\n",
" <td>1.0</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>5.1</td>\n",
" <td>3.3</td>\n",
" <td>1.7</td>\n",
" <td>0.5</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>4.8</td>\n",
" <td>3.4</td>\n",
" <td>1.9</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>5.0</td>\n",
" <td>3.0</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>5.0</td>\n",
" <td>3.4</td>\n",
" <td>1.6</td>\n",
" <td>0.4</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>5.2</td>\n",
" <td>3.5</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>5.2</td>\n",
" <td>3.4</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>0.0</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>120</th>\n",
" <td>6.9</td>\n",
" <td>3.2</td>\n",
" <td>5.7</td>\n",
" <td>2.3</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>121</th>\n",
" <td>5.6</td>\n",
" <td>2.8</td>\n",
" <td>4.9</td>\n",
" <td>2.0</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>122</th>\n",
" <td>7.7</td>\n",
" <td>2.8</td>\n",
" <td>6.7</td>\n",
" <td>2.0</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>123</th>\n",
" <td>6.3</td>\n",
" <td>2.7</td>\n",
" <td>4.9</td>\n",
" <td>1.8</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>124</th>\n",
" <td>6.7</td>\n",
" <td>3.3</td>\n",
" <td>5.7</td>\n",
" <td>2.1</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>125</th>\n",
" <td>7.2</td>\n",
" <td>3.2</td>\n",
" <td>6.0</td>\n",
" <td>1.8</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>126</th>\n",
" <td>6.2</td>\n",
" <td>2.8</td>\n",
" <td>4.8</td>\n",
" <td>1.8</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>127</th>\n",
" <td>6.1</td>\n",
" <td>3.0</td>\n",
" <td>4.9</td>\n",
" <td>1.8</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128</th>\n",
" <td>6.4</td>\n",
" <td>2.8</td>\n",
" <td>5.6</td>\n",
" <td>2.1</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>129</th>\n",
" <td>7.2</td>\n",
" <td>3.0</td>\n",
" <td>5.8</td>\n",
" <td>1.6</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130</th>\n",
" <td>7.4</td>\n",
" <td>2.8</td>\n",
" <td>6.1</td>\n",
" <td>1.9</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>131</th>\n",
" <td>7.9</td>\n",
" <td>3.8</td>\n",
" <td>6.4</td>\n",
" <td>2.0</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>132</th>\n",
" <td>6.4</td>\n",
" <td>2.8</td>\n",
" <td>5.6</td>\n",
" <td>2.2</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133</th>\n",
" <td>6.3</td>\n",
" <td>2.8</td>\n",
" <td>5.1</td>\n",
" <td>1.5</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>134</th>\n",
" <td>6.1</td>\n",
" <td>2.6</td>\n",
" <td>5.6</td>\n",
" <td>1.4</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>135</th>\n",
" <td>7.7</td>\n",
" <td>3.0</td>\n",
" <td>6.1</td>\n",
" <td>2.3</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>136</th>\n",
" <td>6.3</td>\n",
" <td>3.4</td>\n",
" <td>5.6</td>\n",
" <td>2.4</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>137</th>\n",
" <td>6.4</td>\n",
" <td>3.1</td>\n",
" <td>5.5</td>\n",
" <td>1.8</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>138</th>\n",
" <td>6.0</td>\n",
" <td>3.0</td>\n",
" <td>4.8</td>\n",
" <td>1.8</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139</th>\n",
" <td>6.9</td>\n",
" <td>3.1</td>\n",
" <td>5.4</td>\n",
" <td>2.1</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>140</th>\n",
" <td>6.7</td>\n",
" <td>3.1</td>\n",
" <td>5.6</td>\n",
" <td>2.4</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>141</th>\n",
" <td>6.9</td>\n",
" <td>3.1</td>\n",
" <td>5.1</td>\n",
" <td>2.3</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>142</th>\n",
" <td>5.8</td>\n",
" <td>2.7</td>\n",
" <td>5.1</td>\n",
" <td>1.9</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>143</th>\n",
" <td>6.8</td>\n",
" <td>3.2</td>\n",
" <td>5.9</td>\n",
" <td>2.3</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>144</th>\n",
" <td>6.7</td>\n",
" <td>3.3</td>\n",
" <td>5.7</td>\n",
" <td>2.5</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>6.7</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.3</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>6.3</td>\n",
" <td>2.5</td>\n",
" <td>5.0</td>\n",
" <td>1.9</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.0</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>6.2</td>\n",
" <td>3.4</td>\n",
" <td>5.4</td>\n",
" <td>2.3</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>150 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"0 5.1 3.5 1.4 0.2 \n",
"1 4.9 3.0 1.4 0.2 \n",
"2 4.7 3.2 1.3 0.2 \n",
"3 4.6 3.1 1.5 0.2 \n",
"4 5.0 3.6 1.4 0.2 \n",
"5 5.4 3.9 1.7 0.4 \n",
"6 4.6 3.4 1.4 0.3 \n",
"7 5.0 3.4 1.5 0.2 \n",
"8 4.4 2.9 1.4 0.2 \n",
"9 4.9 3.1 1.5 0.1 \n",
"10 5.4 3.7 1.5 0.2 \n",
"11 4.8 3.4 1.6 0.2 \n",
"12 4.8 3.0 1.4 0.1 \n",
"13 4.3 3.0 1.1 0.1 \n",
"14 5.8 4.0 1.2 0.2 \n",
"15 5.7 4.4 1.5 0.4 \n",
"16 5.4 3.9 1.3 0.4 \n",
"17 5.1 3.5 1.4 0.3 \n",
"18 5.7 3.8 1.7 0.3 \n",
"19 5.1 3.8 1.5 0.3 \n",
"20 5.4 3.4 1.7 0.2 \n",
"21 5.1 3.7 1.5 0.4 \n",
"22 4.6 3.6 1.0 0.2 \n",
"23 5.1 3.3 1.7 0.5 \n",
"24 4.8 3.4 1.9 0.2 \n",
"25 5.0 3.0 1.6 0.2 \n",
"26 5.0 3.4 1.6 0.4 \n",
"27 5.2 3.5 1.5 0.2 \n",
"28 5.2 3.4 1.4 0.2 \n",
"29 4.7 3.2 1.6 0.2 \n",
".. ... ... ... ... \n",
"120 6.9 3.2 5.7 2.3 \n",
"121 5.6 2.8 4.9 2.0 \n",
"122 7.7 2.8 6.7 2.0 \n",
"123 6.3 2.7 4.9 1.8 \n",
"124 6.7 3.3 5.7 2.1 \n",
"125 7.2 3.2 6.0 1.8 \n",
"126 6.2 2.8 4.8 1.8 \n",
"127 6.1 3.0 4.9 1.8 \n",
"128 6.4 2.8 5.6 2.1 \n",
"129 7.2 3.0 5.8 1.6 \n",
"130 7.4 2.8 6.1 1.9 \n",
"131 7.9 3.8 6.4 2.0 \n",
"132 6.4 2.8 5.6 2.2 \n",
"133 6.3 2.8 5.1 1.5 \n",
"134 6.1 2.6 5.6 1.4 \n",
"135 7.7 3.0 6.1 2.3 \n",
"136 6.3 3.4 5.6 2.4 \n",
"137 6.4 3.1 5.5 1.8 \n",
"138 6.0 3.0 4.8 1.8 \n",
"139 6.9 3.1 5.4 2.1 \n",
"140 6.7 3.1 5.6 2.4 \n",
"141 6.9 3.1 5.1 2.3 \n",
"142 5.8 2.7 5.1 1.9 \n",
"143 6.8 3.2 5.9 2.3 \n",
"144 6.7 3.3 5.7 2.5 \n",
"145 6.7 3.0 5.2 2.3 \n",
"146 6.3 2.5 5.0 1.9 \n",
"147 6.5 3.0 5.2 2.0 \n",
"148 6.2 3.4 5.4 2.3 \n",
"149 5.9 3.0 5.1 1.8 \n",
"\n",
" target species \n",
"0 0.0 setosa \n",
"1 0.0 setosa \n",
"2 0.0 setosa \n",
"3 0.0 setosa \n",
"4 0.0 setosa \n",
"5 0.0 setosa \n",
"6 0.0 setosa \n",
"7 0.0 setosa \n",
"8 0.0 setosa \n",
"9 0.0 setosa \n",
"10 0.0 setosa \n",
"11 0.0 setosa \n",
"12 0.0 setosa \n",
"13 0.0 setosa \n",
"14 0.0 setosa \n",
"15 0.0 setosa \n",
"16 0.0 setosa \n",
"17 0.0 setosa \n",
"18 0.0 setosa \n",
"19 0.0 setosa \n",
"20 0.0 setosa \n",
"21 0.0 setosa \n",
"22 0.0 setosa \n",
"23 0.0 setosa \n",
"24 0.0 setosa \n",
"25 0.0 setosa \n",
"26 0.0 setosa \n",
"27 0.0 setosa \n",
"28 0.0 setosa \n",
"29 0.0 setosa \n",
".. ... ... \n",
"120 2.0 virginica \n",
"121 2.0 virginica \n",
"122 2.0 virginica \n",
"123 2.0 virginica \n",
"124 2.0 virginica \n",
"125 2.0 virginica \n",
"126 2.0 virginica \n",
"127 2.0 virginica \n",
"128 2.0 virginica \n",
"129 2.0 virginica \n",
"130 2.0 virginica \n",
"131 2.0 virginica \n",
"132 2.0 virginica \n",
"133 2.0 virginica \n",
"134 2.0 virginica \n",
"135 2.0 virginica \n",
"136 2.0 virginica \n",
"137 2.0 virginica \n",
"138 2.0 virginica \n",
"139 2.0 virginica \n",
"140 2.0 virginica \n",
"141 2.0 virginica \n",
"142 2.0 virginica \n",
"143 2.0 virginica \n",
"144 2.0 virginica \n",
"145 2.0 virginica \n",
"146 2.0 virginica \n",
"147 2.0 virginica \n",
"148 2.0 virginica \n",
"149 2.0 virginica \n",
"\n",
"[150 rows x 6 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris_dataframe"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
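
In the notebook above, the `target` column is displayed as `0.0` and `2.0` because `np.c_` concatenates the float feature matrix with the integer labels and upcasts everything to float. The sketch below contrasts that approach with one that assigns the target as its own column so it stays an integer. It is only an illustration: the three rows are hand-copied from the output above (rows 0, 1 and 149) so the block runs without scikit-learn, and the names `data`, `target`, `df_float` and `df_int` are assumptions made for this example, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Three rows hand-copied from the notebook output (rows 0, 1 and 149);
# a stand-in for iris.data / iris.target so the sketch needs no scikit-learn.
data = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [5.9, 3.0, 5.1, 1.8],
])
target = np.array([0, 0, 2])
feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
target_names = ["setosa", "versicolor", "virginica"]

# Notebook approach: np.c_ stacks column-wise, so the integer target becomes float.
df_float = pd.DataFrame(np.c_[data, target], columns=feature_names + ["target"])

# Alternative: build from the features only, then add target as a separate column.
df_int = pd.DataFrame(data, columns=feature_names)
df_int["target"] = target
df_int["species"] = pd.Categorical.from_codes(target, target_names)

print(df_float["target"].dtype)  # float64
print(df_int["target"].dtype)    # an integer dtype
print(df_int)
```

Either form works for display; keeping `target` as an integer mainly avoids a later cast when the column is used as class labels.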

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large