Matplotlib for visualizing data

As a first example to help us get started, let's build a basic plot from two datasets that we are going to import.

First we have to install scikit-learn, which provides the datasets, by running the following command in the terminal:

pip install scikit-learn
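To confirm the installation before going further, a quick check along these lines can help. It is only a sketch: it assumes matplotlib is already installed (scikit-learn does not pull it in, so `pip install matplotlib` may be needed), and the `versions` dictionary is just an illustrative name.

```python
# Import both libraries and print their versions to confirm they load cleanly.
import matplotlib
import sklearn

versions = {
    "scikit-learn": sklearn.__version__,
    "matplotlib": matplotlib.__version__,
}
for name, version in versions.items():
    print(f"{name}: {version}")
```

If either import fails, the corresponding package is missing from the active Python environment.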

First, import the dataset loaders (lines 1 and 2).
Then import the matplotlib plotting module and, in a Jupyter notebook, enable inline plots (lines 3 and 4).

Load the iris dataset (line 5), print the description of the dataset (line 6), and plot the first column (sepal length, index 0) as x and the second column (sepal width, index 1) as y:

from sklearn.datasets import load_iris
from sklearn.datasets import load_boston

from matplotlib import pyplot as plt
%matplotlib inline

iris = load_iris()
print(iris.DESCR)
data = iris.data
plt.plot(data[:,0],data[:,1],".")

Here is the output:

Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20  0.76     0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

[Figure: matplotlib plot of the iris dataset]
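The dot plot above carries no axis labels and does not distinguish the three classes. As a sketch of one way to add both, the following uses `scatter` together with the `target`, `target_names`, and `feature_names` fields that `load_iris()` returns. The variable names and the choice of one color per class are illustrative assumptions, not part of the original example.

```python
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
data = iris.data        # shape (150, 4): one row per flower
labels = iris.target    # integer class label (0, 1 or 2) per row

fig, ax = plt.subplots()
# Draw each class separately so it gets its own color and legend entry.
for class_index, class_name in enumerate(iris.target_names):
    rows = labels == class_index
    ax.scatter(data[rows, 0], data[rows, 1], label=class_name)

# Label the axes with the names stored in the dataset.
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.legend()
```

In a notebook with `%matplotlib inline` the figure is shown automatically; in a plain script, a final `plt.show()` would be needed.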

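We can do the same for the Boston dataset. Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2, so this code only runs on older releases.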

from sklearn.datasets import load_iris
from sklearn.datasets import load_boston

from matplotlib import pyplot as plt
%matplotlib inline

boston = load_boston()
print(boston.DESCR)
data = boston.data
plt.plot(data[:,2],data[:,4],"+")

The description will be:

Boston House Prices dataset
===========================

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

And the output plot will be:

[Figure: matplotlib plot of the Boston dataset]
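The column numbers 2 and 4 in the plot call correspond to INDUS and NOX in the attribute list above. To avoid counting columns by hand, a small helper can look the indices up by name. The sketch below is hypothetical (`column_index` is not part of scikit-learn), and it works on a hand-copied list of the 13 attribute names from the description, so it does not depend on `load_boston` being available.

```python
# Attribute names copied from the dataset description, in column order.
feature_names = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]


def column_index(names, wanted):
    """Return the zero-based column index of the attribute called `wanted`."""
    try:
        return list(names).index(wanted)
    except ValueError:
        raise KeyError(f"unknown attribute: {wanted!r}") from None


x_col = column_index(feature_names, "INDUS")
y_col = column_index(feature_names, "NOX")
print(x_col, y_col)
```

With the dataset loaded as before, `plt.plot(data[:, x_col], data[:, y_col], "+")` should draw the same plot as the call with the hard-coded indices.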
