As an example to help us get started, let's build a basic plot for each of two datasets that we are going to import.
First we have to install scikit-learn by running the following command in the terminal:
pip install scikit-learn
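To confirm that the installation worked, a quick check like the following can be run from Python. This is only a sketch. It assumes matplotlib is installed as well (for example with pip install matplotlib), since the plotting code in this section imports it.

```python
import matplotlib
import sklearn

# Both imports succeed only if the packages are installed in this environment.
print("scikit-learn", sklearn.__version__)
print("matplotlib", matplotlib.__version__)
```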
First import the dataset loaders (first two lines).
Then import the matplotlib plotting module and enable inline plots in the notebook (lines 3 and 4).
Finally, load the iris dataset (line 5), print the description of the dataset (line 6), and plot column 1 (sepal length) as x and column 2 (sepal width) as y:
from sklearn.datasets import load_iris
from sklearn.datasets import load_boston
from matplotlib import pyplot as plt
%matplotlib inline
iris = load_iris()
print(iris.DESCR)
data = iris.data  # feature matrix, one column per measurement
plt.plot(data[:, 0], data[:, 1], ".")  # sepal length vs. sepal width
Here is the output:
Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988
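The numbers in the summary table can be checked directly against the data, and the same plot is easier to read with labelled axes. The following is a minimal sketch of both steps written as a standalone script rather than a notebook cell. The Agg backend and the output file name iris_sepal.png are assumptions made for illustration, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
data = iris.data  # shape (150, 4): one row per flower, one column per measurement

# Recompute part of the "Summary Statistics" table from the description.
for name, column in zip(iris.feature_names, data.T):
    print(f"{name}: min={column.min():.1f} max={column.max():.1f} mean={column.mean():.2f}")

# Same plot as before, with axis labels taken from the dataset itself.
fig, ax = plt.subplots()
ax.plot(data[:, 0], data[:, 1], ".")
ax.set_xlabel(iris.feature_names[0])  # sepal length (cm)
ax.set_ylabel(iris.feature_names[1])  # sepal width (cm)
fig.savefig("iris_sepal.png")  # hypothetical output file name
```

Taking the labels from iris.feature_names avoids hard-coding the column meanings, so picking different columns only requires changing the indices.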
We can do the same for the Boston house prices dataset, this time plotting column 3 (INDUS) as x and column 5 (NOX) as y:
from sklearn.datasets import load_iris
from sklearn.datasets import load_boston
from matplotlib import pyplot as plt
%matplotlib inline
boston = load_boston()
print(boston.DESCR)
data = boston.data
plt.plot(data[:, 2], data[:, 4], "+")
The description will be:
Boston House Prices dataset
===========================

Notes
------
Data Set Characteristics:

    :Number of Instances: 506

    :Number of Attributes: 13 numeric/categorical predictive

    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's
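One caveat about the Boston example: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so on a recent installation the import itself raises an ImportError. The sketch below shows one way to guard against that. The helper name try_load_boston, the Agg backend and the output file name are hypothetical choices for illustration, and the plot is only drawn when the dataset is actually available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
from matplotlib import pyplot as plt


def try_load_boston():
    """Return the Boston dataset, or None if scikit-learn does not provide it."""
    try:
        from sklearn.datasets import load_boston
    except ImportError:
        return None
    return load_boston()


boston = try_load_boston()
if boston is None:
    print("load_boston is not available in this scikit-learn version")
else:
    data = boston.data
    fig, ax = plt.subplots()
    ax.plot(data[:, 2], data[:, 4], "+")
    ax.set_xlabel(boston.feature_names[2])  # INDUS
    ax.set_ylabel(boston.feature_names[4])  # NOX
    fig.savefig("boston_indus_nox.png")  # hypothetical output file name
```

With an older scikit-learn release the plot is written to disk as before. With a newer one the script prints a short message instead of failing at import time.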
And the output plot will be: