Easily Build and Deploy Predictive Analytics Solutions
Azure Machine Learning Studio is a browser-based, drag-and-drop environment where no coding is necessary for simple experiments. Azure Machine Learning can publish the finished models, which can then be incorporated into web APIs.
To develop a model, a data source is chosen, the data is cleaned and tidied up as necessary, and statistical functions are applied to the data to produce the machine learning model.
This blog will run through creating a simple machine learning model and show what results to look for in an effective model. Azure Machine Learning Studio provides some sample datasets; however, you can also import your own datasets for specific problems.
This blog will use the sample dataset Iris Two Class Data. It contains data on two classes of iris (flower), and the aim is to predict the class of an iris from just its sepal length, sepal width, petal length and petal width. This is a well-known test case used for many statistical classification techniques.
10 steps to the perfect predictive model
To begin predicting the class of iris, a new project is created in Azure Machine Learning Studio.
Click Blank Experiment to open a new blank experiment.
All of the datasets are located on the left, along with the analysis modules and other tools that can be applied to the data to create a predictive model. The centre is the experiment area, where modules are dragged and dropped to build up the predictive model.
2. Under Saved Datasets -> Samples, drag and drop the Iris Two Class Data dataset onto the experiment area.
Right click the module -> dataset -> Visualise to see the data within the dataset. This is a good time to see if anything unusual stands out in the data. Common things to look for here are missing values or hugely skewed numerical values.
3. This displays the dataset. It has 100 rows and 5 columns of data.
The right-hand side displays some statistics for whichever column is clicked on. For petal-width, the mean, min and max are shown, and it also tells us whether there are any missing values, which is important during the data preparation stage. In this dataset there are no missing values in any column.
Also displayed on the right-hand side is a graph, which can be either a bar chart or a box plot. Here it is a bar chart of the different values in the selected column. Again, clicking through the columns will show the chart for each one. In the class column there are only two values, 1 and 0. This is important to know when we are choosing the statistical method.
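For readers who like to see the arithmetic, the column statistics shown in the Visualise pane can be sketched in a few lines of Python. This is only an illustration of the idea, not what Studio runs internally; the petal-width values below are made up, and one entry is deliberately set to None to show how a missing value would be counted.

```python
from statistics import mean

# Hypothetical petal-width values (not taken from the real dataset).
# None marks a missing entry so the missing-value count has something to find.
petal_width = [0.2, 0.4, None, 1.3, 1.5, 0.2, 1.0]

def summarise(values):
    """Return mean, min, max and missing-value count for one numeric column."""
    present = [v for v in values if v is not None]
    return {
        "mean": mean(present),
        "min": min(present),
        "max": max(present),
        "missing": len(values) - len(present),
    }

summary = summarise(petal_width)
print(summary)
```

A non-zero missing count is the cue to clean the column before moving on to training.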
4. Data Manipulation
This would be the point at which any data manipulation happens. However, this dataset is good to move on with, so we can go straight to choosing the method to predict the outcome. As we are predicting two and only two outcomes (class 0 or class 1), we can use a two-class module. We will use two-class logistic regression. On the left-hand panel, go to Machine Learning -> Initialize Model -> Two-Class Logistic Regression. Drag and drop this onto the experiment area.
Each module has input and/or output points. These are how the modules will be connected to create the entire model.
5. Test & Training
Next, the data is split into a training set and a test set. The training set is the portion of the data used to train the model with two-class logistic regression. The test set allows us to test the trained model on independent data it has not seen.
From the list of items, drag and drop the Split Data module from Data Transformation -> Sample and Split -> Split Data. Connect the output from the dataset to the input of the Split Data module. In the properties window on the right, we will split our data 70/30, where 70% will be used for training and 30% will be used for testing. The left output on the Split Data module is the training set and the right output is the test set.
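The 70/30 split can be pictured with a short Python sketch. It assumes the rows are shuffled before being cut, and the function name, the seed and the stand-in rows are all invented for illustration; Studio's Split Data module has its own options.

```python
import random

def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle a copy of the rows, then cut it into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-ins for the 100 rows of the dataset
train, test = split_rows(rows)
print(len(train), len(test))  # 70 30
```

Every row ends up in exactly one of the two sets, so the model is never tested on a row it was trained on.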
6. Now we need a module to train the model.
This is found under Machine Learning -> Train -> Train Model. Drop this onto the experiment area. It has two inputs. Hovering over each input (and output) will give some information on what it is expecting. For Train Model, the left input is the output from the Two-Class Logistic Regression module, and the right input is the training set of data. Click on the Train Model module and, in the properties section, specify which column or feature we want to be predicted. Here, we want the class of iris predicted. Click Launch Column Selector.
7. Here, we specify the column Class. At any point during this, we can select Run at the bottom of the screen to run the model. It’s not completed yet, so we can continue for now.
To score the trained model, the output from the Train Model module and the test set of data must now be fed into a Score Model module. This is found under Machine Learning -> Score -> Score Model. When this is added to the experiment and its inputs are attached, we can run the experiment to see the scoring of the test set.
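To give a feel for what training a two-class logistic regression involves, here is a minimal Python sketch. It uses plain batch gradient descent, which is a stand-in chosen for readability and not a description of how Studio's module fits its model. The function names, learning rate, epoch count and the six petal measurements are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, x):
    """Probability that the row x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logistic(features, labels, rate=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_probability(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - rate * g / n for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / n
    return weights, bias

# Hypothetical (petal length, petal width) rows: three of class 0, three of class 1.
features = [[1.4, 0.2], [1.3, 0.2], [1.5, 0.3], [4.7, 1.4], [4.5, 1.5], [4.9, 1.5]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(features, labels)
print(predict_probability(weights, bias, [1.4, 0.2]))  # low: looks like class 0
print(predict_probability(weights, bias, [4.7, 1.4]))  # high: looks like class 1
```

The label column passed to the function plays the same role as the Class column chosen in the column selector.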
8. Scoring of Test Data Set
Right click the Score Model module -> Scored dataset -> Visualise. This will show the scoring of the test dataset. There are two new columns, Scored Labels and Scored Probabilities. Scored Probabilities shows the probability that the flower belongs to Class 1. So, looking at the first row, there is a 0.868 probability that the flower belongs to Class 1. The Scored Labels column then gives the predicted class for each flower; in row 1, this is Class 1.
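The link between the two new columns can be sketched as a simple cut-off rule. The sketch below assumes a threshold of 0.5, which is the usual default for two-class models but is an assumption here. The 0.868 value is the first row from above, and the other two probabilities are made up.

```python
def scored_label(probability, threshold=0.5):
    """Turn a Class 1 probability into a predicted class (assumed 0.5 cut-off)."""
    return 1 if probability >= threshold else 0

# 0.868 is the first row shown above; 0.12 and 0.51 are hypothetical.
scored_probabilities = [0.868, 0.12, 0.51]
scored_labels = [scored_label(p) for p in scored_probabilities]
print(scored_labels)  # [1, 0, 1]
```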
Finally, to see how accurate our prediction model is, we can add an Evaluate Model module to the experiment. This is found under Machine Learning -> Evaluate -> Evaluate Model. Add this to the experiment, join its input to the output from the Score Model module, and run the experiment.
Right click the Evaluate Model module -> Evaluation results -> Visualise. This will display the accuracy and other metrics of the model we have created.
For the perfect predictive model, the blue line on the ROC chart should run up the left side and along the top of the chart, which is exactly how the chart looks here. We can also see the number of false negatives and false positives; here, there are none. This looks like a perfect predictive model, at least on this small test set of 30 rows.
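The numbers behind the evaluation view can be reproduced by hand. The Python sketch below counts true and false positives and negatives, then computes accuracy and the area under the ROC curve. The actual classes and probabilities are hypothetical, picked so that every prediction is correct, and the function names are invented for the example.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["tn" if a == 0 else "fn"] += 1
    return counts

def auc(actual, probabilities):
    """Share of (class 1, class 0) pairs ranked the right way round; ties count half."""
    pos = [p for a, p in zip(actual, probabilities) if a == 1]
    neg = [p for a, p in zip(actual, probabilities) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test rows where the model gets everything right.
actual = [1, 0, 1, 0, 1, 0]
probabilities = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]

counts = confusion_counts(actual, predicted)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
print(counts, accuracy, auc(actual, probabilities))
```

An area of 1.0 corresponds to the ROC line hugging the left side and the top of the chart, and zero false positives and false negatives give an accuracy of 1.0.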
There are a huge number of modules that this blog hasn’t touched. This is purely a guide to show how Azure Machine Learning works and how it can be used without any coding at all. The model can be deployed as a web service and then called from a web API built with .NET, using an API key.
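As a rough idea of what calling such a web service involves, the Python sketch below builds (but does not send) an HTTP request that carries the API key. Everything specific in it is an assumption: the URL and key are placeholders, and the payload layout and column names are only modelled on a typical request-response format, so check the service's own API help page for the real shape. The same idea carries over to .NET.

```python
import json
import urllib.request

# Placeholders only: the real URL and key come from the deployed web service's page.
SERVICE_URL = "https://example.invalid/score"
API_KEY = "YOUR_API_KEY"

def build_request(url, api_key, sepal_length, sepal_width, petal_length, petal_width):
    """Build a POST request for one iris; the payload layout is an assumed format."""
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["sepal-length", "sepal-width", "petal-length", "petal-width"],
                "Values": [[sepal_length, sepal_width, petal_length, petal_width]],
            }
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,  # the API key travels in this header
    }
    return urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"), headers=headers)

request = build_request(SERVICE_URL, API_KEY, 5.1, 3.5, 1.4, 0.2)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen(request) against a real endpoint would return the scored result as JSON.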