– Hey, let’s learn how to make faster and better 3D plots in Python, and while we’re at it, let’s explore some cost functions for machine learning models.

Hello everybody, my name is Dan. Welcome to Machine Learning and Data Science Open Source Spotlight. Today I’m going to cover Plotly, which is an interactive graphing package for Python. It’s based on JavaScript, very similar to Bokeh, which I covered last week, but I like using Plotly specifically for 3D plots. In addition, today I’m going to analyze some cost functions for machine learning models, and I’m going to use Plotly to visualize them and understand them better.

I’m going to work with very simple data today. It’s a classic linear regression task: we want to predict the maximum temperature for a certain day given the minimum temperature of the same day. If we plot it, it looks like this, and you can clearly see we can fit a line here. In machine learning, we want to find the best line, the optimal line, meaning the error should be minimal. So we choose a cost function, or error function; in this case the most popular one is the mean squared error, and we want to minimize the distance between the actual points and the predictions of our model. I’m sure you could solve this one easily, but let’s visualize the problem.

So I’m defining the cost function here, the MSE, and I’m defining the linear model. It takes an intercept and a slope, and we apply it to all the data points to make predictions. Now I’m going to do a brute-force grid search over a couple of intercept/slope pairs: I’ll run between -200 and 200 with a step of 5, take all of these combinations, try them as my line parameters, and calculate the cost of each of these lines. Then let’s visualize the shape of this cost function.
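Sketched in code, that setup looks something like this. This is a minimal sketch with a made-up toy dataset, since the actual temperature data isn’t shown here:

```python
import numpy as np

# Made-up toy data standing in for the min/max temperature dataset.
x = np.array([10.0, 12.0, 15.0, 20.0, 22.0])   # minimum temperature
y = np.array([18.0, 21.0, 24.0, 30.0, 33.0])   # maximum temperature

def linear_model(intercept, slope, x):
    """Apply the line to every data point to get predictions."""
    return intercept + slope * x

def mse(y_true, y_pred):
    """Mean squared error cost."""
    return np.mean((y_true - y_pred) ** 2)

# Brute-force grid search over intercept/slope pairs: -200 to 200, step 5.
params = np.arange(-200, 200, 5)
costs = np.array([[mse(y, linear_model(b0, b1, x)) for b1 in params]
                  for b0 in params])            # rows: intercept, cols: slope

best = np.unravel_index(costs.argmin(), costs.shape)
print("best intercept:", params[best[0]], "best slope:", params[best[1]])
```

The `costs` matrix here is exactly the Z grid we’re about to plot.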

So, as I said before, I’m choosing Plotly for these 3D plots. Matplotlib has a native solution, mpl_toolkits, but I think it’s slow and cumbersome to use, and Plotly is a much better alternative. The main difference is in how you construct the grid in order to plot. For example, in Matplotlib, if you want to make a 3D plot, you have to make some kind of mesh grid out of your original X and Y axes, apply some function to it to get your corresponding Z values, and then pass X and Y as these matrices, and Z, the actual result, as a matrix as well. But this becomes a real problem when Z can’t be computed from nice element-wise operations that NumPy conveniently broadcasts over these matrices.

In Plotly, it works a bit differently, and this is why I think the API here is much better: in Plotly, you pass Z, the actual values, as the mesh grid. Z is a matrix containing all the relevant values of the surface plot, so you can plot the surface with Z alone, and you pass X and Y as the original values, just as the ticks, the legend for the axes, so you know what the original values were. This is much more convenient, in my opinion, when Z is computed in a rather more complicated way.

Another nice feature is that you can add a contour plot to your surface plot to get even more information out of your charts, which is a very nice touch in my opinion. And as with all plotting libraries, you can update the layout of the figure afterwards: change the title, the axis titles, the shape of the figure. Let’s look at the result.

You can see it was computed very quickly. We can zoom in and out, it’s very responsive, and it looks nice. We can see the contour plot on the bottom, and we can see it on top as well, and on the axes we see the corresponding slope and intercept, each combination, and the resulting cost of that model. With this tooltip here, we can see, for each intercept and slope, what the corresponding cost value for that model was. And you see the green loop, so we have a kind of equipotential marker showing which other coordinates have the same value of the cost function.

And we can see from this cost function that when we do gradient descent, if we start from a random point, some initial parameters, we take steps in the direction of the negative gradient in order to reach the minimum cost possible. In this case it’s this point here, and this is the optimal solution, the one that minimizes the cost.
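That descent can be sketched like this: a minimal gradient descent on the MSE, again with a hypothetical toy dataset. The starting point, learning rate, and iteration count are arbitrary choices for illustration, not values from the video:

```python
import numpy as np

# Same made-up toy data as before.
x = np.array([10.0, 12.0, 15.0, 20.0, 22.0])
y = np.array([18.0, 21.0, 24.0, 30.0, 33.0])

def mse(b0, b1):
    return np.mean((y - (b0 + b1 * x)) ** 2)

b0, b1 = 50.0, -50.0      # arbitrary starting point on the cost surface
lr = 0.003                # learning rate (arbitrary choice)
for _ in range(100_000):
    err = (b0 + b1 * x) - y            # per-point prediction error
    b0 -= lr * 2 * np.mean(err)        # step against d(MSE)/d(intercept)
    b1 -= lr * 2 * np.mean(err * x)    # step against d(MSE)/d(slope)

print(round(b0, 2), round(b1, 2))      # ends at the bottom of the bowl
```

The printed pair matches the ordinary least-squares solution for this data, which is exactly the lowest point of the surface plot.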

As a bonus, let’s try to visualize the cost function of a neural network. So I designed Mininet. Mininet (chuckles) is the simplest neural network you can think of, because we can only plot 3D charts like this when we have two parameters: we are creatures that can only see in three dimensions, so we need two dimensions for the parameters and one dimension for the cost. That means we need a very simple neural network that has only two parameters.

So here’s what we do in Mininet: we apply one linear operation to our inputs, then an activation transformation with tanh, and then another linear transformation. I didn’t add a bias, so it’s not a perfect network, but this is the best we can do with two parameters.
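A Mininet of that shape could be sketched as follows. This is an assumed reading of the description, one weight before the tanh and one after, with no biases, and the data here is synthetic rather than the video’s dataset:

```python
import numpy as np

# Mininet: one weight before tanh, one after, no biases -> two parameters total.
def mininet(w1, w2, x):
    return w2 * np.tanh(w1 * x)

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Synthetic 1-D data generated by a "true" Mininet with w1=1.5, w2=2.0.
x = np.linspace(-2, 2, 50)
y = 2.0 * np.tanh(1.5 * x)

# Brute-force grid over the two weights; the cost matrix is exactly Plotly's Z.
w1s = np.linspace(-3, 3, 61)
w2s = np.linspace(-3, 3, 61)
Z = np.array([[mse(y, mininet(w1, w2, x)) for w1 in w1s] for w2 in w2s])

# tanh is odd, so (w1, w2) and (-w1, -w2) make identical predictions:
# the surface must contain two minima with exactly the same cost.
```

That odd symmetry of tanh is why the surface we’re about to look at has two equally good minima.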

Okay, so we repeat the same process: we compute the cost of this model with a brute-force search over different pairs of the two available parameters, and then we plot it with Plotly. Let’s take a look at the resulting surface plot. It looks quite a bit different: it’s not the same bowl-like convex shape we had for the linear model. We have two minima, and you see the green mark really helps to show that the value is the same at both of these minimum points. This is a very interesting result, and it happens because of the activation function.

You may know that saddle points are a problem when optimizing neural networks, and we can really see it here: for a lot of points on this cost function surface, going in the direction of one parameter doesn’t change the cost function at all, so that specific parameter sits in a local minimum. This is because of the thresholding effect of the tanh: for very negative values you get -1, and for very positive values you get +1, and even if you change the value a little bit, you still don’t change the result.
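You can check that saturation in a couple of lines of plain NumPy; the exact inputs here are just an illustration:

```python
import numpy as np

# tanh saturates: far from zero, even big input changes barely move the output.
print(np.tanh(3.0))                 # already close to +1
print(np.tanh(6.0))                 # double the input, almost the same output
print(1 - np.tanh(6.0) ** 2)        # the gradient of tanh there is tiny
```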

So I think it’s nice to see these kinds of effects. I’m a firm believer in “seeing is believing,” and I hope this sheds a little bit of light on what cost function saddle points look like. You can also see the need for gradient clipping: in many cases we can be at a certain point where a very tiny step can throw you off to a very big cost, and if the learning rate isn’t tuned or gradient clipping isn’t applied, you can get thrown off and end up with a lot of noise in your optimization. So I think it’s nice to see this in a very, very simple neural network visualization.

I hope you’ve enjoyed this video on Plotly and cost function visualizations. If you’ve learned something new, please help me reach more people by liking, commenting, or sharing this video. I do this completely for free, and the only thing I want to achieve here is to reach more people, get to know more data professionals, and see how I can contribute in this field. I do this every week, so I hope you’ll follow, and I’ll see you then.