# Plotly & Cost Function Visualizations | Machine Learning & Data Science Open-source Spotlight #4

Hey, let’s learn how to make faster and better 3D plots in Python. And while we’re at it, let’s explore some cost functions for machine learning models. Hello everybody, my name is Dan. Welcome to Machine Learning and Data Science Open Source Spotlight. Today I’m going to cover Plotly, which is an interactive graphing package for Python. It’s based on JavaScript, very similar to Bokeh, which I covered last week, but I like using Plotly specifically for 3D plots. In addition, today I’m going to analyze some cost functions for machine learning models, and I’m going to use Plotly to visualize them and understand them better. I’m going to work with
very simple data today: a classic linear regression task. We want to predict the maximum temperature for a certain day given the minimum temperature of the same day. If we plot it, it looks like this, and you can clearly see that we can fit a line here. So in machine learning, we want to find the best line, the optimal line, meaning the error should be minimal. We choose a cost function, or error function; in this case the most popular one is the mean squared error (MSE). And we want to minimize the distance between the actual points and
the predictions of our model. I’m sure you can solve this easily, but let’s visualize the problem. So I’m defining the cost function here, the MSE, and the linear model, which takes an intercept and a slope and applies them to all the data points to make predictions. Now I’m going to do a brute-force grid search over a couple of intercept-slope pairs: I’ll run from -200 to 200 with a step of 5, try all of these combinations as my line parameters, calculate the cost of each of these lines, and visualize the pattern of this cost function. As I said before, I’m choosing Plotly for these 3D plots.
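The cost function, linear model, and brute-force grid described above might be sketched roughly like this (the synthetic temperature data here is an assumption standing in for the dataset from the video):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error between the actual points and the model's predictions.
    return np.mean((y_true - y_pred) ** 2)

def linear_model(intercept, slope, x):
    # Apply the line to all data points at once to get predictions.
    return intercept + slope * x

# Hypothetical stand-in for the min/max temperature data.
x = np.linspace(0, 30, 100)   # minimum temperatures
y = 10 + 1.2 * x              # maximum temperatures (noise-free for simplicity)

# Brute-force grid search over intercept-slope pairs, -200 to 200, step 5.
intercepts = np.arange(-200, 200, 5)
slopes = np.arange(-200, 200, 5)
costs = np.array([[mse(y, linear_model(b, m, x)) for m in slopes]
                  for b in intercepts])
```

`costs` then holds one cost per parameter pair, ready to be drawn as a surface.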
Matplotlib has a native solution, the mpl_toolkits module, but I think it’s slow and cumbersome to use, and Plotly is a much better alternative. The main difference is in how you construct the grid for the plot. In Matplotlib, if you want to make a 3D plot, you have to build some kind of mesh grid from your original X and Y axes, apply some function on it to get your corresponding Z values, and then pass X and Y as these matrices, with Z, the actual values, as the result. But this becomes a real problem when Z can’t be computed from nice element-wise operations, which NumPy conveniently broadcasts over these matrices. In Plotly it works a bit differently, and this is why I think the API here is much better: you pass Z, the actual values, as the mesh grid. Z is a matrix containing all the relevant values on the surface, so you can actually draw the surface plot with Z alone, and you pass X and Y as the original one-dimensional values, just as the ticks, the labels for the axes, so you can know what the original values were. This is much more convenient, in my opinion, when Z is computed in a rather more complicated way.
As with all plotting libraries, you can also update the layout of the figure afterwards, which is a very nice touch: change the title, the axis titles, the shape of the figure. Let’s look at the result.
It was computed very quickly. We can zoom in and out; it’s very responsive and looks nice. We can see the contour plot on the bottom (and on top as well), and on the axes the corresponding slope and intercept, and for each combination the resulting cost of that model. With this tooltip here, we can see for each intercept and slope what the corresponding cost value was for that model. And you see the green loop: we have a kind of equipotential marker, showing what other coordinates have the same value of the cost function. And we can see from this cost
function that when we do gradient descent, starting from a random point, i.e. some initial parameters, we take steps in the direction opposite the gradient in order to reach the minimum possible cost. In this case, it’s this point here, and this is the optimal solution, the one which minimizes the cost.
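A bare-bones gradient-descent sketch for this setting (the synthetic data, starting point, learning rate, and iteration count are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 30, 200)                  # minimum temperatures
y = 10 + 1.2 * x + rng.normal(0, 2, 200)     # maximum temperatures

def mse(intercept, slope):
    return np.mean((y - (intercept + slope * x)) ** 2)

theta = np.array([100.0, -50.0])             # a deliberately bad starting point
start_cost = mse(*theta)

lr = 1e-3
for _ in range(5000):
    err = (theta[0] + theta[1] * x) - y      # prediction error per point
    grad = 2 * np.array([err.mean(), (err * x).mean()])  # d(MSE)/d(theta)
    theta -= lr * grad                       # step against the gradient

end_cost = mse(*theta)
```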
As a bonus, let’s visualize the cost function of a neural network. So I designed Mininet. Mininet (chuckles) is the simplest neural network you can think of, because we can only plot 3D charts like this when we have two parameters. We are creatures that can only see in three dimensions, so we need two dimensions for the parameters and one dimension for the cost, which means we need a very simple neural network with only two parameters. So here is what we do: Mininet makes one linear operation on the inputs, an activation transformation with tanh, and then another linear transformation; I didn’t add a bias. So it’s not a perfect network, but this is the best we can do with two parameters.
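A sketch of what such a two-parameter network could look like (the placement of the two weights and the toy data are assumptions; the video’s actual Mininet may differ in detail):

```python
import numpy as np

def mininet(x, w1, w2):
    # One linear operation, a tanh activation, then another linear
    # operation, with no biases: just two trainable parameters.
    return w2 * np.tanh(w1 * x)

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Brute-force grid over the two weights, as before.
w1s = np.linspace(-5, 5, 101)
w2s = np.linspace(-5, 5, 101)
x = np.linspace(-1, 1, 50)
y = 2 * x                      # hypothetical regression target
Z = np.array([[mse(y, mininet(x, w1, w2)) for w1 in w1s] for w2 in w2s])
```

Because tanh is odd, flipping the signs of both weights leaves the output unchanged, which is exactly why the surface shows two minima with the same cost.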
Okay, so we repeat the same process: we compute the cost of this model with a brute-force search over different pairs of the two available parameters, and then we plot it with Plotly. Let’s take a look at the resulting surface plot. It looks quite a bit different: it’s not the same bowl-like convex shape we had for the linear model. We have two minima, and you see this green mark really helps to show that the value is the same at both of these minimum points. This is a very interesting result, and it happens because of
the activation function. You may know that saddle points are a problem when optimizing neural networks, and we can really see it here: at a lot of points on this cost-function surface, moving in the direction of one parameter doesn’t change the cost function at all, so that specific parameter is effectively in a local minimum. This is because of the thresholding effect of tanh: for very negative values you get -1, for very positive values you get +1, and even if you change the input a little bit, you still don’t change the result.
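The saturation is easy to check numerically:

```python
import numpy as np

# tanh saturates: inputs of large magnitude all map very close to -1 or +1,
# and the derivative there is almost zero.
x = np.array([-10.0, -5.0, 5.0, 10.0])
print(np.tanh(x))           # all within 1e-3 of -1 or +1
print(1 - np.tanh(x) ** 2)  # derivative of tanh: vanishingly small here
```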
It’s nice to see these kinds of effects, and I’m a firm believer in “seeing is believing”, so I hope this sheds a little bit of light on what cost-function saddle points look like. You can also see the
need for gradient clipping. In many cases we can be at a certain point where a very tiny step can throw you off to a very large cost, and if the learning rate isn’t controlled or gradient clipping isn’t applied, you can get thrown off and end up with a lot of noise in your optimization.
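A common form of gradient clipping rescales the gradient whenever its norm exceeds a threshold; a minimal sketch (the threshold value here is arbitrary):

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    # Rescale the gradient so its L2 norm never exceeds max_norm,
    # limiting how far a single noisy step can throw the parameters.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

print(clip_by_norm(np.array([30.0, 40.0]), 5.0))  # norm 50 -> rescaled to norm 5
```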
It’s nice to see this in such a very, very simple neural-network visualization. I hope you’ve enjoyed this video on Plotly and cost function visualizations. If you’ve learned something