3blue1brown

patreon


New video! (Early view)

Hey everyone,

I've got a new video for you, part 1 of what has turned into a two-video project on neural networks.  I was hoping to get this done by yesterday, since I usually try to post on Fridays, but things went a little long.  This means I'm now sharing a sneak peek of the video with you before publishing it.  Maybe I'll wait until next Friday, or maybe I'll do something earlier in the week, we'll see.

This is a good opportunity, though, for you to catch errors and give any feedback on what you might want changed or tweaked before the public version goes live.  Also, feel free to share what you'd like to see in the follow-on video, though I may not incorporate all requests as there's been enough scope creep already :)

Thanks, both for the feedback and for the support!
-Grant

Comments

Nice video to get me thinking about neural networks. Might be helpful to elaborate on why and how networks can be used for signal processing. Also, how might networks be used for information storage, mapping, and problem solving?

Tom LaFleur

timestamp: 6:00

I think it's a little confusing to make the neurons either black or white in some shots and give them a certain gray color in others. All neurons hold a value between 0 and 1, right, and not a boolean?

Thanks so much! Great feedback.

3blue1brown

Ack. Speling erur! Search for "mabye".

Burt Humburg

Hi Grant, some thoughts: 1:41, it looks odd to have "learning" floating on its own without "structure" over the other video. Maybe put it inside the box? It might be the sort of thing to play around with, and there may be no better solution. Around 11:38, have we even discussed how many neurons should be in the second layer? Is that another parameter that makes hand-tuning the network even more implausible? -- oh wait, you say 16 neurons right afterward. How did we decide that? Can we do more or fewer? And why only two hidden layers? 13:11 - the pi creature is surrounded by a darker rectangle and it looks weird. 14:49 - You assign to a in each loop iteration, and then return it. Shouldn't you append to or mutate a? Currently you're wasting the computation in all but the last iteration. 16:30ish "I've been a little flow"?? Might want to rerecord that line. Hope that helps. Keep up the great work!

Max Goldstein
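A note on the 14:49 point: the on-screen code is not reproduced in this thread, so the following is only a sketch under the assumption that it follows the usual feedforward pattern. In that pattern, reassigning `a` inside the loop is deliberate, because each layer's output becomes the next layer's input, so earlier iterations are not wasted. The function name, network sizes, and parameter values below are all made up for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def feedforward(a, weights, biases):
    """Propagate an activation vector through each layer in turn.

    weights[l] is a list of rows, one row per neuron in the next layer;
    biases[l] holds one bias per neuron in the next layer.
    """
    for w, b in zip(weights, biases):
        # Reassigning `a` is the point: this layer's output is the
        # input consumed by the next loop iteration.
        a = [
            sigmoid(sum(w_jk * a_k for w_jk, a_k in zip(row, a)) + b_j)
            for row, b_j in zip(w, b)
        ]
    return a


# Hypothetical 2-3-1 network with made-up parameters.
weights = [
    [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.25]],  # 3 neurons, 2 inputs each
    [[1.0, -2.0, 0.5]],                       # 1 neuron, 3 inputs
]
biases = [[0.0, -1.0, 0.5], [0.1]]

output = feedforward([1.0, 0.0], weights, biases)
print(output)
```

Changing only the first layer's weights changes the final output, which is one way to check that the earlier iterations matter.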

This is awesome!

Hi Grant. You use red and green for the weights grid. I think red-green color blindness is pretty common, maybe even the most common form, and placing shades in proximity in a grid structure may not be the clearest illustration of contrast for some people. Otherwise, awesome!

Awesome!! Grant, you've done it again. Looking forward to the next video!

Nicholas Sterling

Or perhaps "from 0 to 9" would be better.

Nicholas Sterling

@59: "between 0 and 10" should be "between 0 and 9" (presumably).

Nicholas Sterling

Gasp! I was hoping you'd do a video on this for a while :) I can't wait to share with my club members ^.^ (when you officially release it on YouTube of course)

Great point, I'll throw in something about that.

3blue1brown

Thanks!

3blue1brown

This is a point that's really driven home once you see the proof for the "universality" of neural networks, in that they can approximate any function arbitrarily well. See for example Michael Nielsen's book on the topic (I think chapter 4 is where he covers this). I'll think on what good motivations there might be here.

3blue1brown

Right, definitely something worthy to bring up, but to do it right you'd want to show why nonlinearity is required. And for that, it's nice to set up by talking about separating data points with a line, which itself requires a little something for connecting the idea of recognizing digits to separating data points in a high-dimensional space. Not unworthy, and maybe worth digging into in a later video, but I ended up deciding it would add too much time to this video given what I was going for.

3blue1brown

Good catches, you are correct on the matrix indexing. I was somehow very careless in actually putting the symbols to that part.

3blue1brown

Hmm, I hoped the section on what the sigmoid function was doing was clear. Maybe I should have more concrete examples up on screen so that one can see how negative values end up in the range (0, 0.5)?

3blue1brown
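As one possible version of the "concrete examples" mentioned above, here is a minimal sketch of the sigmoid applied to a few weighted sums. The input values are illustrative choices, not numbers from the video.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Illustrative weighted sums: negative inputs land below 0.5,
# zero lands exactly on 0.5, positive inputs land above 0.5.
for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```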

:)

3blue1brown

I definitely plan to mention how ReLU works better for deeper networks at the end of the second video. Also, good point on how somewhere the arbitrariness of the specific choices should be brought up.

3blue1brown

If I were explaining it, I would show it with 2D input features and 2 neurons per layer. Then I would visualize the transformations and how the spaces are partitioned by the (hyper-)planes on the XOR problem. I would animate the evolution of the partitions during backprop.

That is debatable, I would say. Global/local minimum finding and the noisiness of the shape of the error/loss landscape in large neural networks is a very lively area of research. See this, for example: Information Theory of Deep Learning, Naftali Tishby (https://www.youtube.com/watch?v=bLqJHjXihK8&feature=youtu.be). One of the findings seems to be that for larger networks and larger problem sets the overall landscape starts to approximate a bowl again, where simply rolling downhill does not get you stuck in local minima. :)

* screams out of joy *

Richárd Nagyfi

Might be good to talk about how neural networks are really just finding local maxima/minima instead of global ones, though you might have that in mind already for part 2.

V

Great video! In terms of error-catching, at around 6:03, when talking about breaking numbers down into their geometric components, it looks like you misspelled 'maybe' as 'mabye.' Also, at the very end, and I could be totally wrong here, when talking about more efficient matrix notation around 14:09, if we have n input nodes and k second-layer nodes, should the matrix not be k by n? The last row should be the k-th row, and the last column the n-th column, giving the bottom right element as the kn-th element. In the first row, the last column is n, but then you switch to k in the second. I think this only holds if there are as many second-layer nodes as first-layer nodes. Following from that, should the last bias not be b_k? My confusion could of course stem from a simple typo in the first column where you meant to write k instead of n, making it an n by k matrix, or from ignorance, in which case please ignore everything I just said.

Ben Granger
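For reference, the consistent indexing described in the comment above can be written out as follows. This is a sketch using the commenter's convention (n input nodes, k second-layer nodes, indices starting at 1); the superscripts (0) and (1) for the layer number are a notational assumption, and the video's own symbols may differ.

```latex
\begin{bmatrix} a^{(1)}_{1} \\ \vdots \\ a^{(1)}_{k} \end{bmatrix}
=
\sigma\!\left(
\begin{bmatrix}
w_{1,1} & \cdots & w_{1,n} \\
\vdots  & \ddots & \vdots  \\
w_{k,1} & \cdots & w_{k,n}
\end{bmatrix}
\begin{bmatrix} a^{(0)}_{1} \\ \vdots \\ a^{(0)}_{n} \end{bmatrix}
+
\begin{bmatrix} b_{1} \\ \vdots \\ b_{k} \end{bmatrix}
\right)
```

The weight matrix is k by n, so the product and the bias vector both have k entries, one per second-layer node.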

Awesome! I do have a question / clarification you may want to add: how do negative weights work if it's a weighted sum and the activation values are non negative? You talk about how you want the weighted sum to be bigger if the cell is darker with negative weights - can you elaborate more on how this math would work? A negative weight multiplied by a positive activation value would lower the sum, no? Thanks for all the great work.

Josh B.
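A small numeric sketch of the negative-weight question above, using a hypothetical one-dimensional strip of five pixels rather than the video's actual grid. The weights and pixel values are made up. A negative weight times a positive activation does lower the sum, which is exactly why the sum is largest when the negatively weighted pixels are dark (close to 0).

```python
def weighted_sum(weights, activations):
    """Plain weighted sum of activations, before any bias or sigmoid."""
    return sum(w * a for w, a in zip(weights, activations))


# Hypothetical detector: positive weights in the middle, negative at the edges.
weights = [-1.0, 1.0, 1.0, 1.0, -1.0]

# Activations are non-negative: 0.0 is a dark pixel, 1.0 is a bright one.
bright_middle_dark_edges = [0.0, 1.0, 1.0, 1.0, 0.0]
all_bright = [1.0, 1.0, 1.0, 1.0, 1.0]

isolated = weighted_sum(weights, bright_middle_dark_edges)
flooded = weighted_sum(weights, all_bright)

# Bright edge pixels are penalized by the negative weights,
# so the isolated pattern scores higher than the all-bright strip.
print(isolated, flooded)
```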

This is very good. Great content and pacing. thanks.

Really good, as always. My only complaint is that I now have to wait for the next in the series ...

I enjoyed the video. It might be worth being explicit that different layers can use different non-linear functions (ReLU is common; along with softmax for the outputs). Also, I did feel that the network configuration felt a little arbitrary: you could be more clear that there's a lot of trial and error that goes into figuring out how many layers you need for a problem, and how many nodes in each.
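For readers unfamiliar with the two functions named in the comment above, here is a minimal sketch of ReLU and softmax in plain Python. These are the standard textbook definitions, not code from the video, and the sample values are arbitrary.

```python
import math


def relu(z):
    """Rectified linear unit, applied elementwise: negative values become 0."""
    return [max(0.0, x) for x in z]


def softmax(z):
    """Turn a list of scores into positive values that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


hidden = relu([-2.0, 0.5, 3.0])
probs = softmax([1.0, 2.0, 3.0])
print(hidden, probs)
```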

Really cool topic! I enjoyed learning how some of your earlier videos on math topics connect to other popular subjects in science/tech today. Also, for someone who is not super familiar with the topic, the explanation (as is typical for this channel) was top notch and did a nice job of approaching it in a way I hadn't thought about.

Optimizing for the YouTube search and notification algorithm is a science in itself. My guess is that it's related to that. Veritasium has made a video or two about the subject.

Grant! One of my all time favorites, way to go! I've spoken to people actually in this field of research who haven't explained it as well.

Jacob Mirra

Really good! I appreciate you mentioning the inner workings of the hidden layers. Really excited to see more videos about this topic :D

At 0:56, you say 'an output between zero and ten', but it would be a bit more connected to your visuals if you said 'an output between zero and nine'

Awesome! Only thing for me was it wasn't clear why the hidden layers were given 16 neurons each. Could be nice to say why that is -- if it's arbitrary, if it's something you tune, or if there's some way to know how many you should use there. Great video!

Happy times. The only error I noticed was at around one minute in you said the program would output a number between 0 and 10 (rather than 0 through 9)

Probably the best and most polished video introduction to NNs I've seen (and I've watched a bunch). The only suggestion I would add is around your explanation of why we need a bias. Your explanation that it's necessary in this example in case we don't want the neuron to "light up" whenever the input is greater than zero might be a bit unclear. First of all, if the input is just slightly greater than zero then the activation will only be slightly greater than .5, and you could just make the weights really small to get closer to zero input. My understanding of the necessity of bias is something I've never seen expressed, but to me it is the only satisfactory justification I could come up with for its existence. That is to simply decide what the output should be when all the weights are zero. Otherwise couldn't we accomplish any other goal by just adjusting weights? Thanks for a beautiful and lucid video!!
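A quick sketch of the bias point raised above, assuming a single sigmoid neuron with made-up weights and inputs. When the weighted sum is zero (for example, all inputs are zero), the weights cannot move the output away from 0.5, but a bias can. A bias of -10 also shifts the threshold so the neuron only passes 0.5 once the weighted sum exceeds 10.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def neuron(weights, inputs, bias=0.0):
    """Single sigmoid neuron: sigmoid(weighted sum + bias)."""
    z = sum(w * a for w, a in zip(weights, inputs)) + bias
    return sigmoid(z)


blank = [0.0, 0.0, 0.0]

# With zero inputs, no choice of weights changes the output.
no_bias_small = neuron([0.1, 0.1, 0.1], blank)
no_bias_large = neuron([50.0, -50.0, 50.0], blank)

# A bias moves the output for the same zero inputs.
with_bias = neuron([0.1, 0.1, 0.1], blank, bias=-10.0)

# The same bias acts as a threshold on the weighted sum.
below_threshold = neuron([1.0, 1.0, 1.0], [3.0, 3.0, 3.0], bias=-10.0)  # sum 9
above_threshold = neuron([1.0, 1.0, 1.0], [4.0, 4.0, 4.0], bias=-10.0)  # sum 12

print(no_bias_small, no_bias_large, with_bias, below_threshold, above_threshold)
```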

Looks good. I had to watch it a second time to see the issues already pointed out. Really enjoyed it. I subscribed to the IBL "International Brain Lab" and IBRO "International Brain Research Organization." AI is the big field. Thanks again for another excellent and current topic video!

Bill Russell

I just can't express how much I admire your videos!

Great visualizations and explanation! I'm not sure if this will confuse the viewers, but one thought I had was I think saying that sigmoid will squish the activation values into 0-1 might not make a lot of sense until you also add that the sigmoid is needed to make the entire neural network a nonlinear function and therefore capable of "learning".
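The nonlinearity point above can be illustrated with a tiny sketch: without an activation function between them, two linear layers collapse into a single linear layer. The 2x2 matrices and the input vector are arbitrary illustrative values, and biases are omitted to keep the example short.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


W1 = [[1.0, 2.0], [0.0, -1.0]]
W2 = [[0.5, 0.0], [3.0, 1.0]]
x = [2.0, -1.0]

# Two purely linear layers applied in sequence...
two_linear_layers = matvec(W2, matvec(W1, x))
# ...give the same result as one layer with the combined matrix W2 * W1.
one_combined_layer = matvec(matmul(W2, W1), x)

# With a sigmoid in between, the combined matrix no longer reproduces the output.
with_sigmoid = matvec(W2, [sigmoid(z) for z in matvec(W1, x)])

print(two_linear_layers, one_combined_layer, with_sigmoid)
```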

I'm curious, why Fridays? Does it get more views?

Alexey Badalov

Just a typo: at 5:58, it should be "maybe" instead of "mabye".

Ooh, good catch, thanks! I guess I didn't look carefully enough at that one.

3blue1brown

One small error I noticed: At 14:20, the indices of the matrix and bias vector are inconsistent. The first row of the matrix goes from w_{0,0} to w_{0,n} while the following rows go up to n in the first index and up to k in the second. Also, the bias vector should have a different dimension from the input (unless both layers have the same number of neurons).

