3blue1brown

New video! (Early view)

Added 2017-09-30 21:28:24 +0000 UTC

Hey everyone,

I've got a new video for you, part 1 of what has turned into a two-video project on neural networks. I was hoping to get this done by yesterday, since I usually try to post on Fridays, but things went a little long. This means I'm now sharing a sneak peek of the video with you before publishing it. Maybe I'll wait until next Friday, or maybe I'll do something earlier in the week, we'll see.

This is a good opportunity, though, for you to catch errors and give any feedback on what you might want changed or tweaked before the public version goes live. Also, feel free to share what you'd like to see in the follow-on video, though I may not incorporate all requests as there's been enough scope creep already :)

Thanks, both for the feedback and for the support!
-Grant

Comments

Nice video to get me thinking about Neural networks. Might be helpful to elaborate on why and how networks can be used for signal processing. Also, how might networks be used for information storage and mapping and problem solving.

Tom LaFleur

2017-10-07 13:13:36 +0000 UTC

timestamp: 6:00

2017-10-05 15:19:12 +0000 UTC

I think it's a little confusing to make the neurons either black or white in some shots and give them a certain gray color in others. All neurons hold a value between 0 and 1, right, and not a boolean?

2017-10-05 09:11:11 +0000 UTC

Thanks so much! Great feedback.

3blue1brown

2017-10-04 21:05:22 +0000 UTC

Ack. Speling erur! Search for "mabye".

Burt Humburg

2017-10-04 10:57:03 +0000 UTC

Hi Grant, some thoughts: 1:41, it looks odd to have "learning" floating on its own without "structure" over the other video. Maybe put it inside the box? It might be the sort of thing to play around with, and there may be no better solution. Around 11:38, have we even discussed how many neurons should be in the second layer? Is that another parameter that makes hand-tuning the network even more implausible? -- oh wait, you say 16 neurons right afterwards. How did we decide that? Can we do more or fewer? And why only two hidden layers? 13:11 - the pi creature is surrounded by a darker rectangle and it looks weird. 14:49 - You assign to a in each loop iteration, and then return it. Shouldn't you append to or mutate a? Currently you're wasting the computation in all but the last iteration. 16:30ish "I've been a little flow"?? Might want to rerecord that line. Hope that helps. Keep up the great work!

Max Goldstein

2017-10-03 02:35:28 +0000 UTC
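
A minimal sketch of the 14:49 loop question above, assuming the code on screen has the usual feedforward shape. The function names and toy weights here are made up for illustration and are not taken from the video. If each iteration reads `a` as the previous layer's activations and writes the next layer's, then reassigning `a` is not wasted work, because every pass feeds the one after it:

```python
import math

def sigmoid(z):
    """Squish any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, biases, a):
    """One layer: sigmoid(W a + b), with W given as a list of rows."""
    return [sigmoid(sum(w * x for w, x in zip(row, a)) + b)
            for row, b in zip(weights, biases)]

def feedforward(layers, a):
    # `a` is deliberately reassigned: each pass consumes the previous
    # layer's activations and produces the next layer's.
    for weights, biases in layers:
        a = layer(weights, biases, a)
    return a

# Hypothetical toy network: 3 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -1.0, 0.25], [1.0, 1.0, -0.5]], [0.0, -1.0]),
    ([[2.0, -2.0]], [0.5]),
]
output = feedforward(layers, [1.0, 0.0, 1.0])
```

Appending to a list instead would only be needed if the intermediate activations had to be kept, for example for training.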

This is awesome!

2017-10-03 02:17:55 +0000 UTC

Hi Grant. You use red and green for the weights grid. I think red-green color blindness is pretty common, maybe even the most common form, and placing shades in proximity in a grid structure may not be the clearest illustration of contrast for some people. Otherwise- awesome!

2017-10-02 05:21:38 +0000 UTC

Awesome!! Grant, you've done it again. Looking forward to the next video!

Nicholas Sterling

2017-10-01 22:58:57 +0000 UTC

Or perhaps "from 0 to 9" would be better.

Nicholas Sterling

2017-10-01 22:43:18 +0000 UTC

@59: "between 0 and 10" should be "between 0 and 9" (presumably).

Nicholas Sterling

2017-10-01 22:41:49 +0000 UTC

Gasp! I was hoping you'd do a video on this for a while :) I can't wait to share with my club members ^.^ (when you officially release it on YouTube of course)

2017-10-01 21:36:18 +0000 UTC

Great point, I'll throw in something about that.

3blue1brown

2017-10-01 20:23:44 +0000 UTC

Thanks!

3blue1brown

2017-10-01 20:23:24 +0000 UTC

This is a point that's really driven home once you see the proof for the "universality" of neural networks, in that they can approximate any function arbitrarily well. See for example Michael Nielsen's book on the topic (I think chapter 4 is where he covers this). I'll think on what good motivations there might be here.

3blue1brown

2017-10-01 20:23:19 +0000 UTC

Right, definitely something worthy to bring up, but to do it right you'd want to show why nonlinearity is required. And for that, it's nice to set up by talking about separating data points with a line, which itself requires a little something for connecting the idea of recognizing digits to separating data points in a high-dimensional space. Not unworthy, and maybe worth digging into in a later video, but I ended up deciding it would add too much time to this video given what I was going for.

3blue1brown

2017-10-01 20:21:07 +0000 UTC

Good catches, you are correct on the matrix indexing. I was somehow very careless in actually putting the symbols to that part.

3blue1brown

2017-10-01 20:16:48 +0000 UTC

Hmm, I hoped the section on what the sigmoid function was doing was clear. Maybe I should have more concrete examples up on screen so that one can see how negative values end up in the range (0, 0.5)?

3blue1brown

2017-10-01 20:16:08 +0000 UTC

:)

3blue1brown

2017-10-01 20:15:16 +0000 UTC

I definitely plan to mention how ReLU works better for deeper networks at the end of the second video. Also, good point on how somewhere the arbitrariness of the specific choices should be brought up.

3blue1brown

2017-10-01 20:14:10 +0000 UTC

If I would explain it I would show it with 2D input features and 2 neurons per layer. Then I would visualize the transformations and how the spaces are partitioned by the (hyper-) planes on the xor problem. I would animate the evolution of the partitions during backprop

2017-10-01 17:43:22 +0000 UTC

That is debatable, I would say. Finding global versus local minima, and the noisiness of the error/loss landscape in large neural networks, is a very lively area of research. See this, for example: Information Theory of Deep Learning, Naftali Tishby (https://www.youtube.com/watch?v=bLqJHjXihK8&feature=youtu.be). One of the findings seems to be that for larger networks and larger problem sets the overall landscape starts to approximate a bowl again, where simply rolling downhill does not get you stuck in local minima. :)

2017-10-01 13:23:43 +0000 UTC

* screams out of joy *

Richárd Nagyfi

2017-10-01 10:54:40 +0000 UTC

Might be good to talk about how neural networks are really just finding local maxima/minima instead of global, tho u might have that in mind already for part 2

V

2017-10-01 09:16:34 +0000 UTC

Great video! In terms of error-catching, at around 6:03, when talking about breaking numbers down into their geometric components, it looks like you misspelled 'maybe' as 'mabye.' Also, at the very end, and I could be totally wrong here, when talking about more efficient matrix notation around 14:09, if we have n input nodes and k second-layer nodes, should the matrix not be k by n? The last row should be the k-th row, and the last column the n-th column, giving the bottom right element as the kn-th element? In the first row, the last column is n, but then you switch to k in the second. I think this only holds if there are as many second-layer nodes as first-layer nodes. Following from that, should the last bias not be b_k? My confusion could of course stem from a simple typo in the first column where you meant to write k instead of n, making it an n by k matrix, or from ignorance, in which case please ignore everything I just said.

Ben Granger

2017-10-01 08:00:39 +0000 UTC
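
For reference, here is one consistent way to write the indexing discussed above. It assumes the video's zero-based labels as the comments describe them (inputs $a_0, \dots, a_n$ and second-layer neurons indexed $0, \dots, k$); the superscripts $(0)$ and $(1)$ are just layer labels chosen for this sketch. Each row of the matrix belongs to one second-layer neuron, so the row index runs to $k$, the column index runs to $n$, and the bias vector has one entry per row:

```latex
\begin{bmatrix} a^{(1)}_0 \\ a^{(1)}_1 \\ \vdots \\ a^{(1)}_k \end{bmatrix}
= \sigma\left(
\begin{bmatrix}
w_{0,0} & w_{0,1} & \cdots & w_{0,n} \\
w_{1,0} & w_{1,1} & \cdots & w_{1,n} \\
\vdots  & \vdots  & \ddots & \vdots  \\
w_{k,0} & w_{k,1} & \cdots & w_{k,n}
\end{bmatrix}
\begin{bmatrix} a^{(0)}_0 \\ a^{(0)}_1 \\ \vdots \\ a^{(0)}_n \end{bmatrix}
+
\begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix}
\right)
```

With these labels the matrix has $k+1$ rows and $n+1$ columns, and the last bias is $b_k$, matching the number of second-layer neurons rather than the number of inputs.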

Awesome! I do have a question / clarification you may want to add: how do negative weights work if it's a weighted sum and the activation values are non negative? You talk about how you want the weighted sum to be bigger if the cell is darker with negative weights - can you elaborate more on how this math would work? A negative weight multiplied by a positive activation value would lower the sum, no? Thanks for all the great work.

Josh B.

2017-10-01 06:27:46 +0000 UTC
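
A small sketch of the negative-weight question above, using made-up numbers rather than anything from the video. Activations stay in the range 0 to 1, so a negative weight can only subtract from the sum. That is the point: the sum comes out largest when the negatively weighted pixels are dark (near 0) and the positively weighted pixels are bright.

```python
import math

def sigmoid(z):
    """Squish any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def weighted_sum(weights, activations):
    return sum(w * a for w, a in zip(weights, activations))

# Hypothetical 4-pixel strip: positive weights in the middle,
# negative weights on the two surrounding pixels.
weights = [-1.0, 2.0, 2.0, -1.0]

edge_pattern = [0.0, 1.0, 1.0, 0.0]   # bright middle, dark surround
filled_block = [1.0, 1.0, 1.0, 1.0]   # everything bright

edge_sum = weighted_sum(weights, edge_pattern)    # 0 + 2 + 2 + 0 = 4.0
block_sum = weighted_sum(weights, filled_block)   # -1 + 2 + 2 - 1 = 2.0

# The sigmoid then maps negative sums below 0.5 and positive sums above it.
below_half = sigmoid(-3.0)
above_half = sigmoid(3.0)
```

So the bright surround lowers the sum from 4.0 to 2.0, which is how the neuron can prefer an edge over a filled-in region.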

This is very good. Great content and pacing. thanks.

2017-10-01 04:11:15 +0000 UTC

Really good, as always. My only complaint is that I now have to wait for the next in the series ...

2017-10-01 04:08:24 +0000 UTC

I enjoyed the video. It might be worth being explicit that different layers can use different non-linear functions (ReLU is common; along with softmax for the outputs). Also, I did feel that the network configuration felt a little arbitrary: you could be more clear that there's a lot of trial and error that goes into figuring out how many layers you need for a problem, and how many nodes in each.

2017-10-01 03:59:50 +0000 UTC

Really cool topic! I enjoyed learning how some of your earlier videos on math topics connect to other popular subjects in science/tech today. Also, for someone who is not super familiar with the topic, the explanation (as is typical for this channel) was top notch and did a nice job of approaching it in a way I hadn't thought about.

2017-10-01 03:57:21 +0000 UTC

Optimizing for the YouTube search and notification algorithm is a science in itself. My guess is that it's related to that. Veritasium has made a video or two about the subject.

2017-10-01 03:49:03 +0000 UTC

Grant! One of my all time favorites, way to go! I've spoken to people actually in this field of research who haven't explained it as well.

Jacob Mirra

2017-10-01 03:00:50 +0000 UTC

Really good! I appreciate you mentioning the inner workings of the hidden layers. Really excited to see more videos about this topic :D

2017-10-01 01:06:59 +0000 UTC

At 0:56, you say 'an output between zero and ten', but it would be a bit more connected to your visuals if you said 'an output between zero and nine'

2017-09-30 22:57:08 +0000 UTC

Awesome! Only thing for me was it wasn't clear why the hidden layers were given 16 neurons each. Could be nice to say why that is -- if it's arbitrary, if it's something you tune, or if there's some way to know how many you should use there. Great video!

2017-09-30 22:43:37 +0000 UTC

Happy times. The only error I noticed was at around one minute in you said the program would output a number between 0 and 10 (rather than 0 through 9)

2017-09-30 22:41:33 +0000 UTC

Probably the best and most polished video introduction to nn I've seen (and I've watched a bunch). The only suggestion I would add is around your explanation of why we need a bias. Your explanation that it's necessary in this example, in case we don't want the neuron to "light up" whenever the input is greater than zero, might be a bit unclear. First of all, if the input is just slightly greater than zero then the activation will only be slightly greater than .5, and you could just make the weights really small to get closer to zero input. My understanding of the necessity of bias is something I've never seen expressed, but to me it is the only satisfactory justification I could come up with for its existence. That is, to simply decide what the output should be when all the weights are zero. Otherwise couldn't we accomplish any other goal by just adjusting weights? Thanks for a beautiful and lucid video!!

2017-09-30 22:36:35 +0000 UTC
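
A quick numerical sketch of the bias discussion above, with hypothetical weights and inputs. It assumes a single sigmoid neuron of the form sigmoid(weighted sum + bias). Shrinking the weights pulls the output toward 0.5, not toward 0, so scaling alone cannot keep the neuron quiet; a negative bias moves the point at which it starts to light up.

```python
import math

def sigmoid(z):
    """Squish any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, inputs, bias=0.0):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

inputs = [1.0, 1.0, 1.0]

# All weights zero, no bias: the output is pinned at sigmoid(0) = 0.5.
zero_weights = neuron([0.0, 0.0, 0.0], inputs)

# Very small weights: still just above 0.5, nowhere near "off".
tiny_weights = neuron([1e-6, 1e-6, 1e-6], inputs)

# Weighted sum of 3 with a bias of -10: sigmoid(-7), close to 0.
biased = neuron([1.0, 1.0, 1.0], inputs, bias=-10.0)
```

This agrees with the comment's reading that the bias sets the output when the weights contribute nothing, and it also shows the threshold-shifting role.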

Looks good. I had to watch it a second time to see the issues already pointed out. Really enjoyed it. I subscribed to the IBL "International Brain Lab" and IBRO "International Brain Research Organization." AI is the big field. Thanks again for another excellent and current topic video!

Bill Russell

2017-09-30 22:32:04 +0000 UTC

I just can't express how much I admire your videos!

2017-09-30 22:14:06 +0000 UTC

Great visualizations and explanation! I'm not sure if this will confuse the viewers, but one thought I had: saying that the sigmoid will squish the activation values into 0-1 might not make a lot of sense until you also add that the sigmoid is needed to make the entire neural network a nonlinear function and therefore capable of "learning".

2017-09-30 22:13:30 +0000 UTC
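
To illustrate the nonlinearity point above, here is a one-dimensional sketch with made-up weights. Without a squishing function between them, two layers of the form w * x + b collapse into a single layer of the same form, so stacking adds nothing:

```python
def linear(w, b, x):
    return w * x + b

# Two hypothetical "layers" with no nonlinearity in between.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_layers(x):
    return linear(w2, b2, linear(w1, b1, x))

# Expanding w2 * (w1 * x + b1) + b2 gives one equivalent layer.
w, b = w2 * w1, w2 * b1 + b2   # -6.0 and -2.5

def one_layer(x):
    return linear(w, b, x)

samples = [-2.0, 0.0, 1.5, 4.0]
matches = [two_layers(x) == one_layer(x) for x in samples]
```

The same algebra holds for matrices, which is why some nonlinear function such as the sigmoid has to sit between the layers.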

I'm curious, why Fridays? Does it get more views?

Alexey Badalov

2017-09-30 22:10:51 +0000 UTC

Just a typo: at 5:58, it should be "maybe" instead of "mabye".

2017-09-30 21:59:56 +0000 UTC

Ooh, good catch, thanks! I guess I didn't look carefully enough at that one.

3blue1brown

2017-09-30 21:51:38 +0000 UTC

One small error I noticed: At 14:20, the indices of the matrix and bias vector are inconsistent. The first row of the matrix goes from w_{0,0} to w_{0,n} while the following rows go up to n in the first index and up to k in the second. Also, the bias vector should have a different dimension from the input (unless both layers have the same number of neurons).

2017-09-30 21:50:39 +0000 UTC
