3blue1brown

patreon


Backpropagation early view, and a question

Hey everyone!

I have for you a draft of the upcoming backpropagation video.  I have not yet put in music, and I held off on a small part at the end, because I wanted to ask you guys your thoughts on a certain structural question before more editing.

This video loosely has two parts: the first gives an intuitive overview of what backpropagation actually does, mechanistically but not symbolically, while the second dives into how it's all represented in terms of partial derivatives and such.  After putting it all together, I'm wondering if it might be better to divide this into two separate videos.

The only downside I can think of is that when going through the calculus part, it's nice to have the intuitive picture for what's supposed to happen as fresh in one's mind as possible.  But realistically I suppose most people watching part 4 of a series will be coming straight from part 3.

Thoughts?
-Grant


Comments

Great point. As you say, it's a little cumbersome to introduce, but perhaps once I have some videos on information theory that can be something worth mentioning.

3blue1brown

The explanation of backprop was very easy for me to understand. Thanks. If you have more unlocked videos on deep learning topics such as CNNs, please let me know.

Just noting something which you are probably already aware of: real neural nets are often trained with cross-entropy rather than mean-squared error. Cross-entropy may be more difficult to explain in terms of how it is derived, but the calculation is actually simpler and the result is intuitive.
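
To make the "simpler calculation" remark concrete, here is a minimal sketch, assuming a single sigmoid output neuron (an assumption for illustration, not something taken from the video). The function names are hypothetical. With a squared-error cost the derivative with respect to the pre-activation z carries an extra sigmoid-derivative factor, while with binary cross-entropy it reduces to a - y.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_cost(z, y):
    # Squared-error cost for one example: 0.5 * (a - y)^2, with a = sigmoid(z).
    return 0.5 * (sigmoid(z) - y) ** 2

def cross_entropy_cost(z, y):
    # Binary cross-entropy for one example, with a = sigmoid(z).
    a = sigmoid(z)
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

def grad_mse(z, y):
    # dC/dz = (a - y) * a * (1 - a): the sigmoid derivative stays in the formula.
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)

def grad_cross_entropy(z, y):
    # dC/dz = a - y: the sigmoid derivative cancels out.
    return sigmoid(z) - y

def numeric_grad(cost, z, y, eps=1e-6):
    # Central finite difference, used only to sanity-check the formulas above.
    return (cost(z + eps, y) - cost(z - eps, y)) / (2.0 * eps)

z, y = 0.0, 1.0  # made-up pre-activation and target
print("MSE gradient:          ", grad_mse(z, y), numeric_grad(mse_cost, z, y))
print("Cross-entropy gradient:", grad_cross_entropy(z, y), numeric_grad(cross_entropy_cost, z, y))
```

The cross-entropy gradient is just the gap between the prediction and the target, which is one way to read "the result is intuitive".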

All made sense to me (I loved how you showed the different influences by moving stuff around) - BUT the actual learning (ie how to change the weights and biases) is still only addressed theoretically.... how about you actually work a _simple_ example (or promise to do so in the next video). Or did I miss something?

You take a tangent into neuroscience around the six-minute mark. If you're worried about length, maybe cut that and spin it into a side video, like you did for 256-bit encryption, perhaps partnering with a real neuroscientist.

Max Goldstein

How about 2 separate videos, then a quickie 3rd that just shows the intuitive outcome with the calculus formula below it?

seerpea

I'm going to echo Jason's recommendation to have a "part 1.5" (I've just started reading Knuth, and he'd think it's important). If you don't, though, I'd keep it a single video. Your subs won't watch them one-after-another unless you post them at the same time, at which point you may as well just post a single video.

Okuno Zankoku

I have studied backpropagation before and this video series has been excellent, but the actual calculus becomes very heavy. Therefore I agree a concrete example would help, and that would force a second video. Funnily enough, I watched it in two parts as well due to my train arriving this morning, and it still flowed well in two parts.

I'd say split it into 2 videos. I had to rewatch the calculus half before it totally made sense, and splitting it into 2 videos would make it easier to review. Plus I feel like the pause between videos would improve the pacing of the whole lesson. But I think they should still be released at the same time.

Kevin Strehl

The part I liked the most was between 3 and 6 minutes, when you were explaining backprop. My 1 Patreon dollar was well spent. Good job! I like it. About 15 to 17 minutes into the video my eyes start glazing over because you're talking over my head and I can't keep up with what you're saying, and you start sounding like Charlie Brown's teacher: wah waahh wah wah. In my opinion you should trim down the calculus walkthrough and stick to what you're best at, which is the graphics and intuition for non-mathematics people. The video is a bit too long, I think; try to keep it below 20 minutes. You're losing me in the technical details. I might even choose to separate out the calculus walkthrough into its own video, so that you can be more assertive that: "If you're not pursuing master's to PhD level mathematics, this video will only be noise to you and you're going to be unhappy after being bashed over the head with walls of glyphs like those you'd find lining King Tut's 3600 year old casket."

Part 1 of this video was super-helpful in clarifying some longstanding confusions I've had with backprop, particularly in understanding where the overall nudge direction comes from for middle-layer neurons. Thanks for that. Part 2 is nice as well in breaking down how the chain rule applies--though I do find myself getting lost in the symbols and indices. What I think is still missing--and what I've never seen anyone do in any other backprop explanation I've ever encountered--is a "part 1.5" that (at least for a very small, toy network) cranks through the actual arithmetic of it all, with actual values for the weights, etc. Even if it's just a 3 neuron network (2 input neurons and 1 output neuron, no hidden layers), I think that would be helpful in making it all much more concrete before diving into the symbol-laden chain rule business. So often, people learn best from examples and I feel that's what is still missing here: an example-with-numbers to make backprop concrete before going to the full symbolic complexity of the thing.

jason black
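
As a rough sketch of what such an "example with numbers" could look like, the snippet below works through one gradient step for the 3-neuron network described above (2 inputs, 1 sigmoid output, no hidden layers). The weights, bias, input, target, and learning rate are all made-up values, the helper names are hypothetical, and the squared-error cost (a - y)^2 for a single training example is an assumption.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    # Weighted sum z, then activation a = sigmoid(z).
    z = w[0] * x[0] + w[1] * x[1] + b
    return z, sigmoid(z)

def cost(w, b, x, y):
    # Squared error for a single training example.
    _, a = forward(w, b, x)
    return (a - y) ** 2

def gradients(w, b, x, y):
    # Chain rule: dC/dw_i = dC/da * da/dz * dz/dw_i, and dz/db = 1.
    _, a = forward(w, b, x)
    dC_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)          # derivative of the sigmoid
    dC_dz = dC_da * da_dz
    grad_w = [dC_dz * x[0], dC_dz * x[1]]
    grad_b = dC_dz
    return grad_w, grad_b

# Made-up starting point, purely for illustration.
w, b = [0.5, -0.5], 0.0
x, y = [1.0, 2.0], 1.0
lr = 0.5  # learning rate

grad_w, grad_b = gradients(w, b, x, y)
new_w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
new_b = b - lr * grad_b

print("cost before:", cost(w, b, x, y))
print("gradient (w1, w2, b):", grad_w, grad_b)
print("cost after one step:", cost(new_w, new_b, x, y))
```

Each weight's gradient is the same shared factor dC/dz scaled by the input it multiplies, so the weight attached to the larger input gets the larger nudge.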

also, a financial question (just because i'm poor but still want to support you): if you split in two, am i patronizing twice? i think another consideration may help in answering whether to split: if you split the videos up, would each have roughly the same informational content as the average of all the others? it seems to me that just the intuitive explanation of backprop does not...but that the combined intuition + mechanics/"the math" does.

At 6:00 you mention a theory from neuroscience. Maybe spell that out? I hadn't heard of it before, and just from hearing you pronounce it I did not manage to spell it correctly to be able to look it up. Other than that I'd keep it in one video. Splitting it up invites people to avoid the maths bits, which isn't really what we want them to do, is it?

One question that I had is that I don't understand why the short batches are necessary. Computing averages of large quantities of data is O(n), so why bother "paging" the training samples?

Liz Av
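
One way to look at the mini-batch question is to count parameter updates per pass over the data. The sketch below is a toy illustration under stated assumptions, not the video's explanation: it fits a single weight w in y = w * x on synthetic, noise-free data, and the helper names, data, learning rate, and batch size are all made up. Both strategies touch every sample once per pass, but the full-batch version takes one step while the mini-batch version takes one step per batch.

```python
import random

def avg_grad(w, batch):
    # Average gradient of the squared error (w*x - y)^2 over a batch.
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def full_batch_epoch(w, data, lr):
    # One pass over the data, one update.
    return w - lr * avg_grad(w, data)

def mini_batch_epoch(w, data, lr, batch_size, rng):
    # One pass over the data, one update per mini-batch.
    shuffled = data[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        w -= lr * avg_grad(w, shuffled[start:start + batch_size])
    return w

rng = random.Random(0)
true_w = 3.0
xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
data = [(x, true_w * x) for x in xs]  # synthetic, noise-free samples

w_full = full_batch_epoch(0.0, data, lr=0.1)
w_mini = mini_batch_epoch(0.0, data, lr=0.1, batch_size=10, rng=rng)

print("after one full-batch pass:", w_full)
print("after one mini-batch pass:", w_mini)
```

In this toy setup the mini-batch pass ends much closer to the true weight for the same amount of gradient arithmetic, because each noisy step still points roughly downhill.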

I think either way is good. I personally prefer two videos, but they do work as one.

Liz Av

I suggest a single video.

Keep the videos together. Only mention your calculus course once (at the beginning of the second part). At 15:22, the product of the partial derivatives is usually written in the opposite order.

I haven't watched the video yet, but I found this to be a very good and intuitive, slightly mathy explanation: http://colah.github.io/posts/2015-08-Backprop/

I also should mention that I lost track of what z meant partway through the multi-weight explanation (with a_jk^(L)). I realize that you have z() onscreen, but I was confused by the ellipses.

Ben Visness

Please keep it together in one video. During the math-heavy portion, I found myself referring back to earlier parts of the video where you covered the intuitive stuff. I know it might get long, but I think it’s best to keep the intuitive and the rigorous explanations in the same video.

Ben Visness

I think that it feels more natural to keep them together. Some chapters are harder than others, but that doesn't strike me as a great argument to split them up. I prefer a 'concept' to be contained in the same video; there may be an intuitive and a symbolic representation of the concept..but they belong together.

If the current length of 21 minutes will not change that much I vote for 1 video to preserve continuity. If you are finding that you want/need to add another 10 minutes to fully explain then I think breaking into 2 is reasonable.

It would be great if you could mention at least in passing that backpropagation doesn't depend critically on the specific form of the neural network, but can actually be used to compute the gradient of pretty much any function you can program. The more general name for the technique is "reverse-mode automatic differentiation". That would clear up a common misconception.
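
To illustrate the point about reverse-mode automatic differentiation, here is a minimal sketch of the idea in plain Python. The `Var` class, the `sin` helper, and `backward` are hypothetical names written for this example, not any library's API, and only addition, multiplication, and sine are supported. The function being differentiated has nothing to do with neural networks.

```python
import math

class Var:
    """A scalar that remembers how it was computed."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))

def backward(output):
    # Order nodes so every node comes after the nodes it depends on.
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(output)
    # Sweep from the output back to the inputs, applying the chain rule.
    output.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad

# An arbitrary function: f(x, y) = x*y + sin(x).
x, y = Var(2.0), Var(3.0)
f = x * y + sin(x)
backward(f)
print("df/dx:", x.grad, "expected:", 3.0 + math.cos(2.0))
print("df/dy:", y.grad, "expected:", 2.0)
```

The backward sweep is the same bookkeeping as backpropagation: each node passes its gradient to the nodes it was computed from, scaled by the local derivative.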

I like how you approached it. I knew all of this before, so I ask myself: what about a person who doesn't? I would separate the calculus. The fact that gradient descent is an algorithm to get to a local minimum is incidental to the mechanics of getting to the local minimum. Another approach might be to ask what you would do if you did not know about calculus or gradients. Could you still build a neural net? The fact that negative gradients point in the direction of steepest descent is a good optimization on top of the algorithm. For example, backprop in the simplest case just makes sense. Once you accept backprop, the question is what's the best amount to change the weights by to reduce the error. Then you end up at gradient descent pretty naturally.

I watched the video several hours ago. I would like to go over the 2nd part that contains the math several more times. So it would be great if it would be separate, but a timestamp to jump to the 2nd part in the description would help as well.

Break it up for sure. I love the video. But it's quite long for the complexity of the topic. A break between the intuitive and maths explanations will give people with little to no background knowledge a way to think about and digest things in your famously well done intuitive sense, forcing a break on them to really take it in before diving head first into the calculus. Also, as someone above said, it allows easier lookup for a quick refresher later: you can directly click on the video that focuses on the intuitive or the maths side, depending on what that person needs to brush up on.

Giovanni Viscardi

I would keep both parts together. I don't think it matters too much, but it is narratively easier and less daunting to move from an overview to the actual computations straight away. I also agree with Philip, I think a full computation might make things clearer. The full-on symbolic explanation was great, but I think a few more minutes of explanation, plus an example, could help.

Edan Maor

I think you should do a full computation of backpropagation on a small neural network (for example 2 hidden layers with 3 neurons each) by hand. For example, a network that takes 3 numbers and is supposed to return (0,0,0). You should also consider spending more time defining each variable, pairing each one visually with what it represents from the network.

I also think it would be better to break it up, provided you upload both parts at the same time. Then most viewers would still watch both parts straight one after the other, but the material would be easier to review in the future, and less taxing on the shorter attention spans.

Andrii Zakharov

I always prefer videos to be broken into smaller pieces if possible (though it usually takes more time from the creator), and in this case in particular. If the video is split into two parts, one an intuitive overview and the other focused on the calculus, it's easier for the viewer (me in this case) to later find the part I need to refresh my memory on. All the while, it doesn't harm the first viewing experience, in which I (and I believe most) will watch both parts one after the other if they are released together.

One reason I can think of where breaking it up into two parts will help is when someone wants to revisit just 1 part of the video (either the intuitive or the mathematical). They would know part 3 is more intuitive and part 4 is more mathematical.

Single videos allow for maximum immersion. Having a continuous narrative for some of these concepts might be better than breaking it up. When you break it up, you need to conclude the first part and allude to the second and so on, which breaks the continuity imo.

I agree.

Don Sanderson

You can never give too much explanation or graphics when providing a proof. Also if someone does skip ahead you always note at the beginning this is part x of y. So if they already know the subject they continue, or continue then go back to the base, or go straight to the base. You can never provide too much. But you're building on a tree. Don't worry, I'm always promoting your youtube channel to everyone I can. Break it up! With knowledge more is always better.

Bill Russell

