Why are so many problems linear, and how would one solve nonlinear problems?


I am taking a deep learning in Python class this semester, and we are basically doing linear algebra.

Last lecture we "invented" linear regression with gradient descent from scratch (we did least squares the lecture before), and we talked about defining hypotheses, loss functions, cost functions, etc.

I have two questions. Why can so many problems be treated as linear problems that are basically "just about trying to find a solution to the equation $Ax = b$"? That can be done with methods like least squares, or by training a neural network to find one.

I feel like most problems "in the real world" are not linear at all. How would one tackle those problems, given that linear algebra only applies to linear functions?

@EdM is, of course, correct. And those transformations are very, very flexible.

But let's take a really simple case. One independent variable, one dependent one. And a straight line for the fit (no transformations).

First, it's not a dichotomy between cases where this fits and where it doesn't. Sometimes, this simple straight line is a very good fit to the data; a lot of physics problems are like this. Sometimes this straight line is a terrible fit: take anything which is sinusoidal, just as one case. If $y = \sin(x)$ then a straight line will not work at all.
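To make the contrast concrete, here is a minimal sketch, assuming NumPy is available. The data, the noise level, and the helper name `line_fit_r2` are made up for illustration; it just fits a straight line to a nearly linear data set and to a sine wave, then compares the $R^2$ of the two fits.

```python
import numpy as np

def line_fit_r2(x, y):
    """Fit y ~ intercept + slope * x by least squares and return R^2."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)               # fixed seed, illustrative data only
x = np.linspace(0, 4 * np.pi, 200)

# Case 1: a truly linear relationship plus a little noise.
y_linear = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Case 2: a sinusoid, which no straight line can follow.
y_sine = np.sin(x)

r2_linear = line_fit_r2(x, y_linear)
r2_sine = line_fit_r2(x, y_sine)

print(f"R^2, nearly linear data: {r2_linear:.3f}")
print(f"R^2, sinusoidal data:    {r2_sine:.3f}")
```

The first $R^2$ should come out close to 1 and the second much lower, which is the "very good fit" versus "terrible fit" contrast in numbers.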

More often, though, it's sort of an OK fit. Remember, as George Box said "all models are wrong, but some are useful." Even in those physics problems the straight line will ignore some issues (e.g. friction, air resistance, whatever). In other cases, there will be a lot of error in the model, and a better fit would be obtained with a more complex model.

A lot of the art and science of data analysis is figuring out how much complexity is "worth it". Should we model a transformation? If so, just a quadratic? Or a spline? Perhaps a fractional polynomial. Maybe we need control variables. Moderators. Mediators. Etc.

Or maybe the straight line is enough.

In my view, this isn't a purely statistical question. We have to consider the context. Again, for me, this is what made being a statistical consultant fun.

As for how one tackles such problems, well, what I do is try to figure out what makes sense. Computers make this kind of experimenting easy. But I try to be careful not to torture the data too much -- and there are ways to avoid that, too.

You’re right: it’s quite an assumption that the world is so simple that it can be modeled with lines, planes, and hyperplanes. But, the…

STONE-WEIERSTRASS THEOREM

…says that, technicalities aside, “decent” functions (continuous functions on a closed, bounded domain) can be approximated arbitrarily well by polynomials. If you’ve gone far enough in linear algebra, you know that complicated polynomials like $wxz-x^7y^9-wz^2+9w^5x^3yz^8$ can be viewed as linear combinations of basis elements of a vector space. This gives a way to express that polynomial as a dot product of a vector of basis elements and a vector of weights. Across multiple data points, that becomes the familiar $X\beta$ from linear regression.
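As a small sketch of that idea (NumPy assumed; the helper names `monomials` and `poly_direct` and the sample points are hypothetical), the polynomial above can be evaluated either directly or as a dot product between its four monomials and the weight vector $(1, -1, -1, 9)$. Stacking the monomial vectors for several points gives a matrix $X$, and $X\beta$ reproduces the direct evaluation.

```python
import numpy as np

def monomials(w, x, y, z):
    """Basis elements (monomials) that appear in the example polynomial."""
    return np.array([
        w * x * z,
        x**7 * y**9,
        w * z**2,
        w**5 * x**3 * y * z**8,
    ])

# Weights on each monomial: wxz - x^7 y^9 - w z^2 + 9 w^5 x^3 y z^8
beta = np.array([1.0, -1.0, -1.0, 9.0])

def poly_direct(w, x, y, z):
    """The same polynomial written out term by term."""
    return w * x * z - x**7 * y**9 - w * z**2 + 9 * w**5 * x**3 * y * z**8

# A few arbitrary data points (w, x, y, z), for illustration only.
points = np.array([
    [1.0, 2.0, 0.5, -1.0],
    [0.3, -0.7, 1.2, 0.9],
    [2.0, 1.0, 1.0, 1.0],
])

# One row of monomials per data point: this is the design matrix X.
X = np.array([monomials(*p) for p in points])

via_dot = X @ beta                                  # X beta
direct = np.array([poly_direct(*p) for p in points])

print(np.allclose(via_dot, direct))
```

The polynomial is wildly nonlinear in $w, x, y, z$, but it is linear in the weights, and that is the only kind of linearity that linear regression needs.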

This is not limited to polynomials. Any linear combination (weighted sum/difference) of functions of the original data can be represented as a dot product. Fourier series can be represented this way to obtain periodicity in the regression fit. Splines can model curvature and can have advantages over polynomials in doing so. You can interact functions of single variables with something like $\sin(x_1)\cos(x_2)$.

Overall, that seemingly simple formulation of linear regression as $X\beta$ can model an enormous amount of complicated behavior.
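Here is a hedged illustration of the Fourier case, assuming NumPy; the true coefficients, noise level, and variable names are invented for the example. The same least-squares call is used twice, once with a design matrix of columns $(1, x)$ and once with columns $(1, \sin x, \cos x)$. Only the columns of $X$ change; the fitting machinery stays linear.

```python
import numpy as np

rng = np.random.default_rng(1)               # fixed seed, synthetic data
x = np.linspace(0, 4 * np.pi, 200)

# Periodic signal with known (made-up) coefficients plus noise.
y = 0.5 + 3.0 * np.sin(x) - 1.5 * np.cos(x) + rng.normal(scale=0.1, size=x.size)

# Design matrix for a plain straight line: columns 1 and x.
X_line = np.column_stack([np.ones_like(x), x])

# Design matrix with Fourier features: columns 1, sin(x), cos(x).
X_fourier = np.column_stack([np.ones_like(x), np.sin(x), np.cos(x)])

# Ordinary least squares in both cases.
beta_line, *_ = np.linalg.lstsq(X_line, y, rcond=None)
beta_fourier, *_ = np.linalg.lstsq(X_fourier, y, rcond=None)

rmse_line = np.sqrt(np.mean((y - X_line @ beta_line) ** 2))
rmse_fourier = np.sqrt(np.mean((y - X_fourier @ beta_fourier) ** 2))

print("Fourier-feature coefficients:", np.round(beta_fourier, 2))
print(f"RMSE, straight line:    {rmse_line:.3f}")
print(f"RMSE, Fourier features: {rmse_fourier:.3f}")
```

With the Fourier columns, the recovered weights should land near the values used to generate the data, and the error should drop to roughly the noise level.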

Despite what they might tell you, it's not that "so many problems are linear", it's that we settle for linear approximations when the nonlinear versions are too difficult. They don't necessarily answer the exact question we want to ask, but we settle for them because the exact questions we want to ask are too difficult to answer.

For example, with your own example of linear regression, often the quantity you're actually interested in minimizing is absolute deviation (L1 norm) rather than squared deviation (L2 norm). But whereas squared deviations are trivial to minimize (just set the derivative equal to zero and you get a linear equation, which has a unique solution as long as the predictors are not collinear), absolute deviations don't give you an easy-to-solve equation when you differentiate them. So, traditionally, people have given up and settled for squared deviations.
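Spelled out as a sketch, with $X$ the matrix of predictors, $y$ the responses, and $x_i^\top$ the $i$-th row of $X$ (notation assumed here, not taken from the question), the squared-deviation case is:

$$
L_2(\beta) = \lVert y - X\beta \rVert_2^2, \qquad
\nabla_\beta L_2 = -2X^\top (y - X\beta) = 0
\;\Longrightarrow\; X^\top X \beta = X^\top y .
$$

That last equation is linear in $\beta$. For absolute deviations, the analogous step gives

$$
L_1(\beta) = \sum_i \lvert y_i - x_i^\top \beta \rvert, \qquad
\nabla_\beta L_1 = -\sum_i \operatorname{sign}(y_i - x_i^\top \beta)\, x_i ,
$$

which is piecewise constant in $\beta$ and undefined wherever a residual is exactly zero, so setting it to zero does not produce a linear system to solve.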

Of course, nowadays, with computers and better algorithms, minimizing absolute deviations is easier than it used to be, but it's still hard to do, and depending on your particular problem you are still likely to settle for squared deviations.
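To give a feel for the difference, here is a rough sketch assuming NumPy. It is not a production solver: the function name `lad_irls`, the iteration count, and the synthetic data with planted outliers are all made up for illustration. It uses iteratively reweighted least squares, one of several possible approaches (linear programming is another), to approximately minimize absolute deviations, and compares the result with ordinary least squares.

```python
import numpy as np

def lad_irls(X, y, n_iter=100, eps=1e-8):
    """Approximate least-absolute-deviations fit via iteratively
    reweighted least squares (a simple sketch, not a robust solver)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start from the OLS fit
    for _ in range(n_iter):
        abs_resid = np.abs(y - X @ beta)
        weights = 1.0 / np.maximum(abs_resid, eps)   # weight = 1 / |residual|
        sw = np.sqrt(weights)
        # Weighted least squares: scale rows of X and y by sqrt(weight).
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

rng = np.random.default_rng(2)                       # fixed seed, synthetic data
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=x.size)
y[-5:] += 30.0                                       # a few gross outliers

X = np.column_stack([np.ones_like(x), x])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]      # minimizes squared deviations
beta_lad = lad_irls(X, y)                            # approx. minimizes absolute deviations

l1_loss_ols = np.sum(np.abs(y - X @ beta_ols))
l1_loss_lad = np.sum(np.abs(y - X @ beta_lad))

print("OLS intercept, slope:", np.round(beta_ols, 2))
print("LAD intercept, slope:", np.round(beta_lad, 2))
```

The squared-deviation fit takes a single linear solve, while the absolute-deviation fit needs a loop of them. In exchange, the absolute-deviation slope should stay near the true value of 2 even though the outliers drag the least-squares slope away from it.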
