Machine Learning to the rescue! — Pilot episode
--
Probably based on true stories…..
The cast:
- Alice, experienced technical project manager
- Bob, senior dev, black belt in several frameworks and languages
- Cody, junior dev, pragmatic and avid learner
INT meeting room 2:
(Alice is about to present to her dev team a new project with a new customer)
Alice: ok guys, we have a new project for a new customer in real estate… they want a web app on their intranet to suggest a selling price for a house, based on some of its characteristics. They are not happy with other solutions, so they hope to get a better result with us.
Bob (looking at Alice): sounds like another domain-based algorithm to implement. Alice, do we have more info and some test case examples?
Alice: well, they gave us a lot of examples, thousands! But they have nothing formal to establish the price: it's a "gut feeling" call, made by senior salespeople. They said they focus on how big the house is, its overall condition and things like that, but they want something more "automatic" that produces prices similar to the ones in the given examples.
Bob (smiling): a bunch of amateurs, I'm sure I can write a super optimized algorithm that beats their gut approach
Cody: mmm, I have a bad feeling about this
Bob: don’t worry Cody, I can code anything
Alice: well, let’s start. Demo in two weeks!
INT Bob’s and Cody’s workstations:
(Bob, staring at the CSV file containing 5667 rows and 87 columns…)
Bob (thinking…): mmm, ok, let's start bottom up, writing some code and choosing the most important parameters… let's take square feet, number of rooms, year of construction and overall condition. Should be enough for a first alpha… I'll prepare some tests to check that the difference between my price and the given one is small enough…
………………………(a couple of days and dozens of if..then..else..switch later)……………………
Bob (talking to Cody): damn, I can't figure out how to code this… every time I make some changes, I improve the result for some house types and make it worse for others. Any ideas?
Cody (closing his notebook): I think we have to completely change our approach: I did some research these days and read something about Machine Learning, where you use existing data to obtain a way to evaluate new data, without coding the algorithm yourself.
Bob: uh yes, I read something too, but I always thought it was related to computer vision or other esoteric stuff, and frankly I can't see how it can help here.
Cody: it's not esoteric, it's applied math, and some techniques, like regression, are two centuries old… I think we can try to apply it to this problem!
Bob (doubtful): and how does it work exactly?
Cody: well, imagine the price is the result of a certain function whose inputs are the other house attributes. You use some of the data, the training data, to find this function, and other data to test how good it is.
Bob: ok, but how is this function found?
Cody: you can start with a linear function where you write the price… ehm as a function… of the other parameters: this function will have coefficients, one for every parameter, so the goal is to find the best working combination of these coefficients.
Bob (still doubtful): what does this mean?
Cody: you have the inputs and the prices too (this is called "supervised learning"), so you can check the difference between the price you have and the price computed using a specific set of coefficients of this function… and you do this for all the records (called observations) of the training data, summing up all the errors.
The goal is to make this sum as small as possible. So the regression algorithm does what you tried to do manually, when you kept changing the code to find the best solution. But in this case the algorithm tunes the coefficients and there's no code to write: a certain configuration of coefficients, let's call it a model, is the output.
Bob (sketching a line passing among a lot of points): so basically you try to find a straight line, with a certain slope, that fits the input points we have as well as possible?
Cody (looking at the picture): yes, the intuition is correct, but it's not a two-dimensional straight line, because we have 86 parameters here, so it's something we cannot visualize. And, if it works better, you can use a quadratic or higher-order function instead of a linear one, so you'll have "curves" too, but the concept is the same.
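For readers who like to see things in code, here is a minimal sketch of what Cody is describing: a linear model whose coefficients multiply the house attributes, and the sum of the squared errors over the training observations. The feature names and all the numbers are made up purely for illustration.

```python
import numpy as np

# Tiny, made-up training set: each row is a house (an "observation"),
# each column a feature (hypothetical: square feet, rooms, year built, condition).
X_train = np.array([
    [1400, 3, 1995, 7],
    [2100, 4, 2005, 8],
    [ 900, 2, 1978, 5],
], dtype=float)
y_train = np.array([215_000.0, 340_000.0, 120_000.0])  # the known prices (labels)

def predict(X, coefficients, intercept):
    # The linear function Cody mentions: price = intercept + sum(coef_i * feature_i)
    return X @ coefficients + intercept

def total_error(X, y, coefficients, intercept):
    # Sum of squared differences between the known and the computed prices:
    # the quantity the training step tries to make as small as possible.
    return np.sum((y - predict(X, coefficients, intercept)) ** 2)

# One arbitrary guess for the coefficients, just to show the evaluation step.
guess = np.array([120.0, 5_000.0, 10.0, 2_000.0])
print(total_error(X_train, y_train, guess, intercept=1_000.0))
```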
Bob: and how is this reduction done?
Cody: it's something related to finding the minimum of a convex function, the cost function, using a specific algorithm like Gradient Descent, which computes partial derivatives with respect to…
Bob (interrupting Cody): ok, ok, stop it please… I'm curious, but not to that point… if it works, it's ok for me
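For readers a bit more curious than Bob, here is a minimal, hand-rolled sketch of the idea: a Gradient Descent loop that repeatedly nudges the coefficients of a linear model to lower the mean squared error. The data is made up and assumed to be already rescaled so the toy loop stays numerically stable; it's an illustration of the technique, not production code.

```python
import numpy as np

# Made-up, already-rescaled features and prices, purely for illustration.
X = np.array([[-1.0, -0.5],
              [ 0.0,  0.5],
              [ 1.0,  1.0]])
y = np.array([100.0, 200.0, 320.0])

coefficients = np.zeros(X.shape[1])
intercept = 0.0
learning_rate = 0.1

for _ in range(2_000):
    errors = X @ coefficients + intercept - y
    # Partial derivatives of the mean squared error with respect to
    # the coefficients and the intercept...
    grad_coefficients = 2 * X.T @ errors / len(y)
    grad_intercept = 2 * errors.mean()
    # ...then a small step "downhill", in the direction that lowers the cost.
    coefficients -= learning_rate * grad_coefficients
    intercept -= learning_rate * grad_intercept

print(coefficients, intercept)  # the tuned configuration, i.e. the "model"
```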
Cody: one last thing: if the model works very well on the training data, it's not automatic that it will always work well, because it could still be not good enough to evaluate the price of new houses. This is called "overfitting": that's the reason you use other data, the test data, not used to build the model, just to check whether this happens or not. The opposite situation is called "underfitting", meaning the model does not work well even on the training data, because the sum of all the errors is very high.
Bob: so you have to find a tradeoff…
Cody (smiling at Bob): exactly! This is called the bias/variance tradeoff! And it's all measurable
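A quick way to see what Cody means, sketched on made-up single-feature data: fit both a plain straight line and a much more flexible curve on the training part only, then compare the error on the training data with the error on the held-out test data. A large gap between the two is the typical sign of overfitting, while a large error on both suggests underfitting.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

# Made-up single-feature data: price (k$) vs. size (hundreds of square meters).
size = rng.uniform(0.5, 3.0, 30)
price = 120 * size + 30 + rng.normal(0, 25, size.size)

# Keep half of the observations aside as test data, never used for fitting.
train, test = np.arange(15), np.arange(15, 30)

def mean_abs_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

for degree in (1, 9):  # 1 = straight line, 9 = very flexible curve
    fit = Polynomial.fit(size[train], price[train], degree)
    train_err = mean_abs_error(price[train], fit(size[train]))
    test_err = mean_abs_error(price[test], fit(size[test]))
    print(f"degree {degree}: train error {train_err:.1f}, test error {test_err:.1f}")
```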
Bob (with a hopeful look and resolute tone): great, I like it, let’s do this!
to be continued…
(tense music) Is it really a good idea? Will Bob and Cody get the demo ready in time for the deadline? Don't miss the next episode of "Machine Learning to the rescue" to find out!
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — -
In this episode we learned:
- the concept of "supervised learning", where, for every observation, you have both the inputs ("features") and an output (the "label")
- how a simple regression algorithm works and how its effectiveness is evaluated
- part of the data can be used to fit a model and part to test how the model performs (a small end-to-end sketch follows this list)
- the concept of "underfitting"/"overfitting" and the importance of finding a good balance between a model that simply doesn't work and one that works very well, but only on the training data, meaning it's not able to generalize to new data
- the existence of an algorithm called “Gradient Descent” to minimize a cost function… for those wishing to deepen the subject, this is a good article
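To tie the episode together, here is a sketch of how a first end-to-end attempt could look with scikit-learn, assuming the customer's data has been saved as houses.csv with a price column; the file name and column name are assumptions, and the real 87-column file would likely need some cleaning (missing values, non-numeric columns) before this runs as-is.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical file and column names, used only for illustration.
data = pd.read_csv("houses.csv")
X = data.drop(columns=["price"])   # the 86 house attributes (features)
y = data["price"]                  # the price decided by the salespeople (label)

# Hold part of the data back purely for testing how the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

print("average error on training data:", mean_absolute_error(y_train, model.predict(X_train)))
print("average error on test data:    ", mean_absolute_error(y_test, model.predict(X_test)))
```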
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
I hope you enjoyed this little experiment in treating these topics a little differently from the tons of available articles and videos. As I wrote in my last post, a wider audience is my primary goal and, often, even the most "gentle introduction to…" is too detailed for a casual or simply curious reader.
There are some inaccuracies and a lot of simplifications, and I resisted the urge to insert graphs or formulas, but I guess it's more important to keep it simple… you can always take a deeper look by reading some "gentle introduction to…" articles :)
If you liked it, please feel free to share! Alice, Bob and Cody will appreciate it :) And of course comments are welcome too!
One last thing: Alice's, Bob's and Cody's images are not pictures of real people, but were generated using a GAN (generative adversarial network), a kind of neural network (two, actually) used to invent new faces starting from real pictures.
The site thispersondoesnotexist.com (https://thispersondoesnotexist.com/), built as a proof of concept on top of Nvidia's StyleGAN, is extremely minimal but quite impressive, so check it out (just refresh the page to see another picture).
That's all… Hope to see you for the next episodes of "Machine Learning to the rescue"!