Let’s start bluntly: ML is probably the most difficult subject I have ever approached (ok, I’m not a nuclear physicist or a neurosurgeon, but I have a technical and academic background in IT, so my threshold for difficulty is, at least, not trivial).
It’s vast in themes, deep in concepts and creative in application; it’s not just a skill you can acquire in a straight, well-defined way.
There is a funny definition of a Data Scientist by Professor Shlomo Argamon:
“Data Scientist = statistician + programmer + coach + storyteller + artist”
and I think it really captures the essence of what you need to learn in this field.
As for me, when I started to approach ML seriously more than a year ago, my goal was (and still is), on a scale from total ignorance to Prof Ng (deity) level, to be at least “competent”.
I have several reasons for pursuing this (usefulness in my job, keeping up with tech and so on), but the main one is that ML is incredibly fun!
Extracting valuable information from raw data and using it is like solving a mystery: you have some starting clues (the data) and you have to connect the dots and follow the tracks until you find a suitable solution and prove that it works in a measurable way. It’s science, with a touch of magic.
So I’m writing this article (my first one, btw!) to share my experience.
Don’t get me wrong, it’s not a guide like “do this and you become that in six months” because, besides the fact that I’m still far from the level of competence needed to teach it, it simply doesn’t work this way.
I just hope it can be a little useful for anyone who is on the same journey or wants to start it.
After all, learning from experience is the core of all of this :)
Having a strong technical background, I started from the bottom up, meaning from code, because at the time it seemed to me the right approach.
There are a lot of high-level libraries: with a few lines of code (literally!) you can crunch data and see results, which is particularly satisfying when you do this in your free time.
So I watched tutorials and started reading tons of books and articles, trying to grasp the concepts behind supervised and unsupervised learning, the different models, the metrics behind them, the ways to visualize data and… everything in between.
After the initial enthusiasm and a deep dive in “sponge mode”, I started to realize the vastness of the goal, as dozens of concepts suddenly emerged from behind those few lines of code.
Every time I learned something, I ran into something new.
Mean, median? Oook… correlation, covariance? Got it…. precision, recall? Yep… skewness, kurtosis? Whaaaat?!? And the list went on and on…
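For what it’s worth, the last two are less scary than they sound: skewness measures how lopsided a distribution is, and kurtosis how heavy its tails are. Here is a minimal sketch using only Python’s standard library; the function names and the sample data are made up for illustration, and I’m assuming the simple population (not sample-corrected) formulas.

```python
import statistics


def skewness(xs):
    """Population skewness: the average cubed z-score (0 for symmetric data)."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return sum(((x - mean) / std) ** 3 for x in xs) / len(xs)


def excess_kurtosis(xs):
    """Population excess kurtosis: the average z-score to the 4th power, minus 3.

    Subtracting 3 makes a normal distribution score 0, so positive values
    suggest heavier tails than a normal and negative values lighter ones.
    """
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return sum(((x - mean) / std) ** 4 for x in xs) / len(xs) - 3.0


# Hypothetical toy data: one symmetric list, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 20]

for name, data in [("symmetric", symmetric), ("right_tailed", right_tailed)]:
    print(
        name,
        "mean:", round(statistics.fmean(data), 2),
        "median:", statistics.median(data),
        "skewness:", round(skewness(data), 2),
        "excess kurtosis:", round(excess_kurtosis(data), 2),
    )
```

In the right-tailed list the single large value drags the mean well above the median, and the positive skewness is just a number that says the same thing.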
But, as the line often attributed to Churchill goes, “If you’re going through hell, keep going”. So I kept going and continued to study, often the same concepts from different sources, hoping to catch all the nuances, and eventually I felt I was improving.
After several weeks, I considered myself “ready” and started to get my hands dirty (there are plenty of places to do it, such as Kaggle) and…
Reality bit me hard
It was like trying to drive a car for the first time in heavy traffic, armed only with basic knowledge such as “to brake, push that pedal” or “to turn, use the steering wheel”.
My notebooks seemed “competent”: “Yeah, let’s try an ensemble method to get better results”, “Ok, time to do some PCA”. But something important was missing and the overall approach was clumsy.
So I took a break, tried to assess my approach and found that:
- A workflow to follow is essential for approaching whatever ML problem you’re facing. It can be very basic, and it can be improved and adapted over time, but you have to follow one.
- The amount of studying is not proportional to the knowledge you acquire; it just gives you a distorted feeling that you are gaining it.
- Writing a couple of lines of code to fit a logistic regression model is not equivalent to knowing how logistic regression works.
- The knowledge about retrieving and “wrangling” data is the most important asset, not the number of ML algorithms you know (or, worse, the number you just know how to code).
I was missing point 1, wasting a lot of time on point 2 (often overdoing it), definitely not competent in point 3 and very weak (at least with Python) in point 4.
Not good! But if seeing the problem is the first step to solving it, I was at least able to move in the right direction.
So I parked the car, bought a map and searched for a driving school…
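To make the third point above a bit more concrete, here is a minimal sketch of what a library does for you when you “fit” a logistic regression. I’m assuming a single feature and plain batch gradient descent on the log-loss; the data and the function names are invented for illustration, and real libraries use more robust solvers and regularization.

```python
import math


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) with batch gradient descent.

    For the log-loss, the gradient with respect to w is the average of
    (prediction - label) * x, and with respect to b the average of
    (prediction - label).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(x, w, b):
    """Estimated probability that the label is 1 for input x."""
    return sigmoid(w * x + b)


# Hypothetical toy data: hours studied vs. exam passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print("weight:", round(w, 3), "bias:", round(b, 3))
print("p(pass | 1 hour): ", round(predict_proba(1.0, w, b), 3))
print("p(pass | 4 hours):", round(predict_proba(4.0, w, b), 3))
```

The library call hides exactly this loop (in a far more refined form): a weight, a bias, a loss and a rule to nudge the first two until the third stops shrinking.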
Back to school
There is a huge difference between reading some books or following tutorials and taking a structured course, where the content is professionally crafted and you can test what you’ve learned with quizzes, practical exercises and assignments.
I’m not saying the former are not useful, but the latter is more efficient, especially for beginners, and can justify a reasonable cost.
So I did some research and found two complementary courses: one with a more theoretical approach (from Professor Ng himself) and another explicitly for Python, covering not only ML but everything you need to know to deal with data in any form and from any source (among many other topics).
I finished the first and I’ve nearly completed the second, and I’m satisfied.
I gained a lot of knowledge about different topics (Python pipelines are awesome!), I learned what mathematically happens under the hood and I acquired a general confidence, so I can say it was really worth the countless hours spent.
Let’s say I can drive now: I know more about how a car works internally and how to drive it, very slowly but more safely.
It’s time to go full speed ahead on practice (the most important way to gain experience) and see what happens.
I’ll let you know how it goes :)
In the meantime, feel free to contact me and share your experience; I’d appreciate it.