Technically it is not a frame-by-frame comparison but a comparison between the movements of the body parts captured during the exercise. Still, your question is spot on: this is one of the points I have to resolve, because right now it probably works only for me and only with the camera in that position.
My best guess is to do some feature engineering on the collected data: for example, judging the correctness of the exercise by the magnitude of the difference between the starting and ending points of each body part, instead of by the raw coordinates.
I'm not sure about it, but the only way to find out is to try it and see if it works.
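The displacement idea could look roughly like the sketch below. It is only an illustration under assumptions of mine: the data layout (a dict from body-part name to `(x, y)` pixel coordinates), the function names, and the choice of shoulder width as the scale reference are all hypothetical. Dividing by a body-sized reference is one way to make the numbers less dependent on the person and on the camera distance; it does nothing about camera angle.

```python
import math

# Hypothetical layout: a frame is a dict {body_part_name: (x, y)} in pixels.
def displacement_features(start, end,
                          ref_a="left_shoulder", ref_b="right_shoulder"):
    """Per-part displacement magnitude between two frames,
    divided by the shoulder width measured in the start frame."""
    scale = math.dist(start[ref_a], start[ref_b])
    if scale == 0:
        raise ValueError("reference points coincide; cannot normalize")
    return {
        part: math.dist(start[part], end[part]) / scale
        for part in start
        if part in end
    }

def within_thresholds(features, thresholds):
    """thresholds maps a part name to (min, max) allowed normalized displacement."""
    return all(lo <= features[part] <= hi
               for part, (lo, hi) in thresholds.items())
```

The threshold values themselves would still have to be found by trial, for example by recording a few correct repetitions and looking at the range of each feature.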
I also thought about feeding the data to a NN, i.e. building a classifier and training it accordingly.
The drawback is that it needs a lot of data and a separate classifier for every kind of exercise, while a spatial approach with thresholds could be a one-size-fits-all solution.
Ideas are welcome :)