The problem with images and other sensor data is that the volume of data is far greater than the amount of information you need to solve a particular problem. For example, the answer to “what are the dimensions of this room?” is only a handful of numbers, but the sensor will still give you a whole metric load of point-cloud points. Feature extraction is about finding the right ways to “reduce” the data provided by the sensors and extract the useful bits.
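To make the room example concrete, here is a minimal sketch of that kind of reduction. It assumes the room is a box aligned with the sensor's x, y and z axes, which a real scan would not guarantee, and it fakes the scanner with a made-up `synthetic_room` helper. Both function names are hypothetical, and the trimmed-extent trick is just one simple way to do it, not the standard method.

```python
import random

def room_dimensions(points, trim=0.01):
    """Reduce an (x, y, z) point cloud to three numbers: width, depth, height.

    Assumes the walls are axis-aligned. Per axis, drops the lowest and highest
    `trim` fraction of values so a few stray points do not set the extent.
    """
    dims = []
    for axis in range(3):
        values = sorted(p[axis] for p in points)
        k = int(len(values) * trim)
        dims.append(values[-1 - k] - values[k])
    return tuple(dims)

def synthetic_room(width, depth, height, n=30000, noise=0.01, seed=0):
    """Hypothetical stand-in for a scanner: noisy points on the surfaces of a box."""
    rng = random.Random(seed)
    size = (width, depth, height)
    points = []
    for _ in range(n):
        # Pick a random point inside the box, then snap one coordinate
        # onto one of the two surfaces along that axis, plus sensor noise.
        p = [rng.uniform(0, s) for s in size]
        axis = rng.randrange(3)
        p[axis] = rng.choice((0.0, size[axis])) + rng.gauss(0, noise)
        points.append(tuple(p))
    return points

cloud = synthetic_room(4.0, 3.0, 2.5)
width, depth, height = room_dimensions(cloud)
print(len(cloud) * 3, "input numbers ->", 3, "output numbers")
print(round(width, 2), round(depth, 2), round(height, 2))
```

Tens of thousands of coordinates go in and three numbers come out. Real feature extraction has to cope with tilted sensors, furniture and missing walls, so it usually fits planes or other shapes rather than taking raw extents, but the idea of throwing away almost all of the data is the same.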