- Clear prompt structure (do you have this?)
- Reliable use cases where you consistently get good results (do you have these?)
- Difficulty adapting when encountering new task types (is this still true?)