I don't know of a way to detect that directly. The only workaround I can think of is to create a separate label for each spam rule and give that label the same condition as the rule. Then you can see which label gets applied to the Reddit replies that are marked as spam.
Instead of creating labels for all spam rules, you could start with the ones you suspect (ones that use regex, for example).
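If you have a lot of rules, here is a rough sketch of how you might shortlist the regex-based ones and pick a debug label name for each. Everything in it is an assumption for illustration: the rule names, the patterns, the `debug/` label prefix, and the idea that your rules can be written out as plain name/pattern pairs. It does not talk to any real service; you would still create the labels by hand.

```python
import re

# Hypothetical rules, written out by hand as name -> pattern.
# These names and patterns are made up for illustration only.
rules = {
    "crypto-words": "bitcoin giveaway",
    "link-shortener": r"https?://(bit\.ly|t\.co)/\S+",
    "all-caps": r"\b[A-Z]{6,}\b",
}

# Characters that usually mean a pattern is a regex rather than plain text.
# The dot is left out on purpose because plain text often contains dots.
REGEX_CHARS = re.compile(r"[\\\[\](){}|*+?^$]")


def suspect_rules(rules):
    """Return the names of rules whose pattern contains regex metacharacters."""
    return [name for name, pattern in rules.items() if REGEX_CHARS.search(pattern)]


def label_for(name):
    """Build a debug label name for a rule (the 'debug/' prefix is an assumption)."""
    return f"debug/{name}"


# Map each suspect rule to the label you would create for it.
labels = {name: label_for(name) for name in suspect_rules(rules)}

for name, label in labels.items():
    print(f"{name}: create label '{label}' with the same condition as the rule")
```

Once the labels exist, any Reddit reply that lands in spam and carries one of them points at the matching rule.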