I'd like to see the default behavior be to raise an exception if eval fails on any annotation.
I think it's reasonable to provide a way to find out which specific keys have problems, but I don't think that should be the default. Wouldn't it be good enough to have a flag which says "just return me a dict containing only the keys whose annotations evaluated without errors"? Let's call it silence_errors for discussion's sake. You could then figure out which keys have errors with:
getattr(obj, "__annotations__", {}).keys() - get_annotations(obj, silence_errors=True).keys()
That is, silence_errors works on a per-key basis, not the call to get_annotations() as a whole.
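To make the per-key semantics concrete, here is a rough sketch of what I have in mind. It is purely illustrative: silence_errors is the hypothetical flag from above and does not exist in any real get_annotations(), the function name get_annotations_sketch and the globalns/localns parameters are made up for this example, and the eval handling is deliberately simplified.

```python
def get_annotations_sketch(obj, *, silence_errors=False, globalns=None, localns=None):
    """Hypothetical sketch only; not the real get_annotations()."""
    raw = getattr(obj, "__annotations__", {})
    # Evaluate in a fresh namespace by default so this function's own
    # local variables can never leak into an annotation's evaluation.
    namespace = {} if globalns is None else globalns
    result = {}
    for name, value in raw.items():
        if isinstance(value, str):
            try:
                value = eval(value, namespace, localns)
            except Exception:
                if silence_errors:
                    continue  # drop only this key, keep evaluating the rest
                raise  # default: the first failure propagates to the caller
        result[name] = value
    return result


# A function whose annotations are strings, one of which cannot be evaluated.
def example(good: "int", bad: "UndefinedName"):
    pass


# Per-key silencing: only the failing key is missing from the result.
evaluated = get_annotations_sketch(example, silence_errors=True)
failed = getattr(example, "__annotations__", {}).keys() - evaluated.keys()
```

With the flag off, the same call raises the NameError from the "bad" annotation, which is the default I'm arguing for.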