Fixed Py_DECREF and Py_CLEAR as well.
Added tests for Py_INCREF and Py_XINCREF (if somebody has a better idea of how to test that INCREF doesn't leak, please let me know).
Removed the comment saying that Py_DECREF evaluates its argument multiple times, since it is no longer relevant.
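To illustrate why that comment used to matter, here is a minimal sketch that uses a toy object type rather than CPython's real PyObject and macros. `toy_object`, `TOY_XDECREF_OLD`, `TOY_XDECREF_NEW`, `counted`, `evaluations_old` and `evaluations_new` are all hypothetical names. A macro that repeats its argument in the expansion evaluates a side-effecting argument more than once, while copying the argument into a temporary first evaluates it exactly once.

```c
#include <stddef.h>

/* Hypothetical toy object with a reference count; a stand-in for PyObject,
   not CPython's actual definition. */
typedef struct toy_object {
    long refcnt;
} toy_object;

/* Hypothetical "old style" macro: `op` appears twice in the expansion,
   so an argument with side effects is evaluated more than once. */
#define TOY_XDECREF_OLD(op) \
    do { if ((op) != NULL) { --(op)->refcnt; } } while (0)

/* Hypothetical "fixed" macro: the argument is evaluated exactly once into
   a temporary, and only the temporary is used afterwards. */
#define TOY_XDECREF_NEW(op)              \
    do {                                 \
        toy_object *toy_tmp_ = (op);     \
        if (toy_tmp_ != NULL) {          \
            --toy_tmp_->refcnt;          \
        }                                \
    } while (0)

static int evaluations = 0;

/* Returns its argument unchanged and counts how many times it was called. */
static toy_object *counted(toy_object *o) {
    ++evaluations;
    return o;
}

/* Decrefs `o` through the old macro; returns how often the argument was evaluated. */
int evaluations_old(toy_object *o) {
    evaluations = 0;
    TOY_XDECREF_OLD(counted(o));
    return evaluations;
}

/* Decrefs `o` through the new macro; returns how often the argument was evaluated. */
int evaluations_new(toy_object *o) {
    evaluations = 0;
    TOY_XDECREF_NEW(counted(o));
    return evaluations;
}
```

With a non-NULL object, `evaluations_old` reports two evaluations (one for the NULL check, one for the decrement) and `evaluations_new` reports one; with NULL, both report one.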
Regarding performance: I made a toy example (only these defines and a main function) to test how the gcc optimizer behaves in different cases. From what I see, when the code looks like this (which is the majority of cases in the codebase):
    PyObject* obj = Foo();
    Py_XDECREF(obj);
the assembly code gcc produces (with -O3) is the same before and after the patch.
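A toy translation unit in the spirit of that experiment might look like the sketch below. It is not the original toy example: `toy_object`, `toy_foo`, `toy_dealloc`, `TOY_XDECREF`, `toy_release_result` and the `USE_TEMP` switch are hypothetical stand-ins for the real objects and macros. Compiling it twice, e.g. with `gcc -O3 -S toy.c` and `gcc -O3 -S -DUSE_TEMP toy.c`, and diffing the two `.s` files is one way to check whether introducing the temporary changes the code generated for the common "assign to a local, then XDECREF it" pattern.

```c
#include <stddef.h>

/* Hypothetical toy object; not CPython's PyObject. */
typedef struct toy_object {
    long refcnt;
} toy_object;

static toy_object the_object = { 1 };
int toy_deallocs = 0;

/* Hypothetical stand-in for Foo(): returns a new reference. */
toy_object *toy_foo(void) {
    ++the_object.refcnt;
    return &the_object;
}

/* Hypothetical deallocator: only counts how often it runs. */
static void toy_dealloc(toy_object *op) {
    (void)op;
    ++toy_deallocs;
}

#ifdef USE_TEMP
/* Variant with a temporary: the argument is evaluated once. */
#define TOY_XDECREF(op)                                      \
    do {                                                     \
        toy_object *toy_tmp_ = (op);                         \
        if (toy_tmp_ != NULL && --toy_tmp_->refcnt == 0)     \
            toy_dealloc(toy_tmp_);                           \
    } while (0)
#else
/* Variant without a temporary: the argument is repeated in the expansion. */
#define TOY_XDECREF(op)                                      \
    do {                                                     \
        if ((op) != NULL && --(op)->refcnt == 0)             \
            toy_dealloc(op);                                 \
    } while (0)
#endif

/* The common pattern: a plain local variable is passed to the macro,
   so evaluating it more than once has no side effects. */
void toy_release_result(void) {
    toy_object *obj = toy_foo();
    TOY_XDECREF(obj);
}
```

Because `obj` is a plain local, both variants are semantically equivalent here, which gives the optimizer room to emit the same code for each.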