This matches a similar optimisation done for math.floor in
python#21072.
Before:
```
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=3.14' 'ceil(x)'
20000000 loops, best of 11: 13.3 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=0.0' 'ceil(x)'
20000000 loops, best of 11: 13.3 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-3.14E32' 'ceil(x)'
10000000 loops, best of 11: 35.3 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-323452345.14' 'ceil(x)'
10000000 loops, best of 11: 21.8 nsec per loop
```
After:
```
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=3.14' 'ceil(x)'
20000000 loops, best of 11: 11.8 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=0.0' 'ceil(x)'
20000000 loops, best of 11: 11.7 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-3.14E32' 'ceil(x)'
10000000 loops, best of 11: 32.7 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-323452345.14' 'ceil(x)'
10000000 loops, best of 11: 20.1 nsec per loop
```
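
For anyone who wants to repeat the measurement from a script rather than the shell, here is a minimal sketch using the standard `timeit` module. It mirrors the four inputs and the `-r 11` repeat count from the command lines above. The helper name `bench_ceil` and the fixed loop count are assumptions for illustration (the CLI picks its loop count automatically), so absolute numbers may differ slightly from the ones reported.

```python
import timeit

# Same inputs as the timeit command lines above.
VALUES = ["3.14", "0.0", "-3.14E32", "-323452345.14"]


def bench_ceil(value, repeat=11, number=1_000_000):
    """Return the best per-call time, in nanoseconds, for ceil(x).

    `bench_ceil` is a hypothetical helper, not part of this change.
    `number` is fixed here, whereas `python -m timeit` chooses it itself.
    """
    timer = timeit.Timer("ceil(x)", setup=f"from math import ceil; x={value}")
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


if __name__ == "__main__":
    for value in VALUES:
        print(f"x={value}: {bench_ceil(value):.1f} nsec per loop")
```

Run it once with the interpreter built before the change and once with the one built after, then compare the per-value figures.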